[jira] [Commented] (HADOOP-10614) CBZip2InputStream is not threadsafe

2014-05-16 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000676#comment-14000676
 ] 

Xiangrui Meng commented on HADOOP-10614:


Tested on Spark master and Hadoop 1.2.1.

> CBZip2InputStream is not threadsafe
> ---
>
> Key: HADOOP-10614
> URL: https://issues.apache.org/jira/browse/HADOOP-10614
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 1.2.1, 2.2.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
> Attachments: bzip2.diff
>
>
> Hadoop uses CBZip2InputStream to decode bzip2 files. However, the 
> implementation is not threadsafe. This is not really a problem for Hadoop 
> MapReduce because Hadoop runs each task in a separate JVM. But for other 
> libraries that utilize multithreading and use Hadoop's InputFormat, e.g., 
> Spark, it will cause exceptions like the following:
> {code}
> java.lang.ArrayIndexOutOfBoundsException: 6 
> org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.recvDecodingTables(CBZip2InputStream.java:729)
>  
> org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.getAndMoveToFrontDecode(CBZip2InputStream.java:795)
>  
> org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.initBlock(CBZip2InputStream.java:499)
>  
> org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.changeStateToProcessABlock(CBZip2InputStream.java:330)
>  
> org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.read(CBZip2InputStream.java:394)
>  
> org.apache.hadoop.io.compress.BZip2Codec$BZip2CompressionInputStream.read(BZip2Codec.java:428)
>  java.io.InputStream.read(InputStream.java:101) 
> org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205) 
> org.apache.hadoop.util.LineReader.readLine(LineReader.java:169) 
> org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:176) 
> org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:43) 
> org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:198) 
> org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:181) 
> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:35)
>  scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) 
> org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1000) 
> org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847) 
> org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847) 
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1077)
>  
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1077)
>  org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) 
> org.apache.spark.scheduler.Task.run(Task.scala:51) 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187) 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  java.lang.Thread.run(Thread.java:724)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Moved] (HADOOP-10614) CBZip2InputStream is not threadsafe

2014-05-16 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza moved MAPREDUCE-5893 to HADOOP-10614:


  Component/s: (was: mrv1)
   (was: mrv2)
 Target Version/s:   (was: 1.2.2, 2.2.1)
Affects Version/s: (was: 2.2.0)
   (was: 1.2.1)
   1.2.1
   2.2.0
  Key: HADOOP-10614  (was: MAPREDUCE-5893)
  Project: Hadoop Common  (was: Hadoop Map/Reduce)

> CBZip2InputStream is not threadsafe
> ---
>
> Key: HADOOP-10614
> URL: https://issues.apache.org/jira/browse/HADOOP-10614
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.2.0, 1.2.1
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
> Attachments: bzip2.diff
>
>
> Hadoop uses CBZip2InputStream to decode bzip2 files. However, the 
> implementation is not threadsafe. This is not really a problem for Hadoop 
> MapReduce because Hadoop runs each task in a separate JVM. But for other 
> libraries that utilize multithreading and use Hadoop's InputFormat, e.g., 
> Spark, it will cause exceptions like the following:
> {code}
> java.lang.ArrayIndexOutOfBoundsException: 6 
> org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.recvDecodingTables(CBZip2InputStream.java:729)
>  
> org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.getAndMoveToFrontDecode(CBZip2InputStream.java:795)
>  
> org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.initBlock(CBZip2InputStream.java:499)
>  
> org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.changeStateToProcessABlock(CBZip2InputStream.java:330)
>  
> org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.read(CBZip2InputStream.java:394)
>  
> org.apache.hadoop.io.compress.BZip2Codec$BZip2CompressionInputStream.read(BZip2Codec.java:428)
>  java.io.InputStream.read(InputStream.java:101) 
> org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205) 
> org.apache.hadoop.util.LineReader.readLine(LineReader.java:169) 
> org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:176) 
> org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:43) 
> org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:198) 
> org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:181) 
> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:35)
>  scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) 
> org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1000) 
> org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847) 
> org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847) 
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1077)
>  
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1077)
>  org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) 
> org.apache.spark.scheduler.Task.run(Task.scala:51) 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187) 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  java.lang.Thread.run(Thread.java:724)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10610) Upgrade S3n s3.fs.buffer.dir to support multi directories

2014-05-16 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HADOOP-10610:


Summary: Upgrade S3n s3.fs.buffer.dir to support multi directories  (was: 
Upgrade S3n s3.fs.buffer.dir to suppoer multi directories)

> Upgrade S3n s3.fs.buffer.dir to support multi directories
> -
>
> Key: HADOOP-10610
> URL: https://issues.apache.org/jira/browse/HADOOP-10610
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 2.4.0
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>Priority: Minor
> Attachments: HDFS-6383.patch
>
>
> s3.fs.buffer.dir defines the tmp folder where files will be written to before 
> getting sent to S3.  Right now this is limited to a single folder, which 
> causes two major issues.
> 1. You need a drive with enough space to store all the tmp files at once
> 2. You are limited to the IO speeds of a single drive
> This solution will resolve both and has been tested to increase the S3 write 
> speed by 2.5x with 10 mappers on hs1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-16 Thread Larry McCay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Larry McCay updated HADOOP-10607:
-

Attachment: 10607-2.patch

Removed Java 7 reference to fix build failure.

> Create an API to Separate Credentials/Password Storage from Applications
> 
>
> Key: HADOOP-10607
> URL: https://issues.apache.org/jira/browse/HADOOP-10607
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Reporter: Larry McCay
>Assignee: Larry McCay
> Fix For: 3.0.0
>
> Attachments: 10607-2.patch, 10607.patch
>
>
> As with the filesystem API, we need to provide a generic mechanism to support 
> multiple credential storage mechanisms that are potentially from third 
> parties. 
> We need the ability to eliminate the storage of passwords and secrets in 
> clear text within configuration files or within code.
> Toward that end, I propose an API that is configured using a list of URLs of 
> CredentialProviders. The implementation will look for implementations using 
> the ServiceLoader interface and thus support third party libraries.
> Two providers will be included in this patch. One using the credentials cache 
> in MapReduce jobs and the other using Java KeyStores from either HDFS or 
> local file system. 
> A CredShell CLI will also be included in this patch which provides the 
> ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10376) Refactor refresh*Protocols into a single generic refreshConfigProtocol

2014-05-16 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000254#comment-14000254
 ] 

Arpit Agarwal commented on HADOOP-10376:


Hi Chris, I see nothing objectionable in the proposal, but it would be good to 
have more detail on the registry/dispatch mechanism.

Alternatively, you could just post a patch.



> Refactor refresh*Protocols into a single generic refreshConfigProtocol
> --
>
> Key: HADOOP-10376
> URL: https://issues.apache.org/jira/browse/HADOOP-10376
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Chris Li
>Assignee: Chris Li
>Priority: Minor
> Attachments: RefreshFrameworkProposal.pdf
>
>
> See https://issues.apache.org/jira/browse/HADOOP-10285
> There are starting to be too many refresh*Protocols We can refactor them to 
> use a single protocol with a variable payload to choose what to do.
> Thereafter, we can return an indication of success or failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10489) UserGroupInformation#getTokens and UserGroupInformation#addToken can lead to ConcurrentModificationException

2014-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000603#comment-14000603
 ] 

Hadoop QA commented on HADOOP-10489:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12644943/HADOOP-10489.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3949//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3949//console

This message is automatically generated.

> UserGroupInformation#getTokens and UserGroupInformation#addToken can lead to 
> ConcurrentModificationException
> 
>
> Key: HADOOP-10489
> URL: https://issues.apache.org/jira/browse/HADOOP-10489
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Jing Zhao
>Assignee: Robert Kanter
> Attachments: HADOOP-10489.patch
>
>
> Currently UserGroupInformation#getTokens and UserGroupInformation#addToken 
> uses UGI's monitor to protect the iteration and modification of 
> Credentials#tokenMap. Per 
> [discussion|https://issues.apache.org/jira/browse/HADOOP-10475?focusedCommentId=13965851&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13965851]
>  in HADOOP-10475, this can still lead to ConcurrentModificationException.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-16 Thread Larry McCay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Larry McCay updated HADOOP-10607:
-

Status: Patch Available  (was: Open)

> Create an API to Separate Credentials/Password Storage from Applications
> 
>
> Key: HADOOP-10607
> URL: https://issues.apache.org/jira/browse/HADOOP-10607
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Reporter: Larry McCay
>Assignee: Larry McCay
> Fix For: 3.0.0
>
> Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 10607.patch
>
>
> As with the filesystem API, we need to provide a generic mechanism to support 
> multiple credential storage mechanisms that are potentially from third 
> parties. 
> We need the ability to eliminate the storage of passwords and secrets in 
> clear text within configuration files or within code.
> Toward that end, I propose an API that is configured using a list of URLs of 
> CredentialProviders. The implementation will look for implementations using 
> the ServiceLoader interface and thus support third party libraries.
> Two providers will be included in this patch. One using the credentials cache 
> in MapReduce jobs and the other using Java KeyStores from either HDFS or 
> local file system. 
> A CredShell CLI will also be included in this patch which provides the 
> ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10612) NFS failed to refresh the user group id mapping table

2014-05-16 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999351#comment-13999351
 ] 

Brandon Li commented on HADOOP-10612:
-

In IdUserGroup.java: 
{noformat}
  synchronized private boolean isExpired() {
    return lastUpdateTime - System.currentTimeMillis() > timeout;
  }
{noformat}
should be :
{noformat}
  synchronized private boolean isExpired() {
    return System.currentTimeMillis() - lastUpdateTime > timeout;
  }
{noformat}

> NFS failed to refresh the user group id mapping table
> -
>
> Key: HADOOP-10612
> URL: https://issues.apache.org/jira/browse/HADOOP-10612
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-10612.002.patch, HADOOP-10612.patch
>
>
> Found by Preetham Kukillaya. The user/group id mapping table is not updated 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces

2014-05-16 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HADOOP-10603:
--

Attachment: HADOOP-10603.1.patch

[~yi.a.liu] Here are some comments from [~tucu00] and me.

*Encryptor and Decryptor interfaces and implementation*

How about a minimal API, something like (we've attached a patch with these 
classes):

{code}
public abstract class Crypto {

  public interface Encryptor {
    public void encrypt(long pos, ByteBuffer in, ByteBuffer out);
  }

  public interface Decryptor {
    public void decrypt(long pos, ByteBuffer in, ByteBuffer out);
  }

  public static Crypto getInstance(Configuration conf) {
    ...
  }

  public abstract Encryptor getEncryptor(byte[] key, byte[] iv);
  public abstract Decryptor getDecryptor(byte[] key, byte[] iv);
}
{code}

That way all the cipher initialization/reinitialization is
encapsulated within the implementation.

The Encryptor and Decryptor implementations encrypt/decrypt using the
byte buffer parameters and the absolute position (or in the case of
streams, the stream position). Specifying it like this lets the IV
counter portion (in the case of AES/CTR) be easily calculated inside
the encryptor/decryptor.

When dealing with padding during encryption/decryption, handling the
offset within an AES block (16 bytes) could be done by simply doing a
cipher.update() on a NULL-SINK before passing the real data to
encrypt/decrypt, as sketched below.
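
A minimal sketch of that NULL-sink idea for AES/CTR (an assumption using the
JDK javax.crypto.Cipher API, not code from the attached patch; {{cipher}} and
{{pos}} are placeholders):

{code}
import javax.crypto.Cipher;

public class CtrOffsetSketch {
  /**
   * Sketch only: after the cipher is initialized with the IV for block
   * (pos / 16), discard (pos % 16) keystream bytes so the next update()
   * call lines up with the absolute stream position pos.
   */
  static void skipToBlockOffset(Cipher cipher, long pos) {
    int blockOffset = (int) (pos % 16);      // offset inside the 16-byte AES block
    if (blockOffset > 0) {
      cipher.update(new byte[blockOffset]);  // NULL sink: the output is thrown away
    }
  }
}
{code}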

The Encryptor and Decryptor work with byte[]. Working with (direct)
ByteBuffers would allow avoiding some array copies when doing JNI to a
native crypto library (e.g., PKCS#11 or OpenSSL).

The AESCTREncryptor/AESCTRDecryptor do update() instead of doFinal()
on the cipher. This could lead to incomplete decryption of the buffer
in some cipher implementations since the contract is that a doFinal()
must be done.

The ivCalc() could be bubbled up to the corresponding Crypto factory
implementation and have a single implementation. The '& 0xFF' is not
needed and the logic could be made a bit more readable like this:

{code}
private static final int CTR_OFFSET = 8;
...
iv[CTR_OFFSET + 0] = (byte) (l >>> 56);
iv[CTR_OFFSET + 1] = (byte) (l >>> 48);
iv[CTR_OFFSET + 2] = (byte) (l >>> 40);
iv[CTR_OFFSET + 3] = (byte) (l >>> 32);
iv[CTR_OFFSET + 4] = (byte) (l >>> 24);
iv[CTR_OFFSET + 5] = (byte) (l >>> 16);
iv[CTR_OFFSET + 6] = (byte) (l >>> 8);
iv[CTR_OFFSET + 7] = (byte) (l);
{code}

The current implementation assumes the counter portion always starts
with zero, right? This may present some interoperability issues if
somebody is using a different cipher stream that takes an arbitrary IV
with the counter portion not set to zero.
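
As a hedged illustration of that interoperability point (an assumed
calculation, not the patch's ivCalc()): the block counter could be added to
the whole 16-byte initial IV as a big-endian number, so a non-zero counter
portion in the initial IV still works:

{code}
public class CtrIvSketch {
  /** Sketch only: add a block counter to an arbitrary 16-byte initial IV,
   *  carrying across all bytes instead of assuming the low 8 bytes are zero. */
  static byte[] calcIv(byte[] initIv, long counter) {
    byte[] iv = initIv.clone();
    int carry = 0;
    for (int i = iv.length - 1; i >= 0; i--) {
      int sum = (iv[i] & 0xFF) + (int) (counter & 0xFF) + carry;
      iv[i] = (byte) sum;
      carry = sum >>> 8;
      counter >>>= 8;
    }
    return iv;
  }
}
{code}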

The attached crypto.patch file shows an impl with the proposed changes
and makes things a bit simpler. Then, for the stream
implementations it is a simple integration via a single method
(encrypt or decrypt).

*Stream Implementations:*

The crypto streams are implemented as a class (EncyrptorStream), a
subclass (FSDecryptorStream) and a decorator
(CryptographicFSDataInputStream), with an analogous hierarchy in the
output stream path. Why not just have a single decorator class for
(each of) the input/output streams with all the encryption/decryption logic?

The DecryptorStream seems to decrypt based on the ready buffer size
instead of on the DecryptorStream buffer size. It should do it based on
the stream buffer size.

The FSDecryptorStream seek() and skip() methods don't seem to
propagate the new absolute position to the decryptor (to calculate the
IV correctly). Are we missing something obvious?

I think the write() & read() logic during de/encryption could be a bit
simpler, i.e.:

For encryption:
{code}
  void write(byte[] b, int off, int len) {
    IF preEncryptionBuffer has enough space for b[off..len] THEN
      copy b[off..len] to preEncryptionBuffer
    ELSE
      WHILE still data in b[]
        copy as much as possible from b to preEncryptionBuffer
        IF preEncryptionBuffer is full THEN
          encrypt()
        FI
      END WHILE
    FI
  }
{code}

For decryption:

{code}
  int read(byte[] b, int off, int len) {
    IF decryptedBuffer has data THEN
      drain decryptedBuffer to b[] up to MIN(decryptedBuffer.avail, len)
    ELSE
      read encrypted stream into encryptedBuffer
      decrypt()
      drain decryptedBuffer to b[] up to MIN(decryptedBuffer.avail, len)
    FI
    return bytes read
  }
{code}

With this logic for write/read we are ensuring that we use the
specified buffer sizes for encryption/decryption. This gets the boost
of speed that bigger buffers give us since there are fewer cipher
init() calls and things like memory prefetch can be maximized.


> Crypto input and output streams implementing Hadoop stream interfaces
> -
>
> Key: HADOOP-10603
> URL: https://issue

[jira] [Updated] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-16 Thread Larry McCay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Larry McCay updated HADOOP-10607:
-

Attachment: 10607-4.patch

Fixed javadoc warning

> Create an API to Separate Credentials/Password Storage from Applications
> 
>
> Key: HADOOP-10607
> URL: https://issues.apache.org/jira/browse/HADOOP-10607
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Reporter: Larry McCay
>Assignee: Larry McCay
> Fix For: 3.0.0
>
> Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 10607.patch
>
>
> As with the filesystem API, we need to provide a generic mechanism to support 
> multiple credential storage mechanisms that are potentially from third 
> parties. 
> We need the ability to eliminate the storage of passwords and secrets in 
> clear text within configuration files or within code.
> Toward that end, I propose an API that is configured using a list of URLs of 
> CredentialProviders. The implementation will look for implementations using 
> the ServiceLoader interface and thus support third party libraries.
> Two providers will be included in this patch. One using the credentials cache 
> in MapReduce jobs and the other using Java KeyStores from either HDFS or 
> local file system. 
> A CredShell CLI will also be included in this patch which provides the 
> ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-16 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998893#comment-13998893
 ] 

Alejandro Abdelnur edited comment on HADOOP-10607 at 5/15/14 6:26 PM:
--

I see the point about a lot of baggage in KeyStore that is not needed.

Instead of adding a new interface, have you considered doing it in the KeyProvider 
itself? After all, the credentials are a key. Then the KMS could easily add REST 
support for that too.


was (Author: tucu00):
I see the point of lot of luggage in KeyStore that is not needed.

Instead adding a new interface, have you considered doing it in the KeyProvider 
itself? After all the credentials are a key. Then the KMS could easily add REST 
support for that too.

> Create an API to Separate Credentials/Password Storage from Applications
> 
>
> Key: HADOOP-10607
> URL: https://issues.apache.org/jira/browse/HADOOP-10607
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Reporter: Larry McCay
>Assignee: Larry McCay
> Fix For: 3.0.0
>
> Attachments: 10607-2.patch, 10607.patch
>
>
> As with the filesystem API, we need to provide a generic mechanism to support 
> multiple credential storage mechanisms that are potentially from third 
> parties. 
> We need the ability to eliminate the storage of passwords and secrets in 
> clear text within configuration files or within code.
> Toward that end, I propose an API that is configured using a list of URLs of 
> CredentialProviders. The implementation will look for implementations using 
> the ServiceLoader interface and thus support third party libraries.
> Two providers will be included in this patch. One using the credentials cache 
> in MapReduce jobs and the other using Java KeyStores from either HDFS or 
> local file system. 
> A CredShell CLI will also be included in this patch which provides the 
> ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000380#comment-14000380
 ] 

Mike Liddell commented on HADOOP-9629:
--

Added a document with information for developers / code-reviewers.

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch
>
>
> h2. Description
> This JIRA incorporates adding a new file system implementation for accessing 
> Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
> to MR jobs or configuring MR jobs to put their output directly into blob 
> storage.
> h2. High level design
> At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blob storage; the scheme wasb is used for 
> accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
> scheme: {code}wasb[s]://@/path/to/file{code} to address 
> individual blobs. We use the standard Azure Java SDK 
> (com.microsoft.windowsazure) to do most of the work. In order to map a 
> hierarchical file system over the flat name-value pair nature of blob 
> storage, we create a specially tagged blob named path/to/dir whenever we 
> create a directory called path/to/dir, then files under that are stored as 
> normal blobs path/to/dir/file. We have many metrics implemented for it using 
> the Metrics2 interface. Tests are implemented mostly using a mock 
> implementation for the Azure SDK functionality, with an option to test 
> against a real blob storage if configured (instructions provided in 
> README.txt).
> h2. Credits and history
> This has been ongoing work for a while, and the early version of this work 
> can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
> we'll post the patch here for Hadoop trunk first, then post a patch for 
> branch-1 as well for backporting the functionality if accepted. Credit for 
> this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
> [~stojanovic] as well as multiple people who have taken over this work since 
> then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
> Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
> [~chuanliu].
> h2. Test
> Besides unit tests, we have used WASB as the default file system in our 
> service product. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. The current version reflects the 
> version of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-16 Thread Larry McCay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Larry McCay updated HADOOP-10607:
-

Status: Open  (was: Patch Available)

> Create an API to Separate Credentials/Password Storage from Applications
> 
>
> Key: HADOOP-10607
> URL: https://issues.apache.org/jira/browse/HADOOP-10607
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Reporter: Larry McCay
>Assignee: Larry McCay
> Fix For: 3.0.0
>
> Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 10607.patch
>
>
> As with the filesystem API, we need to provide a generic mechanism to support 
> multiple credential storage mechanisms that are potentially from third 
> parties. 
> We need the ability to eliminate the storage of passwords and secrets in 
> clear text within configuration files or within code.
> Toward that end, I propose an API that is configured using a list of URLs of 
> CredentialProviders. The implementation will look for implementations using 
> the ServiceLoader interface and thus support third party libraries.
> Two providers will be included in this patch. One using the credentials cache 
> in MapReduce jobs and the other using Java KeyStores from either HDFS or 
> local file system. 
> A CredShell CLI will also be included in this patch which provides the 
> ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10612) NFS failed to refresh the user group id mapping table

2014-05-16 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000496#comment-14000496
 ] 

Brandon Li commented on HADOOP-10612:
-

It's not straightforward to add a meaningful unit test. The existing test 
TestIdUserGroup#testUserUpdateSetting() can cover the configuration setting and 
I've manually tested the table update on a local machine.

> NFS failed to refresh the user group id mapping table
> -
>
> Key: HADOOP-10612
> URL: https://issues.apache.org/jira/browse/HADOOP-10612
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-10612.002.patch, HADOOP-10612.003.patch, 
> HADOOP-10612.patch
>
>
> Found by Preetham Kukillaya. The user/group id mapping table is not updated 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10489) UserGroupInformation#getTokens and UserGroupInformation#addToken can lead to ConcurrentModificationException

2014-05-16 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000585#comment-14000585
 ] 

Aaron T. Myers commented on HADOOP-10489:
-

The patch looks good to me. +1 pending Jenkins (which I've just kicked.)

> UserGroupInformation#getTokens and UserGroupInformation#addToken can lead to 
> ConcurrentModificationException
> 
>
> Key: HADOOP-10489
> URL: https://issues.apache.org/jira/browse/HADOOP-10489
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Jing Zhao
>Assignee: Robert Kanter
> Attachments: HADOOP-10489.patch
>
>
> Currently UserGroupInformation#getTokens and UserGroupInformation#addToken 
> uses UGI's monitor to protect the iteration and modification of 
> Credentials#tokenMap. Per 
> [discussion|https://issues.apache.org/jira/browse/HADOOP-10475?focusedCommentId=13965851&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13965851]
>  in HADOOP-10475, this can still lead to ConcurrentModificationException.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000420#comment-14000420
 ] 

Hadoop QA commented on HADOOP-10607:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645221/10607-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.
See 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3947//artifact/trunk/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3947//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3947//console

This message is automatically generated.

> Create an API to Separate Credentials/Password Storage from Applications
> 
>
> Key: HADOOP-10607
> URL: https://issues.apache.org/jira/browse/HADOOP-10607
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Reporter: Larry McCay
>Assignee: Larry McCay
> Fix For: 3.0.0
>
> Attachments: 10607-2.patch, 10607-3.patch, 10607.patch
>
>
> As with the filesystem API, we need to provide a generic mechanism to support 
> multiple credential storage mechanisms that are potentially from third 
> parties. 
> We need the ability to eliminate the storage of passwords and secrets in 
> clear text within configuration files or within code.
> Toward that end, I propose an API that is configured using a list of URLs of 
> CredentialProviders. The implementation will look for implementations using 
> the ServiceLoader interface and thus support third party libraries.
> Two providers will be included in this patch. One using the credentials cache 
> in MapReduce jobs and the other using Java KeyStores from either HDFS or 
> local file system. 
> A CredShell CLI will also be included in this patch which provides the 
> ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10609) .gitignore should ignore .orig and .rej files

2014-05-16 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated HADOOP-10609:
--

   Resolution: Fixed
Fix Version/s: 2.5.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks for the review, Sandy. Just committed this to trunk and branch-2.

> .gitignore should ignore .orig and .rej files
> -
>
> Key: HADOOP-10609
> URL: https://issues.apache.org/jira/browse/HADOOP-10609
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Fix For: 2.5.0
>
> Attachments: hadoop-10609.patch
>
>
> .gitignore file should ignore .orig and .rej files



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-16 Thread Larry McCay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Larry McCay updated HADOOP-10607:
-

Attachment: 10607-3.patch

> Create an API to Separate Credentials/Password Storage from Applications
> 
>
> Key: HADOOP-10607
> URL: https://issues.apache.org/jira/browse/HADOOP-10607
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Reporter: Larry McCay
>Assignee: Larry McCay
> Fix For: 3.0.0
>
> Attachments: 10607-2.patch, 10607-3.patch, 10607.patch
>
>
> As with the filesystem API, we need to provide a generic mechanism to support 
> multiple credential storage mechanisms that are potentially from third 
> parties. 
> We need the ability to eliminate the storage of passwords and secrets in 
> clear text within configuration files or within code.
> Toward that end, I propose an API that is configured using a list of URLs of 
> CredentialProviders. The implementation will look for implementations using 
> the ServiceLoader interface and thus support third party libraries.
> Two providers will be included in this patch. One using the credentials cache 
> in MapReduce jobs and the other using Java KeyStores from either HDFS or 
> local file system. 
> A CredShell CLI will also be included in this patch which provides the 
> ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10610) Upgrade S3n s3.fs.buffer.dir to suppoer multi directories

2014-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999289#comment-13999289
 ] 

Hadoop QA commented on HADOOP-10610:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12644622/HDFS-6383.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3937//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3937//console

This message is automatically generated.

> Upgrade S3n s3.fs.buffer.dir to suppoer multi directories
> -
>
> Key: HADOOP-10610
> URL: https://issues.apache.org/jira/browse/HADOOP-10610
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 2.4.0
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>Priority: Minor
> Attachments: HDFS-6383.patch
>
>
> s3.fs.buffer.dir defines the tmp folder where files will be written to before 
> getting sent to S3.  Right now this is limited to a single folder, which 
> causes two major issues.
> 1. You need a drive with enough space to store all the tmp files at once
> 2. You are limited to the IO speeds of a single drive
> This solution will resolve both and has been tested to increase the S3 write 
> speed by 2.5x with 10 mappers on hs1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10591) Compression codecs must used pooled direct buffers or deallocate direct buffers when stream is closed

2014-05-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998537#comment-13998537
 ] 

Colin Patrick McCabe commented on HADOOP-10591:
---

Thanks, Gopal.  I agree that this is a pre-existing issue, definitely not 
introduced by HADOOP-10047.  And, in fact, that JIRA should improve the 
situation in many cases by eliminating the need for the {{Decompressor}} to 
allocate its own direct buffer.

semi-related: One thing that I notice in the constructor for 
{{ZlibDirectDecompressor}} is that it invokes the superclass constructor 
({{ZlibDecompressor}}) with {{directBufferSize = 0}}, causing us to call 
{{allocateDirect}} with a size of 0.  I do wonder what this actually does... I 
didn't manage to find any documentation for this case (maybe I missed it?).
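
For what it's worth, a quick standalone check (just an illustration, not tied
to the Zlib code) suggests {{ByteBuffer.allocateDirect(0)}} is legal and simply
yields an empty direct buffer:

{code}
import java.nio.ByteBuffer;

public class AllocateDirectZero {
  public static void main(String[] args) {
    // A zero capacity is accepted; the result is a direct buffer with
    // capacity 0, so any read against it sees no remaining bytes.
    ByteBuffer empty = ByteBuffer.allocateDirect(0);
    System.out.println(empty.isDirect() + " capacity=" + empty.capacity()); // true capacity=0
  }
}
{code}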

> Compression codecs must used pooled direct buffers or deallocate direct 
> buffers when stream is closed
> -
>
> Key: HADOOP-10591
> URL: https://issues.apache.org/jira/browse/HADOOP-10591
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Colin Patrick McCabe
>
> Currently direct buffers allocated by compression codecs like Gzip (which 
> allocates 2 direct buffers per instance) are not deallocated when the stream 
> is closed. Eventually for long running processes which create a huge number 
> of files, these direct buffers are left hanging till a full gc, which may or 
> may not happen in a reasonable amount of time - especially if the process 
> does not use a whole lot of heap.
> Either these buffers should be pooled or they should be deallocated when the 
> stream is closed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10610) Upgrade S3n s3.fs.buffer.dir to suppoer multi directories

2014-05-16 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000554#comment-14000554
 ] 

Andrew Wang commented on HADOOP-10610:
--

Hey Ted,

Note that we already have Configuration.getTrimmedStrings, no need to split 
yourself, and thus no need to write unit tests for splitting logic :)
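
For illustration, a minimal sketch of what that could look like (assuming
{{fs.s3.buffer.dir}} holds a comma-separated list; the class and method names
are placeholders, not the patch's code):

{code}
import org.apache.hadoop.conf.Configuration;

public class BufferDirsSketch {
  /** Sketch only: let Configuration do the comma splitting and trimming. */
  static String[] bufferDirs(Configuration conf) {
    return conf.getTrimmedStrings("fs.s3.buffer.dir");
  }
}
{code}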

> Upgrade S3n s3.fs.buffer.dir to suppoer multi directories
> -
>
> Key: HADOOP-10610
> URL: https://issues.apache.org/jira/browse/HADOOP-10610
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 2.4.0
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>Priority: Minor
> Attachments: HDFS-6383.patch
>
>
> s3.fs.buffer.dir defines the tmp folder where files will be written to before 
> getting sent to S3.  Right now this is limited to a single folder, which 
> causes two major issues.
> 1. You need a drive with enough space to store all the tmp files at once
> 2. You are limited to the IO speeds of a single drive
> This solution will resolve both and has been tested to increase the S3 write 
> speed by 2.5x with 10 mappers on hs1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000518#comment-14000518
 ] 

Hadoop QA commented on HADOOP-9629:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12645337/HADOOP-9629%20-%20Azure%20Filesystem%20-%20Information%20for%20developers.pdf
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3948//console

This message is automatically generated.

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch
>
>
> h2. Description
> This JIRA incorporates adding a new file system implementation for accessing 
> Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
> to MR jobs or configuring MR jobs to put their output directly into blob 
> storage.
> h2. High level design
> At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blob storage; the scheme wasb is used for 
> accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
> scheme: {code}wasb[s]://@/path/to/file{code} to address 
> individual blobs. We use the standard Azure Java SDK 
> (com.microsoft.windowsazure) to do most of the work. In order to map a 
> hierarchical file system over the flat name-value pair nature of blob 
> storage, we create a specially tagged blob named path/to/dir whenever we 
> create a directory called path/to/dir, then files under that are stored as 
> normal blobs path/to/dir/file. We have many metrics implemented for it using 
> the Metrics2 interface. Tests are implemented mostly using a mock 
> implementation for the Azure SDK functionality, with an option to test 
> against a real blob storage if configured (instructions provided in 
> README.txt).
> h2. Credits and history
> This has been ongoing work for a while, and the early version of this work 
> can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
> we'll post the patch here for Hadoop trunk first, then post a patch for 
> branch-1 as well for backporting the functionality if accepted. Credit for 
> this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
> [~stojanovic] as well as multiple people who have taken over this work since 
> then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
> Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
> [~chuanliu].
> h2. Test
> Besides unit tests, we have used WASB as the default file system in our 
> service product. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. The current version reflects the 
> version of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10564) Add username to native RPCv9 client

2014-05-16 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HADOOP-10564:
---

Attachment: HADOOP-10564-pnative.006.patch

Patch LGTM, +1.
Updated the patch by one line to match the latest branch HEAD.

> Add username to native RPCv9 client
> ---
>
> Key: HADOOP-10564
> URL: https://issues.apache.org/jira/browse/HADOOP-10564
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: native
>Affects Versions: HADOOP-10388
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: HADOOP-10388
>
> Attachments: HADOOP-10564-pnative.002.patch, 
> HADOOP-10564-pnative.003.patch, HADOOP-10564-pnative.004.patch, 
> HADOOP-10564-pnative.005.patch, HADOOP-10564-pnative.006.patch, 
> HADOOP-10564.001.patch
>
>
> Add the ability for the native RPCv9 client to set a username when initiating 
> a connection.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10586) KeyShell doesn't allow setting Options via CLI

2014-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999348#comment-13999348
 ] 

Hadoop QA commented on HADOOP-10586:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645004/HADOOP-10586.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3940//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3940//console

This message is automatically generated.

> KeyShell doesn't allow setting Options via CLI
> --
>
> Key: HADOOP-10586
> URL: https://issues.apache.org/jira/browse/HADOOP-10586
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: bin
>Affects Versions: 3.0.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HADOOP-10586.1.patch, HADOOP-10586.1.patch, 
> HADOOP-10586.3.patch, HADOOP-10586.4.patch, HADOOP-10586.5.patch
>
>
> You should be able to set any of the Options passed to the KeyProvider via 
> the CLI.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10612) NFS failed to refresh the user group id mapping table

2014-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999578#comment-13999578
 ] 

Hadoop QA commented on HADOOP-10612:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12645132/HADOOP-10612.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3942//console

This message is automatically generated.

> NFS failed to refresh the user group id mapping table
> -
>
> Key: HADOOP-10612
> URL: https://issues.apache.org/jira/browse/HADOOP-10612
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-10612.002.patch, HADOOP-10612.patch
>
>
> Found by Preetham Kukillaya. The user/group id mapping table is not updated 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces

2014-05-16 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999474#comment-13999474
 ] 

Yi Liu commented on HADOOP-10603:
-

Thanks [~clamb] and [~tucu00], good comments. Actually I have also refined the 
crypto streams using similar interfaces like {{DirectDecompressor}}. I will 
merge your proposed code into mine and update later today. I will respond to 
you later today, too.

> Crypto input and output streams implementing Hadoop stream interfaces
> -
>
> Key: HADOOP-10603
> URL: https://issues.apache.org/jira/browse/HADOOP-10603
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: security
>Reporter: Alejandro Abdelnur
>Assignee: Yi Liu
> Fix For: 3.0.0
>
> Attachments: HADOOP-10603.1.patch, HADOOP-10603.2.patch, 
> HADOOP-10603.patch
>
>
> A common set of Crypto Input/Output streams. They would be used by 
> CryptoFileSystem, HDFS encryption, MapReduce intermediate data and spills. 
> Note we cannot use the JDK Cipher Input/Output streams directly because we 
> need to support the additional interfaces that the Hadoop FileSystem streams 
> implement (Seekable, PositionedReadable, ByteBufferReadable, 
> HasFileDescriptor, CanSetDropBehind, CanSetReadahead, 
> HasEnhancedByteBufferAccess, Syncable, CanSetDropBehind).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10612) NFS failed to refresh the user group id mapping table

2014-05-16 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999484#comment-13999484
 ] 

Aaron T. Myers commented on HADOOP-10612:
-

Hey folks, why not just add a config deprecation so that the change can be done 
compatibly?
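
A minimal sketch of that idea using {{Configuration.addDeprecation}} (the key
names below are made up for illustration, not the real NFS keys):

{code}
import org.apache.hadoop.conf.Configuration;

public class NfsKeyDeprecationSketch {
  /** Sketch only: map a hypothetical old update-interval key to a new one so
   *  existing configs keep working after a rename. */
  static void registerDeprecatedKeys() {
    Configuration.addDeprecation("old.update.interval.key",   // hypothetical old name
                                 "new.update.interval.key");  // hypothetical new name
  }
}
{code}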

> NFS failed to refresh the user group id mapping table
> -
>
> Key: HADOOP-10612
> URL: https://issues.apache.org/jira/browse/HADOOP-10612
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-10612.002.patch, HADOOP-10612.patch
>
>
> Found by Preetham Kukillaya. The user/group id mapping table is not updated 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629 - Azure Filesystem - Information for developers.docx

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch
>
>
> h2. Description
> This JIRA incorporates adding a new file system implementation for accessing 
> Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
> to MR jobs or configuring MR jobs to put their output directly into blob 
> storage.
> h2. High level design
> At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blob storage; the scheme wasb is used for 
> accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
> scheme: {code}wasb[s]://@/path/to/file{code} to address 
> individual blobs. We use the standard Azure Java SDK 
> (com.microsoft.windowsazure) to do most of the work. In order to map a 
> hierarchical file system over the flat name-value pair nature of blob 
> storage, we create a specially tagged blob named path/to/dir whenever we 
> create a directory called path/to/dir, then files under that are stored as 
> normal blobs path/to/dir/file. We have many metrics implemented for it using 
> the Metrics2 interface. Tests are implemented mostly using a mock 
> implementation for the Azure SDK functionality, with an option to test 
> against a real blob storage if configured (instructions provided in 
> README.txt).
> h2. Credits and history
> This has been ongoing work for a while, and the early version of this work 
> can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
> we'll post the patch here for Hadoop trunk first, then post a patch for 
> branch-1 as well for backporting the functionality if accepted. Credit for 
> this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
> [~stojanovic] as well as multiple people who have taken over this work since 
> then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
> Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
> [~chuanliu].
> h2. Test
> Besides unit tests, we have used WASB as the default file system in our 
> service product. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. The current version reflects the 
> version of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10597) Evaluate if we can have RPC client back off when server is under heavy load

2014-05-16 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998689#comment-13998689
 ] 

Steve Loughran commented on HADOOP-10597:
-

This could be useful for clients of other services too, where the back-off 
message could trigger a redirect.

Maybe the response could include some hints:
# where else to go
# backoff parameter hints: sleep time, growth, jitter. This gives the NN more 
control of the clients, lets you spread the jitter, and grow the backoff time 
as load increases, so reducing socket connection load. A small sketch of turning 
such hints into a delay follows below.
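
A small, illustrative sketch of turning such server-supplied hints into a retry 
delay; the class and field names are assumptions, not an existing Hadoop API:
{code}
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative only: compute a retry delay from server-supplied backoff hints. */
public class BackoffHints {
  private final long baseSleepMs;  // initial sleep suggested by the server
  private final double growth;     // multiplier applied per failed attempt
  private final double jitter;     // fraction of the delay to randomize, e.g. 0.2

  public BackoffHints(long baseSleepMs, double growth, double jitter) {
    this.baseSleepMs = baseSleepMs;
    this.growth = growth;
    this.jitter = jitter;
  }

  /** Delay in milliseconds before the given (0-based) retry attempt. */
  public long delayForAttempt(int attempt) {
    double delay = baseSleepMs * Math.pow(growth, attempt);
    // Randomize part of the delay so clients do not all reconnect at the same instant.
    double spread = delay * jitter * ThreadLocalRandom.current().nextDouble();
    return (long) (delay + spread);
  }
}
{code}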

> Evaluate if we can have RPC client back off when server is under heavy load
> ---
>
> Key: HADOOP-10597
> URL: https://issues.apache.org/jira/browse/HADOOP-10597
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Ming Ma
>
> Currently if an application hits NN too hard, RPC requests will be in a blocking 
> state, assuming OS connections don't run out. Alternatively RPC or NN can 
> throw some well defined exception back to the client based on certain 
> policies when it is under heavy load; client will understand such exception 
> and do exponential back off, as another implementation of 
> RetryInvocationHandler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10610) Upgrade S3n s3.fs.buffer.dir to support multi directories

2014-05-16 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998569#comment-13998569
 ] 

Steve Loughran commented on HADOOP-10610:
-

The overall concept makes sense.

# {{LocalDirAllocator}} contains the logic to worry about file capacity, 
writeability &c. The more disks you list, the more likely a disk is to fail, 
the more you need that code. This patch currently just bails out if one dest 
dir isn't there, even if others may be present. (A sketch of using it follows 
below.)
# This would be an ideal time to move {{"fs.s3.buffer.dir"}} from an inline 
string to a constant where it can be referred to in Hadoop and external code.

Recommended tests:
# a test with > 1 directory in the args
# basic handling of "erroneous" inputs, e.g. trailing commas in options
# bad directories in input paths
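
A minimal sketch of point 1, assuming the existing {{fs.s3.buffer.dir}} property 
and {{LocalDirAllocator}}; the surrounding class and method names are 
illustrative only:
{code}
import java.io.File;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.LocalDirAllocator;

/** Illustrative only: pick buffer files across all configured directories. */
public class S3BufferDirSketch {
  // The existing property name, pulled out of the inline string into a constant.
  public static final String BUFFER_DIR_KEY = "fs.s3.buffer.dir";

  private final LocalDirAllocator allocator = new LocalDirAllocator(BUFFER_DIR_KEY);

  public File newBackupFile(Configuration conf, long expectedSize) throws IOException {
    // LocalDirAllocator skips unwritable or full directories and rotates across the rest.
    return allocator.createTmpFileForWrite("output-", expectedSize, conf);
  }
}
{code}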

> Upgrade S3n s3.fs.buffer.dir to support multi directories
> -
>
> Key: HADOOP-10610
> URL: https://issues.apache.org/jira/browse/HADOOP-10610
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 2.4.0
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>Priority: Minor
> Attachments: HDFS-6383.patch
>
>
> s3.fs.buffer.dir defines the tmp folder where files will be written to before 
> getting sent to S3.  Right now this is limited to a single folder which 
> causes two major issues.
> 1. You need a drive with enough space to store all the tmp files at once
> 2. You are limited to the IO speeds of a single drive
> This solution will resolve both and has been tested to increase the S3 write 
> speed by 2.5x with 10 mappers on hs1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10609) .gitignore should ignore .orig and .rej files

2014-05-16 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998336#comment-13998336
 ] 

Sandy Ryza commented on HADOOP-10609:
-

+1

> .gitignore should ignore .orig and .rej files
> -
>
> Key: HADOOP-10609
> URL: https://issues.apache.org/jira/browse/HADOOP-10609
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: hadoop-10609.patch
>
>
> .gitignore file should ignore .orig and .rej files



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-16 Thread Larry McCay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Larry McCay updated HADOOP-10607:
-

Status: Patch Available  (was: Open)

> Create an API to Separate Credentials/Password Storage from Applications
> 
>
> Key: HADOOP-10607
> URL: https://issues.apache.org/jira/browse/HADOOP-10607
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Reporter: Larry McCay
>Assignee: Larry McCay
> Fix For: 3.0.0
>
> Attachments: 10607-2.patch, 10607-3.patch, 10607.patch
>
>
> As with the filesystem API, we need to provide a generic mechanism to support 
> multiple credential storage mechanisms that are potentially from third 
> parties. 
> We need the ability to eliminate the storage of passwords and secrets in 
> clear text within configuration files or within code.
> Toward that end, I propose an API that is configured using a list of URLs of 
> CredentialProviders. The implementation will look for implementations using 
> the ServiceLoader interface and thus support third party libraries.
> Two providers will be included in this patch. One using the credentials cache 
> in MapReduce jobs and the other using Java KeyStores from either HDFS or 
> local file system. 
> A CredShell CLI will also be included in this patch which provides the 
> ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10474) Move o.a.h.record to hadoop-streaming

2014-05-16 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000474#comment-14000474
 ] 

Sandy Ryza commented on HADOOP-10474:
-

+1 to reverting this change.  Keeping a few rarely used classes around is a lot 
less of a headache than forcing downstream code to deal with a Public/Stable 
API going away.

> Move o.a.h.record to hadoop-streaming
> -
>
> Key: HADOOP-10474
> URL: https://issues.apache.org/jira/browse/HADOOP-10474
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.5.0
>
> Attachments: HADOOP-10474.000.patch, HADOOP-10474.001.patch, 
> HADOOP-10474.002.patch
>
>
> The classes in o.a.h.record have been deprecated for more than a year and a 
> half. They should be removed. As the first step, the jira moves all these 
> classes into the hadoop-streaming project, which is the only user of these 
> classes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10585) Retry policies ignore interrupted exceptions

2014-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999010#comment-13999010
 ] 

Hudson commented on HADOOP-10585:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #5605 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5605/])
HADOOP-10585. Retry policies ignore interrupted exceptions (Daryn Sharp via 
jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594267)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/TestRetryProxy.java


> Retry policies ignore interrupted exceptions
> ---
>
> Key: HADOOP-10585
> URL: https://issues.apache.org/jira/browse/HADOOP-10585
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Fix For: 3.0.0, 2.5.0
>
> Attachments: HADOOP-10585.patch
>
>
> Retry policies should not use {{ThreadUtil.sleepAtLeastIgnoreInterrupts}}.  
> This prevents {{FsShell}} commands from being aborted during retries.  It 
> also causes orphaned webhdfs DN DFSClients to keep running after the webhdfs 
> client closes the connection.  Jetty goes into a loop constantly sending 
> interrupts to the handler thread.  Webhdfs retries cause multiple nodes to 
> have these orphaned clients.  The DN cannot shutdown until orphaned clients 
> complete.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629.trunk.1.patch

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mostafa Elhemali
> Attachments: HADOOP-9629.2.patch, HADOOP-9629.3.patch, 
> HADOOP-9629.patch, HADOOP-9629.trunk.1.patch
>
>
> h2. Description
> This JIRA incorporates adding a new file system implementation for accessing 
> Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
> to MR jobs or configuring MR jobs to put their output directly into blob 
> storage.
> h2. High level design
> At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blob storage; the scheme wasb is used for 
> accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
> scheme: {code}wasb[s]://@/path/to/file{code} to address 
> individual blobs. We use the standard Azure Java SDK 
> (com.microsoft.windowsazure) to do most of the work. In order to map a 
> hierarchical file system over the flat name-value pair nature of blob 
> storage, we create a specially tagged blob named path/to/dir whenever we 
> create a directory called path/to/dir, then files under that are stored as 
> normal blobs path/to/dir/file. We have many metrics implemented for it using 
> the Metrics2 interface. Tests are implemented mostly using a mock 
> implementation for the Azure SDK functionality, with an option to test 
> against a real blob storage if configured (instructions provided in 
> README.txt).
> h2. Credits and history
> This has been ongoing work for a while, and the early version of this work 
> can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
> we'll post the patch here for Hadoop trunk first, then post a patch for 
> branch-1 as well for backporting the functionality if accepted. Credit for 
> this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
> [~stojanovic] as well as multiple people who have taken over this work since 
> then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
> Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
> [~chuanliu].
> h2. Test
> Besides unit tests, we have used WASB as the default file system in our 
> service product. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. The current version reflects the 
> version of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999592#comment-13999592
 ] 

Hadoop QA commented on HADOOP-9629:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12645104/HADOOP-9629.trunk.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 2 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-azure hadoop-tools/hadoop-tools-dist.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3945//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3945//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3945//console

This message is automatically generated.

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mostafa Elhemali
> Attachments: HADOOP-9629.2.patch, HADOOP-9629.3.patch, 
> HADOOP-9629.patch, HADOOP-9629.trunk.1.patch
>
>
> h2. Description
> This JIRA incorporates adding a new file system implementation for accessing 
> Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
> to MR jobs or configuring MR jobs to put their output directly into blob 
> storage.
> h2. High level design
> At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blob storage; the scheme wasb is used for 
> accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
> scheme: {code}wasb[s]://@/path/to/file{code} to address 
> individual blobs. We use the standard Azure Java SDK 
> (com.microsoft.windowsazure) to do most of the work. In order to map a 
> hierarchical file system over the flat name-value pair nature of blob 
> storage, we create a specially tagged blob named path/to/dir whenever we 
> create a directory called path/to/dir, then files under that are stored as 
> normal blobs path/to/dir/file. We have many metrics implemented for it using 
> the Metrics2 interface. Tests are implemented mostly using a mock 
> implementation for the Azure SDK functionality, with an option to test 
> against a real blob storage if configured (instructions provided in 
> README.txt).
> h2. Credits and history
> This has been ongoing work for a while, and the early version of this work 
> can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
> we'll post the patch here for Hadoop trunk first, then post a patch for 
> branch-1 as well for backporting the functionality if accepted. Credit for 
> this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
> [~stojanovic] as well as multiple people who have taken over this work since 
> then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
> Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
> [~chuanliu].
> h2. Test
> Besides unit tests, we have used WASB as the default file system in our 
> service product. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. The current version reflects the 
> version of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10612) NFS failed to refresh the user group id mapping table

2014-05-16 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HADOOP-10612:


Attachment: HADOOP-10612.patch

> NFS failed to refresh the user group id mapping table
> -
>
> Key: HADOOP-10612
> URL: https://issues.apache.org/jira/browse/HADOOP-10612
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-10612.002.patch, HADOOP-10612.patch
>
>
> Found by Preetham Kukillaya. The user/group id mapping table is not updated 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-16 Thread Larry McCay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Larry McCay updated HADOOP-10607:
-

Status: Patch Available  (was: Open)

> Create an API to Separate Credentials/Password Storage from Applications
> 
>
> Key: HADOOP-10607
> URL: https://issues.apache.org/jira/browse/HADOOP-10607
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Reporter: Larry McCay
>Assignee: Larry McCay
> Fix For: 3.0.0
>
> Attachments: 10607-2.patch, 10607.patch
>
>
> As with the filesystem API, we need to provide a generic mechanism to support 
> multiple credential storage mechanisms that are potentially from third 
> parties. 
> We need the ability to eliminate the storage of passwords and secrets in 
> clear text within configuration files or within code.
> Toward that end, I propose an API that is configured using a list of URLs of 
> CredentialProviders. The implementation will look for implementations using 
> the ServiceLoader interface and thus support third party libraries.
> Two providers will be included in this patch. One using the credentials cache 
> in MapReduce jobs and the other using Java KeyStores from either HDFS or 
> local file system. 
> A CredShell CLI will also be included in this patch which provides the 
> ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10612) NFS failed to refresh the user group id mapping table

2014-05-16 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999390#comment-13999390
 ] 

Chris Nauroth commented on HADOOP-10612:


Hi, Brandon.  The change looks good.  Two questions:

# Is changing this config property name backwards-incompatible with existing 
configs that are already deployed?
# Do you think it's worthwhile to switch to {{Time#monotonicNow}} so that this 
isn't subject to system clock bugs?  (i.e. Someone resets the clock to a time 
in the past, and then updates don't happen for a long time.)
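
For illustration, a sketch of the second suggestion using {{Time#monotonicNow}}; 
the class and method names are placeholders rather than the code in the patch:
{code}
import org.apache.hadoop.util.Time;

/** Illustrative only: drive the periodic refresh off a monotonic clock. */
public class IdMappingRefreshSketch {
  private final long updateIntervalMs;
  private long lastUpdateMonotonicMs = Time.monotonicNow();

  public IdMappingRefreshSketch(long updateIntervalMs) {
    this.updateIntervalMs = updateIntervalMs;
  }

  public void maybeRefresh() {
    // Unaffected by someone resetting the system clock to a time in the past.
    long now = Time.monotonicNow();
    if (now - lastUpdateMonotonicMs >= updateIntervalMs) {
      reloadUserGroupIdMappings();
      lastUpdateMonotonicMs = now;
    }
  }

  private void reloadUserGroupIdMappings() {
    // placeholder for the actual mapping table reload
  }
}
{code}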

> NFS failed to refresh the user group id mapping table
> -
>
> Key: HADOOP-10612
> URL: https://issues.apache.org/jira/browse/HADOOP-10612
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-10612.002.patch, HADOOP-10612.patch
>
>
> Found by Preetham Kukillaya. The user/group id mapping table is not updated 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10609) .gitignore should ignore .orig and .rej files

2014-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999311#comment-13999311
 ] 

Hadoop QA commented on HADOOP-10609:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12644862/hadoop-10609.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3939//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3939//console

This message is automatically generated.

> .gitignore should ignore .orig and .rej files
> -
>
> Key: HADOOP-10609
> URL: https://issues.apache.org/jira/browse/HADOOP-10609
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: hadoop-10609.patch
>
>
> .gitignore file should ignore .orig and .rej files



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10564) Add username to native RPCv9 client

2014-05-16 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-10564:
--

Fix Version/s: HADOOP-10388
   Status: Patch Available  (was: In Progress)

Committed. Thanks, guys.

> Add username to native RPCv9 client
> ---
>
> Key: HADOOP-10564
> URL: https://issues.apache.org/jira/browse/HADOOP-10564
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: native
>Affects Versions: HADOOP-10388
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: HADOOP-10388
>
> Attachments: HADOOP-10564-pnative.002.patch, 
> HADOOP-10564-pnative.003.patch, HADOOP-10564-pnative.004.patch, 
> HADOOP-10564-pnative.005.patch, HADOOP-10564-pnative.006.patch, 
> HADOOP-10564.001.patch
>
>
> Add the ability for the native RPCv9 client to set a username when initiating 
> a connection.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10611) KeyVersion name should not be assumed to be the 'key name @ the version number"

2014-05-16 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HADOOP-10611:
---

 Summary: KeyVersion name should not be assumed to be the 'key name 
@ the version number"
 Key: HADOOP-10611
 URL: https://issues.apache.org/jira/browse/HADOOP-10611
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 3.0.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur


The KeyProvider public API should treat keyversion name as an opaque value. 
Same for the KMS client/server.
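
As a small sketch of the intent: callers pass back the version name exactly as 
the provider handed it out, instead of rebuilding it from a key name and version 
number. The helper below is illustrative only:
{code}
import java.io.IOException;
import org.apache.hadoop.crypto.key.KeyProvider;

/** Illustrative only: treat the version name as an opaque handle, never parse or rebuild it. */
public class OpaqueVersionNameSketch {
  public static byte[] materialFor(KeyProvider provider, String versionName)
      throws IOException {
    // Use the name as given; do not assume it has the form "keyName@versionNumber".
    KeyProvider.KeyVersion kv = provider.getKeyVersion(versionName);
    return kv == null ? null : kv.getMaterial();
  }
}
{code}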



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10586) KeyShell doesn't allow setting Options via CLI

2014-05-16 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999282#comment-13999282
 ] 

Alejandro Abdelnur commented on HADOOP-10586:
-

LGTM, +1 pending jenkins.

> KeyShell doesn't allow setting Options via CLI
> --
>
> Key: HADOOP-10586
> URL: https://issues.apache.org/jira/browse/HADOOP-10586
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: bin
>Affects Versions: 3.0.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HADOOP-10586.1.patch, HADOOP-10586.1.patch, 
> HADOOP-10586.3.patch, HADOOP-10586.4.patch, HADOOP-10586.5.patch
>
>
> You should be able to set any of the Options passed to the KeyProvider via 
> the CLI.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10586) KeyShell doesn't allow setting Options via CLI

2014-05-16 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HADOOP-10586:
--

Attachment: HADOOP-10586.5.patch

Add a unit test.

> KeyShell doesn't allow setting Options via CLI
> --
>
> Key: HADOOP-10586
> URL: https://issues.apache.org/jira/browse/HADOOP-10586
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: bin
>Affects Versions: 3.0.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HADOOP-10586.1.patch, HADOOP-10586.1.patch, 
> HADOOP-10586.3.patch, HADOOP-10586.4.patch, HADOOP-10586.5.patch
>
>
> You should be able to set any of the Options passed to the KeyProvider via 
> the CLI.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10583) bin/hadoop key throws NPE with no args and assorted other fixups

2014-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998222#comment-13998222
 ] 

Hudson commented on HADOOP-10583:
-

FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/])
HADOOP-10583. bin/hadoop key throws NPE with no args and assorted other fixups. 
(clamb via tucu) (tucu: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594320)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/KeyProvider.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/KeyShell.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSClientProvider.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/TestKeyShell.java


> bin/hadoop key throws NPE with no args and assorted other fixups
> 
>
> Key: HADOOP-10583
> URL: https://issues.apache.org/jira/browse/HADOOP-10583
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: bin
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
>  Labels: patch
> Fix For: 3.0.0
>
> Attachments: HADOOP-10583.1.patch, HADOOP-10583.2.patch, 
> HADOOP-10583.3.patch, HADOOP-10583.4.patch, HADOOP-10583.5.patch
>
>
> bin/hadoop key throws NPE.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10612) NFS failed to refresh the user group id mapping table

2014-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000392#comment-14000392
 ] 

Hadoop QA commented on HADOOP-10612:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12645268/HADOOP-10612.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-nfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3946//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3946//console

This message is automatically generated.

> NFS failed to refresh the user group id mapping table
> -
>
> Key: HADOOP-10612
> URL: https://issues.apache.org/jira/browse/HADOOP-10612
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-10612.002.patch, HADOOP-10612.003.patch, 
> HADOOP-10612.patch
>
>
> Found by Preetham Kukillaya. The user/group id mapping table is not updated 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629 - Azure Filesystem - Information for developers.pdf

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch
>
>
> h2. Description
> This JIRA incorporates adding a new file system implementation for accessing 
> Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
> to MR jobs or configuring MR jobs to put their output directly into blob 
> storage.
> h2. High level design
> At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blob storage; the scheme wasb is used for 
> accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
> scheme: {code}wasb[s]://@/path/to/file{code} to address 
> individual blobs. We use the standard Azure Java SDK 
> (com.microsoft.windowsazure) to do most of the work. In order to map a 
> hierarchical file system over the flat name-value pair nature of blob 
> storage, we create a specially tagged blob named path/to/dir whenever we 
> create a directory called path/to/dir, then files under that are stored as 
> normal blobs path/to/dir/file. We have many metrics implemented for it using 
> the Metrics2 interface. Tests are implemented mostly using a mock 
> implementation for the Azure SDK functionality, with an option to test 
> against a real blob storage if configured (instructions provided in 
> README.txt).
> h2. Credits and history
> This has been ongoing work for a while, and the early version of this work 
> can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
> we'll post the patch here for Hadoop trunk first, then post a patch for 
> branch-1 as well for backporting the functionality if accepted. Credit for 
> this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
> [~stojanovic] as well as multiple people who have taken over this work since 
> then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
> Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
> [~chuanliu].
> h2. Test
> Besides unit tests, we have used WASB as the default file system in our 
> service product. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. The current version reflects the 
> version of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-10588) Workaround for jetty6 acceptor startup issue

2014-05-16 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved HADOOP-10588.
-

   Resolution: Fixed
Fix Version/s: 2.5.0
   0.23.11
 Hadoop Flags: Reviewed

Thanks for the reviews, Sangjin and Jon. I've committed this to branch-0.23 and 
branch-2.

> Workaround for jetty6 acceptor startup issue
> 
>
> Key: HADOOP-10588
> URL: https://issues.apache.org/jira/browse/HADOOP-10588
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Fix For: 0.23.11, 2.5.0
>
> Attachments: selector.patch, selector23.patch
>
>
> When a cluster is restarted, jetty is not functioning for a small percentage 
> of datanodes, requiring restart of those datanodes.  This is caused by 
> JETTY-1316.
> We've tried overriding isRunning() and retrying on super.isRunning() 
> returning false, as the reporter of JETTY-1316 mentioned in the description.  
> It looks like the code was actually exercised (i.e. the issue was caused by 
> this jetty bug)  and the acceptor was working fine after retry.
> Since we will probably move to a later version of jetty after branch-3 is 
> cut, we can put this workaround in branch-2 only.
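
A rough sketch of the workaround described above (retrying when 
{{super.isRunning()}} returns false); this is illustrative only and not the 
committed patch:
{code}
import org.mortbay.jetty.nio.SelectChannelConnector;

/** Illustrative only: retry the liveness check as suggested in JETTY-1316. */
public class RetryingSelectChannelConnector extends SelectChannelConnector {
  @Override
  public boolean isRunning() {
    if (super.isRunning()) {
      return true;
    }
    // In the reported cases the acceptor was working fine after a retry,
    // so poll a few times before declaring the connector down.
    for (int i = 0; i < 10; i++) {
      try {
        Thread.sleep(100);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        return false;
      }
      if (super.isRunning()) {
        return true;
      }
    }
    return false;
  }
}
{code}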



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10566) Refactor proxyservers out of ProxyUsers

2014-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999016#comment-13999016
 ] 

Hudson commented on HADOOP-10566:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #5605 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5605/])
HADOOP-10566. Adding files missed in previous commit 1594280 (suresh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594282)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/ProxyServers.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/authorize/TestProxyServers.java
HADOOP-10566. Refactor proxyservers out of ProxyUsers. Contributed by Benoy 
Antony. (suresh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594280)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/ProxyUsers.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/authorize/TestProxyUsers.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/JspHelper.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/common/TestJspHelper.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogger.java


> Refactor proxyservers out of ProxyUsers
> ---
>
> Key: HADOOP-10566
> URL: https://issues.apache.org/jira/browse/HADOOP-10566
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 2.4.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: HADOOP-10566.patch, HADOOP-10566.patch, 
> HADOOP-10566.patch, HADOOP-10566.patch, HADOOP-10566.patch
>
>
> HADOOP-10498 added a proxyservers feature in ProxyUsers. It is beneficial to 
> treat this as a separate feature since 
> 1> The ProxyUsers is per proxyuser whereas proxyservers is per cluster. The 
> cardinality is different. 
> 2> The ProxyUsers.authorize() and ProxyUsers.isproxyUser() are synchronized 
> and hence share the same lock and impact performance.
> Since these are two separate features, it will be an improvement to keep them 
> separate. It also enables one to fine-tune each feature independently.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-16 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998893#comment-13998893
 ] 

Alejandro Abdelnur commented on HADOOP-10607:
-

I see the point about a lot of luggage in KeyStore that is not needed.

Instead of adding a new interface, have you considered doing it in the KeyProvider 
itself? After all, the credentials are keys. Then the KMS could easily add REST 
support for that too.
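
To illustrate the suggestion, a sketch of resolving a credential as key material 
from an existing {{KeyProvider}}; the helper class is an assumption, not part of 
the attached patch:
{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.crypto.key.KeyProvider;

/** Illustrative only: read a stored secret as key material from a KeyProvider. */
public class CredentialFromKeyProviderSketch {
  public static char[] resolvePassword(KeyProvider provider, String alias)
      throws IOException {
    KeyProvider.KeyVersion kv = provider.getCurrentKey(alias);
    if (kv == null) {
      return null;  // no credential stored under this alias
    }
    return new String(kv.getMaterial(), StandardCharsets.UTF_8).toCharArray();
  }
}
{code}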

> Create an API to Separate Credentials/Password Storage from Applications
> 
>
> Key: HADOOP-10607
> URL: https://issues.apache.org/jira/browse/HADOOP-10607
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Reporter: Larry McCay
>Assignee: Larry McCay
> Fix For: 3.0.0
>
> Attachments: 10607-2.patch, 10607.patch
>
>
> As with the filesystem API, we need to provide a generic mechanism to support 
> multiple credential storage mechanisms that are potentially from third 
> parties. 
> We need the ability to eliminate the storage of passwords and secrets in 
> clear text within configuration files or within code.
> Toward that end, I propose an API that is configured using a list of URLs of 
> CredentialProviders. The implementation will look for implementations using 
> the ServiceLoader interface and thus support third party libraries.
> Two providers will be included in this patch. One using the credentials cache 
> in MapReduce jobs and the other using Java KeyStores from either HDFS or 
> local file system. 
> A CredShell CLI will also be included in this patch which provides the 
> ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10400) Incorporate new S3A FileSystem implementation

2014-05-16 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998707#comment-13998707
 ] 

Steve Loughran commented on HADOOP-10400:
-

Andrei points out something else: more scale tests. There's something in 
swiftfs that does many-file operations, which picked up throttling problems on 
some services

> Incorporate new S3A FileSystem implementation
> -
>
> Key: HADOOP-10400
> URL: https://issues.apache.org/jira/browse/HADOOP-10400
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Jordan Mendelson
>Assignee: Jordan Mendelson
> Attachments: HADOOP-10400-1.patch, HADOOP-10400-2.patch, 
> HADOOP-10400-3.patch, HADOOP-10400-4.patch, HADOOP-10400-5.patch
>
>
> The s3native filesystem has a number of limitations (some of which were 
> recently fixed by HADOOP-9454). This patch adds an s3a filesystem which uses 
> the aws-sdk instead of the jets3t library. There are a number of improvements 
> over s3native including:
> - Parallel copy (rename) support (dramatically speeds up commits on large 
> files)
> - AWS S3 explorer compatible empty directory files "xyz/" instead of 
> "xyz_$folder$" (reduces littering)
> - Ignores _$folder$ files created by s3native and other S3 
> browsing utilities
> - Supports multiple output buffer dirs to even out IO when uploading files
> - Supports IAM role-based authentication
> - Allows setting a default canned ACL for uploads (public, private, etc.)
> - Better error recovery handling
> - Should handle input seeks without having to download the whole file (used 
> for splits a lot)
> This code is a copy of https://github.com/Aloisius/hadoop-s3a with patches to 
> various pom files to get it to build against trunk. I've been using 0.0.1 in 
> production with CDH 4 for several months and CDH 5 for a few days. The 
> version here is 0.0.2 which changes around some keys to hopefully bring the 
> key name style more inline with the rest of hadoop 2.x.
> *Tunable parameters:*
> fs.s3a.access.key - Your AWS access key ID (omit for role authentication)
> fs.s3a.secret.key - Your AWS secret key (omit for role authentication)
> fs.s3a.connection.maximum - Controls how many parallel connections 
> HttpClient spawns (default: 15)
> fs.s3a.connection.ssl.enabled - Enables or disables SSL connections to S3 
> (default: true)
> fs.s3a.attempts.maximum - How many times we should retry commands on 
> transient errors (default: 10)
> fs.s3a.connection.timeout - Socket connect timeout (default: 5000)
> fs.s3a.paging.maximum - How many keys to request from S3 when doing 
> directory listings at a time (default: 5000)
> fs.s3a.multipart.size - How big (in bytes) to split an upload or copy 
> operation up into (default: 104857600)
> fs.s3a.multipart.threshold - Until a file is this large (in bytes), use 
> non-parallel upload (default: 2147483647)
> fs.s3a.acl.default - Set a canned ACL on newly created/copied objects 
> (private | public-read | public-read-write | authenticated-read | 
> log-delivery-write | bucket-owner-read | bucket-owner-full-control)
> fs.s3a.multipart.purge - True if you want to purge existing multipart 
> uploads that may not have been completed/aborted correctly (default: false)
> fs.s3a.multipart.purge.age - Minimum age in seconds of multipart uploads 
> to purge (default: 86400)
> fs.s3a.buffer.dir - Comma separated list of directories that will be used 
> to buffer file writes out of (default: uses ${hadoop.tmp.dir}/s3a )
> *Caveats*:
> Hadoop uses a standard output committer which uploads files as 
> filename.COPYING before renaming them. This can cause unnecessary performance 
> issues with S3 because it does not have a rename operation and S3 already 
> verifies uploads against an md5 that the driver sets on the upload request. 
> While this FileSystem should be significantly faster than the built-in 
> s3native driver because of parallel copy support, you may want to consider 
> setting a null output committer on your jobs to further improve performance.
> Because S3 requires the file length and MD5 to be known before a file is 
> uploaded, all output is buffered out to a temporary file first similar to the 
> s3native driver.
> Due to the lack of native rename() for S3, renaming extremely large files or 
> directories may take a while. Unfortunately, there is no way to notify 
> hadoop that progress is still being made for rename operations, so your job 
> may time out unless you increase the task timeout.
> This driver will fully ignore _$folder$ files. This was necessary so that it 
> could interoperate with repositories that have had the s3native driver used 
> on them, but means that it won't recognize empty 

[jira] [Updated] (HADOOP-10610) Upgrade S3n s3.fs.buffer.dir to support multi directories

2014-05-16 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-10610:


Component/s: fs/s3

moved from -HDFS to -COMMON, labelled as fs/s3

> Upgrade S3n s3.fs.buffer.dir to support multi directories
> -
>
> Key: HADOOP-10610
> URL: https://issues.apache.org/jira/browse/HADOOP-10610
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 2.4.0
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>Priority: Minor
> Attachments: HDFS-6383.patch
>
>
> s3.fs.buffer.dir defines the tmp folder where files will be written to before 
> getting sent to S3.  Right now this is limited to a single folder which 
> causes two major issues.
> 1. You need a drive with enough space to store all the tmp files at once
> 2. You are limited to the IO speeds of a single drive
> This solution will resolve both and has been tested to increase the S3 write 
> speed by 2.5x with 10 mappers on hs1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10474) Move o.a.h.record to hadoop-streaming

2014-05-16 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000109#comment-14000109
 ] 

Viraj Bhat commented on HADOOP-10474:
-

Due to the above changes, compilation fails in the Hive contrib module, and with 
the removal of certain classes in org.apache.hadoop.record 
(https://issues.apache.org/jira/browse/HADOOP-10485) it could cause further 
problems.

Created: https://issues.apache.org/jira/browse/HIVE-7077 to track this in Hive.

Viraj

> Move o.a.h.record to hadoop-streaming
> -
>
> Key: HADOOP-10474
> URL: https://issues.apache.org/jira/browse/HADOOP-10474
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.5.0
>
> Attachments: HADOOP-10474.000.patch, HADOOP-10474.001.patch, 
> HADOOP-10474.002.patch
>
>
> The classes in o.a.h.record have been deprecated for more than a year and a 
> half. They should be removed. As the first step, the jira moves all these 
> classes into the hadoop-streaming project, which is the only user of these 
> classes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10611) KeyVersion name should not be assumed to be the 'key name @ the version number"

2014-05-16 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HADOOP-10611:


Description: 
The KeyProvider public API should treat keyversion name as an opaque value. 
Same for the KMS client/server.

Methods like {{KeyProvider#buildVersionName()}} and 
{{KeyProvider#getBaseName()}} should not be part of the {{KeyProvider}}.

  was:The KeyProvider public API should treat keyversion name as an opaque 
value. Same for the KMS client/server.


> KeyVersion name should not be assumed to be the 'key name @ the version 
> number"
> ---
>
> Key: HADOOP-10611
> URL: https://issues.apache.org/jira/browse/HADOOP-10611
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>
> The KeyProvider public API should treat keyversion name as an opaque value. 
> Same for the KMS client/server.
> Methods like {{KeyProvider#buildVersionName()}} and 
> {{KeyProvider#getBaseName()}} should not be part of the {{KeyProvider}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10602) Documentation has broken "Go Back" hyperlinks.

2014-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999575#comment-13999575
 ] 

Hadoop QA commented on HADOOP-10602:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12644788/HADOOP-10602.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-tools/hadoop-sls.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3941//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3941//console

This message is automatically generated.

> Documentation has broken "Go Back" hyperlinks.
> --
>
> Key: HADOOP-10602
> URL: https://issues.apache.org/jira/browse/HADOOP-10602
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Chris Nauroth
>Assignee: Akira AJISAKA
>Priority: Trivial
>  Labels: newbie
> Attachments: HADOOP-10602.patch
>
>
> Multiple pages of our documentation have "Go Back" links that are broken, 
> because they point to an incorrect relative path.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999577#comment-13999577
 ] 

Hadoop QA commented on HADOOP-10607:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645101/10607-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3943//console

This message is automatically generated.

> Create an API to Separate Credentials/Password Storage from Applications
> 
>
> Key: HADOOP-10607
> URL: https://issues.apache.org/jira/browse/HADOOP-10607
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Reporter: Larry McCay
>Assignee: Larry McCay
> Fix For: 3.0.0
>
> Attachments: 10607-2.patch, 10607.patch
>
>
> As with the filesystem API, we need to provide a generic mechanism to support 
> multiple credential storage mechanisms that are potentially from third 
> parties. 
> We need the ability to eliminate the storage of passwords and secrets in 
> clear text within configuration files or within code.
> Toward that end, I propose an API that is configured using a list of URLs of 
> CredentialProviders. The implementation will look for implementations using 
> the ServiceLoader interface and thus support third party libraries.
> Two providers will be included in this patch. One using the credentials cache 
> in MapReduce jobs and the other using Java KeyStores from either HDFS or 
> local file system. 
> A CredShell CLI will also be included in this patch which provides the 
> ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10610) Upgrade S3n s3.fs.buffer.dir to support multi directories

2014-05-16 Thread Juan Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999109#comment-13999109
 ] 

Juan Yu commented on HADOOP-10610:
--

Just wondering why we don't cache the tmp dir list. Do we have to parse the config 
to get the list for every newBackupFile() call?

> Upgrade S3n s3.fs.buffer.dir to support multi directories
> -
>
> Key: HADOOP-10610
> URL: https://issues.apache.org/jira/browse/HADOOP-10610
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 2.4.0
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>Priority: Minor
> Attachments: HDFS-6383.patch
>
>
> s3.fs.buffer.dir defines the tmp folder where files will be written to before 
> getting sent to S3.  Right now this is limited to a single folder which 
> causes two major issues.
> 1. You need a drive with enough space to store all the tmp files at once
> 2. You are limited to the IO speeds of a single drive
> This solution will resolve both and has been tested to increase the S3 write 
> speed by 2.5x with 10 mappers on hs1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10612) NFS failed to refresh the user group id mapping table

2014-05-16 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000304#comment-14000304
 ] 

Aaron T. Myers commented on HADOOP-10612:
-

Sure, that sounds fine.

Thanks guys.

> NFS failed to refresh the user group id mapping table
> -
>
> Key: HADOOP-10612
> URL: https://issues.apache.org/jira/browse/HADOOP-10612
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-10612.002.patch, HADOOP-10612.003.patch, 
> HADOOP-10612.patch
>
>
> Found by Preetham Kukillaya. The user/group id mapping table is not updated 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10075) Update jetty dependency to version 9

2014-05-16 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000360#comment-14000360
 ] 

Demai Ni commented on HADOOP-10075:
---

[~rrati], do you have a hbase jira/patch available? Thanks... Demai

> Update jetty dependency to version 9
> 
>
> Key: HADOOP-10075
> URL: https://issues.apache.org/jira/browse/HADOOP-10075
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Robert Rati
>Assignee: Robert Rati
> Attachments: HADOOP-10075.patch
>
>
> Jetty6 is no longer maintained.  Update the dependency to jetty9.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10613) Potential Resource Leaks in FileSystem.CACHE

2014-05-16 Thread Nemon Lou (JIRA)
Nemon Lou created HADOOP-10613:
--

 Summary: Potential Resource Leaks in FileSystem.CACHE 
 Key: HADOOP-10613
 URL: https://issues.apache.org/jira/browse/HADOOP-10613
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.4.0
Reporter: Nemon Lou


There is no size limit on the hashmap in FileSystem.CACHE, which can cause a 
potential memory leak.
If I use a new UGI object every time I invoke FileSystem.get(conf) and never 
invoke FileSystem's close method, this issue will arise.

If there were a size limit on the hashmap, or if the FileSystem instances were 
held as soft references, then user code would not need to worry as much about 
cache leaks.
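
For illustration, the leak pattern looks roughly like the sketch below; the loop 
and user names are made up. FileSystem caches one instance per 
(scheme, authority, UGI), so every new UGI adds an entry that is never evicted 
unless close()/closeAll() is called.

{code}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

// Sketch of the leak pattern described above: each new UGI yields a new cache
// key, so FileSystem.get() caches a new instance on every iteration.
public class CacheLeakExample {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();
    for (int i = 0; i < 100000; i++) {
      UserGroupInformation ugi =
          UserGroupInformation.createRemoteUser("user-" + i);
      ugi.doAs(new PrivilegedExceptionAction<Void>() {
        @Override
        public Void run() throws Exception {
          FileSystem fs = FileSystem.get(conf); // new cached instance per UGI
          fs.exists(new Path("/tmp"));
          return null;                          // fs.close() is never called
        }
      });
    }
  }
}
{code}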



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10389) Native RPCv9 client

2014-05-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000270#comment-14000270
 ] 

Colin Patrick McCabe commented on HADOOP-10389:
---

bq. Neither a server nor a client must implement multiple calls in flight. Both 
the Java client and server do implement this feature. But if a client only ever 
submits requests serially over a connection then it can ignore the response 
callid, since while only a single call is ever outstanding then only that call 
will ever be responded to.

Thanks, Doug.  That is the strategy we're using now in the native client-- one 
call at a time.  I don't think it would be too tough to put multiple calls in 
flight at some point, though... maybe I'll take a look...

> Native RPCv9 client
> ---
>
> Key: HADOOP-10389
> URL: https://issues.apache.org/jira/browse/HADOOP-10389
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: HADOOP-10388
>Reporter: Binglin Chang
>Assignee: Colin Patrick McCabe
> Attachments: HADOOP-10388.001.patch, HADOOP-10389.002.patch, 
> HADOOP-10389.004.patch, HADOOP-10389.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (HADOOP-10474) Move o.a.h.record to hadoop-streaming

2014-05-16 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reopened HADOOP-10474:
-


Reopening this as Hive is an important part of the Hadoop stack.  Arguably we 
shouldn't remove something that hasn't been deprecated for at least one full 
major release.  org.apache.hadoop.record.* wasn't deprecated in 1.x so it seems 
premature to remove it in 2.x, especially in a minor release of 2.x.

Recommend we revert this, at least in branch-2.

> Move o.a.h.record to hadoop-streaming
> -
>
> Key: HADOOP-10474
> URL: https://issues.apache.org/jira/browse/HADOOP-10474
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.5.0
>
> Attachments: HADOOP-10474.000.patch, HADOOP-10474.001.patch, 
> HADOOP-10474.002.patch
>
>
> The classes in o.a.h.record have been deprecated for more than a year and a 
> half. They should be removed. As the first step, the jira moves all these 
> classes into the hadoop-streaming project, which is the only user of these 
> classes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10389) Native RPCv9 client

2014-05-16 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000204#comment-14000204
 ] 

Doug Cutting commented on HADOOP-10389:
---

Neither a server nor a client must implement multiple calls in flight.  Both 
the Java client and server do implement this feature.  But if a client only 
ever submits requests serially over a connection then it can ignore the 
response callid, since while only a single call is ever outstanding then only 
that call will ever be responded to.  Similarly, a server might be implemented 
to serially respond to requests so long as it copies call ids from requests to 
responses.
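
To make the call-id handling concrete, here is a rough, hypothetical sketch of 
the pipelined strategy; the types below are illustrative stand-ins, not the 
Hadoop IPC classes. A serial client can skip the pending map entirely and ignore 
the call id, because only one call is ever outstanding.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch only: multiple calls in flight, matched by call id.
class CallIdDemo {
  static class Call {
    final int id;
    volatile String response;
    Call(int id) { this.id = id; }
  }

  private final AtomicInteger nextCallId = new AtomicInteger();
  private final Map<Integer, Call> pending =
      new ConcurrentHashMap<Integer, Call>();

  // Remember each call by its id before sending it over the connection.
  Call send(String request) {
    Call call = new Call(nextCallId.getAndIncrement());
    pending.put(call.id, call);
    // ... write (call.id, request) to the connection here ...
    return call;
  }

  // The response reader matches each reply to its call by the echoed call id,
  // so replies may arrive in any order.
  void onResponse(int callId, String response) {
    Call call = pending.remove(callId);
    if (call != null) {
      call.response = response;
    }
  }
}
{code}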

> Native RPCv9 client
> ---
>
> Key: HADOOP-10389
> URL: https://issues.apache.org/jira/browse/HADOOP-10389
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: HADOOP-10388
>Reporter: Binglin Chang
>Assignee: Colin Patrick McCabe
> Attachments: HADOOP-10388.001.patch, HADOOP-10389.002.patch, 
> HADOOP-10389.004.patch, HADOOP-10389.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629.trunk.2.patch

New patch:
- added apache headers to XML files
- fixed the suppression of m2e warning (in pom.xml)

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mostafa Elhemali
> Attachments: HADOOP-9629.2.patch, HADOOP-9629.3.patch, 
> HADOOP-9629.patch, HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch
>
>
> h2. Description
> This JIRA incorporates adding a new file system implementation for accessing 
> Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
> to MR jobs or configuring MR jobs to put their output directly into blob 
> storage.
> h2. High level design
> At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blob storage; the scheme wasb is used for 
> accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
> scheme: {code}wasb[s]://@/path/to/file{code} to address 
> individual blobs. We use the standard Azure Java SDK 
> (com.microsoft.windowsazure) to do most of the work. In order to map a 
> hierarchical file system over the flat name-value pair nature of blob 
> storage, we create a specially tagged blob named path/to/dir whenever we 
> create a directory called path/to/dir, then files under that are stored as 
> normal blobs path/to/dir/file. We have many metrics implemented for it using 
> the Metrics2 interface. Tests are implemented mostly using a mock 
> implementation for the Azure SDK functionality, with an option to test 
> against a real blob storage if configured (instructions provided in 
> README.txt).
> h2. Credits and history
> This has been ongoing work for a while, and the early version of this work 
> can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
> we'll post the patch here for Hadoop trunk first, then post a patch for 
> branch-1 as well for backporting the functionality if accepted. Credit for 
> this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
> [~stojanovic] as well as multiple people who have taken over this work since 
> then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
> Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
> [~chuanliu].
> h2. Test
> Besides unit tests, we have used WASB as the default file system in our 
> service product. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. The current version reflects the 
> version of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10609) .gitignore should ignore .orig and .rej files

2014-05-16 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998640#comment-13998640
 ] 

Tsuyoshi OZAWA commented on HADOOP-10609:
-

+1(non-binding)

> .gitignore should ignore .orig and .rej files
> -
>
> Key: HADOOP-10609
> URL: https://issues.apache.org/jira/browse/HADOOP-10609
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: hadoop-10609.patch
>
>
> .gitignore file should ignore .orig and .rej files



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10602) Documentation has broken "Go Back" hyperlinks.

2014-05-16 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999097#comment-13999097
 ] 

Chris Nauroth commented on HADOOP-10602:


Thanks for taking this, Akira.  It looks like you're taking the approach of 
deleting the links.  I think this is fine, because we have all the direct links 
in the left nav, and the user always has the browser back button too.  It 
streamlines the pages a bit too.  Does anyone else out there object to removing 
the Go Back links?  If not, let's go ahead with this approach.

To make this change comprehensive, we'll need to update some additional files.  
Here is what I turned up in a grep of  'Go Back' across the whole repo:

hadoop-common-project/hadoop-auth/src/site/apt/BuildingIt.apt.vm
hadoop-common-project/hadoop-auth/src/site/apt/BuildingIt.apt.vm
hadoop-common-project/hadoop-auth/src/site/apt/Configuration.apt.vm
hadoop-common-project/hadoop-auth/src/site/apt/Configuration.apt.vm
hadoop-common-project/hadoop-auth/src/site/apt/Examples.apt.vm
hadoop-common-project/hadoop-auth/src/site/apt/Examples.apt.vm
hadoop-hdfs-project/hadoop-hdfs/src/site/apt/CentralizedCacheManagement.apt.vm
hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsNfsGateway.apt.vm
hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ViewFs.apt.vm
hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/ServerSetup.apt.vm
hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/ServerSetup.apt.vm
hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/UsingHttpTools.apt.vm
hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/UsingHttpTools.apt.vm
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/DistributedCacheDeploy.apt.vm
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/EncryptedShuffle.apt.vm
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/MapReduce_Compatibility_Hadoop1_Hadoop2.apt.vm
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm
hadoop-tools/hadoop-sls/src/site/apt/SchedulerLoadSimulator.apt.vm
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/CapacityScheduler.apt.vm
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/HistoryServerRest.apt.vm
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/MapredAppMasterRest.apt.vm
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManager.apt.vm
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRest.apt.vm
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/WebServicesIntro.apt.vm
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/WritingYarnApplications.apt.vm
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm


> Documentation has broken "Go Back" hyperlinks.
> --
>
> Key: HADOOP-10602
> URL: https://issues.apache.org/jira/browse/HADOOP-10602
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Chris Nauroth
>Assignee: Akira AJISAKA
>Priority: Trivial
>  Labels: newbie
> Attachments: HADOOP-10602.patch
>
>
> Multiple pages of our documentation have "Go Back" links that are broken, 
> because they point to an incorrect relative path.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10572) Example NFS mount command must pass noacl as it isn't supported by the server yet

2014-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998998#comment-13998998
 ] 

Hudson commented on HADOOP-10572:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #5605 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5605/])
HADOOP-10572. Example NFS mount command must pass noacl as it isn't supported 
by the server yet. Contributed by Harsh J. (brandonli: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594289)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsNfsGateway.apt.vm


> Example NFS mount command must pass noacl as it isn't supported by the server 
> yet
> -
>
> Key: HADOOP-10572
> URL: https://issues.apache.org/jira/browse/HADOOP-10572
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Trivial
> Fix For: 2.5.0
>
> Attachments: HADOOP-10572.patch
>
>
> Use of the documented default mount command results in the below server-side 
> log WARN event, because the client tries to locate the ACL program (#100227):
> {code}
> 12:26:11.975 AM   TRACE   org.apache.hadoop.oncrpc.RpcCall
> Xid:-1114380537, messageType:RPC_CALL, rpcVersion:2, program:100227, 
> version:3, procedure:0, credential:(AuthFlavor:AUTH_NONE), 
> verifier:(AuthFlavor:AUTH_NONE)
> 12:26:11.976 AM   TRACE   org.apache.hadoop.oncrpc.RpcProgram 
> NFS3 procedure #0
> 12:26:11.976 AM   WARNorg.apache.hadoop.oncrpc.RpcProgram 
> Invalid RPC call program 100227
> {code}
> The client mount command must pass {{noacl}} to avoid this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10150) Hadoop cryptographic file system

2014-05-16 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HADOOP-10150:
-

Target Version/s: fs-encryption (HADOOP-10150 and HDFS-6134)  (was: 3.0.0)

> Hadoop cryptographic file system
> 
>
> Key: HADOOP-10150
> URL: https://issues.apache.org/jira/browse/HADOOP-10150
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Yi Liu
>Assignee: Yi Liu
>  Labels: rhino
> Fix For: 3.0.0
>
> Attachments: CryptographicFileSystem.patch, HADOOP cryptographic file 
> system-V2.docx, HADOOP cryptographic file system.pdf, 
> HDFSDataAtRestEncryptionAlternatives.pdf, 
> HDFSDataatRestEncryptionAttackVectors.pdf, 
> HDFSDataatRestEncryptionProposal.pdf, cfs.patch, extended information based 
> on INode feature.patch
>
>
> There is an increasing need for securing data when Hadoop customers use 
> various upper layer applications, such as Map-Reduce, Hive, Pig, HBase and so 
> on.
> HADOOP CFS (HADOOP Cryptographic File System) is used to secure data, based 
> on HADOOP “FilterFileSystem” decorating DFS or other file systems, and 
> transparent to upper layer applications. It’s configurable, scalable and fast.
> High level requirements:
> 1. Transparent to and no modification required for upper layer applications.
> 2. “Seek”, “PositionedReadable” are supported for the input stream of CFS if 
> the wrapped file system supports them.
> 3. Very high performance for encryption and decryption; they will not become 
> a bottleneck.
> 4. Can decorate HDFS and all other file systems in Hadoop, and will not 
> modify the existing structure of the file system, such as the namenode and 
> datanode structure if the wrapped file system is HDFS.
> 5. Admin can configure encryption policies, such as which directory will 
> be encrypted.
> 6. A robust key management framework.
> 7. Support Pread and append operations if the wrapped file system supports 
> them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell reassigned HADOOP-9629:


Assignee: Mike Liddell  (was: Mostafa Elhemali)

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629.2.patch, HADOOP-9629.3.patch, 
> HADOOP-9629.patch, HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch
>
>
> h2. Description
> This JIRA incorporates adding a new file system implementation for accessing 
> Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
> to MR jobs or configuring MR jobs to put their output directly into blob 
> storage.
> h2. High level design
> At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blob storage; the scheme wasb is used for 
> accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
> scheme: {code}wasb[s]://@/path/to/file{code} to address 
> individual blobs. We use the standard Azure Java SDK 
> (com.microsoft.windowsazure) to do most of the work. In order to map a 
> hierarchical file system over the flat name-value pair nature of blob 
> storage, we create a specially tagged blob named path/to/dir whenever we 
> create a directory called path/to/dir, then files under that are stored as 
> normal blobs path/to/dir/file. We have many metrics implemented for it using 
> the Metrics2 interface. Tests are implemented mostly using a mock 
> implementation for the Azure SDK functionality, with an option to test 
> against a real blob storage if configured (instructions provided in 
> README.txt).
> h2. Credits and history
> This has been ongoing work for a while, and the early version of this work 
> can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
> we'll post the patch here for Hadoop trunk first, then post a patch for 
> branch-1 as well for backporting the functionality if accepted. Credit for 
> this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
> [~stojanovic] as well as multiple people who have taken over this work since 
> then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
> Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
> [~chuanliu].
> h2. Test
> Besides unit tests, we have used WASB as the default file system in our 
> service product. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. The current version reflects the 
> version of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces

2014-05-16 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998767#comment-13998767
 ] 

Yi Liu commented on HADOOP-10603:
-

HDFS-6392 ("Wire crypto streams for encrypted files in DFSClient") means that 
{{Seekable}}, {{PositionedReadable}}, etc. should be implemented in 
{{CryptoInputStream}} instead of internally in {{CryptoFSDataInputStream}}.

Furthermore, refining Encryptor/Decryptor to have interfaces similar to 
{{DirectDecompressor}} is almost finished; I will update the patch later.
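
Roughly, the shape being suggested is the skeleton below. It is only a sketch of 
where the interfaces would live, not the HADOOP-10603/HDFS-6392 code; the actual 
decryption and counter handling are elided into comments.

{code}
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.fs.PositionedReadable;
import org.apache.hadoop.fs.Seekable;

// Skeleton sketch only: the crypto input stream itself exposes
// Seekable/PositionedReadable by delegating to the wrapped stream and
// re-deriving the CTR counter from the new position.
class SketchCryptoInputStream extends InputStream
    implements Seekable, PositionedReadable {

  private final InputStream in; // assumed to be Seekable/PositionedReadable too

  SketchCryptoInputStream(InputStream in) {
    this.in = in;
  }

  @Override
  public int read() throws IOException {
    int b = in.read();
    return b;                   // decryption of the byte would happen here
  }

  @Override
  public void seek(long pos) throws IOException {
    ((Seekable) in).seek(pos);
    // reset the cipher: the IV/counter is recomputed from pos for CTR mode
  }

  @Override
  public long getPos() throws IOException {
    return ((Seekable) in).getPos();
  }

  @Override
  public boolean seekToNewSource(long targetPos) throws IOException {
    return ((Seekable) in).seekToNewSource(targetPos);
  }

  @Override
  public int read(long position, byte[] buffer, int offset, int length)
      throws IOException {
    int n = ((PositionedReadable) in).read(position, buffer, offset, length);
    // decrypt buffer[offset..offset+n) with a counter derived from position
    return n;
  }

  @Override
  public void readFully(long position, byte[] buffer, int offset, int length)
      throws IOException {
    ((PositionedReadable) in).readFully(position, buffer, offset, length);
    // decrypt buffer[offset..offset+length) with a counter derived from position
  }

  @Override
  public void readFully(long position, byte[] buffer) throws IOException {
    readFully(position, buffer, 0, buffer.length);
  }
}
{code}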

> Crypto input and output streams implementing Hadoop stream interfaces
> -
>
> Key: HADOOP-10603
> URL: https://issues.apache.org/jira/browse/HADOOP-10603
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: security
>Reporter: Alejandro Abdelnur
>Assignee: Yi Liu
> Fix For: 3.0.0
>
> Attachments: HADOOP-10603.patch
>
>
> A common set of Crypto Input/Output streams. They would be used by 
> CryptoFileSystem, HDFS encryption, MapReduce intermediate data and spills. 
> Note we cannot use the JDK Cipher Input/Output streams directly because we 
> need to support the additional interfaces that the Hadoop FileSystem streams 
> implement (Seekable, PositionedReadable, ByteBufferReadable, 
> HasFileDescriptor, CanSetDropBehind, CanSetReadahead, 
> HasEnhancedByteBufferAccess, Syncable, CanSetDropBehind).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10612) NFS failed to refresh the user group id mapping table

2014-05-16 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HADOOP-10612:


Status: Patch Available  (was: Open)

> NFS failed to refresh the user group id mapping table
> -
>
> Key: HADOOP-10612
> URL: https://issues.apache.org/jira/browse/HADOOP-10612
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-10612.002.patch, HADOOP-10612.patch
>
>
> Found by Preetham Kukillaya. The user/group id mapping table is not update 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10561) Copy command with preserve option should handle Xattrs

2014-05-16 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999231#comment-13999231
 ] 

Andrew Wang commented on HADOOP-10561:
--

I don't think xattrs should be copied by default, since that can lead to 
potential confusion. {{cp}} doesn't copy them by default either; you need to 
specify "--preserve=xattr" or "--preserve=all", not just "-p".

How about another flag like "-X" or "-pX"? I think the ACL work is using "-pa" 
for distcp, so something along those lines.

> Copy command with preserve option should handle Xattrs
> --
>
> Key: HADOOP-10561
> URL: https://issues.apache.org/jira/browse/HADOOP-10561
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: HDFS XAttrs (HDFS-2006)
>Reporter: Uma Maheswara Rao G
>Assignee: Yi Liu
>
> The design docs for Xattrs stated that we handle preserve options with copy 
> commands
> From doc:
> Preserve option of commands like “cp -p” shell command and “distcp -p” should 
> work on XAttrs. 
> In the case where the source fs supports XAttrs but the target fs does not, 
> XAttrs will be ignored with a warning message.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10592) Add unit test case for net in hadoop native client

2014-05-16 Thread Wenwu Peng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenwu Peng updated HADOOP-10592:


Attachment: HADOOP-10592-pnative.002.patch

Updated the CMakeLists.txt file.

> Add unit test case for net in hadoop native client 
> ---
>
> Key: HADOOP-10592
> URL: https://issues.apache.org/jira/browse/HADOOP-10592
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: HADOOP-10388
>Reporter: Wenwu Peng
>Assignee: Wenwu Peng
> Attachments: HADOOP-10592-pnative.001.patch, 
> HADOOP-10592-pnative.002.patch
>
>
> Add unit test case for net.c in hadoop native client 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces

2014-05-16 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-10603:


Attachment: HADOOP-10603.3.patch

Hi [~clamb] and [~tucu00], thanks for your good comments. I merged part of your 
attached code into the new patch; it works well.

The definitions of the {{Encryptor}} and {{Decryptor}} interfaces in the latest 
patch are as follows:
{code}
public interface Encryptor {
  …
  …
  public void encrypt(ByteBuffer inBuffer, ByteBuffer outBuffer) throws 
IOException;
}
{code}
For each encryption it's better to invoke as few Cipher# methods as possible, 
since these calls are expensive. We could treat the whole input stream or output 
stream as the encryption/decryption unit, instead of treating each individual 
encrypt/decrypt call as a unit.

{quote}
The AESCTREncryptor/AESCTRDecryptor do update() instead of doFinal() on the 
cipher. This could lead to incomplete decryption of the buffer in some cipher 
implementations since the contract is that a doFinal() must be done.
{quote}
For encryption modes that need padding we should use {{doFinal}}; for CTR, 
{{update}} is OK. The difference between the two is that {{update}} maintains 
the internal state, which we can utilize so that each “encrypt/decrypt” only 
needs to call one Cipher# method. From the {{Cipher#update}} javadoc we know it 
decrypts the data, and the JCE default provider, Diceros, and even the UPDATE 
interface of OpenSSL all work this way.

I also agree with you that some cipher implementations may not be well behaved, 
and if {{update}} does not encrypt/decrypt the data for CTR mode we should be 
able to handle it; in that situation we need to invoke {{doFinal}}.
Please review the patch to see how we handle all the possible situations. In 
most cases we only invoke one Cipher# method for each encrypt/decrypt.

So in the attached code, invoking {{Cipher#init}}, {{Cipher#update}}, and 
{{Cipher#doFinal}} in each #process method is not necessary.
{code}
void process(Cipher cipher, int mode, SecretKeySpec key, byte[] originalIV,
…
  cipher.init(mode, key, new IvParameterSpec(workingIV));
  int mod = (int) (absolutePos % BLOCK_SIZE);
  cipher.update(IV_FORWARD_SINK, 0, mod, IV_FORWARD_SINK, 0);
  cipher.doFinal(in, out);
…   
  }
{code}

In the latest patch of the stream implementations, for decryption:
{code}
@Override
public int read(byte[] b, int off, int len) throws IOException {
  checkStream();
  if ((off | len | (off + len) | (b.length - (off + len))) < 0) {
    throw new IndexOutOfBoundsException();
  }

  // First drain any bytes already decrypted into outBuffer.
  int remaining = outBuffer.remaining();
  if (remaining > 0) {
    int n = Math.min(len, remaining);
    outBuffer.get(b, off, n);
    return n;
  } else {
    int n = 0;
    if (in instanceof ByteBufferReadable) {
      // Read directly into inBuffer to avoid an extra copy.
      n = ((ByteBufferReadable) in).read(inBuffer);
      if (n <= 0) {
        return n;
      }
    } else {
      // Fall back to a temporary byte[] for streams without ByteBufferReadable.
      int toRead = inBuffer.remaining();
      byte[] tmp = getTmpBuf();
      n = in.read(tmp, 0, toRead);
      if (n <= 0) {
        return n;
      }
      inBuffer.put(tmp, 0, n);
    }
    streamOffset += n; // Read n bytes
    return process(b, off, n);
  }
}
{code}
We need to use {{ByteBufferReadable}} because it avoids an extra copy, and the 
HDFS input streams support it.

{quote}
The current implementation assumes the counter portion always starts
with zero, right?
{quote}
I will add this in the next version of the patch.

The other issues should be resolved in the latest patch; please review it again, 
thanks. We also have more test cases covering them in HDFS-6405, and I will add 
test cases in common as well.
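
As a standalone illustration of the {{update}} behavior discussed above (plain 
JCE, not part of the patch): with AES/CTR, successive {{update}} calls continue 
the same keystream, so splitting the input across several {{update}} calls 
produces the same ciphertext as a single {{doFinal}} over all of it. The null 
checks cover providers whose {{update}} buffers input instead of returning 
output immediately, which is exactly the case where a {{doFinal}} fallback is 
needed.

{code}
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Standalone JCE demo: update() keeps the CTR state between calls.
public class CtrUpdateDemo {
  public static void main(String[] args) throws Exception {
    byte[] key = new byte[16];   // demo values only
    byte[] iv = new byte[16];
    byte[] data = "0123456789abcdef0123456789abcdef".getBytes("UTF-8");

    Cipher oneShot = Cipher.getInstance("AES/CTR/NoPadding");
    oneShot.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
        new IvParameterSpec(iv));
    byte[] expected = oneShot.doFinal(data);

    Cipher streaming = Cipher.getInstance("AES/CTR/NoPadding");
    streaming.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
        new IvParameterSpec(iv));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] part = streaming.update(data, 0, 10);          // first chunk
    if (part != null) {
      out.write(part);
    }
    part = streaming.update(data, 10, data.length - 10);  // rest of the data
    if (part != null) {
      out.write(part);
    }

    System.out.println(Arrays.equals(expected, out.toByteArray())); // true
  }
}
{code}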


> Crypto input and output streams implementing Hadoop stream interfaces
> -
>
> Key: HADOOP-10603
> URL: https://issues.apache.org/jira/browse/HADOOP-10603
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: security
>Reporter: Alejandro Abdelnur
>Assignee: Yi Liu
> Fix For: 3.0.0
>
> Attachments: HADOOP-10603.1.patch, HADOOP-10603.2.patch, 
> HADOOP-10603.3.patch, HADOOP-10603.patch
>
>
> A common set of Crypto Input/Output streams. They would be used by 
> CryptoFileSystem, HDFS encryption, MapReduce intermediate data and spills. 
> Note we cannot use the JDK Cipher Input/Output streams directly because we 
> need to support the additional interfaces that the Hadoop FileSystem streams 
> implement (Seekable, PositionedReadable, ByteBufferReadable, 
> HasFileDescriptor, CanSetDropBehind, CanSetReadahead, 
> HasEnhancedByteBufferAccess, Syncable, CanSetDropBehind).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10612) NFS failed to refresh the user group id mapping table

2014-05-16 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-10612:
---

Hadoop Flags: Reviewed

+1 for patch v3 pending Jenkins.  Thank you, Brandon!

> NFS failed to refresh the user group id mapping table
> -
>
> Key: HADOOP-10612
> URL: https://issues.apache.org/jira/browse/HADOOP-10612
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-10612.002.patch, HADOOP-10612.003.patch, 
> HADOOP-10612.patch
>
>
> Found by Preetham Kukillaya. The user/group id mapping table is not updated 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10612) NFS failed to refresh the user group id mapping table

2014-05-16 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1378#comment-1378
 ] 

Chris Nauroth commented on HADOOP-10612:


bq. How about using this JIRA to track the bug fix, and doing the configuration 
change as part of the fix to HDFS-6056?

That sounds like a good plan to me.  I'll be +1 for the patch after removal of 
the configuration property name change.

> NFS failed to refresh the user group id mapping table
> -
>
> Key: HADOOP-10612
> URL: https://issues.apache.org/jira/browse/HADOOP-10612
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-10612.002.patch, HADOOP-10612.patch
>
>
> Found by Preetham Kukillaya. The user/group id mapping table is not updated 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10612) NFS failed to refresh the user group id mapping table

2014-05-16 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999446#comment-13999446
 ] 

Brandon Li commented on HADOOP-10612:
-

Thank you, Chris.
You are right that it's an incompatible change. I've updated the JIRA. This 
property is a hidden property for development/test and is not visible in the 
user guide or configuration files. 

I've switched to Time#monotonicNow in the new patch and will also file a JIRA 
to update the related code in HDFS project. Thanks for the suggestion.

 

> NFS failed to refresh the user group id mapping table
> -
>
> Key: HADOOP-10612
> URL: https://issues.apache.org/jira/browse/HADOOP-10612
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-10612.002.patch, HADOOP-10612.patch
>
>
> Found by Preetham Kukillaya. The user/group id mapping table is not updated 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10586) KeyShell doesn't allow setting Options via CLI

2014-05-16 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HADOOP-10586:
--

Attachment: HADOOP-10586.4.patch

The .4 patch removes the description = null.


> KeyShell doesn't allow setting Options via CLI
> --
>
> Key: HADOOP-10586
> URL: https://issues.apache.org/jira/browse/HADOOP-10586
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: bin
>Affects Versions: 3.0.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Attachments: HADOOP-10586.1.patch, HADOOP-10586.1.patch, 
> HADOOP-10586.3.patch, HADOOP-10586.4.patch
>
>
> You should be able to set any of the Options passed to the KeyProvider via 
> the CLI.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10610) Upgrade S3n s3.fs.buffer.dir to suppoer multi directories

2014-05-16 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1443#comment-1443
 ] 

Ted Malaska commented on HADOOP-10610:
--

Sorry I have been slow on this.  I will have the new patch sometime this 
weekend.

Thanks again for the help.

> Upgrade S3n s3.fs.buffer.dir to suppoer multi directories
> -
>
> Key: HADOOP-10610
> URL: https://issues.apache.org/jira/browse/HADOOP-10610
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 2.4.0
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>Priority: Minor
> Attachments: HDFS-6383.patch
>
>
> s3.fs.buffer.dir defines the tmp folder where files will be written to before 
> getting sent to S3.  Right now this is limited to a single folder which 
> causes two major issues.
> 1. You need a drive with enough space to store all the tmp files at once
> 2. You are limited to the IO speeds of a single drive
> This solution will resolve both and has been tested to increase the S3 write 
> speed by 2.5x with 10 mappers on hs1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10586) KeyShell doesn't allow setting Options via CLI

2014-05-16 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HADOOP-10586:


   Resolution: Fixed
Fix Version/s: 3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Charles.

> KeyShell doesn't allow setting Options via CLI
> --
>
> Key: HADOOP-10586
> URL: https://issues.apache.org/jira/browse/HADOOP-10586
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: bin
>Affects Versions: 3.0.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HADOOP-10586.1.patch, HADOOP-10586.1.patch, 
> HADOOP-10586.3.patch, HADOOP-10586.4.patch, HADOOP-10586.5.patch
>
>
> You should be able to set any of the Options passed to the KeyProvider via 
> the CLI.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10612) NFS failed to refresh the user group id mapping table

2014-05-16 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HADOOP-10612:


Attachment: HADOOP-10612.002.patch

> NFS failed to refresh the user group id mapping table
> -
>
> Key: HADOOP-10612
> URL: https://issues.apache.org/jira/browse/HADOOP-10612
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-10612.002.patch, HADOOP-10612.patch
>
>
> Found by Preetham Kukillaya. The user/group id mapping table is not updated 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10612) NFS failed to refresh the user group id mapping table

2014-05-16 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HADOOP-10612:


Attachment: HADOOP-10612.003.patch

> NFS failed to refresh the user group id mapping table
> -
>
> Key: HADOOP-10612
> URL: https://issues.apache.org/jira/browse/HADOOP-10612
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-10612.002.patch, HADOOP-10612.003.patch, 
> HADOOP-10612.patch
>
>
> Found by Preetham Kukillaya. The user/group id mapping table is not updated 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10612) NFS failed to refresh the user group id mapping table

2014-05-16 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HADOOP-10612:


Hadoop Flags:   (was: Incompatible change)

> NFS failed to refresh the user group id mapping table
> -
>
> Key: HADOOP-10612
> URL: https://issues.apache.org/jira/browse/HADOOP-10612
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-10612.002.patch, HADOOP-10612.003.patch, 
> HADOOP-10612.patch
>
>
> Found by Preetham Kukillaya. The user/group id mapping table is not updated 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999338#comment-13999338
 ] 

Mike Liddell commented on HADOOP-9629:
--

A revised approach is now being used so that the Azure driver is handled the 
same way as the open-stack driver:
 - The Azure FileSystem driver is now a separate project 
hadoop-tools\hadoop-azure

As part of moving to a separate project area, the following have also been done:
- findbugs
- checkstyle
- code-cleanup based on the above and also based on Apache formatting rules 
- remove metrics business for now (it will come back later as a dedicated patch)

Namespace altered from org.apache.hadoop.fs.azurenative -> 
org.apache.hadoop.fs.azure

New approach is HADOOP-9629.trunk.1.patch 


> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mostafa Elhemali
> Attachments: HADOOP-9629.2.patch, HADOOP-9629.3.patch, 
> HADOOP-9629.patch, HADOOP-9629.trunk.1.patch
>
>
> h2. Description
> This JIRA incorporates adding a new file system implementation for accessing 
> Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
> to MR jobs or configuring MR jobs to put their output directly into blob 
> storage.
> h2. High level design
> At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blob storage; the scheme wasb is used for 
> accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
> scheme: {code}wasb[s]://@/path/to/file{code} to address 
> individual blobs. We use the standard Azure Java SDK 
> (com.microsoft.windowsazure) to do most of the work. In order to map a 
> hierarchical file system over the flat name-value pair nature of blob 
> storage, we create a specially tagged blob named path/to/dir whenever we 
> create a directory called path/to/dir, then files under that are stored as 
> normal blobs path/to/dir/file. We have many metrics implemented for it using 
> the Metrics2 interface. Tests are implemented mostly using a mock 
> implementation for the Azure SDK functionality, with an option to test 
> against a real blob storage if configured (instructions provided in 
> README.txt).
> h2. Credits and history
> This has been ongoing work for a while, and the early version of this work 
> can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
> we'll post the patch here for Hadoop trunk first, then post a patch for 
> branch-1 as well for backporting the functionality if accepted. Credit for 
> this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
> [~stojanovic] as well as multiple people who have taken over this work since 
> then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
> Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
> [~chuanliu].
> h2. Test
> Besides unit tests, we have used WASB as the default file system in our 
> service product. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. The current version reflects the 
> version of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10612) NFS failed to refresh the user group id mapping table

2014-05-16 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1413#comment-1413
 ] 

Brandon Li commented on HADOOP-10612:
-

Uploaded a new patch to address the comments.

> NFS failed to refresh the user group id mapping table
> -
>
> Key: HADOOP-10612
> URL: https://issues.apache.org/jira/browse/HADOOP-10612
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-10612.002.patch, HADOOP-10612.003.patch, 
> HADOOP-10612.patch
>
>
> Found by Preetham Kukillaya. The user/group id mapping table is not updated 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10609) .gitignore should ignore .orig and .rej files

2014-05-16 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998698#comment-13998698
 ] 

Akira AJISAKA commented on HADOOP-10609:


+1 (non-binding), makes sense to me.

> .gitignore should ignore .orig and .rej files
> -
>
> Key: HADOOP-10609
> URL: https://issues.apache.org/jira/browse/HADOOP-10609
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: hadoop-10609.patch
>
>
> .gitignore file should ignore .orig and .rej files



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9648) Fix build native library on mac osx

2014-05-16 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999563#comment-13999563
 ] 

Sangjin Lee commented on HADOOP-9648:
-

+1. It would be good to get this committed. This will help enable a lot of 
desktop/laptop testing of things that require native libraries.

> Fix build native library on mac osx
> ---
>
> Key: HADOOP-9648
> URL: https://issues.apache.org/jira/browse/HADOOP-9648
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 1.0.4, 1.2.0, 1.1.2, 2.0.5-alpha
>Reporter: Kirill A. Korinskiy
>Assignee: Binglin Chang
> Attachments: HADOOP-9648-native-osx.1.0.4.patch, 
> HADOOP-9648-native-osx.1.1.2.patch, HADOOP-9648-native-osx.1.2.0.patch, 
> HADOOP-9648-native-osx.2.0.5-alpha-rc1.patch, HADOOP-9648.v2.patch
>
>
> Some patches for fixing the build of the Hadoop native library on OS X 10.7/10.8.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10612) NFS failed to refresh the user group id mapping table

2014-05-16 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1348#comment-1348
 ] 

Brandon Li commented on HADOOP-10612:
-

How about using this JIRA to track the bug fix, and doing the configuration 
change as part of the fix to HDFS-6056?

> NFS failed to refresh the user group id mapping table
> -
>
> Key: HADOOP-10612
> URL: https://issues.apache.org/jira/browse/HADOOP-10612
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-10612.002.patch, HADOOP-10612.patch
>
>
> Found by Preetham Kukillaya. The user/group id mapping table is not updated 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10594) Improve Concurrency in Groups

2014-05-16 Thread Benoy Antony (JIRA)
Benoy Antony created HADOOP-10594:
-

 Summary: Improve Concurrency in Groups
 Key: HADOOP-10594
 URL: https://issues.apache.org/jira/browse/HADOOP-10594
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: HADOOP-10594.patch

The static field GROUPS in Groups can currently be accessed only while holding a 
lock. The object is effectively immutable after construction and hence can be 
safely published using a volatile field. This lets threads access the GROUPS 
object without holding a lock.
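
A minimal sketch of the idea (illustrative only, not the attached patch): 
initialize under a lock, but publish through a volatile field so readers never 
need to synchronize once the object is constructed.

{code}
// Illustrative sketch of safe publication via a volatile field.
class GroupsHolder {
  private static volatile GroupsHolder GROUPS;   // safe publication

  static GroupsHolder getGroups() {
    GroupsHolder g = GROUPS;
    if (g == null) {
      synchronized (GroupsHolder.class) {        // only initialization locks
        if (GROUPS == null) {
          GROUPS = new GroupsHolder();
        }
        g = GROUPS;
      }
    }
    return g;                                    // readers take no lock
  }
}
{code}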



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-16 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999269#comment-13999269
 ] 

Larry McCay commented on HADOOP-10607:
--

Some java 7 symbols crept into the patch - I will remove them and resubmit.

> Create an API to Separate Credentials/Password Storage from Applications
> 
>
> Key: HADOOP-10607
> URL: https://issues.apache.org/jira/browse/HADOOP-10607
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Reporter: Larry McCay
>Assignee: Larry McCay
> Fix For: 3.0.0
>
> Attachments: 10607-2.patch, 10607.patch
>
>
> As with the filesystem API, we need to provide a generic mechanism to support 
> multiple credential storage mechanisms that are potentially from third 
> parties. 
> We need the ability to eliminate the storage of passwords and secrets in 
> clear text within configuration files or within code.
> Toward that end, I propose an API that is configured using a list of URLs of 
> CredentialProviders. The implementation will look for implementations using 
> the ServiceLoader interface and thus support third party libraries.
> Two providers will be included in this patch. One using the credentials cache 
> in MapReduce jobs and the other using Java KeyStores from either HDFS or 
> local file system. 
> A CredShell CLI will also be included in this patch which provides the 
> ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces

2014-05-16 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HADOOP-10603:
--

Attachment: HADOOP-10603.2.patch

I've attached a better patch with a few minor changes to the file name.

> Crypto input and output streams implementing Hadoop stream interfaces
> -
>
> Key: HADOOP-10603
> URL: https://issues.apache.org/jira/browse/HADOOP-10603
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: security
>Reporter: Alejandro Abdelnur
>Assignee: Yi Liu
> Fix For: 3.0.0
>
> Attachments: HADOOP-10603.1.patch, HADOOP-10603.2.patch, 
> HADOOP-10603.patch
>
>
> A common set of Crypto Input/Output streams. They would be used by 
> CryptoFileSystem, HDFS encryption, MapReduce intermediate data and spills. 
> Note we cannot use the JDK Cipher Input/Output streams directly because we 
> need to support the additional interfaces that the Hadoop FileSystem streams 
> implement (Seekable, PositionedReadable, ByteBufferReadable, 
> HasFileDescriptor, CanSetDropBehind, CanSetReadahead, 
> HasEnhancedByteBufferAccess, Syncable, CanSetDropBehind).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10588) Workaround for jetty6 acceptor startup issue

2014-05-16 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998866#comment-13998866
 ] 

Jonathan Eagles commented on HADOOP-10588:
--

+1. Checking this into branch-2 and branch-0.23 (not trunk)

> Workaround for jetty6 acceptor startup issue
> 
>
> Key: HADOOP-10588
> URL: https://issues.apache.org/jira/browse/HADOOP-10588
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Fix For: 0.23.11, 2.5.0
>
> Attachments: selector.patch, selector23.patch
>
>
> When a cluster is restarted, jetty is not functioning for a small percentage 
> of datanodes, requiring restart of those datanodes.  This is caused by 
> JETTY-1316.
> We've tried overriding isRunning() and retrying on super.isRunning() 
> returning false, as the reporter of JETTY-1316 mentioned in the description.  
> It looks like the code was actually exercised (i.e. the issue was caused by 
> this jetty bug)  and the acceptor was working fine after retry.
> Since we will probably move to a later version of jetty after branch-3 is 
> cut, we can put this workaround in branch-2 only.
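
The override described in the report might look roughly like the sketch below, 
written against the jetty6 API; it is illustrative only and is not the attached 
selector.patch.

{code}
import org.mortbay.jetty.nio.SelectChannelConnector;

// Rough sketch of the workaround idea: retry super.isRunning() a few times
// before reporting the acceptor as dead, to paper over the JETTY-1316 race.
class RetryingSelectChannelConnector extends SelectChannelConnector {
  @Override
  public boolean isRunning() {
    if (super.isRunning()) {
      return true;
    }
    // The acceptor may not have finished starting yet; give it a moment.
    for (int i = 0; i < 10; i++) {
      try {
        Thread.sleep(100);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        break;
      }
      if (super.isRunning()) {
        return true;
      }
    }
    return false;
  }
}
{code}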



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-16 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999267#comment-13999267
 ] 

Larry McCay commented on HADOOP-10607:
--

I did consider that [~tucu00] - I actually answered this question before you 
asked it.
GOTO my-original-response-to-wrong-question.
:)

It would certainly be easier to add to the KMS server that way.
However, I still feel that the ability to evolve independently of the 
KeyProvider API, and the additional baggage for CredentialProviders that don't 
want to be KeyProviders, outweighs the benefits of consolidating them, 
especially if we make sure that KeyProviders can be used as CredentialProviders 
with an adapter.

I'm interested in your thoughts on it though.
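
For what it's worth, the adapter idea could be sketched as below. Both 
interfaces are hypothetical stand-ins for this discussion; neither is the real 
KeyProvider API nor the CredentialProvider API proposed in this JIRA.

{code}
import java.io.IOException;

// Hypothetical shapes only, to illustrate the adapter direction.
interface HypotheticalCredentialProvider {
  char[] getCredential(String alias) throws IOException;
}

interface HypotheticalKeyProvider {
  byte[] getKeyMaterial(String name) throws IOException;
}

// Wraps a key provider so it can be used wherever a credential provider is
// expected, without merging the two APIs.
class KeyProviderCredentialAdapter implements HypotheticalCredentialProvider {
  private final HypotheticalKeyProvider keys;

  KeyProviderCredentialAdapter(HypotheticalKeyProvider keys) {
    this.keys = keys;
  }

  @Override
  public char[] getCredential(String alias) throws IOException {
    byte[] material = keys.getKeyMaterial(alias);
    char[] credential = new char[material.length];
    for (int i = 0; i < material.length; i++) {
      credential[i] = (char) (material[i] & 0xff);
    }
    return credential;
  }
}
{code}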

> Create an API to Separate Credentials/Password Storage from Applications
> 
>
> Key: HADOOP-10607
> URL: https://issues.apache.org/jira/browse/HADOOP-10607
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Reporter: Larry McCay
>Assignee: Larry McCay
> Fix For: 3.0.0
>
> Attachments: 10607-2.patch, 10607.patch
>
>
> As with the filesystem API, we need to provide a generic mechanism to support 
> multiple credential storage mechanisms that are potentially from third 
> parties. 
> We need the ability to eliminate the storage of passwords and secrets in 
> clear text within configuration files or within code.
> Toward that end, I propose an API that is configured using a list of URLs of 
> CredentialProviders. The implementation will look for implementations using 
> the ServiceLoader interface and thus support third party libraries.
> Two providers will be included in this patch. One using the credentials cache 
> in MapReduce jobs and the other using Java KeyStores from either HDFS or 
> local file system. 
> A CredShell CLI will also be included in this patch which provides the 
> ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10133) winutils detection on windows-cygwin fails

2014-05-16 Thread Dinesh Nayak (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998789#comment-13998789
 ] 

Dinesh Nayak commented on HADOOP-10133:
---

Modifying this line in libexec\hadoop-config.sh helped me transform the path 
into Windows format so the Java code could pick it up:

HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.home.dir="$(cygpath -pw "$HADOOP_PREFIX")""

> winutils detection on windows-cygwin fails
> --
>
> Key: HADOOP-10133
> URL: https://issues.apache.org/jira/browse/HADOOP-10133
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.2.0
> Environment: windows 7, cygwin
>Reporter: Franjo Markovic
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> java.io.IOException: Could not locate executable null\bin\winutils.exe in the 
> Hadoop binaries.
> at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
>  at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
> at org.apache.hadoop.util.Shell.(Shell.java:293)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10564) Add username to native RPCv9 client

2014-05-16 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-10564:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Add username to native RPCv9 client
> ---
>
> Key: HADOOP-10564
> URL: https://issues.apache.org/jira/browse/HADOOP-10564
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: native
>Affects Versions: HADOOP-10388
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: HADOOP-10388
>
> Attachments: HADOOP-10564-pnative.002.patch, 
> HADOOP-10564-pnative.003.patch, HADOOP-10564-pnative.004.patch, 
> HADOOP-10564-pnative.005.patch, HADOOP-10564-pnative.006.patch, 
> HADOOP-10564.001.patch
>
>
> Add the ability for the native RPCv9 client to set a username when initiating 
> a connection.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10612) NFS failed to refresh the user group id mapping table

2014-05-16 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HADOOP-10612:


Description: Found by Preetham Kukillaya. The user/group id mapping table 
is not updated periodically.  (was: Found by Preetham Kukillaya. The user/group 
id mapping table is not update periodically.)

> NFS failed to refresh the user group id mapping table
> -
>
> Key: HADOOP-10612
> URL: https://issues.apache.org/jira/browse/HADOOP-10612
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-10612.002.patch, HADOOP-10612.patch
>
>
> Found by Preetham Kukillaya. The user/group id mapping table is not updated 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-16 Thread Larry McCay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Larry McCay updated HADOOP-10607:
-

Status: Open  (was: Patch Available)

> Create an API to Separate Credentials/Password Storage from Applications
> 
>
> Key: HADOOP-10607
> URL: https://issues.apache.org/jira/browse/HADOOP-10607
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Reporter: Larry McCay
>Assignee: Larry McCay
> Fix For: 3.0.0
>
> Attachments: 10607-2.patch, 10607-3.patch, 10607.patch
>
>
> As with the filesystem API, we need to provide a generic mechanism to support 
> multiple credential storage mechanisms that are potentially from third 
> parties. 
> We need the ability to eliminate the storage of passwords and secrets in 
> clear text within configuration files or within code.
> Toward that end, I propose an API that is configured using a list of URLs of 
> CredentialProviders. The implementation will look for implementations using 
> the ServiceLoader interface and thus support third party libraries.
> Two providers will be included in this patch. One using the credentials cache 
> in MapReduce jobs and the other using Java KeyStores from either HDFS or 
> local file system. 
> A CredShell CLI will also be included in this patch which provides the 
> ability to manage the credentials within the stores.
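
For illustration only, here is a minimal sketch of how client code might resolve a 
secret through such a provider chain. The property name 
hadoop.security.credential.provider.path, the jceks:// keystore URL scheme, and 
Configuration.getPassword() are assumptions drawn from the eventual Hadoop 
credential-provider work, not necessarily the names used in the attached patches.

{code}
import org.apache.hadoop.conf.Configuration;

public class CredentialLookupSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Point the credential framework at one or more provider URLs.
    // A Java KeyStore on the local file system is used here; an HDFS-backed
    // store would use a URL such as jceks://hdfs@nn/path/creds.jceks.
    conf.set("hadoop.security.credential.provider.path",
        "jceks://file/tmp/creds.jceks");

    // Resolve the secret by alias rather than reading clear text from the
    // configuration. Falls back to the plain configuration property if no
    // provider has the alias, and returns null if neither does.
    char[] secret = conf.getPassword("mydb.password.alias");
    if (secret == null) {
      System.err.println("no credential stored for mydb.password.alias");
    } else {
      System.out.println("resolved a " + secret.length + "-character secret");
    }
  }
}
{code}

The CredShell CLI mentioned in the description would be the command-line way to 
populate such a store; its exact command names are whatever the attached patches 
define.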



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10586) KeyShell doesn't allow setting Options via CLI

2014-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999298#comment-13999298
 ] 

Hadoop QA commented on HADOOP-10586:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645004/HADOOP-10586.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3938//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3938//console

This message is automatically generated.

> KeyShell doesn't allow setting Options via CLI
> --
>
> Key: HADOOP-10586
> URL: https://issues.apache.org/jira/browse/HADOOP-10586
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: bin
>Affects Versions: 3.0.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HADOOP-10586.1.patch, HADOOP-10586.1.patch, 
> HADOOP-10586.3.patch, HADOOP-10586.4.patch, HADOOP-10586.5.patch
>
>
> You should be able to set any of the Options passed to the KeyProvider via 
> the CLI.
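
For context, here is a minimal sketch of the per-key Options that a CLI invocation 
would need to expose, written against the KeyProvider API in trunk. The property 
name hadoop.security.key.provider.path, the fluent Options setters, and the 
jceks:// URL are assumptions for illustration; the actual KeyShell flags are 
defined by the attached patches.

{code}
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.crypto.key.KeyProvider;
import org.apache.hadoop.crypto.key.KeyProviderFactory;

public class KeyOptionsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumed provider path property; a local Java KeyStore is used here.
    conf.set("hadoop.security.key.provider.path", "jceks://file/tmp/keys.jceks");

    // These are the Options the KeyShell should let users set from the CLI.
    KeyProvider.Options options = KeyProvider.options(conf)
        .setCipher("AES/CTR/NoPadding")
        .setBitLength(128)
        .setDescription("example key created for illustration");

    List<KeyProvider> providers = KeyProviderFactory.getProviders(conf);
    if (!providers.isEmpty()) {
      KeyProvider provider = providers.get(0);
      provider.createKey("example.key", options);
      provider.flush();  // persist the new key material to the backing store
    }
  }
}
{code}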



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999124#comment-13999124
 ] 

Hadoop QA commented on HADOOP-10607:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12644760/10607.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3936//console

This message is automatically generated.

> Create an API to Separate Credentials/Password Storage from Applications
> 
>
> Key: HADOOP-10607
> URL: https://issues.apache.org/jira/browse/HADOOP-10607
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Reporter: Larry McCay
>Assignee: Larry McCay
> Fix For: 3.0.0
>
> Attachments: 10607-2.patch, 10607.patch
>
>
> As with the filesystem API, we need to provide a generic mechanism to support 
> multiple credential storage mechanisms that are potentially from third 
> parties. 
> We need the ability to eliminate the storage of passwords and secrets in 
> clear text within configuration files or within code.
> Toward that end, I propose an API that is configured using a list of URLs of 
> CredentialProviders. The implementation will look for implementations using 
> the ServiceLoader interface and thus support third party libraries.
> Two providers will be included in this patch. One using the credentials cache 
> in MapReduce jobs and the other using Java KeyStores from either HDFS or 
> local file system. 
> A CredShell CLI will also be included in this patch which provides the 
> ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


  1   2   >