[jira] [Updated] (HADOOP-10623) Provide a utility to be able to inspect the config as seen by a hadoop client or daemon

2014-05-21 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated HADOOP-10623:
---

Attachment: HADOOP-10623.v02.patch

Added the ability to:
- load the config from an arbitrary filesystem (helps digest a job.xml from a 
staging submit dir; see the sketch below)
- include only a certain key in the output
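
For illustration, a minimal sketch of loading a config from an arbitrary 
filesystem (not the patch itself; the class name and URI shown are made up):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch only, not part of HADOOP-10623.
public class LoadRemoteConf {
  public static Configuration load(String uri) throws java.io.IOException {
    // e.g. uri = "hdfs://nn:8020/tmp/staging/job_123/job.xml" (made-up path)
    Configuration conf = new Configuration(false); // skip the default resources
    Path path = new Path(uri);
    FileSystem fs = path.getFileSystem(new Configuration());
    conf.addResource(fs.open(path)); // Configuration accepts an InputStream
    return conf;
  }
}
{code}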

> Provide a utility to be able to inspect the config as seen by a hadoop 
> client or daemon
> --
>
> Key: HADOOP-10623
> URL: https://issues.apache.org/jira/browse/HADOOP-10623
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: HADOOP-10623.v01.patch, HADOOP-10623.v02.patch
>
>
> To ease debugging of config issues, it is convenient to be able to generate 
> the config as seen by the job client or a hadoop daemon:
> {noformat}
> ]$ hadoop org.apache.hadoop.util.ConfigTool -help 
> Usage: ConfigTool [ -xml | -json ] [ -loadDefaults ] [ resource1... ]
>   if resource contains '/', load from local filesystem
>   otherwise, load from the classpath
> Generic options supported are
> -conf <configuration file>     specify an application configuration file
> -D <property=value>            use value for given property
> -fs <local|namenode:port>      specify a namenode
> -jt <local|jobtracker:port>    specify a job tracker
> -files <comma separated list of files>    specify comma separated files to be 
> copied to the map reduce cluster
> -libjars <comma separated list of jars>    specify comma separated jar files 
> to include in the classpath.
> -archives <comma separated list of archives>    specify comma separated 
> archives to be unarchived on the compute machines.
> The general command line syntax is
> bin/hadoop command [genericOptions] [commandOptions]
> {noformat}
> {noformat}
> $ hadoop org.apache.hadoop.util.ConfigTool -Dmy.test.conf=val mapred-site.xml 
> ./hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml | python 
> -mjson.tool
> {
>   "properties": [
>     {
>       "isFinal": false,
>       "key": "mapreduce.framework.name",
>       "resource": "mapred-site.xml",
>       "value": "yarn"
>     },
>     {
>       "isFinal": false,
>       "key": "mapreduce.client.genericoptionsparser.used",
>       "resource": "programatically",
>       "value": "true"
>     },
>     {
>       "isFinal": false,
>       "key": "my.test.conf",
>       "resource": "from command line",
>       "value": "val"
>     },
>     {
>       "isFinal": false,
>       "key": "from.file.key",
>       "resource": "hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml",
>       "value": "from.file.val"
>     },
>     {
>       "isFinal": false,
>       "key": "mapreduce.shuffle.port",
>       "resource": "mapred-site.xml",
>       "value": "${my.mapreduce.shuffle.port}"
>     }
>   ]
> }
> {noformat}





[jira] [Updated] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties

2014-05-21 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HADOOP-10625:


Status: Patch Available  (was: Open)

> Configuration: names should be trimmed when putting/getting to properties
> -
>
> Key: HADOOP-10625
> URL: https://issues.apache.org/jira/browse/HADOOP-10625
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 2.4.0
>Reporter: Wangda Tan
> Attachments: HADOOP-10625.patch
>
>
> Currently, Hadoop does not trim the name when putting a key/value pair into 
> properties, but when loading the configuration from a file, names are 
> trimmed (in Configuration.java):
> {code}
>   if ("name".equals(field.getTagName()) && field.hasChildNodes())
>     attr = StringInterner.weakIntern(
>         ((Text)field.getFirstChild()).getData().trim());
>   if ("value".equals(field.getTagName()) && field.hasChildNodes())
>     value = StringInterner.weakIntern(
>         ((Text)field.getFirstChild()).getData());
> {code}
> With this behavior, the following sequence is problematic:
> 1. The user incorrectly sets " hadoop.key=value" (with a space before hadoop.key)
> 2. The user tries to get "hadoop.key" and cannot get "value"
> 3. The configuration is serialized/deserialized (as MR does)
> 4. The user tries to get "hadoop.key" and now gets "value", which is an 
> inconsistency.





[jira] [Updated] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties

2014-05-21 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HADOOP-10625:


Attachment: HADOOP-10625.patch

Attached a patch for this.

> Configuration: names should be trimmed when putting/getting to properties
> -
>
> Key: HADOOP-10625
> URL: https://issues.apache.org/jira/browse/HADOOP-10625
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 2.4.0
>Reporter: Wangda Tan
> Attachments: HADOOP-10625.patch
>
>
> Currently, Hadoop does not trim the name when putting a key/value pair into 
> properties, but when loading the configuration from a file, names are 
> trimmed (in Configuration.java):
> {code}
>   if ("name".equals(field.getTagName()) && field.hasChildNodes())
>     attr = StringInterner.weakIntern(
>         ((Text)field.getFirstChild()).getData().trim());
>   if ("value".equals(field.getTagName()) && field.hasChildNodes())
>     value = StringInterner.weakIntern(
>         ((Text)field.getFirstChild()).getData());
> {code}
> With this behavior, the following sequence is problematic:
> 1. The user incorrectly sets " hadoop.key=value" (with a space before hadoop.key)
> 2. The user tries to get "hadoop.key" and cannot get "value"
> 3. The configuration is serialized/deserialized (as MR does)
> 4. The user tries to get "hadoop.key" and now gets "value", which is an 
> inconsistency.





[jira] [Created] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties

2014-05-21 Thread Wangda Tan (JIRA)
Wangda Tan created HADOOP-10625:
---

 Summary: Configuration: names should be trimmed when 
putting/getting to properties
 Key: HADOOP-10625
 URL: https://issues.apache.org/jira/browse/HADOOP-10625
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.4.0
Reporter: Wangda Tan


Currently, Hadoop does not trim the name when putting a key/value pair into 
properties, but when loading the configuration from a file, names are trimmed 
(in Configuration.java):
{code}
  if ("name".equals(field.getTagName()) && field.hasChildNodes())
    attr = StringInterner.weakIntern(
        ((Text)field.getFirstChild()).getData().trim());
  if ("value".equals(field.getTagName()) && field.hasChildNodes())
    value = StringInterner.weakIntern(
        ((Text)field.getFirstChild()).getData());
{code}
With this behavior, the following sequence is problematic:
1. The user incorrectly sets " hadoop.key=value" (with a space before hadoop.key)
2. The user tries to get "hadoop.key" and cannot get "value"
3. The configuration is serialized/deserialized (as MR does)
4. The user tries to get "hadoop.key" and now gets "value", which is an 
inconsistency.
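
For illustration, a minimal sketch of the symmetric trimming this suggests 
(illustrative only; the actual fix belongs inside Configuration#set/#get):

{code}
import java.util.Properties;

// Illustrative sketch only: trim names on both put and get, mirroring what
// Configuration.loadResource() already does when reading names from XML.
public class TrimmedProperties {
  private final Properties props = new Properties();

  public void set(String name, String value) {
    props.setProperty(name.trim(), value);   // trim on put
  }

  public String get(String name) {
    return props.getProperty(name.trim());   // trim on get, for symmetry
  }
}
{code}

With this in place, " hadoop.key" and "hadoop.key" resolve to the same entry 
both before and after a serialize/deserialize round trip.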





[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces

2014-05-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005481#comment-14005481
 ] 

Yi Liu commented on HADOOP-10603:
-

Thanks, Charles, for the good comments. I'm refining the patch to address 
Andrew's comments; I will respond to you later and also want to address your 
comments in the new patch :-)

> Crypto input and output streams implementing Hadoop stream interfaces
> -
>
> Key: HADOOP-10603
> URL: https://issues.apache.org/jira/browse/HADOOP-10603
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
>Reporter: Alejandro Abdelnur
>Assignee: Yi Liu
> Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)
>
> Attachments: CryptoInputStream.java, CryptoOutputStream.java, 
> HADOOP-10603.1.patch, HADOOP-10603.2.patch, HADOOP-10603.3.patch, 
> HADOOP-10603.4.patch, HADOOP-10603.5.patch, HADOOP-10603.6.patch, 
> HADOOP-10603.7.patch, HADOOP-10603.8.patch, HADOOP-10603.patch
>
>
> A common set of Crypto Input/Output streams. They would be used by 
> CryptoFileSystem, HDFS encryption, MapReduce intermediate data and spills. 
> Note we cannot use the JDK Cipher Input/Output streams directly because we 
> need to support the additional interfaces that the Hadoop FileSystem streams 
> implement (Seekable, PositionedReadable, ByteBufferReadable, 
> HasFileDescriptor, CanSetDropBehind, CanSetReadahead, 
> HasEnhancedByteBufferAccess, Syncable).





[jira] [Updated] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces

2014-05-21 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HADOOP-10603:
--

Attachment: CryptoOutputStream.java
CryptoInputStream.java

> Crypto input and output streams implementing Hadoop stream interfaces
> -
>
> Key: HADOOP-10603
> URL: https://issues.apache.org/jira/browse/HADOOP-10603
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
>Reporter: Alejandro Abdelnur
>Assignee: Yi Liu
> Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)
>
> Attachments: CryptoInputStream.java, CryptoOutputStream.java, 
> HADOOP-10603.1.patch, HADOOP-10603.2.patch, HADOOP-10603.3.patch, 
> HADOOP-10603.4.patch, HADOOP-10603.5.patch, HADOOP-10603.6.patch, 
> HADOOP-10603.7.patch, HADOOP-10603.8.patch, HADOOP-10603.patch
>
>
> A common set of Crypto Input/Output streams. They would be used by 
> CryptoFileSystem, HDFS encryption, MapReduce intermediate data and spills. 
> Note we cannot use the JDK Cipher Input/Output streams directly because we 
> need to support the additional interfaces that the Hadoop FileSystem streams 
> implement (Seekable, PositionedReadable, ByteBufferReadable, 
> HasFileDescriptor, CanSetDropBehind, CanSetReadahead, 
> HasEnhancedByteBufferAccess, Syncable).





[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces

2014-05-21 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005400#comment-14005400
 ] 

Charles Lamb commented on HADOOP-10603:
---

Hi Yi,

Good work so far. I took your latest patch, incorporated it into my sandbox, 
and got my unit tests running with it. I have also made some edits to 
CryptoInputStream and CryptoOutputStream. I have attached the whole files for 
those two rather than diffs.

CryptoFactory.java
Perhaps rename this to Crypto.
getEncryptor/getDecryptor should also declare "throws GeneralSecurityException"

Encryptor.java
encrypt should declare throws GeneralSecurityException
decl for encrypt > 80 chars
Consider making this interface an inner class of Crypto (aka CryptoFactory).
Remind me again why encrypt/decrypt don't take a position argument?
I wonder if, in general, we'll also want byte[] overloadings of the methods (as 
well as BB) for encrypt()/decrypt().

Decryptor.java
decrypt should throw GeneralSecurityException
The decl for decrypt > 80 chars
Consider making this interface a subclass of Crypto (aka CryptoFactory).

JCEAESCTRCryptoFactory.java
This file needs an apache license header
Perhaps rename it to JCEAESCTRCrypto.java
getDecryptor/getEncryptor should throw GeneralSecurityException

JCEAESCTRDecryptor.java
ctor should throw GeneralSecurityException instead of RTException
decrypt should throw GeneralSecurityException

JCEAESCTREncryptor.java
ctor should throw GeneralSecurityException instead of RTException
encrypt should throw GeneralSecurityException

CryptoUtils.java
put a newline after "public class CryptoUtils {"
Could calIV be renamed to calcIV?

CryptoFSDataOutputStream.java
Why is fsOut needed? Why can't you just reference out for (e.g.) getPos()?

CryptoInputStream.java
You'll need a getWrappedStream() method.

Why 8192? Should this be moved to a static final int CONSTANT?
IWBNI the name of the interface that a particular method is implementing were 
put in a comment before the @Override. For instance,
// PositionedRead
@Override
public int read(long position ...)

IWBNI all of the methods for a particular interface were grouped together in 
the code.

In read(byte[], int, int), regarding the if (!usingByteBufferRead) branch: I am 
worried that throwing and catching UnsupportedOperationException will be 
expensive. It seems very likely that for any particular stream, the same byte 
buffer will be passed in for the life of the stream. That means that on every 
call to read(...) there is potential for the UnsupportedOperationException to 
be thrown, which will be expensive. Perhaps keep a piece of state in the stream 
that gets set on the first time through, indicating whether the BB is readable 
or not. Or keep a reference to the BB along with a bool; if the reference 
changes (on the off chance that the caller switched BBs for the same stream), 
then you can redetermine whether read is supported or not.
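
Something along these lines is what I mean (a rough illustrative sketch, not a 
drop-in patch; the class and field names are made up):

{code}
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

import org.apache.hadoop.fs.ByteBufferReadable;

// Illustrative sketch only: probe the wrapped stream once and remember the
// answer, instead of throwing/catching UnsupportedOperationException per read.
abstract class ProbeOnceStream extends InputStream {
  private final InputStream in;
  private Boolean usingByteBufferRead; // null means "not probed yet"

  ProbeOnceStream(InputStream in) {
    this.in = in;
  }

  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    if (usingByteBufferRead == null) {
      if (in instanceof ByteBufferReadable) {
        try {
          int n = ((ByteBufferReadable) in).read(ByteBuffer.wrap(b, off, len));
          usingByteBufferRead = Boolean.TRUE;  // probe succeeded, remember it
          return n;
        } catch (UnsupportedOperationException e) {
          usingByteBufferRead = Boolean.FALSE; // pay this cost exactly once
        }
      } else {
        usingByteBufferRead = Boolean.FALSE;
      }
    }
    return usingByteBufferRead
        ? ((ByteBufferReadable) in).read(ByteBuffer.wrap(b, off, len))
        : in.read(b, off, len);
  }
}
{code}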

In readFully, you could simplify the implementation by just calling into 
read(long, byte[]...), like this:

  @Override // PositionedReadable
  public void readFully(long position, byte[] buffer, int offset, int length)
  throws IOException {
int nread = 0;
while (nread < length) {
  int nbytes =
  read(position + nread, buffer, offset + nread, length - nread);
  if (nbytes < 0) {
throw new EOFException("End of file reached before reading fully.");
  }
  nread += nbytes;
}
  }

That way you can let read(long...) do all the unwinding of the seek position.

In seek(), you can do a check for forward == 0 and return immediately, thus 
saving the two calls to position() in the noop case. Ditto skip().

I noticed that you implemented read(ByteBufferPool), but not releaseBuffer(BB). 
Is that because you didn't have time (it's ok if that's the case, I'm just 
wondering why one and not the other)?

CryptoOutputStream.java
You'll need a getWrappedStream() method.



> Crypto input and output streams implementing Hadoop stream interfaces
> -
>
> Key: HADOOP-10603
> URL: https://issues.apache.org/jira/browse/HADOOP-10603
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
>Reporter: Alejandro Abdelnur
>Assignee: Yi Liu
> Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)
>
> Attachments: HADOOP-10603.1.patch, HADOOP-10603.2.patch, 
> HADOOP-10603.3.patch, HADOOP-10603.4.patch, HADOOP-10603.5.patch, 
> HADOOP-10603.6.patch, HADOOP-10603.7.patch, HADOOP-10603.8.patch, 
> HADOOP-10603.patch
>
>
> A common set of Crypto Input/Output streams. They would be used by 
> CryptoFileSystem, HDFS encryption, MapReduce intermediate data and spills. 
> Note we cannot use the JDK Cipher Input/Output streams directly because we 
> need to support the additional interfaces that the Hadoop FileSystem streams 
> implement (Seekable, PositionedReadable, ByteBufferReadable, 
> HasFileDescriptor, CanSetDropBehind, CanSetReadahead, 
> HasEnhancedByteBufferAccess, Syncable).

[jira] [Commented] (HADOOP-9902) Shell script rewrite

2014-05-21 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005242#comment-14005242
 ] 

Mark Grover commented on HADOOP-9902:
-

Great! Yeah, sounds good to me, and in my personal opinion, Bigtop will be ok 
with expanding the definition. Just let us know when you make that change and 
what release it will show up in :-)

And, we don't use HADOOP_IDENT_STR, so no objections from Bigtop side there.

Let me (or d...@bigtop.apache.org) know if you need anything else. Thank you!

> Shell script rewrite
> 
>
> Key: HADOOP-9902
> URL: https://issues.apache.org/jira/browse/HADOOP-9902
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: scripts
>Affects Versions: 3.0.0, 2.1.1-beta
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
> Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt
>
>
> Umbrella JIRA for shell script rewrite.  See more-info.txt for more details.





[jira] [Commented] (HADOOP-10611) KeyVersion name should not be assumed to be the "key name @ the version number"

2014-05-21 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005209#comment-14005209
 ] 

Owen O'Malley commented on HADOOP-10611:


I disagree on this one. There is a lot of value in having semantics behind the 
key version. For example, the MapReduce task ids used to be randomly generated. 
That was easy, but it was a pain in the tail to figure out which tasks were 
related to which job. 

> KeyVersion name should not be assumed to be the "key name @ the version 
> number"
> ---
>
> Key: HADOOP-10611
> URL: https://issues.apache.org/jira/browse/HADOOP-10611
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>
> The KeyProvider public API should treat keyversion name as an opaque value. 
> Same for the KMS client/server.
> Methods like {{KeyProvider#buildVersionName()}} and 
> {{KeyProvider#getBaseName()}} should not be part of the {{KeyProvider}} API.




[jira] [Commented] (HADOOP-9902) Shell script rewrite

2014-05-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005196#comment-14005196
 ] 

Allen Wittenauer commented on HADOOP-9902:
--

That's very helpful! (Especially since that was going to be the next place I 
looked, since I just happen to have it cloned from git on my dev machine; it's 
going to be one of the first big tests I do as I work towards a committable 
patch. :D )

Bigtop looks like it is doing what I would expect: setting it for Hadoop, but 
not using it directly.  Which seems to indicate that, at least as far as Bigtop 
is concerned, we could expand the definition beyond "it must be a user".

Hadoop also uses HADOOP_IDENT_STR as the setting for the Java hadoop.id.str 
property.  But I can't find a single place where this property is used. IIRC, 
it was used in ancient times for logging and/or display, but if we don't need 
the property set anymore because we've gotten wiser, I'd like to just yank that 
property completely.

> Shell script rewrite
> 
>
> Key: HADOOP-9902
> URL: https://issues.apache.org/jira/browse/HADOOP-9902
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: scripts
>Affects Versions: 3.0.0, 2.1.1-beta
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
> Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt
>
>
> Umbrella JIRA for shell script rewrite.  See more-info.txt for more details.





[jira] [Commented] (HADOOP-9902) Shell script rewrite

2014-05-21 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005174#comment-14005174
 ] 

Mark Grover commented on HADOOP-9902:
-

Hi Allen,
Good point. In Bigtop, where we create RPM and DEB packages for hadoop and 
bundle them into our Bigtop distribution, we do rely on this property. And, 
looking at the code, it looks like we set that to be a user (the hdfs user in 
our case).

Here are the references:
These get used in the scripts we deploy using puppet for our integration 
testing:
https://github.com/apache/bigtop/blob/master/bigtop-deploy/puppet/modules/hadoop/templates/hadoop-env.sh#L78
https://github.com/apache/bigtop/blob/master/bigtop-deploy/puppet/modules/hadoop/templates/hadoop-hdfs#L20

This gets used in the default configuration for our secure clusters for 
integration testing:
https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/hadoop/conf.secure/hadoop-env.sh#L56

This gets used in the init script that starts the datanode services:
https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/hadoop/hadoop-hdfs-datanode.svc#L39

And, this gets used to set certain environment variables before starting 
various HDFS services:
https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/hadoop/hdfs.default#L20

Hope that helps but please let me know if you need any further info.

> Shell script rewrite
> 
>
> Key: HADOOP-9902
> URL: https://issues.apache.org/jira/browse/HADOOP-9902
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: scripts
>Affects Versions: 3.0.0, 2.1.1-beta
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
> Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt
>
>
> Umbrella JIRA for shell script rewrite.  See more-info.txt for more details.





[jira] [Commented] (HADOOP-9902) Shell script rewrite

2014-05-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005165#comment-14005165
 ] 

Allen Wittenauer commented on HADOOP-9902:
--

Ran across an interesting discrepancy. hadoop-env.sh says:

{code}
# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER
{code}

This implies that could be something that isn't a user.  However...

{code}
  chown $HADOOP_IDENT_STRING $HADOOP_LOG_DIR
{code}

... we clearly have that assumption.  Since the chown has already been removed 
from the new code, this problem goes away.  But should we explicitly state that 
HADOOP_IDENT_STRING needs to be a user?  Is anyone aware of anything else that 
uses this outside of the Hadoop shell scripts?

> Shell script rewrite
> 
>
> Key: HADOOP-9902
> URL: https://issues.apache.org/jira/browse/HADOOP-9902
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: scripts
>Affects Versions: 3.0.0, 2.1.1-beta
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
> Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt
>
>
> Umbrella JIRA for shell script rewrite.  See more-info.txt for more details.





[jira] [Commented] (HADOOP-9902) Shell script rewrite

2014-05-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004967#comment-14004967
 ] 

Allen Wittenauer commented on HADOOP-9902:
--

{code}
  CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/ahs-config/log4j.properties
...
  CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/timelineserver-config/log4j.properties
{code}

The timeline server added more custom (and likely equally undocumented) 
log4j.properties locations. Needless to say, those are going away too, just 
like their rm-config and nm-config brethren.

> Shell script rewrite
> 
>
> Key: HADOOP-9902
> URL: https://issues.apache.org/jira/browse/HADOOP-9902
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: scripts
>Affects Versions: 3.0.0, 2.1.1-beta
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
> Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt
>
>
> Umbrella JIRA for shell script rewrite.  See more-info.txt for more details.





[jira] [Commented] (HADOOP-10561) Copy command with preserve option should handle Xattrs

2014-05-21 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004855#comment-14004855
 ] 

Uma Maheswara Rao G commented on HADOOP-10561:
--

Moved to a top-level JIRA, as the HDFS-2006 branch was merged to trunk!

> Copy command with preserve option should handle Xattrs
> --
>
> Key: HADOOP-10561
> URL: https://issues.apache.org/jira/browse/HADOOP-10561
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 3.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Yi Liu
>
> The design docs for XAttrs stated that we handle preserve options with copy 
> commands.
> From the doc:
> The preserve option of shell commands like “cp -p” and of “distcp -p” should 
> work on XAttrs. 
> In the case where the source fs supports XAttrs but the target fs does not, 
> XAttrs will be ignored with a warning message.





[jira] [Updated] (HADOOP-10561) Copy command with preserve option should handle Xattrs

2014-05-21 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HADOOP-10561:
-

Affects Version/s: (was: HDFS XAttrs (HDFS-2006))
   3.0.0

> Copy command with preserve option should handle Xattrs
> --
>
> Key: HADOOP-10561
> URL: https://issues.apache.org/jira/browse/HADOOP-10561
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 3.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Yi Liu
>
> The design docs for XAttrs stated that we handle preserve options with copy 
> commands.
> From the doc:
> The preserve option of shell commands like “cp -p” and of “distcp -p” should 
> work on XAttrs. 
> In the case where the source fs supports XAttrs but the target fs does not, 
> XAttrs will be ignored with a warning message.





[jira] [Updated] (HADOOP-10561) Copy command with preserve option should handle Xattrs

2014-05-21 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HADOOP-10561:
-

Issue Type: Bug  (was: Sub-task)
Parent: (was: HADOOP-10514)

> Copy command with preserve option should handle Xattrs
> --
>
> Key: HADOOP-10561
> URL: https://issues.apache.org/jira/browse/HADOOP-10561
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 3.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Yi Liu
>
> The design docs for XAttrs stated that we handle preserve options with copy 
> commands.
> From the doc:
> The preserve option of shell commands like “cp -p” and of “distcp -p” should 
> work on XAttrs. 
> In the case where the source fs supports XAttrs but the target fs does not, 
> XAttrs will be ignored with a warning message.





[jira] [Updated] (HADOOP-10561) Copy command with preserve option should handle Xattrs

2014-05-21 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HADOOP-10561:
-

Issue Type: Improvement  (was: Bug)

> Copy command with preserve option should handle Xattrs
> --
>
> Key: HADOOP-10561
> URL: https://issues.apache.org/jira/browse/HADOOP-10561
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 3.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Yi Liu
>
> The design docs for XAttrs stated that we handle preserve options with copy 
> commands.
> From the doc:
> The preserve option of shell commands like “cp -p” and of “distcp -p” should 
> work on XAttrs. 
> In the case where the source fs supports XAttrs but the target fs does not, 
> XAttrs will be ignored with a warning message.





[jira] [Updated] (HADOOP-10608) Support incremental data copy in DistCp

2014-05-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HADOOP-10608:
-

Hadoop Flags: Reviewed

+1 patch looks good.

> Support incremental data copy in DistCp
> ---
>
> Key: HADOOP-10608
> URL: https://issues.apache.org/jira/browse/HADOOP-10608
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch
>
>
> Currently when doing distcp with the -update option, for two files with the 
> same file name but different file length or checksum, we overwrite the whole 
> file. It would be good if we could detect the case where (sourceFile = 
> targetFile + appended_data), and only transfer the appended data segment to 
> the target. This would be very useful if we're doing incremental distcp.
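
For illustration, a minimal sketch of the append-detection idea (not the 
actual patch; it assumes the ranged FileSystem#getFileChecksum(Path, long) 
overload is available and that both filesystems expose comparable checksums):

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch only, not the HADOOP-10608 patch.
public class AppendCopyCheck {
  // True if the target looks like a strict prefix of the source, so only the
  // appended tail (srcLen - dstLen bytes) would need to be transferred.
  public static boolean canAppend(FileSystem srcFs, Path src,
      FileSystem dstFs, Path dst) throws IOException {
    FileStatus srcStat = srcFs.getFileStatus(src);
    FileStatus dstStat = dstFs.getFileStatus(dst);
    if (srcStat.getLen() <= dstStat.getLen()) {
      return false; // the source did not grow; fall back to a full overwrite
    }
    // Checksum of the source's first dstLen bytes vs. the whole target.
    // Either call may return null if the fs cannot compute checksums.
    FileChecksum srcPrefix = srcFs.getFileChecksum(src, dstStat.getLen());
    FileChecksum dstSum = dstFs.getFileChecksum(dst);
    return srcPrefix != null && srcPrefix.equals(dstSum);
  }
}
{code}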





[jira] [Updated] (HADOOP-10621) Remove CRLF for xattr value base64 encoding for better display.

2014-05-21 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HADOOP-10621:
-

Issue Type: Sub-task  (was: Improvement)
Parent: HADOOP-10514

> Remove CRLF for xattr value base64 encoding for better display.
> ---
>
> Key: HADOOP-10621
> URL: https://issues.apache.org/jira/browse/HADOOP-10621
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: HDFS XAttrs (HDFS-2006)
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Minor
> Fix For: HDFS XAttrs (HDFS-2006)
>
> Attachments: HDFS-6426.patch
>
>
> {{Base64.encodeBase64String(value)}} encodes binary data using the base64 
> algorithm into 76 character blocks separated by CRLF.
> In fs shell, xattrs display like:
> {code}
> # file: /user
> user.a1=0sMTIz
> user.a2=0sMTIzNDU2
> user.a3=0sMTIzNDU2
> {code}
> We don't need multiple lines and CRLF for the xattr value, and we can use:
> {code}
> Base64 base64 = new Base64(0);
> base64.encodeToString(value);
> {code}





[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces

2014-05-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004498#comment-14004498
 ] 

Yi Liu commented on HADOOP-10603:
-

Andrew, thanks for your detailed review. Although the comments are based on a 
slightly old version, most are also valid for the latest patch :-).
HADOOP-10617 holds the common-side test cases for the crypto streams; since we 
already have lots of test cases, still need to add more, and merging them here 
would make this patch a bit large, I made it a separate JIRA.
Sure, per your suggestion, I will merge it into this JIRA.

Following is my response to your comments, and I will update the patch later.

{quote}
Need class javadoc and interface annotations on all new classes
Need "" to actually line break in javadoc
Some tab characters present
{quote}
I will update them.

{quote}
s/mod/mode
What does "calIV" mean? Javadoc here would be nice.
calIV would be simpler if we used ByteBuffer.wrap and getLong. I think right 
now, we also need to cast each value to a long before shifting, else it only 
works up to an int. Would be good to unit test this function.
{quote}
Right, I will update them.
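
To make the suggestion concrete, a minimal sketch of a ByteBuffer-based calcIV 
(illustrative only; the real method must match the stream's counter layout):

{code}
import java.nio.ByteBuffer;

// Illustrative sketch only: compute the AES-CTR counter IV by treating the
// 16-byte initial IV as two big-endian longs, instead of shifting bytes.
public class CalcIvExample {
  public static void calcIv(byte[] initIv, long counter, byte[] iv) {
    ByteBuffer in = ByteBuffer.wrap(initIv); // big-endian by default
    long high = in.getLong();
    long low = in.getLong();
    long newLow = low + counter;             // addition happens in long width
    if (Long.compareUnsigned(newLow, low) < 0) {
      high++;                                // carry out of the low 64 bits
    }
    ByteBuffer.wrap(iv).putLong(high).putLong(newLow);
  }
}
{code}

Wrapping both halves as big-endian longs keeps the carry handling explicit and 
avoids the per-byte casts and shifts.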

{quote}
Could you define the term "block" in the #encrypt javadoc?
{quote}
It was the wrong word and should be “buffer”. I already updated it in the latest patch.

{quote}
I don't understand the reinit conditions, do you mind explaining this a bit? 
The javadoc for Cipher#update indicates that it always fully reads the input 
buffer, so is the issue that the cipher sometimes doesn't flush all the input 
to the output buffer?
{quote}
Andrew, I agree with you. The javadoc for Cipher#update indicates that it 
always fully reads the input buffer and decrypts all input data. This is 
always correct for CTR mode; for some other modes, input data may be buffered 
when padding is requested (CTR doesn't need padding). Charles had a concern 
that some custom JCE provider implementation might not decrypt all data for 
CTR mode using {{Cipher#update}}, so I added the reinit conditions. I think if 
a specific provider can't decrypt all input data of {{Cipher#update}} for CTR 
mode, that is a bug in the provider, since it doesn't follow the definition of 
{{Cipher#update}}.

{quote}
 If this API only accepts direct ByteBuffers, we should Precondition check that 
in the implementation
{quote}
I'm not sure we have this restriction; a Java heap ByteBuffer is also OK. A 
direct ByteBuffer is more efficient (no copy) when the cipher provider is 
native code using JNI. I will add the check if you prefer.

{quote}
 Javadoc for {{encrypt}} should link to {{javax.crypto.ShortBufferException}}, 
not {{#ShortBufferException}}. I also don't see this being thrown because we 
wrap everything in an IOException.
{quote}
Right, I will revise this.

{quote}
How was the default buffer size of 8KB chosen? This should probably be a new 
configuration parameter, or respect io.file.buffer.size.
{quote}
OK. I will add a configuration parameter for the default buffer size.

{quote}
Potential for int overflow in {{#write}} where we check {{off+len < 0}}. I also 
find this if statement hard to parse, would prefer if it were expanded.
{quote}
OK. I will expand them in the next patch.
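
For reference, a minimal sketch of the expanded, overflow-safe bounds check 
(illustrative only; this mirrors the idiom used by java.io.InputStream):

{code}
// Illustrative sketch only: rewriting "off + len < 0" style checks so no
// intermediate sum can overflow int.
public class BoundsCheckExample {
  public static void checkBounds(byte[] b, int off, int len) {
    if (off < 0 || len < 0 || len > b.length - off) {
      throw new IndexOutOfBoundsException(
          "off=" + off + ", len=" + len + ", buf.length=" + b.length);
    }
  }
}
{code}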

{quote}
Is the {{16}} in {{updateEncryptor}} something that should be hard-coded? Maybe 
pull it out into a constant and javadoc why it's 16. I'm curious if this is 
dependent on the Encryptor implementation.
{quote}
Let’s pull it out into variable.  16bytes is 128bits, and it’s in definition of 
AES: http://en.wikipedia.org/wiki/Advanced_Encryption_Standard. Let’s define it 
as a configuration parameter, since other algorithm may have different block 
size, although we use AES.

{quote}
We need to be careful with direct BBs, since they don't trigger GC. We should 
be freeing them manually when the stream is closed, or pooling them somehow for 
reuse.
{quote}
Good point. As for pooling them, maybe they are created with different buffer 
sizes and so are not a good fit for a pool? So I will free them manually when 
the stream is closed.

{quote}
•  In {{#process}}, we flip the inBuf, then if there's no data we just return. 
Shouldn't we restore inBuf to its previous padded state first? Also, IIUC 
{{inBuffer.remaining()}} cannot be less than padding since the inBuffer 
position does not move backwards, so I'd prefer to see a Precondition check and 
{{inBuf.remaining() == padding)}}. Test case would be nice if I'm right about 
this.
{quote}

You are right, there is a potential issue. I will fix it and add a test case. 
In our code we only enter {{#process}} when we have input data, so 
{{inBuffer}} should contain real data; but from the view of the code logic we 
should handle it as you said. And agreed, we should have a precondition check.

{quote}
Rename {{#process}} to {{#encrypt}}?
{quote}
Good, let’s do that.

{quote}
Do we need the special-case logic with tmpBuf? It looks lik

[jira] [Commented] (HADOOP-10608) Support incremental data copy in DistCp

2014-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004479#comment-14004479
 ] 

Hadoop QA commented on HADOOP-10608:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12645910/HADOOP-10608.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1280 javac 
compiler warnings (more than the trunk's current 1278 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-tools/hadoop-distcp:

  org.apache.hadoop.fs.TestFilterFileSystem
  org.apache.hadoop.fs.TestHarFileSystem
  org.apache.hadoop.hdfs.TestDistributedFileSystem

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3959//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3959//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-distcp.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3959//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3959//console

This message is automatically generated.

> Support incremental data copy in DistCp
> ---
>
> Key: HADOOP-10608
> URL: https://issues.apache.org/jira/browse/HADOOP-10608
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch
>
>
> Currently when doing distcp with the -update option, for two files with the 
> same file name but different file length or checksum, we overwrite the whole 
> file. It would be good if we could detect the case where (sourceFile = 
> targetFile + appended_data), and only transfer the appended data segment to 
> the target. This would be very useful if we're doing incremental distcp.





[jira] [Updated] (HADOOP-10624) Fix some minor typos and add more test cases for hadoop_err

2014-05-21 Thread Wenwu Peng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenwu Peng updated HADOOP-10624:


Attachment: HADOOP-10624-pnative.001.patch

Submitted the first version of the patch.

> Fix some minor typos and add more test cases for hadoop_err
> ---
>
> Key: HADOOP-10624
> URL: https://issues.apache.org/jira/browse/HADOOP-10624
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: HADOOP-10388
>Reporter: Wenwu Peng
>Assignee: Wenwu Peng
> Attachments: HADOOP-10624-pnative.001.patch
>
>
> Changes:
> 1. Add more test cases to cover the methods hadoop_lerr_alloc and 
> hadoop_uverr_alloc
> 2. Fix typos as follows:
> 1) Change hadoop_uverr_alloc(int cod to hadoop_uverr_alloc(int code in 
> hadoop_err.h
> 2) Change OutOfMemory to OutOfMemoryException to be consistent with other 
> exceptions in hadoop_err.c
> 3) Change DBUG to DEBUG in messenger.c
> 4) Change DBUG to DEBUG in reactor.c





[jira] [Created] (HADOOP-10624) Fix some minor typos and add more test cases for hadoop_err

2014-05-21 Thread Wenwu Peng (JIRA)
Wenwu Peng created HADOOP-10624:
---

 Summary: Fix some minor typos and add more test cases for 
hadoop_err
 Key: HADOOP-10624
 URL: https://issues.apache.org/jira/browse/HADOOP-10624
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: HADOOP-10388
Reporter: Wenwu Peng
Assignee: Wenwu Peng


Changes:
1. Add more test cases to cover the methods hadoop_lerr_alloc and hadoop_uverr_alloc
2. Fix typos as follows:
1) Change hadoop_uverr_alloc(int cod to hadoop_uverr_alloc(int code in 
hadoop_err.h
2) Change OutOfMemory to OutOfMemoryException to be consistent with other 
exceptions in hadoop_err.c
3) Change DBUG to DEBUG in messenger.c
4) Change DBUG to DEBUG in reactor.c



