[jira] [Updated] (HADOOP-10623) Provide a utility to be able to inspect the config as seen by a hadoop client daemon
[ https://issues.apache.org/jira/browse/HADOOP-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gera Shegalov updated HADOOP-10623:
-----------------------------------
    Attachment: HADOOP-10623.v02.patch

Added ability to load the config from:
- an arbitrary filesystem (helps digesting job.xml from a staging submit dir)
- include only a certain key in the

> Provide a utility to be able to inspect the config as seen by a hadoop client
> daemon
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-10623
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10623
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>         Attachments: HADOOP-10623.v01.patch, HADOOP-10623.v02.patch
>
> To ease debugging of config issues it is convenient to be able to generate a
> config as seen by the job client or a hadoop daemon
> {noformat}
> $ hadoop org.apache.hadoop.util.ConfigTool -help
> Usage: ConfigTool [ -xml | -json ] [ -loadDefaults ] [ resource1... ]
>        if resource contains '/', load from local filesystem
>        otherwise, load from the classpath
> Generic options supported are
> -conf      specify an application configuration file
> -D         use value for given property
> -fs        specify a namenode
> -jt        specify a job tracker
> -files     specify comma separated files to be copied to the map reduce cluster
> -libjars   specify comma separated jar files to include in the classpath.
> -archives  specify comma separated archives to be unarchived on the compute machines.
> The general command line syntax is
> bin/hadoop command [genericOptions] [commandOptions]
> {noformat}
> {noformat}
> $ hadoop org.apache.hadoop.util.ConfigTool -Dmy.test.conf=val mapred-site.xml ./hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml | python -mjson.tool
> {
>     "properties": [
>         {
>             "isFinal": false,
>             "key": "mapreduce.framework.name",
>             "resource": "mapred-site.xml",
>             "value": "yarn"
>         },
>         {
>             "isFinal": false,
>             "key": "mapreduce.client.genericoptionsparser.used",
>             "resource": "programatically",
>             "value": "true"
>         },
>         {
>             "isFinal": false,
>             "key": "my.test.conf",
>             "resource": "from command line",
>             "value": "val"
>         },
>         {
>             "isFinal": false,
>             "key": "from.file.key",
>             "resource": "hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml",
>             "value": "from.file.val"
>         },
>         {
>             "isFinal": false,
>             "key": "mapreduce.shuffle.port",
>             "resource": "mapred-site.xml",
>             "value": "${my.mapreduce.shuffle.port}"
>         }
>     ]
> }
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties
[ https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated HADOOP-10625:
--------------------------------
    Status: Patch Available  (was: Open)

> Configuration: names should be trimmed when putting/getting to properties
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-10625
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10625
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: conf
>    Affects Versions: 2.4.0
>            Reporter: Wangda Tan
>         Attachments: HADOOP-10625.patch
>
> Currently, Hadoop does not trim the name when putting a key/value pair into
> properties, but names are trimmed when loading configuration from a file:
> (In Configuration.java)
> {code}
> if ("name".equals(field.getTagName()) && field.hasChildNodes())
>   attr = StringInterner.weakIntern(
>       ((Text)field.getFirstChild()).getData().trim());
> if ("value".equals(field.getTagName()) && field.hasChildNodes())
>   value = StringInterner.weakIntern(
>       ((Text)field.getFirstChild()).getData());
> {code}
> With this behavior, the following steps are problematic:
> 1. The user incorrectly sets " hadoop.key=value" (with a space before hadoop.key)
> 2. The user tries to get "hadoop.key" and cannot get "value"
> 3. The configuration is serialized/deserialized (as is done in MR)
> 4. The user tries to get "hadoop.key" and now gets "value", which causes an
> inconsistency

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties
[ https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated HADOOP-10625:
--------------------------------
    Attachment: HADOOP-10625.patch

Attached a patch for this.

> Configuration: names should be trimmed when putting/getting to properties
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-10625
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10625
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: conf
>    Affects Versions: 2.4.0
>            Reporter: Wangda Tan
>         Attachments: HADOOP-10625.patch
>
> Currently, Hadoop does not trim the name when putting a key/value pair into
> properties, but names are trimmed when loading configuration from a file:
> (In Configuration.java)
> {code}
> if ("name".equals(field.getTagName()) && field.hasChildNodes())
>   attr = StringInterner.weakIntern(
>       ((Text)field.getFirstChild()).getData().trim());
> if ("value".equals(field.getTagName()) && field.hasChildNodes())
>   value = StringInterner.weakIntern(
>       ((Text)field.getFirstChild()).getData());
> {code}
> With this behavior, the following steps are problematic:
> 1. The user incorrectly sets " hadoop.key=value" (with a space before hadoop.key)
> 2. The user tries to get "hadoop.key" and cannot get "value"
> 3. The configuration is serialized/deserialized (as is done in MR)
> 4. The user tries to get "hadoop.key" and now gets "value", which causes an
> inconsistency

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Created] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties
Wangda Tan created HADOOP-10625:
-----------------------------------

             Summary: Configuration: names should be trimmed when putting/getting to properties
                 Key: HADOOP-10625
                 URL: https://issues.apache.org/jira/browse/HADOOP-10625
             Project: Hadoop Common
          Issue Type: Bug
          Components: conf
    Affects Versions: 2.4.0
            Reporter: Wangda Tan

Currently, Hadoop does not trim the name when putting a key/value pair into properties, but names are trimmed when loading configuration from a file:
(In Configuration.java)
{code}
if ("name".equals(field.getTagName()) && field.hasChildNodes())
  attr = StringInterner.weakIntern(
      ((Text)field.getFirstChild()).getData().trim());
if ("value".equals(field.getTagName()) && field.hasChildNodes())
  value = StringInterner.weakIntern(
      ((Text)field.getFirstChild()).getData());
{code}
With this behavior, the following steps are problematic:
1. The user incorrectly sets " hadoop.key=value" (with a space before hadoop.key)
2. The user tries to get "hadoop.key" and cannot get "value"
3. The configuration is serialized/deserialized (as is done in MR)
4. The user tries to get "hadoop.key" and now gets "value", which causes an inconsistency

--
This message was sent by Atlassian JIRA
(v6.2#6252)
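The put/get asymmetry described above can be reproduced with a plain java.util.Properties map, which, like Configuration's internal properties, does no trimming on put or get (an illustrative sketch with a hypothetical key, not Hadoop code):

```java
import java.util.Properties;

public class TrimDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Step 1: the key is stored with a stray leading space, " hadoop.key=value".
        props.setProperty(" hadoop.key", "value");
        // Step 2: looking up the trimmed name finds nothing, because nothing
        // trims on the put/get path.
        System.out.println(props.getProperty("hadoop.key"));   // null
        // Only the untrimmed name resolves; after a round-trip through a
        // loader that trims names, the trimmed lookup would suddenly work.
        System.out.println(props.getProperty(" hadoop.key"));  // value
    }
}
```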
[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces
[ https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005481#comment-14005481 ]

Yi Liu commented on HADOOP-10603:
---------------------------------

Thanks, Charles, for the good comments. I'm refining the patch to address Andrew's comments; I will respond to you later and also address your comments in the new patch :-)

> Crypto input and output streams implementing Hadoop stream interfaces
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-10603
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10603
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: security
>    Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
>            Reporter: Alejandro Abdelnur
>            Assignee: Yi Liu
>             Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)
>
>         Attachments: CryptoInputStream.java, CryptoOutputStream.java,
> HADOOP-10603.1.patch, HADOOP-10603.2.patch, HADOOP-10603.3.patch,
> HADOOP-10603.4.patch, HADOOP-10603.5.patch, HADOOP-10603.6.patch,
> HADOOP-10603.7.patch, HADOOP-10603.8.patch, HADOOP-10603.patch
>
> A common set of Crypto Input/Output streams. They would be used by
> CryptoFileSystem, HDFS encryption, MapReduce intermediate data and spills.
> Note we cannot use the JDK Cipher Input/Output streams directly because we
> need to support the additional interfaces that the Hadoop FileSystem streams
> implement (Seekable, PositionedReadable, ByteBufferReadable,
> HasFileDescriptor, CanSetDropBehind, CanSetReadahead,
> HasEnhancedByteBufferAccess, Syncable).

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces
[ https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HADOOP-10603: -- Attachment: CryptoOutputStream.java CryptoInputStream.java > Crypto input and output streams implementing Hadoop stream interfaces > - > > Key: HADOOP-10603 > URL: https://issues.apache.org/jira/browse/HADOOP-10603 > Project: Hadoop Common > Issue Type: Sub-task > Components: security >Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) >Reporter: Alejandro Abdelnur >Assignee: Yi Liu > Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) > > Attachments: CryptoInputStream.java, CryptoOutputStream.java, > HADOOP-10603.1.patch, HADOOP-10603.2.patch, HADOOP-10603.3.patch, > HADOOP-10603.4.patch, HADOOP-10603.5.patch, HADOOP-10603.6.patch, > HADOOP-10603.7.patch, HADOOP-10603.8.patch, HADOOP-10603.patch > > > A common set of Crypto Input/Output streams. They would be used by > CryptoFileSystem, HDFS encryption, MapReduce intermediate data and spills. > Note we cannot use the JDK Cipher Input/Output streams directly because we > need to support the additional interfaces that the Hadoop FileSystem streams > implement (Seekable, PositionedReadable, ByteBufferReadable, > HasFileDescriptor, CanSetDropBehind, CanSetReadahead, > HasEnhancedByteBufferAccess, Syncable, CanSetDropBehind). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces
[ https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005400#comment-14005400 ]

Charles Lamb commented on HADOOP-10603:
---------------------------------------

Hi Yi,

Good work so far. I took your latest patch, incorporated it into my sandbox, and got my unit tests running with it. I have also made some edits to CryptoInputStream and CryptoOutputStream. I have attached the whole file for those two rather than diffs.

CryptoFactory.java
- Perhaps rename this to Crypto.
- getEncryptor/getDecryptor should also declare "throws GeneralSecurityException"

Encryptor.java
- encrypt should declare throws GeneralSecurityException
- decl for encrypt > 80 chars
- Consider making this interface an inner class of Crypto (aka CryptoFactory).
- Remind me again why encrypt/decrypt don't take a position argument?
- I wonder if, in general, we'll also want byte[] overloadings of the methods (as well as BB) for encrypt()/decrypt().

Decryptor.java
- decrypt should throw GeneralSecurityException
- The decl for decrypt > 80 chars
- Consider making this interface a subclass of Crypto (aka CryptoFactory).

JCEAESCTRCryptoFactory.java
- This file needs an Apache license header
- Perhaps rename it to JCEAESCTRCrypto.java
- getDecryptor/getEncryptor should throw GeneralSecurityException

JCEAESCTRDecryptor.java
- ctor should throw GeneralSecurityException instead of RTException
- decrypt should throw GeneralSecurityException

JCEAESCTREncryptor.java
- ctor should throw GeneralSecurityException instead of RTException
- encrypt should throw GeneralSecurityException

CryptoUtils.java
- put a newline after "public class CryptoUtils {"
- Could calIV be renamed to calcIV?

CryptoFSDataOutputStream.java
- Why is fsOut needed? Why can't you just reference out for (e.g.) getPos()?

CryptoInputStream.java
- You'll need a getWrappedStream() method.
- Why 8192? Should this be moved to a static final int CONSTANT?
- IWBNI the name of the interface that a particular method is implementing were put in a comment before the @Override. For instance,

  // PositionedRead
  @Override
  public int read(long position ...)

- IWBNI all of the methods for a particular interface were grouped together in the code.
- In read(byte[], int, int), isn't the if (!usingByteBufferRead)
- I am worried that throwing and catching UnsupportedOperationException will be expensive. It seems very likely that for any particular stream, the same byte buffer will be passed in for the life of the stream. That means that for every call to read(...) there is potential for the UnsupportedOperationException to be thrown. That will be expensive. Perhaps keep a piece of state in the stream that gets set the first time through, indicating whether the BB is readable or not. Or keep a reference to the BB along with a bool. If the reference changes (on the off chance that the caller switched BBs for the same stream), then you can redetermine whether read is supported or not.
- In readFully, you could simplify the implementation by just calling into read(long, byte[]...), like this:

  @Override // PositionedReadable
  public void readFully(long position, byte[] buffer, int offset, int length)
      throws IOException {
    int nread = 0;
    while (nread < length) {
      int nbytes = read(position + nread, buffer, offset + nread, length - nread);
      if (nbytes < 0) {
        throw new EOFException("End of file reached before reading fully.");
      }
      nread += nbytes;
    }
  }

  That way you can let read(long...) do all the unwinding of the seek position.
- In seek(), you can do a check for forward == 0 and return immediately, thus saving the two calls to position() in the noop case. Ditto skip().
- I noticed that you implemented read(ByteBufferPool), but not releaseBuffer(BB). Is that because you didn't have time (it's ok if that's the case, I'm just wondering why one and not the other)?

CryptoOutputStream.java
- You'll need a getWrappedStream() method.
> Crypto input and output streams implementing Hadoop stream interfaces > - > > Key: HADOOP-10603 > URL: https://issues.apache.org/jira/browse/HADOOP-10603 > Project: Hadoop Common > Issue Type: Sub-task > Components: security >Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) >Reporter: Alejandro Abdelnur >Assignee: Yi Liu > Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) > > Attachments: HADOOP-10603.1.patch, HADOOP-10603.2.patch, > HADOOP-10603.3.patch, HADOOP-10603.4.patch, HADOOP-10603.5.patch, > HADOOP-10603.6.patch, HADOOP-10603.7.patch, HADOOP-10603.8.patch, > HADOOP-10603.patch > > > A common set of Crypto Input/Output streams. They would be used by > CryptoFileSystem, HDFS encryption, MapReduce intermediate data and spills. > Note we cannot use the JD
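The review above suggests caching whether the wrapped stream supports ByteBuffer reads rather than paying for a thrown UnsupportedOperationException on every read() call. A rough standalone sketch of that idea (hypothetical names, not the patch's actual code):

```java
import java.nio.ByteBuffer;

// Sketch of remembering whether the underlying stream supports ByteBuffer
// reads, so UnsupportedOperationException is thrown and caught at most once.
public class ByteBufferReadProbe {
    interface BBStream { int read(ByteBuffer buf); }

    private Boolean bbReadSupported;  // null until the first probe
    int probes;                       // counts how many times we actually tried

    int read(BBStream in, ByteBuffer buf) {
        if (!Boolean.FALSE.equals(bbReadSupported)) {
            try {
                probes++;
                int n = in.read(buf);
                bbReadSupported = Boolean.TRUE;
                return n;
            } catch (UnsupportedOperationException e) {
                bbReadSupported = Boolean.FALSE;  // remember; never probe again
            }
        }
        return 0;  // would fall back to the byte[] read path here
    }

    public static void main(String[] args) {
        ByteBufferReadProbe s = new ByteBufferReadProbe();
        BBStream unsupported = buf -> { throw new UnsupportedOperationException(); };
        ByteBuffer buf = ByteBuffer.allocate(8);
        s.read(unsupported, buf);
        s.read(unsupported, buf);
        System.out.println(s.probes);  // 1: the exception path ran only once
    }
}
```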
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005242#comment-14005242 ] Mark Grover commented on HADOOP-9902: - Great! Yeah, sounds good to me and in my personal opinion, Bigtop will be ok with expanding the definition. Just let us know when you make that change and what release it would show up in:-) And, we don't use HADOOP_IDENT_STR, so no objections from Bigtop side there. Let me (or d...@bigtop.apache.org) know if you need anything else. Thank you! > Shell script rewrite > > > Key: HADOOP-9902 > URL: https://issues.apache.org/jira/browse/HADOOP-9902 > Project: Hadoop Common > Issue Type: Improvement > Components: scripts >Affects Versions: 3.0.0, 2.1.1-beta >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer > Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt > > > Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10611) KeyVersion name should not be assumed to be the 'key name @ the version number"
[ https://issues.apache.org/jira/browse/HADOOP-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005209#comment-14005209 ] Owen O'Malley commented on HADOOP-10611: I disagree on this one. There is a lot of value in having semantics behind the key version. For example, the MapReduce task ids used to be randomly generated. That was easy, but it was a pain in the tail to figure out which tasks were related to which job. > KeyVersion name should not be assumed to be the 'key name @ the version > number" > --- > > Key: HADOOP-10611 > URL: https://issues.apache.org/jira/browse/HADOOP-10611 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur > > The KeyProvider public API should treat keyversion name as an opaque value. > Same for the KMS client/server. > Methods like {{KeyProvider#buildVersionName()}} and > {KeyProvider#getBaseName()}} should not be part of the {{KeyProvider}} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005196#comment-14005196 ]

Allen Wittenauer commented on HADOOP-9902:
------------------------------------------

That's very helpful! (Especially since that was going to be the next place I looked. Since I happen to have it cloned from git on my dev machine, it's going to be one of the first big tests I do as I work towards a commit-able patch. :D)

Bigtop looks like it is doing what I would expect: setting it for Hadoop, but not using it directly. This seems to indicate that, at least as far as Bigtop is concerned, we could expand the definition beyond "it must be a user".

Hadoop also uses HADOOP_IDENT_STRING as the setting for the Java hadoop.id.str property. But I can't find a single place where this property is used. IIRC, it was used in ancient times for logging and/or display, but if we don't need the property set anymore because we've gotten wiser, I'd like to just yank that property completely.

> Shell script rewrite
> --------------------
>
>                 Key: HADOOP-9902
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9902
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 3.0.0, 2.1.1-beta
>            Reporter: Allen Wittenauer
>            Assignee: Allen Wittenauer
>         Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt
>
> Umbrella JIRA for shell script rewrite. See more-info.txt for more details.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005174#comment-14005174 ]

Mark Grover commented on HADOOP-9902:
-------------------------------------

Hi Allen,

Good point. In Bigtop, where we create RPM and DEB packages for hadoop and bundle it into our Bigtop distribution, we do rely on this property. And, looking at the code, it looks like we set that to be a user (the hdfs user in our case). Here are the references:

These get used in the scripts we deploy using puppet for our integration testing:
https://github.com/apache/bigtop/blob/master/bigtop-deploy/puppet/modules/hadoop/templates/hadoop-env.sh#L78
https://github.com/apache/bigtop/blob/master/bigtop-deploy/puppet/modules/hadoop/templates/hadoop-hdfs#L20

This gets used in the default configuration for our secure clusters for integration testing:
https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/hadoop/conf.secure/hadoop-env.sh#L56

This gets used in the init script that starts the datanode services:
https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/hadoop/hadoop-hdfs-datanode.svc#L39

And, this gets used to set certain environment variables before starting various HDFS services:
https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/hadoop/hdfs.default#L20

Hope that helps, but please let me know if you need any further info.

> Shell script rewrite
> --------------------
>
>                 Key: HADOOP-9902
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9902
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 3.0.0, 2.1.1-beta
>            Reporter: Allen Wittenauer
>            Assignee: Allen Wittenauer
>         Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt
>
> Umbrella JIRA for shell script rewrite. See more-info.txt for more details.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005165#comment-14005165 ]

Allen Wittenauer commented on HADOOP-9902:
------------------------------------------

Ran across an interesting discrepancy. hadoop-env.sh says:
{code}
# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER
{code}
This implies that it could be something that isn't a user. However...
{code}
chown $HADOOP_IDENT_STRING $HADOOP_LOG_DIR
{code}
... we clearly have that assumption. Since the chown has already been removed from the new code, this problem goes away. But should we explicitly state that HADOOP_IDENT_STRING needs to be a user? Is anyone aware of anything else that uses this outside of the Hadoop shell scripts?

> Shell script rewrite
> --------------------
>
>                 Key: HADOOP-9902
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9902
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 3.0.0, 2.1.1-beta
>            Reporter: Allen Wittenauer
>            Assignee: Allen Wittenauer
>         Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt
>
> Umbrella JIRA for shell script rewrite. See more-info.txt for more details.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004967#comment-14004967 ]

Allen Wittenauer commented on HADOOP-9902:
------------------------------------------

{code}
CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/ahs-config/log4j.properties
...
CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/timelineserver-config/log4j.properties
{code}
The timeline server added more custom (and likely equally undocumented) log4j.properties locations. Needless to say, that's going away too, just like their rm-config and nm-config brethren.

> Shell script rewrite
> --------------------
>
>                 Key: HADOOP-9902
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9902
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 3.0.0, 2.1.1-beta
>            Reporter: Allen Wittenauer
>            Assignee: Allen Wittenauer
>         Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt
>
> Umbrella JIRA for shell script rewrite. See more-info.txt for more details.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HADOOP-10561) Copy command with preserve option should handle Xattrs
[ https://issues.apache.org/jira/browse/HADOOP-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004855#comment-14004855 ]

Uma Maheswara Rao G commented on HADOOP-10561:
----------------------------------------------

Moved to a top-level JIRA as the HDFS-2006 branch was merged to trunk!

> Copy command with preserve option should handle Xattrs
> ------------------------------------------------------
>
>                 Key: HADOOP-10561
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10561
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 3.0.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: Yi Liu
>
> The design docs for Xattrs stated that we handle preserve options with copy
> commands
> From doc:
> Preserve option of commands like “cp -p” shell command and “distcp -p” should
> work on XAttrs.
> In the case of source fs supports XAttrs but target fs does not support,
> XAttrs will be ignored
> with warning message

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HADOOP-10561) Copy command with preserve option should handle Xattrs
[ https://issues.apache.org/jira/browse/HADOOP-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HADOOP-10561: - Affects Version/s: (was: HDFS XAttrs (HDFS-2006)) 3.0.0 > Copy command with preserve option should handle Xattrs > -- > > Key: HADOOP-10561 > URL: https://issues.apache.org/jira/browse/HADOOP-10561 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 3.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Yi Liu > > The design docs for Xattrs stated that we handle preserve options with copy > commands > From doc: > Preserve option of commands like “cp -p” shell command and “distcp -p” should > work on XAttrs. > In the case of source fs supports XAttrs but target fs does not support, > XAttrs will be ignored > with warning message -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10561) Copy command with preserve option should handle Xattrs
[ https://issues.apache.org/jira/browse/HADOOP-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HADOOP-10561: - Issue Type: Bug (was: Sub-task) Parent: (was: HADOOP-10514) > Copy command with preserve option should handle Xattrs > -- > > Key: HADOOP-10561 > URL: https://issues.apache.org/jira/browse/HADOOP-10561 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 3.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Yi Liu > > The design docs for Xattrs stated that we handle preserve options with copy > commands > From doc: > Preserve option of commands like “cp -p” shell command and “distcp -p” should > work on XAttrs. > In the case of source fs supports XAttrs but target fs does not support, > XAttrs will be ignored > with warning message -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10561) Copy command with preserve option should handle Xattrs
[ https://issues.apache.org/jira/browse/HADOOP-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HADOOP-10561: - Issue Type: Improvement (was: Bug) > Copy command with preserve option should handle Xattrs > -- > > Key: HADOOP-10561 > URL: https://issues.apache.org/jira/browse/HADOOP-10561 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 3.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Yi Liu > > The design docs for Xattrs stated that we handle preserve options with copy > commands > From doc: > Preserve option of commands like “cp -p” shell command and “distcp -p” should > work on XAttrs. > In the case of source fs supports XAttrs but target fs does not support, > XAttrs will be ignored > with warning message -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10608) Support incremental data copy in DistCp
[ https://issues.apache.org/jira/browse/HADOOP-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HADOOP-10608: - Hadoop Flags: Reviewed +1 patch looks good. > Support incremental data copy in DistCp > --- > > Key: HADOOP-10608 > URL: https://issues.apache.org/jira/browse/HADOOP-10608 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch > > > Currently when doing distcp with -update option, for two files with the same > file names but with different file length or checksum, we overwrite the whole > file. It will be good if we can detect the case where (sourceFile = > targetFile + appended_data), and only transfer the appended data segment to > the target. This will be very useful if we're doing incremental distcp. -- This message was sent by Atlassian JIRA (v6.2#6252)
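The append-detection idea above (sourceFile = targetFile + appended_data) can be sketched with a toy prefix check; real DistCp would compare file lengths and block checksums over streams rather than raw byte arrays, so this is only an illustrative helper with hypothetical names, not the patch's implementation:

```java
import java.util.Arrays;

public class AppendDetect {
    // Returns the offset from which data must be copied, or -1 when the
    // target is not a prefix of the source and a full copy is required.
    static long appendOffset(byte[] source, byte[] target) {
        if (source.length < target.length) {
            return -1;  // source shrank: cannot be target + appended data
        }
        byte[] prefix = Arrays.copyOf(source, target.length);
        // Real DistCp would compare checksums of the shared prefix instead.
        return Arrays.equals(prefix, target) ? target.length : -1;
    }

    public static void main(String[] args) {
        byte[] old = "hello".getBytes();
        byte[] appended = "hello world".getBytes();
        System.out.println(appendOffset(appended, old));         // 5: copy only the tail
        System.out.println(appendOffset("xyz".getBytes(), old)); // -1: full copy
    }
}
```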
[jira] [Updated] (HADOOP-10621) Remove CRLF for xattr value base64 encoding for better display.
[ https://issues.apache.org/jira/browse/HADOOP-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated HADOOP-10621:
-----------------------------------------
    Issue Type: Sub-task  (was: Improvement)
        Parent: HADOOP-10514

> Remove CRLF for xattr value base64 encoding for better display.
> ---------------------------------------------------------------
>
>                 Key: HADOOP-10621
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10621
>             Project: Hadoop Common
>          Issue Type: Sub-task
>    Affects Versions: HDFS XAttrs (HDFS-2006)
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>            Priority: Minor
>             Fix For: HDFS XAttrs (HDFS-2006)
>
>         Attachments: HDFS-6426.patch
>
> {{Base64.encodeBase64String(value)}} encodes binary data using the base64
> algorithm into 76 character blocks separated by CRLF.
> In fs shell, xattrs display like:
> {code}
> # file: /user
> user.a1=0sMTIz
> user.a2=0sMTIzNDU2
> user.a3=0sMTIzNDU2
> {code}
> We don't need multiple lines and CRLF for xattr values, and we can use:
> {code}
> Base64 base64 = new Base64(0);
> base64.encodeToString(value);
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
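The CRLF chunking in question can be seen with the JDK's own Base64 encoders. The patch itself uses Apache Commons Codec's {{new Base64(0)}}; the JDK MIME encoder chunks into 76-character lines the same way {{Base64.encodeBase64String}} does, so this sketch just illustrates the behavior being removed:

```java
import java.util.Base64;

public class Base64CrlfDemo {
    public static void main(String[] args) {
        byte[] value = new byte[80];  // long enough that base64 exceeds 76 chars
        // MIME encoding inserts CRLF every 76 characters, like Commons Codec's
        // chunked Base64.encodeBase64String(value) seen in the fs shell output.
        String chunked = Base64.getMimeEncoder().encodeToString(value);
        // The basic encoder emits one line, like Commons Codec's new Base64(0).
        String singleLine = Base64.getEncoder().encodeToString(value);
        System.out.println(chunked.contains("\r\n"));     // true
        System.out.println(singleLine.contains("\r\n"));  // false
    }
}
```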
[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces
[ https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004498#comment-14004498 ] Yi Liu commented on HADOOP-10603: - Andrew, thanks for your detailed review. Although they are based on a slightly old version, but most are also valid for latest patch :-). HADOOP-10617 is the common side test cases for crypto streams, since we already have lots of test cases and still need to increase and it's a bit large if merging to this, I made it as separate JIRA. Sure, according to your suggestion, I will merge it to this JIRA. Following is my response to your comments, and I will update the patch later. {quote} Need class javadoc and interface annotations on all new classes Need "" to actually line break in javadoc Some tab characters present {quote} I will update them. {quote} s/mod/mode What does "calIV" mean? Javadoc here would be nice. calIV would be simpler if we used ByteBuffer.wrap and getLong. I think right now, we also need to cast each value to a long before shifting, else it only works up to an int. Would be good to unit test this function. {quote} Right, I will update them. {quote} Could you define the term "block" in the #encrypt javadoc? {quote} It was a wrong word and should be “buffer”. Already updated it in latest patch. {quote} I don't understand the reinit conditions, do you mind explaining this a bit? The javadoc for Cipher#update indicates that it always fully reads the input buffer, so is the issue that the cipher sometimes doesn't flush all the input to the output buffer? {quote} Andrew, I agree with you. The javadoc for Cipher#update indicates that it always fully reads the input buffer and decrypt all input data. This will be always correct for CTR mode, for some of other modes input data may be buffered if requested padding (CTR doesn’t need padding). 
Charles has concern about maybe some custom JCE provider implementation can’t decrypt all data for CTR mode using {{Cipher#update}}, so I add the reinit conditions, and I think if that specific provider can’t decrypt all input data of {{Cipher#update}} for CTR mode, that should be a bug of that provider since it doesn't follow the definition of {{Cipher#update}}. {quote} If this API only accepts direct ByteBuffers, we should Precondition check that in the implementation {quote} I’m not sure we have this restriction. Java heap byteBuffer is also OK. Direct ByteBuffer is more efficient (no copy) when the cipher provider is native code and using JNI. I will add if you prefer. {quote} Javadoc for {{encrypt}} should link to {{javax.crypto.ShortBufferException}}, not {{#ShortBufferException}}. I also don't see this being thrown because we wrap everything in an IOException. {quote} Right, I will revise this. {quote} How was the default buffer size of 8KB chosen? This should probably be a new configuration parameter, or respect io.file.buffer.size. {quote} OK. I will add configuration parameter for the default buffer size. {quote} Potential for int overflow in {{#write}} where we check {{off+len < 0}}. I also find this if statement hard to parse, would prefer if it were expanded. {quote} OK. I will expand them in next patch. {quote} Is the {{16}} in {{updateEncryptor}} something that should be hard-coded? Maybe pull it out into a constant and javadoc why it's 16. I'm curious if this is dependent on the Encryptor implementation. {quote} Let’s pull it out into variable. 16bytes is 128bits, and it’s in definition of AES: http://en.wikipedia.org/wiki/Advanced_Encryption_Standard. Let’s define it as a configuration parameter, since other algorithm may have different block size, although we use AES. {quote} We need to be careful with direct BBs, since they don't trigger GC. We should be freeing them manually when the stream is closed, or pooling them somehow for reuse. 
{quote}

Good point. As for pooling them: maybe they are created with different buffer sizes and so are not suitable for a pool? So I will free them manually when the stream is closed.

{quote} • In {{#process}}, we flip the inBuf, then if there's no data we just return. Shouldn't we restore inBuf to its previous padded state first? Also, IIUC {{inBuffer.remaining()}} cannot be less than padding since the inBuffer position does not move backwards, so I'd prefer to see a Precondition check and {{inBuf.remaining() == padding)}}. Test case would be nice if I'm right about this. {quote}

You are right, there is a potential issue. I will fix it and add a test case. In our code we only enter {{#process}} when we have input data, so {{inBuffer}} should contain real data; but from the view of the code logic we should handle it as you said. And I agree we should have a precondition check.

{quote} Rename {{#process}} to {{#encrypt}}? {quote}

Good, let's do that.

{quote} Do we need the special-case logic with tmpBuf? It looks lik
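The buffer-state issue being discussed can be sketched like this (a hypothetical simplification of {{#process}}; the class, method, and padding handling are illustrative, not the patch's code): after flipping {{inBuffer}}, if only the counter-offset padding remains, the early return must undo the flip so the caller can keep appending data.

```java
import java.nio.ByteBuffer;

public class EncryptSketch {
    // Simplified shape of the stream's process()/encrypt() step.
    // Returns false (and restores inBuffer's fill state) when the
    // buffer holds nothing beyond the padding bytes.
    static boolean encryptStep(ByteBuffer inBuffer, int padding) {
        inBuffer.flip();
        if (inBuffer.remaining() == padding) {
            // Only padding was written: undo the flip so the caller
            // can continue filling from where it left off.
            inBuffer.position(inBuffer.limit());
            inBuffer.limit(inBuffer.capacity());
            return false;
        }
        // Precondition from the review: the position never moves
        // backwards, so remaining() can never be less than padding.
        if (inBuffer.remaining() < padding) {
            throw new IllegalStateException("remaining < padding");
        }
        // ... run the Encryptor over inBuffer into outBuffer here ...
        inBuffer.clear();
        return true;
    }
}
```

Without the restore, a later {{put}} on {{inBuffer}} would overwrite the padding region, since {{flip}} has reset the position to 0.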
[jira] [Commented] (HADOOP-10608) Support incremental data copy in DistCp
[ https://issues.apache.org/jira/browse/HADOOP-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004479#comment-14004479 ] Hadoop QA commented on HADOOP-10608: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645910/HADOOP-10608.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1280 javac compiler warnings (more than the trunk's current 1278 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-distcp: org.apache.hadoop.fs.TestFilterFileSystem org.apache.hadoop.fs.TestHarFileSystem org.apache.hadoop.hdfs.TestDistributedFileSystem {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3959//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/3959//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-distcp.html Javac warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/3959//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3959//console This message is automatically generated. 
> Support incremental data copy in DistCp > --- > > Key: HADOOP-10608 > URL: https://issues.apache.org/jira/browse/HADOOP-10608 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch > > > Currently when doing distcp with -update option, for two files with the same > file names but with different file length or checksum, we overwrite the whole > file. It will be good if we can detect the case where (sourceFile = > targetFile + appended_data), and only transfer the appended data segment to > the target. This will be very useful if we're doing incremental distcp. -- This message was sent by Atlassian JIRA (v6.2#6252)
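The append-detection idea in the issue description can be sketched in a few lines (this is an illustrative simplification using in-memory byte arrays and CRC32, not DistCp's actual file-checksum machinery): source qualifies as "target + appended_data" iff it is strictly longer than target and the checksum of its first {{targetLen}} bytes matches target's checksum.

```java
import java.util.zip.CRC32;

public class AppendDetect {
    // Hypothetical prefix-checksum test for the incremental-copy case:
    // if the checksum of source's first target.length bytes equals the
    // checksum of target, only the trailing segment needs transferring.
    static boolean isAppendOf(byte[] source, byte[] target) {
        if (source.length <= target.length) {
            return false; // nothing appended (or source is shorter)
        }
        CRC32 prefix = new CRC32();
        prefix.update(source, 0, target.length);
        CRC32 whole = new CRC32();
        whole.update(target, 0, target.length);
        return prefix.getValue() == whole.getValue();
    }

    public static void main(String[] args) {
        byte[] target = "hello".getBytes();
        System.out.println(isAppendOf("hello world".getBytes(), target)); // true
        System.out.println(isAppendOf("hellX world".getBytes(), target)); // false
    }
}
```

In a real DistCp-style implementation the comparison would use the filesystem's exposed checksums over the prefix range rather than reading both files fully.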
[jira] [Updated] (HADOOP-10624) Fix some minors typo and add more test cases for hadoop_err
[ https://issues.apache.org/jira/browse/HADOOP-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenwu Peng updated HADOOP-10624: Attachment: HADOOP-10624-pnative.001.patch Submitted the first version of the patch. > Fix some minors typo and add more test cases for hadoop_err > --- > > Key: HADOOP-10624 > URL: https://issues.apache.org/jira/browse/HADOOP-10624 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: HADOOP-10388 >Reporter: Wenwu Peng >Assignee: Wenwu Peng > Attachments: HADOOP-10624-pnative.001.patch > > > Changes: > 1. Add more test cases to cover the methods hadoop_lerr_alloc and > hadoop_uverr_alloc > 2. Fix typos as follows: > 1) Change hadoop_uverr_alloc(int cod to hadoop_uverr_alloc(int code in > hadoop_err.h > 2) Change OutOfMemory to OutOfMemoryException to be consistent with other > Exceptions in hadoop_err.c > 3) Change DBUG to DEBUG in messenger.c > 4) Change DBUG to DEBUG in reactor.c -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10624) Fix some minors typo and add more test cases for hadoop_err
Wenwu Peng created HADOOP-10624: --- Summary: Fix some minors typo and add more test cases for hadoop_err Key: HADOOP-10624 URL: https://issues.apache.org/jira/browse/HADOOP-10624 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Wenwu Peng Assignee: Wenwu Peng Changes: 1. Add more test cases to cover the methods hadoop_lerr_alloc and hadoop_uverr_alloc 2. Fix typos as follows: 1) Change hadoop_uverr_alloc(int cod to hadoop_uverr_alloc(int code in hadoop_err.h 2) Change OutOfMemory to OutOfMemoryException to be consistent with other Exceptions in hadoop_err.c 3) Change DBUG to DEBUG in messenger.c 4) Change DBUG to DEBUG in reactor.c -- This message was sent by Atlassian JIRA (v6.2#6252)