[jira] [Commented] (HDFS-4949) Centralized cache management in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735723#comment-13735723 ]

Andrew Wang commented on HDFS-4949:
-----------------------------------

Hey Arun, thanks for taking a look! Tying in YARN would definitely be great. There's half a hope that we can jump right from a naive prototype scheme to using YARN directly, but our resource management team doesn't have time in the near term to make this happen. I definitely want our abstractions to be as similar as possible, though, to ease a future transition; your input there is appreciated. As to your other points:

1. The main reason we added auto-caching of new files was actually for Hive. My understanding is that Hive users can drop new files into a Hive partition directory without notifying the Hive metastore, e.g. via the fs shell. Since we'd like to support caching of higher-level abstractions like Hive partitions or tables, this auto-caching is necessary.

2. We were planning on extending the existing getFileBlockLocations API (which takes a Path, offset, and length) to also indicate which replicas of the returned blocks are cached. This should satisfy the needs of framework schedulers like MR or Impala. At read time, we'll also provide per-stream statistics of the number of bytes read remotely vs. from local disk vs. from local memory. Remote memory reads are also on our mind, but will likely be a per-stream or per-client config option added later.

Suresh, to partially address your questions, Colin's going to put pools into the patch at HDFS-5052, and he's also been working on buffer-oriented access at HDFS-4953. Thanks for your comments on the subtasks thus far.
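The extension Andrew describes can be sketched in plain Java. All names below are illustrative assumptions, not the actual HDFS API: a block location carries, alongside its replica hosts, the subset of hosts that have the replica cached in memory, and a framework scheduler prefers those hosts.

```java
import java.util.List;

// Illustrative sketch only -- not the real getFileBlockLocations return type.
// A block location that also reports which replica hosts hold the block in
// an in-memory cache, plus the host-selection rule a scheduler might apply.
class BlockLocationSketch {
    final List<String> hosts;       // all hosts holding a replica on disk
    final List<String> cachedHosts; // subset holding the replica in memory

    BlockLocationSketch(List<String> hosts, List<String> cachedHosts) {
        this.hosts = hosts;
        this.cachedHosts = cachedHosts;
    }

    // Memory locality first: schedule on a cached replica when one exists,
    // otherwise fall back to any disk replica.
    String preferredHost() {
        return cachedHosts.isEmpty() ? hosts.get(0) : cachedHosts.get(0);
    }
}
```

A scheduler placing a task for a block replicated on dn1 and dn2 but cached only on dn2 would pick dn2 under this rule.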
> Centralized cache management in HDFS
> ------------------------------------
>
>                 Key: HDFS-4949
>                 URL: https://issues.apache.org/jira/browse/HDFS-4949
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: 3.0.0, 2.3.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: caching-design-doc-2013-07-02.pdf, caching-design-doc-2013-08-09.pdf
>
>
> HDFS currently has no support for managing or exposing in-memory caches at datanodes. This makes it harder for higher-level application frameworks like Hive, Pig, and Impala to effectively use cluster memory, because they cannot explicitly cache important datasets or place their tasks for memory locality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4949) Centralized cache management in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735643#comment-13735643 ]

Arun C Murthy commented on HDFS-4949:
-------------------------------------

[~andrew.wang] overall it looks great; some more questions:

# I'm not sure you want to automatically add new files in a directory to the cache; it seems higher-level systems (Hive, Impala, HCat) are in a better position to decide. Not doing this automatically simplifies cache management, quota management, etc.
# Can you please provide details on the read APIs? For the Hive/MR/Pig use case I'd like to see a new open(Path, offset, length) which returns an indicator for whether the block is cached or not. This, for example, would be used by the RecordReader to read the split.
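Arun's suggested read API can be mocked up with JDK-only code. Every name here is hypothetical (the real API was still under discussion at this point): open(path, offset, length) hands back both a handle and a flag saying whether the requested range would be served from cache, which a RecordReader could consult when reading its split.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical mock of the proposed open(Path, offset, length) API; the
// cached-or-not decision is a simple set lookup standing in for a real
// block-location query against the NameNode.
class CacheAwareFsSketch {
    // Result of an open(): a path handle plus a cache indicator.
    static class OpenResult {
        final String path;
        final long offset, length;
        final boolean cached;
        OpenResult(String path, long offset, long length, boolean cached) {
            this.path = path; this.offset = offset;
            this.length = length; this.cached = cached;
        }
    }

    private final Set<String> cachedPaths = new HashSet<>();

    void cache(String path) { cachedPaths.add(path); }

    OpenResult open(String path, long offset, long length) {
        return new OpenResult(path, offset, length, cachedPaths.contains(path));
    }
}
```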
[jira] [Commented] (HDFS-4994) Audit log getContentSummary() calls
[ https://issues.apache.org/jira/browse/HDFS-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735636#comment-13735636 ]

Hadoop QA commented on HDFS-4994:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12597201/HDFS-4994.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4798//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4798//console

This message is automatically generated.

> Audit log getContentSummary() calls
> -----------------------------------
>
>                 Key: HDFS-4994
>                 URL: https://issues.apache.org/jira/browse/HDFS-4994
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 0.23.9, 2.3.0
>            Reporter: Kihwal Lee
>            Assignee: Robert Parker
>            Priority: Minor
>              Labels: newbie
>         Attachments: HDFS-4994_branch-0.23.patch, HDFS-4994.patch
>
>
> Currently getContentSummary() calls are not logged anywhere. They should be logged in the audit log.
[jira] [Commented] (HDFS-4949) Centralized cache management in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735632#comment-13735632 ]

Arun C Murthy commented on HDFS-4949:
-------------------------------------

bq. As a meta-point, I think much of the remaining resource management design can wait until after we get the initial end-to-end implementation going.

Makes sense. I, for one, would volunteer to help you guys do resource management directly via YARN rather than go the route of reinventing half of the YARN RM within HDFS. It would benefit both HDFS (simpler, plus the ability to use memory dynamically between applications and for caching) and YARN (more robust for a diverse set of applications). Any takers? Thanks.
[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735594#comment-13735594 ]

Hadoop QA commented on HDFS-4504:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12597191/HDFS-4504.009.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4797//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4797//console

This message is automatically generated.

> DFSOutputStream#close doesn't always release resources (such as leases)
> ------------------------------------------------------------------------
>
>                 Key: HDFS-4504
>                 URL: https://issues.apache.org/jira/browse/HDFS-4504
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases. One example is if there is a pipeline error and then pipeline recovery fails. Unfortunately, in this case, some of the resources used by the {{DFSOutputStream}} are leaked. One particularly important resource is file leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many blocks to a file, but then fail to close it. Unfortunately, the {{LeaseRenewerThread}} inside the client will continue to renew the lease for the "undead" file. Future attempts to close the file will just rethrow the previous exception, and no progress can be made by the client.
[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode
[ https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735587#comment-13735587 ]

Colin Patrick McCabe commented on HDFS-5051:
--------------------------------------------

{code}
+  public static final String DFS_CACHEREPORT_INTERVAL_MSEC_KEY = "dfs.cachereport.intervalMsec";
+  public static final long DFS_CACHEREPORT_INTERVAL_MSEC_DEFAULT = 60 * 60 * 1000;
{code}

I'm not sure a cache report every hour is going to be ideal.

{code}
+    ArrayList blocksAsList = new ArrayList(blocks.length);
+    for (int i = 0; i < ...
{code}

> Propagate cache status information from the DataNode to the NameNode
> --------------------------------------------------------------------
>
>                 Key: HDFS-5051
>                 URL: https://issues.apache.org/jira/browse/HDFS-5051
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>            Reporter: Colin Patrick McCabe
>            Assignee: Andrew Wang
>         Attachments: hdfs-5051-1.patch
>
>
> The DataNode needs to inform the NameNode of its current cache state. Let's wire up the RPCs and stub out the relevant methods on the DN and NN side.
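For concreteness, here is a self-contained rendering of what the quoted patch excerpt appears to do. The loop body is truncated in the quote, so the `asList` helper below is an assumption reconstructed from the pre-sized `ArrayList(blocks.length)` on the line before it.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the quoted excerpt; everything past "i <" is lost in the quote,
// so the loop reconstruction here is an assumption.
class CacheReportSketch {
    static final String DFS_CACHEREPORT_INTERVAL_MSEC_KEY =
        "dfs.cachereport.intervalMsec";
    // 60 min * 60 s * 1000 ms = one report per hour by default.
    static final long DFS_CACHEREPORT_INTERVAL_MSEC_DEFAULT = 60 * 60 * 1000;

    // Copy a primitive block-ID array into a List, pre-sizing the list
    // to the array's length to avoid resizing.
    static List<Long> asList(long[] blocks) {
        List<Long> blocksAsList = new ArrayList<>(blocks.length);
        for (int i = 0; i < blocks.length; i++) {
            blocksAsList.add(blocks[i]);
        }
        return blocksAsList;
    }
}
```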
[jira] [Commented] (HDFS-5082) Move the version info of zookeeper test dependency to hadoop-project/pom
[ https://issues.apache.org/jira/browse/HDFS-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735495#comment-13735495 ]

Karthik Kambatla commented on HDFS-5082:
----------------------------------------

Didn't include any tests as this is just a pom change.

> Move the version info of zookeeper test dependency to hadoop-project/pom
> ------------------------------------------------------------------------
>
>                 Key: HDFS-5082
>                 URL: https://issues.apache.org/jira/browse/HDFS-5082
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 2.1.0-beta
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Minor
>         Attachments: hdfs-5082-1.patch
>
>
> As different projects (HDFS, YARN) depend on zookeeper, it is better to keep the version information in hadoop-project/pom.xml.
[jira] [Commented] (HDFS-5029) Token operations should not block read operations
[ https://issues.apache.org/jira/browse/HDFS-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735477#comment-13735477 ]

Hadoop QA commented on HDFS-5029:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12597172/HDFS-5029.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4796//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4796//console

This message is automatically generated.

> Token operations should not block read operations
> -------------------------------------------------
>
>                 Key: HDFS-5029
>                 URL: https://issues.apache.org/jira/browse/HDFS-5029
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HDFS-5029.branch-23.patch, HDFS-5029.patch, HDFS-5029.patch
>
>
> Token operations unnecessarily obtain the write lock on the namespace. Edits for token operations are independent of edits for other namespace write operations, and the edits have no ordering requirement with respect to namespace changes.
[jira] [Updated] (HDFS-4994) Audit log getContentSummary() calls
[ https://issues.apache.org/jira/browse/HDFS-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Parker updated HDFS-4994:
--------------------------------
            Assignee: Robert Parker
    Target Version/s: 3.0.0, 2.3.0, 0.23.10
              Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-4994) Audit log getContentSummary() calls
[ https://issues.apache.org/jira/browse/HDFS-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Parker updated HDFS-4994:
--------------------------------
    Attachment: HDFS-4994.patch
                HDFS-4994_branch-0.23.patch
[jira] [Updated] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-4504:
---------------------------------------
    Attachment: HDFS-4504.009.patch

In some cases, {{DFSOutputStream#lastException}} will be set by the {{DataStreamer}} prior to {{DFSOutputStream#close}} being called. In those cases, we need to throw an exception from close prior to clearing the exception.
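The close() contract described here can be illustrated with a plain-Java stand-in. This is a sketch of the idea only, not the actual DFSOutputStream code: a failure recorded asynchronously by a writer thread must be surfaced by close(), while the stream still tears itself down so that resources such as leases are not leaked and a later close() does not rethrow forever.

```java
import java.io.IOException;

// Sketch of the close-contract from the comment above (not DFSOutputStream):
// rethrow a failure recorded by a background writer, but still mark the
// stream closed so resources are released exactly once.
class StreamSketch {
    private IOException lastException; // set asynchronously by a writer thread
    private boolean closed;

    void recordFailure(IOException e) { lastException = e; }

    void close() throws IOException {
        if (closed) return;            // second close() is a no-op
        closed = true;                 // release resources even on failure
        if (lastException != null) {
            IOException e = lastException;
            lastException = null;      // clear only after deciding to throw
            throw e;
        }
    }

    boolean isClosed() { return closed; }
}
```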
[jira] [Commented] (HDFS-3245) Add metrics and web UI for cluster version summary
[ https://issues.apache.org/jira/browse/HDFS-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735387#comment-13735387 ]

Ravi Prakash commented on HDFS-3245:
------------------------------------

Is anyone planning to work on this? If not, I may take it up.

> Add metrics and web UI for cluster version summary
> --------------------------------------------------
>
>                 Key: HDFS-3245
>                 URL: https://issues.apache.org/jira/browse/HDFS-3245
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.0.0-alpha
>            Reporter: Todd Lipcon
>
>
> With the introduction of protocol compatibility, once HDFS-2983 is committed, we have the possibility that different nodes in a cluster are running different software versions. To aid operators, we should add the ability to summarize the status of versions in the cluster, so they can easily determine whether a rolling upgrade is in progress or if some nodes "missed" an upgrade (e.g. maybe they were out of service when the software was updated).
[jira] [Commented] (HDFS-4949) Centralized cache management in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735382#comment-13735382 ]

Suresh Srinivas commented on HDFS-4949:
---------------------------------------

bq. As a meta-point, I think much of the remaining resource management design can wait until after we get the initial end-to-end implementation going.

+1 for this. There are many loose ends to be tied and details to be figured out in the design. But the basic implementation could start right away. Some things that we should get to sooner than later:
- Pool abstraction and making sure all the APIs are using them (including cache creation and deletion)
- Some details related to how the stream-oriented APIs change to buffer-oriented access

The real quota management, counting common cached data against different pools, etc. can be revisited later. Will take a look at the updated doc soon. Thanks Andrew.
[jira] [Commented] (HDFS-4329) DFSShell issues with directories with spaces in name
[ https://issues.apache.org/jira/browse/HDFS-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735375#comment-13735375 ]

Hadoop QA commented on HDFS-4329:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12597153/4329.trunk.v2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:
    org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts
    org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4795//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/4795//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4795//console

This message is automatically generated.
> DFSShell issues with directories with spaces in name
> ----------------------------------------------------
>
>                 Key: HDFS-4329
>                 URL: https://issues.apache.org/jira/browse/HDFS-4329
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Andy Isaacson
>            Assignee: Cristina L. Abad
>         Attachments: 4329.branch-0.23.patch, 4329.branch-2.patch, 4329.trunk.patch, 4329.trunk.v2.patch
>
>
> This bug was discovered by Casey Ching.
> The command {{dfs -put /foo/hello.txt dir}} is supposed to create {{dir/hello.txt}} on HDFS. It doesn't work right if "dir" has a space in it:
> {code}
> [adi@haus01 ~]$ hdfs dfs -mkdir 'space cat'
> [adi@haus01 ~]$ hdfs dfs -put /etc/motd 'space cat'
> [adi@haus01 ~]$ hdfs dfs -cat 'space cat/motd'
> cat: `space cat/motd': No such file or directory
> [adi@haus01 ~]$ hdfs dfs -ls space\*
> Found 1 items
> -rw-r--r--   2 adi supergroup        251 2012-12-20 11:16 space%2520cat/motd
> [adi@haus01 ~]$ hdfs dfs -cat 'space%20cat/motd'
> Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-30-generic x86_64)
> ...
> {code}
> Note that the {{dfs -ls}} output wrongly encodes the wrongly encoded directory name, turning {{%20}} into {{%2520}}. It does the same thing with space:
> {code}
> [adi@haus01 ~]$ hdfs dfs -touchz 'space cat/foo'
> [adi@haus01 ~]$ hdfs dfs -ls 'space cat'
> Found 1 items
> -rw-r--r--   2 adi supergroup          0 2012-12-20 11:36 space%20cat/foo
> {code}
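The `%2520` in the listing above is the classic double-encoding symptom, reproducible with the JDK alone: percent-encode a name containing a space once and you get `space%20cat`; encode that result again and the `%` itself is escaped, giving `space%2520cat`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Reproduces the double-encoding pattern seen in the listing: encoding an
// already-encoded path component escapes the '%' signs a second time.
class DoubleEncodeDemo {
    static String encode(String s) {
        // URLEncoder form-encodes spaces as '+'; normalize to %20 to match
        // URI-style path encoding.
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }
}
```

Running `encode` twice on "space cat" yields exactly the mangled name the shell printed.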
[jira] [Commented] (HDFS-4680) Audit logging of delegation tokens for MR tracing
[ https://issues.apache.org/jira/browse/HDFS-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735331#comment-13735331 ]

Daryn Sharp commented on HDFS-4680:
-----------------------------------

I also noticed that ADTSM is taking the md5sum penalty for every token generation regardless of the conf setting.

> Audit logging of delegation tokens for MR tracing
> -------------------------------------------------
>
>                 Key: HDFS-4680
>                 URL: https://issues.apache.org/jira/browse/HDFS-4680
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, security
>    Affects Versions: 2.0.3-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: hdfs-4680-1.patch, hdfs-4680-2.patch, hdfs-4680-3.patch
>
>
> HDFS audit logging tracks HDFS operations made by different users, e.g. creation and deletion of files. This is useful for after-the-fact root cause analysis and security. However, logging merely the username is insufficient for many usecases. For instance, it is common for a single user to run multiple MapReduce jobs (I believe this is the case with Hive). In this scenario, given a particular audit log entry, it is difficult to trace it back to the MR job or task that generated that entry.
> I see a number of potential options for implementing this.
> 1. Make an optional "client name" field part of the NN RPC format. We already pass a {{clientName}} as a parameter in many RPC calls, so this would essentially make it standardized. MR tasks could then set this field to the job and task ID.
> 2. This could be generalized to a set of optional key-value *tags* in the NN RPC format, which would then be audit logged. This has standalone benefits outside of just verifying MR task ids.
> 3. Neither of the above two options actually securely verify that MR clients are who they claim they are. Doing this securely requires the JobTracker to sign MR task attempts, and then having the NN verify this signature. However, this is substantially more work, and could be built on after idea #2.
> Thoughts welcomed.
[jira] [Updated] (HDFS-4949) Centralized cache management in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-4949:
------------------------------
    Attachment: caching-design-doc-2013-08-09.pdf

Suresh, thanks for posting your notes. Attached is a revised design doc that beefs up the resource management / user quotas section, as well as addressing your other smaller points.

As a meta-point, I think much of the remaining resource management design can wait until after we get the initial end-to-end implementation going. I think it's reasonable for the first iteration to do something simple like "superuser only" or user quotas, then we layer on the complexities of pools, priorities, ACLs, min/max/share, and failure cases afterwards. It's good to get the API roughly right so we code with foresight, but I don't see us getting around to implementing pools for at least a month or two.
[jira] [Updated] (HDFS-5029) Token operations should not block read operations
[ https://issues.apache.org/jira/browse/HDFS-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp updated HDFS-5029:
------------------------------
    Attachment: HDFS-5029.patch

Resubmitting patch since the failures do not appear to be related; I cannot reproduce them.
[jira] [Created] (HDFS-5086) Support RPCSEC_GSS authentication in NFSv3 gateway
Brandon Li created HDFS-5086:
--------------------------------

             Summary: Support RPCSEC_GSS authentication in NFSv3 gateway
                 Key: HDFS-5086
                 URL: https://issues.apache.org/jira/browse/HDFS-5086
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: nfs
    Affects Versions: 3.0.0
            Reporter: Brandon Li
[jira] [Created] (HDFS-5085) Support Kerberos authentication in NFSv3 gateway
Brandon Li created HDFS-5085: Summary: Support Kerberos authentication in NFSv3 gateway Key: HDFS-5085 URL: https://issues.apache.org/jira/browse/HDFS-5085 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Affects Versions: 3.0.0 Reporter: Brandon Li -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-5084) Add namespace ID and snapshot ID into fileHandle to support Federation and Snapshot
Brandon Li created HDFS-5084: Summary: Add namespace ID and snapshot ID into fileHandle to support Federation and Snapshot Key: HDFS-5084 URL: https://issues.apache.org/jira/browse/HDFS-5084 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Affects Versions: 3.0.0 Reporter: Brandon Li -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4329) DFSShell issues with directories with spaces in name
[ https://issues.apache.org/jira/browse/HDFS-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735216#comment-13735216 ] Daryn Sharp commented on HDFS-4329: --- Looks good. Please add a few more tests to exercise all the path variants: relative, absolute, scheme-qualified. We've had problems with all three not always working correctly. * relative: "a path with/whitespaces in directories" * absolute - you did that * scheme-qualified: "NAMENODE/a path with/whitespaces in directories" Also test that all 3 cases work the same with globs, i.e. try to list "/a path*" > DFSShell issues with directories with spaces in name > > > Key: HDFS-4329 > URL: https://issues.apache.org/jira/browse/HDFS-4329 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Andy Isaacson >Assignee: Cristina L. Abad > Attachments: 4329.branch-0.23.patch, 4329.branch-2.patch, > 4329.trunk.patch, 4329.trunk.v2.patch > > > This bug was discovered by Casey Ching. > The command {{dfs -put /foo/hello.txt dir}} is supposed to create > {{dir/hello.txt}} on HDFS. It doesn't work right if "dir" has a space in it: > {code} > [adi@haus01 ~]$ hdfs dfs -mkdir 'space cat' > [adi@haus01 ~]$ hdfs dfs -put /etc/motd 'space cat' > [adi@haus01 ~]$ hdfs dfs -cat 'space cat/motd' > cat: `space cat/motd': No such file or directory > [adi@haus01 ~]$ hdfs dfs -ls space\* > Found 1 items > -rw-r--r-- 2 adi supergroup 251 2012-12-20 11:16 space%2520cat/motd > [adi@haus01 ~]$ hdfs dfs -cat 'space%20cat/motd' > Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-30-generic x86_64) > ... > {code} > Note that the {{dfs -ls}} output wrongly encodes the wrongly encoded > directory name, turning {{%20}} into {{%2520}}. 
It does the same thing with > space: > {code} > [adi@haus01 ~]$ hdfs dfs -touchz 'space cat/foo' > [adi@haus01 ~]$ hdfs dfs -ls 'space cat' > Found 1 items > -rw-r--r-- 2 adi supergroup 0 2012-12-20 11:36 space%20cat/foo > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
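The %20-to-%2520 symptom above is classic double percent-encoding: a path that is already encoded gets encoded a second time, so the '%' of %20 is itself escaped. A self-contained java.net.URI illustration (not DFSShell code — the multi-argument URI constructors always quote '%'):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class DoubleEncodeDemo {
    public static void main(String[] args) throws URISyntaxException {
        // First encoding: the multi-argument URI constructors quote the space as %20.
        URI once = new URI(null, null, "/space cat/motd", null);
        System.out.println(once); // /space%20cat/motd
        // Encoding the already-encoded string again quotes the '%' itself,
        // producing the %2520 seen in the ls output above.
        URI twice = new URI(null, null, once.toString(), null);
        System.out.println(twice); // /space%2520cat/motd
    }
}
```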
[jira] [Commented] (HDFS-5080) BootstrapStandby not working with QJM when the existing NN is active
[ https://issues.apache.org/jira/browse/HDFS-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735195#comment-13735195 ] Hadoop QA commented on HDFS-5080: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597118/HDFS-5080.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4794//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/4794//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4794//console This message is automatically generated. 
> BootstrapStandby not working with QJM when the existing NN is active > > > Key: HDFS-5080 > URL: https://issues.apache.org/jira/browse/HDFS-5080 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5080.000.patch > > > Currently when QJM is used, running BootstrapStandby while the existing NN is > active can get the following exception: > {code} > FATAL ha.BootstrapStandby: Unable to read transaction ids 6175397-6175405 > from the configured shared edits storage. Please copy these logs into the > shared edits storage or call saveNamespace on the active node. > Error: Gap in transactions. Expected to be able to read up until at least > txid 6175405 but unable to find any edit logs containing txid 6175405 > java.io.IOException: Gap in transactions. Expected to be able to read up > until at least txid 6175405 but unable to find any edit logs containing txid > 6175405 > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1300) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1258) > at > org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.checkLogsAvailableForRead(BootstrapStandby.java:229) > {code} > Looks like the cause of the exception is that, when the active NN is queried > by BootstrapStandby about the last written transaction ID, the in-progress > edit log segment is included. However, when the journal nodes are asked about the > last written transaction ID, the in-progress edit log is excluded. This causes > BootstrapStandby#checkLogsAvailableForRead to complain about gaps. > To fix this, we can either let the journal nodes take into account the > in-progress edit log, or let the active NN exclude the in-progress edit log > segment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
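The mismatch described above can be reduced to a toy check. The numbers are taken from the stack trace in the issue; the logic and names are illustrative, not the real FSEditLog/BootstrapStandby code:

```java
// Self-contained illustration of the txid mismatch behind the reported gap.
public class GapCheckDemo {
    // Mirrors the spirit of the gap check: every txid up to lastRequiredTxId
    // must be readable from finalized edit log segments.
    static boolean logsAvailable(long lastRequiredTxId, long lastFinalizedTxId) {
        return lastRequiredTxId <= lastFinalizedTxId;
    }

    public static void main(String[] args) {
        long lastFinalized = 6175396L; // journal nodes exclude the in-progress segment
        long nnReported = 6175405L;    // active NN includes the in-progress segment
        // The asymmetry makes the check see a gap:
        System.out.println(logsAvailable(nnReported, lastFinalized));    // false
        // Either proposed fix aligns the two sides, and the check passes:
        System.out.println(logsAvailable(lastFinalized, lastFinalized)); // true
    }
}
```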
[jira] [Commented] (HDFS-4993) fsck can fail if a file is renamed or deleted
[ https://issues.apache.org/jira/browse/HDFS-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735189#comment-13735189 ] Hudson commented on HDFS-4993: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4237 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4237/]) HDFS-4993. Fsck can fail if a file is renamed or deleted. Contributed by Robert Parker. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1512451) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java > fsck can fail if a file is renamed or deleted > - > > Key: HDFS-4993 > URL: https://issues.apache.org/jira/browse/HDFS-4993 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9 >Reporter: Kihwal Lee >Assignee: Robert Parker > Fix For: 3.0.0, 0.23.10, 2.1.1-beta > > Attachments: HDFS-4993-branch_0.23.patch, HDFS-4993.patch > > > In NamenodeFsck#check(), the getListing() and getBlockLocations() are not > synchronized, so the file deletions or renames at the right moment can cause > FileNotFoundException and failure of fsck. > Instead of failing, fsck should continue. Optionally it can record file > system modifications it encountered, but since most modifications during fsck > are not detected, there might be little value in recording these specifically. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4329) DFSShell issues with directories with spaces in name
[ https://issues.apache.org/jira/browse/HDFS-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cristina L. Abad updated HDFS-4329: --- Affects Version/s: (was: 2.0.2-alpha) 3.0.0 Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-4329) DFSShell issues with directories with spaces in name
[ https://issues.apache.org/jira/browse/HDFS-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cristina L. Abad updated HDFS-4329: --- Attachment: (was: 4329.trunk_v2.patch)
[jira] [Updated] (HDFS-4329) DFSShell issues with directories with spaces in name
[ https://issues.apache.org/jira/browse/HDFS-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cristina L. Abad updated HDFS-4329: --- Attachment: 4329.trunk.v2.patch
[jira] [Updated] (HDFS-4329) DFSShell issues with directories with spaces in name
[ https://issues.apache.org/jira/browse/HDFS-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cristina L. Abad updated HDFS-4329: --- Attachment: (was: 4329.trunk.patch2)
[jira] [Updated] (HDFS-4329) DFSShell issues with directories with spaces in name
[ https://issues.apache.org/jira/browse/HDFS-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cristina L. Abad updated HDFS-4329: --- Attachment: 4329.trunk.patch2 4329.branch-2.patch It's been a month since I posted this patch, so I checked to make sure that it has not gone stale. The 23 patch is fine, and tests are still passing. The trunk patch still works too, but I am re-posting it because the previous one had paths to the files that were specific to my own installation; the new patch does not have this issue. I am also posting a patch for branch 2, which is a copy of the trunk patch; tests pass on branch 2 too.
[jira] [Updated] (HDFS-4329) DFSShell issues with directories with spaces in name
[ https://issues.apache.org/jira/browse/HDFS-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cristina L. Abad updated HDFS-4329: --- Attachment: 4329.trunk_v2.patch
[jira] [Updated] (HDFS-4993) fsck can fail if a file is renamed or deleted
[ https://issues.apache.org/jira/browse/HDFS-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-4993: - Resolution: Fixed Fix Version/s: 2.1.1-beta 0.23.10 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the patch, Rob. I've committed this to branch-0.23, branch-2, branch-2.1-beta and trunk.
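The shape of the committed fix — catching FileNotFoundException per file and continuing, rather than letting a concurrent rename or delete abort the whole fsck — can be sketched independently of NamenodeFsck. The types and names below are hypothetical stand-ins, not the real HDFS code:

```java
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FsckSkipDemo {
    // "/b" was deleted or renamed after the directory listing was taken.
    static final Set<String> namespace = new HashSet<>(Arrays.asList("/a", "/c"));

    // Stand-in for getBlockLocations(): throws if the file vanished after listing.
    static String blockLocations(String path) throws FileNotFoundException {
        if (!namespace.contains(path)) throw new FileNotFoundException(path);
        return path + ":blk_0001";
    }

    static String runCheck(List<String> listing) {
        int checked = 0, skipped = 0;
        for (String p : listing) {
            try {
                blockLocations(p);
                checked++;
            } catch (FileNotFoundException e) {
                skipped++; // record and continue instead of failing the whole fsck
            }
        }
        return checked + " checked, " + skipped + " skipped";
    }

    public static void main(String[] args) {
        System.out.println(runCheck(Arrays.asList("/a", "/b", "/c"))); // 2 checked, 1 skipped
    }
}
```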
[jira] [Commented] (HDFS-5082) Move the version info of zookeeper test dependency to hadoop-project/pom
[ https://issues.apache.org/jira/browse/HDFS-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735158#comment-13735158 ] Hadoop QA commented on HDFS-5082: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597115/hdfs-5082-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4793//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4793//console This message is automatically generated. 
> Move the version info of zookeeper test dependency to hadoop-project/pom > > > Key: HDFS-5082 > URL: https://issues.apache.org/jira/browse/HDFS-5082 > Project: Hadoop HDFS > Issue Type: Bug > Components: build >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Minor > Attachments: hdfs-5082-1.patch > > > As different projects (HDFS, YARN) depend on zookeeper, it is better to keep > the version information in hadoop-project/pom.xml. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
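Centralizing the version in hadoop-project/pom.xml typically means a dependencyManagement entry like the sketch below (the version number shown is illustrative, not taken from the patch); child modules such as hadoop-hdfs then declare the dependency without a version element:

```xml
<!-- hadoop-project/pom.xml (sketch): pin the zookeeper test version once -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.zookeeper</groupId>
      <artifactId>zookeeper</artifactId>
      <version>3.4.2</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```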
[jira] [Commented] (HDFS-4993) fsck can fail if a file is renamed or deleted
[ https://issues.apache.org/jira/browse/HDFS-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735156#comment-13735156 ] Kihwal Lee commented on HDFS-4993: -- +1 the patch looks good to me.
[jira] [Assigned] (HDFS-5083) Update the HDFS compatibility version range
[ https://issues.apache.org/jira/browse/HDFS-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee reassigned HDFS-5083: Assignee: Kihwal Lee > Update the HDFS compatibility version range > --- > > Key: HDFS-5083 > URL: https://issues.apache.org/jira/browse/HDFS-5083 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta, 2.3.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Blocker > Fix For: 2.1.0-beta > > Attachments: HDFS-5083.branch-2.x.patch > > > Since we have made incompatible changes including RPCv9, the following needs > to be updated. For branch-2 and branch-2.*, I think it needs to be set to > 2.1.0. > DFS_NAMENODE_MIN_SUPPORTED_DATANODE_VERSION_DEFAULT > DFS_DATANODE_MIN_SUPPORTED_NAMENODE_VERSION_DEFAULT > The rpc change will make dn registration fail, so it won't get to the actual > version check, but this still needs to be set correctly, since we are aiming > 2.1.0 to be api/protocol stable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-5083) Update the HDFS compatibility version range
[ https://issues.apache.org/jira/browse/HDFS-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee resolved HDFS-5083. -- Resolution: Fixed Fix Version/s: 2.1.0-beta Hadoop Flags: Incompatible change,Reviewed Thanks for the review, Suresh. I've committed this to branch-2, branch-2.1-beta and branch-2.1.0-beta.
[jira] [Commented] (HDFS-5068) Convert NNThroughputBenchmark to a Tool to allow generic options.
[ https://issues.apache.org/jira/browse/HDFS-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735115#comment-13735115 ] Ravi Prakash commented on HDFS-5068: +1. LGTM. Thanks Konstantin! Just fyi, for other people trying this test, I had to set dfs.namenode.fs-limits.min-block-size to 16 in my hdfs-site.xml because BLOCK_SIZE = 16; in the code, otherwise the test would hang indefinitely. > Convert NNThroughputBenchmark to a Tool to allow generic options. > - > > Key: HDFS-5068 > URL: https://issues.apache.org/jira/browse/HDFS-5068 > Project: Hadoop HDFS > Issue Type: Improvement > Components: benchmarks >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko > Attachments: NNThBenchTool.patch > > > Currently NNThroughputBenchmark does not recognize generic options like > -conf, etc. A simple way to enable such functionality is to make it implement > Tool interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
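The conversion discussed above follows the standard Hadoop Tool pattern. Below is a self-contained sketch with the Tool interface stubbed in so it compiles without Hadoop on the classpath; the real interface is org.apache.hadoop.util.Tool, and org.apache.hadoop.util.ToolRunner parses the generic options (-conf, -D, -fs, ...) and applies them to the Configuration before invoking run():

```java
// Stub of org.apache.hadoop.util.Tool, for illustration only.
interface Tool {
    int run(String[] args);
}

public class NNThroughputBenchmarkSketch implements Tool {
    @Override
    public int run(String[] args) {
        // Benchmark body goes here; by the time run() is called, ToolRunner has
        // already stripped the generic options from args.
        System.out.println("remaining args: " + args.length);
        return 0;
    }

    public static void main(String[] args) {
        // With Hadoop this would be something like:
        //   System.exit(ToolRunner.run(new Configuration(), new NNThroughputBenchmark(), args));
        System.exit(new NNThroughputBenchmarkSketch().run(args));
    }
}
```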
[jira] [Commented] (HDFS-4993) fsck can fail if a file is renamed or deleted
[ https://issues.apache.org/jira/browse/HDFS-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735109#comment-13735109 ] Hadoop QA commented on HDFS-4993: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597110/HDFS-4993.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4792//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4792//console This message is automatically generated.
[jira] [Commented] (HDFS-4680) Audit logging of delegation tokens for MR tracing
[ https://issues.apache.org/jira/browse/HDFS-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735105#comment-13735105 ] Daryn Sharp commented on HDFS-4680: --- I made the mistake of looking at the raw patch instead of applying it. With the way you've done it, I think we may be able to simplify it. The instanceof for the default audit logger seems like it can/should be avoided. It appears you did this in part to avoid the performance hit of looking up the token identifier and its tracking id every time you log a message. We should probably think of a way to avoid that. Off the top of my head, conceptually it would be ideal if the connection knew the trackingId, and the audit logger would simply log it if not null. I'll think about it more today since I'm trying to contemplate how a forward lookup would be an easy drop-in in the future and if there would be any rolling upgrade issues. > Audit logging of delegation tokens for MR tracing > - > > Key: HDFS-4680 > URL: https://issues.apache.org/jira/browse/HDFS-4680 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, security >Affects Versions: 2.0.3-alpha >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hdfs-4680-1.patch, hdfs-4680-2.patch, hdfs-4680-3.patch > > > HDFS audit logging tracks HDFS operations made by different users, e.g. > creation and deletion of files. This is useful for after-the-fact root cause > analysis and security. However, logging merely the username is insufficient > for many usecases. For instance, it is common for a single user to run > multiple MapReduce jobs (I believe this is the case with Hive). In this > scenario, given a particular audit log entry, it is difficult to trace it > back to the MR job or task that generated that entry. > I see a number of potential options for implementing this. > 1. Make an optional "client name" field part of the NN RPC format. 
We already > pass a {{clientName}} as a parameter in many RPC calls, so this would > essentially make it standardized. MR tasks could then set this field to the > job and task ID. > 2. This could be generalized to a set of optional key-value *tags* in the NN > RPC format, which would then be audit logged. This has standalone benefits > outside of just verifying MR task ids. > 3. Neither of the above two options actually securely verify that MR clients > are who they claim they are. Doing this securely requires the JobTracker to > sign MR task attempts, and then having the NN verify this signature. However, > this is substantially more work, and could be built on after idea #2. > Thoughts welcomed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
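Daryn's suggestion above — the connection knows the trackingId, and the audit logger simply appends it when non-null, with no `instanceof` check on the logger and no per-message token lookup — can be sketched as follows. The class and parameter names here are illustrative only, not Hadoop's actual audit logger.

```java
public class AuditLogSketch {
    // Format one audit entry; trackingId is resolved once per connection
    // (e.g. from the delegation token at connection setup) and may be null.
    public static String format(String user, String cmd, String src,
                                String trackingId) {
        StringBuilder sb = new StringBuilder();
        sb.append("ugi=").append(user)
          .append("\tcmd=").append(cmd)
          .append("\tsrc=").append(src);
        if (trackingId != null) {
            // No instanceof on the logger, no md5 computed per log call.
            sb.append("\ttrackingId=").append(trackingId);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("alice", "open", "/data/f", "ab12cd"));
        System.out.println(format("alice", "open", "/data/f", null));
    }
}
```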
[jira] [Updated] (HDFS-5049) Add JNI mlock support
[ https://issues.apache.org/jira/browse/HDFS-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5049: --- Resolution: Fixed Target Version/s: HDFS-4949 Status: Resolved (was: Patch Available) > Add JNI mlock support > - > > Key: HDFS-5049 > URL: https://issues.apache.org/jira/browse/HDFS-5049 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Colin Patrick McCabe >Assignee: Andrew Wang > Attachments: hdfs-5049-1.patch > > > Add support for {{mlock}} and {{munlock}}, for use in caching. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5083) Update the HDFS compatibility version range
[ https://issues.apache.org/jira/browse/HDFS-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735095#comment-13735095 ] Suresh Srinivas commented on HDFS-5083: --- +1 for the change. > Update the HDFS compatibility version range > --- > > Key: HDFS-5083 > URL: https://issues.apache.org/jira/browse/HDFS-5083 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta, 2.3.0 >Reporter: Kihwal Lee >Priority: Blocker > Attachments: HDFS-5083.branch-2.x.patch > > > Since we have made incompatible changes including RPCv9, the following needs > to be updated. For branch-2 and branch-2.*, I think it needs to be set to > 2.1.0. > DFS_NAMENODE_MIN_SUPPORTED_DATANODE_VERSION_DEFAULT > DFS_DATANODE_MIN_SUPPORTED_NAMENODE_VERSION_DEFAULT > The rpc change will make dn registration fail, so it won't get to the actual > version check, but this still needs to be set correctly, since we are aiming > 2.1.0 to be api/protocol stable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735085#comment-13735085 ] Colin Patrick McCabe commented on HDFS-4504: The TestBalancerWithNodeGroup test timeout is HDFS-4376. > DFSOutputStream#close doesn't always release resources (such as leases) > --- > > Key: HDFS-4504 > URL: https://issues.apache.org/jira/browse/HDFS-4504 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, > HDFS-4504.007.patch, HDFS-4504.008.patch > > > {{DFSOutputStream#close}} can throw an {{IOException}} in some cases. One > example is if there is a pipeline error and then pipeline recovery fails. > Unfortunately, in this case, some of the resources used by the > {{DFSOutputStream}} are leaked. One particularly important resource is file > leases. > So it's possible for a long-lived HDFS client, such as Flume, to write many > blocks to a file, but then fail to close it. Unfortunately, the > {{LeaseRenewerThread}} inside the client will continue to renew the lease for > the "undead" file. Future attempts to close the file will just rethrow the > previous exception, and no progress can be made by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
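The leak described above can be illustrated with a toy stream: if `close()` throws before the lease bookkeeping runs, the `LeaseRenewer` keeps renewing the "undead" file forever. Releasing in a `finally` block is one way to guarantee cleanup even when pipeline recovery fails. This is a hypothetical sketch of the failure mode and the pattern, not the actual `DFSOutputStream` fix in the attached patches.

```java
import java.io.Closeable;
import java.io.IOException;

public class LeaseSafeStream implements Closeable {
    private boolean leaseHeld = true;
    private final boolean pipelineBroken;

    public LeaseSafeStream(boolean pipelineBroken) {
        this.pipelineBroken = pipelineBroken;
    }

    @Override
    public void close() throws IOException {
        try {
            if (pipelineBroken) {
                // Simulates pipeline recovery failing during close().
                throw new IOException("pipeline recovery failed");
            }
        } finally {
            // Always stop renewing the lease, even when close() throws;
            // otherwise retries rethrow and the lease leaks.
            leaseHeld = false;
        }
    }

    public boolean leaseHeld() { return leaseHeld; }

    public static void main(String[] args) {
        LeaseSafeStream s = new LeaseSafeStream(true);
        try { s.close(); } catch (IOException expected) { }
        System.out.println(s.leaseHeld()); // false: lease released despite error
    }
}
```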
[jira] [Commented] (HDFS-5049) Add JNI mlock support
[ https://issues.apache.org/jira/browse/HDFS-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735086#comment-13735086 ] Colin Patrick McCabe commented on HDFS-5049: Since you did not touch the relevant code, my guess is that trunk fixed some findbugs warnings, and the change didn't get ported to our branch. > Add JNI mlock support > - > > Key: HDFS-5049 > URL: https://issues.apache.org/jira/browse/HDFS-5049 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Colin Patrick McCabe >Assignee: Andrew Wang > Attachments: hdfs-5049-1.patch > > > Add support for {{mlock}} and {{munlock}}, for use in caching. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4680) Audit logging of delegation tokens for MR tracing
[ https://issues.apache.org/jira/browse/HDFS-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735062#comment-13735062 ] Andrew Wang commented on HDFS-4680: --- I did that based on your previous review feedback: bq. It's costly to compute the md5sum for every single client connection. Store it in the DelegationTokenInformation when the token is created and query the dtsm during logging. Do you want to change this up? > Audit logging of delegation tokens for MR tracing > - > > Key: HDFS-4680 > URL: https://issues.apache.org/jira/browse/HDFS-4680 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, security >Affects Versions: 2.0.3-alpha >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hdfs-4680-1.patch, hdfs-4680-2.patch, hdfs-4680-3.patch > > > HDFS audit logging tracks HDFS operations made by different users, e.g. > creation and deletion of files. This is useful for after-the-fact root cause > analysis and security. However, logging merely the username is insufficient > for many usecases. For instance, it is common for a single user to run > multiple MapReduce jobs (I believe this is the case with Hive). In this > scenario, given a particular audit log entry, it is difficult to trace it > back to the MR job or task that generated that entry. > I see a number of potential options for implementing this. > 1. Make an optional "client name" field part of the NN RPC format. We already > pass a {{clientName}} as a parameter in many RPC calls, so this would > essentially make it standardized. MR tasks could then set this field to the > job and task ID. > 2. This could be generalized to a set of optional key-value *tags* in the NN > RPC format, which would then be audit logged. This has standalone benefits > outside of just verifying MR task ids. > 3. Neither of the above two options actually securely verify that MR clients > are who they claim they are. 
Doing this securely requires the JobTracker to > sign MR task attempts, and then having the NN verify this signature. However, > this is substantially more work, and could be built on after idea #2. > Thoughts welcomed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5080) BootstrapStandby not working with QJM when the existing NN is active
[ https://issues.apache.org/jira/browse/HDFS-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5080: Status: Patch Available (was: Open) > BootstrapStandby not working with QJM when the existing NN is active > > > Key: HDFS-5080 > URL: https://issues.apache.org/jira/browse/HDFS-5080 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5080.000.patch > > > Currently when QJM is used, running BootstrapStandby while the existing NN is > active can get the following exception: > {code} > FATAL ha.BootstrapStandby: Unable to read transaction ids 6175397-6175405 > from the configured shared edits storage. Please copy these logs into the > shared edits storage or call saveNamespace on the active node. > Error: Gap in transactions. Expected to be able to read up until at least > txid 6175405 but unable to find any edit logs containing txid 6175405 > java.io.IOException: Gap in transactions. Expected to be able to read up > until at least txid 6175405 but unable to find any edit logs containing txid > 6175405 > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1300) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1258) > at > org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.checkLogsAvailableForRead(BootstrapStandby.java:229) > {code} > Looks like the cause of the exception is that, when the active NN is queries > by BootstrapStandby about the last written transaction ID, the in-progress > edit log segment is included. However, when journal nodes are asked about the > last written transaction ID, in-progress edit log is excluded. This causes > BootstrapStandby#checkLogsAvailableForRead to complain gaps. > To fix this, we can either let journal nodes take into account the > in-progress editlog, or let active NN exclude the in-progress edit log > segment. 
[jira] [Updated] (HDFS-5083) Update the HDFS compatibility version range
[ https://issues.apache.org/jira/browse/HDFS-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5083: - Attachment: HDFS-5083.branch-2.x.patch The patch is for branch-2, branch-2.1-beta and branch-2.1.0-beta. Trunk should still have 3.0.0-SNAPSHOT. > Update the HDFS compatibility version range > --- > > Key: HDFS-5083 > URL: https://issues.apache.org/jira/browse/HDFS-5083 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta, 2.3.0 >Reporter: Kihwal Lee >Priority: Blocker > Attachments: HDFS-5083.branch-2.x.patch > > > Since we have made incompatible changes including RPCv9, the following needs > to be updated. For branch-2 and branch-2.*, I think it needs to be set to > 2.1.0. > DFS_NAMENODE_MIN_SUPPORTED_DATANODE_VERSION_DEFAULT > DFS_DATANODE_MIN_SUPPORTED_NAMENODE_VERSION_DEFAULT > The rpc change will make dn registration fail, so it won't get to the actual > version check, but this still needs to be set correctly, since we are aiming > 2.1.0 to be api/protocol stable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4680) Audit logging of delegation tokens for MR tracing
[ https://issues.apache.org/jira/browse/HDFS-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735040#comment-13735040 ] Daryn Sharp commented on HDFS-4680: --- I just glanced over it. Question: if the trackingId is in the token, why is it extracted and stored in the DelegationTokenInformation? Currently that object tracks info about token idents that isn't in the token ident itself. I need to look at the linked jiras to get the big picture. > Audit logging of delegation tokens for MR tracing > - > > Key: HDFS-4680 > URL: https://issues.apache.org/jira/browse/HDFS-4680 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, security >Affects Versions: 2.0.3-alpha >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hdfs-4680-1.patch, hdfs-4680-2.patch, hdfs-4680-3.patch > > > HDFS audit logging tracks HDFS operations made by different users, e.g. > creation and deletion of files. This is useful for after-the-fact root cause > analysis and security. However, logging merely the username is insufficient > for many usecases. For instance, it is common for a single user to run > multiple MapReduce jobs (I believe this is the case with Hive). In this > scenario, given a particular audit log entry, it is difficult to trace it > back to the MR job or task that generated that entry. > I see a number of potential options for implementing this. > 1. Make an optional "client name" field part of the NN RPC format. We already > pass a {{clientName}} as a parameter in many RPC calls, so this would > essentially make it standardized. MR tasks could then set this field to the > job and task ID. > 2. This could be generalized to a set of optional key-value *tags* in the NN > RPC format, which would then be audit logged. This has standalone benefits > outside of just verifying MR task ids. > 3. Neither of the above two options actually securely verify that MR clients > are who they claim they are. 
Doing this securely requires the JobTracker to > sign MR task attempts, and then having the NN verify this signature. However, > this is substantially more work, and could be built on after idea #2. > Thoughts welcomed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5080) BootstrapStandby not working with QJM when the existing NN is active
[ https://issues.apache.org/jira/browse/HDFS-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5080: Attachment: HDFS-5080.000.patch Initial patch for review. In the patch the journal node can include the highest txid of the in-progress edit log in the edit manifest, if the corresponding request is not for loading/reading editlog. The in-progress editlog segment will still be ignored if forReading is true. > BootstrapStandby not working with QJM when the existing NN is active > > > Key: HDFS-5080 > URL: https://issues.apache.org/jira/browse/HDFS-5080 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5080.000.patch > > > Currently when QJM is used, running BootstrapStandby while the existing NN is > active can get the following exception: > {code} > FATAL ha.BootstrapStandby: Unable to read transaction ids 6175397-6175405 > from the configured shared edits storage. Please copy these logs into the > shared edits storage or call saveNamespace on the active node. > Error: Gap in transactions. Expected to be able to read up until at least > txid 6175405 but unable to find any edit logs containing txid 6175405 > java.io.IOException: Gap in transactions. Expected to be able to read up > until at least txid 6175405 but unable to find any edit logs containing txid > 6175405 > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1300) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1258) > at > org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.checkLogsAvailableForRead(BootstrapStandby.java:229) > {code} > Looks like the cause of the exception is that, when the active NN is queries > by BootstrapStandby about the last written transaction ID, the in-progress > edit log segment is included. 
However, when the journal nodes are asked about the > last written transaction ID, the in-progress edit log is excluded. This causes > BootstrapStandby#checkLogsAvailableForRead to complain about gaps. > To fix this, we can either let the journal nodes take the > in-progress edit log into account, or let the active NN exclude the in-progress edit log > segment. -- This message is automatically generated by JIRA.
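One way to picture the fix direction described in the patch comment (report the in-progress segment's highest txid only when the manifest is not going to be used for reading) is the sketch below, reusing the txids from the error message. All names here are illustrative; this is not the actual JournalNode or FSEditLog code.

```java
import java.util.Arrays;
import java.util.List;

public class ManifestSketch {
    public static class Segment {
        public final long firstTxId, lastTxId;
        public final boolean inProgress;
        public Segment(long first, long last, boolean inProgress) {
            this.firstTxId = first;
            this.lastTxId = last;
            this.inProgress = inProgress;
        }
    }

    // Highest txid visible in the manifest. When forReading is true, the
    // unfinalized segment is skipped, so both sides agree on what is readable.
    public static long highestTxId(List<Segment> segments, boolean forReading) {
        long max = -1;
        for (Segment s : segments) {
            if (s.inProgress && forReading) {
                continue; // readers must not see the in-progress segment
            }
            max = Math.max(max, s.lastTxId);
        }
        return max;
    }

    public static void main(String[] args) {
        List<Segment> segs = Arrays.asList(
            new Segment(1, 6175396, false),
            new Segment(6175397, 6175405, true));
        System.out.println(highestTxId(segs, true));  // 6175396
        System.out.println(highestTxId(segs, false)); // 6175405
    }
}
```

The reported gap arises precisely when one side answers with `forReading=false` semantics (6175405) while the other answers with `forReading=true` semantics (6175396).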
[jira] [Commented] (HDFS-4754) Add an API in the namenode to mark a datanode as stale
[ https://issues.apache.org/jira/browse/HDFS-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735026#comment-13735026 ] Nicolas Liochon commented on HDFS-4754: --- Nick: I agree. I will provide a version with these points fixed. Others: any comments, or can it be committed with Nick's comments taken into account? > Add an API in the namenode to mark a datanode as stale > -- > > Key: HDFS-4754 > URL: https://issues.apache.org/jira/browse/HDFS-4754 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, namenode >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Critical > Fix For: 3.0.0, 2.1.1-beta > > Attachments: 4754.v1.patch, 4754.v2.patch, 4754.v4.patch, > 4754.v4.patch > > > Stale datanode detection has existed in HDFS since HDFS-3703, with a > timeout defaulted to 30s. > There are two reasons to add an API to mark a node as stale even if the > timeout has not yet been reached: > 1) ZooKeeper can detect that a client is dead at any moment. So, for HBase, > we sometimes start the recovery before a node is marked stale (even with > reasonable settings such as: stale: 20s; HBase ZK timeout: 30s). > 2) Some third parties could detect that a node is dead before the timeout, > hence saving us the cost of retrying. An example of such hardware is Arista, > presented here by [~tsuna] > http://tsunanet.net/~tsuna/fsf-hbase-meetup-april13.pdf, and confirmed in > HBASE-6290. > As usual, even if the node is dead it can come back before the 10-minute > limit. So I would propose to set a time bound. The API would be > namenode.markStale(String ipAddress, int port, long durationInMs); > After durationInMs, the namenode would again rely only on its heartbeat to > decide. > Thoughts? > If there are no objections, and if nobody on the hdfs dev team has the time to > spend on it, I will give it a try for branches 2 & 3. -- This message is automatically generated by JIRA.
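The proposed markStale(ipAddress, port, durationInMs) semantics — an explicit, time-bounded stale mark, after which the namenode falls back to its normal heartbeat-based detection — could look roughly like this. This is a hypothetical sketch of the proposal only, not an actual namenode API; time is passed in explicitly to keep the example deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StaleMarks {
    // "ip:port" -> wall-clock time (ms) at which the explicit mark expires.
    private final Map<String, Long> expiry = new ConcurrentHashMap<>();

    // Proposed API: mark the node stale for durationMs, regardless of
    // whether the heartbeat-based stale timeout has been reached.
    public void markStale(String ip, int port, long durationMs, long nowMs) {
        expiry.put(ip + ":" + port, nowMs + durationMs);
    }

    // True only while the explicit mark is in force; afterwards the
    // caller would fall back to heartbeat-based staleness.
    public boolean isMarkedStale(String ip, int port, long nowMs) {
        Long until = expiry.get(ip + ":" + port);
        return until != null && nowMs < until;
    }

    public static void main(String[] args) {
        StaleMarks m = new StaleMarks();
        m.markStale("10.0.0.1", 50010, 20_000, 0);
        System.out.println(m.isMarkedStale("10.0.0.1", 50010, 10_000)); // true
        System.out.println(m.isMarkedStale("10.0.0.1", 50010, 30_000)); // false
    }
}
```

The time bound addresses the "node can come back" concern: once the mark expires, only heartbeats decide.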
[jira] [Commented] (HDFS-4680) Audit logging of delegation tokens for MR tracing
[ https://issues.apache.org/jira/browse/HDFS-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735022#comment-13735022 ] Daryn Sharp commented on HDFS-4680: --- I'll try to review this afternoon. > Audit logging of delegation tokens for MR tracing > - > > Key: HDFS-4680 > URL: https://issues.apache.org/jira/browse/HDFS-4680 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, security >Affects Versions: 2.0.3-alpha >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hdfs-4680-1.patch, hdfs-4680-2.patch, hdfs-4680-3.patch > > > HDFS audit logging tracks HDFS operations made by different users, e.g. > creation and deletion of files. This is useful for after-the-fact root cause > analysis and security. However, logging merely the username is insufficient > for many usecases. For instance, it is common for a single user to run > multiple MapReduce jobs (I believe this is the case with Hive). In this > scenario, given a particular audit log entry, it is difficult to trace it > back to the MR job or task that generated that entry. > I see a number of potential options for implementing this. > 1. Make an optional "client name" field part of the NN RPC format. We already > pass a {{clientName}} as a parameter in many RPC calls, so this would > essentially make it standardized. MR tasks could then set this field to the > job and task ID. > 2. This could be generalized to a set of optional key-value *tags* in the NN > RPC format, which would then be audit logged. This has standalone benefits > outside of just verifying MR task ids. > 3. Neither of the above two options actually securely verify that MR clients > are who they claim they are. Doing this securely requires the JobTracker to > sign MR task attempts, and then having the NN verify this signature. However, > this is substantially more work, and could be built on after idea #2. > Thoughts welcomed. -- This message is automatically generated by JIRA. 
[jira] [Updated] (HDFS-5082) Move the version info of zookeeper test dependency to hadoop-project/pom
[ https://issues.apache.org/jira/browse/HDFS-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated HDFS-5082: --- Status: Patch Available (was: Open) > Move the version info of zookeeper test dependency to hadoop-project/pom > > > Key: HDFS-5082 > URL: https://issues.apache.org/jira/browse/HDFS-5082 > Project: Hadoop HDFS > Issue Type: Bug > Components: build >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Minor > Attachments: hdfs-5082-1.patch > > > As different projects (HDFS, YARN) depend on zookeeper, it is better to keep > the version information in hadoop-project/pom.xml. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5082) Move the version info of zookeeper test dependency to hadoop-project/pom
[ https://issues.apache.org/jira/browse/HDFS-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated HDFS-5082: --- Attachment: hdfs-5082-1.patch Straight-forward patch. > Move the version info of zookeeper test dependency to hadoop-project/pom > > > Key: HDFS-5082 > URL: https://issues.apache.org/jira/browse/HDFS-5082 > Project: Hadoop HDFS > Issue Type: Bug > Components: build >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Minor > Attachments: hdfs-5082-1.patch > > > As different projects (HDFS, YARN) depend on zookeeper, it is better to keep > the version information in hadoop-project/pom.xml. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-5081) DistributedFileSystem#listStatus() throws FileNotFoundException when directory doesn't exist
[ https://issues.apache.org/jira/browse/HDFS-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das resolved HDFS-5081. --- Resolution: Not A Problem Am closing this issue since this is not relevant any more. > DistributedFileSystem#listStatus() throws FileNotFoundException when > directory doesn't exist > > > Key: HDFS-5081 > URL: https://issues.apache.org/jira/browse/HDFS-5081 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.5-alpha >Reporter: Ted Yu > > I was running HBase trunk test suite against hadoop 2.1.1-SNAPSHOT (same > behavior with hadoop 2.0.5-ALPHA) > One test failed due to: > {code} > org.apache.hadoop.hbase.catalog.TestMetaMigrationConvertingToPB Time > elapsed: 1,594,938.629 sec <<< ERROR! > java.io.FileNotFoundException: File > hdfs://localhost:61300/user/tyu/hbase/.archive does not exist. > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:656) > at > org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:92) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:714) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:710) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:78) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:710) > at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1478) > at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1518) > at org.apache.hadoop.hbase.util.FSUtils.getLocalTableDirs(FSUtils.java:1317) > at > org.apache.hadoop.hbase.migration.NamespaceUpgrade.migrateTables(NamespaceUpgrade.java:114) > at > org.apache.hadoop.hbase.migration.NamespaceUpgrade.upgradeTableDirs(NamespaceUpgrade.java:87) > at > org.apache.hadoop.hbase.migration.NamespaceUpgrade.run(NamespaceUpgrade.java:206) > at 
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at > org.apache.hadoop.hbase.catalog.TestMetaMigrationConvertingToPB.setUpBeforeClass(TestMetaMigrationConvertingToPB.java:128) > {code} > TestMetaMigrationConvertToPB.tgz was generated from filesystem image produced > by previous release of HBase. > TestMetaMigrationConvertingToPB, activating NamespaceUpgrade, would upgrade > it to current release of HBase. > The test is at > hbase-server/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaMigrationConvertingToPB.java > under HBase trunk. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
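Since Hadoop 2.x, FileSystem#listStatus throws FileNotFoundException for a missing directory rather than returning null, so callers that can race with deletions (as in the HBase test above) need to catch it. Below is a minimal, FileSystem-free sketch of that defensive pattern; the `Lister` interface is a stand-in for illustration, not a Hadoop type.

```java
import java.io.FileNotFoundException;
import java.util.Collections;
import java.util.List;

public class SafeList {
    // Stand-in for FileSystem#listStatus, which in Hadoop 2.x throws
    // FileNotFoundException when the directory does not exist.
    public interface Lister {
        List<String> list(String dir) throws FileNotFoundException;
    }

    // Treat a vanished directory as empty instead of propagating the error.
    public static List<String> listOrEmpty(Lister fs, String dir) {
        try {
            return fs.list(dir);
        } catch (FileNotFoundException e) {
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        Lister fs = dir -> { throw new FileNotFoundException(dir); };
        System.out.println(listOrEmpty(fs, "/user/hbase/.archive").size()); // 0
    }
}
```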
[jira] [Created] (HDFS-5083) Update the HDFS compatibility version range
Kihwal Lee created HDFS-5083: Summary: Update the HDFS compatibility version range Key: HDFS-5083 URL: https://issues.apache.org/jira/browse/HDFS-5083 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.0-beta, 2.3.0 Reporter: Kihwal Lee Priority: Blocker Since we have made incompatible changes including RPCv9, the following needs to be updated. For branch-2 and branch-2.*, I think it needs to be set to 2.1.0. DFS_NAMENODE_MIN_SUPPORTED_DATANODE_VERSION_DEFAULT DFS_DATANODE_MIN_SUPPORTED_NAMENODE_VERSION_DEFAULT The rpc change will make dn registration fail, so it won't get to the actual version check, but this still needs to be set correctly, since we are aiming 2.1.0 to be api/protocol stable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-5082) Move the version info of zookeeper test dependency to hadoop-project/pom
Karthik Kambatla created HDFS-5082: -- Summary: Move the version info of zookeeper test dependency to hadoop-project/pom Key: HDFS-5082 URL: https://issues.apache.org/jira/browse/HDFS-5082 Project: Hadoop HDFS Issue Type: Bug Components: build Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Minor As different projects (HDFS, YARN) depend on zookeeper, it is better to keep the version information in hadoop-project/pom.xml. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5081) DistributedFileSystem#listStatus() throws FileNotFoundException when directory doesn't exist
[ https://issues.apache.org/jira/browse/HDFS-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735002#comment-13735002 ] Ted Yu commented on HDFS-5081: -- Some new code didn't handle this very well. See HBASE-9168. > DistributedFileSystem#listStatus() throws FileNotFoundException when > directory doesn't exist > > > Key: HDFS-5081 > URL: https://issues.apache.org/jira/browse/HDFS-5081 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.5-alpha >Reporter: Ted Yu > > I was running HBase trunk test suite against hadoop 2.1.1-SNAPSHOT (same > behavior with hadoop 2.0.5-ALPHA) > One test failed due to: > {code} > org.apache.hadoop.hbase.catalog.TestMetaMigrationConvertingToPB Time > elapsed: 1,594,938.629 sec <<< ERROR! > java.io.FileNotFoundException: File > hdfs://localhost:61300/user/tyu/hbase/.archive does not exist. > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:656) > at > org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:92) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:714) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:710) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:78) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:710) > at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1478) > at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1518) > at org.apache.hadoop.hbase.util.FSUtils.getLocalTableDirs(FSUtils.java:1317) > at > org.apache.hadoop.hbase.migration.NamespaceUpgrade.migrateTables(NamespaceUpgrade.java:114) > at > org.apache.hadoop.hbase.migration.NamespaceUpgrade.upgradeTableDirs(NamespaceUpgrade.java:87) > at > org.apache.hadoop.hbase.migration.NamespaceUpgrade.run(NamespaceUpgrade.java:206) > at 
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at > org.apache.hadoop.hbase.catalog.TestMetaMigrationConvertingToPB.setUpBeforeClass(TestMetaMigrationConvertingToPB.java:128) > {code} > TestMetaMigrationConvertToPB.tgz was generated from filesystem image produced > by previous release of HBase. > TestMetaMigrationConvertingToPB, activating NamespaceUpgrade, would upgrade > it to current release of HBase. > The test is at > hbase-server/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaMigrationConvertingToPB.java > under HBase trunk. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5081) DistributedFileSystem#listStatus() throws FileNotFoundException when directory doesn't exist
[ https://issues.apache.org/jira/browse/HDFS-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734998#comment-13734998 ] Devaraj Das commented on HDFS-5081: --- Ted, isn't this a known fact with HADOOP-2.x. I thought we already have code in HBase that handles this fact. Am I missing something? > DistributedFileSystem#listStatus() throws FileNotFoundException when > directory doesn't exist > > > Key: HDFS-5081 > URL: https://issues.apache.org/jira/browse/HDFS-5081 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.5-alpha >Reporter: Ted Yu > > I was running HBase trunk test suite against hadoop 2.1.1-SNAPSHOT (same > behavior with hadoop 2.0.5-ALPHA) > One test failed due to: > {code} > org.apache.hadoop.hbase.catalog.TestMetaMigrationConvertingToPB Time > elapsed: 1,594,938.629 sec <<< ERROR! > java.io.FileNotFoundException: File > hdfs://localhost:61300/user/tyu/hbase/.archive does not exist. > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:656) > at > org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:92) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:714) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:710) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:78) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:710) > at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1478) > at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1518) > at org.apache.hadoop.hbase.util.FSUtils.getLocalTableDirs(FSUtils.java:1317) > at > org.apache.hadoop.hbase.migration.NamespaceUpgrade.migrateTables(NamespaceUpgrade.java:114) > at > org.apache.hadoop.hbase.migration.NamespaceUpgrade.upgradeTableDirs(NamespaceUpgrade.java:87) > at > 
org.apache.hadoop.hbase.migration.NamespaceUpgrade.run(NamespaceUpgrade.java:206) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at > org.apache.hadoop.hbase.catalog.TestMetaMigrationConvertingToPB.setUpBeforeClass(TestMetaMigrationConvertingToPB.java:128) > {code} > TestMetaMigrationConvertToPB.tgz was generated from filesystem image produced > by previous release of HBase. > TestMetaMigrationConvertingToPB, activating NamespaceUpgrade, would upgrade > it to current release of HBase. > The test is at > hbase-server/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaMigrationConvertingToPB.java > under HBase trunk. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
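For readers hitting the same 2.x behavior change, the guard callers need can be sketched with plain JDK classes. This is a hedged illustration, not the Hadoop API: `listOrEmpty` is an invented helper, and `java.nio.file`'s `NoSuchFileException` stands in for the `FileNotFoundException` that `DistributedFileSystem#listStatus()` now throws for a missing directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.*;

public class SafeList {
    // Stand-in for the guard callers need on Hadoop 2.x, where listStatus()
    // on a missing directory throws FileNotFoundException instead of being
    // tolerated as in older releases. Here NoSuchFileException plays that
    // role: catch it per-directory and treat it as an empty listing.
    static List<String> listOrEmpty(Path dir) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        } catch (NoSuchFileException e) {
            // Directory vanished or never existed: treat as empty, don't fail.
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // unrelated I/O errors still surface
        }
        return names;
    }
}
```

In HBase-side code (as in HBASE-9168) the equivalent pattern is catching `FileNotFoundException` around each `FileSystem.listStatus()` call rather than assuming the directory exists.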
[jira] [Updated] (HDFS-4993) fsck can fail if a file is renamed or deleted
[ https://issues.apache.org/jira/browse/HDFS-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Parker updated HDFS-4993: Assignee: Robert Parker Target Version/s: 3.0.0, 2.3.0, 0.23.10 Affects Version/s: 3.0.0 Status: Patch Available (was: Open) > fsck can fail if a file is renamed or deleted > - > > Key: HDFS-4993 > URL: https://issues.apache.org/jira/browse/HDFS-4993 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.9, 3.0.0, 2.1.0-beta >Reporter: Kihwal Lee >Assignee: Robert Parker > Attachments: HDFS-4993-branch_0.23.patch, HDFS-4993.patch > > > In NamenodeFsck#check(), the getListing() and getBlockLocations() are not > synchronized, so the file deletions or renames at the right moment can cause > FileNotFoundException and failure of fsck. > Instead of failing, fsck should continue. Optionally it can record file > system modifications it encountered, but since most modifications during fsck > are not detected, there might be little value in recording these specifically. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4993) fsck can fail if a file is renamed or deleted
[ https://issues.apache.org/jira/browse/HDFS-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Parker updated HDFS-4993: Attachment: HDFS-4993.patch > fsck can fail if a file is renamed or deleted > - > > Key: HDFS-4993 > URL: https://issues.apache.org/jira/browse/HDFS-4993 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta, 0.23.9 >Reporter: Kihwal Lee > Attachments: HDFS-4993-branch_0.23.patch, HDFS-4993.patch > > > In NamenodeFsck#check(), the getListing() and getBlockLocations() are not > synchronized, so the file deletions or renames at the right moment can cause > FileNotFoundException and failure of fsck. > Instead of failing, fsck should continue. Optionally it can record file > system modifications it encountered, but since most modifications during fsck > are not detected, there might be little value in recording these specifically. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4993) fsck can fail if a file is renamed or deleted
[ https://issues.apache.org/jira/browse/HDFS-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Parker updated HDFS-4993: Attachment: HDFS-4993-branch_0.23.patch > fsck can fail if a file is renamed or deleted > - > > Key: HDFS-4993 > URL: https://issues.apache.org/jira/browse/HDFS-4993 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta, 0.23.9 >Reporter: Kihwal Lee > Attachments: HDFS-4993-branch_0.23.patch > > > In NamenodeFsck#check(), the getListing() and getBlockLocations() are not > synchronized, so the file deletions or renames at the right moment can cause > FileNotFoundException and failure of fsck. > Instead of failing, fsck should continue. Optionally it can record file > system modifications it encountered, but since most modifications during fsck > are not detected, there might be little value in recording these specifically. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
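The fix the HDFS-4993 description asks for — continue instead of failing — amounts to catching the exception per-file inside the scan loop. A toy model of that structure (all names invented; the real change lives in NamenodeFsck#check(), where getListing() and getBlockLocations() are not atomic):

```java
import java.io.FileNotFoundException;
import java.util.*;

public class FsckSketch {
    // Simplified model of the proposed fsck behavior: a file deleted or
    // renamed between the listing call and the block-location lookup raises
    // FileNotFoundException; the scan should skip that entry and keep going
    // instead of aborting the whole fsck. The Map stands in for the
    // namespace at lookup time.
    static int checkAll(List<String> listing, Map<String, int[]> blockLocations) {
        int checked = 0;
        for (String file : listing) {
            try {
                int[] locs = blockLocations.get(file);
                if (locs == null) {
                    // File disappeared after the listing was taken.
                    throw new FileNotFoundException(file);
                }
                checked++;  // real fsck would verify the block locations here
            } catch (FileNotFoundException e) {
                // Concurrent rename/delete: not an fsck failure, continue.
            }
        }
        return checked;
    }
}
```

As the description notes, recording each skipped modification is optional; since most concurrent modifications during fsck go undetected anyway, skipping silently is a defensible choice.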
[jira] [Commented] (HDFS-4949) Centralized cache management in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734926#comment-13734926 ] Andrew Wang commented on HDFS-4949: --- Hi Tsuyoshi, HDFS-4953 allows applications to do zero-copy reads, so when combined with this JIRA, HDFS will be able to provide full memory-bandwidth reads on cached data. Deserialization is a somewhat separate concern since it happens at the application-level though. If an app can operate directly on the raw bytes in a file (e.g. a ByteBuffer), then it can avoid deserialization overhead. IIUC, this is untrue of the current MR input formats. > Centralized cache management in HDFS > > > Key: HDFS-4949 > URL: https://issues.apache.org/jira/browse/HDFS-4949 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 3.0.0, 2.3.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: caching-design-doc-2013-07-02.pdf > > > HDFS currently has no support for managing or exposing in-memory caches at > datanodes. This makes it harder for higher level application frameworks like > Hive, Pig, and Impala to effectively use cluster memory, because they cannot > explicitly cache important datasets or place their tasks for memory locality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5049) Add JNI mlock support
[ https://issues.apache.org/jira/browse/HDFS-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734905#comment-13734905 ] Andrew Wang commented on HDFS-5049: --- I'm not sure why Findbugs went off on {{org.apache.hadoop.metrics2.lib.DefaultMetricsSystem}} since I didn't touch any metrics code in this patch. Spurious? > Add JNI mlock support > - > > Key: HDFS-5049 > URL: https://issues.apache.org/jira/browse/HDFS-5049 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Colin Patrick McCabe >Assignee: Andrew Wang > Attachments: hdfs-5049-1.patch > > > Add support for {{mlock}} and {{munlock}}, for use in caching. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4898) BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack
[ https://issues.apache.org/jira/browse/HDFS-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-4898: - Attachment: h4898_20130809.patch h4898_20130809.patch: follows Eric's idea. > BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly > fallback to local rack > - > > Key: HDFS-4898 > URL: https://issues.apache.org/jira/browse/HDFS-4898 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 1.2.0, 2.0.4-alpha >Reporter: Eric Sirianni >Assignee: Tsz Wo (Nicholas), SZE >Priority: Minor > Attachments: h4898_20130809.patch > > > As currently implemented, {{BlockPlacementPolicyWithNodeGroup}} does not > properly fallback to local rack when no nodes are available in remote racks, > resulting in an improper {{NotEnoughReplicasException}}. > {code:title=BlockPlacementPolicyWithNodeGroup.java} > @Override > protected void chooseRemoteRack(int numOfReplicas, > DatanodeDescriptor localMachine, HashMap excludedNodes, > long blocksize, int maxReplicasPerRack, List > results, > boolean avoidStaleNodes) throws NotEnoughReplicasException { > int oldNumOfReplicas = results.size(); > // randomly choose one node from remote racks > try { > chooseRandom( > numOfReplicas, > "~" + > NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()), > excludedNodes, blocksize, maxReplicasPerRack, results, > avoidStaleNodes); > } catch (NotEnoughReplicasException e) { > chooseRandom(numOfReplicas - (results.size() - oldNumOfReplicas), > localMachine.getNetworkLocation(), excludedNodes, blocksize, > maxReplicasPerRack, results, avoidStaleNodes); > } > } > {code} > As currently coded the {{chooseRandom()}} call in the {{catch}} block will > never succeed as the set of nodes within the passed in node path (e.g. > {{/rack1/nodegroup1}}) is entirely contained within the set of excluded nodes > (both are the set of nodes within the same nodegroup as the node chosen first > replica). 
> The bug is that the fallback {{chooseRandom()}} call in the catch block > should be passing in the _complement_ of the node path used in the initial > {{chooseRandom()}} call in the try block (e.g. {{/rack1}}) - namely: > {code} > NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()) > {code} > This will yield the proper fallback behavior of choosing a random node from > _within the same rack_, but still excluding those nodes _in the same > nodegroup_ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
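The described bug and fix can be demonstrated with a toy topology model (locations as "/rack/nodegroup" strings; all node data below is invented): retrying in the scope of the full node-group path finds nothing because every candidate there is excluded, while retrying in the first half of the path (the rack) still sees nodes in other nodegroups.

```java
import java.util.*;

public class FallbackSketch {
    // Toy model of BlockPlacementPolicyWithNodeGroup's fallback. A node's
    // location is "/rackX/nodegroupY"; getFirstHalf() returns the rack part.
    static String getFirstHalf(String location) {
        return location.substring(0, location.indexOf('/', 1));
    }

    // Simplified chooseRandom(): pick any node whose location is inside
    // `scope` and not excluded. Empty result models NotEnoughReplicasException.
    static Optional<String> chooseRandom(String scope, Map<String, String> nodes,
                                         Set<String> excluded) {
        for (Map.Entry<String, String> e : nodes.entrySet()) {
            if (e.getValue().startsWith(scope) && !excluded.contains(e.getKey())) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }
}
```

With nodes n1, n2 in /rack1/ng1 and n3 in /rack1/ng2, and n1/n2 excluded (the nodegroup of the first replica), the buggy fallback scope "/rack1/ng1" yields nothing, while the fixed scope getFirstHalf("/rack1/ng1") == "/rack1" yields n3 — a same-rack, different-nodegroup node, exactly the behavior the report asks for.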
[jira] [Commented] (HDFS-5047) Supress logging of full stack trace of quota and lease exceptions
[ https://issues.apache.org/jira/browse/HDFS-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734836#comment-13734836 ] Hudson commented on HDFS-5047: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1513 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1513/]) HDFS-5047. Supress logging of full stack trace of quota and lease exceptions. Contributed by Robert Parker. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1512057) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java > Supress logging of full stack trace of quota and lease exceptions > - > > Key: HDFS-5047 > URL: https://issues.apache.org/jira/browse/HDFS-5047 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.5-alpha, 0.23.9 >Reporter: Kihwal Lee >Assignee: Robert Parker > Fix For: 3.0.0, 0.23.10, 2.1.1-beta > > Attachments: HDFS-5047-branch_0.23.patch, HDFS-5047.patch > > > This is a follow up to HDFS-4714, which made a number of request-level > exceptions to the "terse" list of the namenode rpc server. I still see > several exceptions causing full stack trace to be logged. > NSQuotaExceededException > DSQuotaExceededException > LeaseExpiredException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4513) Clarify WebHDFS REST API that all JSON respsonses may contain additional properties
[ https://issues.apache.org/jira/browse/HDFS-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734740#comment-13734740 ] Hadoop QA commented on HDFS-4513: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595074/h4513_20130731.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4790//console This message is automatically generated. > Clarify WebHDFS REST API that all JSON respsonses may contain additional > properties > --- > > Key: HDFS-4513 > URL: https://issues.apache.org/jira/browse/HDFS-4513 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, webhdfs >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE >Priority: Minor > Attachments: h4513_20130513.patch, h4513_20130731.patch > > > According to Section 5.4 in > http://tools.ietf.org/id/draft-zyp-json-schema-03.html, the default value of > "additionalProperties" is an empty schema which allows any value for > additional properties. Therefore, all WebHDFS JSON responses allow any > additional property since the WebHDFS REST API do not specify > additionalProperties. > However, it is better to clarify in the API that all JSON respsonses may > contain additional properties. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4898) BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack
[ https://issues.apache.org/jira/browse/HDFS-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734738#comment-13734738 ] Eric Sirianni commented on HDFS-4898: - Nicholas - as discussed offline, from a legal perspective, I'm not yet able to contribute patches. I hope to get this worked out soon with my employer, but for now, I'm reassigning the JIRA to you. Thanks. > BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly > fallback to local rack > - > > Key: HDFS-4898 > URL: https://issues.apache.org/jira/browse/HDFS-4898 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 1.2.0, 2.0.4-alpha >Reporter: Eric Sirianni >Assignee: Tsz Wo (Nicholas), SZE >Priority: Minor > > As currently implemented, {{BlockPlacementPolicyWithNodeGroup}} does not > properly fallback to local rack when no nodes are available in remote racks, > resulting in an improper {{NotEnoughReplicasException}}. > {code:title=BlockPlacementPolicyWithNodeGroup.java} > @Override > protected void chooseRemoteRack(int numOfReplicas, > DatanodeDescriptor localMachine, HashMap excludedNodes, > long blocksize, int maxReplicasPerRack, List > results, > boolean avoidStaleNodes) throws NotEnoughReplicasException { > int oldNumOfReplicas = results.size(); > // randomly choose one node from remote racks > try { > chooseRandom( > numOfReplicas, > "~" + > NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()), > excludedNodes, blocksize, maxReplicasPerRack, results, > avoidStaleNodes); > } catch (NotEnoughReplicasException e) { > chooseRandom(numOfReplicas - (results.size() - oldNumOfReplicas), > localMachine.getNetworkLocation(), excludedNodes, blocksize, > maxReplicasPerRack, results, avoidStaleNodes); > } > } > {code} > As currently coded the {{chooseRandom()}} call in the {{catch}} block will > never succeed as the set of nodes within the passed in node path (e.g. 
> {{/rack1/nodegroup1}}) is entirely contained within the set of excluded nodes > (both are the set of nodes within the same nodegroup as the node chosen first > replica). > The bug is that the fallback {{chooseRandom()}} call in the catch block > should be passing in the _complement_ of the node path used in the initial > {{chooseRandom()}} call in the try block (e.g. {{/rack1}}) - namely: > {code} > NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()) > {code} > This will yield the proper fallback behavior of choosing a random node from > _within the same rack_, but still excluding those nodes _in the same > nodegroup_ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4898) BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack
[ https://issues.apache.org/jira/browse/HDFS-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Sirianni updated HDFS-4898: Assignee: Tsz Wo (Nicholas), SZE (was: Eric Sirianni) > BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly > fallback to local rack > - > > Key: HDFS-4898 > URL: https://issues.apache.org/jira/browse/HDFS-4898 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 1.2.0, 2.0.4-alpha >Reporter: Eric Sirianni >Assignee: Tsz Wo (Nicholas), SZE >Priority: Minor > > As currently implemented, {{BlockPlacementPolicyWithNodeGroup}} does not > properly fallback to local rack when no nodes are available in remote racks, > resulting in an improper {{NotEnoughReplicasException}}. > {code:title=BlockPlacementPolicyWithNodeGroup.java} > @Override > protected void chooseRemoteRack(int numOfReplicas, > DatanodeDescriptor localMachine, HashMap excludedNodes, > long blocksize, int maxReplicasPerRack, List > results, > boolean avoidStaleNodes) throws NotEnoughReplicasException { > int oldNumOfReplicas = results.size(); > // randomly choose one node from remote racks > try { > chooseRandom( > numOfReplicas, > "~" + > NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()), > excludedNodes, blocksize, maxReplicasPerRack, results, > avoidStaleNodes); > } catch (NotEnoughReplicasException e) { > chooseRandom(numOfReplicas - (results.size() - oldNumOfReplicas), > localMachine.getNetworkLocation(), excludedNodes, blocksize, > maxReplicasPerRack, results, avoidStaleNodes); > } > } > {code} > As currently coded the {{chooseRandom()}} call in the {{catch}} block will > never succeed as the set of nodes within the passed in node path (e.g. > {{/rack1/nodegroup1}}) is entirely contained within the set of excluded nodes > (both are the set of nodes within the same nodegroup as the node chosen first > replica). 
> The bug is that the fallback {{chooseRandom()}} call in the catch block > should be passing in the _complement_ of the node path used in the initial > {{chooseRandom()}} call in the try block (e.g. {{/rack1}}) - namely: > {code} > NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()) > {code} > This will yield the proper fallback behavior of choosing a random node from > _within the same rack_, but still excluding those nodes _in the same > nodegroup_ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5047) Supress logging of full stack trace of quota and lease exceptions
[ https://issues.apache.org/jira/browse/HDFS-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734733#comment-13734733 ] Hudson commented on HDFS-5047: -- ABORTED: Integrated in Hadoop-Hdfs-trunk #1486 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1486/]) HDFS-5047. Supress logging of full stack trace of quota and lease exceptions. Contributed by Robert Parker. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1512057) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java > Supress logging of full stack trace of quota and lease exceptions > - > > Key: HDFS-5047 > URL: https://issues.apache.org/jira/browse/HDFS-5047 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.5-alpha, 0.23.9 >Reporter: Kihwal Lee >Assignee: Robert Parker > Fix For: 3.0.0, 0.23.10, 2.1.1-beta > > Attachments: HDFS-5047-branch_0.23.patch, HDFS-5047.patch > > > This is a follow up to HDFS-4714, which made a number of request-level > exceptions to the "terse" list of the namenode rpc server. I still see > several exceptions causing full stack trace to be logged. > NSQuotaExceededException > DSQuotaExceededException > LeaseExpiredException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3020) Auto-logSync based on edit log buffer size broken
[ https://issues.apache.org/jira/browse/HDFS-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734725#comment-13734725 ] Hudson commented on HDFS-3020: -- ABORTED: Integrated in Hadoop-Hdfs-0.23-Build #694 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/694/]) HDFS-3020. Fix editlog to automatically sync when buffer is full. Contributed by Todd Lipcon. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1511968) * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditsDoubleBuffer.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java > Auto-logSync based on edit log buffer size broken > - > > Key: HDFS-3020 > URL: https://issues.apache.org/jira/browse/HDFS-3020 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.22.0, 0.23.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Critical > Fix For: 2.0.0-alpha, 0.23.10 > > Attachments: hdfs-3020.txt, hdfs-3020.txt, hdfs-3020.txt > > > HDFS-1112 added a feature whereby the edit log automatically calls logSync() > if the buffered data crosses a threshold. However, the code checks > {{bufReady.size()}} rather than {{bufCurrent.size()}} -- which is incorrect > since the writes themselves go into {{bufCurrent}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
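The off-by-one-buffer bug is easy to see in a miniature double buffer (a sketch, not the real EditsDoubleBuffer): writes land in bufCurrent and a sync swaps it into bufReady for flushing, so between syncs bufReady is empty and a threshold check against it never fires. The fix is to check bufCurrent.

```java
public class DoubleBufferSketch {
    // Miniature model of the edit log's double buffer. Writes go into
    // bufCurrent; setReadyToFlush() swaps the buffers so the flusher can
    // drain bufReady while new writes keep filling bufCurrent.
    static StringBuilder bufCurrent = new StringBuilder();
    static StringBuilder bufReady = new StringBuilder();

    static void write(String op) {
        bufCurrent.append(op);
    }

    // The HDFS-1112 check read bufReady.length() here, which is always 0
    // between syncs, so the auto-logSync never triggered. Checking
    // bufCurrent, where the writes actually accumulate, is the fix.
    static boolean shouldAutoSync(int threshold) {
        return bufCurrent.length() >= threshold;
    }

    // Models setReadyToFlush(): swap buffers, and model the completed flush
    // by clearing the buffer that now takes new writes.
    static void swap() {
        StringBuilder t = bufReady;
        bufReady = bufCurrent;
        bufCurrent = t;
        bufCurrent.setLength(0);
    }
}
```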
[jira] [Commented] (HDFS-5047) Supress logging of full stack trace of quota and lease exceptions
[ https://issues.apache.org/jira/browse/HDFS-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734727#comment-13734727 ] Hudson commented on HDFS-5047: -- ABORTED: Integrated in Hadoop-Hdfs-0.23-Build #694 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/694/]) HDFS-5047. Supress logging of full stack trace of quota and lease exceptions. Contributed by Robert Parker. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1512060) * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java > Supress logging of full stack trace of quota and lease exceptions > - > > Key: HDFS-5047 > URL: https://issues.apache.org/jira/browse/HDFS-5047 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.5-alpha, 0.23.9 >Reporter: Kihwal Lee >Assignee: Robert Parker > Fix For: 3.0.0, 0.23.10, 2.1.1-beta > > Attachments: HDFS-5047-branch_0.23.patch, HDFS-5047.patch > > > This is a follow up to HDFS-4714, which made a number of request-level > exceptions to the "terse" list of the namenode rpc server. I still see > several exceptions causing full stack trace to be logged. > NSQuotaExceededException > DSQuotaExceededException > LeaseExpiredException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5047) Supress logging of full stack trace of quota and lease exceptions
[ https://issues.apache.org/jira/browse/HDFS-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734666#comment-13734666 ] Hudson commented on HDFS-5047: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #296 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/296/]) HDFS-5047. Supress logging of full stack trace of quota and lease exceptions. Contributed by Robert Parker. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1512057) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java > Supress logging of full stack trace of quota and lease exceptions > - > > Key: HDFS-5047 > URL: https://issues.apache.org/jira/browse/HDFS-5047 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.5-alpha, 0.23.9 >Reporter: Kihwal Lee >Assignee: Robert Parker > Fix For: 3.0.0, 0.23.10, 2.1.1-beta > > Attachments: HDFS-5047-branch_0.23.patch, HDFS-5047.patch > > > This is a follow up to HDFS-4714, which made a number of request-level > exceptions to the "terse" list of the namenode rpc server. I still see > several exceptions causing full stack trace to be logged. > NSQuotaExceededException > DSQuotaExceededException > LeaseExpiredException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
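The effect of the "terse" list that HDFS-5047 extends can be sketched as follows. This is an invented illustration of the behavior, not the NameNode RPC server's actual code: exception classes on the terse list are logged as a single line, while everything else gets a full stack trace.

```java
import java.util.*;

public class TerseLogSketch {
    // Class names below come from the issue text; the set and the format
    // helper are invented stand-ins for the RPC server's terse-exception
    // handling.
    static Set<String> terse = new HashSet<>(Arrays.asList(
        "NSQuotaExceededException",
        "DSQuotaExceededException",
        "LeaseExpiredException"));

    static String format(Throwable t) {
        String name = t.getClass().getSimpleName();
        if (terse.contains(name)) {
            // Terse: one line, no stack trace -- these are routine
            // request-level errors, not server bugs.
            return name + ": " + t.getMessage();
        }
        // Everything else keeps the full trace for debugging.
        StringBuilder sb = new StringBuilder(t.toString());
        for (StackTraceElement e : t.getStackTrace()) {
            sb.append("\n\tat ").append(e);
        }
        return sb.toString();
    }
}
```

The rationale: quota and lease violations are expected client-facing conditions, so logging their full stack traces only bloats the NameNode log without adding diagnostic value.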