Re: [VOTE] Merge HDFS-3077 (QuorumJournalManager) branch to trunk
I think this is great work, Todd. And I think we should not merge it into trunk or other branches. As I suggested earlier on this list, I think this should be spun off as a separate project or a subproject.

- The code is well detached as a self-contained package.
- It is a logically stand-alone project that can be replaced by other technologies.
- If it is a separate project then there is no need to port it to other versions. You can package it as a dependent jar.
- Finally, it will be a good precedent of spinning new projects out of HDFS rather than bringing everything under the HDFS umbrella.

Todd, I had a feeling you were in favor of this direction?

Thanks,
--Konstantin

On Tue, Sep 25, 2012 at 4:58 PM, Eli Collins e...@cloudera.com wrote:
> +1 Awesome work Todd.
>
> On Tue, Sep 25, 2012 at 4:02 PM, Todd Lipcon t...@cloudera.com wrote:
>> Dear fellow HDFS developers,
>>
>> Per my email thread last week (Heads up: merge for QJM branch soon at
>> http://markmail.org/message/vkyh5culdsuxdb6t) I would like to propose
>> merging the HDFS-3077 branch into trunk. The branch has been active since
>> mid July and has stabilized significantly over the last two months. It has
>> passed the full test suite, findbugs, and release audit, and I think it's
>> ready to merge at this point.
>>
>> The branch has been fully developed using the standard 'review-then-commit'
>> (RTC) policy, and the design is described in detail in a document attached
>> to HDFS-3077 itself. The code itself has been contributed by me, Aaron, and
>> Eli, but I'd be remiss not to also acknowledge the contributions to the
>> design from discussions with Suresh, Sanjay, Henry Robinson, Patrick Hunt,
>> Ivan Kelly, Andrew Purtell, Flavio Junqueira, Ben Reed, Nicholas, Bikas,
>> Brandon, and others. Additionally, special thanks to Andrew Purtell and
>> Stephen Chu for their help with cluster testing.
>>
>> This initial VOTE is to merge only into trunk, but, following the pattern
>> of automatic failover, I expect to merge it into branch-2 within a few
>> weeks as well. The merge to branch-2 should be clean, as both Andrew
>> Purtell and I have been testing on branch-2-derived codebases in addition
>> to trunk.
>>
>> Please cast your vote by EOD Friday 9/29. Given that the branch has only
>> had small changes in the last few weeks, and there was a heads up last
>> week, I trust this should be enough time for committers to cast their
>> votes. Per our by-laws, we need a minimum of three binding +1 votes from
>> committers. I will start the voting with my own +1.
>>
>> Thanks
>> -Todd
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
[jira] [Created] (HDFS-3980) NPE in HttpURLConnection.java while starting SecondaryNameNode.
Brahma Reddy Battula created HDFS-3980:
------------------------------------------

             Summary: NPE in HttpURLConnection.java while starting SecondaryNameNode.
                 Key: HDFS-3980
                 URL: https://issues.apache.org/jira/browse/HDFS-3980
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: security
    Affects Versions: 2.0.1-alpha, 3.0.0
            Reporter: Brahma Reddy Battula

Scenario: I started a secure cluster by following https://ccp.cloudera.com/display/CDHDOC/CDH3+Security+Guide. Here the SecondaryNameNode is getting shut down by throwing an NPE. Please correct me if I am wrong. Will attach conf and logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
Hadoop-Hdfs-0.23-Build - Build # 386 - Still Unstable
See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/386/

###################################################################################
########################## LAST 60 LINES OF THE CONSOLE ###########################
###################################################################################
[...truncated 19644 lines...]
[INFO] --- maven-dependency-plugin:2.1:build-classpath (build-classpath) @ hadoop-hdfs-project ---
[INFO] Wrote classpath file '/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/target/classes/mrapp-generated-classpath'.
[INFO]
[INFO] --- maven-source-plugin:2.1.2:jar-no-fork (hadoop-java-sources) @ hadoop-hdfs-project ---
[INFO]
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ hadoop-hdfs-project ---
[INFO]
[INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @ hadoop-hdfs-project ---
[INFO] Not executing Javadoc as the project is not a Java classpath-capable package
[INFO]
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ hadoop-hdfs-project ---
[INFO] Installing /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/pom.xml to /home/jenkins/.m2/repository/org/apache/hadoop/hadoop-hdfs-project/0.23.4-SNAPSHOT/hadoop-hdfs-project-0.23.4-SNAPSHOT.pom
[INFO]
[INFO] --- maven-antrun-plugin:1.6:run (create-testdirs) @ hadoop-hdfs-project ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO]
[INFO] --- maven-dependency-plugin:2.1:build-classpath (build-classpath) @ hadoop-hdfs-project ---
[INFO] Skipped writing classpath file '/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/target/classes/mrapp-generated-classpath'. No changes found.
[INFO]
[INFO] --- maven-source-plugin:2.1.2:jar-no-fork (hadoop-java-sources) @ hadoop-hdfs-project ---
[INFO]
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ hadoop-hdfs-project ---
[INFO]
[INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @ hadoop-hdfs-project ---
[INFO] Not executing Javadoc as the project is not a Java classpath-capable package
[INFO]
[INFO] --- maven-checkstyle-plugin:2.6:checkstyle (default-cli) @ hadoop-hdfs-project ---
[INFO]
[INFO] --- findbugs-maven-plugin:2.3.2:findbugs (default-cli) @ hadoop-hdfs-project ---
[INFO] ** FindBugsMojo execute ***
[INFO] canGenerate is false
[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop HDFS ................................ SUCCESS [5:21.821s]
[INFO] Apache Hadoop HttpFS .............................. SUCCESS [44.039s]
[INFO] Apache Hadoop HDFS Project ........................ SUCCESS [0.057s]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 6:06.511s
[INFO] Finished at: Wed Sep 26 11:39:14 UTC 2012
[INFO] Final Memory: 53M/732M
[INFO]
+ /home/jenkins/tools/maven/latest/bin/mvn test -Dmaven.test.failure.ignore=true -Pclover -DcloverLicenseLocation=/home/jenkins/tools/clover/latest/lib/clover.license
Archiving artifacts
Recording test results
Build step 'Publish JUnit test result report' changed build result to UNSTABLE
Publishing Javadoc
Recording fingerprints
Updating HADOOP-8822
Updating MAPREDUCE-2786
Updating MAPREDUCE-4651
Sending e-mails to: hdfs-dev@hadoop.apache.org
Email was triggered for: Unstable
Sending email for trigger: Unstable

###################################################################################
############################## FAILED TESTS (if any) ##############################
2 tests failed.
REGRESSION:  org.apache.hadoop.hdfs.TestCrcCorruption.testCrcCorruption

Error Message:
IPC server unable to read call parameters: readObject can't find class org.apache.hadoop.io.Writable

Stack Trace:
java.lang.RuntimeException: IPC server unable to read call parameters: readObject can't find class org.apache.hadoop.io.Writable
	at org.apache.hadoop.ipc.Client.call(Client.java:1088)
	at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:195)
	at $Proxy13.addBlock(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:102)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:67)
	at $Proxy13.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1097)
	at
Jenkins build is still unstable: Hadoop-Hdfs-0.23-Build #386
See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/changes
About The Hadoop Plugin
Hi All,

I am new to Hadoop development. I have written a Hadoop map-reduce program, using the Eclipse IDE (Ganymede/Indigo), which I want to debug. What does it take to debug my code? Is it possible? Please guide me on this. Any help in this regard is very much appreciated.

Thanks,
Harshal
Re: Commits breaking compilation of MR 'classic' tests
Point. I've opened https://issues.apache.org/jira/browse/MAPREDUCE-4687 to track this.

On Sep 25, 2012, at 9:33 PM, Eli Collins wrote:
> How about adding this step to the MR PreCommit jenkins job so it's run as
> part of test-patch?
>
> On Tue, Sep 25, 2012 at 7:48 PM, Arun C Murthy a...@hortonworks.com wrote:
>> Committers,
>>
>> As most people are aware, the MapReduce 'classic' tests (in
>> hadoop-mapreduce-project/src/test) still need to be built using ant since
>> they aren't mavenized yet. I've seen several commits (and 2 within the
>> last hour, i.e. MAPREDUCE-3681 and MAPREDUCE-3682) which lead me to
>> believe developers/committers aren't checking for this.
>>
>> Henceforth, with all changes, before committing, please do run:
>>
>>   $ mvn install
>>   $ cd hadoop-mapreduce-project
>>   $ ant veryclean all-jars -Dresolvers=internal
>>
>> These instructions were already in
>> http://wiki.apache.org/hadoop/HowToReleasePostMavenization and I've just
>> updated http://wiki.apache.org/hadoop/HowToContribute.
>>
>> thanks,
>> Arun
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
[jira] [Resolved] (HDFS-3977) Incompatible change between hadoop-1 and hadoop-2 when the dfs.hosts and dfs.hosts.exclude files are not present
[ https://issues.apache.org/jira/browse/HDFS-3977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Gupta resolved HDFS-3977.
-------------------------------
    Resolution: Invalid

Thanks Todd. Resolving it as invalid.

> Incompatible change between hadoop-1 and hadoop-2 when the dfs.hosts and dfs.hosts.exclude files are not present
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3977
>                 URL: https://issues.apache.org/jira/browse/HDFS-3977
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.2-alpha
>            Reporter: Arpit Gupta
>            Assignee: Arpit Gupta
>
> While testing hadoop-1 and hadoop-2, the following was noticed when the
> files named in the properties dfs.hosts and dfs.hosts.exclude do not
> exist: in hadoop-1, namenode format and start went through successfully;
> in hadoop-2, we get a file-not-found exception and both the format and the
> namenode start commands fail. We should log a warning when the file is not
> found so that we are compatible with hadoop-1.
Re: Commits breaking compilation of MR 'classic' tests
As per my comment on the bug, I thought we were going to remove them. MAPREDUCE-4266 only needs a little bit more work, changing a patch to a script, before they disappear entirely. I would much rather see dead code die than be maintained for a few tests that are mostly testing the dead code itself.

--Bobby

On 9/26/12 9:39 AM, Arun C Murthy a...@hortonworks.com wrote:
> Point. I've opened https://issues.apache.org/jira/browse/MAPREDUCE-4687 to
> track this.
[jira] [Resolved] (HDFS-3538) TestBlocksWithNotEnoughRacks fails
[ https://issues.apache.org/jira/browse/HDFS-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Li resolved HDFS-3538.
------------------------------
    Resolution: Cannot Reproduce

It has been a while and I don't see the problem in jenkins tests any more. Closing it for now.

> TestBlocksWithNotEnoughRacks fails
> ----------------------------------
>
>                 Key: HDFS-3538
>                 URL: https://issues.apache.org/jira/browse/HDFS-3538
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.24.0
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>
> It failed for a few days in jenkins test.
Re: Commits breaking compilation of MR 'classic' tests
Fair, however there are still tests which need to be ported over. We can remove them after the port.

On Sep 26, 2012, at 9:54 AM, Robert Evans wrote:
> As per my comment on the bug, I thought we were going to remove them.
> MAPREDUCE-4266 only needs a little bit more work, changing a patch to a
> script, before they disappear entirely. I would much rather see dead code
> die than be maintained for a few tests that are mostly testing the dead
> code itself.
>
> --Bobby

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
Re: [VOTE] Merge HDFS-3077 (QuorumJournalManager) branch to trunk
On Tue, Sep 25, 2012 at 11:21 PM, Konstantin Shvachko shv.had...@gmail.com wrote:
> I think this is great work, Todd. And I think we should not merge it into
> trunk or other branches. As I suggested earlier on this list, I think this
> should be spun off as a separate project or a subproject.
>
> - The code is well detached as a self contained package.

The addition is mostly self-contained, but it makes use of a bunch of private parts of HDFS and Common:

- Reuses all of the Hadoop security infrastructure, IPC, metrics, etc.
- Coupled to the JournalManager interface, which is still evolving. In fact there were several patches in trunk which were done during the development of this project, specifically to make this API more general. There's still some further work to be done in this area on the generic interface -- e.g. support for upgrade/rollback.
- The functional tests make use of a bunch of private HDFS APIs as well.

> - It is a logically stand-alone project that can be replaced by other
>   technologies.
> - If it is a separate project then there is no need to port it to other
>   versions. You can package it as a dependent jar.

Per above, it's not that separate, because in order to build it, we had to make a number of changes to core HDFS internal interfaces. It currently couldn't be used to store anything except for NN logs. It would be a nice extension to truly separate it out into a content-agnostic quorum-based edit log, but today it actually uses the existing edit log validation code to determine valid lengths, etc.

> - Finally, it will be a good precedent of spinning new projects out of
>   HDFS rather than bringing everything under the HDFS umbrella.
>
> Todd, I had a feeling you were in favor of this direction?

I'm not in favor of it - I had said previously that it's worth discussing if several other people believe the same. I know that we plan to ship it as part of CDH and it will be our recommended way of running HA HDFS.
If the community doesn't accept the contribution, and prefers that we maintain it in a fork on github, then it's worth hearing. But I imagine that many other community members will want to either use it or ship it as part of their distros. Moving it to an entirely separate standalone project will just add extra work for these folks who, like us, think it's currently the best option for HA log storage.

If at some point in the future, the internal APIs have fully stabilized (security, IPC, edit log streams, JournalManager, metrics, etc.) then we can pull it out at that time.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
Re: [VOTE] Merge HDFS-3077 (QuorumJournalManager) branch to trunk
On Wed, Sep 26, 2012 at 10:50 AM, Todd Lipcon t...@cloudera.com wrote:
> I'm not in favor of it - I had said previously that it's worth discussing
> if several other people believe the same.

I'm not in favor of it either.
One of the benefits of QJM over the BK approach is that it's embedded in HDFS (i.e. not treated as a separate storage system). See HDFS-3077 and HDFS-3092 for details on that discussion.
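As background for readers following the thread, the quorum principle that QJM relies on can be sketched roughly as follows. This is an illustrative model only, not the actual HDFS-3077 code; the class and method names are hypothetical. The rule is that an edit-log write counts as durable once a strict majority of journal nodes acknowledge it, so any two majorities of the same journal set must overlap in at least one node.

```java
// Hedged sketch of the majority-quorum rule behind QuorumJournalManager.
// Illustrative only -- QuorumSketch and hasQuorum are hypothetical names,
// not part of the actual HDFS-3077 API.
public class QuorumSketch {

    // A write is durable once a strict majority of journal nodes ack it;
    // any future majority must then include at least one node that saw it.
    static boolean hasQuorum(int acks, int totalNodes) {
        return acks > totalNodes / 2;
    }

    public static void main(String[] args) {
        // With 3 journal nodes, 2 acks suffice; 1 does not.
        System.out.println(hasQuorum(2, 3)); // prints true
        System.out.println(hasQuorum(1, 3)); // prints false
    }
}
```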
Re: Commits breaking compilation of MR 'classic' tests
That is fine. We may want to then mark it so that MAPREDUCE-4687 depends on the JIRA to port the tests, so the tests don't disappear before we are done.

--Bobby

From: Arun C Murthy a...@hortonworks.com
Date: Wednesday, September 26, 2012 12:31 PM
To: hdfs-dev@hadoop.apache.org, "Yahoo! Inc." ev...@yahoo-inc.com
Cc: common-...@hadoop.apache.org, yarn-...@hadoop.apache.org, mapreduce-...@hadoop.apache.org
Subject: Re: Commits breaking compilation of MR 'classic' tests

> Fair, however there are still tests which need to be ported over. We can
> remove them after the port.
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
Re: [VOTE] Merge HDFS-3077 (QuorumJournalManager) branch to trunk
+1 for the merge. I've reviewed much of the code as individual patches and tested the whole system, both in single- and multi-node configurations. I've also tested the system with security enabled and confirmed that it works as expected. I've done all of the above testing both with and without HA enabled.

Aaron

On Sep 25, 2012, at 4:02 PM, Todd Lipcon t...@cloudera.com wrote:
> Dear fellow HDFS developers,
>
> Per my email thread last week (Heads up: merge for QJM branch soon at
> http://markmail.org/message/vkyh5culdsuxdb6t) I would like to propose
> merging the HDFS-3077 branch into trunk. [...]
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
[jira] [Reopened] (HDFS-3910) DFSTestUtil#waitReplication should timeout
[ https://issues.apache.org/jira/browse/HDFS-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy reopened HDFS-3910:
---------------------------------

> DFSTestUtil#waitReplication should timeout
> ------------------------------------------
>
>                 Key: HDFS-3910
>                 URL: https://issues.apache.org/jira/browse/HDFS-3910
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: test
>    Affects Versions: 2.0.0-alpha
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>            Priority: Minor
>             Fix For: 2.0.2-alpha
>         Attachments: hdfs-3910.txt, hdfs-3910.txt, hdfs-3910.txt
>
> DFSTestUtil#waitReplication never times out, so test execution fails only
> when the mvn test executor times out. This leaves a stray test process
> around; an example is HDFS-3902. Let's make waitReplication do something
> like bail after it has checked block locations 10 times for one file.
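The bounded-wait idea the JIRA proposes -- stop after a fixed number of block-location checks instead of spinning forever -- could be sketched roughly like this. The names below are hypothetical, not the actual DFSTestUtil code.

```java
import java.util.function.BooleanSupplier;

// Illustrative sketch of a waitReplication-style loop with a bound on the
// number of checks, in the spirit of HDFS-3910. Names are hypothetical.
public class BoundedWait {
    static final int MAX_ATTEMPTS = 10; // bail after 10 checks, per the JIRA
    static final long SLEEP_MS = 10;

    // Polls the condition up to MAX_ATTEMPTS times; returns false on timeout
    // so the caller can fail the test with a clear message instead of hanging.
    static boolean waitFor(BooleanSupplier replicated) {
        for (int i = 0; i < MAX_ATTEMPTS; i++) {
            if (replicated.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(SLEEP_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] checks = {0};
        // Simulated cluster that reaches full replication on the third check.
        System.out.println(waitFor(() -> ++checks[0] >= 3)); // prints true
    }
}
```

With a bound like this, a stuck test fails fast with a clear condition instead of leaving a stray process for the mvn executor to kill.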
[jira] [Created] (HDFS-3981) access time is set without holding writelock in FSNamesystem
Xiaobo Peng created HDFS-3981:
---------------------------------

             Summary: access time is set without holding writelock in FSNamesystem
                 Key: HDFS-3981
                 URL: https://issues.apache.org/jira/browse/HDFS-3981
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
    Affects Versions: 2.0.1-alpha
            Reporter: Xiaobo Peng
            Priority: Minor
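The locking discipline this report refers to can be illustrated with a small sketch. The class below is a hypothetical stand-in, not the actual FSNamesystem code: the point is simply that any mutation of namespace state, including an access-time update, should happen under the write lock, while readers take the shared read lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the read/write-lock discipline HDFS-3981 is about.
// AccessTimeSketch and its fields are hypothetical, not FSNamesystem itself.
public class AccessTimeSketch {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    private long accessTime;

    // Mutations hold the write lock, so concurrent readers never observe
    // a partially applied update.
    public void setAccessTime(long now) {
        fsLock.writeLock().lock();
        try {
            accessTime = now;
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    // Reads take the shared read lock, which many threads may hold at once.
    public long getAccessTime() {
        fsLock.readLock().lock();
        try {
            return accessTime;
        } finally {
            fsLock.readLock().unlock();
        }
    }
}
```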
Re: [VOTE] Merge HDFS-3077 (QuorumJournalManager) branch to trunk
On Tue, Sep 25, 2012 at 11:21 PM, Konstantin Shvachko shv.had...@gmail.com wrote:
> I think this is great work, Todd. And I think we should not merge it into
> trunk or other branches. As I suggested earlier on this list, I think this
> should be spun off as a separate project or a subproject.

I'd be -1 on that. Users shouldn't have to go elsewhere to get a fix for the SPOF.

St.Ack
Re: [VOTE] Merge HDFS-3077 (QuorumJournalManager) branch to trunk
Hi Todd,

> I had said previously that it's worth discussing if several other people
> believe the same.

Well, let's put it on to the general list for discussion then? Seems to me an important issue for Hadoop evolution in general. We keep growing the HDFS umbrella with competing technologies (http/web HDFS as an example) within it, which makes the project harder to stabilize and release. Not touching MR/Yarn here.

> If at some point in the future, the internal APIs have fully stabilized
> (security, IPC, edit log streams, JournalManager, metrics, etc) then we
> can pull it out at that time.

By that time it will monolithically grow into HDFS and vice versa.

> I know that we plan to ship it as part of CDH and it will be our
> recommended way of running HA HDFS.

Sounds like CDH is moving well in release plans and otherwise. My concern is that if we add another 6000 lines of code to Hadoop-2, it will take yet another x months for stabilization. While it is not clear why people cannot just use NFS filers for shared storage, as you originally designed.

> Moving it to an entirely separate standalone project will just add extra
> work for these folks who, like us, think it's currently the best option
> for HA log storage.

Don't know who these folks are. I see it as less work for the HDFS community, because there is no need for porting and supporting this project in two or more different versions.

Thanks,
--Konstantin

On Wed, Sep 26, 2012 at 10:50 AM, Todd Lipcon t...@cloudera.com wrote:
> The addition is mostly self-contained, but it makes use of a bunch of
> private parts of HDFS and Common [...]
Re: [VOTE] Merge HDFS-3077 (QuorumJournalManager) branch to trunk
Don't understand your argument. Elsewhere? One way or another users will be talking to Todd.

Thanks,
--Konst

On Wed, Sep 26, 2012 at 2:32 PM, Stack st...@duboce.net wrote:
> On Tue, Sep 25, 2012 at 11:21 PM, Konstantin Shvachko shv.had...@gmail.com wrote:
>> I think this is a great work, Todd. And I think we should not merge it into trunk or other branches. As I suggested earlier on this list I think this should be spinned off as a separate project or a subproject.
>
> I'd be -1 on that. Users shouldn't have to go elsewhere to get a fix for SPOF.
>
> St.Ack
Re: [VOTE] Merge HDFS-3077 (QuorumJournalManager) branch to trunk
On Wed, Sep 26, 2012 at 04:17 PM, Konstantin Shvachko wrote:
> Hi Todd,
>
>> I had said previously that it's worth discussing if several other people believe the same.
>
> Well let's put it on the general list for discussion then? Seems to me an important issue for Hadoop evolution in general. We keep growing the HDFS umbrella with competing technologies (http/web HDFS as an example) within it, which makes the project harder to stabilize and release. Not touching MR/Yarn here.
>
>> If at some point in the future, the internal APIs have fully stabilized (security, IPC, edit log streams, JournalManager, metrics, etc) then we can pull it out at that time.
>
> By that time it will monolithically grow into HDFS and vice versa.
>
>> I know that we plan to ship it as part of CDH and will be our recommended way of running HA HDFS.
>
> Sounds like CDH is moving well in release plans and otherwise. My concern is that if we add another 6000 lines of code to Hadoop-2, it will take yet another x months for stabilization. While it is not clear why people cannot just use NFS filers for shared storage, as you originally designed.
>
>> Moving it to an entirely separate standalone project will just add extra work for these folks who, like us, think it's currently the best option for HA log storage.
>
> Don't know who these folks are. I see it as less work for the HDFS community, because there is no need for porting and supporting this project in two or more different versions.

From a pure integration perspective I also see such separation to be beneficial, as the dependencies can be clearly defined and managed orthogonal to the project's source code.

Regards,
  Cos

> Thanks,
> --Konstantin
[jira] [Created] (HDFS-3982) report failed replications in DN heartbeat
Andy Isaacson created HDFS-3982:
-----------------------------------

Summary: report failed replications in DN heartbeat
Key: HDFS-3982
URL: https://issues.apache.org/jira/browse/HDFS-3982
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Affects Versions: 2.0.2-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
Priority: Minor

From HDFS-3931:
{quote}
# The test corrupts 2/3 replicas.
# Client reports a bad block.
# NN asks a DN to re-replicate, and randomly picks the other corrupt replica.
# DN notices the incoming replica is corrupt and reports it as a bad block, but does not inform the NN that re-replication failed.
# NN keeps the block on pendingReplications.
# BP scanner wakes up on both DNs with corrupt blocks; both report corruption. NN reports both as duplicates, one from the client and one from the DN report above. Since the block is on pendingReplications, NN does not schedule another replication.
{quote}

Todd wrote: I can think of a few ways to fix this: ... 2) Add a field to the DN heartbeat which reports back a failed replication for a given block. The NN would use this to decrement the pendingReplication count, which would cause a new replication attempt to be made if it was still under-replicated.

This jira tracks implementing the DN heartbeat replication failure report.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
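Todd's option (2) comes down to NN-side bookkeeping: decrement the pending-replication count when a heartbeat reports a failure, so a fresh attempt can be scheduled instead of the block sitting on pendingReplications forever. A sketch of that bookkeeping; all class and method names here are invented for illustration and are not the actual HDFS APIs:

```java
import java.util.HashMap;
import java.util.Map;

public class PendingReplicationSketch {
    // blockId -> number of in-flight replication attempts
    private final Map<Long, Integer> pending = new HashMap<>();

    // NN schedules a re-replication for the block.
    void scheduleReplication(long blockId) {
        pending.merge(blockId, 1, Integer::sum);
    }

    // DN heartbeat reported a failed replication: release the pending slot.
    void onReplicationFailed(long blockId) {
        pending.computeIfPresent(blockId, (id, n) -> n > 1 ? n - 1 : null);
    }

    // With no pending entry, the NN is free to schedule a fresh attempt.
    boolean canScheduleNewAttempt(long blockId) {
        return !pending.containsKey(blockId);
    }

    public static void main(String[] args) {
        PendingReplicationSketch nn = new PendingReplicationSketch();
        nn.scheduleReplication(42L);
        System.out.println(nn.canScheduleNewAttempt(42L)); // false: stuck without a failure report
        nn.onReplicationFailed(42L);                       // heartbeat failure report unblocks it
        System.out.println(nn.canScheduleNewAttempt(42L)); // true
    }
}
```

The scenario in the JIRA is exactly the `false` case above persisting: without the failure report, the pending entry never drains and no new attempt is scheduled.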
[jira] [Created] (HDFS-3983) Hftp should not use the https port
Eli Collins created HDFS-3983:
------------------------------

Summary: Hftp should not use the https port
Key: HDFS-3983
URL: https://issues.apache.org/jira/browse/HDFS-3983
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins

Hftp currently doesn't work against a secure cluster unless you configure {{dfs.https.port}} to be the http port; otherwise the client can't fetch tokens:

{noformat}
$ hadoop fs -ls hftp://c1225.hal.cloudera.com:50070/
12/09/26 18:02:00 INFO fs.FileSystem: Couldn't get a delegation token from http://c1225.hal.cloudera.com:50470 using http.
ls: Security enabled but user not authenticated by filter
{noformat}

This is due to Hftp still using the https port. Post HDFS-2617 it should use the regular http port. Hsftp should still use the secure port; however, now that we have HADOOP-8581, it's worth considering removing Hsftp entirely. I'll start a separate thread about that.
[jira] [Reopened] (HDFS-3931) TestDatanodeBlockScanner#testBlockCorruptionPolicy2 is broken
[ https://issues.apache.org/jira/browse/HDFS-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins reopened HDFS-3931:
-------------------------------

TestDatanodeBlockScanner#testBlockCorruptionPolicy2 is broken
Key: HDFS-3931
URL: https://issues.apache.org/jira/browse/HDFS-3931
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Andy Isaacson
Priority: Minor
Fix For: 2.0.3-alpha
Attachments: hdfs3931-1.txt, hdfs3931-2.txt, hdfs3931-3.txt, hdfs3931.txt

Per Andy's comment on HDFS-3902: TestDatanodeBlockScanner still fails about 1/5 runs in testBlockCorruptionRecoveryPolicy2. That's due to a separate test issue also uncovered by HDFS-3828. The failure scenario for this one is a bit more tricky. I think I've captured the scenario below:
- The test corrupts 2/3 replicas.
- Client reports a bad block.
- NN asks a DN to re-replicate, and randomly picks the other corrupt replica.
- DN notices the incoming replica is corrupt and reports it as a bad block, but does not inform the NN that re-replication failed.
- NN keeps the block on pendingReplications.
- BP scanner wakes up on both DNs with corrupt blocks; both report corruption. NN reports both as duplicates, one from the client and one from the DN report above. Since the block is on pendingReplications, NN does not schedule another replication.
[jira] [Resolved] (HDFS-3931) TestDatanodeBlockScanner#testBlockCorruptionPolicy2 is broken
[ https://issues.apache.org/jira/browse/HDFS-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins resolved HDFS-3931.
-------------------------------
Resolution: Fixed

TestDatanodeBlockScanner#testBlockCorruptionPolicy2 is broken
Key: HDFS-3931
URL: https://issues.apache.org/jira/browse/HDFS-3931
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Andy Isaacson
Priority: Minor
Fix For: 2.0.3-alpha
Attachments: hdfs3931-1.txt, hdfs3931-2.txt, hdfs3931-3.txt, hdfs3931.txt

Per Andy's comment on HDFS-3902: TestDatanodeBlockScanner still fails about 1/5 runs in testBlockCorruptionRecoveryPolicy2. That's due to a separate test issue also uncovered by HDFS-3828. The failure scenario for this one is a bit more tricky. I think I've captured the scenario below:
- The test corrupts 2/3 replicas.
- Client reports a bad block.
- NN asks a DN to re-replicate, and randomly picks the other corrupt replica.
- DN notices the incoming replica is corrupt and reports it as a bad block, but does not inform the NN that re-replication failed.
- NN keeps the block on pendingReplications.
- BP scanner wakes up on both DNs with corrupt blocks; both report corruption. NN reports both as duplicates, one from the client and one from the DN report above. Since the block is on pendingReplications, NN does not schedule another replication.