[jira] [Commented] (HDFS-3859) QJM: implement md5sum verification
[ https://issues.apache.org/jira/browse/HDFS-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443854#comment-13443854 ] Todd Lipcon commented on HDFS-3859: --- We don't have existing SHA1 implementations, and this isn't about security. It's just to guard against bugs or in-flight corruption. Security is taken care of by other layers (e.g. SPNEGO on the image transfer). I don't want to add new code and switch hashes for no good reason. > QJM: implement md5sum verification > -- > > Key: HDFS-3859 > URL: https://issues.apache.org/jira/browse/HDFS-3859 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: QuorumJournalManager (HDFS-3077) >Reporter: Todd Lipcon >Assignee: Todd Lipcon > > When the QJM passes journal segments between nodes, it should use an md5sum > field to make sure the data doesn't get corrupted during transit. This also > serves as an extra safe-guard to make sure that the data is consistent across > all nodes when finalizing a segment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
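To illustrate the kind of integrity check being discussed (a hedged sketch, not the actual QJM code), the JDK's MessageDigest can compute an MD5 sum over a segment's bytes that the sender attaches and the receiver recomputes and compares:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: MD5 over a journal segment's bytes, so sender and
// receiver can detect in-flight corruption by comparing hex digests.
public class SegmentChecksum {

    // Returns the MD5 digest of the given bytes as a lowercase hex string.
    public static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every compliant JRE ships MD5, so this should be unreachable.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] segment = "edit log segment bytes".getBytes();
        // Sender computes the sum before transfer; receiver recomputes after.
        String sent = md5Hex(segment);
        String received = md5Hex(segment);
        System.out.println(sent.equals(received) ? "match" : "CORRUPT");
    }
}
```

As the comment above notes, this only guards against bugs and transit corruption; it provides no security, which is handled by other layers.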
[jira] [Commented] (HDFS-2815) Namenode is not coming out of safemode when we perform ( NN crash + restart ) . Also FSCK report shows blocks missed.
[ https://issues.apache.org/jira/browse/HDFS-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443851#comment-13443851 ] Uma Maheswara Rao G commented on HDFS-2815: --- @Suresh, could you please take a look at the branch-1 patch? If you +1 it, I will commit and resolve the issue. > Namenode is not coming out of safemode when we perform ( NN crash + restart ) > . Also FSCK report shows blocks missed. > -- > > Key: HDFS-2815 > URL: https://issues.apache.org/jira/browse/HDFS-2815 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.22.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Critical > Fix For: 2.0.0-alpha, 3.0.0 > > Attachments: HDFS-2815-22-branch.patch, HDFS-2815-branch-1.patch, > HDFS-2815-Branch-1.patch, HDFS-2815.patch, HDFS-2815.patch > > > When testing HA (internal) with continuous switches at roughly 5-minute gaps, > I found some *blocks missed* and the namenode went into safemode after the next switch. > > After analysis, I found that these files had already been deleted by clients, > but I don't see any delete command logs in the namenode log files. The namenode > had added those blocks to invalidateSets and the DNs deleted the blocks. > On restart, the namenode went into safemode, expecting more > blocks before it could leave safemode. > The reason could be that the file was deleted in memory and the blocks added > to invalidates before the edits were synced to the editlog file. > By that time the NN had asked the DNs to delete those blocks, and the namenode shut down > before persisting to the editlog (log behind). > For this reason we may not get the INFO logs about the delete, and when we > restart the namenode (in my scenario, another switch), it also expects > the deleted blocks, since the delete request was never persisted to the editlog. > I reproduced this scenario with debug points. 
*I feel we should not add > the blocks to invalidates before persisting to the editlog*. > Note: for the switch, we used kill -9 (force kill). > I am currently on version 0.20.2. The same was verified on 0.23 as well in the normal > crash + restart scenario.
[jira] [Commented] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes
[ https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443848#comment-13443848 ] Uma Maheswara Rao G commented on HDFS-3791: --- @Ted, yes, I was thinking along the same lines as Suresh about that parameter. I'm not sure of any use case that would gain an advantage from making it configurable. Do you have a use case where tuning that parameter would help? If so, feel free to file a JIRA and we can discuss it there. Thanks a lot, Ted, for taking a look at it. Thanks, Uma > Backport HDFS-173 to Branch-1 : Recursively deleting a directory with > millions of files makes NameNode unresponsive for other commands until the > deletion completes > > > Key: HDFS-3791 > URL: https://issues.apache.org/jira/browse/HDFS-3791 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 1.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Fix For: 1.2.0 > > Attachments: HDFS-3791.patch, HDFS-3791.patch, HDFS-3791.patch > > > Backport HDFS-173. > see the > [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007] > for more details
NativeS3FileSystem problem
I was attempting to use the native S3 file system outside of any MapReduce tasks. A simple task of trying to create a directory:

    FileSystem fs = FileSystem.get(uri, conf);
    Path currPath = new Path("/a/b/c");
    fs.mkdirs(currPath);

(I can provide the full code if needed.) The class Jets3tNativeFileSystemStore attempts to detect whether each key part of the object path exists, expecting a 404 response if it does not:

    public FileMetadata retrieveMetadata(String key) throws IOException {
      try {
        S3Object object = s3Service.getObjectDetails(bucket, key);
        return new FileMetadata(key, object.getContentLength(),
            object.getLastModifiedDate().getTime());
      } catch (S3ServiceException e) {
        // Following is brittle. Is there a better way?
        if (e.getMessage().contains("ResponseCode=404")) {
          return null;
        }
        if (e.getCause() instanceof IOException) {
          throw (IOException) e.getCause();
        }
        throw new S3Exception(e);
      }
    }

All versions of JetS3t I have looked at that seem to have a compatible class structure (don't blow up on AWSCredentials) actually return an exception whose message contains "ResponseCode: 404". I took a copy of the code in this directory and changed the check to read:

      } catch (S3ServiceException e) {
        if (e.getResponseCode() == 404) {
          return null;
        }
        if (e.getCause() instanceof IOException) {
          throw (IOException) e.getCause();
        }
        throw new S3Exception(e);
      }

which seems to fix the issue. Am I missing something? This also seems to have been broken across a variety of Hadoop versions. Does anyone actually use this code path, and if so, is there a valid version combination that should have worked for me? Comments welcome.

Chris
[jira] [Commented] (HDFS-3859) QJM: implement md5sum verification
[ https://issues.apache.org/jira/browse/HDFS-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443831#comment-13443831 ] Andy Isaacson commented on HDFS-3859: - Please consider using SHA-1 rather than MD5. The performance should be comparable (SHA-1 was about 2.5% faster in my quick test, but that's "equal" by any rational measure). The hash is much less badly broken. And it's one fewer place where we'll need to keep supporting legacy insecure code in the future.
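A quick comparison of the sort described above can be reproduced with the JDK alone. This is a rough, illustrative timing loop, not a rigorous benchmark; the numbers will vary by JVM and hardware:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Rough sanity check of MD5 vs SHA-1 throughput over the same buffer.
public class DigestTiming {

    // Times `rounds` updates of `buf` through the named digest, in nanoseconds.
    static long timeDigest(String algo, byte[] buf, int rounds) {
        try {
            MessageDigest md = MessageDigest.getInstance(algo);
            long start = System.nanoTime();
            for (int i = 0; i < rounds; i++) {
                md.update(buf);
            }
            md.digest();
            return System.nanoTime() - start;
        } catch (NoSuchAlgorithmException e) {
            // MD5 and SHA-1 are required algorithms in every JRE.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] buf = new byte[1 << 20]; // 1 MiB buffer
        long md5Ns = timeDigest("MD5", buf, 100);
        long sha1Ns = timeDigest("SHA-1", buf, 100);
        System.out.printf("MD5: %d ms, SHA-1: %d ms%n",
            md5Ns / 1_000_000, sha1Ns / 1_000_000);
    }
}
```

A proper benchmark would want warm-up iterations and multiple runs; this only illustrates that the two digests are in the same performance ballpark.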
[jira] [Updated] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning
[ https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3837: -- Attachment: hdfs-3837.txt Updated patch attached. > Fix DataNode.recoverBlock findbugs warning > -- > > Key: HDFS-3837 > URL: https://issues.apache.org/jira/browse/HDFS-3837 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt, > hdfs-3837.txt > > > HDFS-2686 introduced the following findbugs warning: > {noformat} > Call to equals() comparing different types in > org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock) > {noformat} > Both are using DatanodeID#equals but it's a different method because > DNR#equals overrides equals for some reason (doesn't change behavior).
[jira] [Commented] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning
[ https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443827#comment-13443827 ] Eli Collins commented on HDFS-3837: --- I investigated some more and confirmed findbugs isn't searching back far enough for the common superclass. E.g. if I swap the variables in the equals call I get: {noformat} org.apache.hadoop.hdfs.protocol.DatanodeInfo.equals(Object) used to determine equality org.apache.hadoop.hdfs.server.common.JspHelper$NodeRecord.equals(Object) used to determine equality org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.equals(Object) used to determine equality At DataNode.java:[line 1871] {noformat} It stops at DatanodeDescriptor#equals even though this calls super.equals (DatanodeInfo), which calls super.equals (DatanodeID), just like the current warning stops at DatanodeRegistration#equals, which calls super.equals (DatanodeID). It would be better (and findbugs wouldn't choke) if the various classes that extend DatanodeID had a DatanodeID member instead. I looked at this for HDFS-3237 and it required a ton of changes that probably aren't worth it. Given this I'll update the patch per your suggestion, Suresh, to ignore the warning in DataNode#recoverBlock.
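The delegation pattern described above can be sketched with hypothetical stand-in classes (BaseId plays the role of DatanodeID): both subclasses' equals methods bottom out in the base class, so instances of different concrete types still compare equal, which is exactly the cross-type call findbugs flags:

```java
// Simplified, hypothetical sketch of the hierarchy discussed above.
// Both subclasses either inherit or merely delegate equals(), so equality
// is decided entirely by BaseId, yet findbugs stops tracing at the
// overriding method and reports "comparing different types".
public class EqualsHierarchyDemo {

    static class BaseId {
        final String id;
        BaseId(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof BaseId && id.equals(((BaseId) o).id);
        }
        @Override public int hashCode() { return id.hashCode(); }
    }

    // Mirrors DatanodeRegistration#equals: overrides without changing behavior.
    static class RegistrationId extends BaseId {
        RegistrationId(String id) { super(id); }
        @Override public boolean equals(Object o) { return super.equals(o); }
        @Override public int hashCode() { return super.hashCode(); }
    }

    static class InfoId extends BaseId {
        InfoId(String id) { super(id); }
    }

    public static void main(String[] args) {
        // Different concrete types, same base identity: equals is true.
        System.out.println(new RegistrationId("dn1").equals(new InfoId("dn1"))); // prints "true"
    }
}
```

The suggested alternative (composition: give each class a DatanodeID member instead of extending it) would make the comparison explicitly single-typed and keep findbugs quiet.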
[jira] [Resolved] (HDFS-282) Serialize ipcPort in DatanodeID instead of DatanodeRegistration and DatanodeInfo
[ https://issues.apache.org/jira/browse/HDFS-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins resolved HDFS-282. -- Resolution: Not A Problem No longer an issue now that the writable methods have been removed. > Serialize ipcPort in DatanodeID instead of DatanodeRegistration and > DatanodeInfo > > > Key: HDFS-282 > URL: https://issues.apache.org/jira/browse/HDFS-282 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Tsz Wo (Nicholas), SZE > > The field DatanodeID.ipcPort is currently serialized in DatanodeRegistration > and DatanodeInfo. Once HADOOP-2797 (removing the code for handling the old layout) > is committed, DatanodeID.ipcPort should be serialized in DatanodeID.
[jira] [Commented] (HDFS-3865) TestDistCp is @ignored
[ https://issues.apache.org/jira/browse/HDFS-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443812#comment-13443812 ] Eli Collins commented on HDFS-3865: --- Looks like some of the tests are commented out as well (e.g. testUniformSizeDistCp). > TestDistCp is @ignored > -- > > Key: HDFS-3865 > URL: https://issues.apache.org/jira/browse/HDFS-3865 > Project: Hadoop HDFS > Issue Type: Test > Components: tools >Affects Versions: 2.2.0-alpha >Reporter: Colin Patrick McCabe >Priority: Minor > > We should fix TestDistCp so that it actually runs, rather than being ignored. > {code} > @ignore > public class TestDistCp { > private static final Log LOG = LogFactory.getLog(TestDistCp.class); > private static List pathList = new ArrayList(); > ... > {code}
[jira] [Commented] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file
[ https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443810#comment-13443810 ] Eli Collins commented on HDFS-3466: --- Hey Owen, I think you meant to remove the 2nd initialization of httpKeytab. {code} +String httpKeytab = conf.get( + DFSConfigKeys.DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY); +if (httpKeytab == null) { + httpKeytab = conf.get(DFSConfigKeys.DFS_NAMENODE_KEYTAB_FILE_KEY); +} String httpKeytab = conf .get(DFSConfigKeys.DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY); {code} > The SPNEGO filter for the NameNode should come out of the web keytab file > - > > Key: HDFS-3466 > URL: https://issues.apache.org/jira/browse/HDFS-3466 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, security >Affects Versions: 1.1.0, 2.0.0-alpha >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: hdfs-3466-b1-2.patch, hdfs-3466-b1.patch, > hdfs-3466-trunk-2.patch > > > Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find > the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to > do it.
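The intended fallback logic (minus the duplicate declaration Eli points out) can be sketched with a plain Properties lookup. The key strings below are illustrative stand-ins for the DFSConfigKeys constants, not the actual Hadoop Configuration API:

```java
import java.util.Properties;

// Hedged sketch of the intended behavior: prefer the SPNEGO web keytab key,
// and only if it is unset fall back to the NameNode keytab key.
public class KeytabLookup {

    static String resolveHttpKeytab(Properties conf) {
        // Placeholder key names standing in for the DFSConfigKeys constants.
        String httpKeytab = conf.getProperty("dfs.web.authentication.kerberos.keytab");
        if (httpKeytab == null) {
            httpKeytab = conf.getProperty("dfs.namenode.keytab.file");
        }
        return httpKeytab; // single declaration; no second one shadowing the result
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("dfs.namenode.keytab.file", "/etc/security/nn.keytab");
        // With only the NN keytab configured, the lookup falls back to it.
        System.out.println(resolveHttpKeytab(conf));
    }
}
```

The bug in the quoted patch is that a second `String httpKeytab = ...` declaration after the if-block would both fail to compile (duplicate local variable) and, if renamed, overwrite the fallback result.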
[jira] [Commented] (HDFS-1490) TransferFSImage should timeout
[ https://issues.apache.org/jira/browse/HDFS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443791#comment-13443791 ] Vinay commented on HDFS-1490: - {quote}Why not introduce a new config which defaults to something like 1 minute?{quote} Ok, agreed. I will introduce a new config for this. {quote}In the test case, shouldn't you somehow notify the servlet to exit? Currently it waits on itself, but nothing notifies it.{quote} That was added just to make the client call time out. Ideally it will be interrupted when the server is stopped. Anyway, I will add a timeout for that as well. Thanks, Todd, for the comments. I will post a new patch shortly. > TransferFSImage should timeout > -- > > Key: HDFS-1490 > URL: https://issues.apache.org/jira/browse/HDFS-1490 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: Dmytro Molkov >Assignee: Dmytro Molkov >Priority: Minor > Attachments: HDFS-1490.patch, HDFS-1490.patch > > > Sometimes when the primary crashes during image transfer, the secondary namenode > hangs forever trying to read the image from the HTTP connection. > It would be great to set timeouts on the connection so that if something like that > happens there is no need to restart the secondary itself. > In our case restarting components is handled by a set of scripts, and since > the Secondary process keeps running, it would just stay hung until we get > an alarm saying checkpointing isn't happening.
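The fix under discussion amounts to giving the image-transfer connection explicit timeouts. A minimal sketch using HttpURLConnection, with a placeholder one-minute default standing in for the proposed new config (the URL and default here are illustrative, not the actual HDFS setting):

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative sketch: open an HTTP connection with both a connect timeout
// (fail fast if the peer is unreachable) and a read timeout (don't hang
// forever mid-transfer if the peer crashes).
public class TimedConnection {

    static final int DEFAULT_TIMEOUT_MS = 60_000; // placeholder: 1 minute

    static HttpURLConnection open(String url, int timeoutMs) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(timeoutMs); // bounds the TCP handshake
            conn.setReadTimeout(timeoutMs);    // bounds each blocking read
            return conn;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // openConnection() does no network I/O yet, so this only configures.
        HttpURLConnection conn = open("http://namenode:50070/getimage", DEFAULT_TIMEOUT_MS);
        System.out.println(conn.getReadTimeout()); // prints 60000
    }
}
```

With a read timeout set, a crashed sender surfaces as a SocketTimeoutException instead of a permanently hung checkpoint thread.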
[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log
[ https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443752#comment-13443752 ] Hudson commented on HDFS-3864: -- Integrated in Hadoop-Mapreduce-trunk-Commit #2683 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2683/]) HDFS-3864. NN does not update internal file mtime for OP_CLOSE when reading from the edit log. Contributed by Aaron T. Myers. (Revision 1378413) Result = FAILURE atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378413 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java > NN does not update internal file mtime for OP_CLOSE when reading from the > edit log > -- > > Key: HDFS-3864 > URL: https://issues.apache.org/jira/browse/HDFS-3864 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.0.0-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Fix For: 2.2.0-alpha > > Attachments: HDFS-3864.patch, HDFS-3864.patch > > > When logging an OP_CLOSE to the edit log, the NN writes out an updated file > mtime and atime. However, when reading in an OP_CLOSE from the edit log, the > NN does not apply these values to the in-memory FS data structure. Because of > this, a file's mtime or atime may appear to go back in time after an NN > restart, or an HA failover. 
> Most of the time this will be harmless and folks won't notice, but in the > event one of these files is being used in the distributed cache of an MR job > when an HA failover occurs, the job might notice that the mtime of a cache > file has changed, which in MR2 will cause the job to fail with an exception > like the following: > {noformat} > java.io.IOException: Resource > hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar > changed on src filesystem (expected 1342137814599, was 1342137814473 > at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {noformat} > Credit to Sujay Rau for discovering this issue. -- This message is automatically generated by JIRA. 
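The fix can be pictured with a deliberately simplified replay sketch (hypothetical names; the real change lives in FSEditLogLoader): when an OP_CLOSE is replayed, the logged mtime and atime must be applied to the in-memory inode, otherwise the times silently keep their stale values:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the bug and fix: replaying a close op
// must copy the logged times onto the in-memory file state.
public class CloseOpReplay {

    static class InodeTimes {
        long mtime, atime;
    }

    // Stand-in for the NN's in-memory namespace.
    static final Map<String, InodeTimes> namespace = new HashMap<>();

    static void replayClose(String path, long loggedMtime, long loggedAtime) {
        InodeTimes inode = namespace.computeIfAbsent(path, p -> new InodeTimes());
        // These two assignments are the essence of the fix: without them,
        // mtime/atime appear to go backwards after a restart or HA failover.
        inode.mtime = loggedMtime;
        inode.atime = loggedAtime;
    }

    public static void main(String[] args) {
        replayClose("/user/jenkins/libjars/snappy-java.jar", 1342137814599L, 1342137814599L);
        System.out.println(namespace.get("/user/jenkins/libjars/snappy-java.jar").mtime);
    }
}
```

This also shows why MR2's distributed cache noticed the bug: it compares the exact mtime it recorded at submission against the filesystem's current value, so even a small regression fails the job.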
[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log
[ https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443738#comment-13443738 ] Hudson commented on HDFS-3864: -- Integrated in Hadoop-Common-trunk-Commit #2654 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2654/]) HDFS-3864. NN does not update internal file mtime for OP_CLOSE when reading from the edit log. Contributed by Aaron T. Myers. (Revision 1378413) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378413 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java
[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log
[ https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443737#comment-13443737 ] Hudson commented on HDFS-3864: -- Integrated in Hadoop-Hdfs-trunk-Commit #2717 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2717/]) HDFS-3864. NN does not update internal file mtime for OP_CLOSE when reading from the edit log. Contributed by Aaron T. Myers. (Revision 1378413) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378413 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java
[jira] [Updated] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log
[ https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3864: - Resolution: Fixed Fix Version/s: 2.2.0-alpha Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've just committed this to trunk and branch-2. Thanks a lot for the review, Todd.
[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log
[ https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443715#comment-13443715 ] Aaron T. Myers commented on HDFS-3864: -- The findbugs warning is unrelated and I'm confident that the test failures are unrelated as well. I'm going to commit this patch momentarily. > NN does not update internal file mtime for OP_CLOSE when reading from the > edit log > -- > > Key: HDFS-3864 > URL: https://issues.apache.org/jira/browse/HDFS-3864 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.0.0-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-3864.patch, HDFS-3864.patch > > > When logging an OP_CLOSE to the edit log, the NN writes out an updated file > mtime and atime. However, when reading in an OP_CLOSE from the edit log, the > NN does not apply these values to the in-memory FS data structure. Because of > this, a file's mtime or atime may appear to go back in time after an NN > restart, or an HA failover. 
> Most of the time this will be harmless and folks won't notice, but in the > event one of these files is being used in the distributed cache of an MR job > when an HA failover occurs, the job might notice that the mtime of a cache > file has changed, which in MR2 will cause the job to fail with an exception > like the following: > {noformat} > java.io.IOException: Resource > hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar > changed on src filesystem (expected 1342137814599, was 1342137814473 > at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {noformat} > Credit to Sujay Rau for discovering this issue. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
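The fix described in HDFS-3864 amounts to applying the mtime/atime carried by the OP_CLOSE record when replaying the edit log. A minimal, self-contained sketch of that replay step follows; all class, field, and method names here are hypothetical stand-ins, not the actual FSNamesystem/FSEditLog code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of OP_CLOSE replay: on replay, the in-memory inode
// must take the mtime/atime recorded in the edit-log record, otherwise a
// restarted NN (or the standby after a failover) reports stale timestamps.
public class EditLogReplaySketch {
    static class Inode {
        long mtime;
        long atime;
    }

    static class CloseOp {
        final String path;
        final long mtime;
        final long atime;
        CloseOp(String path, long mtime, long atime) {
            this.path = path; this.mtime = mtime; this.atime = atime;
        }
    }

    final Map<String, Inode> namespace = new HashMap<>();

    // The bug was that replay updated block state but skipped these
    // two assignments, so the in-memory times drifted from the log.
    void applyClose(CloseOp op) {
        Inode inode = namespace.computeIfAbsent(op.path, p -> new Inode());
        inode.mtime = op.mtime;   // previously not applied on replay
        inode.atime = op.atime;   // previously not applied on replay
    }

    public static void main(String[] args) {
        EditLogReplaySketch fs = new EditLogReplaySketch();
        fs.applyClose(new CloseOp("/user/jenkins/a.jar", 1342137814599L, 1342137814599L));
        System.out.println(fs.namespace.get("/user/jenkins/a.jar").mtime);
    }
}
```

With replay applying the logged values, the mtime an MR2 job sees for a distributed-cache file no longer changes across a failover, which is exactly the symptom in the stack trace above.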
[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log
[ https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443712#comment-13443712 ] Hadoop QA commented on HDFS-3864: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12542846/HDFS-3864.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHftpDelegationToken org.apache.hadoop.hdfs.web.TestWebHDFS org.apache.hadoop.hdfs.server.datanode.TestBPOfferService +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3114//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3114//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3114//console This message is automatically generated. > NN does not update internal file mtime for OP_CLOSE when reading from the > edit log > -- > > Key: HDFS-3864 > URL: https://issues.apache.org/jira/browse/HDFS-3864 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.0.0-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-3864.patch, HDFS-3864.patch > > > When logging an OP_CLOSE to the edit log, the NN writes out an updated file > mtime and atime. 
However, when reading in an OP_CLOSE from the edit log, the > NN does not apply these values to the in-memory FS data structure. Because of > this, a file's mtime or atime may appear to go back in time after an NN > restart, or an HA failover. > Most of the time this will be harmless and folks won't notice, but in the > event one of these files is being used in the distributed cache of an MR job > when an HA failover occurs, the job might notice that the mtime of a cache > file has changed, which in MR2 will cause the job to fail with an exception > like the following: > {noformat} > java.io.IOException: Resource > hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar > changed on src filesystem (expected 1342137814599, was 1342137814473 > at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > 
{noformat} > Credit to Sujay Rau for discovering this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3135) Build a war file for HttpFS instead of packaging the server (tomcat) along with the application.
[ https://issues.apache.org/jira/browse/HDFS-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443710#comment-13443710 ] Ryan Hennig commented on HDFS-3135: --- I'm troubleshooting a broken build that fails on the Tomcat download, because our Jenkins server doesn't have internet access (by design). Rather, all components are supposed to be fetched from our internal Maven Repository (Artifactory). So while I don't need the war file change, I do think this direct download should be removed. > Build a war file for HttpFS instead of packaging the server (tomcat) along > with the application. > > > Key: HDFS-3135 > URL: https://issues.apache.org/jira/browse/HDFS-3135 > Project: Hadoop HDFS > Issue Type: Improvement > Components: build >Affects Versions: 0.23.2 >Reporter: Ravi Prakash > Labels: build > > There are several reason why web applications should not be packaged along > with the server that is expected to serve them. For one not all organisations > use vanilla tomcat. There are other reasons I won't go into. > I'm filing this bug because some of our builds failed in trying to download > the tomcat.tar.gz file. We then had to manually wget the file and place it in > downloads/ to make the build pass. I suspect the download failed because of > an overloaded server (Frankly, I don't really know). If someone has ideas, > please share them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3855) Replace hardcoded strings with the already defined config keys in DataNode.java
[ https://issues.apache.org/jira/browse/HDFS-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-3855: - Description: Replace hardcoded strings with the already defined config keys in DataNode.java > Replace hardcoded strings with the already defined config keys in > DataNode.java > > > Key: HDFS-3855 > URL: https://issues.apache.org/jira/browse/HDFS-3855 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 1.2.0 >Reporter: Brandon Li >Assignee: Brandon Li >Priority: Trivial > Attachments: HDFS-3855.branch-1.patch > > > Replace hardcoded strings with the already defined config keys in > DataNode.java -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
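The HDFS-3855 cleanup is the standard pattern of referencing a shared constant instead of repeating a literal key. A toy illustration of why it matters; the key name and classes below are made up for the example, not the real DFSConfigKeys:

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of why hardcoded config strings are fragile: a typo in a
// literal key silently misses the configured value and falls back to the
// default, while a shared constant is checked by the compiler.
public class ConfigKeySketch {
    static final String DFS_DATANODE_ADDRESS_KEY = "dfs.datanode.address";

    final Map<String, String> conf = new HashMap<>();

    String get(String key, String defaultValue) {
        return conf.getOrDefault(key, defaultValue);
    }

    public static void main(String[] args) {
        ConfigKeySketch c = new ConfigKeySketch();
        c.conf.put(DFS_DATANODE_ADDRESS_KEY, "0.0.0.0:50010");
        // Good: lookup through the defined constant.
        System.out.println(c.get(DFS_DATANODE_ADDRESS_KEY, "default"));
        // Bad: hardcoded literal with a typo ("adress") misses the value.
        System.out.println(c.get("dfs.datanode.adress", "default"));
    }
}
```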
[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log
[ https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443657#comment-13443657 ] Hadoop QA commented on HDFS-3864: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12542840/HDFS-3864.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHftpDelegationToken +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3113//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3113//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3113//console This message is automatically generated. > NN does not update internal file mtime for OP_CLOSE when reading from the > edit log > -- > > Key: HDFS-3864 > URL: https://issues.apache.org/jira/browse/HDFS-3864 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.0.0-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-3864.patch, HDFS-3864.patch > > > When logging an OP_CLOSE to the edit log, the NN writes out an updated file > mtime and atime. 
However, when reading in an OP_CLOSE from the edit log, the > NN does not apply these values to the in-memory FS data structure. Because of > this, a file's mtime or atime may appear to go back in time after an NN > restart, or an HA failover. > Most of the time this will be harmless and folks won't notice, but in the > event one of these files is being used in the distributed cache of an MR job > when an HA failover occurs, the job might notice that the mtime of a cache > file has changed, which in MR2 will cause the job to fail with an exception > like the following: > {noformat} > java.io.IOException: Resource > hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar > changed on src filesystem (expected 1342137814599, was 1342137814473 > at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > 
{noformat} > Credit to Sujay Rau for discovering this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file
[ https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443654#comment-13443654 ] Hadoop QA commented on HDFS-3466: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12542858/hdfs-3466-trunk-2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javac. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3116//console This message is automatically generated. > The SPNEGO filter for the NameNode should come out of the web keytab file > - > > Key: HDFS-3466 > URL: https://issues.apache.org/jira/browse/HDFS-3466 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, security >Affects Versions: 1.1.0, 2.0.0-alpha >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: hdfs-3466-b1-2.patch, hdfs-3466-b1.patch, > hdfs-3466-trunk-2.patch > > > Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find > the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to > do it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
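The HDFS-3466 fix amounts to reading the SPNEGO keytab from the web-authentication key rather than the NameNode service key. A self-contained sketch of that key selection; the key strings come from the issue description, but the fallback behavior shown is an illustrative assumption, not verified patch behavior:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the key-selection fix: prefer the dedicated SPNEGO keytab key,
// falling back to the NameNode service keytab only when it is unset.
// The fallback is an assumption for illustration, not the actual patch.
public class SpnegoKeytabSketch {
    static final String DFS_NAMENODE_KEYTAB_FILE_KEY =
            "dfs.namenode.keytab.file";
    static final String DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY =
            "dfs.web.authentication.kerberos.keytab";

    static String resolveSpnegoKeytab(Map<String, String> conf) {
        String keytab = conf.get(DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY);
        if (keytab == null || keytab.isEmpty()) {
            // Before the fix, the filter always read the service keytab key.
            keytab = conf.get(DFS_NAMENODE_KEYTAB_FILE_KEY);
        }
        return keytab;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(DFS_NAMENODE_KEYTAB_FILE_KEY, "/etc/security/nn.keytab");
        conf.put(DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY, "/etc/security/web.keytab");
        System.out.println(resolveSpnegoKeytab(conf)); // the web keytab wins
    }
}
```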
[jira] [Commented] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file
[ https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443645#comment-13443645 ] Hadoop QA commented on HDFS-3466: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12542858/hdfs-3466-trunk-2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javac. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3115//console This message is automatically generated. > The SPNEGO filter for the NameNode should come out of the web keytab file > - > > Key: HDFS-3466 > URL: https://issues.apache.org/jira/browse/HDFS-3466 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, security >Affects Versions: 1.1.0, 2.0.0-alpha >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: hdfs-3466-b1-2.patch, hdfs-3466-b1.patch, > hdfs-3466-trunk-2.patch > > > Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find > the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to > do it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file
[ https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HDFS-3466: Attachment: hdfs-3466-trunk-2.patch > The SPNEGO filter for the NameNode should come out of the web keytab file > - > > Key: HDFS-3466 > URL: https://issues.apache.org/jira/browse/HDFS-3466 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, security >Affects Versions: 1.1.0, 2.0.0-alpha >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: hdfs-3466-b1-2.patch, hdfs-3466-b1.patch, > hdfs-3466-trunk-2.patch > > > Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find > the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to > do it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file
[ https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HDFS-3466: Attachment: (was: hdfs-3466-trunk.patch) > The SPNEGO filter for the NameNode should come out of the web keytab file > - > > Key: HDFS-3466 > URL: https://issues.apache.org/jira/browse/HDFS-3466 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, security >Affects Versions: 1.1.0, 2.0.0-alpha >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: hdfs-3466-b1-2.patch, hdfs-3466-b1.patch, > hdfs-3466-trunk-2.patch > > > Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find > the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to > do it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file
[ https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HDFS-3466: Attachment: hdfs-3466-trunk.patch > The SPNEGO filter for the NameNode should come out of the web keytab file > - > > Key: HDFS-3466 > URL: https://issues.apache.org/jira/browse/HDFS-3466 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, security >Affects Versions: 1.1.0, 2.0.0-alpha >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: hdfs-3466-b1-2.patch, hdfs-3466-b1.patch, > hdfs-3466-trunk-2.patch > > > Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find > the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to > do it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file
[ https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HDFS-3466: Attachment: (was: hdfs-3466-trunk.patch) > The SPNEGO filter for the NameNode should come out of the web keytab file > - > > Key: HDFS-3466 > URL: https://issues.apache.org/jira/browse/HDFS-3466 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, security >Affects Versions: 1.1.0, 2.0.0-alpha >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: hdfs-3466-b1-2.patch, hdfs-3466-b1.patch, > hdfs-3466-trunk-2.patch > > > Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find > the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to > do it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file
[ https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HDFS-3466: Attachment: hdfs-3466-b1-2.patch Here's a patch that incorporates Eli's feedback. > The SPNEGO filter for the NameNode should come out of the web keytab file > - > > Key: HDFS-3466 > URL: https://issues.apache.org/jira/browse/HDFS-3466 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, security >Affects Versions: 1.1.0, 2.0.0-alpha >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: hdfs-3466-b1-2.patch, hdfs-3466-b1.patch, > hdfs-3466-trunk.patch > > > Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find > the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to > do it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.
[ https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443624#comment-13443624 ] Hudson commented on HDFS-3849: -- Integrated in Hadoop-Mapreduce-trunk-Commit #2682 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2682/]) HDFS-3849. When re-loading the FSImage, we should clear the existing genStamp and leases. Contributed by Colin Patrick McCabe. (Revision 1378364) Result = FAILURE atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378364 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java > When re-loading the FSImage, we should clear the existing genStamp and leases. > -- > > Key: HDFS-3849 > URL: https://issues.apache.org/jira/browse/HDFS-3849 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.2.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Critical > Fix For: 2.2.0-alpha > > Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, > HDFS-3849.003.patch > > > When re-loading the FSImage, we should clear the existing genStamp and leases. 
> This is an issue in the 2NN, because it sometimes clears the existing FSImage > and reloads a new one in order to get back in sync with the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
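The reload-time reset committed in HDFS-3849 can be sketched as a toy model; the real FSImage/FSNamesystem/LeaseManager interaction is considerably more involved, and these names are placeholders:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the 2NN re-load problem: if leftover leases and the old
// generation stamp survive a reload, the in-memory namespace no longer
// matches the image being read in. clear() must run before loading.
public class FsImageReloadSketch {
    long genStamp = 1000L;
    final List<String> leases = new ArrayList<>();

    void clear() {
        genStamp = 0L;       // reset to the pre-load initial value
        leases.clear();      // drop leases belonging to the discarded image
    }

    void loadImage(long imageGenStamp, List<String> imageLeases) {
        clear();             // the fix: clear existing state first
        genStamp = imageGenStamp;
        leases.addAll(imageLeases);
    }

    public static void main(String[] args) {
        FsImageReloadSketch fs = new FsImageReloadSketch();
        fs.leases.add("/stale-open-file"); // state from the old image
        fs.loadImage(2000L, List.of("/open-file"));
        System.out.println(fs.genStamp + " " + fs.leases);
    }
}
```

Without the clear() call, the stale lease would survive the reload alongside the new image's state, which is the inconsistency the patch guards against.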
[jira] [Comment Edited] (HDFS-2264) NamenodeProtocol has the wrong value for clientPrincipal in KerberosInfo annotation
[ https://issues.apache.org/jira/browse/HDFS-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443610#comment-13443610 ] Aaron T. Myers edited comment on HDFS-2264 at 8/29/12 9:45 AM: --- Hey Jitendra, sorry for forgetting about this JIRA for so long (almost exactly a year!) I just encountered this issue again in a user's cluster. My new thinking is that we should just remove the expected client principal from the NamenodeProtocol entirely. I think this makes sense since the 2NN, SBN, BN, and balancer all potentially use this interface, so there's no single client principal that could reasonably be expected. The balancer, in particular, should be able to be run from any node, even one not running a daemon at all. I think to do what I propose here all we have to do is remove the clientPrincipal parameter from the SecurityInfo annotation on the NamenodeProtocol, and make sure that all of the methods exposed by this interface definitely check for super user privileges. I think most of them do, but we should ensure that they all do. How does this sound to you? was (Author: atm): Hey Jitendra, sorry for forgetting about this JIRA for so long (almost exactly a year!) I just encountered this issue again in a user's cluster. My new thinking is that we should just remove the expected client principal from the NamenodeProtocol entirely. I think this makes sense the 2NN, SBN, BN, and balancer all potentially use this interface, so there's no single client principal that could reasonably be expected. The balancer, in particular, should be able to be run from any node, even one not running a daemon at all. I think to do what I propose here all we have to do is remove the clientPrincipal parameter from the SecurityInfo annotation on the NamenodeProtocol, and make sure that all of the methods exposed by this interface definitely check for super user privileges. I think most of them do, but we should ensure that they all do. 
How does this sound to you? > NamenodeProtocol has the wrong value for clientPrincipal in KerberosInfo > annotation > --- > > Key: HDFS-2264 > URL: https://issues.apache.org/jira/browse/HDFS-2264 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.23.0 >Reporter: Aaron T. Myers >Assignee: Harsh J > Fix For: 0.24.0 > > Attachments: HDFS-2264.r1.diff > > > The {{@KerberosInfo}} annotation specifies the expected server and client > principals for a given protocol in order to look up the correct principal > name from the config. The {{NamenodeProtocol}} has the wrong value for the > client config key. This wasn't noticed because most setups actually use the > same *value* for both the NN and 2NN principals ({{hdfs/_HOST@REALM}}), > in which the {{_HOST}} part gets replaced at run-time.
[jira] [Commented] (HDFS-2264) NamenodeProtocol has the wrong value for clientPrincipal in KerberosInfo annotation
[ https://issues.apache.org/jira/browse/HDFS-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443610#comment-13443610 ] Aaron T. Myers commented on HDFS-2264: -- Hey Jitendra, sorry for forgetting about this JIRA for so long (almost exactly a year!) I just encountered this issue again in a user's cluster. My new thinking is that we should just remove the expected client principal from the NamenodeProtocol entirely. I think this makes sense since the 2NN, SBN, BN, and balancer all potentially use this interface, so there's no single client principal that could reasonably be expected. The balancer, in particular, should be able to be run from any node, even one not running a daemon at all. I think to do what I propose here all we have to do is remove the clientPrincipal parameter from the SecurityInfo annotation on the NamenodeProtocol, and make sure that all of the methods exposed by this interface definitely check for super user privileges. I think most of them do, but we should ensure that they all do. How does this sound to you? > NamenodeProtocol has the wrong value for clientPrincipal in KerberosInfo > annotation > --- > > Key: HDFS-2264 > URL: https://issues.apache.org/jira/browse/HDFS-2264 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.23.0 >Reporter: Aaron T. Myers >Assignee: Harsh J > Fix For: 0.24.0 > > Attachments: HDFS-2264.r1.diff > > > The {{@KerberosInfo}} annotation specifies the expected server and client > principals for a given protocol in order to look up the correct principal > name from the config. The {{NamenodeProtocol}} has the wrong value for the > client config key. This wasn't noticed because most setups actually use the > same *value* for both the NN and 2NN principals ({{hdfs/_HOST@REALM}}), > in which the {{_HOST}} part gets replaced at run-time.
This bug therefore > only manifests itself on secure setups which explicitly specify the NN and > 2NN principals. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
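The proposal in the comment above, dropping the annotation-level client principal and relying on a superuser check inside each protocol method, can be sketched with a hypothetical annotation. This is not the real Hadoop KerberosInfo/SecurityInfo class, just an illustration of the shape of the change:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical stand-in for Hadoop's KerberosInfo annotation, illustrating
// the proposal: keep serverPrincipal, drop clientPrincipal, and guard each
// method with an explicit superuser check instead.
public class KerberosInfoSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @interface KerberosInfo {
        String serverPrincipal();
        // clientPrincipal removed: 2NN, SBN, BN, and the balancer all call
        // this protocol, so no single expected client principal exists.
    }

    @KerberosInfo(serverPrincipal = "dfs.namenode.kerberos.principal")
    interface NamenodeProtocolSketch {
        long getTransactionId();
    }

    // Per-method authorization replaces the annotation-level client check.
    static boolean checkSuperuser(String caller, String superuser) {
        return caller.equals(superuser);
    }

    public static void main(String[] args) {
        System.out.println(checkSuperuser("hdfs", "hdfs"));
        System.out.println(checkSuperuser("mapred", "hdfs"));
    }
}
```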
[jira] [Comment Edited] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log
[ https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443586#comment-13443586 ] Aaron T. Myers edited comment on HDFS-3864 at 8/29/12 9:21 AM: --- Here's a patch which addresses the issue. Fortunately, the fix is quite simple - just apply the values that we read in from the edit log. In addition to the automated test provided in the patch, I also tested this manually on an HA cluster and confirmed that MR jobs no longer experience the "distributed cache object changed" errors which caused this issue to be discovered. was (Author: atm): Here's a patch which addresses the issue. Fortunately, the fix is quite simply - just apply the values that we read in from the edit log. In addition to the automated test provided in the patch, I also tested this manually on an HA cluster and confirmed that MR jobs no longer experience the :distributed cache object changed" errors which caused this issue to be discovered. > NN does not update internal file mtime for OP_CLOSE when reading from the > edit log > -- > > Key: HDFS-3864 > URL: https://issues.apache.org/jira/browse/HDFS-3864 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.0.0-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-3864.patch, HDFS-3864.patch > > > When logging an OP_CLOSE to the edit log, the NN writes out an updated file > mtime and atime. However, when reading in an OP_CLOSE from the edit log, the > NN does not apply these values to the in-memory FS data structure. Because of > this, a file's mtime or atime may appear to go back in time after an NN > restart, or an HA failover. 
> Most of the time this will be harmless and folks won't notice, but in the > event one of these files is being used in the distributed cache of an MR job > when an HA failover occurs, the job might notice that the mtime of a cache > file has changed, which in MR2 will cause the job to fail with an exception > like the following: > {noformat} > java.io.IOException: Resource > hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar > changed on src filesystem (expected 1342137814599, was 1342137814473 > at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {noformat} > Credit to Sujay Rau for discovering this issue. -- This message is automatically generated by JIRA. 
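The replay fix described above is conceptually tiny. A hedged sketch of the idea, with illustrative stand-in names (not Hadoop's actual FSEditLogLoader/INodeFile code):

```java
// Minimal sketch of the fix: when replaying a close op from the edit log,
// apply the logged mtime/atime to the in-memory inode instead of dropping them.
// Class and field names here are illustrative stand-ins, not Hadoop's classes.
class InodeTimes {
    long mtime;
    long atime;
}

class CloseOpReplay {
    static void applyCloseOp(InodeTimes inode, long loggedMtime, long loggedAtime) {
        // Before the fix, replay skipped this step, so a file's times could
        // "go back in time" after an NN restart or HA failover.
        inode.mtime = loggedMtime;
        inode.atime = loggedAtime;
    }
}
```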
[jira] [Updated] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log
[ https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3864: - Attachment: HDFS-3864.patch Thanks a lot for the quick review, Todd. Here's an updated patch which lowers the sleep time to 10 milliseconds. > NN does not update internal file mtime for OP_CLOSE when reading from the > edit log > -- > > Key: HDFS-3864 > URL: https://issues.apache.org/jira/browse/HDFS-3864 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.0.0-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-3864.patch, HDFS-3864.patch > > > When logging an OP_CLOSE to the edit log, the NN writes out an updated file > mtime and atime. However, when reading in an OP_CLOSE from the edit log, the > NN does not apply these values to the in-memory FS data structure. Because of > this, a file's mtime or atime may appear to go back in time after an NN > restart, or an HA failover. 
> Most of the time this will be harmless and folks won't notice, but in the > event one of these files is being used in the distributed cache of an MR job > when an HA failover occurs, the job might notice that the mtime of a cache > file has changed, which in MR2 will cause the job to fail with an exception > like the following: > {noformat} > java.io.IOException: Resource > hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar > changed on src filesystem (expected 1342137814599, was 1342137814473 > at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {noformat} > Credit to Sujay Rau for discovering this issue. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.
[ https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443598#comment-13443598 ] Hudson commented on HDFS-3849: -- Integrated in Hadoop-Hdfs-trunk-Commit #2716 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2716/]) HDFS-3849. When re-loading the FSImage, we should clear the existing genStamp and leases. Contributed by Colin Patrick McCabe. (Revision 1378364) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378364 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java > When re-loading the FSImage, we should clear the existing genStamp and leases. > -- > > Key: HDFS-3849 > URL: https://issues.apache.org/jira/browse/HDFS-3849 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.2.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Critical > Fix For: 2.2.0-alpha > > Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, > HDFS-3849.003.patch > > > When re-loading the FSImage, we should clear the existing genStamp and leases. 
> This is an issue in the 2NN, because it sometimes clears the existing FSImage > and reloads a new one in order to get back in sync with the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.
[ https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443596#comment-13443596 ] Hudson commented on HDFS-3849: -- Integrated in Hadoop-Common-trunk-Commit #2653 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2653/]) HDFS-3849. When re-loading the FSImage, we should clear the existing genStamp and leases. Contributed by Colin Patrick McCabe. (Revision 1378364) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378364 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java > When re-loading the FSImage, we should clear the existing genStamp and leases. > -- > > Key: HDFS-3849 > URL: https://issues.apache.org/jira/browse/HDFS-3849 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.2.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Critical > Fix For: 2.2.0-alpha > > Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, > HDFS-3849.003.patch > > > When re-loading the FSImage, we should clear the existing genStamp and leases. 
> This is an issue in the 2NN, because it sometimes clears the existing FSImage > and reloads a new one in order to get back in sync with the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log
[ https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443593#comment-13443593 ] Todd Lipcon commented on HDFS-3864: --- +1, looks good. One thing: do you really need a 5 second sleep here, or could you do with some small number of milliseconds? I'd think a 10ms sleep should be sufficient to always fail without the bug fix, so I don't see any reason to have a long-running test. > NN does not update internal file mtime for OP_CLOSE when reading from the > edit log > -- > > Key: HDFS-3864 > URL: https://issues.apache.org/jira/browse/HDFS-3864 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.0.0-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-3864.patch > > > When logging an OP_CLOSE to the edit log, the NN writes out an updated file > mtime and atime. However, when reading in an OP_CLOSE from the edit log, the > NN does not apply these values to the in-memory FS data structure. Because of > this, a file's mtime or atime may appear to go back in time after an NN > restart, or an HA failover. 
> Most of the time this will be harmless and folks won't notice, but in the > event one of these files is being used in the distributed cache of an MR job > when an HA failover occurs, the job might notice that the mtime of a cache > file has changed, which in MR2 will cause the job to fail with an exception > like the following: > {noformat} > java.io.IOException: Resource > hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar > changed on src filesystem (expected 1342137814599, was 1342137814473 > at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {noformat} > Credit to Sujay Rau for discovering this issue. -- This message is automatically generated by JIRA. 
[jira] [Updated] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.
[ https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3849: - Resolution: Fixed Fix Version/s: 2.2.0-alpha Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've just committed this to trunk and branch-2. Thanks a lot for the contribution, Colin. > When re-loading the FSImage, we should clear the existing genStamp and leases. > -- > > Key: HDFS-3849 > URL: https://issues.apache.org/jira/browse/HDFS-3849 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.2.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Critical > Fix For: 2.2.0-alpha > > Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, > HDFS-3849.003.patch > > > When re-loading the FSImage, we should clear the existing genStamp and leases. > This is an issue in the 2NN, because it sometimes clears the existing FSImage > and reloads a new one in order to get back in sync with the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3865) TestDistCp is @ignored
Colin Patrick McCabe created HDFS-3865: -- Summary: TestDistCp is @ignored Key: HDFS-3865 URL: https://issues.apache.org/jira/browse/HDFS-3865 Project: Hadoop HDFS Issue Type: Test Components: tools Affects Versions: 2.2.0-alpha Reporter: Colin Patrick McCabe Priority: Minor We should fix TestDistCp so that it actually runs, rather than being ignored. {code} @Ignore public class TestDistCp { private static final Log LOG = LogFactory.getLog(TestDistCp.class); private static List pathList = new ArrayList(); ... {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log
[ https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3864: - Status: Patch Available (was: Open) > NN does not update internal file mtime for OP_CLOSE when reading from the > edit log > -- > > Key: HDFS-3864 > URL: https://issues.apache.org/jira/browse/HDFS-3864 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.0.0-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-3864.patch > > > When logging an OP_CLOSE to the edit log, the NN writes out an updated file > mtime and atime. However, when reading in an OP_CLOSE from the edit log, the > NN does not apply these values to the in-memory FS data structure. Because of > this, a file's mtime or atime may appear to go back in time after an NN > restart, or an HA failover. > Most of the time this will be harmless and folks won't notice, but in the > event one of these files is being used in the distributed cache of an MR job > when an HA failover occurs, the job might notice that the mtime of a cache > file has changed, which in MR2 will cause the job to fail with an exception > like the following: > {noformat} > java.io.IOException: Resource > hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar > changed on src filesystem (expected 1342137814599, was 1342137814473 > at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153) > at 
org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {noformat} > Credit to Sujay Rau for discovering this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log
[ https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3864: - Attachment: HDFS-3864.patch Here's a patch which addresses the issue. Fortunately, the fix is quite simple - just apply the values that we read in from the edit log. In addition to the automated test provided in the patch, I also tested this manually on an HA cluster and confirmed that MR jobs no longer experience the "distributed cache object changed" errors which caused this issue to be discovered. > NN does not update internal file mtime for OP_CLOSE when reading from the > edit log > -- > > Key: HDFS-3864 > URL: https://issues.apache.org/jira/browse/HDFS-3864 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.0.0-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-3864.patch > > > When logging an OP_CLOSE to the edit log, the NN writes out an updated file > mtime and atime. However, when reading in an OP_CLOSE from the edit log, the > NN does not apply these values to the in-memory FS data structure. Because of > this, a file's mtime or atime may appear to go back in time after an NN > restart, or an HA failover. 
> Most of the time this will be harmless and folks won't notice, but in the > event one of these files is being used in the distributed cache of an MR job > when an HA failover occurs, the job might notice that the mtime of a cache > file has changed, which in MR2 will cause the job to fail with an exception > like the following: > {noformat} > java.io.IOException: Resource > hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar > changed on src filesystem (expected 1342137814599, was 1342137814473 > at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {noformat} > Credit to Sujay Rau for discovering this issue. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-3733) Audit logs should include WebHDFS access
[ https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443584#comment-13443584 ] Andy Isaacson commented on HDFS-3733: - OK, backing up -- I think my addition of CurClient just duplicates functionality already provided by NamenodeWebHdfsMethods#REMOTE_ADDRESS . So I can drop that new ThreadLocal and just teach NameNodeRpcServer to use REMOTE_ADDRESS appropriately. Or am I missing something? bq. getRemoteIp should not just return NamenodeWebHdfsMethods#getRemoteAddress (I assume you are referring to my newly added {{FSNamesystem#getRemoteIp}}.) Agreed, FSNamesystem should support all remote methods: RPC, WebHdfs ... and Hftp? The {{FSNamesystem#getRemoteIp}} should handle them all. The helper {{NameNodeRpcServer#getRemoteIp}} implements the WebHdfs portion of {{FSNamesystem#getRemoteIp}} just as {{Server#getRemoteIp}} implements the RPC portion. > Audit logs should include WebHDFS access > > > Key: HDFS-3733 > URL: https://issues.apache.org/jira/browse/HDFS-3733 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 2.0.0-alpha >Reporter: Andy Isaacson >Assignee: Andy Isaacson > Attachments: hdfs-3733.txt > > > Access via WebHdfs does not result in audit log entries. It should. > {noformat} > % curl "http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS"; > {"FileStatus":{"accessTime":1343351432395,"blockSize":134217728,"group":"supergroup","length":12,"modificationTime":1342808158399,"owner":"adi","pathSuffix":"","permission":"644","replication":1,"type":"FILE"}} > {noformat} > and observe that no audit log entry is generated. > Interestingly, OPEN requests do not generate audit log entries when the NN > generates the redirect, but do generate audit log entries when the second > phase against the DN is executed. > {noformat} > % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN' > ... 
> < HTTP/1.1 307 TEMPORARY_REDIRECT > < Location: > http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPEN&namenoderpcaddress=nn1:8020&offset=0 > ... > % curl -v > 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPEN&namenoderpcaddress=nn1:8020' > ... > < HTTP/1.1 200 OK > < Content-Type: application/octet-stream > < Content-Length: 12 > < Server: Jetty(6.1.26.cloudera.1) > < > hello world > {noformat} > This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} > thereby triggering the existing {{logAuditEvent}} code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
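The ThreadLocal approach discussed above can be sketched as follows. This is a hedged illustration of the idea (non-RPC entry points such as WebHdfs record the caller's address in a ThreadLocal, and one resolver consults the RPC layer first); method and field names are illustrative, not the actual NameNode code:

```java
// Sketch: resolve the remote caller's IP regardless of entry point.
// rpcRemoteIp() stands in for the RPC server's Server.getRemoteIp(), which
// returns null when the current thread is not handling an RPC call; WebHdfs
// (or Hftp) handlers would set REMOTE_ADDRESS before invoking FSNamesystem.
class RemoteIpResolver {
    static final ThreadLocal<String> REMOTE_ADDRESS = new ThreadLocal<>();

    // Stand-in for the RPC layer; always null outside an RPC handler here.
    static String rpcRemoteIp() {
        return null;
    }

    static String getRemoteIp() {
        String rpcIp = rpcRemoteIp();
        // Prefer the RPC context; fall back to the per-thread WebHdfs address.
        return (rpcIp != null) ? rpcIp : REMOTE_ADDRESS.get();
    }
}
```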
[jira] [Created] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log
Aaron T. Myers created HDFS-3864: Summary: NN does not update internal file mtime for OP_CLOSE when reading from the edit log Key: HDFS-3864 URL: https://issues.apache.org/jira/browse/HDFS-3864 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.0-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers When logging an OP_CLOSE to the edit log, the NN writes out an updated file mtime and atime. However, when reading in an OP_CLOSE from the edit log, the NN does not apply these values to the in-memory FS data structure. Because of this, a file's mtime or atime may appear to go back in time after an NN restart, or an HA failover. Most of the time this will be harmless and folks won't notice, but in the event one of these files is being used in the distributed cache of an MR job when an HA failover occurs, the job might notice that the mtime of a cache file has changed, which in MR2 will cause the job to fail with an exception like the following: {noformat} java.io.IOException: Resource hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar changed on src filesystem (expected 1342137814599, was 1342137814473 at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90) at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49) at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157) at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {noformat} Credit to Sujay Rau for discovering this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0
[ https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443577#comment-13443577 ] Colin Patrick McCabe commented on HDFS-3731: bq. Do you have a list of ones you know about? If not I can start pulling on that thread tomorrow. Sorry, I just took a preliminary look, didn't have time to go in depth. The state machine errors are pretty clear in the test. You may need to wait a while for them to appear since surefire does a lot of buffering. > 2.0 release upgrade must handle blocks being written from 1.0 > - > > Key: HDFS-3731 > URL: https://issues.apache.org/jira/browse/HDFS-3731 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 2.0.0-alpha >Reporter: Suresh Srinivas >Assignee: Colin Patrick McCabe >Priority: Blocker > Fix For: 2.2.0-alpha > > Attachments: hadoop1-bbw.tgz, HDFS-3731.002.patch, HDFS-3731.003.patch > > > Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 > release. Problem reported by Brahma Reddy. > The {{DataNode}} will only have one block pool after upgrading from a 1.x > release. (This is because in the 1.x releases, there were no block pools-- > or equivalently, everything was in the same block pool). During the upgrade, > we should hardlink the block files from the {{blocksBeingWritten}} directory > into the {{rbw}} directory of this block pool. Similarly, on {{-finalize}}, > we should delete the {{blocksBeingWritten}} directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3863) QJM: track last "committed" txid
[ https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443556#comment-13443556 ] Todd Lipcon commented on HDFS-3863: --- The design here is pretty simple, given the way our journaling protocol works. In particular, we only have one outstanding "batch" of transactions at once. We never send a batch of transactions beginning at txid N until the prior batch (up through N-1) has been accepted at a quorum of nodes. Thus, any {{sendEdits()}} call with {{firstTxId}} N implies a {{commit(N-1)}}. So, my plan is as follows: - Introduce a new file inside the journal directory called {{committed-txid}}. This would include a single numeric text line, similar to the {{seen_txid}} that the NameNode maintains. - Since this whole feature is not required for correctness, we don't need to fsync this file on every update. Instead, we can let the operating system write it out to disk whenever it so chooses. If, after a system crash, it reverts to an earlier value, this is OK, since our recovery protocol doesn't depend on it being up-to-date in any way. Put another way, the invariant is that the file contains a value which is a lower bound on the latest committed txn. The file would be updated whenever any sendEdits() call is made -- the call implicitly commits all edits prior to the current batch. This alone is enough for a good sanity check. If we want to also support reading the committed transactions while in-progress, it's not quite sufficient -- the last batch of transactions will never be readable if the NN stops writing new batches for a protracted period of time. To solve this, we can add a timer thread to the client which periodically (eg once or twice a second) sends an RPC to update the committed-txid on all of the nodes. The periodic timer will also have the nice property of causing a NN which has been fenced to abort itself even if no write transactions are taking place. 
> QJM: track last "committed" txid > > > Key: HDFS-3863 > URL: https://issues.apache.org/jira/browse/HDFS-3863 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha >Affects Versions: QuorumJournalManager (HDFS-3077) >Reporter: Todd Lipcon >Assignee: Todd Lipcon > > Per some discussion with [~stepinto] > [here|https://issues.apache.org/jira/browse/HDFS-3077?focusedCommentId=13422579&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422579], > we should keep track of the "last committed txid" on each JournalNode. Then > during any recovery operation, we can sanity-check that we aren't asked to > truncate a log to an earlier transaction. > This is also a necessary step if we want to support reading from in-progress > segments in the future (since we should only allow reads up to the commit > point) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
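The lower-bound invariant described in the comment above can be sketched in a few lines. This is a hedged illustration (names are made up for the sketch, not the QJM code): a sendEdits() batch starting at firstTxId implicitly commits everything through firstTxId - 1, and because the persisted value only needs to be a lower bound, it never has to be fsync'd and must never move backwards.

```java
// Sketch of tracking the last "committed" txid on a JournalNode.
class CommittedTxidTracker {
    private long committedTxid = 0;

    // A batch starting at firstTxId implies commit of everything before it.
    void onSendEdits(long firstTxId) {
        // Never move backwards; a stale value is safe because the invariant
        // only requires a lower bound on the latest committed txn.
        committedTxid = Math.max(committedTxid, firstTxId - 1);
    }

    // Recovery sanity check: refuse to truncate the log below the commit point.
    boolean canTruncateTo(long txid) {
        return txid >= committedTxid;
    }

    long get() {
        return committedTxid;
    }
}
```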
[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0
[ https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443551#comment-13443551 ] Robert Joseph Evans commented on HDFS-3731: --- Do you have a list of ones you know about? If not I can start pulling on that thread tomorrow. > 2.0 release upgrade must handle blocks being written from 1.0 > - > > Key: HDFS-3731 > URL: https://issues.apache.org/jira/browse/HDFS-3731 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 2.0.0-alpha >Reporter: Suresh Srinivas >Assignee: Colin Patrick McCabe >Priority: Blocker > Fix For: 2.2.0-alpha > > Attachments: hadoop1-bbw.tgz, HDFS-3731.002.patch, HDFS-3731.003.patch > > > Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 > release. Problem reported by Brahma Reddy. > The {{DataNode}} will only have one block pool after upgrading from a 1.x > release. (This is because in the 1.x releases, there were no block pools-- > or equivalently, everything was in the same block pool). During the upgrade, > we should hardlink the block files from the {{blocksBeingWritten}} directory > into the {{rbw}} directory of this block pool. Similarly, on {{-finalize}}, > we should delete the {{blocksBeingWritten}} directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
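The hardlinking step described in the issue above could look roughly like this. This is a hedged sketch using plain NIO, with illustrative path and method names -- it is not the DataNode upgrade code:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class BbwUpgradeSketch {
    // Sketch of the upgrade step: hardlink each file from the 1.x
    // blocksBeingWritten directory into the block pool's rbw directory.
    static void linkBbwIntoRbw(Path bbwDir, Path rbwDir) throws IOException {
        Files.createDirectories(rbwDir);
        try (DirectoryStream<Path> blocks = Files.newDirectoryStream(bbwDir)) {
            for (Path block : blocks) {
                Path target = rbwDir.resolve(block.getFileName());
                if (!Files.exists(target)) {
                    // Hardlink, not a copy: both names share the same inode,
                    // so rollback can still find the original files.
                    Files.createLink(target, block);
                }
            }
        }
    }
}
```

Hardlinking (rather than copying) is what makes the upgrade cheap and keeps rollback possible; on {{-finalize}} the old {{blocksBeingWritten}} names would simply be deleted.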
[jira] [Created] (HDFS-3863) QJM: track last "committed" txid
Todd Lipcon created HDFS-3863: - Summary: QJM: track last "committed" txid Key: HDFS-3863 URL: https://issues.apache.org/jira/browse/HDFS-3863 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Per some discussion with [~stepinto] [here|https://issues.apache.org/jira/browse/HDFS-3077?focusedCommentId=13422579&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422579], we should keep track of the "last committed txid" on each JournalNode. Then during any recovery operation, we can sanity-check that we aren't asked to truncate a log to an earlier transaction. This is also a necessary step if we want to support reading from in-progress segments in the future (since we should only allow reads up to the commit point) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.
[ https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443541#comment-13443541 ] Hadoop QA commented on HDFS-3849: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12542806/HDFS-3849.003.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHftpDelegationToken +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3112//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3112//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3112//console This message is automatically generated. > When re-loading the FSImage, we should clear the existing genStamp and leases. > -- > > Key: HDFS-3849 > URL: https://issues.apache.org/jira/browse/HDFS-3849 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.2.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, > HDFS-3849.003.patch > > > When re-loading the FSImage, we should clear the existing genStamp and leases. 
> This is an issue in the 2NN, because it sometimes clears the existing FSImage > and reloads a new one in order to get back in sync with the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1490) TransferFSImage should timeout
[ https://issues.apache.org/jira/browse/HDFS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443526#comment-13443526 ] Todd Lipcon commented on HDFS-1490: ---
- I don't like reusing the ipc ping interval for this timeout here. It's from an entirely separate module, and I don't see why one should correlate to the other. Why not introduce a new config which defaults to something like 1 minute?
- In the test case, shouldn't you somehow notify the servlet to exit? Currently it waits on itself, but nothing notifies it.
> TransferFSImage should timeout > -- > > Key: HDFS-1490 > URL: https://issues.apache.org/jira/browse/HDFS-1490 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: Dmytro Molkov >Assignee: Dmytro Molkov >Priority: Minor > Attachments: HDFS-1490.patch, HDFS-1490.patch > > > Sometimes when primary crashes during image transfer secondary namenode would > hang trying to read the image from HTTP connection forever. > It would be great to set timeouts on the connection so if something like that > happens there is no need to restart the secondary itself. > In our case restarting components is handled by the set of scripts and since > the Secondary as the process is running it would just stay hung until we get > an alarm saying the checkpointing doesn't happen. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
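The suggested fix amounts to setting connect and read timeouts on the transfer connection, driven by a dedicated config. A minimal sketch -- the config key name and default here are assumptions for illustration, not an actual Hadoop key:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

public class ImageTransferTimeout {
    // Hypothetical config key; the comment above proposes a new config
    // defaulting to about one minute, not a reuse of the ipc ping interval.
    static final String TIMEOUT_KEY = "dfs.image.transfer.timeout";
    static final int DEFAULT_TIMEOUT_MS = 60 * 1000;

    // Open a connection with both timeouts set, so a primary that crashes
    // mid-transfer cannot leave the secondary blocked in read() forever.
    public static URLConnection openWithTimeout(URL url, int timeoutMs) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(timeoutMs);
        conn.setReadTimeout(timeoutMs);
        return conn;
    }
}
```

With a read timeout in place, a hang surfaces as a {{SocketTimeoutException}} the checkpoint loop can retry, instead of a silently stuck 2NN.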
[jira] [Commented] (HDFS-3373) FileContext HDFS implementation can leak socket caches
[ https://issues.apache.org/jira/browse/HDFS-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443524#comment-13443524 ] Hadoop QA commented on HDFS-3373: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12542795/HDFS-3373.trunk.patch.1 against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHftpDelegationToken +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3110//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3110//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3110//console This message is automatically generated. > FileContext HDFS implementation can leak socket caches > -- > > Key: HDFS-3373 > URL: https://issues.apache.org/jira/browse/HDFS-3373 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 2.0.0-alpha, 3.0.0 >Reporter: Todd Lipcon >Assignee: John George > Attachments: HDFS-3373.branch-23.patch, HDFS-3373.trunk.patch, > HDFS-3373.trunk.patch.1 > > > As noted by Nicholas in HDFS-3359, FileContext doesn't have a close() method, > and thus never calls DFSClient.close(). 
This means that, until finalizers > run, DFSClient will hold on to its SocketCache object and potentially have a > lot of outstanding sockets/fds held on to. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3862) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics
[ https://issues.apache.org/jira/browse/HDFS-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443518#comment-13443518 ] Todd Lipcon commented on HDFS-3862: --- I think this might be the case for BookKeeper as well. Any of the folks working on BKJM want to take this on? I anticipate we would add a simple API to JournalManager like: {{boolean isNativelySingleWriter();}} or {{boolean needsExternalFencing();}}. Then the failover code could check the shared storage dir to see if this is the case, and if so, not error out if the user doesn't specify a fence method. > QJM: don't require a fencer to be configured if shared storage has built-in > single-writer semantics > --- > > Key: HDFS-3862 > URL: https://issues.apache.org/jira/browse/HDFS-3862 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha >Affects Versions: QuorumJournalManager (HDFS-3077) >Reporter: Todd Lipcon > > Currently, NN HA requires that the administrator configure a fencing method > to ensure that only a single NameNode may write to the shared storage at a > time. Some shared edits storage implementations (like QJM) inherently enforce > single-writer semantics at the storage level, and thus the user should not be > forced to specify one. > We should extend the JournalManager interface so that the HA code can operate > without a configured fencer if the JM has such built-in fencing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
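The API change floated above could be sketched like this. The method name is one of the candidates from the comment and the surrounding types are simplified stand-ins, not the committed interface:

```java
public class FencingCheckSketch {
    // Simplified stand-in for the real JournalManager interface, showing
    // only the proposed addition.
    interface JournalManager {
        // True if an external fencing method is still required, i.e. the
        // storage layer does not enforce single-writer semantics itself.
        boolean needsExternalFencing();
    }

    // Failover setup would only insist on a configured fencer when the
    // journal cannot fence natively (QJM, and possibly BKJM, could
    // return false here).
    static void checkFencing(JournalManager jm, String configuredFencer) {
        if (jm.needsExternalFencing() && configuredFencer == null) {
            throw new IllegalStateException("no fencing method configured");
        }
    }
}
```

This keeps the safety check for NFS-style shared dirs while letting storage with built-in epoch/ownership semantics skip the fencer requirement.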
[jira] [Created] (HDFS-3862) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics
Todd Lipcon created HDFS-3862: - Summary: QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics Key: HDFS-3862 URL: https://issues.apache.org/jira/browse/HDFS-3862 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Currently, NN HA requires that the administrator configure a fencing method to ensure that only a single NameNode may write to the shared storage at a time. Some shared edits storage implementations (like QJM) inherently enforce single-writer semantics at the storage level, and thus the user should not be forced to specify one. We should extend the JournalManager interface so that the HA code can operate without a configured fencer if the JM has such built-in fencing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3859) QJM: implement md5sum verification
[ https://issues.apache.org/jira/browse/HDFS-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443483#comment-13443483 ] Todd Lipcon commented on HDFS-3859: --- Sure, it's overkill, but it's not that expensive and we already have an implementation of it sitting around. It's also handy because "md5sum" is commonly available on the command line, and we use it for FSImages already as well. Performance-wise, my laptop can md5sum at about 500MB/sec, so given that log segments under recovery are likely to be much smaller than 500M, I don't think we should be concerned about that. > QJM: implement md5sum verification > -- > > Key: HDFS-3859 > URL: https://issues.apache.org/jira/browse/HDFS-3859 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: QuorumJournalManager (HDFS-3077) >Reporter: Todd Lipcon >Assignee: Todd Lipcon > > When the QJM passes journal segments between nodes, it should use an md5sum > field to make sure the data doesn't get corrupted during transit. This also > serves as an extra safe-guard to make sure that the data is consistent across > all nodes when finalizing a segment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
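For reference, computing an md5sum in Java uses the standard {{MessageDigest}} API. This is a generic sketch of checksumming segment bytes, not the actual QJM implementation (which would stream the segment file in chunks):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SegmentChecksum {
    // Compute an MD5 hex digest over a byte buffer. The hex form matches
    // what the command-line md5sum tool prints, which is part of the
    // convenience argued for above.
    public static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(data); // 16 bytes for MD5
        StringBuilder sb = new StringBuilder(32);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Since the digest here guards against bugs and in-flight corruption rather than adversaries, MD5's cryptographic weakness is irrelevant; matching the existing FSImage checksum machinery and command-line tooling is the deciding factor.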
[jira] [Commented] (HDFS-3859) QJM: implement md5sum verification
[ https://issues.apache.org/jira/browse/HDFS-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443476#comment-13443476 ] Steve Loughran commented on HDFS-3859: -- Isn't MD5 overkill? Can't a good CRC (like TCP Jumbo Frames uses) suffice? > QJM: implement md5sum verification > -- > > Key: HDFS-3859 > URL: https://issues.apache.org/jira/browse/HDFS-3859 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: QuorumJournalManager (HDFS-3077) >Reporter: Todd Lipcon >Assignee: Todd Lipcon > > When the QJM passes journal segments between nodes, it should use an md5sum > field to make sure the data doesn't get corrupted during transit. This also > serves as an extra safe-guard to make sure that the data is consistent across > all nodes when finalizing a segment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3861) Deadlock in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443463#comment-13443463 ] Colin Patrick McCabe commented on HDFS-3861: Looks good to me. > Deadlock in DFSClient > - > > Key: HDFS-3861 > URL: https://issues.apache.org/jira/browse/HDFS-3861 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Blocker > Fix For: 0.23.4, 3.0.0, 2.2.0-alpha > > Attachments: hdfs-3861.patch.txt > > > The deadlock is between DFSOutputStream#close() and DFSClient#close(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.
[ https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443445#comment-13443445 ] Aaron T. Myers commented on HDFS-3849: -- +1 pending Jenkins. > When re-loading the FSImage, we should clear the existing genStamp and leases. > -- > > Key: HDFS-3849 > URL: https://issues.apache.org/jira/browse/HDFS-3849 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.2.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, > HDFS-3849.003.patch > > > When re-loading the FSImage, we should clear the existing genStamp and leases. > This is an issue in the 2NN, because it sometimes clears the existing FSImage > and reloads a new one in order to get back in sync with the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.
[ https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3849: --- Attachment: HDFS-3849.003.patch * don't set DT config > When re-loading the FSImage, we should clear the existing genStamp and leases. > -- > > Key: HDFS-3849 > URL: https://issues.apache.org/jira/browse/HDFS-3849 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.2.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, > HDFS-3849.003.patch > > > When re-loading the FSImage, we should clear the existing genStamp and leases. > This is an issue in the 2NN, because it sometimes clears the existing FSImage > and reloads a new one in order to get back in sync with the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3861) Deadlock in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443401#comment-13443401 ] Kihwal Lee commented on HDFS-3861: -- - The test failures are not related to this patch. - No test was added. Existing test case exposed this bug (TestDataNodeDeath). - The findbugs warning is not caused by this patch. > Deadlock in DFSClient > - > > Key: HDFS-3861 > URL: https://issues.apache.org/jira/browse/HDFS-3861 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Blocker > Fix For: 0.23.4, 3.0.0, 2.2.0-alpha > > Attachments: hdfs-3861.patch.txt > > > The deadlock is between DFSOutputStream#close() and DFSClient#close(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2815) Namenode is not coming out of safemode when we perform ( NN crash + restart ) . Also FSCK report shows blocks missed.
[ https://issues.apache.org/jira/browse/HDFS-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443394#comment-13443394 ] Hadoop QA commented on HDFS-2815: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12542794/HDFS-2815-branch-1.patch against trunk revision . -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3111//console This message is automatically generated. > Namenode is not coming out of safemode when we perform ( NN crash + restart ) > . Also FSCK report shows blocks missed. > -- > > Key: HDFS-2815 > URL: https://issues.apache.org/jira/browse/HDFS-2815 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.22.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Critical > Fix For: 2.0.0-alpha, 3.0.0 > > Attachments: HDFS-2815-22-branch.patch, HDFS-2815-branch-1.patch, > HDFS-2815-Branch-1.patch, HDFS-2815.patch, HDFS-2815.patch > > > When tested the HA(internal) with continuous switch with some 5mins gap, > found some *blocks missed* and namenode went into safemode after next switch. > >After the analysis, i found that this files already deleted by clients. > But i don't see any delete commands logs namenode log files. But namenode > added that blocks to invalidateSets and DNs deleted the blocks. >When restart of the namenode, it went into safemode and expecting some > more blocks to come out of safemode. >Here the reason could be that, file has been deleted in memory and added > into invalidates after this it is trying to sync the edits into editlog file. > By that time NN asked DNs to delete that blocks. 
Now namenode shuts down > before persisting to editlogs.( log behind) >Due to this reason, we may not get the INFO logs about delete, and when we > restart the Namenode (in my scenario it is again switch), Namenode expects > this deleted blocks also, as delete request is not persisted into editlog > before. >I reproduced this scenario with debug points. *I feel, We should not add > the blocks to invalidates before persisting into Editlog*. > Note: for switch, we used kill -9 (force kill) > I am currently in 0.20.2 version. Same verified in 0.23 as well in normal > crash + restart scenario. > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning
[ https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443387#comment-13443387 ] Hadoop QA commented on HDFS-3837: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12542780/hdfs-3837.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHftpDelegationToken org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3108//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3108//console This message is automatically generated. 
> Fix DataNode.recoverBlock findbugs warning > -- > > Key: HDFS-3837 > URL: https://issues.apache.org/jira/browse/HDFS-3837 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt > > > HDFS-2686 introduced the following findbugs warning: > {noformat} > Call to equals() comparing different types in > org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock) > {noformat} > Both are using DatanodeID#equals but it's a different method because > DNR#equals overrides equals for some reason (doesn't change behavior). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0
[ https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443377#comment-13443377 ] Colin Patrick McCabe commented on HDFS-3731: bq. Any update on branch-0.23? Do you want me to look into it? There are some differences in the branch-0.23 BlockManager state machine, such that a straight port of the patch doesn't work. The easiest thing to do would probably be to backport some of the BlockManager fixes and improvements to branch-0.23. If you would look into that it would be good. > 2.0 release upgrade must handle blocks being written from 1.0 > - > > Key: HDFS-3731 > URL: https://issues.apache.org/jira/browse/HDFS-3731 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 2.0.0-alpha >Reporter: Suresh Srinivas >Assignee: Colin Patrick McCabe >Priority: Blocker > Fix For: 2.2.0-alpha > > Attachments: hadoop1-bbw.tgz, HDFS-3731.002.patch, HDFS-3731.003.patch > > > Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 > release. Problem reported by Brahma Reddy. > The {{DataNode}} will only have one block pool after upgrading from a 1.x > release. (This is because in the 1.x releases, there were no block pools-- > or equivalently, everything was in the same block pool). During the upgrade, > we should hardlink the block files from the {{blocksBeingWritten}} directory > into the {{rbw}} directory of this block pool. Similarly, on {{-finalize}}, > we should delete the {{blocksBeingWritten}} directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1
[ https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443375#comment-13443375 ] Colin Patrick McCabe commented on HDFS-3540: Hi Nicholas, Your summary seems reasonable to me overall. I agree with you that the recommended setting for edit log toleration should be disabled. Is there anything left to do for this JIRA? > Further improvement on recovery mode and edit log toleration in branch-1 > > > Key: HDFS-3540 > URL: https://issues.apache.org/jira/browse/HDFS-3540 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 1.2.0 >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > > *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1. However, the > recovery mode feature in branch-1 is dramatically different from the recovery > mode in trunk since the edit log implementations in these two branch are > different. For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not > in trunk. > *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy > UNCHECKED_REGION_LENGTH and to tolerate edit log corruption. > There are overlaps between these two features. We study potential further > improvement in this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443367#comment-13443367 ] Hudson commented on HDFS-3860: -- Integrated in Hadoop-Hdfs-trunk-Commit #2715 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2715/]) HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of namesystem. Contributed by Jing Zhao. (Revision 1378228) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378228 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java > HeartbeatManager#Monitor may wrongly hold the writelock of namesystem > - > > Key: HDFS-3860 > URL: https://issues.apache.org/jira/browse/HDFS-3860 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: 2.2.0-alpha > > Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch > > > In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the > monitor thread will acquire the write lock of namesystem, and recheck the > safemode. If it is in safemode, the monitor thread will return from the > heartbeatCheck function without release the write lock. This may cause the > monitor thread wrongly holding the write lock forever. > The attached test case tries to simulate this bad scenario. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
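The bug class fixed here is a lock acquired without a protecting try/finally, so an early return leaks it. A minimal sketch of the corrected pattern -- this simulates the locking shape only, not the real HeartbeatManager/FSNamesystem code:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class HeartbeatLockSketch {
    // Stand-in for the namesystem lock; field names are illustrative.
    private final ReentrantReadWriteLock namesystemLock = new ReentrantReadWriteLock();
    private boolean inSafeMode = true;

    void heartbeatCheck() {
        namesystemLock.writeLock().lock();
        try {
            if (inSafeMode) {
                // The early return is now safe: finally still runs and
                // releases the write lock, instead of holding it forever.
                return;
            }
            // ... remove the dead datanode under the write lock ...
        } finally {
            namesystemLock.writeLock().unlock();
        }
    }

    boolean writeLockHeld() {
        return namesystemLock.isWriteLocked();
    }
}
```

Pairing every lock() with unlock() in a finally block makes all exit paths (returns and exceptions alike) release the lock, which is exactly what the original monitor thread failed to do on the safemode path.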
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443353#comment-13443353 ] Hudson commented on HDFS-3860: -- Integrated in Hadoop-Common-trunk-Commit #2651 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2651/]) HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of namesystem. Contributed by Jing Zhao. (Revision 1378228) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378228 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java > HeartbeatManager#Monitor may wrongly hold the writelock of namesystem > - > > Key: HDFS-3860 > URL: https://issues.apache.org/jira/browse/HDFS-3860 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: 2.2.0-alpha > > Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch > > > In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the > monitor thread will acquire the write lock of namesystem, and recheck the > safemode. If it is in safemode, the monitor thread will return from the > heartbeatCheck function without release the write lock. This may cause the > monitor thread wrongly holding the write lock forever. > The attached test case tries to simulate this bad scenario. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3861) Deadlock in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443351#comment-13443351 ] Hadoop QA commented on HDFS-3861: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12542787/hdfs-3861.patch.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHftpDelegationToken org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3109//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3109//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3109//console This message is automatically generated. 
> Deadlock in DFSClient > - > > Key: HDFS-3861 > URL: https://issues.apache.org/jira/browse/HDFS-3861 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Blocker > Fix For: 0.23.4, 3.0.0, 2.2.0-alpha > > Attachments: hdfs-3861.patch.txt > > > The deadlock is between DFSOutputStream#close() and DFSClient#close(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443338#comment-13443338 ] Hudson commented on HDFS-3860: -- Integrated in Hadoop-Mapreduce-trunk-Commit #2680 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2680/]) HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of namesystem. Contributed by Jing Zhao. (Revision 1378228) Result = FAILURE atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378228 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java > HeartbeatManager#Monitor may wrongly hold the writelock of namesystem > - > > Key: HDFS-3860 > URL: https://issues.apache.org/jira/browse/HDFS-3860 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: 2.2.0-alpha > > Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch > > > In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the > monitor thread will acquire the write lock of namesystem, and recheck the > safemode. If it is in safemode, the monitor thread will return from the > heartbeatCheck function without release the write lock. This may cause the > monitor thread wrongly holding the write lock forever. > The attached test case tries to simulate this bad scenario. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3004: --- Attachment: recovery-mode.pdf Here is an updated Recovery Mode design document. > Implement Recovery Mode > --- > > Key: HDFS-3004 > URL: https://issues.apache.org/jira/browse/HDFS-3004 > Project: Hadoop HDFS > Issue Type: New Feature > Components: tools >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.0.0-alpha > > Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, > HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, > HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, > HDFS-3004.019.patch, HDFS-3004.020.patch, HDFS-3004.022.patch, > HDFS-3004.023.patch, HDFS-3004.024.patch, HDFS-3004.026.patch, > HDFS-3004.027.patch, HDFS-3004.029.patch, HDFS-3004.030.patch, > HDFS-3004.031.patch, HDFS-3004.032.patch, HDFS-3004.033.patch, > HDFS-3004.034.patch, HDFS-3004.035.patch, HDFS-3004.036.patch, > HDFS-3004.037.patch, HDFS-3004.038.patch, HDFS-3004.039.patch, > HDFS-3004.040.patch, HDFS-3004.041.patch, HDFS-3004.042.patch, > HDFS-3004.042.patch, HDFS-3004.042.patch, HDFS-3004.043.patch, > HDFS-3004__namenode_recovery_tool.txt, recovery-mode.pdf > > > When the NameNode metadata is corrupt for some reason, we want to be able to > fix it. Obviously, we would prefer never to get in this case. In a perfect > world, we never would. However, bad data on disk can happen from time to > time, because of hardware errors or misconfigurations. In the past we have > had to correct it manually, which is time-consuming and which can result in > downtime. > Recovery mode is initialized by the system administrator. When the NameNode > starts up in Recovery Mode, it will try to load the FSImage file, apply all > the edits from the edits log, and then write out a new image. Then it will > shut down. 
> Unlike in the normal startup process, the recovery mode startup process will > be interactive. When the NameNode finds something that is inconsistent, it > will prompt the operator as to what it should do. The operator can also > choose to take the first option for all prompts by starting up with the '-f' > flag, or typing 'a' at one of the prompts. > I have reused as much code as possible from the NameNode in this tool. > Hopefully, the effort that was spent developing this will also make the > NameNode editLog and image processing even more robust than it already is. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3373) FileContext HDFS implementation can leak socket caches
[ https://issues.apache.org/jira/browse/HDFS-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John George updated HDFS-3373: -- Attachment: HDFS-3373.trunk.patch.1 The TestConnCache failure is related to this JIRA. I moved testDisableCache() from that test to another test file because it is no longer possible to change the cache config per DFS. TestHftpDelegationToken is unrelated to this patch and has been failing in other builds as well. Attaching a patch with testDisableCache() moved from TestConnCache to a new file. > FileContext HDFS implementation can leak socket caches > -- > > Key: HDFS-3373 > URL: https://issues.apache.org/jira/browse/HDFS-3373 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 2.0.0-alpha, 3.0.0 >Reporter: Todd Lipcon >Assignee: John George > Attachments: HDFS-3373.branch-23.patch, HDFS-3373.trunk.patch, > HDFS-3373.trunk.patch.1 > > > As noted by Nicholas in HDFS-3359, FileContext doesn't have a close() method, > and thus never calls DFSClient.close(). This means that, until finalizers > run, DFSClient will hold on to its SocketCache object and potentially have a > lot of outstanding sockets/fds held on to. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3373) FileContext HDFS implementation can leak socket caches
[ https://issues.apache.org/jira/browse/HDFS-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John George updated HDFS-3373: -- Status: Patch Available (was: Open) > FileContext HDFS implementation can leak socket caches > -- > > Key: HDFS-3373 > URL: https://issues.apache.org/jira/browse/HDFS-3373 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 2.0.0-alpha, 3.0.0 >Reporter: Todd Lipcon >Assignee: John George > Attachments: HDFS-3373.branch-23.patch, HDFS-3373.trunk.patch, > HDFS-3373.trunk.patch.1 > > > As noted by Nicholas in HDFS-3359, FileContext doesn't have a close() method, > and thus never calls DFSClient.close(). This means that, until finalizers > run, DFSClient will hold on to its SocketCache object and potentially have a > lot of outstanding sockets/fds held on to. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3373) FileContext HDFS implementation can leak socket caches
[ https://issues.apache.org/jira/browse/HDFS-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John George updated HDFS-3373: -- Status: Open (was: Patch Available) > FileContext HDFS implementation can leak socket caches > -- > > Key: HDFS-3373 > URL: https://issues.apache.org/jira/browse/HDFS-3373 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 2.0.0-alpha, 3.0.0 >Reporter: Todd Lipcon >Assignee: John George > Attachments: HDFS-3373.branch-23.patch, HDFS-3373.trunk.patch > > > As noted by Nicholas in HDFS-3359, FileContext doesn't have a close() method, > and thus never calls DFSClient.close(). This means that, until finalizers > run, DFSClient will hold on to its SocketCache object and potentially have a > lot of outstanding sockets/fds held on to. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2815) Namenode is not coming out of safemode when we perform ( NN crash + restart ) . Also FSCK report shows blocks missed.
[ https://issues.apache.org/jira/browse/HDFS-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2815: -- Attachment: HDFS-2815-branch-1.patch > Namenode is not coming out of safemode when we perform ( NN crash + restart ) > . Also FSCK report shows blocks missed. > -- > > Key: HDFS-2815 > URL: https://issues.apache.org/jira/browse/HDFS-2815 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.22.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Critical > Fix For: 2.0.0-alpha, 3.0.0 > > Attachments: HDFS-2815-22-branch.patch, HDFS-2815-branch-1.patch, > HDFS-2815-Branch-1.patch, HDFS-2815.patch, HDFS-2815.patch > > > When tested the HA(internal) with continuous switch with some 5mins gap, > found some *blocks missed* and namenode went into safemode after next switch. > >After the analysis, i found that this files already deleted by clients. > But i don't see any delete commands logs namenode log files. But namenode > added that blocks to invalidateSets and DNs deleted the blocks. >When restart of the namenode, it went into safemode and expecting some > more blocks to come out of safemode. >Here the reason could be that, file has been deleted in memory and added > into invalidates after this it is trying to sync the edits into editlog file. > By that time NN asked DNs to delete that blocks. Now namenode shuts down > before persisting to editlogs.( log behind) >Due to this reason, we may not get the INFO logs about delete, and when we > restart the Namenode (in my scenario it is again switch), Namenode expects > this deleted blocks also, as delete request is not persisted into editlog > before. >I reproduced this scenario with debug points. *I feel, We should not add > the blocks to invalidates before persisting into Editlog*. > Note: for switch, we used kill -9 (force kill) > I am currently in 0.20.2 version. 
Same verified in 0.23 as well in normal > crash + restart scenario. > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-3861) Deadlock in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee reassigned HDFS-3861: Assignee: Kihwal Lee > Deadlock in DFSClient > - > > Key: HDFS-3861 > URL: https://issues.apache.org/jira/browse/HDFS-3861 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Blocker > Fix For: 0.23.4, 3.0.0, 2.2.0-alpha > > Attachments: hdfs-3861.patch.txt > > > The deadlock is between DFSOutputStream#close() and DFSClient#close(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes
[ https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443296#comment-13443296 ] Suresh Srinivas commented on HDFS-3791: --- When I added this in trunk, I was not sure if there was a use case. The whole idea was to give up the lock after deleting some number of blocks. So the number currently is arbitrary. > Backport HDFS-173 to Branch-1 : Recursively deleting a directory with > millions of files makes NameNode unresponsive for other commands until the > deletion completes > > > Key: HDFS-3791 > URL: https://issues.apache.org/jira/browse/HDFS-3791 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 1.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Fix For: 1.2.0 > > Attachments: HDFS-3791.patch, HDFS-3791.patch, HDFS-3791.patch > > > Backport HDFS-173. > see the > [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007] > for more details -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443292#comment-13443292 ] Jing Zhao commented on HDFS-3860: - I just checked all the invocations of namesystem#writelock / writeunlock, and did not find similar problems. I will check other similar code too. > HeartbeatManager#Monitor may wrongly hold the writelock of namesystem > - > > Key: HDFS-3860 > URL: https://issues.apache.org/jira/browse/HDFS-3860 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: 2.2.0-alpha > > Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch > > > In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the > monitor thread will acquire the write lock of namesystem, and recheck the > safemode. If it is in safemode, the monitor thread will return from the > heartbeatCheck function without release the write lock. This may cause the > monitor thread wrongly holding the write lock forever. > The attached test case tries to simulate this bad scenario. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443289#comment-13443289 ] Suresh Srinivas commented on HDFS-3860: --- Thanks Aaron for committing the patch. bq. BTW could you please also ensure that this pattern of code is not repeated in any other places. Going back to my previous comment, Jing, if possible can you also see if there other such issues. > HeartbeatManager#Monitor may wrongly hold the writelock of namesystem > - > > Key: HDFS-3860 > URL: https://issues.apache.org/jira/browse/HDFS-3860 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: 2.2.0-alpha > > Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch > > > In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the > monitor thread will acquire the write lock of namesystem, and recheck the > safemode. If it is in safemode, the monitor thread will return from the > heartbeatCheck function without release the write lock. This may cause the > monitor thread wrongly holding the write lock forever. > The attached test case tries to simulate this bad scenario. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning
[ https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443286#comment-13443286 ] Suresh Srinivas commented on HDFS-3837: --- If this is a findbugs issue, why not just add this to findbugs exclude? > Fix DataNode.recoverBlock findbugs warning > -- > > Key: HDFS-3837 > URL: https://issues.apache.org/jira/browse/HDFS-3837 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt > > > HDFS-2686 introduced the following findbugs warning: > {noformat} > Call to equals() comparing different types in > org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock) > {noformat} > Both are using DatanodeID#equals but it's a different method because > DNR#equals overrides equals for some reason (doesn't change behavior). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3860: - Resolution: Fixed Fix Version/s: 2.2.0-alpha Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've just committed this to trunk and branch-2. Thanks a lot for the contribution, Jing. > HeartbeatManager#Monitor may wrongly hold the writelock of namesystem > - > > Key: HDFS-3860 > URL: https://issues.apache.org/jira/browse/HDFS-3860 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: 2.2.0-alpha > > Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch > > > In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the > monitor thread will acquire the write lock of namesystem, and recheck the > safemode. If it is in safemode, the monitor thread will return from the > heartbeatCheck function without release the write lock. This may cause the > monitor thread wrongly holding the write lock forever. > The attached test case tries to simulate this bad scenario. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3861) Deadlock in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-3861: - Attachment: hdfs-3861.patch.txt > Deadlock in DFSClient > - > > Key: HDFS-3861 > URL: https://issues.apache.org/jira/browse/HDFS-3861 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha >Reporter: Kihwal Lee >Priority: Blocker > Fix For: 0.23.4, 3.0.0, 2.2.0-alpha > > Attachments: hdfs-3861.patch.txt > > > The deadlock is between DFSOutputStream#close() and DFSClient#close(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3861) Deadlock in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-3861: - Status: Patch Available (was: Open) > Deadlock in DFSClient > - > > Key: HDFS-3861 > URL: https://issues.apache.org/jira/browse/HDFS-3861 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha >Reporter: Kihwal Lee >Priority: Blocker > Fix For: 0.23.4, 3.0.0, 2.2.0-alpha > > Attachments: hdfs-3861.patch.txt > > > The deadlock is between DFSOutputStream#close() and DFSClient#close(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443271#comment-13443271 ] Aaron T. Myers commented on HDFS-3860: -- Oof, good catch, Jing. Fortunately this case seems like it would be pretty tough to hit, since if the NN is in SM then HeartbeatManager#heartbeatCheck will return early, so to hit this the NN would have to enter SM in a very short window of time. Still certainly worth fixing, though. The patch looks good to me. The findbugs warning is unrelated and TestHftpDelegationToken is known to currently be failing. +1, I'll commit this momentarily. > HeartbeatManager#Monitor may wrongly hold the writelock of namesystem > - > > Key: HDFS-3860 > URL: https://issues.apache.org/jira/browse/HDFS-3860 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch > > > In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the > monitor thread will acquire the write lock of namesystem, and recheck the > safemode. If it is in safemode, the monitor thread will return from the > heartbeatCheck function without release the write lock. This may cause the > monitor thread wrongly holding the write lock forever. > The attached test case tries to simulate this bad scenario. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3861) Deadlock in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443269#comment-13443269 ] Kihwal Lee commented on HDFS-3861: -- DFSClient#getLeaseRenewer() doesn't have to be synchronized since LeaseManager.Factory methods are synchronized. Multiple callers are still guaranteed to get a single live renewer back.
{noformat}
Java stack information for the threads listed above:
===
"Thread-28":
  at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1729)
  - waiting to lock <0xb5a05dc8> (a org.apache.hadoop.hdfs.DFSOutputStream)
  at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:674)
  at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:691)
  - locked <0xb5a06ed8> (a org.apache.hadoop.hdfs.DFSClient)
  at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:539)
  at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2386)
  - locked <0xb44b00e8> (a org.apache.hadoop.fs.FileSystem$Cache)
  at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2403)
  - locked <0xb44b0100> (a org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer)
  at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
"Thread-1175":
  at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:538)
  - waiting to lock <0xb5a06ed8> (a org.apache.hadoop.hdfs.DFSClient)
  at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:550)
  at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1757)
  - locked <0xb5a05dc8> (a org.apache.hadoop.hdfs.DFSOutputStream)
  at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:66)
  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:99)
  at org.apache.hadoop.hdfs.TestDatanodeDeath$Workload.run(TestDatanodeDeath.java:101)
{noformat}
> Deadlock in DFSClient > - > > Key: HDFS-3861 > URL: https://issues.apache.org/jira/browse/HDFS-3861 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha >Reporter: Kihwal Lee >Priority: Blocker > Fix For: 0.23.4, 3.0.0, 2.2.0-alpha > > Attachments: hdfs-3861.patch.txt > > > The deadlock is between DFSOutputStream#close() and DFSClient#close(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
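The stack trace in Kihwal's comment is a classic lock-order inversion. The sketch below is an illustrative stand-in (not the actual HDFS classes): one thread locks the client and then each stream, while the other locks a stream and then calls back into the synchronized client.

```java
/**
 * Minimal sketch of the HDFS-3861 deadlock shape. Client and Stream are
 * simplified stand-ins for DFSClient and DFSOutputStream; the method names
 * mirror the stack trace but the bodies are illustrative only.
 */
public class DeadlockSketch {
    static class Client {
        final Stream stream = new Stream(this);
        synchronized void close() {          // lock order: Client -> Stream
            stream.close();
        }
        synchronized void endFileLease() { } // needs the Client lock
    }

    static class Stream {
        final Client client;
        Stream(Client client) { this.client = client; }
        synchronized void close() {          // lock order: Stream -> Client
            client.endFileLease();
        }
    }

    public static void main(String[] args) {
        // Single-threaded this is safe because monitors are reentrant; the
        // hang needs two threads taking the two locks in opposite orders,
        // each holding its first lock while waiting forever on the second.
        new Client().close();
        System.out.println("single-threaded close completed");
    }
}
```

Kihwal's observation points at the escape hatch: if getLeaseRenewer() does not need the client lock (because the factory it delegates to is already synchronized), the Stream-to-Client lock edge disappears and the cycle is broken.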
[jira] [Created] (HDFS-3861) Deadlock in DFSClient
Kihwal Lee created HDFS-3861: Summary: Deadlock in DFSClient Key: HDFS-3861 URL: https://issues.apache.org/jira/browse/HDFS-3861 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Kihwal Lee Priority: Blocker Fix For: 0.23.4, 3.0.0, 2.2.0-alpha The deadlock is between DFSOutputStream#close() and DFSClient#close(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3852) TestHftpDelegationToken is broken after HADOOP-8225
[ https://issues.apache.org/jira/browse/HDFS-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443264#comment-13443264 ] Hadoop QA commented on HDFS-3852: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12542779/HDFS-3852.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3107//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3107//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3107//console This message is automatically generated. > TestHftpDelegationToken is broken after HADOOP-8225 > --- > > Key: HDFS-3852 > URL: https://issues.apache.org/jira/browse/HDFS-3852 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client, security >Affects Versions: 0.23.3, 2.1.0-alpha >Reporter: Aaron T. Myers >Assignee: Daryn Sharp > Attachments: HDFS-3852.patch > > > It's been failing in all builds for the last 2 days or so. Git bisect > indicates that it's due to HADOOP-8225. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes
[ https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443245#comment-13443245 ] Ted Yu commented on HDFS-3791: -- Currently a small deletion is determined by the constant BLOCK_DELETION_INCREMENT:
{code}
+ deleteNow = collectedBlocks.size() <= BLOCK_DELETION_INCREMENT;
{code}
I wonder if there is a use case where the increment should be configurable. > Backport HDFS-173 to Branch-1 : Recursively deleting a directory with > millions of files makes NameNode unresponsive for other commands until the > deletion completes > > > Key: HDFS-3791 > URL: https://issues.apache.org/jira/browse/HDFS-3791 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 1.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Fix For: 1.2.0 > > Attachments: HDFS-3791.patch, HDFS-3791.patch, HDFS-3791.patch > > > Backport HDFS-173. > see the > [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007] > for more details -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
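The idea Suresh describes — give up the lock after deleting some number of blocks — can be sketched as below. This is an illustrative sketch, not the HDFS-173 patch; the increment value and the lock object are assumptions.

```java
/**
 * Sketch of incremental deletion: drop blocks in chunks of
 * BLOCK_DELETION_INCREMENT and release the (simulated) namesystem lock
 * between chunks so other operations can make progress during a huge delete.
 */
public class IncrementalDelete {
    static final int BLOCK_DELETION_INCREMENT = 1000; // arbitrary, per the comment above
    static final Object namesystemLock = new Object();

    /** Returns how many times the lock was taken for the whole delete. */
    static int deleteBlocks(int totalBlocks) {
        int deleted = 0;
        int lockAcquisitions = 0;
        while (deleted < totalBlocks) {
            synchronized (namesystemLock) {  // hold the lock for one chunk only
                lockAcquisitions++;
                int chunk = Math.min(BLOCK_DELETION_INCREMENT, totalBlocks - deleted);
                deleted += chunk;            // "delete" this chunk of blocks
            }
            // lock released here: other RPCs can run before the next chunk
        }
        return lockAcquisitions;
    }

    public static void main(String[] args) {
        // 2500 blocks with an increment of 1000 -> the lock is taken 3 times
        // instead of being held once for the entire deletion.
        System.out.println("lock acquisitions: " + deleteBlocks(2500));
    }
}
```

Making BLOCK_DELETION_INCREMENT configurable, as Ted asks, would just mean reading the chunk size from configuration instead of a constant.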
[jira] [Updated] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning
[ https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3837: -- Attachment: hdfs-3837.txt The findbugs warning seems bogus - "This method calls equals(Object) on two references of different class types with no common subclasses. Therefore, the objects being compared are unlikely to be members of the same class at runtime". Both DatanodeInfo and DatanodeRegistration extend DatanodeID so they both share the equals implementation. Anyway, I'll put the relevant code back (cast the array) since this fixes the findbugs warning and is fine (just more verbose).
{code}
-DatanodeID[] datanodeids = rBlock.getLocations();
+DatanodeInfo[] targets = rBlock.getLocations();
+DatanodeID[] datanodeids = (DatanodeID[])targets;
{code}
Updated patch, includes the comments as well so it's clear both classes are using the same equals method. > Fix DataNode.recoverBlock findbugs warning > -- > > Key: HDFS-3837 > URL: https://issues.apache.org/jira/browse/HDFS-3837 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt > > > HDFS-2686 introduced the following findbugs warning: > {noformat} > Call to equals() comparing different types in > org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock) > {noformat} > Both are using DatanodeID#equals but it's a different method because > DNR#equals overrides equals for some reason (doesn't change behavior). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
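Eli's point — that two sibling subclasses sharing equals() from a common parent can legitimately be compared, even though findbugs flags the call — can be shown with a toy example. The three classes below are simplified stand-ins named after the discussion, not the real Hadoop types.

```java
/**
 * Toy demonstration of the HDFS-3837 findbugs scenario: DatanodeInfo and
 * DatanodeRegistration are different concrete types, but both inherit
 * DatanodeID's equals(), so comparing them is meaningful. Simplified
 * stand-ins only; the real classes carry much more state.
 */
public class EqualsSketch {
    static class DatanodeID {
        final String id;
        DatanodeID(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof DatanodeID && ((DatanodeID) o).id.equals(id);
        }
        @Override public int hashCode() { return id.hashCode(); }
    }
    static class DatanodeInfo extends DatanodeID {
        DatanodeInfo(String id) { super(id); }
    }
    static class DatanodeRegistration extends DatanodeID {
        DatanodeRegistration(String id) { super(id); }
    }

    public static void main(String[] args) {
        DatanodeInfo info = new DatanodeInfo("dn1:50010");
        DatanodeRegistration reg = new DatanodeRegistration("dn1:50010");
        // Different concrete types, same inherited equals(): the comparison
        // findbugs warns about actually works as intended here.
        System.out.println(info.equals(reg));
    }
}
```

The patch's alternative — upcasting the array to DatanodeID[] so the compared static types match — silences the warning without changing behavior, which is why Suresh's findbugs-exclude suggestion is also a reasonable option.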
[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0
[ https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443221#comment-13443221 ] Robert Joseph Evans commented on HDFS-3731: --- Any update on branch-0.23? Do you want me to look into it? > 2.0 release upgrade must handle blocks being written from 1.0 > - > > Key: HDFS-3731 > URL: https://issues.apache.org/jira/browse/HDFS-3731 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 2.0.0-alpha >Reporter: Suresh Srinivas >Assignee: Colin Patrick McCabe >Priority: Blocker > Fix For: 2.2.0-alpha > > Attachments: hadoop1-bbw.tgz, HDFS-3731.002.patch, HDFS-3731.003.patch > > > Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 > release. Problem reported by Brahma Reddy. > The {{DataNode}} will only have one block pool after upgrading from a 1.x > release. (This is because in the 1.x releases, there were no block pools-- > or equivalently, everything was in the same block pool). During the upgrade, > we should hardlink the block files from the {{blocksBeingWritten}} directory > into the {{rbw}} directory of this block pool. Similarly, on {{-finalize}}, > we should delete the {{blocksBeingWritten}} directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3852) TestHftpDelegationToken is broken after HADOOP-8225
[ https://issues.apache.org/jira/browse/HDFS-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443212#comment-13443212 ] Aaron T. Myers commented on HDFS-3852: -- Got it. Makes sense. Thanks for the explanation, Daryn, and thanks for looking into this issue. The patch looks good to me. +1 pending Jenkins. > TestHftpDelegationToken is broken after HADOOP-8225 > --- > > Key: HDFS-3852 > URL: https://issues.apache.org/jira/browse/HDFS-3852 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client, security >Affects Versions: 0.23.3, 2.1.0-alpha >Reporter: Aaron T. Myers >Assignee: Daryn Sharp > Attachments: HDFS-3852.patch > > > It's been failing in all builds for the last 2 days or so. Git bisect > indicates that it's due to HADOOP-8225. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3852) TestHftpDelegationToken is broken after HADOOP-8225
[ https://issues.apache.org/jira/browse/HDFS-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-3852: -- Attachment: HDFS-3852.patch The test is attempting to insert two tokens with the same service. The UGI's private creds is a list which happily accepted tokens with duplicate services and even duplicate tokens. When I changed UGI in HADOOP-8225 to allow extraction of a {{Credentials}} object from the UGI, it broke the test because {{Credentials}} uses a map for tokens which naturally doesn't allow for service dups. The test is really trying to ensure the correct token is retrieved for hftp so I changed the 2nd token to have a different service to prevent it replacing the first token. Arguably, multiple tokens for the same service with different kinds should be permissible. However in practice that is/was not "possible" because a {{Credentials}} (which doesn't allow service dups) is used to build up tokens to be dumped into the UGI. > TestHftpDelegationToken is broken after HADOOP-8225 > --- > > Key: HDFS-3852 > URL: https://issues.apache.org/jira/browse/HDFS-3852 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client, security >Affects Versions: 0.23.3, 2.1.0-alpha >Reporter: Aaron T. Myers >Assignee: Daryn Sharp > Attachments: HDFS-3852.patch > > > It's been failing in all builds for the last 2 days or so. Git bisect > indicates that it's due to HADOOP-8225. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
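The list-versus-map behavior described above can be demonstrated with plain collections. This sketch uses an ArrayList and a HashMap as stand-ins for the UGI's private credential list and the {{Credentials}} token map; the service strings and token labels are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TokenDedupSketch {
    // A list, like the UGI's private creds: tokens with duplicate
    // services (and even duplicate tokens) coexist happily.
    static int listSizeAfterDuplicateService() {
        List<String[]> credsList = new ArrayList<>();
        credsList.add(new String[] { "host:50470", "HFTP_TOKEN" });
        credsList.add(new String[] { "host:50470", "HDFS_TOKEN" });
        return credsList.size();
    }

    // A map keyed by service, like the Credentials token map: the second
    // put() silently replaces the first -- the behavior that broke the test.
    static int mapSizeAfterDuplicateService() {
        Map<String, String> tokenMap = new HashMap<>();
        tokenMap.put("host:50470", "HFTP_TOKEN");
        tokenMap.put("host:50470", "HDFS_TOKEN");
        return tokenMap.size();
    }

    public static void main(String[] args) {
        System.out.println(listSizeAfterDuplicateService()); // prints "2"
        System.out.println(mapSizeAfterDuplicateService());  // prints "1"
    }
}
```

The patch's fix follows directly: giving the second token a distinct service gives it its own map key, so it no longer replaces the first.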
[jira] [Updated] (HDFS-3852) TestHftpDelegationToken is broken after HADOOP-8225
[ https://issues.apache.org/jira/browse/HDFS-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-3852: -- Status: Patch Available (was: Open) > TestHftpDelegationToken is broken after HADOOP-8225 > --- > > Key: HDFS-3852 > URL: https://issues.apache.org/jira/browse/HDFS-3852 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client, security >Affects Versions: 0.23.3, 2.1.0-alpha >Reporter: Aaron T. Myers >Assignee: Daryn Sharp > Attachments: HDFS-3852.patch > > > It's been failing in all builds for the last 2 days or so. Git bisect > indicates that it's due to HADOOP-8225. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3856) TestHDFSServerPorts failure is causing surefire fork failure
[ https://issues.apache.org/jira/browse/HDFS-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443151#comment-13443151 ] Hudson commented on HDFS-3856: -- Integrated in Hadoop-Mapreduce-trunk #1179 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1179/]) Fixup CHANGELOG for HDFS-3856. (Revision 1377936) HDFS-3856. TestHDFSServerPorts failure is causing surefire fork failure. Contributed by Colin Patrick McCabe (Revision 1377934) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1377936 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1377934 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java > TestHDFSServerPorts failure is causing surefire fork failure > > > Key: HDFS-3856 > URL: https://issues.apache.org/jira/browse/HDFS-3856 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.2.0-alpha >Reporter: Thomas Graves >Assignee: Eli Collins >Priority: Blocker > Fix For: 2.2.0-alpha > > Attachments: hdfs-3856.txt, hdfs-3856.txt > > > We have been seeing the hdfs tests on trunk and branch-2 error out with fork > failures. I see the hadoop jenkins trunk build is also seeing these: > https://builds.apache.org/view/Hadoop/job/Hadoop-trunk/lastCompletedBuild/console -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3856) TestHDFSServerPorts failure is causing surefire fork failure
[ https://issues.apache.org/jira/browse/HDFS-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443123#comment-13443123 ] Hudson commented on HDFS-3856: -- Integrated in Hadoop-Hdfs-trunk #1148 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1148/]) Fixup CHANGELOG for HDFS-3856. (Revision 1377936) HDFS-3856. TestHDFSServerPorts failure is causing surefire fork failure. Contributed by Colin Patrick McCabe (Revision 1377934) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1377936 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1377934 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java > TestHDFSServerPorts failure is causing surefire fork failure > > > Key: HDFS-3856 > URL: https://issues.apache.org/jira/browse/HDFS-3856 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.2.0-alpha >Reporter: Thomas Graves >Assignee: Eli Collins >Priority: Blocker > Fix For: 2.2.0-alpha > > Attachments: hdfs-3856.txt, hdfs-3856.txt > > > We have been seeing the hdfs tests on trunk and branch-2 error out with fork > failures. I see the hadoop jenkins trunk build is also seeing these: > https://builds.apache.org/view/Hadoop/job/Hadoop-trunk/lastCompletedBuild/console -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning
[ https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443113#comment-13443113 ] Suresh Srinivas commented on HDFS-3837: --- Seems to me the findbugs warning is not fixed by the new patch, or is it a Jenkins error? Fixing this issue quickly will help, since currently all Jenkins precommit reports have a findbugs -1. {noformat} Call to equals() comparing different types in org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock) Bug type EC_UNRELATED_TYPES (click for details) In class org.apache.hadoop.hdfs.server.datanode.DataNode In method org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock) Actual type org.apache.hadoop.hdfs.protocol.DatanodeInfo Expected org.apache.hadoop.hdfs.server.protocol.DatanodeRegistration Value loaded from id Value loaded from bpReg org.apache.hadoop.hdfs.server.protocol.DatanodeRegistration.equals(Object) used to determine equality At DataNode.java:[line 1869] {noformat} > Fix DataNode.recoverBlock findbugs warning > -- > > Key: HDFS-3837 > URL: https://issues.apache.org/jira/browse/HDFS-3837 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-3837.txt, hdfs-3837.txt > > > HDFS-2686 introduced the following findbugs warning: > {noformat} > Call to equals() comparing different types in > org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock) > {noformat} > Both are using DatanodeID#equals but it's a different method because > DNR#equals overrides equals for some reason (doesn't change behavior). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443105#comment-13443105 ] Hadoop QA commented on HDFS-3860: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12542695/HDFS-3860.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHftpDelegationToken +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3106//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3106//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3106//console This message is automatically generated. 
> HeartbeatManager#Monitor may wrongly hold the writelock of namesystem > - > > Key: HDFS-3860 > URL: https://issues.apache.org/jira/browse/HDFS-3860 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch > > > In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the > monitor thread will acquire the write lock of namesystem, and recheck the > safemode. If it is in safemode, the monitor thread will return from the > heartbeatCheck function without release the write lock. This may cause the > monitor thread wrongly holding the write lock forever. > The attached test case tries to simulate this bad scenario. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
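The lock-leak pattern described in the issue can be sketched in isolation. This is a hedged illustration, not the real HeartbeatManager code: a plain ReentrantReadWriteLock stands in for the namesystem lock, and a boolean flag stands in for the safemode check. The buggy variant returns early while still holding the write lock; the fixed variant releases it in a finally block on every exit path.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class HeartbeatLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final boolean inSafeMode = true;

    // Buggy pattern: the early return skips unlock(), so the write lock
    // is held forever after this call.
    void heartbeatCheckBuggy() {
        lock.writeLock().lock();
        if (inSafeMode) {
            return; // write lock never released!
        }
        // ... remove dead datanode ...
        lock.writeLock().unlock();
    }

    // Fixed pattern: finally guarantees the release on every path,
    // including the safemode early return.
    void heartbeatCheckFixed() {
        lock.writeLock().lock();
        try {
            if (inSafeMode) {
                return;
            }
            // ... remove dead datanode ...
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean writeLockHeld() {
        return lock.isWriteLockedByCurrentThread();
    }

    public static void main(String[] args) {
        HeartbeatLockSketch s = new HeartbeatLockSketch();
        s.heartbeatCheckFixed();
        System.out.println(s.writeLockHeld()); // prints "false": released
        s.heartbeatCheckBuggy();
        System.out.println(s.writeLockHeld()); // prints "true": leaked
    }
}
```

Wrapping every lock/unlock pair in try/finally is the general defense, which is presumably why the follow-up comment asks whether the same pattern appears elsewhere.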
[jira] [Commented] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes
[ https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443103#comment-13443103 ] Uma Maheswara Rao G commented on HDFS-3791: --- Oh, I have just seen the comments. {quote} Uma sorry for the delay in reviewing this. +1 for the patch. {quote} No problem :-). Thanks a lot, Suresh for the reviews. Also thanks for rebasing it. I will try to get a patch for HDFS-2815 today in some time. > Backport HDFS-173 to Branch-1 : Recursively deleting a directory with > millions of files makes NameNode unresponsive for other commands until the > deletion completes > > > Key: HDFS-3791 > URL: https://issues.apache.org/jira/browse/HDFS-3791 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 1.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Fix For: 1.2.0 > > Attachments: HDFS-3791.patch, HDFS-3791.patch, HDFS-3791.patch > > > Backport HDFS-173. > see the > [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007] > for more details -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes
[ https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas resolved HDFS-3791. --- Resolution: Fixed Fix Version/s: 1.2.0 Hadoop Flags: Reviewed I committed the patch. Thank you Uma. > Backport HDFS-173 to Branch-1 : Recursively deleting a directory with > millions of files makes NameNode unresponsive for other commands until the > deletion completes > > > Key: HDFS-3791 > URL: https://issues.apache.org/jira/browse/HDFS-3791 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 1.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Fix For: 1.2.0 > > Attachments: HDFS-3791.patch, HDFS-3791.patch, HDFS-3791.patch > > > Backport HDFS-173. > see the > [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007] > for more details -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes
[ https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-3791: -- Attachment: HDFS-3791.patch Rebased the patch on latest branch-1 > Backport HDFS-173 to Branch-1 : Recursively deleting a directory with > millions of files makes NameNode unresponsive for other commands until the > deletion completes > > > Key: HDFS-3791 > URL: https://issues.apache.org/jira/browse/HDFS-3791 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 1.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-3791.patch, HDFS-3791.patch, HDFS-3791.patch > > > Backport HDFS-173. > see the > [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007] > for more details -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes
[ https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443075#comment-13443075 ] Suresh Srinivas commented on HDFS-3791: --- Uma sorry for the delay in reviewing this. +1 for the patch. > Backport HDFS-173 to Branch-1 : Recursively deleting a directory with > millions of files makes NameNode unresponsive for other commands until the > deletion completes > > > Key: HDFS-3791 > URL: https://issues.apache.org/jira/browse/HDFS-3791 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 1.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-3791.patch, HDFS-3791.patch > > > Backport HDFS-173. > see the > [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007] > for more details -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443056#comment-13443056 ] Suresh Srinivas commented on HDFS-3860: --- Jing, nice find. Submitting the patch. > HeartbeatManager#Monitor may wrongly hold the writelock of namesystem > - > > Key: HDFS-3860 > URL: https://issues.apache.org/jira/browse/HDFS-3860 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch > > > In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the > monitor thread will acquire the write lock of namesystem, and recheck the > safemode. If it is in safemode, the monitor thread will return from the > heartbeatCheck function without release the write lock. This may cause the > monitor thread wrongly holding the write lock forever. > The attached test case tries to simulate this bad scenario. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443058#comment-13443058 ] Suresh Srinivas commented on HDFS-3860: --- BTW could you please also ensure that this pattern of code is not repeated in any other places. > HeartbeatManager#Monitor may wrongly hold the writelock of namesystem > - > > Key: HDFS-3860 > URL: https://issues.apache.org/jira/browse/HDFS-3860 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch > > > In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the > monitor thread will acquire the write lock of namesystem, and recheck the > safemode. If it is in safemode, the monitor thread will return from the > heartbeatCheck function without release the write lock. This may cause the > monitor thread wrongly holding the write lock forever. > The attached test case tries to simulate this bad scenario. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-3860: -- Status: Patch Available (was: Open) > HeartbeatManager#Monitor may wrongly hold the writelock of namesystem > - > > Key: HDFS-3860 > URL: https://issues.apache.org/jira/browse/HDFS-3860 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch > > > In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the > monitor thread will acquire the write lock of namesystem, and recheck the > safemode. If it is in safemode, the monitor thread will return from the > heartbeatCheck function without release the write lock. This may cause the > monitor thread wrongly holding the write lock forever. > The attached test case tries to simulate this bad scenario. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-3847) using NFS As a shared storage for NameNode HA , how to ensure that only one write
[ https://issues.apache.org/jira/browse/HDFS-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reassigned HDFS-3847: --- Assignee: (was: Devaraj K) > using NFS As a shared storage for NameNode HA , how to ensure that only one > write > - > > Key: HDFS-3847 > URL: https://issues.apache.org/jira/browse/HDFS-3847 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.0.0-alpha, 2.0.1-alpha >Reporter: liaowenrui >Priority: Critical > Fix For: 2.0.0-alpha > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-3847) using NFS As a shared storage for NameNode HA , how to ensure that only one write
[ https://issues.apache.org/jira/browse/HDFS-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reassigned HDFS-3847: --- Assignee: Devaraj K > using NFS As a shared storage for NameNode HA , how to ensure that only one > write > - > > Key: HDFS-3847 > URL: https://issues.apache.org/jira/browse/HDFS-3847 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.0.0-alpha, 2.0.1-alpha >Reporter: liaowenrui >Assignee: Devaraj K >Priority: Critical > Fix For: 2.0.0-alpha > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira