[jira] [Commented] (HDFS-5462) Fail to compile in Branch HDFS-2832 with COMPILATION ERROR
[ https://issues.apache.org/jira/browse/HDFS-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813736#comment-13813736 ] Junping Du commented on HDFS-5462: -- Hi Wenwu, these should be warnings rather than errors, and they are not caused by any changes on HDFS-2832. Please check your build env.

Fail to compile in Branch HDFS-2832 with COMPILATION ERROR
---
Key: HDFS-5462 URL: https://issues.apache.org/jira/browse/HDFS-5462 Project: Hadoop HDFS Issue Type: Bug Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: wenwupeng

Failed to compile HDFS in branch HDFS-2832 with a COMPILATION ERROR ("OutputFormat is Sun proprietary API and may be removed in a future release"):

[INFO] Compiling 276 source files to /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/target/classes
[ERROR] COMPILATION ERROR :
[ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[32,48] com.sun.org.apache.xml.internal.serialize.OutputFormat is Sun proprietary API and may be removed in a future release
[ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[33,48] com.sun.org.apache.xml.internal.serialize.XMLSerializer is Sun proprietary API and may be removed in a future release
[ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java:[337,34] unreported exception java.io.IOException; must be caught or declared to be thrown
[ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java:[134,41] sun.nio.ch.DirectBuffer is Sun proprietary API and may be removed in a future release
[ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java:[135,14] sun.misc.Cleaner is Sun proprietary API and may be removed in a future release
[ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java:[136,22] sun.nio.ch.DirectBuffer is Sun proprietary API and may be removed in a future release
[ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[55,4] com.sun.org.apache.xml.internal.serialize.OutputFormat is Sun proprietary API and may be removed in a future release
[ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[55,33] com.sun.org.apache.xml.internal.serialize.OutputFormat is Sun proprietary API and may be removed in a future release
[ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[59,4] com.sun.org.apache.xml.internal.serialize.XMLSerializer is Sun proprietary API and may be removed in a future release
[ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[59,35] com.sun.org.apache.xml.internal.serialize.XMLSerializer is Sun proprietary API and may be removed in a future release
[INFO] 10 errors
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813747#comment-13813747 ] Jing Zhao commented on HDFS-5443: -
bq. Here actual problem is not containing the 0-sized blocks, but counting them also in safemode threshold as these are loaded as COMPLETE blocks
Agree. But in the meanwhile, we should also clear these 0-sized blocks, since if the corresponding file is only in a snapshot, I guess no one will finalize the block. That's why I think maybe we should fix this part in a separate jira. For the safemode part, as Vinay mentioned, the key issue is still that the current code, while loading the fsimage, fails to recognize an INodeFileUC if the file is in a snapshot and the deletion was on its parent/ancestral directory. I think HDFS-5428 can solve the problem, but it may be overkill, because in the current HDFS-5428 patch we need to keep records in the lease map and maintain these records even across snapshot deletion and renaming. Since the safemode issue only happens when starting the NN, can we fix the problem by:
1. recording extra information in the fsimage to indicate INodeFileUC that are only in snapshots
2. re-generating all the INodeFileUC when loading the fsimage
3. using a similar workaround as in HDFS-5283
For 1 and 2, we need to cover the files that are deleted through an ancestral directory. To avoid fsimage incompatibility, we can put the extra information into the under-construction-files section of the fsimage.

Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file. Key: HDFS-5443 URL: https://issues.apache.org/jira/browse/HDFS-5443 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0, 2.2.0 Reporter: Uma Maheswara Rao G Assignee: sathish

This issue is reported by Prakash and Sathish. On looking into the issue, the following things are happening:
1) Client added a block at the NN and just did logsync, so the NN has the block ID persisted.
2) Before returning the addBlock response to the client, take a snapshot of the root or a parent directory of that file.
3) Delete the parent directory of that file.
4) Now crash the NN without responding success to the client for that addBlock call.
Now on restart, the Namenode will get stuck in safemode.
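The stuck-in-safemode symptom comes down to the safe-block threshold arithmetic: the 0-sized blocks are loaded from the fsimage as COMPLETE and counted in the total, but no DataNode ever reports them as safe. A minimal sketch of that check, with our own names rather than the actual NameNode code:

```java
// Illustrative sketch (names are ours, not the NN's) of why phantom
// 0-sized COMPLETE blocks keep the NameNode in safemode: safemode exits
// only once reported (safe) blocks reach threshold * total blocks.
public class SafeModeCheck {
    static boolean canLeaveSafeMode(long safeBlocks, long totalBlocks, double threshold) {
        return safeBlocks >= (long) Math.ceil(threshold * totalBlocks);
    }

    public static void main(String[] args) {
        // 100 real blocks, all reported by DataNodes.
        System.out.println(canLeaveSafeMode(100, 100, 0.999)); // prints "true"
        // Same 100 reported blocks, plus 2 phantom 0-sized blocks loaded
        // as COMPLETE from the fsimage but existing on no DataNode.
        System.out.println(canLeaveSafeMode(100, 102, 0.999)); // prints "false"
    }
}
```

With the default-style threshold of 0.999, even two unreported phantom blocks are enough to keep the check false forever.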
[jira] [Updated] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5443: Attachment: 5443-test.patch
Uploaded unit tests to reproduce the issue while clearing the 0-sized blocks.
[jira] [Commented] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813770#comment-13813770 ] Uma Maheswara Rao G commented on HDFS-5443: ---
{quote} problem of 0-sized blocks is there with normal files also (HDFS-4516), but that will not cause in NN safemode because file will be an under construction file and 0-sized block will not be counted in safemode threshold. {quote}
Yep, this JIRA explains the same. Please see the description and first comment.
{quote} but counting them also in safemode threshold as these are loaded as COMPLETE blocks {quote}
The point here was that we don't need to keep them in snapshotted files (there was an inconsistency in the flow). If there is a simple way to wipe out all of a file's 0-sized blocks consistently, that would be good for addressing this. Anyway, maintaining leases may solve it, as that would be the same as a normal under-construction file. Let Sathish verify this with that patch. But I am a little uncomfortable with managing leases for snapshotted files, as they are read-only files with no need for leases. If all others are ok on that point, I will not object.
[jira] [Commented] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813776#comment-13813776 ] Uma Maheswara Rao G commented on HDFS-5443: --- Oh, I did not see your comment. Thanks Jing for the patch.
{quote} I think HDFS-5428 can solve the problem, but it may overkill the problem because in the current HDFS-5428 patch we need to keep records in the lease map, and maintain these records even for snapshot deletion and renaming. {quote}
Exactly. This is what I was trying to indicate with my comment above.
[jira] [Commented] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813777#comment-13813777 ] Vinay commented on HDFS-5443: -
bq. 1. recording extra information in fsimage to indicate INodeFileUC that are only in snapshots
This extra information is kept only as snapshot leases. It will be tracked all the time instead of only at checkpoint time.
bq. 2. re-generating all the INodeFileUC when loading fsimage
This will happen while loading the leases, and the blocksmap will also be updated with the UNDER_CONSTRUCTION state.
bq. 3. using a similar workaround as in HDFS-5283.
As we are already excluding under-construction blocks, this workaround is no longer required.
bq. To avoid the incompatibility of fsimage, we can put the extra information to the under construction files section of the fsimage.
Yes, exactly for this reason I went for the approach of storing these files as leases: this section is stored from the leases, and the leases are loaded from this section.
[jira] [Commented] (HDFS-5458) Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs
[ https://issues.apache.org/jira/browse/HDFS-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813782#comment-13813782 ] Hadoop QA commented on HDFS-5458: -
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612086/HDFS-5458-1.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5335//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5335//console
This message is automatically generated.
Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs
--
Key: HDFS-5458 URL: https://issues.apache.org/jira/browse/HDFS-5458 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Mike Mellenthin Attachments: HDFS-5458-1.patch

Saw a stacktrace of datanode startup with a bad volume, where even listing directories would throw an IOException. The failed volume threshold was set to 1, but startup would fatally error out in {{File#getCanonicalPath}} in {{getDataDirsFromURIs}}:
{code}
File dir = new File(dirURI.getPath());
try {
  dataNodeDiskChecker.checkDir(localFS, new Path(dir.toURI()));
  dirs.add(dir);
} catch (IOException ioe) {
  LOG.warn("Invalid " + DFS_DATANODE_DATA_DIR_KEY + " " + dir + " : ", ioe);
  invalidDirs.append("\"").append(dir.getCanonicalPath()).append("\" ");
}
{code}
Since {{getCanonicalPath}} can need to do I/O and thus throw an IOException, this catch clause doesn't properly protect startup from a failed volume.
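One way to keep a second I/O failure from escaping the catch block is to resolve the directory name defensively. A minimal sketch of that pattern (the helper name is ours, not the actual HDFS-5458 patch):

```java
import java.io.File;
import java.io.IOException;

public class SafeDirName {
    // Hypothetical helper, not the HDFS-5458 fix itself: produce a
    // directory name for an error message without letting a second I/O
    // failure escape. File#getCanonicalPath() may do I/O and throw
    // IOException; File#getAbsolutePath() never touches the disk.
    static String safeName(File dir) {
        try {
            return dir.getCanonicalPath();
        } catch (IOException ioe) {
            return dir.getAbsolutePath();
        }
    }

    public static void main(String[] args) {
        // Works even for paths that do not exist on this machine.
        System.out.println(safeName(new File("subdir/leaf")).endsWith("leaf"));
    }
}
```

Called inside the catch clause above, this keeps a failed volume from turning a warning into a fatal startup error.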
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813784#comment-13813784 ] Konstantin Shvachko commented on HDFS-2832: --- UUID#randomUUID generates RFC-4122 compliant UUIDs, which are unique *for all practical purposes*. RFC-4122 has a special note about distributed applications. But let's just think about it in general. randomUUID is based on a pseudo-random sequence of numbers, which is like a Mobius strip or just a loop. It actually works well if you generate IDs on a single node, because the sequence lasts long without repetitions. In our case we initiate thousands of pseudo-random sequences (one per node), each starting from a random number. Let's mark those starting numbers on the Mobius strip or the loop. Then we have actually decreased the probability of uniqueness, because now, in order to get a collision, one of the nodes only needs to reach the starting point of another node, rather than going all the way around the loop. So in a distributed environment we increase the probability of collision with each new node added. And when you add more storage types per node, you further increase the collision probability. "For all practical purposes", as I understand it, in this case means that the probability of non-unique IDs is low. But it does not mean impossible. The consequences of a storageID collision are pretty bad, and hard to detect and recover from. At the same time {{DataNode.createNewStorageId()}} generates unique IDs as of today. Why change it to a problematic approach? Part of the rationale is in HDFS-5115: making them UUIDs simplifies the generation logic. Looks like HDFS-5115 was based on an incomplete assumption:
bq. The Storage ID is currently generated from the DataNode's IP+Port+Random components
while in fact it also includes currentTime, which guarantees the uniqueness of IDs generated on the same node, unless somebody resets the machine clock to the past.

Enable support for heterogeneous storages in HDFS
-
Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: 20130813-HeterogeneousStorage.pdf, h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch

HDFS currently supports a configuration where storages are a list of directories. Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose changing the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection* of storages.
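To make the two ID schemes under discussion concrete, here is an illustrative sketch; the names and exact format are ours, not the actual DataNode code. The legacy-style ID mixes host, port, a random component, and wall-clock time, so two IDs minted on the same node cannot collide unless the clock moves backwards, while the UUID approach rests on randomness alone:

```java
import java.security.SecureRandom;
import java.util.UUID;

public class StorageIds {
    private static final SecureRandom RNG = new SecureRandom();

    // Legacy-style ID: host + port + random + current time. Uniqueness
    // per node is guaranteed by the timestamp component.
    static String legacyStyleId(String ip, int port) {
        return "DS-" + RNG.nextLong() + "-" + ip + "-" + port + "-"
                + System.currentTimeMillis();
    }

    // UUID-style ID: uniqueness rests entirely on the randomness of the
    // generator, which is the property questioned in the comment above.
    static String uuidStyleId() {
        return "DS-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        System.out.println(legacyStyleId("127.0.0.1", 50010));
        System.out.println(uuidStyleId());
    }
}
```

The sketch is only meant to show where the uniqueness guarantee comes from in each scheme, not to reproduce the real {{DataNode.createNewStorageId()}} format.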
[jira] [Commented] (HDFS-5427) not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart
[ https://issues.apache.org/jira/browse/HDFS-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813845#comment-13813845 ] Hudson commented on HDFS-5427: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #383 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/383/]) HDFS-5427. Not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart. Contributed by Vinay. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1538875)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotFSImageFormat.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotBlocksMap.java

not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart
-
Key: HDFS-5427 URL: https://issues.apache.org/jira/browse/HDFS-5427 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0, 2.2.0 Reporter: Vinay Assignee: Vinay Priority: Blocker Fix For: 2.3.0 Attachments: HDFS-5427-v2.patch, HDFS-5427.patch, HDFS-5427.patch

1. allow snapshots under dir /foo
2. create a file /foo/bar
3. create a snapshot s1 under /foo
4. delete the file /foo/bar
5. wait till checkpoint or do saveNamespace
6. restart NN
7. now try to read the file from snapshot /foo/.snapshot/s1/bar
The client will get a BlockMissingException. The reason is that while loading the deleted file list for a snapshottable dir from the fsimage, the blocks were not updated in the blocksmap.
[jira] [Commented] (HDFS-5456) NameNode startup progress creates new steps if caller attempts to create a counter for a step that doesn't already exist.
[ https://issues.apache.org/jira/browse/HDFS-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813846#comment-13813846 ] Hudson commented on HDFS-5456: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #383 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/383/]) HDFS-5456. NameNode startup progress creates new steps if caller attempts to create a counter for a step that doesn't already exist. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1538872)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/startupprogress/StartupProgress.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/startupprogress/TestStartupProgress.java

NameNode startup progress creates new steps if caller attempts to create a counter for a step that doesn't already exist.
-
Key: HDFS-5456 URL: https://issues.apache.org/jira/browse/HDFS-5456 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Critical Fix For: 3.0.0, 2.2.1 Attachments: HDFS-5456.1.patch

NameNode startup progress is supposed to be immutable after startup has completed. All methods are coded to ignore update attempts after startup has completed. However, {{StartupProgress#getCounter}} does not implement this correctly. If a caller attempts to get a counter for a new step that hasn't been seen before, then the method accidentally creates the step. This allocates additional space in the internal tracking data structures, so ultimately this is a memory leak.
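The bug class described above is a getter with get-or-create semantics that silently allocates tracking state for unknown keys even after the structure is supposed to be frozen. An illustrative sketch (our own names, not the actual StartupProgress code) of both the leaky pattern and the fix:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not the real StartupProgress class.
public class ProgressCounters {
    private final ConcurrentMap<String, AtomicLong> counters = new ConcurrentHashMap<>();
    private volatile boolean frozen = false;

    void freeze() { frozen = true; }

    // Leaky pattern: computeIfAbsent creates the step as a side effect of
    // a read, growing the internal map for every unknown key.
    AtomicLong getCounterLeaky(String step) {
        return counters.computeIfAbsent(step, s -> new AtomicLong());
    }

    // Fixed pattern: after freezing, unknown steps get a throwaway counter
    // and the internal tracking map is left untouched.
    AtomicLong getCounter(String step) {
        AtomicLong c = counters.get(step);
        if (c == null) {
            if (frozen) return new AtomicLong(); // updates are ignored, not tracked
            c = counters.computeIfAbsent(step, s -> new AtomicLong());
        }
        return c;
    }

    int trackedSteps() { return counters.size(); }

    public static void main(String[] args) {
        ProgressCounters p = new ProgressCounters();
        p.freeze();
        p.getCounter("phantom").incrementAndGet(); // goes to a throwaway counter
        System.out.println(p.trackedSteps()); // prints "0": no step was created
    }
}
```

Returning a throwaway counter keeps callers working while guaranteeing the frozen structure never grows, which is the behavior the HDFS-5456 description calls for.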
[jira] [Commented] (HDFS-5427) not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart
[ https://issues.apache.org/jira/browse/HDFS-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813899#comment-13813899 ] Hudson commented on HDFS-5427: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1600 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1600/]) HDFS-5427. Not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart. Contributed by Vinay. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1538875)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotFSImageFormat.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotBlocksMap.java
[jira] [Commented] (HDFS-5456) NameNode startup progress creates new steps if caller attempts to create a counter for a step that doesn't already exist.
[ https://issues.apache.org/jira/browse/HDFS-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813900#comment-13813900 ] Hudson commented on HDFS-5456: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1600 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1600/]) HDFS-5456. NameNode startup progress creates new steps if caller attempts to create a counter for a step that doesn't already exist. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1538872)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/startupprogress/StartupProgress.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/startupprogress/TestStartupProgress.java
[jira] [Created] (HDFS-5463) NameNode should limit to number of blocks per file
Vinay created HDFS-5463: --- Summary: NameNode should limit to number of blocks per file Key: HDFS-5463 URL: https://issues.apache.org/jira/browse/HDFS-5463 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinay Assignee: Vinay

Currently there is no limit to the number of blocks a user can write to a file, and the blocksize can also be set to the minimum possible. A user can write any number of blocks continuously, which may create problems for the NameNode's performance and service as the number of blocks in the file increases, because each time a new block is allocated, all blocks of the file are persisted; this can cause serious performance degradation. So the proposal is to limit the maximum number of blocks a user can write to a file, maybe 1024 blocks (with a 128 MB block size, the maximum file size would then be 128 GB).
[jira] [Updated] (HDFS-5463) NameNode should limit the number of blocks per file
[ https://issues.apache.org/jira/browse/HDFS-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-5463: Summary: NameNode should limit the number of blocks per file (was: NameNode should limit to number of blocks per file)
[jira] [Commented] (HDFS-5427) not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart
[ https://issues.apache.org/jira/browse/HDFS-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813917#comment-13813917 ] Hudson commented on HDFS-5427: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1574 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1574/]) HDFS-5427. Not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart. Contributed by Vinay. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1538875)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotFSImageFormat.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotBlocksMap.java
[jira] [Commented] (HDFS-5456) NameNode startup progress creates new steps if caller attempts to create a counter for a step that doesn't already exist.
[ https://issues.apache.org/jira/browse/HDFS-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813918#comment-13813918 ] Hudson commented on HDFS-5456: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1574 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1574/]) HDFS-5456. NameNode startup progress creates new steps if caller attempts to create a counter for a step that doesn't already exist. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1538872)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/startupprogress/StartupProgress.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/startupprogress/TestStartupProgress.java
[jira] [Commented] (HDFS-5463) NameNode should limit the number of blocks per file
[ https://issues.apache.org/jira/browse/HDFS-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813940#comment-13813940 ] Uma Maheswara Rao G commented on HDFS-5463: --- Hi Vinay, I think this is already addressed. Please see this parameter: {code} public static final String DFS_NAMENODE_MAX_BLOCKS_PER_FILE_KEY = "dfs.namenode.fs-limits.max-blocks-per-file"; public static final long DFS_NAMENODE_MAX_BLOCKS_PER_FILE_DEFAULT = 1024*1024; {code} {code} if (pendingFile.getBlocks().length >= maxBlocksPerFile) { throw new IOException("File has reached the limit on maximum number of" + " blocks (" + DFSConfigKeys.DFS_NAMENODE_MAX_BLOCKS_PER_FILE_KEY + "): " + pendingFile.getBlocks().length + " >= " + maxBlocksPerFile); } {code} Addressed as part of HDFS-4305. NameNode should limit the number of blocks per file --- Key: HDFS-5463 URL: https://issues.apache.org/jira/browse/HDFS-5463 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinay Assignee: Vinay Currently there is no limit to the number of blocks a user can write to a file, and the block size can also be set to the minimum possible. A user can write any number of blocks continuously, which may create problems for the NameNode's performance and service as the number of blocks in the file increases, because each time a new block is allocated, all blocks of the file are persisted; this can cause serious performance degradation. So the proposal is to limit the maximum number of blocks a user can write to a file, maybe 1024 blocks (if 128 MB is the block size, then 128 GB would be the max file size). -- This message was sent by Atlassian JIRA (v6.1#6144)
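As a quick sanity check on the numbers in this thread: with the default of 1024*1024 blocks per file quoted above and a 128 MB block size, the effective cap on a single file works out to 128 TB (the issue description's "1024 blocks / 128 GB" figure is a much tighter proposal than the default that was actually committed):

```java
// Arithmetic for the per-file block limit: max blocks * block size.
public class MaxFileSize {
    public static long maxFileBytes(long maxBlocks, long blockSizeBytes) {
        return maxBlocks * blockSizeBytes;
    }

    public static void main(String[] args) {
        long maxBlocks = 1024L * 1024L;         // default max-blocks-per-file
        long blockSize = 128L * 1024L * 1024L;  // 128 MB block size
        long tb = maxFileBytes(maxBlocks, blockSize) / (1024L * 1024L * 1024L * 1024L);
        System.out.println(tb + " TB");         // prints: 128 TB
    }
}
```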
[jira] [Resolved] (HDFS-5462) Fail to compile in Branch HDFS-2832 with COMPILATION ERROR
[ https://issues.apache.org/jira/browse/HDFS-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Sirianni resolved HDFS-5462. - Resolution: Fixed Fail to compile in Branch HDFS-2832 with COMPILATION ERROR --- Key: HDFS-5462 URL: https://issues.apache.org/jira/browse/HDFS-5462 Project: Hadoop HDFS Issue Type: Bug Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: wenwupeng Failed to compile HDFS in Branch HDFS-2832 with COMPILATION ERROR , OutputFormat is Sun proprietary API and may be removed in a future release [INFO] Compiling 276 source files to /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/target/classes [INFO] - [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[32,48] com.sun.org.apache.xml.internal.serialize.OutputFormat is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[33,48] com.sun.org.apache.xml.internal.serialize.XMLSerializer is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java:[337,34] unreported exception java.io.IOException; must be caught or declared to be thrown [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java:[134,41] sun.nio.ch.DirectBuffer is Sun proprietary API and may be removed in a future release [ERROR] 
/home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java:[135,14] sun.misc.Cleaner is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java:[136,22] sun.nio.ch.DirectBuffer is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[55,4] com.sun.org.apache.xml.internal.serialize.OutputFormat is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[55,33] com.sun.org.apache.xml.internal.serialize.OutputFormat is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[59,4] com.sun.org.apache.xml.internal.serialize.XMLSerializer is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[59,35] com.sun.org.apache.xml.internal.serialize.XMLSerializer is Sun proprietary API and may be removed in a future release [INFO] 10 errors -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5462) Fail to compile in Branch HDFS-2832 with COMPILATION ERROR
[ https://issues.apache.org/jira/browse/HDFS-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813942#comment-13813942 ] Eric Sirianni commented on HDFS-5462: - There seems to be an issue in the maven-compiler-plugin whereby, when an ERROR is detected, it incorrectly marks the compiler warnings as ERRORs as well. This makes it hard to see the actual ERROR in all the noise. It looks like [MCOMPILER-179|http://jira.codehaus.org/browse/MCOMPILER-179] (though that has been marked as fixed in Maven 3.0...). At any rate, the actual error is {code} [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java:[337,34] unreported exception java.io.IOException; must be caught or declared to be thrown {code} This has been fixed by [~arpitagarwal] (see my comment on HDFS-5448). Fail to compile in Branch HDFS-2832 with COMPILATION ERROR --- Key: HDFS-5462 URL: https://issues.apache.org/jira/browse/HDFS-5462 Project: Hadoop HDFS Issue Type: Bug Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: wenwupeng Failed to compile HDFS in Branch HDFS-2832 with COMPILATION ERROR , OutputFormat is Sun proprietary API and may be removed in a future release [INFO] Compiling 276 source files to /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/target/classes [INFO] - [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[32,48] com.sun.org.apache.xml.internal.serialize.OutputFormat is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[33,48] 
com.sun.org.apache.xml.internal.serialize.XMLSerializer is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java:[337,34] unreported exception java.io.IOException; must be caught or declared to be thrown [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java:[134,41] sun.nio.ch.DirectBuffer is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java:[135,14] sun.misc.Cleaner is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java:[136,22] sun.nio.ch.DirectBuffer is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[55,4] com.sun.org.apache.xml.internal.serialize.OutputFormat is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[55,33] com.sun.org.apache.xml.internal.serialize.OutputFormat is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[59,4] 
com.sun.org.apache.xml.internal.serialize.XMLSerializer is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[59,35] com.sun.org.apache.xml.internal.serialize.XMLSerializer is Sun proprietary API and may be removed in a future release [INFO] 10 errors -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5462) Fail to compile in Branch HDFS-2832 with COMPILATION ERROR
[ https://issues.apache.org/jira/browse/HDFS-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813960#comment-13813960 ] Arpit Agarwal commented on HDFS-5462: - Thanks for responding to this Eric. Wenwu, please resync to the latest revision. Fail to compile in Branch HDFS-2832 with COMPILATION ERROR --- Key: HDFS-5462 URL: https://issues.apache.org/jira/browse/HDFS-5462 Project: Hadoop HDFS Issue Type: Bug Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: wenwupeng Failed to compile HDFS in Branch HDFS-2832 with COMPILATION ERROR , OutputFormat is Sun proprietary API and may be removed in a future release [INFO] Compiling 276 source files to /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/target/classes [INFO] - [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[32,48] com.sun.org.apache.xml.internal.serialize.OutputFormat is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[33,48] com.sun.org.apache.xml.internal.serialize.XMLSerializer is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java:[337,34] unreported exception java.io.IOException; must be caught or declared to be thrown [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java:[134,41] sun.nio.ch.DirectBuffer is Sun proprietary API and may be removed in a future release 
[ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java:[135,14] sun.misc.Cleaner is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java:[136,22] sun.nio.ch.DirectBuffer is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[55,4] com.sun.org.apache.xml.internal.serialize.OutputFormat is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[55,33] com.sun.org.apache.xml.internal.serialize.OutputFormat is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[59,4] com.sun.org.apache.xml.internal.serialize.XMLSerializer is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[59,35] com.sun.org.apache.xml.internal.serialize.XMLSerializer is Sun proprietary API and may be removed in a future release [INFO] 10 errors -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814025#comment-13814025 ] Arpit Agarwal commented on HDFS-2832: - Konstantin, UUID generation uses a cryptographically secure PRNG. On Linux this is /dev/random, the fallback is SHA1PRNG with a period of 2^160. With a billion nodes the probability of a collision in a 128-bit space is less than 1 in 10^20. Note that what was previously the storageID is now the datanode UUID and it is generated once for the lifetime of a datanode. Enable support for heterogeneous storages in HDFS - Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: 20130813-HeterogeneousStorage.pdf, h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch HDFS currently supports a configuration where storages are a list of directories. Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose changing the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection* of storages. -- This message was sent by Atlassian JIRA (v6.1#6144)
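The collision estimate above follows from the standard birthday approximation p ≈ n(n-1) / (2 · 2^b) for n random IDs drawn from a b-bit space. A back-of-the-envelope check, treating the full 128 bits as random as the comment does:

```java
// Birthday-bound approximation for random ID collisions.
public class UuidCollisionBound {
    public static double collisionProbability(double n, int bits) {
        return n * (n - 1.0) / (2.0 * Math.pow(2.0, bits));
    }

    public static void main(String[] args) {
        double p = collisionProbability(1e9, 128); // a billion nodes
        System.out.println(p < 1e-20);             // prints: true (p is roughly 1.5e-21)
    }
}
```

A version-4 UUID actually has 122 random bits rather than 128, which weakens the bound to roughly 1 in 10^19 — still vanishingly small for any realistic cluster.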
[jira] [Commented] (HDFS-5333) Improvement of current HDFS Web UI
[ https://issues.apache.org/jira/browse/HDFS-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814070#comment-13814070 ] Luke Lu commented on HDFS-5333: --- Auto-redirect from the index page is nice. A link to alternative UI at the bottom would be very convenient for dev/qa/user to checkout alternative without having to enable/disable js and/or type explicit URLs. Improvement of current HDFS Web UI -- Key: HDFS-5333 URL: https://issues.apache.org/jira/browse/HDFS-5333 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Haohui Mai This is an umbrella jira for improving the current JSP-based HDFS Web UI. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5333) Improvement of current HDFS Web UI
[ https://issues.apache.org/jira/browse/HDFS-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814087#comment-13814087 ] Haohui Mai commented on HDFS-5333: -- Thanks for the feedback. I'll make sure it is addressed in HDFS-5444. Improvement of current HDFS Web UI -- Key: HDFS-5333 URL: https://issues.apache.org/jira/browse/HDFS-5333 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Haohui Mai This is an umbrella jira for improving the current JSP-based HDFS Web UI. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5458) Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs
[ https://issues.apache.org/jira/browse/HDFS-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814089#comment-13814089 ] Andrew Wang commented on HDFS-5458: --- Hey Mike, the patch looks good. I think this is small enough that we can commit it without a test. +1, thanks for the contribution. Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs -- Key: HDFS-5458 URL: https://issues.apache.org/jira/browse/HDFS-5458 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Mike Mellenthin Attachments: HDFS-5458-1.patch Saw a stacktrace of datanode startup with a bad volume, where even listing directories would throw an IOException. The failed volume threshold was set to 1, but it would fatally error out in {{File#getCanonicalPath}} in {{getDataDirsFromURIs}}: {code} File dir = new File(dirURI.getPath()); try { dataNodeDiskChecker.checkDir(localFS, new Path(dir.toURI())); dirs.add(dir); } catch (IOException ioe) { LOG.warn("Invalid " + DFS_DATANODE_DATA_DIR_KEY + " " + dir + " : ", ioe); invalidDirs.append("\"").append(dir.getCanonicalPath()).append("\" "); } {code} Since {{getCanonicalPath}} may need to do I/O and thus throw an IOException, this catch clause doesn't properly protect startup from a failed volume. -- This message was sent by Atlassian JIRA (v6.1#6144)
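One way to keep I/O out of the catch clause described above is to build the error message with File#getAbsolutePath, which is pure path manipulation and is not declared to throw IOException, unlike File#getCanonicalPath, which may touch the filesystem. This is a sketch of the general technique, not necessarily the committed patch:

```java
import java.io.File;

// Build the "invalid dir" message without doing any I/O, so a dead
// volume cannot throw from inside the error-reporting path itself.
public class InvalidDirMessage {
    public static String quote(File dir) {
        // getAbsolutePath never performs I/O and never throws IOException.
        return "\"" + dir.getAbsolutePath() + "\" ";
    }

    public static void main(String[] args) {
        System.out.println(quote(new File("/data/dn1")));
    }
}
```

The trade-off is that the absolute path may contain unresolved symlinks or "..", but for a log/error message that is usually acceptable, and it keeps the failed-volume threshold logic intact.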
[jira] [Updated] (HDFS-5458) Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs
[ https://issues.apache.org/jira/browse/HDFS-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5458: -- Resolution: Fixed Fix Version/s: 2.2.1 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed for 2.2.1. Thanks again, Mike! Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs -- Key: HDFS-5458 URL: https://issues.apache.org/jira/browse/HDFS-5458 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Mike Mellenthin Fix For: 2.2.1 Attachments: HDFS-5458-1.patch Saw a stacktrace of datanode startup with a bad volume, where even listing directories would throw an IOException. The failed volume threshold was set to 1, but it would fatally error out in {{File#getCanonicalPath}} in {{getDataDirsFromURIs}}: {code} File dir = new File(dirURI.getPath()); try { dataNodeDiskChecker.checkDir(localFS, new Path(dir.toURI())); dirs.add(dir); } catch (IOException ioe) { LOG.warn("Invalid " + DFS_DATANODE_DATA_DIR_KEY + " " + dir + " : ", ioe); invalidDirs.append("\"").append(dir.getCanonicalPath()).append("\" "); } {code} Since {{getCanonicalPath}} may need to do I/O and thus throw an IOException, this catch clause doesn't properly protect startup from a failed volume. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5458) Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs
[ https://issues.apache.org/jira/browse/HDFS-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814106#comment-13814106 ] Hudson commented on HDFS-5458: -- FAILURE: Integrated in Hadoop-trunk-Commit #4695 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4695/]) HDFS-5458. Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs. Contributed by Mike Mellenthin. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1539091) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs -- Key: HDFS-5458 URL: https://issues.apache.org/jira/browse/HDFS-5458 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Mike Mellenthin Fix For: 2.2.1 Attachments: HDFS-5458-1.patch Saw a stacktrace of datanode startup with a bad volume, where even listing directories would throw an IOException. The failed volume threshold was set to 1, but it would fatally error out in {{File#getCanonicalPath}} in {{getDataDirsFromURIs}}: {code} File dir = new File(dirURI.getPath()); try { dataNodeDiskChecker.checkDir(localFS, new Path(dir.toURI())); dirs.add(dir); } catch (IOException ioe) { LOG.warn("Invalid " + DFS_DATANODE_DATA_DIR_KEY + " " + dir + " : ", ioe); invalidDirs.append("\"").append(dir.getCanonicalPath()).append("\" "); } {code} Since {{getCanonicalPath}} may need to do I/O and thus throw an IOException, this catch clause doesn't properly protect startup from a failed volume. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (HDFS-5463) NameNode should limit the number of blocks per file
[ https://issues.apache.org/jira/browse/HDFS-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang resolved HDFS-5463. --- Resolution: Duplicate As Uma said above, I think this is handled as of 2.1.0 by HDFS-4305. Please re-open if you feel this is incorrect. Thanks Vinay. NameNode should limit the number of blocks per file --- Key: HDFS-5463 URL: https://issues.apache.org/jira/browse/HDFS-5463 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinay Assignee: Vinay Currently there is no limit to the number of blocks a user can write to a file, and the block size can also be set to the minimum possible. A user can write any number of blocks continuously, which may create problems for the NameNode's performance and service as the number of blocks in the file increases, because each time a new block is allocated, all blocks of the file are persisted; this can cause serious performance degradation. So the proposal is to limit the maximum number of blocks a user can write to a file, maybe 1024 blocks (if 128 MB is the block size, then 128 GB would be the max file size). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5252) Stable write is not handled correctly in someplace
[ https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814133#comment-13814133 ] Jing Zhao commented on HDFS-5252: - The patch looks good to me. One minor point is that maybe we do not need to always call sync-and-update-length here, which always fires an RPC call to the NN. Instead, maybe we can do hsync for DATA_SYNC, and only update the length when the stable flag is FILE_SYNC. This will save us some RPC calls for larger files. Stable write is not handled correctly in someplace -- Key: HDFS-5252 URL: https://issues.apache.org/jira/browse/HDFS-5252 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5252.001.patch When the client asks for a stable write but the prerequisite writes are not transferred to the NFS gateway, the stableness can't be honored. The NFS gateway has to treat the write as an unstable write and set the flag to UNSTABLE in the write response. One bug was found during testing with the Ubuntu client when copying one 1KB file. For small files like a 1KB file, the Ubuntu client does one stable write (with the FILE_SYNC flag). However, the NFS gateway missed one place (OpenFileCtx#doSingleWrite) where it sends the response with the flag NOT updated to UNSTABLE. With this bug, the client thinks the write is on disk and thus doesn't send COMMIT anymore. The following test tries to read the data back and of course fails to do so since the data was not synced. -- This message was sent by Atlassian JIRA (v6.1#6144)
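The optimization suggested in the comment above can be sketched as a dispatch on the NFS stable-write flag: sync the data for DATA_SYNC, and only pay the extra NameNode RPC that persists the file length when the client asked for FILE_SYNC. This is a hypothetical illustration; SyncTarget and its methods are stand-ins, not real HDFS or NFS-gateway APIs:

```java
import java.io.IOException;

// Dispatch a sync request according to the NFS "stable_how" flag.
public class StableWriteDispatch {
    interface SyncTarget {
        void hsync() throws IOException;                 // flush data to disks
        void hsyncAndUpdateLength() throws IOException;  // flush + extra NN RPC
    }

    enum StableHow { UNSTABLE, DATA_SYNC, FILE_SYNC }

    static void sync(SyncTarget out, StableHow how) throws IOException {
        switch (how) {
            case FILE_SYNC:
                out.hsyncAndUpdateLength(); // metadata must be durable too
                break;
            case DATA_SYNC:
                out.hsync();                // data only; skip the NN RPC
                break;
            default:
                break;                      // UNSTABLE: nothing to force
        }
    }

    public static void main(String[] args) throws IOException {
        sync(new SyncTarget() {
            public void hsync() { System.out.println("hsync"); }
            public void hsyncAndUpdateLength() { System.out.println("hsync+updateLength"); }
        }, StableHow.DATA_SYNC); // prints: hsync
    }
}
```

For large files written with many DATA_SYNC requests, this saves one NameNode round trip per sync, which is exactly the RPC cost the review comment calls out.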
[jira] [Assigned] (HDFS-3752) BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace at ANN in case of BKJM
[ https://issues.apache.org/jira/browse/HDFS-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reassigned HDFS-3752: - Assignee: (was: Todd Lipcon) BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace at ANN in case of BKJM --- Key: HDFS-3752 URL: https://issues.apache.org/jira/browse/HDFS-3752 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.0.0-alpha Reporter: Vinay 1. do {{saveNameSpace}} in ANN node by entering into safemode 2. in another new node, install standby NN and do BOOTSTRAPSTANDBY 3. Now StandBy NN will not able to copy the fsimage_txid from ANN This is because, SNN not able to find the next txid (txid+1) in shared storage. Just after {{saveNameSpace}} shared storage will have the new logsegment with only START_LOG_SEGEMENT edits op. and BookKeeper will not be able to read last entry from inprogress ledger. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-3752) BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace at ANN in case of BKJM
[ https://issues.apache.org/jira/browse/HDFS-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814206#comment-13814206 ] Todd Lipcon commented on HDFS-3752: --- This might have gotten fixed by HDFS-5080. I don't know anything about BKJM, so not going to work on this. BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace at ANN in case of BKJM --- Key: HDFS-3752 URL: https://issues.apache.org/jira/browse/HDFS-3752 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.0.0-alpha Reporter: Vinay 1. do {{saveNameSpace}} in ANN node by entering into safemode 2. in another new node, install standby NN and do BOOTSTRAPSTANDBY 3. Now StandBy NN will not able to copy the fsimage_txid from ANN This is because, SNN not able to find the next txid (txid+1) in shared storage. Just after {{saveNameSpace}} shared storage will have the new logsegment with only START_LOG_SEGEMENT edits op. and BookKeeper will not be able to read last entry from inprogress ledger. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5326: --- Attachment: HDFS-5326.003.patch * adjust protobufs as recommended in HDFS-5166. Specifically, create a {{PathBasedCacheDirectiveInfoProto}}, and use it in the add, modify, and list RPCs. This avoids having to duplicate those fields everywhere. * get rid of the descriptor / directive division. Having many different types for the same thing is confusing to users. The only advantage of the division is that it prevented using a Directive with an ID in a context where that was inappropriate; however, we can simply validate this in the one case it matters (in FSNamesystem when doing addDirective). This also gets rid of some long-standing WTFs (why does -removeDirective remove a descriptor?, etc.) * Both the directive type and the protobuf now have all fields optional. We can simply validate that fields exist when we need them. This will be helpful later in allowing us to compatibly add new fields, once compatibility becomes a big concern (in branch-2). * add {{modifyPathBasedCacheDirective}}, which modifies an existing PBCD. * in CacheManager, there were a few cases where we were converting a PBCE to a PBCD, just to get some field we could have accessed directly in the PBCE. Just access the field directly from the PBCE. * in CacheManager, use try ... catch and log all {{IOException}} objects that were thrown, rather than making the programmer duplicate the failure message in the log and in the thrown exception. This does change some indentation but it makes things much cleaner on the whole. * use standardized exceptions like {{AccessControlException}} rather than custom ones like {{AddPathBasedCacheDirectiveException}}. Add {{IdNotFoundException}} to the common set of exceptions. * {{addPathBasedCacheDirective}} now returns an ID, not a Directive. 
The previous situation was confusing because the object that was being returned had its ID based on what the NameNode set, but the rest of the fields left identical to what the client passed. This could result in some of the fields being wrong. So just return to the client what the server returned. * Similarly, {{removePathBasedCacheDirective}} now just takes an ID, not an object. It's confusing to take an object, since it obscures the fact that we only look at one field (ID). Making the parameter an object encourages people to try to remove by path or some other field, which simply won't work. Calling Directive#getId is straightforward and makes it obvious what is going on. * Make sure that AddPathBasedCacheDirectiveOp stores the ID of the created directive. Previously, we were relying on the ordering of the directives and the ID assignment order, which is brittle. If any edit log entries are unreadable, this strategy fails completely. Storing the ID is much more robust. add modifyDirective to cacheAdmin - Key: HDFS-5326 URL: https://issues.apache.org/jira/browse/HDFS-5326 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5326.003.patch We should add a way of modifying cache directives on the command-line, similar to how modifyCachePool works. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5326: --- Status: Patch Available (was: Open) add modifyDirective to cacheAdmin - Key: HDFS-5326 URL: https://issues.apache.org/jira/browse/HDFS-5326 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5326.003.patch We should add a way of modifying cache directives on the command-line, similar to how modifyCachePool works. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5464) Simplify block report diff calculation
Tsz Wo (Nicholas), SZE created HDFS-5464: Summary: Simplify block report diff calculation Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5464: - Status: Patch Available (was: Open) Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5464_20131105.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5464: - Attachment: h5464_20131105.patch h5464_20131105.patch: remove the delimiter logic. Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5464_20131105.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5364) Add OpenFileCtx cache
[ https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5364: - Attachment: HDFS-5364.006.patch Add OpenFileCtx cache - Key: HDFS-5364 URL: https://issues.apache.org/jira/browse/HDFS-5364 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, HDFS-5364.006.patch The NFS gateway can run out of memory when the stream timeout is set to a relatively long period (e.g., 1 minute) and a user uploads thousands of files in parallel. Each stream's DFSClient creates a DataStreamer thread, so the gateway will eventually run out of memory by creating too many threads. The NFS gateway should have an OpenFileCtx cache to limit the total number of open files. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5364) Add OpenFileCtx cache
[ https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814335#comment-13814335 ] Brandon Li commented on HDFS-5364: -- Thanks for the review. 1. done 2 and 3 are optimization of the eviction method. As we discussed offline, I will file a following up JIRA for that. 4. done. The lock needs to be held there to synchronize with insert operation. 5. done. nice catch! 6. done. Add OpenFileCtx cache - Key: HDFS-5364 URL: https://issues.apache.org/jira/browse/HDFS-5364 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, HDFS-5364.006.patch NFS gateway can run out of memory when the stream timeout is set to a relatively long period(e.g., 1 minute) and user uploads thousands of files in parallel. Each stream DFSClient creates a DataStreamer thread, and will eventually run out of memory by creating too many threads. NFS gateway should have a OpenFileCtx cache to limit the total opened files. -- This message was sent by Atlassian JIRA (v6.1#6144)
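The bounded-cache idea discussed above can be sketched with an access-ordered map that evicts the least-recently-used entry once a limit is reached. This is a hypothetical illustration of the LRU mechanism only — the real OpenFileCtx cache in HDFS-5364 also has to dump data and close the evicted stream and synchronize with concurrent writers, and none of the names below come from the patch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU sketch: an access-ordered LinkedHashMap with a size cap. */
public class OpenFileCache<K, V> {
    private final Map<K, V> map;

    public OpenFileCache(final int maxEntries) {
        // accessOrder = true, so get() refreshes an entry's recency
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;  // evict the LRU entry beyond the cap
            }
        };
    }

    public synchronized void put(K key, V value) { map.put(key, value); }
    public synchronized V get(K key) { return map.get(key); }
    public synchronized int size() { return map.size(); }

    public static void main(String[] args) {
        OpenFileCache<String, String> cache = new OpenFileCache<>(2);
        cache.put("/a", "ctxA");
        cache.put("/b", "ctxB");
        cache.get("/a");          // touch /a so /b becomes least recently used
        cache.put("/c", "ctxC");  // exceeds the cap, evicting /b
        System.out.println(cache.get("/b") == null); // true
        System.out.println(cache.size());            // 2
    }
}
```

Evicting on insert keeps the number of open DataStreamer threads bounded regardless of how many files clients upload in parallel.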
[jira] [Commented] (HDFS-5427) not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart
[ https://issues.apache.org/jira/browse/HDFS-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814338#comment-13814338 ] Hari Mankude commented on HDFS-5427: Is this patch going to be backported to 2.2 also? not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart - Key: HDFS-5427 URL: https://issues.apache.org/jira/browse/HDFS-5427 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0, 2.2.0 Reporter: Vinay Assignee: Vinay Priority: Blocker Fix For: 2.3.0 Attachments: HDFS-5427-v2.patch, HDFS-5427.patch, HDFS-5427.patch 1. allow snapshots under dir /foo 2. create a file /foo/bar 3. create a snapshot s1 under /foo 4. delete the file /foo/bar 5. wait till checkpoint or do saveNameSpace 6. restart NN. 7. Now try to read the file from snapshot /foo/.snapshot/s1/bar The client will get a BlockMissingException. The reason is that while loading the deleted file list for a snapshottable dir from the fsimage, the blocks were not updated in the blocksMap. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
[ https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814339#comment-13814339 ] Sandy Ryza commented on HDFS-5436: -- This appears to be causing the following when I try to set up a pseudo-distributed cluster. Any idea why? {code} java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.HftpFileSystem not found at java.util.ServiceLoader.fail(ServiceLoader.java:214) at java.util.ServiceLoader.access$400(ServiceLoader.java:164) at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:350) at java.util.ServiceLoader$1.next(ServiceLoader.java:421) at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2282) at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2293) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2310) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2349) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2331) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:168) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:353) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:446) at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:274) at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at 
java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72) at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} Also, the old package name is still referred to in ./hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Hftp.apt.vm and ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_1329348432655_0001_conf.xml. Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web Key: HDFS-5436 URL: https://issues.apache.org/jira/browse/HDFS-5436 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.3.0 Attachments: HDFS-5436.000.patch, HDFS-5436.001.patch, HDFS-5436.002.patch Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in different packages. This forces several methods in ByteInputStream and URLConnectionFactory to be public. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
[ https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814350#comment-13814350 ] Haohui Mai commented on HDFS-5436: -- The service loader should be looking at the contents of {noformat} /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem {noformat} Can you please double-check whether the file is up-to-date? Thanks for catching the bugs in the documentation; I'll file a jira to fix them shortly. Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web Key: HDFS-5436 URL: https://issues.apache.org/jira/browse/HDFS-5436 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.3.0 Attachments: HDFS-5436.000.patch, HDFS-5436.001.patch, HDFS-5436.002.patch Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in different packages. This forces several methods in ByteInputStream and URLConnectionFactory to be public. -- This message was sent by Atlassian JIRA (v6.1#6144)
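The symptom reported above (a ServiceConfigurationError because a provider-configuration file names a class that no longer exists) can be investigated with a small diagnostic. The resource name comes from the comment; the class and method names below are made up for illustration, and on a classpath without Hadoop jars the scan simply finds nothing:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Lists every copy of the FileSystem provider-configuration file on the
 * classpath and flags provider classes that cannot be loaded. A stale jar
 * shows up as a file that still names org.apache.hadoop.hdfs.HftpFileSystem
 * while the class itself is gone.
 */
public class FsProviderCheck {
    static final String RES = "META-INF/services/org.apache.hadoop.fs.FileSystem";

    /** Returns "url -> provider" lines, marking unloadable providers. */
    public static List<String> scan(ClassLoader cl) throws Exception {
        List<String> out = new ArrayList<>();
        for (URL url : Collections.list(cl.getResources(RES))) {
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    String name = line.trim();
                    if (name.isEmpty() || name.startsWith("#")) continue;
                    boolean loadable;
                    try { Class.forName(name, false, cl); loadable = true; }
                    catch (ClassNotFoundException e) { loadable = false; }
                    out.add(url + " -> " + name + (loadable ? "" : "  [MISSING]"));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        for (String s : scan(FsProviderCheck.class.getClassLoader())) {
            System.out.println(s);
        }
    }
}
```

A `[MISSING]` entry is exactly the condition under which `ServiceLoader` throws the `Provider ... not found` error seen in the stack trace.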
[jira] [Created] (HDFS-5465) Update the package names for hsftp / hftp in the documentation
Haohui Mai created HDFS-5465: Summary: Update the package names for hsftp / hftp in the documentation Key: HDFS-5465 URL: https://issues.apache.org/jira/browse/HDFS-5465 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor HDFS-5436 move HftpFileSystem and HsftpFileSystem to a different package. The documentation should be updated as well. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5364) Add OpenFileCtx cache
[ https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814357#comment-13814357 ] Hadoop QA commented on HDFS-5364: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612265/HDFS-5364.006.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs-nfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5338//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5338//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs-nfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5338//console This message is automatically generated. 
Add OpenFileCtx cache - Key: HDFS-5364 URL: https://issues.apache.org/jira/browse/HDFS-5364 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, HDFS-5364.006.patch NFS gateway can run out of memory when the stream timeout is set to a relatively long period(e.g., 1 minute) and user uploads thousands of files in parallel. Each stream DFSClient creates a DataStreamer thread, and will eventually run out of memory by creating too many threads. NFS gateway should have a OpenFileCtx cache to limit the total opened files. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814361#comment-13814361 ] Hadoop QA commented on HDFS-5326: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612243/HDFS-5326.003.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.protocolPB.TestClientNamenodeProtocolServerSideTranslatorPB org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5336//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5336//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5336//console This message is automatically generated. 
add modifyDirective to cacheAdmin - Key: HDFS-5326 URL: https://issues.apache.org/jira/browse/HDFS-5326 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5326.003.patch We should add a way of modifying cache directives on the command-line, similar to how modifyCachePool works. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5466) Update storage IDs when the pipeline is updated
Tsz Wo (Nicholas), SZE created HDFS-5466: Summary: Update storage IDs when the pipeline is updated Key: HDFS-5466 URL: https://issues.apache.org/jira/browse/HDFS-5466 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE In DFSOutputStream, when the nodes in the pipeline is updated, we should also update the storage IDs. Otherwise, the node list and the storage ID list are mismatched. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5466) Update storage IDs when the pipeline is updated
[ https://issues.apache.org/jira/browse/HDFS-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5466: - Attachment: h5466_20131105.patch h5466_20131105.patch: update storage IDs. Update storage IDs when the pipeline is updated --- Key: HDFS-5466 URL: https://issues.apache.org/jira/browse/HDFS-5466 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5466_20131105.patch In DFSOutputStream, when the nodes in the pipeline is updated, we should also update the storage IDs. Otherwise, the node list and the storage ID list are mismatched. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
[ https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814383#comment-13814383 ] Sandy Ryza commented on HDFS-5436: -- My bad, it looks like I had some old jars lying around. Thanks, [~wheat9]. Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web Key: HDFS-5436 URL: https://issues.apache.org/jira/browse/HDFS-5436 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.3.0 Attachments: HDFS-5436.000.patch, HDFS-5436.001.patch, HDFS-5436.002.patch Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in different packages. This forces several methods in ByteInputStream and URLConnectionFactory to be public. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5466) Update storage IDs when the pipeline is updated
[ https://issues.apache.org/jira/browse/HDFS-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5466: - Attachment: h5466_20131105b.patch h5466_20131105b.patch: add setPipeline(..) methods. BTW, this will fix TestEncryptedTransfer and TestCrcCorruption. Update storage IDs when the pipeline is updated --- Key: HDFS-5466 URL: https://issues.apache.org/jira/browse/HDFS-5466 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5466_20131105.patch, h5466_20131105b.patch In DFSOutputStream, when the nodes in the pipeline is updated, we should also update the storage IDs. Otherwise, the node list and the storage ID list are mismatched. -- This message was sent by Atlassian JIRA (v6.1#6144)
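The invariant behind the setPipeline(..) change can be sketched as follows: the datanode list and the storage-ID list must always be replaced together and have the same length, otherwise they become mismatched after pipeline recovery. Types are simplified to strings here and the class name is illustrative — the real method lives in DFSOutputStream:

```java
import java.util.Arrays;

/** Sketch of a pipeline that only accepts matched node/storage-ID updates. */
public class Pipeline {
    private String[] nodes = new String[0];
    private String[] storageIDs = new String[0];

    /** Replace both lists atomically; reject mismatched lengths. */
    public synchronized void setPipeline(String[] newNodes, String[] newStorageIDs) {
        if (newNodes.length != newStorageIDs.length) {
            throw new IllegalArgumentException("nodes and storage IDs differ in length: "
                + newNodes.length + " vs " + newStorageIDs.length);
        }
        this.nodes = newNodes;
        this.storageIDs = newStorageIDs;
    }

    public synchronized String describe() {
        return Arrays.toString(nodes) + " / " + Arrays.toString(storageIDs);
    }

    public static void main(String[] args) {
        Pipeline p = new Pipeline();
        p.setPipeline(new String[] {"dn1", "dn2"}, new String[] {"storage1", "storage2"});
        System.out.println(p.describe());
        try {
            // A recovery that updates nodes without storage IDs is rejected.
            p.setPipeline(new String[] {"dn1"}, new String[] {"storage1", "storage2"});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Funnelling every update through one method is what prevents the "node list updated but storage-ID list stale" state the issue describes.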
[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching
[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814426#comment-13814426 ] Andrew Wang commented on HDFS-5394: --- Sorry for leaving this for so long, got tied up in a variety of different things. Thanks for bumping it. Based on feedback thus far, I think we're close.
* I like the test stub for mlock

Nits:
* Unused imports in FsDatasetImpl and FsVolumeImpl
* Do we still need to rename {{getExecutor}} to {{getCacheExecutor}} in FsVolumeImpl?
* {{State#isUncaching()}} is unused
* Could use a core pool size of 0 for {{uncachingExecutor}}; I don't think it's that latency sensitive
* usedBytes javadoc: "more things to cache that we can't actually do because of" is an awkward turn of phrase; maybe say "assign more blocks than we can actually cache because of" instead
* MappableBlock#load javadoc: the visibleLeng parameter should be renamed to length. The return value is now also a MappableBlock, not a boolean.
* Key: rename {{id}} to {{blockId}} for clarity? Or add a bit of javadoc.
* Naming the HashMap {{replicaMap}} is confusing since there's already a datanode {{ReplicaMap}} class. Maybe {{mappableBlockMap}} instead?

Impl:
* Caching can fail if the underlying block is invalidated in between getting the block's filename and running the CacheTask. It'd be nice to distinguish this race from a real error for when we do metrics (and also quash the exception).
* If we get a {{DNA_CACHE}} for a block that is currently being uncached, shouldn't we try to cancel the uncache and re-cache it? The NN will resend the command, but it'd be better not to have to wait for that.
{code}
if ((value == null) || (value.state != State.CACHING)) {
{code}
* Could this be written with {{value.state == State.CACHING_CANCELLED}} instead? It would be clearer, and I believe equivalent, since {{uncacheBlock}} won't set the state to {{UNCACHING}} if it's {{CACHING}} or {{CACHING_CANCELLED}}.
* Even better would be interrupting a {{CachingTask}} on uncache, since it'll save us I/O and CPU.
* Could we combine {{CACHING_CANCELLED}} into {{UNCACHING}}? It seems like {{CachingTask}} could check for {{UNCACHING}} in that if statement at the end and uncache; same sort of change for {{uncacheBlock}}.
* I think using a switch/case on prevValue.state in uncacheBlock would be clearer.

Test:
* 6,000,000 milliseconds seems like a very long test timeout :) Can we change them to, say, 60,000?
* Are these new log prints for sanity checking? Maybe we can just remove them.
* Some of the comments seem to refer to a previous patch version that used a countdown latch.
* It's unclear what this is testing beyond caching and then uncaching a bunch of blocks. Can we check for log prints to see that it's actually cancelling as expected? Any other ideas for definitively hitting cancellation?

fix race conditions in DN caching and uncaching --- Key: HDFS-5394 URL: https://issues.apache.org/jira/browse/HDFS-5394 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5394-caching.001.patch, HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch The DN needs to handle situations where it is asked to cache the same replica more than once. (Currently, it can actually do two mmaps and mlocks.) It also needs to handle the situation where caching a replica is cancelled before said caching completes. -- This message was sent by Atlassian JIRA (v6.1#6144)
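The CACHING / CACHING_CANCELLED / UNCACHING interplay the review discusses can be sketched as a tiny state machine: an uncache request arriving while a block is still CACHING marks it cancelled rather than uncaching immediately, and the caching task checks for cancellation when it finishes. The state names follow the review comments; the locking, mmap/mlock work, and executor hand-offs of the actual patch are omitted, and this is not the patch's code:

```java
/** Toy replica-caching state machine for the cancellation race above. */
public class CacheState {
    public enum State { CACHING, CACHING_CANCELLED, CACHED, UNCACHING }

    private State state = State.CACHING;

    /** Called when a DNA_UNCACHE command arrives. */
    public synchronized void uncache() {
        // While the caching task is still running, flag cancellation instead
        // of jumping straight to UNCACHING.
        state = (state == State.CACHING || state == State.CACHING_CANCELLED)
                ? State.CACHING_CANCELLED : State.UNCACHING;
    }

    /** Called by the caching task once its work completes. */
    public synchronized State finishCaching() {
        state = (state == State.CACHING_CANCELLED) ? State.UNCACHING : State.CACHED;
        return state;
    }

    public synchronized State state() { return state; }

    public static void main(String[] args) {
        CacheState cancelled = new CacheState();
        cancelled.uncache();                           // DNA_UNCACHE during CACHING
        System.out.println(cancelled.finishCaching()); // UNCACHING
        CacheState normal = new CacheState();
        System.out.println(normal.finishCaching());    // CACHED
    }
}
```

This is the shape of the review's suggestion to merge CACHING_CANCELLED handling into the end of the caching task rather than racing two threads over the same entry.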
[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching
[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814437#comment-13814437 ] Colin Patrick McCabe commented on HDFS-5394: OK, I figured out the test failure. It seems that when computing how much data we can mlock, we must round up mmap'ed regions to the operating system page size. In the case of Linux, that is almost always 4096. The reason is that the OS manages memory in units of 4096 bytes; it is simply impossible to lock at a finer granularity than that. So we should take this into account in our statistics. I adjusted the test accordingly, and also added a skip if we don't have enough lockable memory available. fix race conditions in DN caching and uncaching --- Key: HDFS-5394 URL: https://issues.apache.org/jira/browse/HDFS-5394 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5394-caching.001.patch, HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch The DN needs to handle situations where it is asked to cache the same replica more than once. (Currently, it can actually do two mmaps and mlocks.) It also needs to handle the situation where caching a replica is cancelled before said caching completes. -- This message was sent by Atlassian JIRA (v6.1#6144)
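The page-size rounding described in that comment amounts to rounding every mapped length up to the next multiple of the OS page size (almost always 4096 on Linux) before charging it against the mlock budget. A minimal sketch, with illustrative names rather than the actual HDFS-5394 code:

```java
/** mlock/mmap operate on whole pages, so accounting must round up. */
public class PageRounder {
    /** Smallest multiple of pageSize that is >= length. */
    public static long roundUp(long length, long pageSize) {
        return ((length + pageSize - 1) / pageSize) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(roundUp(1, 4096));    // 4096: even 1 byte pins a whole page
        System.out.println(roundUp(4096, 4096)); // 4096: exact multiples are unchanged
        System.out.println(roundUp(4097, 4096)); // 8192: one byte over costs another page
    }
}
```

Without this rounding, the "used bytes" statistic undercounts and the datanode can attempt to mlock more than its configured limit actually allows.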
[jira] [Resolved] (HDFS-5466) Update storage IDs when the pipeline is updated
[ https://issues.apache.org/jira/browse/HDFS-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal resolved HDFS-5466. - Resolution: Fixed Fix Version/s: Heterogeneous Storage (HDFS-2832) Hadoop Flags: Reviewed +1 for the patch. I committed it to branch HDFS-2832. Update storage IDs when the pipeline is updated --- Key: HDFS-5466 URL: https://issues.apache.org/jira/browse/HDFS-5466 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: Heterogeneous Storage (HDFS-2832) Attachments: h5466_20131105.patch, h5466_20131105b.patch In DFSOutputStream, when the nodes in the pipeline is updated, we should also update the storage IDs. Otherwise, the node list and the storage ID list are mismatched. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5364) Add OpenFileCtx cache
[ https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5364: - Attachment: HDFS-5364.007.patch Add OpenFileCtx cache - Key: HDFS-5364 URL: https://issues.apache.org/jira/browse/HDFS-5364 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, HDFS-5364.006.patch, HDFS-5364.007.patch NFS gateway can run out of memory when the stream timeout is set to a relatively long period(e.g., 1 minute) and user uploads thousands of files in parallel. Each stream DFSClient creates a DataStreamer thread, and will eventually run out of memory by creating too many threads. NFS gateway should have a OpenFileCtx cache to limit the total opened files. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5364) Add OpenFileCtx cache
[ https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814459#comment-13814459 ] Hadoop QA commented on HDFS-5364: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612290/HDFS-5364.007.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs-nfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5339//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5339//console This message is automatically generated. Add OpenFileCtx cache - Key: HDFS-5364 URL: https://issues.apache.org/jira/browse/HDFS-5364 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, HDFS-5364.006.patch, HDFS-5364.007.patch NFS gateway can run out of memory when the stream timeout is set to a relatively long period(e.g., 1 minute) and user uploads thousands of files in parallel. 
Each stream DFSClient creates a DataStreamer thread, and will eventually run out of memory by creating too many threads. NFS gateway should have a OpenFileCtx cache to limit the total opened files. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5443: Attachment: HDFS-5443.000.patch Upload a simple patch that tries to delete the 0-sized block for INodeFileUC. Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file. Key: HDFS-5443 URL: https://issues.apache.org/jira/browse/HDFS-5443 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0, 2.2.0 Reporter: Uma Maheswara Rao G Assignee: sathish Attachments: 5443-test.patch, HDFS-5443.000.patch This issue is reported by Prakash and Sathish. On looking into the issue, the following things are happening: 1) The client added a block at the NN, which just did a logsync, so the NN has the block ID persisted. 2) Before returning the addBlock response to the client, take a snapshot of the root or parent directories for that file. 3) Delete the parent directory for that file. 4) Now crash the NN without responding success to the client for that addBlock call. Now, on restart, the Namenode will get stuck in safemode. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5467) Remove tab characters in hdfs-default.xml
Andrew Wang created HDFS-5467: - Summary: Remove tab characters in hdfs-default.xml Key: HDFS-5467 URL: https://issues.apache.org/jira/browse/HDFS-5467 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Andrew Wang Priority: Trivial The retrycache parameters are indented with tabs rather than the normal 2 spaces. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5443: Status: Patch Available (was: Open) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file. Key: HDFS-5443 URL: https://issues.apache.org/jira/browse/HDFS-5443 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 2.2.0, 3.0.0 Reporter: Uma Maheswara Rao G Assignee: sathish Attachments: 5443-test.patch, HDFS-5443.000.patch This issue is reported by Prakash and Sathish. On looking into the issue following things are happening. . 1) Client added block at NN and just did logsync So, NN has block ID persisted. 2)Before returning addblock response to client take a snapshot for root or parent directories for that file 3) Delete parent directory for that file 4) Now crash the NN with out responding success to client for that addBlock call Now on restart of the Namenode, it will stuck in safemode. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (HDFS-5467) Remove tab characters in hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang reassigned HDFS-5467: - Assignee: Andrew Wang Remove tab characters in hdfs-default.xml - Key: HDFS-5467 URL: https://issues.apache.org/jira/browse/HDFS-5467 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Trivial Labels: newbie The retrycache parameters are indented with tabs rather than the normal 2 spaces. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814517#comment-13814517 ] Konstantin Shvachko commented on HDFS-5464: --- So in the regular case you will be adding 100,000 replicas to the {{toRemove}} list only to delete most of them later. How does that make things simpler? The delimiter lets you keep the calculated lists as small as possible, reducing memory consumption and avoiding frequent GCs. Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5464_20131105.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)
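For reference, the diff both sides are discussing has a simple contract: blocks the NameNode currently associates with a DataNode but which are absent from the new report go to toRemove, and reported-but-unknown blocks go to toAdd. The sketch below shows that contract with plain set arithmetic only — it deliberately materializes the intermediate sets, which is exactly the memory cost the delimiter technique in reportDiff(..) avoids by splicing a marker block into the DN's replica list; names and types are illustrative:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Contract-only sketch of a block report diff, keyed by block ID. */
public class ReportDiff {
    public static Map<String, Set<Long>> diff(Set<Long> stored, Set<Long> reported) {
        Set<Long> toRemove = new HashSet<>(stored);
        toRemove.removeAll(reported);        // stored but no longer reported
        Set<Long> toAdd = new HashSet<>(reported);
        toAdd.removeAll(stored);             // reported but not yet stored
        Map<String, Set<Long>> out = new HashMap<>();
        out.put("toRemove", toRemove);
        out.put("toAdd", toAdd);
        return out;
    }

    public static void main(String[] args) {
        Set<Long> stored = new HashSet<>();
        Set<Long> reported = new HashSet<>();
        stored.add(1L); stored.add(2L); stored.add(3L);
        reported.add(2L); reported.add(3L); reported.add(4L);
        System.out.println(diff(stored, reported));
    }
}
```

With 100,000 mostly-unchanged replicas, toRemove here briefly holds all 100,000 entries before removeAll() shrinks it, which is Konstantin's objection in concrete form.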
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814527#comment-13814527 ] Tsz Wo (Nicholas), SZE commented on HDFS-5464: -- Hi Konstantin, you may be correct that the new code uses more memory; however, I bet you will agree that the new code is simpler than the existing code. :) I will think about how to reduce the memory usage. Thanks for the input. Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5464_20131105.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5464: - Attachment: h5466_20131105b.patch Actually, it is unnecessary to add all the blocks to the remove list. Here is a new patch. h5466_20131105b.patch Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5464: - Attachment: h5464_20131105b.patch Here is the correct file: h5464_20131105b.patch Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5464: - Attachment: (was: h5466_20131105b.patch) Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5462) Fail to compile in Branch HDFS-2832 with COMPILATION ERROR
[ https://issues.apache.org/jira/browse/HDFS-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814549#comment-13814549 ] wenwupeng commented on HDFS-5462: - Thanks for the helpful response, Eric and Arpit. It passes after syncing to the latest version. Fail to compile in Branch HDFS-2832 with COMPILATION ERROR --- Key: HDFS-5462 URL: https://issues.apache.org/jira/browse/HDFS-5462 Project: Hadoop HDFS Issue Type: Bug Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: wenwupeng Failed to compile HDFS in Branch HDFS-2832 with COMPILATION ERROR , OutputFormat is Sun proprietary API and may be removed in a future release [INFO] Compiling 276 source files to /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/target/classes [INFO] - [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[32,48] com.sun.org.apache.xml.internal.serialize.OutputFormat is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[33,48] com.sun.org.apache.xml.internal.serialize.XMLSerializer is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java:[337,34] unreported exception java.io.IOException; must be caught or declared to be thrown [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java:[134,41] sun.nio.ch.DirectBuffer is Sun proprietary API and may be removed in a future 
release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java:[135,14] sun.misc.Cleaner is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java:[136,22] sun.nio.ch.DirectBuffer is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[55,4] com.sun.org.apache.xml.internal.serialize.OutputFormat is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[55,33] com.sun.org.apache.xml.internal.serialize.OutputFormat is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[59,4] com.sun.org.apache.xml.internal.serialize.XMLSerializer is Sun proprietary API and may be removed in a future release [ERROR] /home/jenkins/slave/workspace/HVE-PostCommit-HDFS-2832/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[59,35] com.sun.org.apache.xml.internal.serialize.XMLSerializer is Sun proprietary API and may be removed in a future release [INFO] 10 errors -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5439) Fix TestPendingReplication
[ https://issues.apache.org/jira/browse/HDFS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814552#comment-13814552 ] Arpit Agarwal commented on HDFS-5439: - Thanks for the patch Junping! Most of your changes look fine. A remaining issue was that {{PendingReplicationBlock}} should track targets as {{DatanodeDescriptor}} instead of {{DatanodeStorageInfo}}. Will post a consolidated patch along with your changes. Fix TestPendingReplication -- Key: HDFS-5439 URL: https://issues.apache.org/jira/browse/HDFS-5439 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-5439-demo1.patch {{TestPendingReplication}} fails with the following exception: {code} java.lang.AssertionError: expected:4 but was:3 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication.testBlockReceived(TestPendingReplication.java:186) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5448) Datanode should generate its ID on first registration
[ https://issues.apache.org/jira/browse/HDFS-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814555#comment-13814555 ] Arpit Agarwal commented on HDFS-5448: - Thanks for the correction, you're right. Datanode should generate its ID on first registration - Key: HDFS-5448 URL: https://issues.apache.org/jira/browse/HDFS-5448 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: Heterogeneous Storage (HDFS-2832) Attachments: h5448.01.patch, h5448.03.patch, h5448.04.addendum.patch, h5448.04.patch Prior to the heterogeneous storage feature, each Datanode had a single storage ID which was generated by the Namenode on first registration. The storage ID used fixed Datanode identifiers like IP address and port, so that in a federated cluster, for example, all NameNodes would generate the same storage ID. With Heterogeneous storage, we have replaced the storage ID with a per-datanode identifier called the Datanode-UUID. The Datanode UUID is also assigned by a NameNode on first registration. In a federated cluster with multiple namenodes, there are two ways to ensure a unique Datanode UUID allocation: # Synchronize initial registration requests from the BPServiceActors. If a Datanode UUID is already assigned we don't need to synchronize. # The datanode assigns itself a UUID on initialization. -- This message was sent by Atlassian JIRA (v6.1#6144)
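Option 2 from the HDFS-5448 description (the datanode assigns itself a UUID on initialization, avoiding any NN-side synchronization) can be sketched in a few lines. This is an illustrative snippet with invented names, not the actual DataNode code:

```java
import java.util.UUID;

// Sketch: generate a Datanode UUID locally at startup if none was persisted,
// so federated NameNodes never need to coordinate the initial assignment.
public class DatanodeUuidSketch {
    private String datanodeUuid;  // loaded from storage; may be null or empty

    public synchronized String checkOrGenerateUuid() {
        if (datanodeUuid == null || datanodeUuid.isEmpty()) {
            // First startup: self-assign a random UUID and persist it afterwards.
            datanodeUuid = UUID.randomUUID().toString();
        }
        return datanodeUuid;      // subsequent calls return the same value
    }
}
```

Because `UUID.randomUUID()` is globally unique with overwhelming probability, every BPServiceActor can register concurrently with the same self-assigned identifier.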
[jira] [Updated] (HDFS-5394) fix race conditions in DN caching and uncaching
[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5394: --- Attachment: HDFS-5394.007.patch fix race conditions in DN caching and uncaching --- Key: HDFS-5394 URL: https://issues.apache.org/jira/browse/HDFS-5394 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5394-caching.001.patch, HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, HDFS-5394.007.patch The DN needs to handle situations where it is asked to cache the same replica more than once. (Currently, it can actually do two mmaps and mlocks.) It also needs to handle the situation where caching a replica is cancelled before said caching completes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching
[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814568#comment-13814568 ] Colin Patrick McCabe commented on HDFS-5394: bq. Unused imports in FsDatasetImpl and FsVolumeImpl removed bq. Do we still need to rename getExecutor to getCacheExecutor in FsVolumeImpl? Well, the name of the variable is {{cacheExecutor}}; shouldn't the getter be {{getCacheExecutor}}? bq. State#isUncaching() is unused removed bq. Could use a core pool size of 0 for uncachingExecutor, I don't think it's that latency sensitive agreed bq. usedBytes javadoc: more things to cache that we can't actually do because of is an awkward turn of phrase, maybe say assign more blocks than we can actually cache because of instead ok bq. MappableBlock#load javadoc: visibleLeng parameter should be renamed to length. The return value is now also a MappableBlock, not a boolean. fixed bq. Key: rename id to blockId for clarity? or add a bit of javadoc added javadoc bq. Naming the HashMap replicaMap is confusing since there's already a datanode ReplicaMap class. Maybe mappableBlockMap instead? ok bq. Caching can fail if the underlying block is invalidated in between getting the block's filename and running the CacheTask. It'd be nice to distinguish this race from a real error for when we do metrics (and also quash the exception). I just added a catch block for the {{FileNotFound}} exception which both {{getBlockInputStream}} and {{getMetaDataInputStream}} can throw. I still think we want to log this exception, but at INFO rather than WARN. We will retry sending the {{DNA_CACHE}} command (once 5366 is committed), so hitting this narrow race if a block is being moved is just a temporary setback. bq. If we get a DNA_CACHE for a block that is currently being uncached, shouldn't we try to cancel the uncache and re-cache it? The NN will resend the command, but it'd be better to not have to wait for that. 
We don't know how far along the uncaching process is. We can't cancel it if we already called {{munmap}}. We could allow cancellation of pending uncaches by splitting {{UNCACHING}} into {{UNCACHING_SCHEDULED}} and {{UNCACHING_IN_PROGRESS}}, and only allowing cancellation on the former. This might be a good improvement to make as part of 5182. But for now, the uncaching process is really quick, so let's keep it simple. bq. Could this be written with value.state == State.CACHING_CANCELLED instead? Would be clearer, and I believe equivalent since uncacheBlock won't set the state to UNCACHING if it's CACHING or CACHING_CANCELLED. Well, if value is null, you don't want to be dereferencing that, right? bq. Even better would be interrupting a CachingTask on uncache since it'll save us I/O and CPU. That kind of interruption logic gets complex quickly. I'd rather save that for a potential performance improvement JIRA later down the line. I also think that if we're thrashing (cancelling caching requests right and left) the real fix might be on the NameNode anyway... bq. Could we combine CACHING_CANCELLED into UNCACHING? It seems like CachingTask could check for UNCACHING in that if statement at the end and uncache, same sort of change for uncacheBlock. I would rather not do that, since right now we can look at entries in the map and instantly know that anything in state {{UNCACHING}} has an associated {{Runnable}} scheduled in the {{Executor}}. Cancelled is not really the same thing as uncaching, since in the former case there is actually nothing to do! bq. I think using a switch/case on the prevValue.state in uncacheBlock would be clearer ok bq. 6,000,000 milliseconds seems like a very long test timeout. Can we change them to, say, 60,000? The general idea is to do stuff that can time out in {{GenericTestUtils#waitFor}} blocks. The waitFor blocks actually give useful backtraces and messages when they time out, unlike the generic test timeouts. 
I wanted to avoid the scenario where the test-level timeouts kick in, but out of paranoia, I set the overall test timeout to 10 minutes in case there was some other unexpected timeout. I wanted to avoid the issues we've had with zombie tests in Jenkins causing heisenfailures. bq. Are these new log prints for sanity checking? Maybe we can just remove them. it's more so you can see what's going on in the sea of log messages. otherwise, it becomes hard to debug. bq. Some of the comments seem to refer to a previous patch version that used a countdown latch. fixed bq. It's unclear what this is testing beyond caching and then uncaching a bunch of blocks. Can we check for log prints to see that it's actually cancelling as expected? Any other ideas for definitively hitting cancellation? we could add callback hooks to more points in the system, and set up a bunch of countdown latches (or similar), but it might
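The per-block states discussed in this exchange (CACHING, CACHING_CANCELLED, UNCACHING, plus a cached state) form a small state machine, and the switch-on-previous-state suggested for uncacheBlock can be sketched as follows. This is a hedged illustration with invented names; the actual FsDatasetCache/MappableBlock code differs in detail:

```java
// Sketch of the uncache-request transition discussed above. Invariant from the
// thread: only UNCACHING entries have a Runnable scheduled on the executor;
// CACHING_CANCELLED entries are cleaned up by the CachingTask itself.
public class CacheStateSketch {
    public enum State { CACHING, CACHING_CANCELLED, CACHED, UNCACHING }

    // What an uncacheBlock() call would do for each previous state.
    public static State onUncacheRequest(State prev) {
        switch (prev) {
            case CACHING:
                // Caching still in flight: mark cancelled; no uncache work needed,
                // the in-flight CachingTask notices the flag and backs out.
                return State.CACHING_CANCELLED;
            case CACHED:
                // Fully cached: schedule an uncaching Runnable on the executor.
                return State.UNCACHING;
            default:
                // CACHING_CANCELLED or UNCACHING: nothing more to do.
                return prev;
        }
    }
}
```

Keeping CACHING_CANCELLED distinct from UNCACHING preserves the invariant Colin describes: a reader of the map can tell at a glance which entries have pending executor work.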
[jira] [Commented] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814570#comment-13814570 ] Hadoop QA commented on HDFS-5443: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612292/HDFS-5443.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5340//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5340//console This message is automatically generated. Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file. Key: HDFS-5443 URL: https://issues.apache.org/jira/browse/HDFS-5443 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0, 2.2.0 Reporter: Uma Maheswara Rao G Assignee: sathish Attachments: 5443-test.patch, HDFS-5443.000.patch This issue is reported by Prakash and Sathish. On looking into the issue, the following things are happening:
1) Client added a block at the NN and just did a logsync, so the NN has the block ID persisted. 2) Before returning the addblock response to the client, take a snapshot of the root or a parent directory of that file. 3) Delete the parent directory of that file. 4) Now crash the NN without responding success to the client for that addBlock call. Now on restart, the Namenode will get stuck in safemode. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5468) CacheAdmin help command does not recognize commands
Stephen Chu created HDFS-5468: - Summary: CacheAdmin help command does not recognize commands Key: HDFS-5468 URL: https://issues.apache.org/jira/browse/HDFS-5468 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 3.0.0, 2.3.0 Reporter: Stephen Chu Priority: Minor Currently, the hdfs cacheadmin -help command will not recognize correct command inputs: {code} [hdfs@hdfs-cache ~]# hdfs cacheadmin -help listPools Sorry, I don't know the command 'listPools'. Valid command names are: -addDirective, -removeDirective, -removeDirectives, -listDirectives, -addPool, -modifyPool, -removePool, -listPools, -help [hdfs@hdfs-cache ~]# hdfs cacheadmin -help -listPools Sorry, I don't know the command 'listPools'. Valid command names are: -addDirective, -removeDirective, -removeDirectives, -listDirectives, -addPool, -modifyPool, -removePool, -listPools, -help {code} In the code, we strip the input command of leading hyphens, but then compare it to the command names, which are all prefixed by a hyphen. Also, cacheadmin -removeDirectives requires specifying a path with -path but -path is not shown in the usage. We should fix this as well. -- This message was sent by Atlassian JIRA (v6.1#6144)
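The mismatch described in HDFS-5468 can be reproduced in miniature. This sketch uses invented names and is not the actual CacheAdmin code; the fix shown (normalizing both sides before comparing) is just one possible approach:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the CacheAdmin help-lookup bug: the input is stripped of its
// leading hyphens, but the registered command names keep theirs, so the
// equality comparison can never succeed.
public class HelpLookupSketch {
    static final List<String> NAMES = Arrays.asList("-addPool", "-listPools", "-help");

    // Buggy behavior: only the input loses its hyphen.
    public static boolean buggyMatch(String input) {
        String stripped = input.replaceFirst("^-*", "");
        return NAMES.contains(stripped);   // "listPools" never equals "-listPools"
    }

    // One possible fix: strip hyphens on both sides before comparing.
    public static boolean fixedMatch(String input) {
        String stripped = input.replaceFirst("^-*", "");
        for (String name : NAMES) {
            if (name.replaceFirst("^-*", "").equals(stripped)) {
                return true;
            }
        }
        return false;
    }
}
```

With this sketch, `buggyMatch` rejects both `-listPools` and `listPools`, matching the behavior in the issue report, while `fixedMatch` accepts either form.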
[jira] [Updated] (HDFS-5468) CacheAdmin help command does not recognize commands
[ https://issues.apache.org/jira/browse/HDFS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu updated HDFS-5468: -- Attachment: HDFS-5468.patch Attached a patch that avoids removing leading hyphens on the input command, so getting help usage works when executing something like _hdfs cacheadmin -help -addPool_. Added a unit test to exercise the cacheadmin help command. Added the -path specifier to the help usage of the removeDirectives command. CacheAdmin help command does not recognize commands --- Key: HDFS-5468 URL: https://issues.apache.org/jira/browse/HDFS-5468 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 3.0.0, 2.3.0 Reporter: Stephen Chu Priority: Minor Attachments: HDFS-5468.patch Currently, the hdfs cacheadmin -help command will not recognize correct command inputs: {code} [hdfs@hdfs-cache ~]# hdfs cacheadmin -help listPools Sorry, I don't know the command 'listPools'. Valid command names are: -addDirective, -removeDirective, -removeDirectives, -listDirectives, -addPool, -modifyPool, -removePool, -listPools, -help [hdfs@hdfs-cache ~]# hdfs cacheadmin -help -listPools Sorry, I don't know the command 'listPools'. Valid command names are: -addDirective, -removeDirective, -removeDirectives, -listDirectives, -addPool, -modifyPool, -removePool, -listPools, -help {code} In the code, we strip the input command of leading hyphens, but then compare it to the command names, which are all prefixed by a hyphen. Also, cacheadmin -removeDirectives requires specifying a path with -path but -path is not shown in the usage. We should fix this as well. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5468) CacheAdmin help command does not recognize commands
[ https://issues.apache.org/jira/browse/HDFS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu updated HDFS-5468: -- Status: Patch Available (was: Open) CacheAdmin help command does not recognize commands --- Key: HDFS-5468 URL: https://issues.apache.org/jira/browse/HDFS-5468 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 3.0.0, 2.3.0 Reporter: Stephen Chu Priority: Minor Attachments: HDFS-5468.patch Currently, the hdfs cacheadmin -help command will not recognize correct command inputs: {code} [hdfs@hdfs-cache ~]# hdfs cacheadmin -help listPools Sorry, I don't know the command 'listPools'. Valid command names are: -addDirective, -removeDirective, -removeDirectives, -listDirectives, -addPool, -modifyPool, -removePool, -listPools, -help [hdfs@hdfs-cache ~]# hdfs cacheadmin -help -listPools Sorry, I don't know the command 'listPools'. Valid command names are: -addDirective, -removeDirective, -removeDirectives, -listDirectives, -addPool, -modifyPool, -removePool, -listPools, -help {code} In the code, we strip the input command of leading hyphens, but then compare it to the command names, which are all prefixed by a hyphen. Also, cacheadmin -removeDirectives requires specifying a path with -path but -path is not shown in the usage. We should fix this as well. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814583#comment-13814583 ] Hadoop QA commented on HDFS-5464: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612253/h5464_20131105.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5341//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5341//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5341//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5341//console This message is automatically generated. 
Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (HDFS-5468) CacheAdmin help command does not recognize commands
[ https://issues.apache.org/jira/browse/HDFS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu reassigned HDFS-5468: - Assignee: Stephen Chu CacheAdmin help command does not recognize commands --- Key: HDFS-5468 URL: https://issues.apache.org/jira/browse/HDFS-5468 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 3.0.0, 2.3.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Attachments: HDFS-5468.patch Currently, the hdfs cacheadmin -help command will not recognize correct command inputs: {code} [hdfs@hdfs-cache ~]# hdfs cacheadmin -help listPools Sorry, I don't know the command 'listPools'. Valid command names are: -addDirective, -removeDirective, -removeDirectives, -listDirectives, -addPool, -modifyPool, -removePool, -listPools, -help [hdfs@hdfs-cache ~]# hdfs cacheadmin -help -listPools Sorry, I don't know the command 'listPools'. Valid command names are: -addDirective, -removeDirective, -removeDirectives, -listDirectives, -addPool, -modifyPool, -removePool, -listPools, -help {code} In the code, we strip the input command of leading hyphens, but then compare it to the command names, which are all prefixed by a hyphen. Also, cacheadmin -removeDirectives requires specifying a path with -path but -path is not shown in the usage. We should fix this as well. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5439) Fix TestPendingReplication
[ https://issues.apache.org/jira/browse/HDFS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814587#comment-13814587 ] Junping Du commented on HDFS-5439: -- Hi Arpit, do you mean targets in PendingBlockInfo? I think storageID info is necessary and computeReplicationWorkForBlocks() in BlockManager is something we should fix. Thoughts? Fix TestPendingReplication -- Key: HDFS-5439 URL: https://issues.apache.org/jira/browse/HDFS-5439 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-5439-demo1.patch {{TestPendingReplication}} fails with the following exception: {code} java.lang.AssertionError: expected:4 but was:3 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication.testBlockReceived(TestPendingReplication.java:186) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5411) Update Bookkeeper dependency to 4.2.1
[ https://issues.apache.org/jira/browse/HDFS-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-5411: --- Issue Type: Sub-task (was: Improvement) Parent: HDFS-3399 Update Bookkeeper dependency to 4.2.1 - Key: HDFS-5411 URL: https://issues.apache.org/jira/browse/HDFS-5411 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Robert Rati Priority: Minor Attachments: HDFS-5411.patch Update the bookkeeper dependency to 4.2.1. This eases compilation on Fedora platforms -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5411) Update Bookkeeper dependency to 4.2.1
[ https://issues.apache.org/jira/browse/HDFS-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814602#comment-13814602 ] Rakesh R commented on HDFS-5411: Hi Robert, Recently Bookkeeper released version 4.2.2; it would be good to use this stable version. What's your opinion? Update Bookkeeper dependency to 4.2.1 - Key: HDFS-5411 URL: https://issues.apache.org/jira/browse/HDFS-5411 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Robert Rati Priority: Minor Attachments: HDFS-5411.patch Update the bookkeeper dependency to 4.2.1. This eases compilation on Fedora platforms -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5439) Fix TestPendingReplication
[ https://issues.apache.org/jira/browse/HDFS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5439: Attachment: h5439.04.patch Attaching patch to fix the following: # Bug in {{BlockPlacementPolicyDefault#chooseRandom}} hit when numOfReplicas > 2. # {{PendingReplicationBlocks}} tracks replicas by {{DatanodeDescriptor}} instead of {{DatanodeStorageInfo}} # Update a couple of logs (copied from Junping's patch), remove obsolete TODO in {{BPServiceActor}} # Update {{TestPendingReplications}} Junping, I did not understand your comment about {{computeReplicationWorkForBlocks}}. Also I don't think we need 1 and 2 from your list. Fix TestPendingReplication -- Key: HDFS-5439 URL: https://issues.apache.org/jira/browse/HDFS-5439 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-5439-demo1.patch, h5439.04.patch {{TestPendingReplication}} fails with the following exception: {code} java.lang.AssertionError: expected:4 but was:3 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication.testBlockReceived(TestPendingReplication.java:186) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5439) Fix TestPendingReplication
[ https://issues.apache.org/jira/browse/HDFS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814616#comment-13814616 ] Arpit Agarwal commented on HDFS-5439: - To clarify, the reason we don't need 1 anymore is because {{PendingReplicationBlocks}} uses DatanodeDescriptor as the key (thanks to [~szetszwo] for the idea). Fix TestPendingReplication -- Key: HDFS-5439 URL: https://issues.apache.org/jira/browse/HDFS-5439 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-5439-demo1.patch, h5439.04.patch {{TestPendingReplication}} fails with the following exception: {code} java.lang.AssertionError: expected:4 but was:3 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication.testBlockReceived(TestPendingReplication.java:186) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
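The idea of keying pending replications by datanode rather than by storage can be sketched as follows. These are invented names, not the actual PendingReplicationBlocks class; the point is only that an ack from any storage on a target node should clear that node's pending entry:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: track pending replication targets per datanode (not per storage),
// so a block-received report from any storage on a node decrements the count.
public class PendingSketch {
    private final Map<String, Set<String>> pending = new HashMap<>(); // blockId -> datanode UUIDs

    public void increment(String blockId, String... datanodes) {
        pending.computeIfAbsent(blockId, k -> new HashSet<>())
               .addAll(Arrays.asList(datanodes));
    }

    // Called when the block is reported received from some storage on a datanode.
    public void decrement(String blockId, String datanodeUuid) {
        Set<String> targets = pending.get(blockId);
        if (targets != null && targets.remove(datanodeUuid) && targets.isEmpty()) {
            pending.remove(blockId);   // all targets have reported in
        }
    }

    public int getNumReplicas(String blockId) {
        Set<String> targets = pending.get(blockId);
        return targets == null ? 0 : targets.size();
    }
}
```

Because the key is the datanode, the tracker stays correct even if the datanode stores the replica on a different storage than the one originally chosen.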
[jira] [Updated] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5464: - Attachment: h5464_20131105c.patch h5464_20131105c.patch: reverts the changes causing the findbugs warning. Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814620#comment-13814620 ] Hadoop QA commented on HDFS-5464: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612309/h5464_20131105b.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
    org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks
    org.apache.hadoop.hdfs.TestLeaseRecovery2
The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:
    org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks
    org.apache.hadoop.hdfs.server.namenode.TestFsck
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5342//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5342//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5342//console
This message is automatically generated.
Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching
[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814640#comment-13814640 ] Hadoop QA commented on HDFS-5394: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612314/HDFS-5394.007.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:red}-1 javac{color}. The applied patch generated 1548 javac compiler warnings (more than the trunk's current 1547 warnings).
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:
    org.apache.hadoop.hdfs.server.namenode.TestPathBasedCacheRequests
    org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5343//testReport/
Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5343//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5343//console
This message is automatically generated.
fix race conditions in DN caching and uncaching --- Key: HDFS-5394 URL: https://issues.apache.org/jira/browse/HDFS-5394 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5394-caching.001.patch, HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, HDFS-5394.007.patch The DN needs to handle situations where it is asked to cache the same replica more than once. (Currently, it can actually do two mmaps and mlocks.) It also needs to handle the situation where caching a replica is cancelled before said caching completes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5468) CacheAdmin help command does not recognize commands
[ https://issues.apache.org/jira/browse/HDFS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814647#comment-13814647 ] Hadoop QA commented on HDFS-5468: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612319/HDFS-5468.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5344//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5344//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5344//console
This message is automatically generated.
CacheAdmin help command does not recognize commands --- Key: HDFS-5468 URL: https://issues.apache.org/jira/browse/HDFS-5468 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 3.0.0, 2.3.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Attachments: HDFS-5468.patch Currently, the hdfs cacheadmin -help command will not recognize correct command inputs:
{code}
[hdfs@hdfs-cache ~]# hdfs cacheadmin -help listPools
Sorry, I don't know the command 'listPools'.
Valid command names are:
-addDirective, -removeDirective, -removeDirectives, -listDirectives, -addPool, -modifyPool, -removePool, -listPools, -help

[hdfs@hdfs-cache ~]# hdfs cacheadmin -help -listPools
Sorry, I don't know the command 'listPools'.
Valid command names are:
-addDirective, -removeDirective, -removeDirectives, -listDirectives, -addPool, -modifyPool, -removePool, -listPools, -help
{code}
In the code, we strip the input command of leading hyphens, but then compare it to the command names, which are all prefixed by a hyphen. Also, cacheadmin -removeDirectives requires specifying a path with -path but -path is not shown in the usage. We should fix this as well.
-- This message was sent by Atlassian JIRA (v6.1#6144)
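The mismatch described in the report can be sketched in a few lines (a minimal illustration, not the actual CacheAdmin source; names here are hypothetical): the input is stripped of leading hyphens but compared against command names that still carry their hyphen prefix, so no input can ever match.

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch of the help-lookup bug and one possible fix:
// normalize both sides before comparing.
public class HelpLookupSketch {
    static final List<String> COMMANDS = Arrays.asList(
        "-addDirective", "-removeDirective", "-removeDirectives",
        "-listDirectives", "-addPool", "-modifyPool", "-removePool",
        "-listPools", "-help");

    // Buggy lookup: "-listPools" -> "listPools", which never equals "-listPools".
    static boolean knownBuggy(String input) {
        String stripped = input.replaceFirst("^-+", "");
        return COMMANDS.contains(stripped);
    }

    // Fixed lookup: strip hyphens from both the input and the command names.
    static boolean knownFixed(String input) {
        String stripped = input.replaceFirst("^-+", "");
        return COMMANDS.stream()
            .anyMatch(c -> c.replaceFirst("^-+", "").equals(stripped));
    }

    public static void main(String[] args) {
        System.out.println(knownBuggy("-listPools")); // prints false
        System.out.println(knownFixed("-listPools")); // prints true
        System.out.println(knownFixed("listPools"));  // prints true
    }
}
```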
[jira] [Updated] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-2832: Attachment: h2832_20131105.patch Enable support for heterogeneous storages in HDFS - Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: 20130813-HeterogeneousStorage.pdf, h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch HDFS currently supports a configuration where storages are a list of directories. Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose changing the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection* of storages. -- This message was sent by Atlassian JIRA (v6.1#6144)
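The proposed model change can be pictured with a small sketch (the class and field names below are illustrative stand-ins, not the real HDFS-2832 classes): instead of a datanode being identified as a single storage, it exposes a collection of storages, each with its own ID and type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "Datanode is a collection of storages".
public class StorageModelSketch {
    enum StorageType { DISK, SSD }

    static class Storage {
        final String storageId;
        final StorageType type;
        Storage(String storageId, StorageType type) {
            this.storageId = storageId;
            this.type = type;
        }
    }

    static class DataNode {
        // One datanode reports many storages, possibly of different types.
        final List<Storage> storages = new ArrayList<>();
        void addStorage(Storage s) { storages.add(s); }
        int numStorages() { return storages.size(); }
    }

    public static void main(String[] args) {
        DataNode dn = new DataNode();
        dn.addStorage(new Storage("DS-1", StorageType.DISK));
        dn.addStorage(new Storage("DS-2", StorageType.SSD));
        System.out.println(dn.numStorages()); // prints 2
    }
}
```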
[jira] [Commented] (HDFS-5439) Fix TestPendingReplication
[ https://issues.apache.org/jira/browse/HDFS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814660#comment-13814660 ] Tsz Wo (Nicholas), SZE commented on HDFS-5439: -- Patch looks good. Just a question: why change BlockPlacementPolicyDefault? Is there a bug? Fix TestPendingReplication -- Key: HDFS-5439 URL: https://issues.apache.org/jira/browse/HDFS-5439 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-5439-demo1.patch, h5439.04.patch {{TestPendingReplication}} fails with the following exception:
{code}
java.lang.AssertionError: expected:4 but was:3
    at org.junit.Assert.fail(Assert.java:93)
    at org.junit.Assert.failNotEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:128)
    at org.junit.Assert.assertEquals(Assert.java:472)
    at org.junit.Assert.assertEquals(Assert.java:456)
    at org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication.testBlockReceived(TestPendingReplication.java:186)
{code}
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5439) Fix TestPendingReplication
[ https://issues.apache.org/jira/browse/HDFS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814662#comment-13814662 ] Arpit Agarwal commented on HDFS-5439: - Thanks for reviewing, Nicholas. Yes, the bug is that goodTarget was not reset after processing the first datanode but it was used as a terminating condition in the for loop. So the function would always fail when numOfReplicas 2. Fix TestPendingReplication -- Key: HDFS-5439 URL: https://issues.apache.org/jira/browse/HDFS-5439 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-5439-demo1.patch, h5439.04.patch {{TestPendingReplication}} fails with the following exception:
{code}
java.lang.AssertionError: expected:4 but was:3
    at org.junit.Assert.fail(Assert.java:93)
    at org.junit.Assert.failNotEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:128)
    at org.junit.Assert.assertEquals(Assert.java:472)
    at org.junit.Assert.assertEquals(Assert.java:456)
    at org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication.testBlockReceived(TestPendingReplication.java:186)
{code}
-- This message was sent by Atlassian JIRA (v6.1#6144)
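The class of bug described above can be shown with a minimal sketch (purely illustrative; these are not the actual BlockPlacementPolicyDefault names or loop): a result variable used in the loop condition is assigned on the first pass but never reset, so later iterations terminate the loop prematurely.

```java
// Hypothetical sketch of a "loop-state not reset" bug and its fix.
public class LoopResetSketch {
    // Buggy: goodTarget is set after the first candidate and never reset,
    // and since the loop continues only while !goodTarget, it exits after
    // choosing a single replica regardless of numOfReplicas.
    static int chooseBuggy(int numOfReplicas) {
        int chosen = 0;
        boolean goodTarget = false;
        for (int i = 0; i < numOfReplicas && !goodTarget; i++) {
            goodTarget = true; // stays true for every later iteration
            chosen++;
        }
        return chosen;
    }

    // Fixed: the per-candidate result is re-initialized on every iteration.
    static int chooseFixed(int numOfReplicas) {
        int chosen = 0;
        for (int i = 0; i < numOfReplicas; i++) {
            boolean goodTarget = true; // reset for each candidate
            if (goodTarget) {
                chosen++;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        System.out.println(chooseBuggy(3)); // prints 1
        System.out.println(chooseFixed(3)); // prints 3
    }
}
```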
[jira] [Resolved] (HDFS-5439) Fix TestPendingReplication
[ https://issues.apache.org/jira/browse/HDFS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal resolved HDFS-5439. - Resolution: Fixed Fix Version/s: Heterogeneous Storage (HDFS-2832) Hadoop Flags: Reviewed I committed this to branch HDFS-2832. Thanks for contributing part of the fix, Junping. Fix TestPendingReplication -- Key: HDFS-5439 URL: https://issues.apache.org/jira/browse/HDFS-5439 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: Heterogeneous Storage (HDFS-2832) Attachments: HDFS-5439-demo1.patch, h5439.04.patch {{TestPendingReplication}} fails with the following exception:
{code}
java.lang.AssertionError: expected:4 but was:3
    at org.junit.Assert.fail(Assert.java:93)
    at org.junit.Assert.failNotEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:128)
    at org.junit.Assert.assertEquals(Assert.java:472)
    at org.junit.Assert.assertEquals(Assert.java:456)
    at org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication.testBlockReceived(TestPendingReplication.java:186)
{code}
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5439) Fix TestPendingReplication
[ https://issues.apache.org/jira/browse/HDFS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814664#comment-13814664 ] Tsz Wo (Nicholas), SZE commented on HDFS-5439: -- +1 good catch! Fix TestPendingReplication -- Key: HDFS-5439 URL: https://issues.apache.org/jira/browse/HDFS-5439 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: Heterogeneous Storage (HDFS-2832) Attachments: HDFS-5439-demo1.patch, h5439.04.patch {{TestPendingReplication}} fails with the following exception:
{code}
java.lang.AssertionError: expected:4 but was:3
    at org.junit.Assert.fail(Assert.java:93)
    at org.junit.Assert.failNotEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:128)
    at org.junit.Assert.assertEquals(Assert.java:472)
    at org.junit.Assert.assertEquals(Assert.java:456)
    at org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication.testBlockReceived(TestPendingReplication.java:186)
{code}
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814670#comment-13814670 ] Tsz Wo (Nicholas), SZE commented on HDFS-5464: -- h5464_20131105[bc].patch won't work. h5464_20131105.patch works, but it uses more memory and has a longer running time. I agree with Konstantin that the original code is better although it is complicated. I will leave this for a while and see if I can come up with a better solution. Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)