[jira] [Assigned] (HDFS-7884) NullPointerException in BlockSender
[ https://issues.apache.org/jira/browse/HDFS-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned HDFS-7884: -- Assignee: Brahma Reddy Battula

NullPointerException in BlockSender
---
Key: HDFS-7884
URL: https://issues.apache.org/jira/browse/HDFS-7884
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Reporter: Tsz Wo Nicholas Sze
Assignee: Brahma Reddy Battula
Priority: Blocker

{noformat}
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.init(BlockSender.java:264)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:249)
    at java.lang.Thread.run(Thread.java:745)
{noformat}

BlockSender.java:264 is shown below:
{code}
this.volumeRef = datanode.data.getVolume(block).obtainReference();
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
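The NPE at the quoted line fires when {{getVolume}} returns null, so the chained call dereferences null. A minimal sketch of the missing null guard, using hypothetical stand-in interfaces ({{Dataset}}, {{Volume}}) rather than the real FsDatasetSpi API:

```java
import java.io.IOException;

// Hypothetical stand-ins for the DataNode types; only the null-guard
// pattern is the point. In the real code, the volume lookup can come
// back null, e.g. if the block's volume has been removed.
public class VolumeLookupSketch {
    public interface Volume { Object obtainReference(); }
    public interface Dataset { Volume getVolume(long blockId); }

    // Guarded variant of: datanode.data.getVolume(block).obtainReference()
    public static Object obtainVolumeRef(Dataset data, long blockId)
            throws IOException {
        Volume volume = data.getVolume(blockId);
        if (volume == null) {
            // fail with a descriptive error instead of an NPE
            throw new IOException("No volume found for block " + blockId);
        }
        return volume.obtainReference();
    }
}
```

The actual fix may differ (the dataset could throw instead of returning null), but the shape of the check is what the stack trace suggests is missing.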
[jira] [Commented] (HDFS-7884) NullPointerException in BlockSender
[ https://issues.apache.org/jira/browse/HDFS-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347080#comment-14347080 ] Brahma Reddy Battula commented on HDFS-7884: [~szetszwo] Please re-assign to yourself, if you started work on this jira...Thanks

NullPointerException in BlockSender
---
Key: HDFS-7884
URL: https://issues.apache.org/jira/browse/HDFS-7884
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Reporter: Tsz Wo Nicholas Sze
Assignee: Brahma Reddy Battula
Priority: Blocker

{noformat}
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.init(BlockSender.java:264)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:249)
    at java.lang.Thread.run(Thread.java:745)
{noformat}

BlockSender.java:264 is shown below:
{code}
this.volumeRef = datanode.data.getVolume(block).obtainReference();
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7881) TestHftpFileSystem#testSeek fails in branch-2
[ https://issues.apache.org/jira/browse/HDFS-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned HDFS-7881: -- Assignee: Brahma Reddy Battula

TestHftpFileSystem#testSeek fails in branch-2
-
Key: HDFS-7881
URL: https://issues.apache.org/jira/browse/HDFS-7881
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
Priority: Blocker

TestHftpFileSystem#testSeek fails in branch-2.
{code}
---
 T E S T S
---
Running org.apache.hadoop.hdfs.web.TestHftpFileSystem
Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.201 sec FAILURE! - in org.apache.hadoop.hdfs.web.TestHftpFileSystem
testSeek(org.apache.hadoop.hdfs.web.TestHftpFileSystem) Time elapsed: 0.054 sec ERROR!
java.io.IOException: Content-Length is missing: {null=[HTTP/1.1 206 Partial Content], Date=[Wed, 04 Mar 2015 05:32:30 GMT, Wed, 04 Mar 2015 05:32:30 GMT], Expires=[Wed, 04 Mar 2015 05:32:30 GMT, Wed, 04 Mar 2015 05:32:30 GMT], Connection=[close], Content-Type=[text/plain; charset=utf-8], Server=[Jetty(6.1.26)], Content-Range=[bytes 7-9/10], Pragma=[no-cache, no-cache], Cache-Control=[no-cache]}
    at org.apache.hadoop.hdfs.web.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:132)
    at org.apache.hadoop.hdfs.web.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:104)
    at org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:181)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.web.TestHftpFileSystem.testSeek(TestHftpFileSystem.java:253)

Results :

Tests in error:
  TestHftpFileSystem.testSeek:253 » IO Content-Length is missing: {null=[HTTP/1

Tests run: 14, Failures: 0, Errors: 1, Skipped: 0
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7757) Misleading error messages in FSImage.java
[ https://issues.apache.org/jira/browse/HDFS-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347002#comment-14347002 ] Hudson commented on HDFS-7757: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #122 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/122/])
HDFS-7757. Misleading error messages in FSImage.java. (Contributed by Brahma Reddy Battula) (arp: rev 1004473aa612ee3703394943f25687aa5bef47ea)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java

Misleading error messages in FSImage.java
-
Key: HDFS-7757
URL: https://issues.apache.org/jira/browse/HDFS-7757
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 2.6.0
Reporter: Arpit Agarwal
Assignee: Brahma Reddy Battula
Fix For: 2.7.0
Attachments: HDFS-7757-002.patch, HDFS-7757.patch

If a quota violation is detected while loading an image, the NameNode logs scary error messages indicating a bug. However, the quota violation state is very easy to get into, e.g.:
# Copy a 2MB file to a directory.
# Set a disk space quota of 1MB on the directory.
We are now in the quota violation state. We should reword the error messages, ideally making them warnings and suggesting that the administrator needs to fix the quotas. Relevant code:
{code}
LOG.error("BUG: Diskspace quota violation in image for " + dir.getFullPathName()
    + " quota = " + dsQuota + " consumed = " + diskspace);
...
LOG.error("BUG Disk quota by storage type violation in image for " + dir.getFullPathName()
    + " type = " + t.toString() + " quota = " + typeQuota + " consumed " + typeSpace);
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
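A sketch of the kind of rewording the description asks for: a warning that names the state and tells the administrator what to do, instead of an error claiming a bug. The helper and its exact wording below are hypothetical, not the committed patch text.

```java
// Hypothetical helper illustrating the requested rewording; the committed
// patch may phrase this differently.
public class QuotaMessageSketch {
    public static String quotaWarning(String path, long quota, long consumed) {
        return "Quota violation in image for " + path
            + ": quota = " + quota + ", consumed = " + consumed
            + ". Please check and correct the configured quotas.";
    }
}
```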
[jira] [Commented] (HDFS-7682) {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content
[ https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347036#comment-14347036 ] Hudson commented on HDFS-7682: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/])
HDFS-7682. {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content. Contributed by Charles Lamb. (atm: rev f2d7a67a2c1d9dde10ed3171fdec65dff885afcc)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotFileLength.java

{{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content
Key: HDFS-7682
URL: https://issues.apache.org/jira/browse/HDFS-7682
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Fix For: 2.7.0
Attachments: HDFS-7682.000.patch, HDFS-7682.001.patch, HDFS-7682.002.patch, HDFS-7682.003.patch

DistributedFileSystem#getFileChecksum of a snapshotted file includes non-snapshotted content. This happens because DistributedFileSystem#getFileChecksum simply calculates the checksum of all of the CRCs from the blocks in the file. But, in the case of a snapshotted file, we don't want to include data in the checksum that was appended to the last block in the file after the snapshot was taken.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
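The shape of the fix can be illustrated with a toy checksum: restrict the computation to the file length recorded at snapshot time rather than the current length. A plain CRC32 over a byte array stands in here for HDFS's real MD5-of-block-CRCs composition; the strings and lengths are made up.

```java
import java.util.zip.CRC32;

// Toy model: checksum only the first `length` bytes, i.e. the bytes that
// existed when the snapshot was taken. Plain CRC32 stands in for the real
// block-CRC composition done by DFSClient.
public class SnapshotChecksumSketch {
    public static long crcUpTo(byte[] data, int length) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, length);  // ignore bytes appended after the snapshot
        return crc.getValue();
    }
}
```

Checksumming the first n bytes of the current file yields the same value as checksumming the snapshot-time contents directly, which is the property such a length-limited fix relies on.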
[jira] [Commented] (HDFS-7869) Inconsistency in the return information while performing rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347031#comment-14347031 ] Hudson commented on HDFS-7869: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/])
HDFS-7869. Inconsistency in the return information while performing rolling upgrade ( Contributed by J.Andreina ) (vinayakumarb: rev 3560180b6e9926aa3ee1357da59b28a4b4689a0d)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRollingUpgrade.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java

Inconsistency in the return information while performing rolling upgrade
Key: HDFS-7869
URL: https://issues.apache.org/jira/browse/HDFS-7869
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
Fix For: 2.7.0
Attachments: HDFS-7869.1.patch, HDFS-7869.2.patch

The information returned while finalizing a rolling upgrade is improper (it does not indicate whether the current action succeeded or not):
{noformat}
Rex@XXX:~/Hadoop_27/hadoop-3.0.0-SNAPSHOT/bin ./hdfs dfsadmin -rollingUpgrade finalize
FINALIZE rolling upgrade ...
There is no rolling upgrade in progress or rolling upgrade has already been finalized.
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7757) Misleading error messages in FSImage.java
[ https://issues.apache.org/jira/browse/HDFS-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347039#comment-14347039 ] Hudson commented on HDFS-7757: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/])
HDFS-7757. Misleading error messages in FSImage.java. (Contributed by Brahma Reddy Battula) (arp: rev 1004473aa612ee3703394943f25687aa5bef47ea)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

Misleading error messages in FSImage.java
-
Key: HDFS-7757
URL: https://issues.apache.org/jira/browse/HDFS-7757
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 2.6.0
Reporter: Arpit Agarwal
Assignee: Brahma Reddy Battula
Fix For: 2.7.0
Attachments: HDFS-7757-002.patch, HDFS-7757.patch

If a quota violation is detected while loading an image, the NameNode logs scary error messages indicating a bug. However, the quota violation state is very easy to get into, e.g.:
# Copy a 2MB file to a directory.
# Set a disk space quota of 1MB on the directory.
We are now in the quota violation state. We should reword the error messages, ideally making them warnings and suggesting that the administrator needs to fix the quotas. Relevant code:
{code}
LOG.error("BUG: Diskspace quota violation in image for " + dir.getFullPathName()
    + " quota = " + dsQuota + " consumed = " + diskspace);
...
LOG.error("BUG Disk quota by storage type violation in image for " + dir.getFullPathName()
    + " type = " + t.toString() + " quota = " + typeQuota + " consumed " + typeSpace);
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6565) Use jackson instead jetty json in hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-6565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347038#comment-14347038 ] Hudson commented on HDFS-6565: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/]) HDFS-6565. Use jackson instead jetty json in hdfs-client. Contributed by Akira AJISAKA. (wheat9: rev e2262d3d18c6d5c2aa20f96920104dc07271b869) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/JsonUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestJsonUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java Use jackson instead jetty json in hdfs-client - Key: HDFS-6565 URL: https://issues.apache.org/jira/browse/HDFS-6565 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Akira AJISAKA Fix For: 2.7.0 Attachments: HDFS-6565-002.patch, HDFS-6565-003.patch, HDFS-6565-004.patch, HDFS-6565-005.patch, HDFS-6565.patch hdfs-client should use Jackson instead of jetty to parse JSON. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7746) Add a test randomly mixing append, truncate and snapshot
[ https://issues.apache.org/jira/browse/HDFS-7746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7746: -- Attachment: h7746_20150305.patch h7746_20150305.patch: adds TestAppendSnapshotTruncate. Add a test randomly mixing append, truncate and snapshot Key: HDFS-7746 URL: https://issues.apache.org/jira/browse/HDFS-7746 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.7.0 Attachments: h7746_20150305.patch TestFileTruncate.testSnapshotWithAppendTruncate already does a good job for covering many test cases. Let's add a random test for mixing many append, truncate and snapshot operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7879) hdfs.dll does not export functions of the public libhdfs API
[ https://issues.apache.org/jira/browse/HDFS-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347196#comment-14347196 ] Hudson commented on HDFS-7879: -- FAILURE: Integrated in Hadoop-trunk-Commit #7254 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7254/]) HDFS-7879. hdfs.dll does not export functions of the public libhdfs API. Contributed by Chris Nauroth. (wheat9: rev f717dc51b27d72ad02732a8da397e4a1cc270514) * hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hdfs.dll does not export functions of the public libhdfs API Key: HDFS-7879 URL: https://issues.apache.org/jira/browse/HDFS-7879 Project: Hadoop HDFS Issue Type: Bug Components: build, libhdfs Affects Versions: 2.6.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 2.7.0 Attachments: HDFS-7879.001.patch, HDFS-7879.002.patch HDFS-573 enabled libhdfs to be built for Windows. This did not include marking the public API functions for export in hdfs.dll though, effectively making dynamic linking scenarios impossible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347165#comment-14347165 ] Daryn Sharp commented on HDFS-7435: --- Thanks, Jing and Charles. Addressing comments; will post later today.

PB encoding of block reports is very inefficient
Key: HDFS-7435
URL: https://issues.apache.org/jira/browse/HDFS-7435
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch

Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with a default capacity of 10. A block report containing tens or hundreds of thousands of longs (3 for each replica) is extremely expensive, since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs, which must then be unboxed.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
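The cost the description refers to can be sketched as follows: a boxed {{List<Long>}} grows from its default capacity and boxes every value, while a pre-sized primitive array needs one allocation and no boxing. The numbers are illustrative, not HDFS code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative comparison (not the HDFS patch): the boxed list reallocs
// repeatedly as it grows from capacity 10 and allocates a Long per value;
// the primitive array is sized once and stores raw longs.
public class BlockReportEncodingSketch {
    static final int REPLICAS = 100_000;
    static final int LONGS_PER_REPLICA = 3;  // 3 longs per replica, per the description

    public static List<Long> boxedLongs() {
        List<Long> longs = new ArrayList<>();  // default capacity 10
        for (long i = 0; i < (long) REPLICAS * LONGS_PER_REPLICA; i++) {
            longs.add(i);  // boxes each long; may trigger a realloc
        }
        return longs;
    }

    public static long[] primitiveLongs() {
        long[] longs = new long[REPLICAS * LONGS_PER_REPLICA];  // one allocation
        for (int i = 0; i < longs.length; i++) {
            longs[i] = i;
        }
        return longs;
    }
}
```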
[jira] [Updated] (HDFS-7879) hdfs.dll does not export functions of the public libhdfs API
[ https://issues.apache.org/jira/browse/HDFS-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7879: - Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks [~cnauroth] for the contribution. hdfs.dll does not export functions of the public libhdfs API Key: HDFS-7879 URL: https://issues.apache.org/jira/browse/HDFS-7879 Project: Hadoop HDFS Issue Type: Bug Components: build, libhdfs Affects Versions: 2.6.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 2.7.0 Attachments: HDFS-7879.001.patch, HDFS-7879.002.patch HDFS-573 enabled libhdfs to be built for Windows. This did not include marking the public API functions for export in hdfs.dll though, effectively making dynamic linking scenarios impossible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7879) hdfs.dll does not export functions of the public libhdfs API
[ https://issues.apache.org/jira/browse/HDFS-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7879: - Summary: hdfs.dll does not export functions of the public libhdfs API (was: hdfs.dll does not export functions of the public libhdfs API.) hdfs.dll does not export functions of the public libhdfs API Key: HDFS-7879 URL: https://issues.apache.org/jira/browse/HDFS-7879 Project: Hadoop HDFS Issue Type: Bug Components: build, libhdfs Affects Versions: 2.6.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 2.7.0 Attachments: HDFS-7879.001.patch, HDFS-7879.002.patch HDFS-573 enabled libhdfs to be built for Windows. This did not include marking the public API functions for export in hdfs.dll though, effectively making dynamic linking scenarios impossible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7879) hdfs.dll does not export functions of the public libhdfs API
[ https://issues.apache.org/jira/browse/HDFS-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347182#comment-14347182 ] Haohui Mai commented on HDFS-7879: -- +1 hdfs.dll does not export functions of the public libhdfs API Key: HDFS-7879 URL: https://issues.apache.org/jira/browse/HDFS-7879 Project: Hadoop HDFS Issue Type: Bug Components: build, libhdfs Affects Versions: 2.6.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 2.7.0 Attachments: HDFS-7879.001.patch, HDFS-7879.002.patch HDFS-573 enabled libhdfs to be built for Windows. This did not include marking the public API functions for export in hdfs.dll though, effectively making dynamic linking scenarios impossible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7434) DatanodeID hashCode should not be mutable
[ https://issues.apache.org/jira/browse/HDFS-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-7434: -- Attachment: HDFS-7434.patch

The UUID, and therefore the hashCode of DatanodeID, becomes immutable. Updated the few tests that relied on mutating the uuid to create new instances instead. This actually makes the tests more accurate, because real registrations do not reuse the exact same object reference. DatanodeIDs are now safe for collections.

DatanodeID hashCode should not be mutable
-
Key: HDFS-7434
URL: https://issues.apache.org/jira/browse/HDFS-7434
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Attachments: HDFS-7434.patch

Mutable hash codes may lead to orphaned instances in a collection. Instances must always be removed prior to modification of hash code values, and re-inserted. Although the current code appears to do this, the mutable hash code is a landmine.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
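The landmine can be demonstrated in a few lines. {{NodeId}} below is a made-up stand-in for DatanodeID: once the field feeding {{hashCode}} is mutated while the object sits in a {{HashSet}}, the stored entry becomes unreachable.

```java
import java.util.HashSet;
import java.util.Set;

// NodeId is a hypothetical stand-in for DatanodeID with a mutable field
// feeding hashCode(). Mutating it while the instance is in a HashSet
// orphans the entry: lookups miss it, yet it still occupies the set.
public class MutableHashCodeDemo {
    static final class NodeId {
        String uuid;
        NodeId(String uuid) { this.uuid = uuid; }
        @Override public int hashCode() { return uuid.hashCode(); }
        @Override public boolean equals(Object o) {
            return o instanceof NodeId && ((NodeId) o).uuid.equals(uuid);
        }
    }

    public static void main(String[] args) {
        Set<NodeId> nodes = new HashSet<>();
        NodeId id = new NodeId("uuid-1");
        nodes.add(id);
        id.uuid = "uuid-2";  // hash code changes while the instance is in the set
        System.out.println(nodes.contains(id));  // false: the entry is orphaned
        System.out.println(nodes.size());        // 1: still there, but unreachable
    }
}
```

The set entry was stored under the old hash, so neither the mutated instance nor an equal fresh instance can find it; removing before mutation and re-inserting is the only safe sequence, which is why making the hash immutable removes the hazard.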
[jira] [Updated] (HDFS-7434) DatanodeID hashCode should not be mutable
[ https://issues.apache.org/jira/browse/HDFS-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-7434: -- Status: Patch Available (was: Open) DatanodeID hashCode should not be mutable - Key: HDFS-7434 URL: https://issues.apache.org/jira/browse/HDFS-7434 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-7434.patch Mutable hash codes may lead to orphaned instances in a collection. Instances must always be removed prior to modification of hash code values, and re-inserted. Although current code appears to do this, the mutable hash code is a landmine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7434) DatanodeID hashCode should not be mutable
[ https://issues.apache.org/jira/browse/HDFS-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp reassigned HDFS-7434: - Assignee: Daryn Sharp DatanodeID hashCode should not be mutable - Key: HDFS-7434 URL: https://issues.apache.org/jira/browse/HDFS-7434 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-7434.patch Mutable hash codes may lead to orphaned instances in a collection. Instances must always be removed prior to modification of hash code values, and re-inserted. Although current code appears to do this, the mutable hash code is a landmine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7879) hdfs.dll does not export functions of the public libhdfs API
[ https://issues.apache.org/jira/browse/HDFS-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347230#comment-14347230 ] Chris Nauroth commented on HDFS-7879: - Haohui, thank you for the review and commit. hdfs.dll does not export functions of the public libhdfs API Key: HDFS-7879 URL: https://issues.apache.org/jira/browse/HDFS-7879 Project: Hadoop HDFS Issue Type: Bug Components: build, libhdfs Affects Versions: 2.6.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 2.7.0 Attachments: HDFS-7879.001.patch, HDFS-7879.002.patch HDFS-573 enabled libhdfs to be built for Windows. This did not include marking the public API functions for export in hdfs.dll though, effectively making dynamic linking scenarios impossible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6488) HDFS superuser unable to access user's Trash files using NFSv3 mount
[ https://issues.apache.org/jira/browse/HDFS-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347414#comment-14347414 ] Brandon Li commented on HDFS-6488: -- The unit test failure is not introduced by this patch.

HDFS superuser unable to access user's Trash files using NFSv3 mount
Key: HDFS-6488
URL: https://issues.apache.org/jira/browse/HDFS-6488
Project: Hadoop HDFS
Issue Type: Bug
Components: nfs
Affects Versions: 2.3.0
Reporter: Stephen Chu
Assignee: Brandon Li
Attachments: HDFS-6488.001.patch, HDFS-6488.002.patch, HDFS-6488.003.patch

As the hdfs superuser on the NFS mount, I cannot cd or ls the /user/schu/.Trash directory:
{code}
bash-4.1$ cd .Trash/
bash: cd: .Trash/: Permission denied
bash-4.1$ ls -la
total 2
drwxr-xr-x 4 schu 2584148964 128 Jan  7 10:42 .
drwxr-xr-x 4 hdfs 2584148964 128 Jan  6 16:59 ..
drwx------ 2 schu 2584148964  64 Jan  7 10:45 .Trash
drwxr-xr-x 2 hdfs hdfs        64 Jan  7 10:42 tt
bash-4.1$ ls .Trash
ls: cannot open directory .Trash: Permission denied
bash-4.1$
{code}
When using FsShell as the hdfs superuser, I have superuser permissions to schu's .Trash contents:
{code}
bash-4.1$ hdfs dfs -ls -R /user/schu/.Trash
drwx------   - schu supergroup          0 2014-01-07 10:48 /user/schu/.Trash/Current
drwx------   - schu supergroup          0 2014-01-07 10:48 /user/schu/.Trash/Current/user
drwx------   - schu supergroup          0 2014-01-07 10:48 /user/schu/.Trash/Current/user/schu
-rw-r--r--   1 schu supergroup          4 2014-01-07 10:48 /user/schu/.Trash/Current/user/schu/tf1
{code}
The NFSv3 logs don't produce any error when the superuser tries to access schu's .Trash contents. However, for other permission errors (e.g. schu tries to delete a directory owned by hdfs), there will be a permission error in the logs. I think this is perhaps not specific to the .Trash directory. I created a /user/schu/dir1 which has the same permissions as .Trash (700). When I try cd'ing into the directory from the NFSv3 mount as the hdfs superuser, I get the same permission denied.
{code}
[schu@hdfs-nfs ~]$ hdfs dfs -ls
Found 4 items
drwx------   - schu supergroup          0 2014-01-07 10:57 .Trash
drwx------   - schu supergroup          0 2014-01-07 11:05 dir1
-rw-r--r--   1 schu supergroup          4 2014-01-07 11:05 tf1
drwxr-xr-x   - hdfs hdfs                0 2014-01-07 10:42 tt
bash-4.1$ whoami
hdfs
bash-4.1$ pwd
/hdfs_nfs_mount/user/schu
bash-4.1$ cd dir1
bash: cd: dir1: Permission denied
bash-4.1$
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7535) Utilize Snapshot diff report for distcp
[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7535: Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) Thanks again for the review, Nicholas! I've committed this to trunk and branch-2. Utilize Snapshot diff report for distcp --- Key: HDFS-7535 URL: https://issues.apache.org/jira/browse/HDFS-7535 Project: Hadoop HDFS Issue Type: Improvement Components: distcp, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 2.7.0 Attachments: HDFS-7535.000.patch, HDFS-7535.001.patch, HDFS-7535.002.patch, HDFS-7535.003.patch, HDFS-7535.004.patch Currently HDFS snapshot diff report can identify file/directory creation, deletion, rename and modification under a snapshottable directory. We can use the diff report for distcp between the primary cluster and a backup cluster to avoid unnecessary data copy. This is especially useful when there is a big directory rename happening in the primary cluster: the current distcp cannot detect the rename op thus this rename usually leads to large amounts of real data copy. More details of the approach will come in the first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7746) Add a test randomly mixing append, truncate and snapshot
[ https://issues.apache.org/jira/browse/HDFS-7746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347447#comment-14347447 ] Hadoop QA commented on HDFS-7746: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702490/h7746_20150305.patch against trunk revision 3560180. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9732//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9732//console This message is automatically generated. Add a test randomly mixing append, truncate and snapshot Key: HDFS-7746 URL: https://issues.apache.org/jira/browse/HDFS-7746 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7746_20150305.patch TestFileTruncate.testSnapshotWithAppendTruncate already does a good job for covering many test cases. Let's add a random test for mixing many append, truncate and snapshot operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7853) Erasure coding: extend LocatedBlocks to support reading from striped files
[ https://issues.apache.org/jira/browse/HDFS-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347370#comment-14347370 ] Zhe Zhang commented on HDFS-7853: - Thanks Jing for the patch, and sorry for the delayed review. The overall structure looks good to me. The following implementation details are worth more discussion:
# You might have already started optimizing {{BlockInfoStriped#indices}} based on the TODO msg. Just a reminder that {{BlockInfoStripedUnderConstruction#blockIndices}} is only applicable in over-replication as well. But if we get rid of indices for non-over-replicated {{replicas}}, we need to update {{addReplicaIfNotPresent}} as well, to always insert in the right position.
# How about using a Map for the over-replicated replicas (in addition to triplets and the {{replicas}} array)? It's less efficient but should simplify the code. It's a rare condition anyway.
# If the non-over-replicated locations in {{LocatedBlockStriped}} are sorted, we can remove the required {{indices}} field in PB and add an optional field for the excess replicas and their indices.

Erasure coding: extend LocatedBlocks to support reading from striped files
--
Key: HDFS-7853
URL: https://issues.apache.org/jira/browse/HDFS-7853
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Jing Zhao
Attachments: HDFS-7853.000.patch

We should extend the {{LocatedBlocks}} class so {{getBlockLocations}} can work with the striping layout (possibly with an extra list specifying the index of each location in the group).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7746) Add a test randomly mixing append, truncate and snapshot
[ https://issues.apache.org/jira/browse/HDFS-7746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7746: Hadoop Flags: Reviewed Add a test randomly mixing append, truncate and snapshot Key: HDFS-7746 URL: https://issues.apache.org/jira/browse/HDFS-7746 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7746_20150305.patch TestFileTruncate.testSnapshotWithAppendTruncate already does a good job for covering many test cases. Let's add a random test for mixing many append, truncate and snapshot operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7746) Add a test randomly mixing append, truncate and snapshot
[ https://issues.apache.org/jira/browse/HDFS-7746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347595#comment-14347595 ] Jing Zhao commented on HDFS-7746: - Thanks for working on this, Nicholas! The patch looks good to me and the failed test should be unrelated. +1 Add a test randomly mixing append, truncate and snapshot Key: HDFS-7746 URL: https://issues.apache.org/jira/browse/HDFS-7746 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7746_20150305.patch TestFileTruncate.testSnapshotWithAppendTruncate already does a good job for covering many test cases. Let's add a random test for mixing many append, truncate and snapshot operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7782) Read a striping layout file from client side
[ https://issues.apache.org/jira/browse/HDFS-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7782: Attachment: HDFS-7782-000.patch Initial patch to illustrate the idea. It uses hedged reading from the group of DNs in the stripe and only supports pread now. Read a striping layout file from client side Key: HDFS-7782 URL: https://issues.apache.org/jira/browse/HDFS-7782 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Zhe Zhang Attachments: HDFS-7782-000.patch A client reading a file should not need to know or handle the file's layout. This sub-task adds logic to DFSInputStream to support reading striped-layout files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7885) Datanode should not trust the generation stamp provided by client
[ https://issues.apache.org/jira/browse/HDFS-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-7885: -- Assignee: (was: Suresh Srinivas) Datanode should not trust the generation stamp provided by client - Key: HDFS-7885 URL: https://issues.apache.org/jira/browse/HDFS-7885 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: vitthal (Suhas) Gogate Priority: Critical Datanode should not trust the generation stamp provided by client, since it is prefetched and buffered in the client, and a concurrent append may increase it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-1522) Merge Block.BLOCK_FILE_PREFIX and DataStorage.BLOCK_FILE_PREFIX into one constant
[ https://issues.apache.org/jira/browse/HDFS-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongming Liang updated HDFS-1522: - Fix Version/s: 3.0.0 Labels: patch (was: ) Target Version/s: 3.0.0 (was: 2.7.0) Affects Version/s: (was: 0.21.0) 3.0.0 Release Note: This merges Block.BLOCK_FILE_PREFIX and DataStorage.BLOCK_FILE_PREFIX into one constant. Hard-coded literals of blk_ in various files are also updated to use the same constant. Hadoop Flags: Reviewed Status: Patch Available (was: In Progress) Existing test cases are used for testing. The code changes are reviewed by Konstantin Shvachko. Merge Block.BLOCK_FILE_PREFIX and DataStorage.BLOCK_FILE_PREFIX into one constant - Key: HDFS-1522 URL: https://issues.apache.org/jira/browse/HDFS-1522 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Dongming Liang Labels: patch Fix For: 3.0.0 Attachments: HDFS-1522.002.patch, HDFS-1522.patch Two semantically identical constants, {{Block.BLOCK_FILE_PREFIX}} and {{DataStorage.BLOCK_FILE_PREFIX}}, should be merged into one. It should be defined in {{Block}}, imo. Also, uses of blk_, like in {{DirectoryScanner}}, should be replaced by this constant. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7884) NullPointerException in BlockSender
Tsz Wo Nicholas Sze created HDFS-7884: - Summary: NullPointerException in BlockSender Key: HDFS-7884 URL: https://issues.apache.org/jira/browse/HDFS-7884 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Tsz Wo Nicholas Sze Priority: Blocker {noformat} java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.BlockSender.init(BlockSender.java:264) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:249) at java.lang.Thread.run(Thread.java:745) {noformat} BlockSender.java:264 is shown below {code} this.volumeRef = datanode.data.getVolume(block).obtainReference(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
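The NPE means {{datanode.data.getVolume(block)}} returned null before {{obtainReference()}} was called (e.g. the block's volume was removed concurrently). A minimal, hypothetical sketch of a guarded version follows; the interface and method names are illustrative stand-ins, not the real FsDatasetSpi/FsVolumeSpi API.

```java
import java.io.IOException;

// Sketch: check the volume lookup result before dereferencing it, instead of
// the unguarded chained call at BlockSender.java:264.
class VolumeLookup {
    // Illustrative stand-ins for FsVolumeSpi / FsDatasetSpi.
    interface Volume { Object obtainReference(); }
    interface Dataset { Volume getVolume(String blockId); }

    /** Returns a volume reference, or throws a descriptive IOException when
     *  the block's volume is unknown, rather than letting an NPE escape. */
    static Object obtainVolumeReference(Dataset data, String blockId) throws IOException {
        Volume v = data.getVolume(blockId);
        if (v == null) {
            throw new IOException("Volume not found for block " + blockId);
        }
        return v.obtainReference();
    }
}
```

An IOException here can be reported back to the reading client as a normal read failure, while an NPE gives no clue which block or volume was involved.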
[jira] [Updated] (HDFS-7746) Add a test randomly mixing append, truncate and snapshot
[ https://issues.apache.org/jira/browse/HDFS-7746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7746: -- Fix Version/s: (was: 2.7.0) Status: Patch Available (was: Open) Add a test randomly mixing append, truncate and snapshot Key: HDFS-7746 URL: https://issues.apache.org/jira/browse/HDFS-7746 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7746_20150305.patch TestFileTruncate.testSnapshotWithAppendTruncate already does a good job for covering many test cases. Let's add a random test for mixing many append, truncate and snapshot operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7885) Datanode should not trust the generation stamp provided by client
vitthal (Suhas) Gogate created HDFS-7885: Summary: Datanode should not trust the generation stamp provided by client Key: HDFS-7885 URL: https://issues.apache.org/jira/browse/HDFS-7885 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: vitthal (Suhas) Gogate Priority: Critical Datanode should not trust the generation stamp provided by client, since it is prefetched and buffered in the client, and a concurrent append may increase it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7858) Improve HA Namenode Failover detection on the client
[ https://issues.apache.org/jira/browse/HDFS-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347533#comment-14347533 ] Arun Suresh commented on HDFS-7858: --- This test case failure seems unrelated. Improve HA Namenode Failover detection on the client Key: HDFS-7858 URL: https://issues.apache.org/jira/browse/HDFS-7858 Project: Hadoop HDFS Issue Type: Improvement Reporter: Arun Suresh Assignee: Arun Suresh Attachments: HDFS-7858.1.patch, HDFS-7858.2.patch, HDFS-7858.2.patch In an HA deployment, clients are configured with the hostnames of both the Active and Standby Namenodes. Clients will first try one of the NNs (non-deterministically), and if it is a standby NN, it will respond to the client to retry the request on the other Namenode. If the client happens to talk to the Standby first, and the Standby is undergoing GC or is busy, those clients might not get a response soon enough to try the other NN. Proposed approach to solve this: 1) Since ZooKeeper is already used as the failover controller, the clients could talk to ZK and find out which is the active Namenode before contacting it. 2) Long-lived DFSClients would have a ZK watch configured which fires when there is a failover, so they do not have to query ZK every time to find out the active NN. 3) Clients can also cache the last active NN in the user's home directory (~/.lastNN) so that short-lived clients can try that Namenode first before querying ZK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
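The ~/.lastNN hint file from the proposal could look roughly like this. The file name comes from the comment itself, but the class and its behavior are assumptions for illustration, not real DFSClient code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: best-effort cache of the last-known active NameNode, so short-lived
// clients can try that host first before falling back to ZooKeeper.
class LastActiveNNCache {
    private final Path cacheFile; // e.g. resolved from the user's home dir

    LastActiveNNCache(Path cacheFile) {
        this.cacheFile = cacheFile;
    }

    /** Remember which NameNode last answered as active. */
    void record(String nnHostPort) throws IOException {
        Files.write(cacheFile, nnHostPort.getBytes(StandardCharsets.UTF_8));
    }

    /** Best-effort hint; null means "no hint, ask ZooKeeper instead". */
    String lastActive() {
        try {
            return new String(Files.readAllBytes(cacheFile), StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            return null; // missing or unreadable cache is not an error
        }
    }
}
```

Since the hint can be stale after a failover, a client would still verify the cached NN is active (or catch the standby exception) and fall back to the ZK lookup.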
[jira] [Commented] (HDFS-7884) NullPointerException in BlockSender
[ https://issues.apache.org/jira/browse/HDFS-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347545#comment-14347545 ] Lei (Eddy) Xu commented on HDFS-7884: - [~szetszwo] would you mind if I take this JIRA, if you have not started on it yet? Moreover, could you post the context that generates this {{NullPointerException}}? Thanks! NullPointerException in BlockSender --- Key: HDFS-7884 URL: https://issues.apache.org/jira/browse/HDFS-7884 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Blocker {noformat} java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.BlockSender.init(BlockSender.java:264) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:249) at java.lang.Thread.run(Thread.java:745) {noformat} BlockSender.java:264 is shown below {code} this.volumeRef = datanode.data.getVolume(block).obtainReference(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7885) Datanode should not trust the generation stamp provided by client
[ https://issues.apache.org/jira/browse/HDFS-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vitthal (Suhas) Gogate updated HDFS-7885: - Assignee: Suresh Srinivas Datanode should not trust the generation stamp provided by client - Key: HDFS-7885 URL: https://issues.apache.org/jira/browse/HDFS-7885 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: vitthal (Suhas) Gogate Assignee: Suresh Srinivas Priority: Critical Datanode should not trust the generation stamp provided by client, since it is prefetched and buffered in the client, and a concurrent append may increase it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-1522) Merge Block.BLOCK_FILE_PREFIX and DataStorage.BLOCK_FILE_PREFIX into one constant
[ https://issues.apache.org/jira/browse/HDFS-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347527#comment-14347527 ] Dongming Liang commented on HDFS-1522: -- Hi Konstantin, Thank you for reviewing this fix. I will update as suggested and then submit a new patch. Merge Block.BLOCK_FILE_PREFIX and DataStorage.BLOCK_FILE_PREFIX into one constant - Key: HDFS-1522 URL: https://issues.apache.org/jira/browse/HDFS-1522 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 0.21.0 Reporter: Konstantin Shvachko Assignee: Dongming Liang Attachments: HDFS-1522.patch Two semantically identical constants, {{Block.BLOCK_FILE_PREFIX}} and {{DataStorage.BLOCK_FILE_PREFIX}}, should be merged into one. It should be defined in {{Block}}, imo. Also, uses of blk_, like in {{DirectoryScanner}}, should be replaced by this constant. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7434) DatanodeID hashCode should not be mutable
[ https://issues.apache.org/jira/browse/HDFS-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347546#comment-14347546 ] Hadoop QA commented on HDFS-7434: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702505/HDFS-7434.patch against trunk revision 3560180. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9733//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9733//console This message is automatically generated. DatanodeID hashCode should not be mutable - Key: HDFS-7434 URL: https://issues.apache.org/jira/browse/HDFS-7434 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-7434.patch Mutable hash codes may lead to orphaned instances in a collection. Instances must always be removed prior to modification of hash code values, and re-inserted. Although current code appears to do this, the mutable hash code is a landmine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-1522) Merge Block.BLOCK_FILE_PREFIX and DataStorage.BLOCK_FILE_PREFIX into one constant
[ https://issues.apache.org/jira/browse/HDFS-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongming Liang updated HDFS-1522: - Attachment: HDFS-1522.002.patch Updated to adjust column size and fix imports per review feedback. Merge Block.BLOCK_FILE_PREFIX and DataStorage.BLOCK_FILE_PREFIX into one constant - Key: HDFS-1522 URL: https://issues.apache.org/jira/browse/HDFS-1522 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 0.21.0 Reporter: Konstantin Shvachko Assignee: Dongming Liang Attachments: HDFS-1522.002.patch, HDFS-1522.patch Two semantically identical constants, {{Block.BLOCK_FILE_PREFIX}} and {{DataStorage.BLOCK_FILE_PREFIX}}, should be merged into one. It should be defined in {{Block}}, imo. Also, uses of blk_, like in {{DirectoryScanner}}, should be replaced by this constant. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp
[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347349#comment-14347349 ] Hudson commented on HDFS-7535: -- FAILURE: Integrated in Hadoop-trunk-Commit #7256 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7256/]) HDFS-7535. Utilize Snapshot diff report for distcp. Contributed by Jing Zhao. (jing9: rev ed70fa142cabdbc1065e4dbbc95e99c8850c4751) * hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java * hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java * hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java * hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java * hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java * hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java * hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java * hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java * hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java * hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java Utilize Snapshot diff report for distcp --- Key: HDFS-7535 URL: https://issues.apache.org/jira/browse/HDFS-7535 Project: Hadoop HDFS Issue Type: Improvement Components: distcp, snapshots Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 2.7.0 Attachments: HDFS-7535.000.patch, HDFS-7535.001.patch, HDFS-7535.002.patch, HDFS-7535.003.patch, HDFS-7535.004.patch Currently HDFS snapshot diff report can identify file/directory creation, deletion, rename and modification under a snapshottable directory. 
We can use the diff report for distcp between the primary cluster and a backup cluster to avoid unnecessary data copy. This is especially useful when there is a big directory rename happening in the primary cluster: the current distcp cannot detect the rename op, so the rename usually leads to copying large amounts of real data. More details of the approach will come in the first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4929) [NNBench mark] Lease mismatch error when running with multiple mappers
[ https://issues.apache.org/jira/browse/HDFS-4929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347635#comment-14347635 ] Brahma Reddy Battula commented on HDFS-4929: Kindly review the patch !!! [NNBench mark] Lease mismatch error when running with multiple mappers -- Key: HDFS-4929 URL: https://issues.apache.org/jira/browse/HDFS-4929 Project: Hadoop HDFS Issue Type: Bug Components: benchmarks Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Critical Attachments: HDFS4929.patch Command : ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.1-tests.jar nnbench -operation create_write -numberOfFiles 1000 -blockSize 268435456 -bytesToWrite 102400 -baseDir /benchmarks/NNBench`hostname -s` -replicationFactorPerFile 3 -maps 100 -reduces 10 Trace : 013-06-21 10:44:53,763 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9005, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 192.168.105.214:36320: error: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /benchmarks/NNBenchlinux-185/data/file_linux-214__0 owned by DFSClient_attempt_1371782327901_0001_m_48_0_1383437860_1 but is accessed by DFSClient_attempt_1371782327901_0001_m_84_0_1880545303_1 org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /benchmarks/NNBenchlinux-185/data/file_linux-214__0 owned by DFSClient_attempt_1371782327901_0001_m_48_0_1383437860_1 but is accessed by DFSClient_attempt_1371782327901_0001_m_84_0_1880545303_1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2351) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2098) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2019) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501) at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:213) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:52012) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:435) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:925) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1710) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1706) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4929) [NNBench mark] Lease mismatch error when running with multiple mappers
[ https://issues.apache.org/jira/browse/HDFS-4929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347632#comment-14347632 ] Brahma Reddy Battula commented on HDFS-4929: Any thoughts on this issue..? Thanks.. [NNBench mark] Lease mismatch error when running with multiple mappers -- Key: HDFS-4929 URL: https://issues.apache.org/jira/browse/HDFS-4929 Project: Hadoop HDFS Issue Type: Bug Components: benchmarks Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Critical Attachments: HDFS4929.patch Command : ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.1-tests.jar nnbench -operation create_write -numberOfFiles 1000 -blockSize 268435456 -bytesToWrite 102400 -baseDir /benchmarks/NNBench`hostname -s` -replicationFactorPerFile 3 -maps 100 -reduces 10 Trace : 013-06-21 10:44:53,763 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9005, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 192.168.105.214:36320: error: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /benchmarks/NNBenchlinux-185/data/file_linux-214__0 owned by DFSClient_attempt_1371782327901_0001_m_48_0_1383437860_1 but is accessed by DFSClient_attempt_1371782327901_0001_m_84_0_1880545303_1 org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /benchmarks/NNBenchlinux-185/data/file_linux-214__0 owned by DFSClient_attempt_1371782327901_0001_m_48_0_1383437860_1 but is accessed by DFSClient_attempt_1371782327901_0001_m_84_0_1880545303_1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2351) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2098) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2019) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501) at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:213) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:52012) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:435) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:925) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1710) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1706) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7855) Separate class Packet from DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347646#comment-14347646 ] Zhe Zhang commented on HDFS-7855: - Thanks Bo for the patch and Jing for the review. From the Jenkins console it seems {{TestFileLengthOnClusterRestart}} was killed. Bo, maybe you want to test it locally? The Javadoc warning seems to be caused by a blank {{@return}} statement. Separate class Packet from DFSOutputStream -- Key: HDFS-7855 URL: https://issues.apache.org/jira/browse/HDFS-7855 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-7855-001.patch, HDFS-7855-002.patch, HDFS-7855-003.patch, HDFS-7855-004.patch, HDFS-7855-005.patch Class Packet is an inner class in DFSOutputStream and also used by DataStreamer. This sub task separates Packet out of DFSOutputStream to aid the separation in HDFS-7854. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HDFS-7878: -- Assignee: Sergey Shelukhin API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7855) Separate class Packet from DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347637#comment-14347637 ] Jing Zhao commented on HDFS-7855: - Thanks for working on this, Bo! The patch looks good to me in general. Some minor points: # The new DFSPacket class does not need to be public. # In {{DFSPacket#writeTo}}, {{assert checksumPos == dataStart;}} should be unnecessary since it's always true. We can use this chance to delete it. # The getter methods for final fields (e.g., {{isHeartbeatPacket}} and {{getSeqno}}) do not need to acquire the object's monitor. # Looks like we can also convert {{lastPacketInBlock}} to be final, since its modification pattern is always like: {code} currentPacket = createPacket(0, 0, bytesCurBlock, currentSeqno++); currentPacket.lastPacketInBlock = true; {code} # We have a javadoc warning reported by Jenkins. Separate class Packet from DFSOutputStream -- Key: HDFS-7855 URL: https://issues.apache.org/jira/browse/HDFS-7855 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-7855-001.patch, HDFS-7855-002.patch, HDFS-7855-003.patch, HDFS-7855-004.patch, HDFS-7855-005.patch Class Packet is an inner class in DFSOutputStream and also used by DataStreamer. This sub task separates Packet out of DFSOutputStream to aid the separation in HDFS-7854. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
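Review point 4 (make {{lastPacketInBlock}} final) together with the note about getters for final fields can be illustrated with a trimmed, hypothetical stand-in for DFSPacket — not the real class:

```java
// Sketch: pass lastPacketInBlock through the constructor so the field can be
// final; final fields set in the constructor are safely published, so their
// getters need no synchronization.
class DFSPacketSketch {
    private final long seqno;
    private final boolean lastPacketInBlock;

    DFSPacketSketch(long seqno, boolean lastPacketInBlock) {
        this.seqno = seqno;
        this.lastPacketInBlock = lastPacketInBlock;
    }

    // No synchronized keyword needed: the fields are immutable after construction.
    long getSeqno() { return seqno; }
    boolean isLastPacketInBlock() { return lastPacketInBlock; }
}
```

The call site in the quoted pattern would then become a single constructor call instead of a create-then-mutate pair.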
[jira] [Updated] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HDFS-7878: --- Attachment: HDFS-7878.patch This patch exposes fileId via the normal FileStatus. I could instead add a separate API, which would be a smaller change but would add yet another API... please advise. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Attachments: HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347653#comment-14347653 ] Sergey Shelukhin commented on HDFS-7878: [~jingzhao] can you please review? API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Attachments: HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HDFS-7878: --- Status: Patch Available (was: Open) API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Attachments: HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347654#comment-14347654 ] Jing Zhao commented on HDFS-7878: - Instead of defining a new DFSFileStatus class, can we define a new {{getFileId}} API inside of DistributedFileSystem and use HdfsFileStatus there? API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7872) Erasure Coding: INodeFile.dumpTreeRecursively() supports to print striped blocks
[ https://issues.apache.org/jira/browse/HDFS-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347662#comment-14347662 ] Jing Zhao commented on HDFS-7872: - Thanks for working on this, [~tfukudom]! Instead of checking and retrieving FileWithStripedBlocksFeature, how about directly calling {{getBlocks}} (which can handle both contiguous and striped blocks) and printing out the first element of the result? Erasure Coding: INodeFile.dumpTreeRecursively() supports to print striped blocks Key: HDFS-7872 URL: https://issues.apache.org/jira/browse/HDFS-7872 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Takuya Fukudome Assignee: Takuya Fukudome Attachments: HDFS-7872.1.patch We need to let dumpTreeRecursively be able to print striped blocks (or maybe just the first striped block). {code}
@Override
public void dumpTreeRecursively(PrintWriter out, StringBuilder prefix, final int snapshotId) {
  super.dumpTreeRecursively(out, prefix, snapshotId);
  out.print(", fileSize=" + computeFileSize(snapshotId));
  // only compare the first block
  out.print(", blocks=");
  out.print(blocks == null || blocks.length == 0 ? null : blocks[0]);
  // TODO print striped blocks
  out.println();
}
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5215) dfs.datanode.du.reserved is not taking effect as it's not considered while getting the available space
[ https://issues.apache.org/jira/browse/HDFS-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347690#comment-14347690 ] Brahma Reddy Battula commented on HDFS-5215: Any thoughts on this jira..? dfs.datanode.du.reserved is not taking effect as it's not considered while getting the available space -- Key: HDFS-5215 URL: https://issues.apache.org/jira/browse/HDFS-5215 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: HDFS-5215.patch {code}
public long getAvailable() throws IOException {
  long remaining = getCapacity() - getDfsUsed();
  long available = usage.getAvailable();
  if (remaining > available) {
    remaining = available;
  }
  return (remaining > 0) ? remaining : 0;
}
{code} Here we are not considering the reserved space while getting the available space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
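A hedged sketch of the kind of fix being proposed: subtract the configured dfs.datanode.du.reserved value when computing available space. The class and field names are illustrative, not the actual FsVolumeImpl code, and subtracting reserved from both terms is an assumption for the sketch.

```java
// Sketch: factor the reserved bytes into both the capacity-based remaining
// space and the filesystem-reported free space, clamping at zero.
class VolumeUsage {
    final long capacity;    // raw volume capacity in bytes
    final long dfsUsed;     // bytes already used by HDFS blocks
    final long fsAvailable; // free bytes reported by the underlying filesystem
    final long reserved;    // dfs.datanode.du.reserved

    VolumeUsage(long capacity, long dfsUsed, long fsAvailable, long reserved) {
        this.capacity = capacity;
        this.dfsUsed = dfsUsed;
        this.fsAvailable = fsAvailable;
        this.reserved = reserved;
    }

    long getAvailable() {
        long remaining = capacity - reserved - dfsUsed;
        long available = fsAvailable - reserved;
        if (remaining > available) {
            remaining = available;
        }
        return (remaining > 0) ? remaining : 0;
    }
}
```

For example, with 1000 bytes capacity, 100 used, 500 free on disk, and 200 reserved, the sketch reports 300 available rather than 500, keeping the reserved headroom untouched.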
[jira] [Commented] (HDFS-5215) dfs.datanode.du.reserved is not taking effect as it's not considered while getting the available space
[ https://issues.apache.org/jira/browse/HDFS-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347693#comment-14347693 ] Hadoop QA commented on HDFS-5215: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610538/HDFS-5215.patch against trunk revision ed70fa1. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9736//console This message is automatically generated. dfs.datanode.du.reserved is not taking effect as it's not considered while getting the available space -- Key: HDFS-5215 URL: https://issues.apache.org/jira/browse/HDFS-5215 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: HDFS-5215.patch {code}
public long getAvailable() throws IOException {
  long remaining = getCapacity() - getDfsUsed();
  long available = usage.getAvailable();
  if (remaining > available) {
    remaining = available;
  }
  return (remaining > 0) ? remaining : 0;
}
{code} Here we are not considering the reserved space while getting the available space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7864) Erasure Coding: Update safemode calculation for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-7864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347700#comment-14347700 ] Jing Zhao commented on HDFS-7864: - bq. I think striped blocks in this Jira's description actually denotes to BlockGroup in HDFSErasureCodingDesign-20150206.pdf , right? Yes, you're right. We should update the doc later for this. bq. so a BlockGroup should be counted as 9 received blocks, right? A block group (or a striped block) consists of 9 blocks in this case. But for safemode calculation, I guess we can still count it as 1 block. This is because for a single striped block (suppose it's m+k), we need different logic to declare it safe (i.e., to increase the safe count by 1): the NN must have received m blocks belonging to it from block reports (since we need m blocks to recover all the data/parity blocks). Erasure Coding: Update safemode calculation for striped blocks -- Key: HDFS-7864 URL: https://issues.apache.org/jira/browse/HDFS-7864 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: GAO Rui We need to update the safemode calculation for striped blocks. Specifically, each striped block now consists of multiple data/parity blocks stored in corresponding DataNodes. The current code's calculation is thus inconsistent: each striped block is only counted as 1 expected block, while each of its member blocks may increase the number of received blocks by 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
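The counting rule described in the comment above — a block group contributes 1 to the expected-block total and becomes safe once m of its m+k members have been reported — can be sketched as a toy counter. Class and method names here are illustrative, not NameNode code:

```java
// Toy safemode counter for striped block groups: each group counts as one
// block, and is declared safe once at least m (the data-block count) of its
// m+k members have been reported, since m members suffice to reconstruct
// the remaining data/parity blocks.
public class StripedSafeModeCounter {
    private final int numDataBlocks; // m
    private int safeBlockCount;

    public StripedSafeModeCounter(int numDataBlocks) {
        this.numDataBlocks = numDataBlocks;
    }

    // Called when block reports for one group are tallied; returns true
    // if the group crosses the safety threshold.
    public boolean reportGroup(int reportedMembers) {
        if (reportedMembers >= numDataBlocks) {
            safeBlockCount++; // the whole group counts as 1 safe block
            return true;
        }
        return false;
    }

    public int getSafeBlockCount() { return safeBlockCount; }
}
```

For a 6+3 group (m=6, k=3), five reported members are not enough, while six or more mark the group safe.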
[jira] [Updated] (HDFS-7826) Erasure Coding: Update INodeFile quota computation for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7826: Status: Open (was: Patch Available) Thanks for the work, Kai! Currently Jenkins can only run a patch against trunk, thus we do not need to submit patches for a feature branch. I will review the patch later. Erasure Coding: Update INodeFile quota computation for striped blocks - Key: HDFS-7826 URL: https://issues.apache.org/jira/browse/HDFS-7826 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Kai Sasaki Attachments: HDFS-7826.1.patch Currently INodeFile's quota computation only considers contiguous blocks (i.e., {{INodeFile#blocks}}). We need to update it to support striped blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HDFS-7878: --- Attachment: HDFS-7878.01.patch Updated to just have the API... API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
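The caching use case from the issue description — keying a cache by a unique file ID so an overwrite cannot serve stale data — can be illustrated with a small sketch. FileIdCache and the numeric IDs below are hypothetical, not part of the proposed API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative cache keyed by a unique file ID (e.g., an inode ID) instead
// of a path. When a file is overwritten it gets a new ID, so lookups with
// the new ID miss the stale entry automatically and force a re-read.
public class FileIdCache {
    private final Map<Long, String> cache = new HashMap<>();

    public void put(long fileId, String contents) {
        cache.put(fileId, contents);
    }

    public String get(long fileId) {
        return cache.get(fileId); // null means "not cached, read from HDFS"
    }
}
```

If "/a" initially resolves to (hypothetical) file ID 1001 and is then overwritten so it resolves to 1002, a lookup for 1002 misses even though an entry for the old 1001 still exists — which is exactly the correctness property a path-keyed cache lacks.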
[jira] [Commented] (HDFS-7434) DatanodeID hashCode should not be mutable
[ https://issues.apache.org/jira/browse/HDFS-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347762#comment-14347762 ] Kihwal Lee commented on HDFS-7434: -- +1 looks good to me. DatanodeID hashCode should not be mutable - Key: HDFS-7434 URL: https://issues.apache.org/jira/browse/HDFS-7434 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-7434.patch Mutable hash codes may lead to orphaned instances in a collection. Instances must always be removed prior to modification of hash code values, and re-inserted. Although current code appears to do this, the mutable hash code is a landmine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
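The "orphaned instances" landmine described above can be demonstrated with a tiny example; the MutableId class is a hypothetical stand-in for DatanodeID, not the real class:

```java
import java.util.HashSet;

// Demonstrates the landmine: mutating a field that feeds hashCode() while
// the object sits in a hash-based collection orphans the entry, because
// subsequent lookups compute a different hash and probe the wrong bucket.
public class OrphanDemo {
    static class MutableId {
        String host;
        MutableId(String host) { this.host = host; }
        @Override public int hashCode() { return host.hashCode(); }
        @Override public boolean equals(Object o) {
            return o instanceof MutableId && ((MutableId) o).host.equals(host);
        }
    }

    public static boolean lookupAfterMutation() {
        HashSet<MutableId> set = new HashSet<>();
        MutableId id = new MutableId("dn1:50010");
        set.add(id);
        id.host = "dn1:50011";   // hash code changes while inside the set
        return set.contains(id); // false: the entry is now unreachable
    }
}
```

The fix pattern the JIRA title implies is to compute the hash code only from immutable state (or to remove the instance before mutating and re-insert it afterwards).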
[jira] [Commented] (HDFS-7671) hdfs user guide should point to the common rack awareness doc
[ https://issues.apache.org/jira/browse/HDFS-7671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347780#comment-14347780 ] Kai Sasaki commented on HDFS-7671: -- [~aw] Thank you for the comments, but I have a question about the first one, and I cannot figure out the second and third points. Links should be to relative paths when referring to other documentation in the tree. The link I want to add belongs to the common docs, so the target is not included under the document root of the HDFS docs. Is there any good way to make a relative path to other Hadoop project docs? The actual documentation you want to link to is the html file generated by the RackAwareness.md file. Do you mean that the actual link should be to the .html file, or something else? The rewrite shouldn't include any technical details since the rack awareness doc covers all of that. Should all other current contents except for the direct link to the common doc be removed? Or should I move the current contents to the common docs? hdfs user guide should point to the common rack awareness doc - Key: HDFS-7671 URL: https://issues.apache.org/jira/browse/HDFS-7671 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Allen Wittenauer Assignee: Kai Sasaki Attachments: HDFS-7671.1.patch HDFS user guide has a section on rack awareness that should really just be a pointer to the common doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7740) Test truncate with DataNodes restarting
[ https://issues.apache.org/jira/browse/HDFS-7740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347921#comment-14347921 ] Konstantin Shvachko commented on HDFS-7740: --- Yi, do you have any more context on this? Should we create a jira? Test truncate with DataNodes restarting --- Key: HDFS-7740 URL: https://issues.apache.org/jira/browse/HDFS-7740 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: 2.7.0 Reporter: Konstantin Shvachko Assignee: Yi Liu Fix For: 2.7.0 Attachments: HDFS-7740.001.patch, HDFS-7740.002.patch, HDFS-7740.003.patch Add a test case, which ensures replica consistency when DNs are failing and restarting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347938#comment-14347938 ] Hadoop QA commented on HDFS-7878: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702631/HDFS-7878.01.patch against trunk revision c66c3ac. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestFileTruncate The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDecommission org.apache.hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9737//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9737//console This message is automatically generated. 
API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-1522) Merge Block.BLOCK_FILE_PREFIX and DataStorage.BLOCK_FILE_PREFIX into one constant
[ https://issues.apache.org/jira/browse/HDFS-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-1522: -- Resolution: Fixed Fix Version/s: (was: 3.0.0) 2.7.0 Target Version/s: 2.7.0 (was: 3.0.0) Status: Resolved (was: Patch Available) I just committed this. Congratulations Dongming. Merge Block.BLOCK_FILE_PREFIX and DataStorage.BLOCK_FILE_PREFIX into one constant - Key: HDFS-1522 URL: https://issues.apache.org/jira/browse/HDFS-1522 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Dongming Liang Labels: patch Fix For: 2.7.0 Attachments: HDFS-1522.002.patch, HDFS-1522.patch Two semantically identical constants {{Block.BLOCK_FILE_PREFIX}} and {{DataStorage.BLOCK_FILE_PREFIX}} should be merged into one. Should be defined in {{Block}}, imo. Also use cases of blk_, like in {{DirectoryScanner}}, should be replaced by this constant. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7746) Add a test randomly mixing append, truncate and snapshot
[ https://issues.apache.org/jira/browse/HDFS-7746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348017#comment-14348017 ] Hudson commented on HDFS-7746: -- FAILURE: Integrated in Hadoop-trunk-Commit #7262 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7262/]) HDFS-7746. Add a test randomly mixing append, truncate and snapshot operations. (szetszwo: rev ded0200e9c98dea960db756bb208ff475d710e28) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestAppendSnapshotTruncate.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Add a test randomly mixing append, truncate and snapshot Key: HDFS-7746 URL: https://issues.apache.org/jira/browse/HDFS-7746 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.7.0 Attachments: h7746_20150305.patch TestFileTruncate.testSnapshotWithAppendTruncate already does a good job for covering many test cases. Let's add a random test for mixing many append, truncate and snapshot operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7855) Separate class Packet from DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348016#comment-14348016 ] Jing Zhao commented on HDFS-7855: - bq. One question is that lastPacketInBlock is set to false when creating a DFSPacket object. And it is modified to true when DFSOutputStream wants it to be the last packet For the last packet, looks like we always create a new packet and immediately set lastPacketInBlock to true? If this is the case, we only need to modify DFSPacket's constructor method and set lastPacketInBlock's value in the constructor. In this way we can convert it to final. Separate class Packet from DFSOutputStream -- Key: HDFS-7855 URL: https://issues.apache.org/jira/browse/HDFS-7855 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-7855-001.patch, HDFS-7855-002.patch, HDFS-7855-003.patch, HDFS-7855-004.patch, HDFS-7855-005.patch Class Packet is an inner class in DFSOutputStream and also used by DataStreamer. This sub task separates Packet out of DFSOutputStream to aid the separation in HDFS-7854. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
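Jing's suggestion above — set lastPacketInBlock in the constructor so the field can be final — would look roughly like the following simplified sketch (not the actual DFSPacket signature, which carries buffers and other state):

```java
// Sketch of a packet whose last-packet flag is fixed at construction time.
// Because the flag can no longer be flipped after the object is built, the
// field can be declared final and the class becomes easier to reason about.
public class DFSPacket {
    private final long seqno;                 // sequence number in the block
    private final boolean lastPacketInBlock;  // immutable after construction

    public DFSPacket(long seqno, boolean lastPacketInBlock) {
        this.seqno = seqno;
        this.lastPacketInBlock = lastPacketInBlock;
    }

    public long getSeqno() { return seqno; }
    public boolean isLastPacketInBlock() { return lastPacketInBlock; }
}
```

The design point is that the writer always creates a fresh packet for the end of a block, so passing the flag in at construction loses nothing while removing a mutable-state hazard.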
[jira] [Updated] (HDFS-7434) DatanodeID hashCode should not be mutable
[ https://issues.apache.org/jira/browse/HDFS-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-7434: - Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to branch-2 and trunk. Thanks for fixing this, Daryn. DatanodeID hashCode should not be mutable - Key: HDFS-7434 URL: https://issues.apache.org/jira/browse/HDFS-7434 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Fix For: 2.7.0 Attachments: HDFS-7434.patch Mutable hash codes may lead to orphaned instances in a collection. Instances must always be removed prior to modification of hash code values, and re-inserted. Although current code appears to do this, the mutable hash code is a landmine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7826) Erasure Coding: Update INodeFile quota computation for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347812#comment-14347812 ] Zhe Zhang commented on HDFS-7826: - Oops I didn't realize this JIRA handles {{computeFileSize}}. I was about to suggest the following code under HDFS-7749.
{code}
public final long computeFileSize(boolean includesLastUcBlock,
    boolean usePreferredBlockSize4LastUcBlock) {
  BlockInfo[] blocksInfile = isStriped() ?
      getStripedBlocksFeature().getBlocks() : blocks;
  if (blocksInfile == null || blocksInfile.length == 0) {
    return 0;
  }
  final int last = blocksInfile.length - 1;
  // check if the last block is BlockInfoUnderConstruction
  long size = blocksInfile[last].getNumBytes();
  if (blocksInfile[last] instanceof BlockInfoContiguousUnderConstruction
      || blocksInfile[last] instanceof BlockInfoStripedUnderConstruction) {
    if (!includesLastUcBlock) {
      size = 0;
    } else if (usePreferredBlockSize4LastUcBlock) {
      size = getPreferredBlockSize();
    }
  }
  // sum other blocks
  for (int i = 0; i < last; i++) {
    size += blocksInfile[i].getNumBytes();
  }
  return size;
}
{code}
Erasure Coding: Update INodeFile quota computation for striped blocks - Key: HDFS-7826 URL: https://issues.apache.org/jira/browse/HDFS-7826 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Kai Sasaki Attachments: HDFS-7826.1.patch Currently INodeFile's quota computation only considers contiguous blocks (i.e., {{INodeFile#blocks}}). We need to update it to support striped blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7853) Erasure coding: extend LocatedBlocks to support reading from striped files
[ https://issues.apache.org/jira/browse/HDFS-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347836#comment-14347836 ] Zhe Zhang commented on HDFS-7853: - The extended {{DFSStripedInputStream}} uses {{LocatedStripedBlock}} like the following:
{code}
private LocatedBlock[] parseStripedBlockGroup(LocatedBlock bg) {
  LocatedBlock[] lbs = new LocatedBlock[HdfsConstants.NUM_DATA_BLOCKS];
  for (short i = 0; i < HdfsConstants.NUM_DATA_BLOCKS; i++) {
    ExtendedBlock blk = new ExtendedBlock(bg.getBlock());
    short j = bg instanceof LocatedStripedBlock ?
        ((LocatedStripedBlock) bg).getBlockIndicies()[i] : i;
    blk.setBlockId(bg.getBlock().getBlockId() + i);
    lbs[j] = new LocatedBlock(blk, new DatanodeInfo[]{bg.getLocations()[i]},
        new String[]{bg.getStorageIDs()[i]},
        new StorageType[]{bg.getStorageTypes()[i]},
        bg.getStartOffset() + i * cellSize, bg.isCorrupt(), null);
  }
  return lbs;
}
{code}
My [test|https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14347808&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14347808] writes a file with 2 blocks and reads it back. I found {{blockIndices}} is always sequential ({{getBlockIndicies()\[i\]}} always equal to i). But the stored locations are not necessarily sorted based on indices in the group. So sometimes the test passes and sometimes it gets a ReplicaNotFound error. My test manually applied HDFS-7729 though. Without it, I guess we should add synthetic block reports to unit-test the above point? 
Erasure coding: extend LocatedBlocks to support reading from striped files -- Key: HDFS-7853 URL: https://issues.apache.org/jira/browse/HDFS-7853 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Jing Zhao Attachments: HDFS-7853.000.patch We should extend {{LocatedBlocks}} class so {{getBlockLocations}} can work with striping layout (possibly an extra list specifying the index of each location in the group) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7844) Create an off-heap hash table implementation
[ https://issues.apache.org/jira/browse/HDFS-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347903#comment-14347903 ] Colin Patrick McCabe commented on HDFS-7844: {{CommonConfigurationKeys.java}}: add {{hadoop.memory.manager}} configuration key, which controls whether to use on-heap or off-heap (basically). {{MemoryManager.java}}: interface which all memory managers implement. Factory methods for creating one from a Hadoop {{Configuration}}. {{ByteArrayMemoryManager.java}}: a safe on-heap memory manager. Does extensive verification of memory accesses to ensure that they're valid. {{NativeMemoryManager.java}}: off-heap memory manager which delegates to {{Unsafe}}, which basically calls {{malloc}}. {{ProbingHashSet.java}}: an off-heap hash table implementation. It uses probing rather than separate chaining (as suggested by the name), and doubles in size when it becomes more than half full, to maintain O(1) access. {{ProbingHashSet#Adaptor}}: a class which deals with storing and loading entries from the hash table. In the case of the block map, this just means putting a single 8-byte long (the address of the off-heap BlocksInfo data) into the correct place in the hash table. {{ProbingHashSet#Key}}: these are keys that can be compared and so forth. Used to search for an element, and when we have an element, these are used to determine what its hash code is and what it is identical to. There is also an iterator provided which can iterate over the whole hash table... very similar to HashTable#Iterator. One twist is that it can still be used if the table is modified after the iterator is created. It will return reduced-consistency results in that case, but still be useful for many cases. 
Create an off-heap hash table implementation Key: HDFS-7844 URL: https://issues.apache.org/jira/browse/HDFS-7844 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7836 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7844-scl.001.patch Create an off-heap hash table implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
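The ProbingHashSet behavior described in the comment above — open addressing with probing, doubling when the table becomes more than half full — can be sketched on-heap as follows. This is a toy: the real implementation stores entries off-heap via a memory manager, while this sketch keeps only the probing/resizing logic and assumes 0 is never a valid key (it serves as the empty-slot marker):

```java
// On-heap toy version of a probing (open-addressing) hash set of longs
// that doubles in size when it becomes more than half full, keeping
// average lookup cost O(1). Assumption: 0 is reserved as the empty marker.
public class ProbingLongSet {
    private static final long EMPTY = 0L;
    private long[] slots = new long[8]; // always a power of two
    private int size;

    public void add(long key) {
        if (key == EMPTY || contains(key)) return;
        if ((size + 1) * 2 > slots.length) grow(); // keep load factor <= 1/2
        insert(slots, key);
        size++;
    }

    public boolean contains(long key) {
        int mask = slots.length - 1;
        int i = Long.hashCode(key) & mask;
        while (slots[i] != EMPTY) {
            if (slots[i] == key) return true;
            i = (i + 1) & mask; // linear probe to the next slot
        }
        return false; // hit an empty slot: key is absent
    }

    private void grow() {
        long[] bigger = new long[slots.length * 2];
        for (long k : slots) {
            if (k != EMPTY) insert(bigger, k); // rehash every live entry
        }
        slots = bigger;
    }

    private static void insert(long[] table, long key) {
        int mask = table.length - 1;
        int i = Long.hashCode(key) & mask;
        while (table[i] != EMPTY) {
            i = (i + 1) & mask;
        }
        table[i] = key;
    }

    public int size() { return size; }
}
```

Because the table never fills past half capacity, every probe sequence is guaranteed to terminate at an empty slot, which is what keeps both insertion and lookup O(1) on average.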
[jira] [Updated] (HDFS-7746) Add a test randomly mixing append, truncate and snapshot
[ https://issues.apache.org/jira/browse/HDFS-7746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7746: -- Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) Thanks Jing for reviewing the patch. I have committed this. Add a test randomly mixing append, truncate and snapshot Key: HDFS-7746 URL: https://issues.apache.org/jira/browse/HDFS-7746 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.7.0 Attachments: h7746_20150305.patch TestFileTruncate.testSnapshotWithAppendTruncate already does a good job for covering many test cases. Let's add a random test for mixing many append, truncate and snapshot operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7836) BlockManager Scalability Improvements
[ https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347999#comment-14347999 ] Colin Patrick McCabe commented on HDFS-7836: I think it makes sense to have total heapsize in a JMX counter, if it's not already there somewhere. It's a pretty easy number to get from the OS, although there are confounding factors like shared libraries and shared memory segments. But in general, those should be minor contributors. BlockManager Scalability Improvements - Key: HDFS-7836 URL: https://issues.apache.org/jira/browse/HDFS-7836 Project: Hadoop HDFS Issue Type: Improvement Reporter: Charles Lamb Assignee: Charles Lamb Attachments: BlockManagerScalabilityImprovementsDesign.pdf Improvements to BlockManager scalability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7855) Separate class Packet from DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348012#comment-14348012 ] Li Bo commented on HDFS-7855: - hi, Zhe. I run {{TestFileLengthOnClusterRestart}} locally and it worked well and took 17s. Maybe the server is busy when running the unit tests. I will upload a new patch and try again. Separate class Packet from DFSOutputStream -- Key: HDFS-7855 URL: https://issues.apache.org/jira/browse/HDFS-7855 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-7855-001.patch, HDFS-7855-002.patch, HDFS-7855-003.patch, HDFS-7855-004.patch, HDFS-7855-005.patch Class Packet is an inner class in DFSOutputStream and also used by DataStreamer. This sub task separates Packet out of DFSOutputStream to aid the separation in HDFS-7854. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7885) Datanode should not trust the generation stamp provided by client
[ https://issues.apache.org/jira/browse/HDFS-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347787#comment-14347787 ] Vinod Nair commented on HDFS-7885: -- Suresh, Wanted to give you some background and context here. As you may know, we are working with Pivotal and they are going to white label HDP as PHD starting with PHD 3.0. They have raised the issue of 3 patches that were in their Hadoop distro that are critical for HAWQ to work. We have asked them to create Apache JIRAs so our experts can evaluate them and consider them for inclusion in HDP. Hopefully they will add some more detail soon -- Regards, Vinod K. Nair Partner Product Management | (650) 224-9741 | vn...@hortonworks.com 5470 Great America Parkway, Santa Clara, CA Datanode should not trust the generation stamp provided by client - Key: HDFS-7885 URL: https://issues.apache.org/jira/browse/HDFS-7885 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: vitthal (Suhas) Gogate Priority: Critical Datanode should not trust the generation stamp provided by client, since it is prefetched and buffered in client, and concurrent append may increase it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7729) Add logic to DFSOutputStream to support writing a file in striping layout
[ https://issues.apache.org/jira/browse/HDFS-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347819#comment-14347819 ] Zhe Zhang commented on HDFS-7729: - In my tests I set {{cellSize}} to a smaller value (1024) and got an IndexOutOfBound error in {{encode}}. [~libo-intel] could you take a look when you resume work on this JIRA (after HDFS-7793)? Add logic to DFSOutputStream to support writing a file in striping layout -- Key: HDFS-7729 URL: https://issues.apache.org/jira/browse/HDFS-7729 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: Codec-tmp.patch, HDFS-7729-001.patch, HDFS-7729-002.patch, HDFS-7729-003.patch, HDFS-7729-004.patch, HDFS-7729-005.patch, HDFS-7729-006.patch, HDFS-7729-007.patch, HDFS-7729-008.patch, HDFS-7729-009.patch If a client wants to directly write a file in striping layout, we need to add some logic to DFSOutputStream. DFSOutputStream needs multiple DataStreamers to write each cell of a stripe to a remote datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7782) Read a striping layout file from client side
[ https://issues.apache.org/jira/browse/HDFS-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347831#comment-14347831 ] Jing Zhao commented on HDFS-7782: - Thanks for the work, Zhe! I will review the patch soon. Read a striping layout file from client side Key: HDFS-7782 URL: https://issues.apache.org/jira/browse/HDFS-7782 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Zhe Zhang Attachments: HDFS-7782-000.patch If a client wants to read a file, it should not need to know or handle the file's layout. This sub task adds logic to DFSInputStream to support reading striping layout files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347850#comment-14347850 ] Hadoop QA commented on HDFS-7878: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702620/HDFS-7878.patch against trunk revision ed70fa1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHDFS Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9735//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9735//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9735//console This message is automatically generated. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.patch See HDFS-487. 
Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7853) Erasure coding: extend LocatedBlocks to support reading from striped files
[ https://issues.apache.org/jira/browse/HDFS-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347876#comment-14347876 ] Zhe Zhang commented on HDFS-7853: - bq. I guess here the code should be ... + j); ? Good catch! It doesn't fix the test error though, since j is always equal to i right now. Erasure coding: extend LocatedBlocks to support reading from striped files -- Key: HDFS-7853 URL: https://issues.apache.org/jira/browse/HDFS-7853 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Jing Zhao Attachments: HDFS-7853.000.patch We should extend {{LocatedBlocks}} class so {{getBlockLocations}} can work with striping layout (possibly an extra list specifying the index of each location in the group) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-1522) Merge Block.BLOCK_FILE_PREFIX and DataStorage.BLOCK_FILE_PREFIX into one constant
[ https://issues.apache.org/jira/browse/HDFS-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347922#comment-14347922 ] Konstantin Shvachko commented on HDFS-1522: --- +1 on the patch. Test failure is unrelated. Will commit. Merge Block.BLOCK_FILE_PREFIX and DataStorage.BLOCK_FILE_PREFIX into one constant - Key: HDFS-1522 URL: https://issues.apache.org/jira/browse/HDFS-1522 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Dongming Liang Labels: patch Fix For: 3.0.0 Attachments: HDFS-1522.002.patch, HDFS-1522.patch Two semantically identical constants {{Block.BLOCK_FILE_PREFIX}} and {{DataStorage.BLOCK_FILE_PREFIX}} should be merged into one. Should be defined in {{Block}}, imo. Also use cases of blk_, like in {{DirectoryScanner}}, should be replaced by this constant. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7826) Erasure Coding: Update INodeFile quota computation for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347985#comment-14347985 ] Jing Zhao commented on HDFS-7826: - Thanks Kai! bq. Current `getBlocks` can handle striped blocks. So its length is `m` + `k` (m data blocks and k parity blocks) {{getBlocks}} returns a striped block array. Each element in that array is a striped block consisting of m data blocks and k parity blocks. The length of {{getBlocks}}'s result is not related to m and k. Erasure Coding: Update INodeFile quota computation for striped blocks - Key: HDFS-7826 URL: https://issues.apache.org/jira/browse/HDFS-7826 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Kai Sasaki Attachments: HDFS-7826.1.patch Currently INodeFile's quota computation only considers contiguous blocks (i.e., {{INodeFile#blocks}}). We need to update it to support striped blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7887) Asynchronous native RPC v9 client
[ https://issues.apache.org/jira/browse/HDFS-7887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348008#comment-14348008 ] Haohui Mai commented on HDFS-7887: -- Given the popularity of integrating Hadoop with other native applications, there have been several attempts to implement a native RPC library. The most recent attempts are HADOOP-10389 and HDFS-7013. This jira proposes to combine the previous efforts and create a unified library. The existing implementations in HADOOP-10389 and HDFS-7013 have several drawbacks: * The implementation is tightly coupled with the native HDFS client, making it unavailable for YARN. * Both HADOOP-10389 and HDFS-7013 only provide synchronous APIs, so they cannot serve as building blocks for an asynchronous higher-level client. * HDFS-7013 uses C++ exceptions extensively throughout the code. The community has expressed concerns about C++ exceptions and would like to get them removed before the code is merged. This jira proposes to implement the following components: * A compiler that generates stubs from protobuf definitions, which can be taken from HADOOP-10389. * An asynchronous runtime for the Hadoop RPC. * Support for SASL, wire encryption, and Kerberos authentication, which can be taken from HDFS-7013. * Unit tests, many of which can be taken from HADOOP-10389 and HDFS-7013. Asynchronous native RPC v9 client - Key: HDFS-7887 URL: https://issues.apache.org/jira/browse/HDFS-7887 Project: Hadoop HDFS Issue Type: New Feature Reporter: Haohui Mai Assignee: Haohui Mai There are more and more integrations happening between Hadoop and applications that are implemented in languages other than Java. To access Hadoop, applications either have to go through JNI (e.g. libhdfs), or reverse engineer the Hadoop RPC protocol (e.g. snakebite). Unfortunately, neither of them is satisfactory: * Integrating with JNI requires running a JVM inside the application. 
Some applications (e.g., real-time processing, MPP databases) do not want the footprint and GC behavior of the JVM. * The Hadoop RPC protocol has a rich feature set including wire encryption, SASL, and Kerberos authentication. Many 3rd-party implementations cannot fully cover the feature set, thus they might only work in limited environments. This jira proposes implementing a Hadoop RPC library in C++ that provides a common ground to implement higher-level native clients for HDFS, YARN, and MapReduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
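The asynchronous-API requirement above can be illustrated with a small Java sketch (the proposal itself targets C++; every class and method name below is invented for illustration): an async stub returns a future instead of blocking, so a higher-level client can issue calls concurrently and compose them.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical async RPC stub shape: returns a future instead of blocking,
// so higher-level clients can pipeline calls. The "transport" here is a toy
// in-memory executor standing in for a real RPC connection.
public class AsyncRpcSketch {
    interface AsyncNamenodeStub {
        CompletableFuture<Long> getFileLength(String path);
    }

    static AsyncNamenodeStub localStub(ExecutorService executor) {
        // Toy implementation: "file length" is just the path length.
        return path -> CompletableFuture.supplyAsync(() -> (long) path.length(), executor);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AsyncNamenodeStub stub = localStub(pool);
        // Two calls issued concurrently; the caller thread does not block per call.
        CompletableFuture<Long> a = stub.getFileLength("/user/a");
        CompletableFuture<Long> b = stub.getFileLength("/user/bb");
        System.out.println(a.get() + b.get());
        pool.shutdown();
    }
}
```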
[jira] [Commented] (HDFS-7885) Datanode should not trust the generation stamp provided by client
[ https://issues.apache.org/jira/browse/HDFS-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348024#comment-14348024 ] Tsz Wo Nicholas Sze commented on HDFS-7885: --- I see. The problem is only in legacy.blockreader.local, which uses the generation stamp passed by the client. I have just checked the remote block reader; it does not have such a bug. Datanode should not trust the generation stamp provided by client - Key: HDFS-7885 URL: https://issues.apache.org/jira/browse/HDFS-7885 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: vitthal (Suhas) Gogate Assignee: Tsz Wo Nicholas Sze Priority: Critical Datanode should not trust the generation stamp provided by client, since it is prefetched and buffered in client, and concurrent append may increase it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7855) Separate class Packet from DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348026#comment-14348026 ] Li Bo commented on HDFS-7855: - Right, if we modify DFSPacket's constructor we can make {{lastPacketInBlock}} final. I will change the constructor in the new patch. Separate class Packet from DFSOutputStream -- Key: HDFS-7855 URL: https://issues.apache.org/jira/browse/HDFS-7855 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-7855-001.patch, HDFS-7855-002.patch, HDFS-7855-003.patch, HDFS-7855-004.patch, HDFS-7855-005.patch Class Packet is an inner class in DFSOutputStream and also used by DataStreamer. This sub task separates Packet out of DFSOutputStream to aid the separation in HDFS-7854. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
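The constructor change being discussed can be sketched as follows (a simplified stand-in, not the real DFSPacket): passing the flag in at construction time lets the field be declared final, so it can never be mutated after the packet is built.

```java
// Simplified sketch: field names follow the discussion, not the actual class.
public class DFSPacketSketch {
    private final long seqno;
    private final boolean lastPacketInBlock; // immutable once constructed

    DFSPacketSketch(long seqno, boolean lastPacketInBlock) {
        this.seqno = seqno;
        this.lastPacketInBlock = lastPacketInBlock;
    }

    boolean isLastPacketInBlock() {
        return lastPacketInBlock;
    }

    public static void main(String[] args) {
        // The flag is fixed at construction; no setter exists to change it later.
        DFSPacketSketch p = new DFSPacketSketch(42L, true);
        System.out.println(p.seqno + " " + p.isLastPacketInBlock());
    }
}
```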
[jira] [Commented] (HDFS-7433) DatanodeManager#datanodeMap should be a HashMap, not a TreeMap, to optimize lookup performance
[ https://issues.apache.org/jira/browse/HDFS-7433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347784#comment-14347784 ] Kihwal Lee commented on HDFS-7433: -- [~daryn] Do you think HDFS-7434 was the cause of the test failure? DatanodeManager#datanodeMap should be a HashMap, not a TreeMap, to optimize lookup performance -- Key: HDFS-7433 URL: https://issues.apache.org/jira/browse/HDFS-7433 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7433.patch, HDFS-7433.patch, HDFS-7433.patch The datanode map is currently a {{TreeMap}}. For many thousands of datanodes, tree lookups are ~10X more expensive than a {{HashMap}}. Insertions and removals are up to 100X more expensive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
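The lookup-cost argument behind the datanodeMap change can be shown with a self-contained example (the string keys are made up, standing in for DatanodeID UUIDs): both map types return identical answers, so swapping TreeMap for HashMap is purely a performance change.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// TreeMap lookups walk a red-black tree, costing O(log n) String comparisons;
// HashMap hashes once and probes a bucket, O(1) expected. With many thousands
// of datanodes that difference is the ~10X cited in the issue.
public class MapLookupSketch {
    public static void main(String[] args) {
        Map<String, Integer> tree = new TreeMap<>();
        Map<String, Integer> hash = new HashMap<>();
        for (int i = 0; i < 10000; i++) {
            String key = "datanode-" + i; // stand-in for a DatanodeID key
            tree.put(key, i);
            hash.put(key, i);
        }
        // Identical lookup semantics from either structure.
        System.out.println(tree.get("datanode-9999").equals(hash.get("datanode-9999")));
    }
}
```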
[jira] [Commented] (HDFS-7671) hdfs user guide should point to the common rack awareness doc
[ https://issues.apache.org/jira/browse/HDFS-7671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347789#comment-14347789 ] Allen Wittenauer commented on HDFS-7671: bq. The link where I want to attach belongs to the common docs, so the target is not included under the document root of the HDFS docs. Is there any good way to make a relative path to other Hadoop project docs? bq. Do you mean that the actual link should be to the .html file, or something else? You want something like: {code} [Rack Awareness](../../hadoop-project-dist/hadoop-common/RackAwareness.html) {code} This will point to the rack awareness HTML file after the mvn build. bq. Should all other current contents except for the direct link to the common doc be removed? Or should I move the current contents to the common docs? I think the current content (well, the theory and practice, not word for word, obviously) is already covered in that doc. So this section just needs some wordsmithing on why someone should go follow that link to that important topic. :) hdfs user guide should point to the common rack awareness doc - Key: HDFS-7671 URL: https://issues.apache.org/jira/browse/HDFS-7671 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Allen Wittenauer Assignee: Kai Sasaki Attachments: HDFS-7671.1.patch HDFS user guide has a section on rack awareness that should really just be a pointer to the common doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7826) Erasure Coding: Update INodeFile quota computation for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347970#comment-14347970 ] Kai Sasaki commented on HDFS-7826: -- Thank you for reviewing. I understand point 1; I'll update. I have a question about point 2. The current {{getBlocks}} can handle striped blocks, so its length is m + k (m data blocks and k parity blocks), right? I think I can simply calculate storage space usage by adding all the block sizes in the block list returned from {{getBlocks}}, because it includes both data blocks and parity blocks. Is there any oversight or misunderstanding? Erasure Coding: Update INodeFile quota computation for striped blocks - Key: HDFS-7826 URL: https://issues.apache.org/jira/browse/HDFS-7826 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Kai Sasaki Attachments: HDFS-7826.1.patch Currently INodeFile's quota computation only considers contiguous blocks (i.e., {{INodeFile#blocks}}). We need to update it to support striped blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
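Under Kai's stated assumption that {{getBlocks}} returns all m data and k parity blocks of a striped group, the storage-space computation amounts to a straight sum over the returned list. A minimal sketch with made-up block sizes (6 data + 3 parity):

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: if getBlocks() yields both data and parity blocks,
// raw storage-space usage is the sum of all their sizes. All numbers here
// are invented for the example.
public class StripedQuotaSketch {
    static long storageSpaceUsed(List<Long> blockSizes) {
        return blockSizes.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        List<Long> dataBlocks = Arrays.asList(64L, 64L, 64L, 64L, 64L, 64L); // m = 6
        List<Long> parityBlocks = Arrays.asList(64L, 64L, 64L);              // k = 3
        long total = storageSpaceUsed(dataBlocks) + storageSpaceUsed(parityBlocks);
        System.out.println(total); // 9 blocks of 64 each
    }
}
```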
[jira] [Commented] (HDFS-7853) Erasure coding: extend LocatedBlocks to support reading from striped files
[ https://issues.apache.org/jira/browse/HDFS-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347868#comment-14347868 ] Jing Zhao commented on HDFS-7853: - Thanks for the review, Zhe! bq. Just a reminder that BlockInfoStripedUnderConstruction#blockIndices is only applicable in over-replication as well. This may not be true. For a newly created BlockInfoStripedUC, {{blockIndices}} is not useful since we simply assign block (in the group) to storage according to the target storage sequence. However, if the namenode restarts before the block is completed, the replicas array is later rebuilt based on the incoming block reports, and we need to record the corresponding block index somewhere (or keep some null elements in the ReplicaUC array). bq. If the non-over-replicated locations in LocatedBlockStriped are sorted Here a sorted array may not be enough, e.g., if we miss several blocks in the middle. Then we may still need an extra index array in the msg. bq. blk.setBlockId(bg.getBlock().getBlockId() + i); I guess here the code should be ... + j); ? Erasure coding: extend LocatedBlocks to support reading from striped files -- Key: HDFS-7853 URL: https://issues.apache.org/jira/browse/HDFS-7853 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Jing Zhao Attachments: HDFS-7853.000.patch We should extend {{LocatedBlocks}} class so {{getBlockLocations}} can work with striping layout (possibly an extra list specifying the index of each location in the group) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
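Jing's point that a sorted location list is not enough when blocks go missing from the middle of a group can be illustrated with a toy sketch (this models the idea only, not the actual LocatedBlockStriped wire format): an explicit index array must accompany the locations so the reader knows which stripe position each one holds.

```java
import java.util.Arrays;

// Toy model: a 9-block group where positions 2 and 5 are missing. Without an
// index array, the 7 remaining locations alone cannot say which positions
// they occupy; the gaps are invisible in a plain sorted list.
public class StripedIndicesSketch {
    public static void main(String[] args) {
        int groupSize = 9;
        boolean[] reported = new boolean[groupSize];
        Arrays.fill(reported, true);
        reported[2] = false; // missing middle blocks
        reported[5] = false;

        // Build the index array carried alongside the located blocks.
        int[] indices = new int[groupSize - 2];
        int pos = 0;
        for (int i = 0; i < groupSize; i++) {
            if (reported[i]) {
                indices[pos++] = i;
            }
        }
        System.out.println(Arrays.toString(indices));
    }
}
```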
[jira] [Work started] (HDFS-7844) Create an off-heap hash table implementation
[ https://issues.apache.org/jira/browse/HDFS-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-7844 started by Colin Patrick McCabe. -- Create an off-heap hash table implementation Key: HDFS-7844 URL: https://issues.apache.org/jira/browse/HDFS-7844 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7836 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7844-scl.002.patch Create an off-heap hash table implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347889#comment-14347889 ] Sergey Shelukhin commented on HDFS-7878: This QA was for old patch... API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7844) Create an off-heap hash table implementation
[ https://issues.apache.org/jira/browse/HDFS-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7844: --- Attachment: HDFS-7844-scl.001.patch Create an off-heap hash table implementation Key: HDFS-7844 URL: https://issues.apache.org/jira/browse/HDFS-7844 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7836 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7844-scl.001.patch Create an off-heap hash table implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7887) Asynchronous native RPC v9 client
[ https://issues.apache.org/jira/browse/HDFS-7887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347992#comment-14347992 ] Allen Wittenauer commented on HDFS-7887: Is this really a good, long-term strategy given our use of protobuf, now that gRPC exists? Asynchronous native RPC v9 client - Key: HDFS-7887 URL: https://issues.apache.org/jira/browse/HDFS-7887 Project: Hadoop HDFS Issue Type: New Feature Reporter: Haohui Mai Assignee: Haohui Mai There is more and more integration happening between Hadoop and applications that are implemented in languages other than Java. To access Hadoop, applications either have to go through JNI (e.g. libhdfs) or reverse-engineer the Hadoop RPC protocol (e.g. snakebite). Unfortunately, neither of them is satisfactory: * Integrating with JNI requires running a JVM inside the application. Some applications (e.g., real-time processing, MPP databases) do not want the footprint and GC behavior of the JVM. * The Hadoop RPC protocol has a rich feature set including wire encryption, SASL, and Kerberos authentication. Many 3rd-party implementations cannot fully cover the feature set, thus they might only work in limited environments. This jira proposes implementing a Hadoop RPC library in C++ that provides a common ground to implement higher-level native clients for HDFS, YARN, and MapReduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7729) Add logic to DFSOutputStream to support writing a file in striping layout
[ https://issues.apache.org/jira/browse/HDFS-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348066#comment-14348066 ] Li Bo commented on HDFS-7729: - Hi Zhe, I will check this problem soon. There has been a lot of discussion on this JIRA about modifying {{DFSOutputStream}}; how about creating a new JIRA for subclassing {{DFSOutputStream}} based on HDFS-7793? Add logic to DFSOutputStream to support writing a file in striping layout -- Key: HDFS-7729 URL: https://issues.apache.org/jira/browse/HDFS-7729 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: Codec-tmp.patch, HDFS-7729-001.patch, HDFS-7729-002.patch, HDFS-7729-003.patch, HDFS-7729-004.patch, HDFS-7729-005.patch, HDFS-7729-006.patch, HDFS-7729-007.patch, HDFS-7729-008.patch, HDFS-7729-009.patch If a client wants to directly write a file in striping layout, we need to add some logic to DFSOutputStream. DFSOutputStream needs multiple DataStreamers to write each cell of a stripe to a remote datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
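The multiple-DataStreamer idea in the description can be modeled roughly (a toy sketch, not the real DFSOutputStream): with k data streamers, cell i of the stream is handed to streamer i % k, so each streamer ends up feeding one datanode of the group.

```java
import java.util.Arrays;

// Round-robin cell placement across k hypothetical streamers. Real striped
// writes also involve parity cells and cell buffering; this only shows the
// data-cell distribution pattern.
public class StripedWriteSketch {
    public static void main(String[] args) {
        int numStreamers = 6; // hypothetical k data streamers
        int[] cellsPerStreamer = new int[numStreamers];
        for (int cell = 0; cell < 24; cell++) {
            cellsPerStreamer[cell % numStreamers]++; // cell i -> streamer i % k
        }
        // Each streamer receives an equal share of the cells.
        System.out.println(Arrays.toString(cellsPerStreamer));
    }
}
```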
[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347808#comment-14347808 ] Zhe Zhang commented on HDFS-7285: - To follow up on the PoC prototype plan, I created a very rough test by manually applying the following patches, and it seems to work -- based on the [description | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14339006page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14339006] above :) # HDFS-7729 (this one needs a major refactor after HDFS-7793) # HDFS-7853 # HDFS-7782 A few bugs have been found and I'll post them under individual JIRAs. Erasure Coding Support inside HDFS -- Key: HDFS-7285 URL: https://issues.apache.org/jira/browse/HDFS-7285 Project: Hadoop HDFS Issue Type: New Feature Reporter: Weihua Jiang Assignee: Zhe Zhang Attachments: ECAnalyzer.py, ECParser.py, HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, HDFSErasureCodingDesign-20150206.pdf, fsimage-analysis-20150105.pdf Erasure Coding (EC) can greatly reduce the storage overhead without sacrificing data reliability, compared to the existing HDFS 3-replica approach. For example, if we use a 10+4 Reed Solomon coding, we can allow loss of 4 blocks, with storage overhead only being 40%. This makes EC a quite attractive alternative for big data storage, particularly for cold data. Facebook had a related open source project called HDFS-RAID. It used to be one of the contributed packages in HDFS but was removed in Hadoop 2.0 for maintenance reasons. The drawbacks are: 1) it is on top of HDFS and depends on MapReduce to do encoding and decoding tasks; 2) it can only be used for cold files that are intended not to be appended anymore; 3) the pure Java EC coding implementation is extremely slow in practical use. Due to these, it might not be a good idea to just bring HDFS-RAID back. 
We (Intel and Cloudera) are working on a design to build EC into HDFS that gets rid of any external dependencies, making it self-contained and independently maintainable. This design builds the EC feature on top of the storage type support and is designed to be compatible with existing HDFS features such as caching, snapshots, encryption, and high availability. This design will also support different EC coding schemes, implementations, and policies for different deployment scenarios. By utilizing advanced libraries (e.g. the Intel ISA-L library), an implementation can greatly improve the performance of EC encoding/decoding, making the EC solution even more attractive. We will post the design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
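The overhead numbers quoted in the description can be checked with a line of arithmetic:

```java
// Checking the description's claim: a (10, 4) Reed-Solomon scheme stores 4
// parity blocks per 10 data blocks, tolerates the loss of any 4 blocks, and
// costs 4/10 = 40% extra storage, versus 200% extra for 3-way replication.
public class EcOverheadSketch {
    public static void main(String[] args) {
        int data = 10, parity = 4;
        double ecOverhead = 100.0 * parity / data;    // extra storage, in percent
        double replicationOverhead = 100.0 * (3 - 1); // two extra full replicas
        System.out.println(ecOverhead + " " + replicationOverhead);
    }
}
```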
[jira] [Commented] (HDFS-1522) Merge Block.BLOCK_FILE_PREFIX and DataStorage.BLOCK_FILE_PREFIX into one constant
[ https://issues.apache.org/jira/browse/HDFS-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347962#comment-14347962 ] Hudson commented on HDFS-1522: -- FAILURE: Integrated in Hadoop-trunk-Commit #7260 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7260/]) HDFS-1522. Combine two BLOCK_FILE_PREFIX constants into one. Contributed by Dongming Liang. (shv: rev 430b5371883e22abb65f37c3e3d4afc3f421fc89) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DirectoryScanner.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestCrcCorruption.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCorruption.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java Merge Block.BLOCK_FILE_PREFIX and DataStorage.BLOCK_FILE_PREFIX into one constant - Key: HDFS-1522 URL: https://issues.apache.org/jira/browse/HDFS-1522 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Dongming Liang Labels: patch Fix For: 2.7.0 Attachments: HDFS-1522.002.patch, HDFS-1522.patch Two semantically identical constants, {{Block.BLOCK_FILE_PREFIX}} and {{DataStorage.BLOCK_FILE_PREFIX}}, should be
merged into one. Should be defined in {{Block}}, imo. Also, use cases of {{block_}}, like in {{DirectoryScanner}}, should be replaced by this constant. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7887) Asynchronous native RPC v9 client
Haohui Mai created HDFS-7887: Summary: Asynchronous native RPC v9 client Key: HDFS-7887 URL: https://issues.apache.org/jira/browse/HDFS-7887 Project: Hadoop HDFS Issue Type: New Feature Reporter: Haohui Mai Assignee: Haohui Mai There is more and more integration happening between Hadoop and applications that are implemented in languages other than Java. To access Hadoop, applications either have to go through JNI (e.g. libhdfs) or reverse-engineer the Hadoop RPC protocol (e.g. snakebite). Unfortunately, neither of them is satisfactory: * Integrating with JNI requires running a JVM inside the application. Some applications (e.g., real-time processing, MPP databases) do not want the footprint and GC behavior of the JVM. * The Hadoop RPC protocol has a rich feature set including wire encryption, SASL, and Kerberos authentication. Many 3rd-party implementations cannot fully cover the feature set, thus they might only work in limited environments. This jira proposes implementing a Hadoop RPC library in C++ that provides a common ground to implement higher-level native clients for HDFS, YARN, and MapReduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7872) Erasure Coding: INodeFile.dumpTreeRecursively() supports to print striped blocks
[ https://issues.apache.org/jira/browse/HDFS-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Fukudome updated HDFS-7872: -- Attachment: HDFS-7872.2.patch Thank you for reviewing and for your advice, [~jingzhao]! I have attached a new patch which directly calls {{getBlocks}}. Can you review it for me? Thank you. Erasure Coding: INodeFile.dumpTreeRecursively() supports to print striped blocks Key: HDFS-7872 URL: https://issues.apache.org/jira/browse/HDFS-7872 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Takuya Fukudome Assignee: Takuya Fukudome Attachments: HDFS-7872.1.patch, HDFS-7872.2.patch We need to let dumpTreeRecursively be able to print striped blocks (or maybe just the first striped block). {code} @Override public void dumpTreeRecursively(PrintWriter out, StringBuilder prefix, final int snapshotId) { super.dumpTreeRecursively(out, prefix, snapshotId); out.print(", fileSize=" + computeFileSize(snapshotId)); // only compare the first block out.print(", blocks="); out.print(blocks == null || blocks.length == 0? null: blocks[0]); // TODO print striped blocks out.println(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7855) Separate class Packet from DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-7855: Status: Patch Available (was: In Progress) Separate class Packet from DFSOutputStream -- Key: HDFS-7855 URL: https://issues.apache.org/jira/browse/HDFS-7855 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-7855-001.patch, HDFS-7855-002.patch, HDFS-7855-003.patch, HDFS-7855-004.patch, HDFS-7855-005.patch, HDFS-7855-006.patch Class Packet is an inner class in DFSOutputStream and also used by DataStreamer. This sub task separates Packet out of DFSOutputStream to aid the separation in HDFS-7854. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7855) Separate class Packet from DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-7855: Attachment: HDFS-7855-006.patch Separate class Packet from DFSOutputStream -- Key: HDFS-7855 URL: https://issues.apache.org/jira/browse/HDFS-7855 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-7855-001.patch, HDFS-7855-002.patch, HDFS-7855-003.patch, HDFS-7855-004.patch, HDFS-7855-005.patch, HDFS-7855-006.patch Class Packet is an inner class in DFSOutputStream and also used by DataStreamer. This sub task separates Packet out of DFSOutputStream to aid the separation in HDFS-7854. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7855) Separate class Packet from DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-7855: Status: In Progress (was: Patch Available) Separate class Packet from DFSOutputStream -- Key: HDFS-7855 URL: https://issues.apache.org/jira/browse/HDFS-7855 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-7855-001.patch, HDFS-7855-002.patch, HDFS-7855-003.patch, HDFS-7855-004.patch, HDFS-7855-005.patch Class Packet is an inner class in DFSOutputStream and also used by DataStreamer. This sub task separates Packet out of DFSOutputStream to aid the separation in HDFS-7854. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7885) Datanode should not trust the generation stamp provided by client
[ https://issues.apache.org/jira/browse/HDFS-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347779#comment-14347779 ] Suresh Srinivas commented on HDFS-7885: --- [~vgogate], can you please add details about what test you are doing and what issues you are seeing? Datanode should not trust the generation stamp provided by client - Key: HDFS-7885 URL: https://issues.apache.org/jira/browse/HDFS-7885 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: vitthal (Suhas) Gogate Priority: Critical Datanode should not trust the generation stamp provided by client, since it is prefetched and buffered in client, and concurrent append may increase it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HDFS-7885) Datanode should not trust the generation stamp provided by client
[ https://issues.apache.org/jira/browse/HDFS-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-7885: -- Comment: was deleted (was: Suresh, Wanted to give you some background and context here As you may know we are working with Pivotal and they are going to white label HDP as PHD starting with PHD 3.0. They have raised the issue of 3 patches that were in their Hadoop distro that are critical for HAWQ to work. We have asked them to create Apache JIRAs so our experts can evaluate them and consider for inclusion in HDP. Hopefully they will add some more detail soon -- Regards, Vinod K. Nair Partner Product Management | (650) 224-9741 | vn...@hortonworks.com 5470 Great America Parkway, Santa Clara, CA ) Datanode should not trust the generation stamp provided by client - Key: HDFS-7885 URL: https://issues.apache.org/jira/browse/HDFS-7885 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: vitthal (Suhas) Gogate Priority: Critical Datanode should not trust the generation stamp provided by client, since it is prefetched and buffered in client, and concurrent append may increase it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7886) TestFileTruncate#testTruncateWithDataNodesRestart runs timeout sometimes
Yi Liu created HDFS-7886: Summary: TestFileTruncate#testTruncateWithDataNodesRestart runs timeout sometimes Key: HDFS-7886 URL: https://issues.apache.org/jira/browse/HDFS-7886 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Yi Liu Priority: Minor https://builds.apache.org/job/PreCommit-HDFS-Build/9730//testReport/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7886) TestFileTruncate#testTruncateWithDataNodesRestart runs timeout sometimes
[ https://issues.apache.org/jira/browse/HDFS-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu reassigned HDFS-7886: Assignee: Yi Liu TestFileTruncate#testTruncateWithDataNodesRestart runs timeout sometimes Key: HDFS-7886 URL: https://issues.apache.org/jira/browse/HDFS-7886 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Yi Liu Assignee: Yi Liu Priority: Minor https://builds.apache.org/job/PreCommit-HDFS-Build/9730//testReport/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7740) Test truncate with DataNodes restarting
[ https://issues.apache.org/jira/browse/HDFS-7740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347939#comment-14347939 ] Yi Liu commented on HDFS-7740: -- I created HDFS-7886, thanks. Test truncate with DataNodes restarting --- Key: HDFS-7740 URL: https://issues.apache.org/jira/browse/HDFS-7740 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: 2.7.0 Reporter: Konstantin Shvachko Assignee: Yi Liu Fix For: 2.7.0 Attachments: HDFS-7740.001.patch, HDFS-7740.002.patch, HDFS-7740.003.patch Add a test case, which ensures replica consistency when DNs are failing and restarting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7885) Datanode should not trust the generation stamp provided by client
[ https://issues.apache.org/jira/browse/HDFS-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347963#comment-14347963 ] Zhanwei Wang commented on HDFS-7885: In the function {{getBlockLocalPathInfo}}, the input parameter {{block}} is passed by the client. Since the client buffers the file's metadata, {{block.getGenerationStamp()}} may be older than the real generation stamp on the Datanode. The Datanode will report that it cannot find the metadata file, and the client then fails to read. {code} @Override // FsDatasetSpi public BlockLocalPathInfo getBlockLocalPathInfo(ExtendedBlock block) throws IOException { File datafile = getBlockFile(block); File metafile = FsDatasetUtil.getMetaFile(datafile, block.getGenerationStamp()); BlockLocalPathInfo info = new BlockLocalPathInfo(block, datafile.getAbsolutePath(), metafile.getAbsolutePath()); return info; } {code} Test case: enable short-circuit read and set {{dfs.client.use.legacy.blockreader.local}} to true. 1) Create a file with two blocks. 2) Open it for read, but do not read. (client fetches block metadata) 3) Append to it. (increases the generation stamp of the last block) 4) Continue to read. (will fail) Datanode should not trust the generation stamp provided by client - Key: HDFS-7885 URL: https://issues.apache.org/jira/browse/HDFS-7885 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: vitthal (Suhas) Gogate Priority: Critical Datanode should not trust the generation stamp provided by client, since it is prefetched and buffered in client, and concurrent append may increase it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
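The stale-stamp failure, and one possible direction for a fix, can be sketched in miniature (all names below are invented for illustration; this is not the committed fix): the datanode holds the authoritative generation stamp per block, so it could resolve the meta file from its own replica map instead of the value buffered by the client.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the bug: after an append bumps the generation stamp (GS),
// a path built from the client's buffered GS names a meta file that no
// longer exists, while the datanode's own replica map has the current GS.
public class GenStampSketch {
    static final Map<Long, Long> onDiskGenStamp = new HashMap<>();

    // Trusting the client: stale GS -> wrong meta file name.
    static String metaFileFromClient(long blockId, long clientGs) {
        return "blk_" + blockId + "_" + clientGs + ".meta";
    }

    // Consulting the datanode's replica map instead of the client value.
    static String metaFileFromReplicaMap(long blockId) {
        return "blk_" + blockId + "_" + onDiskGenStamp.get(blockId) + ".meta";
    }

    public static void main(String[] args) {
        onDiskGenStamp.put(1001L, 1002L); // concurrent append bumped GS to 1002
        long staleClientGs = 1001L;       // client still holds the prefetched GS
        System.out.println(metaFileFromClient(1001L, staleClientGs));
        System.out.println(metaFileFromReplicaMap(1001L));
    }
}
```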
[jira] [Assigned] (HDFS-7885) Datanode should not trust the generation stamp provided by client
[ https://issues.apache.org/jira/browse/HDFS-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze reassigned HDFS-7885: - Assignee: Tsz Wo Nicholas Sze Datanode should not trust the generation stamp provided by client - Key: HDFS-7885 URL: https://issues.apache.org/jira/browse/HDFS-7885 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: vitthal (Suhas) Gogate Assignee: Tsz Wo Nicholas Sze Priority: Critical Datanode should not trust the generation stamp provided by client, since it is prefetched and buffered in client, and concurrent append may increase it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7434) DatanodeID hashCode should not be mutable
[ https://issues.apache.org/jira/browse/HDFS-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347805#comment-14347805 ] Hudson commented on HDFS-7434: -- FAILURE: Integrated in Hadoop-trunk-Commit #7258 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7258/]) HDFS-7434. DatanodeID hashCode should not be mutable. Contributed by Daryn Sharp. (kihwal: rev 722b4794693d8bad1dee0ca5c2f99030a08402f9) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeRegistration.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestComputeInvalidateWork.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/DatanodeID.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt DatanodeID hashCode should not be mutable - Key: HDFS-7434 URL: https://issues.apache.org/jira/browse/HDFS-7434 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Fix For: 2.7.0 Attachments: HDFS-7434.patch Mutable hash codes may lead to orphaned instances in a collection. Instances must always be removed prior to modification of hash code values, and re-inserted. Although current code appears to do this, the mutable hash code is a landmine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
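The "orphaned instance" landmine described in HDFS-7434 is easy to reproduce with a toy class whose hashCode() depends on a mutable field (names are invented; this is not the real DatanodeID):

```java
import java.util.HashSet;
import java.util.Set;

// Mutating a field that feeds hashCode() after insertion strands the object
// in its old hash bucket, so contains() can no longer find it even though
// the set still holds it.
public class MutableHashCodeSketch {
    static final class Id {
        String host; // mutable, yet feeds hashCode/equals

        Id(String host) { this.host = host; }

        @Override public int hashCode() { return host.hashCode(); }
        @Override public boolean equals(Object o) {
            return o instanceof Id && ((Id) o).host.equals(host);
        }
    }

    public static void main(String[] args) {
        Set<Id> set = new HashSet<>();
        Id id = new Id("dn1.example.com:50010");
        set.add(id);
        id.host = "dn1.example.com:50011"; // e.g. re-registration changes the field
        // The very object we inserted is now unreachable by lookup,
        // yet it still occupies a slot in the set.
        System.out.println(set.contains(id));
        System.out.println(set.size());
    }
}
```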