[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=676130=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-676130 ]

ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 04/Nov/21 02:01
Start Date: 04/Nov/21 02:01
Worklog Time Spent: 10m
Work Description: sodonnel merged pull request #3593: URL: https://github.com/apache/hadoop/pull/3593

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 676130) Time Spent: 6.5h (was: 6h 20m)

> Debug tool to verify the correctness of erasure coding on file
> --
>
> Key: HDFS-16286
> URL: https://issues.apache.org/jira/browse/HDFS-16286
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: erasure-coding, tools
> Affects Versions: 3.3.0, 3.3.1
> Reporter: daimin
> Assignee: daimin
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Time Spent: 6.5h
> Remaining Estimate: 0h
>
> Block data in an erasure coded block group may become corrupted, and the block meta (checksum) cannot detect the corruption in some cases, such as EC reconstruction; related issues: HDFS-14768, HDFS-15186, HDFS-15240.
> In addition to HDFS-15759, a tool is needed to check whether any block group of an erasure coded file contains corrupted data under conditions other than EC reconstruction, or when the HDFS-15759 feature (validation during EC reconstruction) is not enabled (it is disabled by default).

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
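The corruption described in the issue is invisible to per-block checksums when a badly reconstructed block is internally consistent but wrong. A verification tool instead re-encodes the data units of each block group and compares the result against the stored parity. The sketch below illustrates that idea with a toy single-parity XOR scheme, not the Reed-Solomon codec HDFS actually uses; all function names are illustrative, not HDFS APIs.

```python
# Toy illustration of EC verification: re-encode the data units and compare
# against stored parity. HDFS uses Reed-Solomon; byte-wise XOR stands in
# here as a single-parity codec for clarity.

def xor_parity(data_units):
    """Compute one parity unit as the byte-wise XOR of all data units."""
    parity = bytearray(len(data_units[0]))
    for unit in data_units:
        for i, b in enumerate(unit):
            parity[i] ^= b
    return bytes(parity)

def verify_block_group(data_units, stored_parity):
    """Return True if the stored parity matches a fresh re-encoding."""
    return xor_parity(data_units) == stored_parity

data = [bytes([1, 2, 3]), bytes([4, 5, 6]), bytes([7, 8, 9])]
parity = xor_parity(data)
assert verify_block_group(data, parity)            # intact group verifies

corrupted = [bytes([1, 2, 99]), data[1], data[2]]  # silent bit rot in unit 0
assert not verify_block_group(corrupted, parity)   # re-encoding exposes it
```

The key property is that the checksum of each block can still match its (corrupted) contents, while cross-checking data against parity catches the inconsistency at the block-group level.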
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=676007=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-676007 ]

ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 04/Nov/21 01:49
Start Date: 04/Nov/21 01:49
Worklog Time Spent: 10m
Work Description: cndaimin commented on a change in pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#discussion_r741582271

## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDebugAdmin.java
## @@ -166,8 +179,91 @@ public void testComputeMetaCommand() throws Exception {

   @Test(timeout = 60000)
   public void testRecoverLeaseforFileNotFound() throws Exception {
+    cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
+    cluster.waitActive();
     assertTrue(runCmd(new String[] { "recoverLease", "-path", "/foo", "-retries", "2" }).contains(
         "Giving up on recoverLease for /foo after 1 try"));
   }
+
+  @Test(timeout = 60000)
+  public void testVerifyECCommand() throws Exception {
+    final ErasureCodingPolicy ecPolicy = SystemErasureCodingPolicies.getByID(
+        SystemErasureCodingPolicies.RS_3_2_POLICY_ID);
+    cluster = DFSTestUtil.setupCluster(conf, 6, 5, 0);
+    cluster.waitActive();
+    DistributedFileSystem fs = cluster.getFileSystem();
+
+    assertEquals("ret: 1, verifyEC -file <file>  Verify HDFS erasure coding on " +
+        "all block groups of the file.", runCmd(new String[]{"verifyEC"}));
+
+    assertEquals("ret: 1, File /bar does not exist.",
+        runCmd(new String[]{"verifyEC", "-file", "/bar"}));
+
+    fs.create(new Path("/bar")).close();
+    assertEquals("ret: 1, File /bar is not erasure coded.",
+        runCmd(new String[]{"verifyEC", "-file", "/bar"}));
+
+    final Path ecDir = new Path("/ec");
+    fs.mkdir(ecDir, FsPermission.getDirDefault());
+    fs.enableErasureCodingPolicy(ecPolicy.getName());
+    fs.setErasureCodingPolicy(ecDir, ecPolicy.getName());
+
+    assertEquals("ret: 1, File /ec is not a regular file.",
+        runCmd(new String[]{"verifyEC", "-file", "/ec"}));
+
+    fs.create(new Path(ecDir, "foo"));
+    assertEquals("ret: 1, File /ec/foo is not closed.",
+        runCmd(new String[]{"verifyEC", "-file", "/ec/foo"}));
+
+    final short repl = 1;
+    final long k = 1024;
+    final long m = k * k;
+    final long seed = 0x1234567L;
+    DFSTestUtil.createFile(fs, new Path(ecDir, "foo_65535"), 65535, repl, seed);
+    assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_65535"})
+        .contains("All EC block group status: OK"));
+    DFSTestUtil.createFile(fs, new Path(ecDir, "foo_256k"), 256 * k, repl, seed);
+    assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_256k"})
+        .contains("All EC block group status: OK"));
+    DFSTestUtil.createFile(fs, new Path(ecDir, "foo_1m"), m, repl, seed);
+    assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_1m"})
+        .contains("All EC block group status: OK"));
+    DFSTestUtil.createFile(fs, new Path(ecDir, "foo_2m"), 2 * m, repl, seed);
+    assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_2m"})
+        .contains("All EC block group status: OK"));
+    DFSTestUtil.createFile(fs, new Path(ecDir, "foo_3m"), 3 * m, repl, seed);
+    assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_3m"})
+        .contains("All EC block group status: OK"));
+    DFSTestUtil.createFile(fs, new Path(ecDir, "foo_5m"), 5 * m, repl, seed);
+    assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_5m"})
+        .contains("All EC block group status: OK"));

Review comment: Thanks, that's good advice, updated.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Issue Time Tracking
---
Worklog Id: (was: 676007) Time Spent: 6h 20m (was: 6h 10m)
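The file sizes exercised by the test above (65535 bytes through 5 MB) cover distinct striped layouts under RS(3,2): sub-cell files, exactly one cell, a partial stripe, a full stripe, and a full stripe plus a partial one. The helper below sketches that mapping, assuming the default 1 MiB EC cell size; it is illustrative arithmetic, not HDFS code, and `layout` and its return values are made up for this example.

```python
# Sketch of how a file size maps onto an RS(3,2) striped layout,
# assuming the default 1 MiB EC cell size. Illustrative only.

CELL = 1024 * 1024   # assumed default HDFS EC cell size (bytes)
DATA_UNITS = 3       # RS(3,2): 3 data units + 2 parity units per stripe

def layout(file_size):
    """Return (full_stripes, data_units_touched) for a file of file_size bytes."""
    full_stripes, rem = divmod(file_size, CELL * DATA_UNITS)
    # A partial stripe fills cells one data unit at a time.
    partial_units = -(-rem // CELL) if rem else 0   # ceiling division
    touched = DATA_UNITS if full_stripes else partial_units
    return full_stripes, touched

k = 1024
m = k * k
for size in (65535, 256 * k, m, 2 * m, 3 * m, 5 * m):
    stripes, units = layout(size)
    print(f"{size:>8} bytes -> {stripes} full stripe(s), {units} data unit(s) touched")
```

Sizes that touch different numbers of data units, and files that end mid-cell or mid-stripe, each exercise a different code path in the verification loop, which is presumably why the reviewer asked for this spread of sizes.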
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=675976=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-675976 ]

ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 04/Nov/21 01:46
Start Date: 04/Nov/21 01:46
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#issuecomment-958791127

Worklog Id: (was: 675976) Time Spent: 6h 10m (was: 6h)
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=675747=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-675747 ]

ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 04/Nov/21 01:24
Start Date: 04/Nov/21 01:24
Worklog Time Spent: 10m
Work Description: sodonnel commented on pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#issuecomment-958887599

Worklog Id: (was: 675747) Time Spent: 6h (was: 5h 50m)
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=675726=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-675726 ]

ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 04/Nov/21 01:22
Start Date: 04/Nov/21 01:22
Worklog Time Spent: 10m
Work Description: cndaimin commented on pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#issuecomment-958610440

Worklog Id: (was: 675726) Time Spent: 5h 50m (was: 5h 40m)
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=675606=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-675606 ]

ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 04/Nov/21 01:07
Start Date: 04/Nov/21 01:07
Worklog Time Spent: 10m
Work Description: cndaimin commented on a change in pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#discussion_r741582271 (same review comment as above)

Worklog Id: (was: 675606) Time Spent: 5h 40m (was: 5.5h)
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=675587=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-675587 ]

ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 04/Nov/21 01:05
Start Date: 04/Nov/21 01:05
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#issuecomment-958791127

Worklog Id: (was: 675587) Time Spent: 5.5h (was: 5h 20m)
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=675499=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-675499 ]

ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 04/Nov/21 00:56
Start Date: 04/Nov/21 00:56
Worklog Time Spent: 10m
Work Description: sodonnel merged pull request #3593: URL: https://github.com/apache/hadoop/pull/3593

Worklog Id: (was: 675499) Time Spent: 5h 20m (was: 5h 10m)
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=675250=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-675250 ]

ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 04/Nov/21 00:32
Start Date: 04/Nov/21 00:32
Worklog Time Spent: 10m
Work Description: sodonnel commented on pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#issuecomment-958887599

Worklog Id: (was: 675250) Time Spent: 5h 10m (was: 5h)
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=675220=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-675220 ]

ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 04/Nov/21 00:27
Start Date: 04/Nov/21 00:27
Worklog Time Spent: 10m
Work Description: cndaimin commented on pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#issuecomment-958610440

Worklog Id: (was: 675220) Time Spent: 5h (was: 4h 50m)
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=675094=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-675094 ]

ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 04/Nov/21 00:10
Start Date: 04/Nov/21 00:10
Worklog Time Spent: 10m
Work Description: cndaimin commented on a change in pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#discussion_r741582271 (same review comment as above)

Worklog Id: (was: 675094) Time Spent: 4h 50m (was: 4h 40m)
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=675074=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-675074 ]

ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 04/Nov/21 00:08
Start Date: 04/Nov/21 00:08
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#issuecomment-958791127

Worklog Id: (was: 675074) Time Spent: 4h 40m (was: 4.5h)
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=674926&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674926 ]
ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 03/Nov/21 20:20
Worklog Time Spent: 10m
Work Description: sodonnel merged pull request #3593: URL: https://github.com/apache/hadoop/pull/3593
Worklog Id: (was: 674926) Time Spent: 4.5h (was: 4h 20m)
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=674870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674870 ]
ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 03/Nov/21 19:18
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#issuecomment-959845514

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|:-------:|:-------:|:-------:|
| +0 :ok: | reexec | 1m 2s | | Docker mode activated. |
_ Prechecks _
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
_ trunk Compile Tests _
| +1 :green_heart: | mvninstall | 34m 29s | | trunk passed |
| +1 :green_heart: | compile | 1m 23s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 17s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 0m 58s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 23s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 57s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 26s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 16s | | trunk passed |
| +1 :green_heart: | shadedclient | 24m 37s | | branch has no errors when building and testing our client artifacts. |
_ Patch Compile Tests _
| +1 :green_heart: | mvninstall | 1m 14s | | the patch passed |
| +1 :green_heart: | compile | 1m 21s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 21s | | the patch passed |
| +1 :green_heart: | compile | 1m 11s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 11s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 52s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 15s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 49s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 20s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 22s | | the patch passed |
| +1 :green_heart: | shadedclient | 25m 1s | | patch has no errors when building and testing our client artifacts. |
_ Other Tests _
| -1 :x: | unit | 349m 27s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3593/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 37s | | The patch does not generate ASF License warnings. |
| | | 454m 41s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
| | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithShortCircuitRead |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3593/4/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3593 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell markdownlint |
| uname | Linux 1298076a1247 4.15.0-142-generic #146-Ubuntu SMP Tue Apr 13 01:11:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 51e61547d07d9a0c236b89e5b804aaa8f362f28d |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results |
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=674615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674615 ]
ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 03/Nov/21 13:52
Worklog Time Spent: 10m
Work Description: sodonnel commented on pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#issuecomment-959130953

Thanks, looks good. I will commit when the CI checks come back.

Worklog Id: (was: 674615) Time Spent: 4h 10m (was: 4h)
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=674557&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674557 ]
ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 03/Nov/21 11:44
Worklog Time Spent: 10m
Work Description: cndaimin commented on pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#issuecomment-958953019

@sodonnel Thanks, documentation file `HDFSCommands.md` is updated.

Worklog Id: (was: 674557) Time Spent: 4h (was: 3h 50m)
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=674524&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674524 ]
ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 03/Nov/21 10:45
Worklog Time Spent: 10m
Work Description: sodonnel commented on pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#issuecomment-958887599

@cndaimin I was about to commit this, and I remembered we should update the documentation to include this command. The documentation is in a markdown file and gets published with the release, like here: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#Debug_Commands

That page is generated from:

```
hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
```

Would you be able to add a section for this new command under the Debug_Commands section please?

Worklog Id: (was: 674524) Time Spent: 3h 50m (was: 3h 40m)
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=674488&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674488 ]
ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 03/Nov/21 09:48
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#issuecomment-958791127

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|:-------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 52s | | Docker mode activated. |
_ Prechecks _
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
_ trunk Compile Tests _
| +1 :green_heart: | mvninstall | 35m 22s | | trunk passed |
| +1 :green_heart: | compile | 1m 32s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 17s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 0m 59s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 35s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 59s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 25s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 41s | | trunk passed |
| +1 :green_heart: | shadedclient | 25m 47s | | branch has no errors when building and testing our client artifacts. |
_ Patch Compile Tests _
| +1 :green_heart: | mvninstall | 1m 18s | | the patch passed |
| +1 :green_heart: | compile | 1m 19s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 19s | | the patch passed |
| +1 :green_heart: | compile | 1m 9s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 9s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 52s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 16s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 48s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 18s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 22s | | the patch passed |
| +1 :green_heart: | shadedclient | 24m 50s | | patch has no errors when building and testing our client artifacts. |
_ Other Tests _
| +1 :green_heart: | unit | 324m 13s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 39s | | The patch does not generate ASF License warnings. |
| | | 431m 32s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3593/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3593 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 04f9538a1b9b 4.15.0-147-generic #151-Ubuntu SMP Fri Jun 18 19:21:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 21c1887fd7d0ede169c42e11b0c793c717dc7c47 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3593/3/testReport/ |
| Max. process+thread count | 1996 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3593/3/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=674341&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674341 ]
ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 03/Nov/21 02:44
Worklog Time Spent: 10m
Work Description: cndaimin commented on pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#issuecomment-958610440

@sodonnel Thanks for your review. Update: Removed the unused import and added a test on verifying file with 2 block groups.

Worklog Id: (was: 674341) Time Spent: 3.5h (was: 3h 20m)
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=674337&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674337 ]
ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 03/Nov/21 02:38
Worklog Time Spent: 10m
Work Description: cndaimin commented on a change in pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#discussion_r741582271

File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDebugAdmin.java

@@ -166,8 +179,91 @@

```
  @Test(timeout = 60000)
  public void testRecoverLeaseforFileNotFound() throws Exception {
    cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    cluster.waitActive();
    assertTrue(runCmd(new String[] {
        "recoverLease", "-path", "/foo", "-retries", "2" }).contains(
        "Giving up on recoverLease for /foo after 1 try"));
  }

  @Test(timeout = 60000)
  public void testVerifyECCommand() throws Exception {
    final ErasureCodingPolicy ecPolicy = SystemErasureCodingPolicies.getByID(
        SystemErasureCodingPolicies.RS_3_2_POLICY_ID);
    cluster = DFSTestUtil.setupCluster(conf, 6, 5, 0);
    cluster.waitActive();
    DistributedFileSystem fs = cluster.getFileSystem();

    assertEquals("ret: 1, verifyEC -file <file>  Verify HDFS erasure coding on " +
        "all block groups of the file.", runCmd(new String[]{"verifyEC"}));

    assertEquals("ret: 1, File /bar does not exist.",
        runCmd(new String[]{"verifyEC", "-file", "/bar"}));

    fs.create(new Path("/bar")).close();
    assertEquals("ret: 1, File /bar is not erasure coded.",
        runCmd(new String[]{"verifyEC", "-file", "/bar"}));

    final Path ecDir = new Path("/ec");
    fs.mkdir(ecDir, FsPermission.getDirDefault());
    fs.enableErasureCodingPolicy(ecPolicy.getName());
    fs.setErasureCodingPolicy(ecDir, ecPolicy.getName());

    assertEquals("ret: 1, File /ec is not a regular file.",
        runCmd(new String[]{"verifyEC", "-file", "/ec"}));

    fs.create(new Path(ecDir, "foo"));
    assertEquals("ret: 1, File /ec/foo is not closed.",
        runCmd(new String[]{"verifyEC", "-file", "/ec/foo"}));

    final short repl = 1;
    final long k = 1024;
    final long m = k * k;
    final long seed = 0x1234567L;
    DFSTestUtil.createFile(fs, new Path(ecDir, "foo_65535"), 65535, repl, seed);
    assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_65535"})
        .contains("All EC block group status: OK"));
    DFSTestUtil.createFile(fs, new Path(ecDir, "foo_256k"), 256 * k, repl, seed);
    assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_256k"})
        .contains("All EC block group status: OK"));
    DFSTestUtil.createFile(fs, new Path(ecDir, "foo_1m"), m, repl, seed);
    assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_1m"})
        .contains("All EC block group status: OK"));
    DFSTestUtil.createFile(fs, new Path(ecDir, "foo_2m"), 2 * m, repl, seed);
    assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_2m"})
        .contains("All EC block group status: OK"));
    DFSTestUtil.createFile(fs, new Path(ecDir, "foo_3m"), 3 * m, repl, seed);
    assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_3m"})
        .contains("All EC block group status: OK"));
    DFSTestUtil.createFile(fs, new Path(ecDir, "foo_5m"), 5 * m, repl, seed);
    assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_5m"})
        .contains("All EC block group status: OK"));
```

Review comment: Thanks, that's a good advice, updated.

Worklog Id: (was: 674337) Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=674054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674054 ]
ASF GitHub Bot logged work on HDFS-16286:
Author: ASF GitHub Bot
Created on: 02/Nov/21 21:34
Worklog Time Spent: 10m
Work Description: sodonnel commented on a change in pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#discussion_r741306774

File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDebugAdmin.java
(commenting on the same testVerifyECCommand hunk quoted in the reply above)

Review comment: Could you add one more test case for a file that has multiple block groups, so we test the command looping over more than 1 block? You are using EC 3-2, so write a file that is 6MB, with a 1MB block size. That should create 2 block groups, with a length of 3MB each. Each block would then have a single 1MB EC chunk in it.

In `DFSTestUtil` there is a method to pass the blocksize already, so the test would be almost the same as the ones above:

```
public static void createFile(FileSystem fs, Path fileName, int bufferLen,
    long fileLen, long blockSize, short replFactor, long seed)
```

Worklog Id: (was: 674054) Time Spent: 2h 50m (was: 2h 40m)
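The layout arithmetic behind the suggested test case can be checked standalone. The following is an illustrative sketch (the class and method names are hypothetical, not from the Hadoop patch): with an RS(d,p) policy, each block group carries d * blockSize bytes of user data, so a 6MB file under EC 3-2 with a 1MB block size spans 2 block groups of 3MB each.

```java
// Sketch: block-group layout math for a striped EC file, restating the
// review comment above. Names are illustrative, not from the patch.
public class EcLayoutSketch {

    // Number of block groups a file of fileLen bytes occupies, given that
    // each block group holds dataBlkNum * blockSize bytes of user data.
    static long blockGroups(long fileLen, int dataBlkNum, long blockSize) {
        long groupDataSize = dataBlkNum * blockSize;
        // ceiling division: a partially filled group still counts
        return (fileLen + groupDataSize - 1) / groupDataSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // EC 3-2, 1MB block size, 6MB file: two block groups of 3MB data
        // each, so every internal block is a single 1MB EC chunk.
        System.out.println(blockGroups(6 * mb, 3, mb)); // expected: 2
    }
}
```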
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=674089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674089 ]

ASF GitHub Bot logged work on HDFS-16286:

Author: ASF GitHub Bot
Created on: 02/Nov/21 21:39
Start Date: 02/Nov/21 21:39
Worklog Time Spent: 10m
Work Description: cndaimin commented on a change in pull request #3593:
URL: https://github.com/apache/hadoop/pull/3593#discussion_r740842196

File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DebugAdmin.java

@@ -387,6 +414,211 @@ int run(List<String> args) throws IOException {

```java
  /**
   * The command for verifying the correctness of erasure coding on an erasure coded file.
   */
  private class VerifyECCommand extends DebugCommand {
    private DFSClient client;
    private int dataBlkNum;
    private int parityBlkNum;
    private int cellSize;
    private boolean useDNHostname;
    private CachingStrategy cachingStrategy;
    private int stripedReadBufferSize;
    private CompletionService readService;
    private RawErasureDecoder decoder;
    private BlockReader[] blockReaders;

    VerifyECCommand() {
      super("verifyEC",
          "verifyEC -file <file>",
          "  Verify HDFS erasure coding on all block groups of the file.");
    }

    int run(List<String> args) throws IOException {
      if (args.size() < 2) {
        System.out.println(usageText);
        System.out.println(helpText + System.lineSeparator());
        return 1;
      }
      String file = StringUtils.popOptionWithArgument("-file", args);
      Path path = new Path(file);
      DistributedFileSystem dfs = AdminHelper.getDFS(getConf());
      this.client = dfs.getClient();

      FileStatus fileStatus;
      try {
        fileStatus = dfs.getFileStatus(path);
      } catch (FileNotFoundException e) {
        System.err.println("File " + file + " does not exist.");
        return 1;
      }

      if (!fileStatus.isFile()) {
        System.err.println("File " + file + " is not a regular file.");
        return 1;
      }
      if (!dfs.isFileClosed(path)) {
        System.err.println("File " + file + " is not closed.");
        return 1;
      }
      this.useDNHostname = getConf().getBoolean(DFSConfigKeys.DFS_DATANODE_USE_DN_HOSTNAME,
          DFSConfigKeys.DFS_DATANODE_USE_DN_HOSTNAME_DEFAULT);
      this.cachingStrategy = CachingStrategy.newDefaultStrategy();
      this.stripedReadBufferSize = getConf().getInt(
          DFSConfigKeys.DFS_DN_EC_RECONSTRUCTION_STRIPED_READ_BUFFER_SIZE_KEY,
          DFSConfigKeys.DFS_DN_EC_RECONSTRUCTION_STRIPED_READ_BUFFER_SIZE_DEFAULT);

      LocatedBlocks locatedBlocks = client.getLocatedBlocks(file, 0, fileStatus.getLen());
      if (locatedBlocks.getErasureCodingPolicy() == null) {
        System.err.println("File " + file + " is not erasure coded.");
        return 1;
      }
      ErasureCodingPolicy ecPolicy = locatedBlocks.getErasureCodingPolicy();
      this.dataBlkNum = ecPolicy.getNumDataUnits();
      this.parityBlkNum = ecPolicy.getNumParityUnits();
      this.cellSize = ecPolicy.getCellSize();
      this.decoder = CodecUtil.createRawDecoder(getConf(), ecPolicy.getCodecName(),
          new ErasureCoderOptions(
              ecPolicy.getNumDataUnits(), ecPolicy.getNumParityUnits()));
      int blockNum = dataBlkNum + parityBlkNum;
      this.readService = new ExecutorCompletionService<>(
          DFSUtilClient.getThreadPoolExecutor(blockNum, blockNum, 60,
              new LinkedBlockingQueue<>(), "read-", false));
      this.blockReaders = new BlockReader[dataBlkNum + parityBlkNum];

      for (LocatedBlock locatedBlock : locatedBlocks.getLocatedBlocks()) {
        System.out.println("Checking EC block group: blk_" + locatedBlock.getBlock().getBlockId());
        LocatedStripedBlock blockGroup = (LocatedStripedBlock) locatedBlock;

        try {
          verifyBlockGroup(blockGroup);
          System.out.println("Status: OK");
        } catch (Exception e) {
          System.err.println("Status: ERROR, message: " + e.getMessage());
          return 1;
        } finally {
          closeBlockReaders();
        }
      }
      System.out.println("\nAll EC block group status: OK");
      return 0;
    }

    private void verifyBlockGroup(LocatedStripedBlock blockGroup) throws Exception {
      final LocatedBlock[] indexedBlocks = StripedBlockUtil.parseStripedBlockGroup(blockGroup,
          cellSize, dataBlkNum, parityBlkNum);

      int blockNumExpected = Math.min(dataBlkNum,
          (int) ((blockGroup.getBlockSize() - 1) / cellSize + 1)) + parityBlkNum;
      if (blockGroup.getBlockIndices().length < blockNumExpected) {
        throw new Exception("Block group is under-erasure-coded.");
      }

      long  // ... remainder of the hunk is truncated in this archive
```
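The hunk ends inside `verifyBlockGroup`, which first checks that the group has the expected number of blocks and then verifies stripe consistency. The sketch below models that idea in plain, self-contained Java: `expectedBlockNum` mirrors the formula in the diff, while a single XOR parity unit stands in for the Reed-Solomon decoder that the real command creates via `CodecUtil.createRawDecoder`. The class and method names here are illustrative only, not Hadoop APIs.

```java
import java.util.Arrays;

// Simplified model of the verifyEC consistency check. The real tool decodes
// with the file's RS policy; a single XOR parity cell is used here so the
// example runs without Hadoop on the classpath.
public class EcVerifySketch {

  // Mirrors the formula in the diff: a block group shorter than a full
  // stripe uses fewer data blocks, but always carries every parity block.
  static int expectedBlockNum(long blockGroupSize, int cellSize,
                              int dataBlkNum, int parityBlkNum) {
    int dataBlocksUsed = (int) ((blockGroupSize - 1) / cellSize + 1);
    return Math.min(dataBlkNum, dataBlocksUsed) + parityBlkNum;
  }

  // XOR "encoder": the parity cell is the XOR of all data cells.
  static byte[] xorParity(byte[][] dataCells) {
    byte[] parity = new byte[dataCells[0].length];
    for (byte[] cell : dataCells) {
      for (int i = 0; i < cell.length; i++) {
        parity[i] ^= cell[i];
      }
    }
    return parity;
  }

  // Verification: recompute parity from the data cells read back from
  // storage and compare it with the stored parity cell.
  static boolean stripeIsConsistent(byte[][] dataCells, byte[] storedParity) {
    return Arrays.equals(xorParity(dataCells), storedParity);
  }

  public static void main(String[] args) {
    // RS-3-2 with 1 MiB cells: a 65535-byte group fills only one data block.
    System.out.println(expectedBlockNum(65535, 1024 * 1024, 3, 2));  // 3

    byte[][] data = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
    byte[] parity = xorParity(data);
    System.out.println(stripeIsConsistent(data, parity));   // true
    data[1][0] ^= 0x40;                                     // silent corruption
    System.out.println(stripeIsConsistent(data, parity));   // false
  }
}
```

A corrupted cell flips the recomputed parity, which is exactly the class of damage a per-block checksum cannot surface once the checksum was computed from the corrupted bytes.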
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=674056=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674056 ] ASF GitHub Bot logged work on HDFS-16286: - Author: ASF GitHub Bot Created on: 02/Nov/21 21:35 Start Date: 02/Nov/21 21:35 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#issuecomment-957941102 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 55s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 2s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 34m 45s | | trunk passed | | +1 :green_heart: | compile | 1m 22s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 1m 16s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 0m 57s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 23s | | trunk passed | | +1 :green_heart: | javadoc | 0m 57s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 23s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 3m 22s | | trunk passed | | +1 :green_heart: | shadedclient | 25m 41s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 14s | | the patch passed | | +1 :green_heart: | compile | 1m 16s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 1m 16s | | the patch passed | | +1 :green_heart: | compile | 1m 7s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 1m 7s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 51s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3593/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 13 unchanged - 0 fixed = 14 total (was 13) | | +1 :green_heart: | mvnsite | 1m 14s | | the patch passed | | +1 :green_heart: | javadoc | 0m 48s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 17s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 3m 17s | | the patch passed | | +1 :green_heart: | shadedclient | 24m 27s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 363m 34s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3593/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 38s | | The patch does not generate ASF License warnings. 
| | | | 469m 10s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3593/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3593 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 8846eb1a8063 4.15.0-153-generic #160-Ubuntu SMP Thu Jul 29 06:54:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / d30b66ba08b5ad4404363477591cb1681c12cb6c | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=674046&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674046 ]

ASF GitHub Bot logged work on HDFS-16286:

Author: ASF GitHub Bot
Created on: 02/Nov/21 21:34
Start Date: 02/Nov/21 21:34
Worklog Time Spent: 10m
Work Description: cndaimin commented on pull request #3593:
URL: https://github.com/apache/hadoop/pull/3593#issuecomment-957273204

@sodonnel Thanks for your review. Update: I have fixed the review comments and added some tests in `TestDebugAdmin#testVerifyECCommand`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 674046) Time Spent: 2h 40m (was: 2.5h)

> Debug tool to verify the correctness of erasure coding on file
>
> Key: HDFS-16286
> URL: https://issues.apache.org/jira/browse/HDFS-16286
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: erasure-coding, tools
> Affects Versions: 3.3.0, 3.3.1
> Reporter: daimin
> Assignee: daimin
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> Block data in an erasure coded block group may become corrupt, and the block
> meta (checksum) cannot detect the corruption in some cases, such as after EC
> reconstruction; related issues: HDFS-14768, HDFS-15186, HDFS-15240.
> In addition to HDFS-15759, a tool is needed to check an erasure coded file for
> data corruption in any of its block groups under conditions other than EC
> reconstruction, or when the HDFS-15759 feature (validation during EC
> reconstruction) is not enabled (it is disabled by default).
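The quoted issue description is the motivation for the whole patch: HDFS verifies each block against its stored checksum, but during EC reconstruction the checksum is recomputed from the reconstructed bytes, so a buggy reconstruction yields a block whose checksum is perfectly self-consistent. A minimal illustration of that blind spot, with `java.util.zip.CRC32` standing in for HDFS's per-chunk CRC32C (a simplification, not DataNode code):

```java
import java.util.Arrays;
import java.util.zip.CRC32;

// Why block metadata cannot catch a bad EC reconstruction: the checksum is
// derived from whatever bytes reconstruction produced, so wrong bytes still
// ship with a matching checksum.
public class ChecksumBlindSpot {

  static long crc(byte[] data) {
    CRC32 c = new CRC32();
    c.update(data);
    return c.getValue();
  }

  public static void main(String[] args) {
    byte[] original = "correct block contents".getBytes();
    // A (hypothetical) buggy reconstruction produces different bytes...
    byte[] reconstructed = "corrupt block contents".getBytes();

    // ...and the block's checksum is then computed from those wrong bytes,
    // so a later checksum verification of the block succeeds.
    long storedChecksum = crc(reconstructed);

    System.out.println(storedChecksum == crc(reconstructed));   // true: checksum passes
    System.out.println(Arrays.equals(original, reconstructed)); // false: data is wrong
  }
}
```

Only a cross-block check, such as re-decoding the stripe and comparing parity the way verifyEC does, can expose this.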
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=673905&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-673905 ]

ASF GitHub Bot logged work on HDFS-16286:

Author: ASF GitHub Bot
Created on: 02/Nov/21 21:19
Start Date: 02/Nov/21 21:19
Worklog Time Spent: 10m
Work Description: sodonnel commented on pull request #3593:
URL: https://github.com/apache/hadoop/pull/3593#issuecomment-957960788

Thanks for the update @cndaimin - There is just one style issue detected and I have one suggestion about adding another test case inside your existing test. Aside from that, I think this change looks good.

Issue Time Tracking
---
Worklog Id: (was: 673905) Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=673509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-673509 ]

ASF GitHub Bot logged work on HDFS-16286:

Author: ASF GitHub Bot
Created on: 02/Nov/21 18:05
Start Date: 02/Nov/21 18:05
Worklog Time Spent: 10m
Work Description: sodonnel commented on a change in pull request #3593:
URL: https://github.com/apache/hadoop/pull/3593#discussion_r741306774

File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDebugAdmin.java

@@ -166,8 +179,91 @@ public void testComputeMetaCommand() throws Exception {

```java
  @Test(timeout = 60000)
  public void testRecoverLeaseforFileNotFound() throws Exception {
    cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    cluster.waitActive();
    assertTrue(runCmd(new String[] {
        "recoverLease", "-path", "/foo", "-retries", "2" }).contains(
        "Giving up on recoverLease for /foo after 1 try"));
  }

  @Test(timeout = 60000)
  public void testVerifyECCommand() throws Exception {
    final ErasureCodingPolicy ecPolicy = SystemErasureCodingPolicies.getByID(
        SystemErasureCodingPolicies.RS_3_2_POLICY_ID);
    cluster = DFSTestUtil.setupCluster(conf, 6, 5, 0);
    cluster.waitActive();
    DistributedFileSystem fs = cluster.getFileSystem();

    assertEquals("ret: 1, verifyEC -file <file>  Verify HDFS erasure coding on " +
        "all block groups of the file.", runCmd(new String[]{"verifyEC"}));

    assertEquals("ret: 1, File /bar does not exist.",
        runCmd(new String[]{"verifyEC", "-file", "/bar"}));

    fs.create(new Path("/bar")).close();
    assertEquals("ret: 1, File /bar is not erasure coded.",
        runCmd(new String[]{"verifyEC", "-file", "/bar"}));

    final Path ecDir = new Path("/ec");
    fs.mkdir(ecDir, FsPermission.getDirDefault());
    fs.enableErasureCodingPolicy(ecPolicy.getName());
    fs.setErasureCodingPolicy(ecDir, ecPolicy.getName());

    assertEquals("ret: 1, File /ec is not a regular file.",
        runCmd(new String[]{"verifyEC", "-file", "/ec"}));

    fs.create(new Path(ecDir, "foo"));
    assertEquals("ret: 1, File /ec/foo is not closed.",
        runCmd(new String[]{"verifyEC", "-file", "/ec/foo"}));

    final short repl = 1;
    final long k = 1024;
    final long m = k * k;
    final long seed = 0x1234567L;
    DFSTestUtil.createFile(fs, new Path(ecDir, "foo_65535"), 65535, repl, seed);
    assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_65535"})
        .contains("All EC block group status: OK"));
    DFSTestUtil.createFile(fs, new Path(ecDir, "foo_256k"), 256 * k, repl, seed);
    assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_256k"})
        .contains("All EC block group status: OK"));
    DFSTestUtil.createFile(fs, new Path(ecDir, "foo_1m"), m, repl, seed);
    assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_1m"})
        .contains("All EC block group status: OK"));
    DFSTestUtil.createFile(fs, new Path(ecDir, "foo_2m"), 2 * m, repl, seed);
    assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_2m"})
        .contains("All EC block group status: OK"));
    DFSTestUtil.createFile(fs, new Path(ecDir, "foo_3m"), 3 * m, repl, seed);
    assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_3m"})
        .contains("All EC block group status: OK"));
    DFSTestUtil.createFile(fs, new Path(ecDir, "foo_5m"), 5 * m, repl, seed);
    assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_5m"})
        .contains("All EC block group status: OK"));
```

Review comment: Could you add one more test case for a file that has multiple block groups, so we test the command looping over more than one block group? You are using EC 3-2, so write a file that is 6MB with a 1MB block size. That should create 2 block groups with a length of 3MB each, and each block would then hold a single 1MB EC chunk.

In `DFSTestUtil` there is a method that takes the block size already, so the test would be almost the same as the ones above:

```java
public static void createFile(FileSystem fs, Path fileName, int bufferLen,
    long fileLen, long blockSize, short replFactor, long seed)
```

Issue Time Tracking
---
Worklog Id: (was: 673509) Time Spent: 1h 50m (was: 1h 40m)
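The suggested extra case rests on simple layout arithmetic: with RS-3-2, a block group holds `dataUnits * blockSize` bytes of data, so a 6MB file written with a 1MB block size splits into two 3MB block groups, exercising the loop over multiple groups. That arithmetic can be checked standalone (illustrative code, not an HDFS API):

```java
// Layout arithmetic behind the suggested multi-block-group test case:
// RS-3-2 with a 1 MB block size gives each block group 3 MB of data
// capacity, so a 6 MB file spans two block groups.
public class BlockGroupLayout {

  static final long MB = 1024L * 1024;

  // Data capacity of one block group: each data unit can grow to blockSize.
  static long groupCapacity(int dataUnits, long blockSize) {
    return dataUnits * blockSize;
  }

  // Number of block groups needed for a file, by ceiling division.
  static long groupCount(long fileLen, int dataUnits, long blockSize) {
    long capacity = groupCapacity(dataUnits, blockSize);
    return (fileLen + capacity - 1) / capacity;
  }

  public static void main(String[] args) {
    System.out.println(groupCapacity(3, MB) / MB);  // 3 (MB per group)
    System.out.println(groupCount(6 * MB, 3, MB));  // 2 block groups
  }
}
```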
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=673291=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-673291 ] ASF GitHub Bot logged work on HDFS-16286: - Author: ASF GitHub Bot Created on: 02/Nov/21 17:15 Start Date: 02/Nov/21 17:15 Worklog Time Spent: 10m Work Description: sodonnel commented on a change in pull request #3593: URL: https://github.com/apache/hadoop/pull/3593#discussion_r741306774 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDebugAdmin.java ## @@ -166,8 +179,91 @@ public void testComputeMetaCommand() throws Exception { @Test(timeout = 6) public void testRecoverLeaseforFileNotFound() throws Exception { +cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build(); +cluster.waitActive(); assertTrue(runCmd(new String[] { "recoverLease", "-path", "/foo", "-retries", "2" }).contains( "Giving up on recoverLease for /foo after 1 try")); } + + @Test(timeout = 6) + public void testVerifyECCommand() throws Exception { +final ErasureCodingPolicy ecPolicy = SystemErasureCodingPolicies.getByID( +SystemErasureCodingPolicies.RS_3_2_POLICY_ID); +cluster = DFSTestUtil.setupCluster(conf, 6, 5, 0); +cluster.waitActive(); +DistributedFileSystem fs = cluster.getFileSystem(); + +assertEquals("ret: 1, verifyEC -file Verify HDFS erasure coding on " + +"all block groups of the file.", runCmd(new String[]{"verifyEC"})); + +assertEquals("ret: 1, File /bar does not exist.", +runCmd(new String[]{"verifyEC", "-file", "/bar"})); + +fs.create(new Path("/bar")).close(); +assertEquals("ret: 1, File /bar is not erasure coded.", +runCmd(new String[]{"verifyEC", "-file", "/bar"})); + + +final Path ecDir = new Path("/ec"); +fs.mkdir(ecDir, FsPermission.getDirDefault()); +fs.enableErasureCodingPolicy(ecPolicy.getName()); +fs.setErasureCodingPolicy(ecDir, ecPolicy.getName()); + +assertEquals("ret: 1, File /ec is not a regular file.", +runCmd(new String[]{"verifyEC", "-file", "/ec"})); + 
+fs.create(new Path(ecDir, "foo")); +assertEquals("ret: 1, File /ec/foo is not closed.", +runCmd(new String[]{"verifyEC", "-file", "/ec/foo"})); + +final short repl = 1; +final long k = 1024; +final long m = k * k; +final long seed = 0x1234567L; +DFSTestUtil.createFile(fs, new Path(ecDir, "foo_65535"), 65535, repl, seed); +assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_65535"}) +.contains("All EC block group status: OK")); +DFSTestUtil.createFile(fs, new Path(ecDir, "foo_256k"), 256 * k, repl, seed); +assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_256k"}) +.contains("All EC block group status: OK")); +DFSTestUtil.createFile(fs, new Path(ecDir, "foo_1m"), m, repl, seed); +assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_1m"}) +.contains("All EC block group status: OK")); +DFSTestUtil.createFile(fs, new Path(ecDir, "foo_2m"), 2 * m, repl, seed); +assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_2m"}) +.contains("All EC block group status: OK")); +DFSTestUtil.createFile(fs, new Path(ecDir, "foo_3m"), 3 * m, repl, seed); +assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_3m"}) +.contains("All EC block group status: OK")); +DFSTestUtil.createFile(fs, new Path(ecDir, "foo_5m"), 5 * m, repl, seed); +assertTrue(runCmd(new String[]{"verifyEC", "-file", "/ec/foo_5m"}) +.contains("All EC block group status: OK")); + Review comment: Could you add one more test case for a file that has multiple block groups, so we test the command looping over more than 1 block? You are using EC 3-2, so write a file that is 6MB, with a 1MB block size. That should create 2 block groups, with a length of 3MB each. Each block would then have a single 1MB EC chunk in it. 
In `DFSTestUtil` there is already a method that takes the block size, so the test would be almost the same as the ones above:

```
public static void createFile(FileSystem fs, Path fileName, int bufferLen,
    long fileLen, long blockSize, short replFactor, long seed)
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 673291)
Time Spent: 1h 20m (was: 1h 10m)
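To make the review suggestion above concrete: with RS(3,2), a block group holds up to dataUnits × blockSize bytes of file data, so a 6MB file written with a 1MB block size splits into two block groups of 3MB each. A minimal standalone sketch of that arithmetic (the `EcGroupMath` helper and its method are hypothetical illustrations, not part of the Hadoop patch):

```java
// Hypothetical helper (not part of the Hadoop patch): computes how many EC block
// groups a file occupies, assuming each block group spans dataUnits * blockSize
// bytes of file data, as with the RS(3,2) policy discussed in the review.
public class EcGroupMath {

    // Number of block groups needed for fileLen bytes (ceiling division).
    static int blockGroups(long fileLen, long blockSize, int dataUnits) {
        long groupDataSize = blockSize * dataUnits; // file bytes held per block group
        return (int) ((fileLen + groupDataSize - 1) / groupDataSize);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Reviewer's case: 6 MB file, 1 MB block size, RS(3,2) -> 2 block groups
        // of 3 MB each, so verifyEC loops over more than one group.
        System.out.println(blockGroups(6 * mb, 1 * mb, 3));
        // The earlier test files use the default 128 MB block size, so even the
        // 5 MB file fits in a single block group.
        System.out.println(blockGroups(5 * mb, 128 * mb, 3));
    }
}
```

This is why the existing tests, which all use the default block size, only ever exercise a single block group.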
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=673280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-673280 ]

ASF GitHub Bot logged work on HDFS-16286:
- Author: ASF GitHub Bot
Created on: 02/Nov/21 16:53
Start Date: 02/Nov/21 16:53
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3593:
URL: https://github.com/apache/hadoop/pull/3593#issuecomment-957941102

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|:--------|:-------:|:-------:|
| +0 :ok: | reexec | 0m 55s | | Docker mode activated. |

_ Prechecks _

| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 2s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |

_ trunk Compile Tests _

| +1 :green_heart: | mvninstall | 34m 45s | | trunk passed |
| +1 :green_heart: | compile | 1m 22s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 16s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 0m 57s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 23s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 57s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 23s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 22s | | trunk passed |
| +1 :green_heart: | shadedclient | 25m 41s | | branch has no errors when building and testing our client artifacts. |

_ Patch Compile Tests _

| +1 :green_heart: | mvninstall | 1m 14s | | the patch passed |
| +1 :green_heart: | compile | 1m 16s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 16s | | the patch passed |
| +1 :green_heart: | compile | 1m 7s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 7s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 51s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3593/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 13 unchanged - 0 fixed = 14 total (was 13) |
| +1 :green_heart: | mvnsite | 1m 14s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 48s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 17s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 17s | | the patch passed |
| +1 :green_heart: | shadedclient | 24m 27s | | patch has no errors when building and testing our client artifacts. |

_ Other Tests _

| -1 :x: | unit | 363m 34s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3593/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 38s | | The patch does not generate ASF License warnings. |
| | | 469m 10s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3593/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3593 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 8846eb1a8063 4.15.0-153-generic #160-Ubuntu SMP Thu Jul 29 06:54:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / d30b66ba08b5ad4404363477591cb1681c12cb6c |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=673053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-673053 ]

ASF GitHub Bot logged work on HDFS-16286:
- Author: ASF GitHub Bot
Created on: 02/Nov/21 09:44
Start Date: 02/Nov/21 09:44
Worklog Time Spent: 10m
Work Description: cndaimin commented on pull request #3593:
URL: https://github.com/apache/hadoop/pull/3593#issuecomment-957273204

@sodonnel Thanks for your review. Update: I have fixed the review comments and added some tests in `TestDebugAdmin#testVerifyECCommand`.

Issue Time Tracking
---
Worklog Id: (was: 673053)
Time Spent: 1h (was: 50m)

> Debug tool to verify the correctness of erasure coding on file
> --
>
> Key: HDFS-16286
> URL: https://issues.apache.org/jira/browse/HDFS-16286
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: erasure-coding, tools
> Affects Versions: 3.3.0, 3.3.1
> Reporter: daimin
> Assignee: daimin
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Block data in an erasure coded block group may become corrupt, and the block meta (checksum) cannot detect the corruption in some cases, such as EC reconstruction; related issues: HDFS-14768, HDFS-15186, HDFS-15240.
> In addition to HDFS-15759, a tool is needed to check an erasure coded file for block group data corruption arising from conditions other than EC reconstruction, or for cases where the HDFS-15759 feature (validation during EC reconstruction) is not enabled (it is disabled by default).
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=673051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-673051 ]

ASF GitHub Bot logged work on HDFS-16286:
- Author: ASF GitHub Bot
Created on: 02/Nov/21 09:39
Start Date: 02/Nov/21 09:39
Worklog Time Spent: 10m
Work Description: cndaimin commented on a change in pull request #3593:
URL: https://github.com/apache/hadoop/pull/3593#discussion_r740874125

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DebugAdmin.java
## @@ -387,6 +414,211 @@ int run(List args) throws IOException {

```
/**
 * The command for verifying the correctness of erasure coding on an erasure coded file.
 */
private class VerifyECCommand extends DebugCommand {
  private DFSClient client;
  private int dataBlkNum;
  private int parityBlkNum;
  private int cellSize;
  private boolean useDNHostname;
  private CachingStrategy cachingStrategy;
  private int stripedReadBufferSize;
  private CompletionService readService;
  private RawErasureDecoder decoder;
  private BlockReader[] blockReaders;

  VerifyECCommand() {
    super("verifyEC",
        "verifyEC -file ",
        "  Verify HDFS erasure coding on all block groups of the file.");
  }

  int run(List args) throws IOException {
    if (args.size() < 2) {
      System.out.println(usageText);
      System.out.println(helpText + System.lineSeparator());
      return 1;
    }
    String file = StringUtils.popOptionWithArgument("-file", args);
    Path path = new Path(file);
    DistributedFileSystem dfs = AdminHelper.getDFS(getConf());
    this.client = dfs.getClient();

    FileStatus fileStatus;
    try {
      fileStatus = dfs.getFileStatus(path);
    } catch (FileNotFoundException e) {
      System.err.println("File " + file + " does not exist.");
      return 1;
    }

    if (!fileStatus.isFile()) {
      System.err.println("File " + file + " is not a regular file.");
      return 1;
    }
    if (!dfs.isFileClosed(path)) {
      System.err.println("File " + file + " is not closed.");
      return 1;
    }

    this.useDNHostname = getConf().getBoolean(DFSConfigKeys.DFS_DATANODE_USE_DN_HOSTNAME,
        DFSConfigKeys.DFS_DATANODE_USE_DN_HOSTNAME_DEFAULT);
    this.cachingStrategy = CachingStrategy.newDefaultStrategy();
    this.stripedReadBufferSize = getConf().getInt(
        DFSConfigKeys.DFS_DN_EC_RECONSTRUCTION_STRIPED_READ_BUFFER_SIZE_KEY,
        DFSConfigKeys.DFS_DN_EC_RECONSTRUCTION_STRIPED_READ_BUFFER_SIZE_DEFAULT);

    LocatedBlocks locatedBlocks = client.getLocatedBlocks(file, 0, fileStatus.getLen());
    if (locatedBlocks.getErasureCodingPolicy() == null) {
      System.err.println("File " + file + " is not erasure coded.");
      return 1;
    }
    ErasureCodingPolicy ecPolicy = locatedBlocks.getErasureCodingPolicy();
    this.dataBlkNum = ecPolicy.getNumDataUnits();
    this.parityBlkNum = ecPolicy.getNumParityUnits();
    this.cellSize = ecPolicy.getCellSize();
    this.decoder = CodecUtil.createRawDecoder(getConf(), ecPolicy.getCodecName(),
        new ErasureCoderOptions(
            ecPolicy.getNumDataUnits(), ecPolicy.getNumParityUnits()));
    int blockNum = dataBlkNum + parityBlkNum;
    this.readService = new ExecutorCompletionService<>(
        DFSUtilClient.getThreadPoolExecutor(blockNum, blockNum, 60,
            new LinkedBlockingQueue<>(), "read-", false));
    this.blockReaders = new BlockReader[dataBlkNum + parityBlkNum];

    for (LocatedBlock locatedBlock : locatedBlocks.getLocatedBlocks()) {
      System.out.println("Checking EC block group: blk_" + locatedBlock.getBlock().getBlockId());
      LocatedStripedBlock blockGroup = (LocatedStripedBlock) locatedBlock;

      try {
        verifyBlockGroup(blockGroup);
        System.out.println("Status: OK");
      } catch (Exception e) {
        System.err.println("Status: ERROR, message: " + e.getMessage());
        return 1;
      } finally {
        closeBlockReaders();
      }
    }
    System.out.println("\nAll EC block group status: OK");
    return 0;
  }

  private void verifyBlockGroup(LocatedStripedBlock blockGroup) throws Exception {
    final LocatedBlock[] indexedBlocks = StripedBlockUtil.parseStripedBlockGroup(blockGroup,
        cellSize, dataBlkNum, parityBlkNum);

    int blockNumExpected = Math.min(dataBlkNum,
        (int) ((blockGroup.getBlockSize() - 1) / cellSize + 1)) + parityBlkNum;
    if (blockGroup.getBlockIndices().length < blockNumExpected) {
      throw new Exception("Block group is under-erasure-coded.");
    }

    long
```
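The `blockNumExpected` check in the quoted hunk can be exercised on its own. This sketch (the `ExpectedBlocks` class name is hypothetical; RS(3,2) and Hadoop's default 1 MiB cell size are assumed) reproduces that arithmetic:

```java
// Standalone sketch of the under-erasure-coded check from the quoted hunk.
// The arithmetic mirrors the patch: a block group exposes
// ceil(blockGroupSize / cellSize) data blocks, capped at dataBlkNum,
// plus all parity blocks.
public class ExpectedBlocks {

    static int blockNumExpected(long blockGroupSize, long cellSize,
                                int dataBlkNum, int parityBlkNum) {
        // ceil division, capped at the policy's number of data units
        int dataBlocks = Math.min(dataBlkNum,
                (int) ((blockGroupSize - 1) / cellSize + 1));
        return dataBlocks + parityBlkNum;
    }

    public static void main(String[] args) {
        long cell = 1024L * 1024L; // assumed 1 MiB cell size
        // 65535-byte group: only 1 data block is written, so 1 + 2 parity = 3.
        System.out.println(blockNumExpected(65535, cell, 3, 2));
        // Full 3 MiB stripe: all 3 data blocks plus 2 parity = 5.
        System.out.println(blockNumExpected(3 * cell, cell, 3, 2));
        // If the NameNode reports fewer located blocks than this, verifyEC
        // throws "Block group is under-erasure-coded."
    }
}
```

This explains why the 65535-byte test file above still verifies cleanly: its single data block plus both parity blocks satisfy the expected count.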
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=673033&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-673033 ]

ASF GitHub Bot logged work on HDFS-16286:
- Author: ASF GitHub Bot
Created on: 02/Nov/21 09:05
Start Date: 02/Nov/21 09:05
Worklog Time Spent: 10m
Work Description: cndaimin commented on a change in pull request #3593:
URL: https://github.com/apache/hadoop/pull/3593#discussion_r740842196
## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DebugAdmin.java
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=672650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-672650 ]

ASF GitHub Bot logged work on HDFS-16286:
- Author: ASF GitHub Bot
Created on: 01/Nov/21 12:54
Start Date: 01/Nov/21 12:54
Worklog Time Spent: 10m
Work Description: sodonnel commented on a change in pull request #3593:
URL: https://github.com/apache/hadoop/pull/3593#discussion_r740188593
## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DebugAdmin.java
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=672649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-672649 ]

ASF GitHub Bot logged work on HDFS-16286:
- Author: ASF GitHub Bot
Created on: 01/Nov/21 12:52
Start Date: 01/Nov/21 12:52
Worklog Time Spent: 10m
Work Description: sodonnel commented on a change in pull request #3593:
URL: https://github.com/apache/hadoop/pull/3593#discussion_r740187297
## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DebugAdmin.java
[jira] [Work logged] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file
[ https://issues.apache.org/jira/browse/HDFS-16286?focusedWorklogId=670741&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-670741 ]

ASF GitHub Bot logged work on HDFS-16286:
- Author: ASF GitHub Bot
Created on: 27/Oct/21 13:54
Start Date: 27/Oct/21 13:54
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3593:
URL: https://github.com/apache/hadoop/pull/3593#issuecomment-952955590

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|:--------|:-------:|:-------:|
| +0 :ok: | reexec | 0m 54s | | Docker mode activated. |

_ Prechecks _

| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |

_ trunk Compile Tests _

| +1 :green_heart: | mvninstall | 34m 20s | | trunk passed |
| +1 :green_heart: | compile | 1m 24s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 16s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 0m 58s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 23s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 56s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 26s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 17s | | trunk passed |
| +1 :green_heart: | shadedclient | 24m 29s | | branch has no errors when building and testing our client artifacts. |

_ Patch Compile Tests _

| +1 :green_heart: | mvninstall | 1m 13s | | the patch passed |
| +1 :green_heart: | compile | 1m 17s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 17s | | the patch passed |
| +1 :green_heart: | compile | 1m 8s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 8s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 50s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 15s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 48s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 19s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 19s | | the patch passed |
| +1 :green_heart: | shadedclient | 24m 31s | | patch has no errors when building and testing our client artifacts. |

_ Other Tests _

| +1 :green_heart: | unit | 322m 17s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 38s | | The patch does not generate ASF License warnings. |
| | | 426m 19s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3593/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3593 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux ba38ee2e0214 4.15.0-153-generic #160-Ubuntu SMP Thu Jul 29 06:54:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / c33bb018ac0a5a6d4365e98f4e123d263732555f |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3593/1/testReport/ |
| Max. process+thread count | 2058 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3593/1/console |
| versions |