[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223973#comment-17223973 ] Hadoop QA commented on HDFS-14090: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 2m 52s{color} | | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 33s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 22s{color} | | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m 21s{color} | | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | | {color:green} the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} blanks {color} | {color:green} 0m 0s{color} | | {color:green} The patch has no blanks issues. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 16s{color} | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/285/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt] | {color:orange} hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 3s{color} | | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 35s{color} | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/285/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.
[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengnan Li updated HDFS-14090: -- Attachment: HDFS-14090.018.patch > RBF: Improved isolation for downstream name nodes. {Static} > --- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: CR Hota >Assignee: Fengnan Li >Priority: Major > Attachments: HDFS-14090-HDFS-13891.001.patch, > HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, > HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, > HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, > HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, > HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, > HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, > HDFS-14090.018.patch, RBF_ Isolation design.pdf > > > The Router is a gateway to the underlying name nodes. Gateway architectures should > help minimize the impact on clients connecting to healthy clusters vs unhealthy > clusters. > For example, if there are 2 name nodes downstream and one of them is > heavily loaded with calls spiking rpc queue times, due to back pressure the > same will start reflecting on the router. As a result, clients > connecting to healthy/faster name nodes will also slow down, as the same rpc queue > is maintained for all calls at the router layer. Essentially the same IPC > thread pool is used by the router to connect to all name nodes. > Currently the router uses one single rpc queue for all calls. Let's discuss how we > can change the architecture and add some throttling logic for > unhealthy/slow/overloaded name nodes. > One way could be to read from the current call queue, immediately identify the > downstream name node and maintain a separate queue for each underlying name > node. Another simpler way is to maintain some sort of rate limiter configured > for each name node and let routers drop/reject/send error requests after a > certain threshold. > This won’t be a simple change, as the router’s ‘Server’ layer would need redesign > and implementation. Currently this layer is the same as the name node’s. > Opening this ticket to discuss, design and implement this feature. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
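As an illustration of the second, simpler option sketched in the description above (a rate limiter per downstream name node), the snippet below caps the number of in-flight router calls per nameservice. The class, method and parameter names are hypothetical; this is only a sketch of the general idea, not the design implemented in the attached patches.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/**
 * Illustrative sketch only: cap the number of in-flight router calls per
 * downstream nameservice so that one overloaded name node cannot occupy the
 * whole shared handler pool. Not the actual HDFS-14090 implementation.
 */
public class PerNameservicePermits {
  private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();
  private final int maxOutstandingPerNs;

  public PerNameservicePermits(int maxOutstandingPerNs) {
    this.maxOutstandingPerNs = maxOutstandingPerNs;
  }

  /** Returns true if the call may proceed; false means reject it as overloaded. */
  public boolean tryAcquire(String nsId) {
    return permits
        .computeIfAbsent(nsId, id -> new Semaphore(maxOutstandingPerNs))
        .tryAcquire();
  }

  /** Must be called when the downstream call completes, whether it succeeded or failed. */
  public void release(String nsId) {
    Semaphore s = permits.get(nsId);
    if (s != null) {
      s.release();
    }
  }
}
{code}

The first option in the description, a separate call queue per downstream name node, would instead require changes inside the router's Server layer itself.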
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223946#comment-17223946 ] Hadoop QA commented on HDFS-14090: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 45s{color} | | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 35m 4s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 31s{color} | | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m 25s{color} | | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 22s{color} | | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 39s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | | {color:green} the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} blanks {color} | {color:green} 0m 0s{color} | | {color:green} The patch has no blanks issues. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 21s{color} | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/284/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt] | {color:orange} hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 26s{color} | | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 42s{color} | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/284/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.
[jira] [Commented] (HDFS-15643) EC: Fix checksum computation in case of native encoders
[ https://issues.apache.org/jira/browse/HDFS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223898#comment-17223898 ] Hadoop QA commented on HDFS-15643: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 24s{color} | | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 36m 7s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 23s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 27s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 46s{color} | | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 3m 23s{color} | | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 20s{color} | | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 14s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 23s{color} | | {color:green} the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 23s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 11s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} blanks {color} | {color:green} 0m 0s{color} | | {color:green} The patch has no blanks issues. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 22s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 20m 22s{color} | | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s{color} | | {color:green} the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 58s{color} | | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || || | {color:red}-1{color} | {color:red} unit {color} | {color:red}129m 28s{color} | [/pa
[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengnan Li updated HDFS-14090: -- Attachment: HDFS-14090.017.patch > RBF: Improved isolation for downstream name nodes. {Static} > --- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: CR Hota >Assignee: Fengnan Li >Priority: Major > Attachments: HDFS-14090-HDFS-13891.001.patch, > HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, > HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, > HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, > HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, > HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, > HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, RBF_ > Isolation design.pdf > > > The Router is a gateway to the underlying name nodes. Gateway architectures should > help minimize the impact on clients connecting to healthy clusters vs unhealthy > clusters. > For example, if there are 2 name nodes downstream and one of them is > heavily loaded with calls spiking rpc queue times, due to back pressure the > same will start reflecting on the router. As a result, clients > connecting to healthy/faster name nodes will also slow down, as the same rpc queue > is maintained for all calls at the router layer. Essentially the same IPC > thread pool is used by the router to connect to all name nodes. > Currently the router uses one single rpc queue for all calls. Let's discuss how we > can change the architecture and add some throttling logic for > unhealthy/slow/overloaded name nodes. > One way could be to read from the current call queue, immediately identify the > downstream name node and maintain a separate queue for each underlying name > node. Another simpler way is to maintain some sort of rate limiter configured > for each name node and let routers drop/reject/send error requests after a > certain threshold. > This won’t be a simple change, as the router’s ‘Server’ layer would need redesign > and implementation. Currently this layer is the same as the name node’s. > Opening this ticket to discuss, design and implement this feature. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15663) NetworkBinding has a runtime class dependency on a third-party shaded class
[ https://issues.apache.org/jira/browse/HDFS-15663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223889#comment-17223889 ] Wei-Chiu Chuang commented on HDFS-15663: [~ste...@apache.org] any ideas? > NetworkBinding has a runtime class dependency on a third-party shaded class > --- > > Key: HDFS-15663 > URL: https://issues.apache.org/jira/browse/HDFS-15663 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Chris Wensel >Priority: Major > > The hadoop-aws library has a dependency on > 'com.amazonaws':aws-java-sdk-bundle' which in turn is a fat jar of all AWS > SDK libraries and shaded dependencies. > > This dependency is 181MB. > > Some applications using the S3AFilesystem may be sensitive to having a large > footprint. For example, building an application using Parquet and bundled > with Docker. > > Typically, in prior Hadoop versions, the bundle was replaced by the specific > AWS SDK dependencies, dropping the overall footprint. > > In 3.3 (and maybe prior versions) this strategy does not work because of the > following exception: > {{java.lang.NoClassDefFoundError: > com/amazonaws/thirdparty/apache/http/conn/socket/ConnectionSocketFactory}} > {{ at > org.apache.hadoop.fs.s3a.S3AUtils.initProtocolSettings(S3AUtils.java:1335)}} > {{ at > org.apache.hadoop.fs.s3a.S3AUtils.initConnectionSettings(S3AUtils.java:1290)}} > {{ at org.apache.hadoop.fs.s3a.S3AUtils.createAwsConf(S3AUtils.java:1247)}} > {{ at > org.apache.hadoop.fs.s3a.DefaultS3ClientFactory.createS3Client(DefaultS3ClientFactory.java:61)}} > {{ at > org.apache.hadoop.fs.s3a.S3AFileSystem.bindAWSClient(S3AFileSystem.java:644)}} > {{ at > org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:390)}} > {{ at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414)}} > {{ at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158)}} > {{ at > org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474)}} > {{ at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442)}} > {{ at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)}} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15663) NetworkBinding has a runtime class dependency on a third-party shaded class
Chris Wensel created HDFS-15663: --- Summary: NetworkBinding has a runtime class dependency on a third-party shaded class Key: HDFS-15663 URL: https://issues.apache.org/jira/browse/HDFS-15663 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.3.0 Reporter: Chris Wensel The hadoop-aws library has a dependency on 'com.amazonaws':aws-java-sdk-bundle' which in turn is a fat jar of all AWS SDK libraries and shaded dependencies. This dependency is 181MB. Some applications using the S3AFilesystem may be sensitive to having a large footprint. For example, building an application using Parquet and bundled with Docker. Typically, in prior Hadoop versions, the bundle was replaced by the specific AWS SDK dependencies, dropping the overall footprint. In 3.3 (and maybe prior versions) this strategy does not work because of the following exception: {{java.lang.NoClassDefFoundError: com/amazonaws/thirdparty/apache/http/conn/socket/ConnectionSocketFactory}} {{ at org.apache.hadoop.fs.s3a.S3AUtils.initProtocolSettings(S3AUtils.java:1335)}} {{ at org.apache.hadoop.fs.s3a.S3AUtils.initConnectionSettings(S3AUtils.java:1290)}} {{ at org.apache.hadoop.fs.s3a.S3AUtils.createAwsConf(S3AUtils.java:1247)}} {{ at org.apache.hadoop.fs.s3a.DefaultS3ClientFactory.createS3Client(DefaultS3ClientFactory.java:61)}} {{ at org.apache.hadoop.fs.s3a.S3AFileSystem.bindAWSClient(S3AFileSystem.java:644)}} {{ at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:390)}} {{ at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414)}} {{ at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158)}} {{ at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474)}} {{ at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442)}} {{ at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
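One way to see, before S3AFileSystem.initialize fails, whether the classpath actually carries the shaded httpclient class named in the stack trace is to probe for it directly. This is only an illustrative check, assuming the class name from the trace is the one NetworkBinding/S3AUtils needs at runtime; it is not part of hadoop-aws.

{code:java}
/**
 * Minimal probe for the shaded class referenced by the NoClassDefFoundError
 * above. Purely illustrative; not part of hadoop-aws.
 */
public class ShadedHttpClientCheck {
  public static void main(String[] args) {
    String shaded =
        "com.amazonaws.thirdparty.apache.http.conn.socket.ConnectionSocketFactory";
    try {
      Class.forName(shaded);
      System.out.println("Shaded httpclient found: aws-java-sdk-bundle is on the classpath.");
    } catch (ClassNotFoundException e) {
      System.out.println("Shaded httpclient missing: replacing the bundle with "
          + "individual AWS SDK jars will fail in S3AUtils.initProtocolSettings.");
    }
  }
}
{code}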
[jira] [Commented] (HDFS-15659) Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-15659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223829#comment-17223829 ] Ayush Saxena commented on HDFS-15659: - In general you can't even revert a released Jira, but if we find a way that appears more promising at a later stage, we can change the old code in a new Jira (provided the folks who did the earlier work don't object). Here too, we can sort things out in one or maybe two new Jiras rather than disturbing the resolved Jiras; I am absolutely fine going this way. Just one thing: we might need separate patches for the other branches, since a direct cherry-pick might not work (a little more work for the contributor), and we should check that we don't end up breaking any test that relies on this conf being turned on. If in the end Jenkins and everyone stays happy, we can happily go ahead with this. :) > Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster > --- > > Key: HDFS-15659 > URL: https://issues.apache.org/jira/browse/HDFS-15659 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Akira Ajisaka >Priority: Major > > dfs.namenode.redundancy.considerLoad is true by default and it is causing > many test failures. Let's disable it in MiniDFSCluster. > Originally reported by [~weichiu]: > https://github.com/apache/hadoop/pull/2410#pullrequestreview-51612 > {quote} > I've certainly seen this option causing test failures in the past. > Maybe we should turn it off by default in MiniDFSCluster, and only enable it > for specific tests. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
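For reference, the per-test workaround that this Jira proposes to make the MiniDFSCluster default looks roughly like the sketch below. It is a generic example, not a specific test from the patches under discussion.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

// Generic sketch: disable load-based block placement so that the handful of
// co-located datanodes in a MiniDFSCluster does not skew replica placement.
public class ConsiderLoadOffExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    conf.setBoolean("dfs.namenode.redundancy.considerLoad", false);
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
    try {
      cluster.waitActive();
      // ... run the actual test against cluster.getFileSystem() ...
    } finally {
      cluster.shutdown();
    }
  }
}
{code}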
[jira] [Commented] (HDFS-15643) EC: Fix checksum computation in case of native encoders
[ https://issues.apache.org/jira/browse/HDFS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223800#comment-17223800 ] Ayush Saxena commented on HDFS-15643: - Though the build failed, but the test passed. :P [https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/282/testReport/org.apache.hadoop.hdfs/TestFileChecksum/] > EC: Fix checksum computation in case of native encoders > --- > > Key: HDFS-15643 > URL: https://issues.apache.org/jira/browse/HDFS-15643 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Blocker > Labels: pull-request-available > Attachments: HDFS-15643-01.patch, Test-Fix-01.patch, > TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, > org.apache.hadoop.hdfs.TestFileChecksum-output.txt, > org.apache.hadoop.hdfs.TestFileChecksum.txt > > Time Spent: 3h 50m > Remaining Estimate: 0h > > There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases > {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The > following is a sample of the stack trace in two of them Query7 and Query8. > {code:bash} > org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail > to get block checksum for > LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK], > > DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK], > > DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK], > > DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK], > > DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK], > > DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; > indices=[1, 2, 3, 4, 5, 6, 7, 8]} > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916) > at > org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} > > {code:bash} > Error Message > `/striped/stripedFileChecksum1': Fail to get block checksum for > LocatedStripedBlock{BP-1299291876-172.17.0.3-1602771356932:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeIn
[jira] [Commented] (HDFS-15659) Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-15659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223766#comment-17223766 ] Ahmed Hussein commented on HDFS-15659: -- I agree. I believe I saw that flag in individual test cases (not just HDFS-9776 and HDFS-15461). Instead of reverting Jiras, I suggest we file a new Jira that refactors the usage of that flag in all the unit tests. This will reduce the work for others, because those Jiras were already cherry-picked into the 3.x branches. > Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster > --- > > Key: HDFS-15659 > URL: https://issues.apache.org/jira/browse/HDFS-15659 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Akira Ajisaka >Priority: Major > > dfs.namenode.redundancy.considerLoad is true by default and it is causing > many test failures. Let's disable it in MiniDFSCluster. > Originally reported by [~weichiu]: > https://github.com/apache/hadoop/pull/2410#pullrequestreview-51612 > {quote} > I've certainly seen this option causing test failures in the past. > Maybe we should turn it off by default in MiniDFSCluster, and only enable it > for specific tests. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15659) Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-15659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223766#comment-17223766 ] Ahmed Hussein edited comment on HDFS-15659 at 10/30/20, 5:01 PM: - I agree. I believe I saw that flag in individual test cases (not just HDFS-9776 and HDFS-15461). [~aajisaka] Instead of reverting Jiras, I suggest we file a new Jira that refactors the usage of that flag in all the unit tests. This will reduce the work for others, because those Jiras were already cherry-picked into the 3.x branches. was (Author: ahussein): I agree. I believe I saw that flag in individual test cases (not just HDFS-9776 and HDFS-15461). Instead of reverting Jiras, I suggest we file a new Jira that refactors the usage of that flag in all the unit tests. This will reduce the work for others, because those Jiras were already cherry-picked into the 3.x branches. > Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster > --- > > Key: HDFS-15659 > URL: https://issues.apache.org/jira/browse/HDFS-15659 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Akira Ajisaka >Priority: Major > > dfs.namenode.redundancy.considerLoad is true by default and it is causing > many test failures. Let's disable it in MiniDFSCluster. > Originally reported by [~weichiu]: > https://github.com/apache/hadoop/pull/2410#pullrequestreview-51612 > {quote} > I've certainly seen this option causing test failures in the past. > Maybe we should turn it off by default in MiniDFSCluster, and only enable it > for specific tests. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15643) EC: Fix checksum computation in case of native encoders
[ https://issues.apache.org/jira/browse/HDFS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223748#comment-17223748 ] Hadoop QA commented on HDFS-15643: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 56s{color} | | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 15s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 31s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 16s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 19s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 20m 48s{color} | | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 21s{color} | [/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/282/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt] | {color:red} hadoop-hdfs in trunk failed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 29s{color} | [/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/282/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt] | {color:red} hadoop-hdfs in trunk failed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10. {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 22m 10s{color} | | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 30s{color} | [/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/282/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs.txt] | {color:red} hadoop-hdfs in trunk failed. 
{color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 23s{color} | [/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/282/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt] | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 23s{color} | [/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/282/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt] | {color:red} hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 23s{color} | [/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/282/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt] | {color:red} hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red}
[jira] [Commented] (HDFS-15643) EC: Fix checksum computation in case of native encoders
[ https://issues.apache.org/jira/browse/HDFS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223724#comment-17223724 ] Ahmed Hussein commented on HDFS-15643: -- Once, this fix gets merged into trunk, I will close the existing PRs I created. Then open a new one that refactors that Junit test and improve the cleaning inside "@After" (HDFS-15648) > EC: Fix checksum computation in case of native encoders > --- > > Key: HDFS-15643 > URL: https://issues.apache.org/jira/browse/HDFS-15643 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Blocker > Labels: pull-request-available > Attachments: HDFS-15643-01.patch, Test-Fix-01.patch, > TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, > org.apache.hadoop.hdfs.TestFileChecksum-output.txt, > org.apache.hadoop.hdfs.TestFileChecksum.txt > > Time Spent: 3h 50m > Remaining Estimate: 0h > > There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases > {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The > following is a sample of the stack trace in two of them Query7 and Query8. > {code:bash} > org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail > to get block checksum for > LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK], > > DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK], > > DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK], > > DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK], > > DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK], > > DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; > indices=[1, 2, 3, 4, 5, 6, 7, 8]} > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916) > at > org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} > > {code:bash} > Error Message > `/striped/stripedFileChecksum1': Fail to get block checksum for > LocatedStripedBlock{BP-1299291876-172.17.0.3-1602771356932:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset
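The cleanup refactor mentioned in the comment above (HDFS-15648) would presumably look something like the sketch below. The cluster and client fields are hypothetical stand-ins for whatever the refactored test keeps around; this is not the actual change.

{code:java}
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.After;

// Hypothetical sketch of the kind of @After cleanup HDFS-15648 aims for:
// always tear the MiniDFSCluster down, even when an assertion fails mid-test.
// Field names are stand-ins, not the real TestFileChecksum members.
public abstract class ChecksumTestCleanup {
  protected MiniDFSCluster cluster;
  protected DFSClient client;

  @After
  public void tearDown() throws Exception {
    if (client != null) {
      client.close();          // release client resources first
      client = null;
    }
    if (cluster != null) {
      cluster.shutdown(true);  // also remove the cluster's data directories
      cluster = null;
    }
  }
}
{code}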
[jira] [Updated] (HDFS-15237) Get checksum of EC file failed, when some block is missing or corrupt
[ https://issues.apache.org/jira/browse/HDFS-15237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated HDFS-15237: - Priority: Blocker (was: Major) > Get checksum of EC file failed, when some block is missing or corrupt > - > > Key: HDFS-15237 > URL: https://issues.apache.org/jira/browse/HDFS-15237 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec, hdfs >Affects Versions: 3.2.1 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Blocker > Labels: pull-request-available > Fix For: 3.2.2 > > Attachments: HDFS-15237.001.patch > > Time Spent: 50m > Remaining Estimate: 0h > > When we distcp from an ec directory to another one, I found some error like > this. > {code} > 2020-03-20 20:18:21,366 WARN [main] > org.apache.hadoop.hdfs.FileChecksumHelper: src=/EC/6-3//000325_0, > datanodes[6]=DatanodeInfoWithStorage[10.200.128.40:9866,DS-65ac4407-9d33-4c59-8f72-dd1d80d26d9f,DISK]2020-03-20 > 20:18:21,366 WARN [main] org.apache.hadoop.hdfs.FileChecksumHelper: > src=/EC/6-3//000325_0, > datanodes[6]=DatanodeInfoWithStorage[10.200.128.40:9866,DS-65ac4407-9d33-4c59-8f72-dd1d80d26d9f,DISK]java.io.EOFException: > Unexpected EOF while trying to read response from server at > org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:550) > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.tryDatanode(FileChecksumHelper.java:709) > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlockGroup(FileChecksumHelper.java:664) > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:638) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1790) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1810) > at > org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1691) > at > org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1688) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1700) > at > org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:138) > at > org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:115) > at > org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87) > at > org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:259) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:220) at > org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:48) at > org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) at > org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799) at > org.apache.hadoop.mapred.MapTask.run(MapTask.java:347) at > org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:422) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) > {code} > And Then I found some error in datanode like this > {code} > 2020-03-20 20:54:16,573 INFO > 
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient: > SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = > false > 2020-03-20 20:54:16,577 ERROR > org.apache.hadoop.hdfs.server.datanode.DataNode: > bd-hadoop-128050.zeus.lianjia.com:9866:DataXceiver error processing > BLOCK_GROUP_CHECKSUM operation src: /10.201.1.38:33264 dst: > /10.200.128.50:9866 > java.lang.UnsupportedOperationException > at java.nio.ByteBuffer.array(ByteBuffer.java:994) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockChecksumReconstructor.reconstruct(StripedBlockChecksumReconstructor.java:90) > at > org.apache.hadoop.hdfs.server.datanode.BlockChecksumHelper$BlockGroupNonStripedChecksumComputer.recalculateChecksum(BlockChecksumHelper.java:711) > at > org.apache.hadoop.hdfs.server.datanode.BlockChecksumHelper$BlockGroupNonStripedChecksumComputer.compute(BlockChecksumHelper.java:489) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.blockGroupChecksum(DataXceiver.java:1047) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opStripedBlockChecksum(Receiver.java
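The java.lang.UnsupportedOperationException thrown by ByteBuffer.array() in the datanode log above is the usual symptom of a direct buffer, which is what native coders typically return, whereas heap buffers from the pure-Java coders expose a backing array. The sketch below shows the generic defensive pattern of branching on hasArray(); it is only an illustration, not the actual fix for this issue or HDFS-15643.

{code:java}
import java.nio.ByteBuffer;

// Generic illustration: safely extract bytes from a ByteBuffer that may be
// direct (no backing array) or heap-based. Not the actual HDFS patch.
public final class BufferBytes {
  private BufferBytes() {
  }

  public static byte[] toByteArray(ByteBuffer buf) {
    if (buf.hasArray()) {
      // Heap buffer: the backing array is accessible (mind the offset).
      byte[] arr = new byte[buf.remaining()];
      System.arraycopy(buf.array(), buf.arrayOffset() + buf.position(),
          arr, 0, buf.remaining());
      return arr;
    }
    // Direct buffer: array() would throw UnsupportedOperationException,
    // so copy through a duplicate instead.
    ByteBuffer dup = buf.duplicate();
    byte[] arr = new byte[dup.remaining()];
    dup.get(arr);
    return arr;
  }
}
{code}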
[jira] [Comment Edited] (HDFS-15643) EC: Fix checksum computation in case of native encoders
[ https://issues.apache.org/jira/browse/HDFS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223720#comment-17223720 ] Ahmed Hussein edited comment on HDFS-15643 at 10/30/20, 4:16 PM: - {quote}I tried the patch in https://github.com/apache/hadoop/pull/2421/files but it failed.{quote} I have a little bit bad feeling about that. I suspect there is something else somewhere that needs fixing. was (Author: ahussein): {quote}I tried the patch in https://github.com/apache/hadoop/pull/2421/files but it failed.{quote} I have a little bit bad feeling about that. I am suspicious there is something else somewhere that needs fixing. > EC: Fix checksum computation in case of native encoders > --- > > Key: HDFS-15643 > URL: https://issues.apache.org/jira/browse/HDFS-15643 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Blocker > Labels: pull-request-available > Attachments: HDFS-15643-01.patch, Test-Fix-01.patch, > TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, > org.apache.hadoop.hdfs.TestFileChecksum-output.txt, > org.apache.hadoop.hdfs.TestFileChecksum.txt > > Time Spent: 3h 50m > Remaining Estimate: 0h > > There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases > {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The > following is a sample of the stack trace in two of them Query7 and Query8. > {code:bash} > org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail > to get block checksum for > LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK], > > DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK], > > DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK], > > DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK], > > DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK], > > DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; > indices=[1, 2, 3, 4, 5, 6, 7, 8]} > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916) > at > org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295) > at > 
org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(Futu
[jira] [Commented] (HDFS-15643) EC: Fix checksum computation in case of native encoders
[ https://issues.apache.org/jira/browse/HDFS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223720#comment-17223720 ] Ahmed Hussein commented on HDFS-15643: -- {quote}I tried the patch in https://github.com/apache/hadoop/pull/2421/files but it failed.{quote} I have a little bit bad feeling about that. I am suspicious there is something else somewhere that needs fixing. > EC: Fix checksum computation in case of native encoders > --- > > Key: HDFS-15643 > URL: https://issues.apache.org/jira/browse/HDFS-15643 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Blocker > Labels: pull-request-available > Attachments: HDFS-15643-01.patch, Test-Fix-01.patch, > TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, > org.apache.hadoop.hdfs.TestFileChecksum-output.txt, > org.apache.hadoop.hdfs.TestFileChecksum.txt > > Time Spent: 3h 50m > Remaining Estimate: 0h > > There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases > {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The > following is a sample of the stack trace in two of them Query7 and Query8. > {code:bash} > org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail > to get block checksum for > LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK], > > DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK], > > DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK], > > DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK], > > DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK], > > DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; > indices=[1, 2, 3, 4, 5, 6, 7, 8]} > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916) > at > org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} > > {code:bash} > Error Message > `/striped/stripedFileChecksum1': Fail to get block checksum for > LocatedStripedBlock{BP-1299291876-172.17.0.3-1602771356932:blk_-9223372036854775792_1001; > getBlockSize()=37
[jira] [Commented] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop
[ https://issues.apache.org/jira/browse/HDFS-15624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223698#comment-17223698 ] Brahma Reddy Battula commented on HDFS-15624: - {quote}Wanted to check HDFS-15660 too, but now my namenode isn't starting now because of this, and It will take time to sort it. But I am still convinced it is different, so no need to hold this, But still I will respect opinions on how to go ahead. {quote} I think you can post on the mailing list; others may also be able to help you sort it out. {quote}Right now, PR is handling the backward compatibility related issues (due to change in StorageType order) and inclusion of new Storage policy, by bumping the LayoutVersion and adding check against NVDIMM releated operations to block during upgrade. {quote} I don't think bumping the NameNode layout version is the best solution; we need to look for another way (perhaps checking the client version during the upgrade). {quote}HDFS-15660 will be handled soon enough to solve issues of both PROVIDED and NVDIMM in a generic way. {quote} Yes, I also prefer the generic way. {quote}In any case, if not able to fix any of these by the release time (which I think we still have some time), then we can think of revert. {quote} With PROVIDED storage we have already gone through a couple of releases, and there are alternatives to avoid this. > Fix the SetQuotaByStorageTypeOp problem after updating hadoop > --- > > Key: HDFS-15624 > URL: https://issues.apache.org/jira/browse/HDFS-15624 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0 >Reporter: YaYun Wang >Priority: Major > Labels: pull-request-available, release-blocker > Time Spent: 6h > Remaining Estimate: 0h > > HDFS-15025 adds a new storage Type NVDIMM, changes the ordinal() of the enum > of StorageType. And, setting the quota by storageType depends on the > ordinal(), therefore, it may cause the setting of quota to be invalid after > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
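The incompatibility described in HDFS-15624 comes from persisting an enum ordinal() and then inserting a new constant into the enum. The sketch below only illustrates that mechanism; the enum names and their ordering are assumptions made for the example, not the actual Hadoop StorageType definition.

{code:java}
// Hypothetical pre-/post-upgrade enums: the point is that inserting a new
// constant shifts the ordinal() values of the constants that follow it.
public class OrdinalQuotaDemo {
  enum OldStorageType { RAM_DISK, SSD, DISK, ARCHIVE }           // assumed pre-upgrade order
  enum NewStorageType { RAM_DISK, NVDIMM, SSD, DISK, ARCHIVE }   // assumed order after inserting NVDIMM

  public static void main(String[] args) {
    // A quota recorded before the upgrade, keyed by ordinal as in the report above.
    int persistedOrdinal = OldStorageType.SSD.ordinal();         // 1
    // Replayed after the upgrade, the same ordinal resolves to a different type.
    NewStorageType resolved = NewStorageType.values()[persistedOrdinal];
    System.out.println(resolved);                                // prints NVDIMM, not SSD
  }
}
{code}

Persisting the enum name (and resolving it with valueOf) instead of the ordinal is one common way to avoid this class of incompatibility.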
[jira] [Updated] (HDFS-15643) EC: Fix checksum computation in case of native encoders
[ https://issues.apache.org/jira/browse/HDFS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-15643: Attachment: HDFS-15643-01.patch > EC: Fix checksum computation in case of native encoders > --- > > Key: HDFS-15643 > URL: https://issues.apache.org/jira/browse/HDFS-15643 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Blocker > Labels: pull-request-available > Attachments: HDFS-15643-01.patch, Test-Fix-01.patch, > TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, > org.apache.hadoop.hdfs.TestFileChecksum-output.txt, > org.apache.hadoop.hdfs.TestFileChecksum.txt > > Time Spent: 3h 50m > Remaining Estimate: 0h > > There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases > {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The > following is a sample of the stack trace in two of them Query7 and Query8. > {code:bash} > org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail > to get block checksum for > LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK], > > DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK], > > DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK], > > DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK], > > DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK], > > DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; > indices=[1, 2, 3, 4, 5, 6, 7, 8]} > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916) > at > org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} > > {code:bash} > Error Message > `/striped/stripedFileChecksum1': Fail to get block checksum for > LocatedStripedBlock{BP-1299291876-172.17.0.3-1602771356932:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:42217,DS-6c29e4b7-e4f1-4302-ad23-fb078f37d783,DISK], > > DatanodeInfoWithStorage[127.0.0.1:41307,DS-3d824f14-3cd0-46b1-bef1-caa808bf278d,DISK], > > DatanodeInf
[jira] [Commented] (HDFS-15662) Complete the javadoc of hadoop-federation-balance.
[ https://issues.apache.org/jira/browse/HDFS-15662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223645#comment-17223645 ] Hadoop QA commented on HDFS-15662: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 37s{color} | | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 44s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 45s{color} | | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 0m 41s{color} | | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 39s{color} | | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | | {color:green} the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} blanks {color} | {color:green} 0m 0s{color} | | {color:green} The patch has no blanks issues. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 14s{color} | [/results-checkstyle-hadoop-tools_hadoop-federation-balance.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/281/artifact/out/results-checkstyle-hadoop-tools_hadoop-federation-balance.txt] | {color:orange} hadoop-tools/hadoop-federation-balance: The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 48s{color} | | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | | {color:green} hadoop-tools_hadoop-federation-balance-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 generated 0 new + 0 unchanged - 39 fixed = 0 total (was 39) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | | {color:green} hadoop-tools_hadoop-federation-balance-jdkPrivateBuild-1.8.0_272-8u
[jira] [Updated] (HDFS-15662) Complete the javadoc of hadoop-federation-balance.
[ https://issues.apache.org/jira/browse/HDFS-15662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinglun updated HDFS-15662: --- Description: Complete the javadoc of hadoop-federation-balance. Remove the unused variable in TestTrashProcedure.java was:Complete the javadoc of hadoop-federation-balance. > Complete the javadoc of hadoop-federation-balance. > -- > > Key: HDFS-15662 > URL: https://issues.apache.org/jira/browse/HDFS-15662 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Minor > Attachments: HDFS-15662.001.patch > > > Complete the javadoc of hadoop-federation-balance. > Remove the unused variable in TestTrashProcedure.java -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15662) Complete the javadoc of hadoop-federation-balance.
[ https://issues.apache.org/jira/browse/HDFS-15662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinglun reassigned HDFS-15662: -- Assignee: Jinglun > Complete the javadoc of hadoop-federation-balance. > -- > > Key: HDFS-15662 > URL: https://issues.apache.org/jira/browse/HDFS-15662 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Minor > > Complete the javadoc of hadoop-federation-balance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15662) Complete the javadoc of hadoop-federation-balance.
[ https://issues.apache.org/jira/browse/HDFS-15662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinglun updated HDFS-15662: --- Attachment: HDFS-15662.001.patch Status: Patch Available (was: Open) > Complete the javadoc of hadoop-federation-balance. > -- > > Key: HDFS-15662 > URL: https://issues.apache.org/jira/browse/HDFS-15662 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Minor > Attachments: HDFS-15662.001.patch > > > Complete the javadoc of hadoop-federation-balance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15662) Complete the javadoc of hadoop-federation-balance.
Jinglun created HDFS-15662: -- Summary: Complete the javadoc of hadoop-federation-balance. Key: HDFS-15662 URL: https://issues.apache.org/jira/browse/HDFS-15662 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jinglun Complete the javadoc of hadoop-federation-balance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error
[ https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223584#comment-17223584 ] Toshihiko Uchida edited comment on HDFS-15240 at 10/30/20, 11:42 AM: - [~ferhui] The unit tests without the fix failed on trunk in my environment, as expected. Could you try [^test-HDFS-15240.006.patch], which is a patch containing only the unit tests of [^HDFS-15240.006.patch]? I uploaded the result and output: [^org.apache.hadoop.hdfs.TestReconstructStripedFile.txt] and [^org.apache.hadoop.hdfs.TestReconstructStripedFile-output.txt]. was (Author: touchida): @Hui Fei The unit tests without the fix failed on trunk in my environment, as expected. Could you try [^test-HDFS-15240.006.patch], which is a patch containing only the unit tests of [^HDFS-15240.006.patch]? I uploaded the result and output: [^org.apache.hadoop.hdfs.TestReconstructStripedFile.txt] and [^org.apache.hadoop.hdfs.TestReconstructStripedFile-output.txt]. > Erasure Coding: dirty buffer causes reconstruction block error > -- > > Key: HDFS-15240 > URL: https://issues.apache.org/jira/browse/HDFS-15240 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding >Reporter: HuangTao >Assignee: HuangTao >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15240.001.patch, HDFS-15240.002.patch, > HDFS-15240.003.patch, HDFS-15240.004.patch, HDFS-15240.005.patch, > HDFS-15240.006.patch, image-2020-07-16-15-56-38-608.png, > org.apache.hadoop.hdfs.TestReconstructStripedFile-output.txt, > org.apache.hadoop.hdfs.TestReconstructStripedFile.txt, > test-HDFS-15240.006.patch > > > When read some lzo files we found some blocks were broken. > I read back all internal blocks(b0-b8) of the block group(RS-6-3-1024k) from > DN directly, and choose 6(b0-b5) blocks to decode other 3(b6', b7', b8') > blocks. And find the longest common sequenece(LCS) between b6'(decoded) and > b6(read from DN)(b7'/b7 and b8'/b8). > After selecting 6 blocks of the block group in combinations one time and > iterating through all cases, I find one case that the length of LCS is the > block length - 64KB, 64KB is just the length of ByteBuffer used by > StripedBlockReader. So the corrupt reconstruction block is made by a dirty > buffer. > The following log snippet(only show 2 of 28 cases) is my check program > output. In my case, I known the 3th block is corrupt, so need other 5 blocks > to decode another 3 blocks, then find the 1th block's LCS substring is block > length - 64kb. > It means (0,1,2,4,5,6)th blocks were used to reconstruct 3th block, and the > dirty buffer was used before read the 1th block. > Must be noted that StripedBlockReader read from the offset 0 of the 1th block > after used the dirty buffer. > EDITED for readability. 
> {code:java} > decode from block[0, 2, 3, 4, 5, 7] to generate block[1', 6', 8'] > Check the first 131072 bytes between block[1] and block[1'], the longest > common substring length is 4 > Check the first 131072 bytes between block[6] and block[6'], the longest > common substring length is 4 > Check the first 131072 bytes between block[8] and block[8'], the longest > common substring length is 4 > decode from block[0, 2, 3, 4, 5, 6] to generate block[1', 7', 8'] > Check the first 131072 bytes between block[1] and block[1'], the longest > common substring length is 65536 > CHECK AGAIN: all 27262976 bytes between block[1] and block[1'], the longest > common substring length is 27197440 # this one > Check the first 131072 bytes between block[7] and block[7'], the longest > common substring length is 4 > Check the first 131072 bytes between block[8] and block[8'], the longest > common substring length is 4{code} > Now I know the dirty buffer causes reconstruction block error, but how does > the dirty buffer come about? > After digging into the code and DN log, I found this following DN log is the > root reason. > {code:java} > [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel > java.nio.channels.SocketChannel[connected local=/:52586 > remote=/:50010]. 18 millis timeout left. > [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped > block: BP-714356632--1519726836856:blk_-YY_3472979393 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269) > at > org.apache.hadoop.hdfs
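The longest-common-substring comparison described in the HDFS-15240 report (decoded block vs. block read back from the DataNode) can be sketched as below. This is a generic illustration, not the author's actual check program; the quadratic dynamic program is only practical for small inputs such as the 131072-byte prefixes mentioned in the log output above.

{code:java}
public class LcsCheck {
  /** Length of the longest common (contiguous) substring of a and b. */
  static int longestCommonSubstring(byte[] a, byte[] b) {
    int best = 0;
    int[] prev = new int[b.length + 1];
    int[] curr = new int[b.length + 1];
    for (int i = 1; i <= a.length; i++) {
      for (int j = 1; j <= b.length; j++) {
        if (a[i - 1] == b[j - 1]) {
          curr[j] = prev[j - 1] + 1;                // extend the matching run ending at (i, j)
          best = Math.max(best, curr[j]);
        } else {
          curr[j] = 0;                              // a common substring must be contiguous
        }
      }
      int[] tmp = prev; prev = curr; curr = tmp;    // reuse the two rows, O(|b|) memory
    }
    return best;
  }

  public static void main(String[] args) {
    byte[] readFromDataNode = "abcdefgh".getBytes();
    byte[] decoded = "abcdXfgh".getBytes();         // one corrupted byte in the middle
    System.out.println(longestCommonSubstring(readFromDataNode, decoded)); // prints 4 ("abcd")
  }
}
{code}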
[jira] [Commented] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error
[ https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223584#comment-17223584 ] Toshihiko Uchida commented on HDFS-15240: - @Hui Fei The unit tests without the fix failed on trunk in my environment, as expected. Could you try [^test-HDFS-15240.006.patch], which is a patch containing only the unit tests of [^HDFS-15240.006.patch]? I uploaded the result and output: [^org.apache.hadoop.hdfs.TestReconstructStripedFile.txt] and [^org.apache.hadoop.hdfs.TestReconstructStripedFile-output.txt]. > Erasure Coding: dirty buffer causes reconstruction block error > -- > > Key: HDFS-15240 > URL: https://issues.apache.org/jira/browse/HDFS-15240 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding >Reporter: HuangTao >Assignee: HuangTao >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15240.001.patch, HDFS-15240.002.patch, > HDFS-15240.003.patch, HDFS-15240.004.patch, HDFS-15240.005.patch, > HDFS-15240.006.patch, image-2020-07-16-15-56-38-608.png, > org.apache.hadoop.hdfs.TestReconstructStripedFile-output.txt, > org.apache.hadoop.hdfs.TestReconstructStripedFile.txt, > test-HDFS-15240.006.patch > > > When read some lzo files we found some blocks were broken. > I read back all internal blocks(b0-b8) of the block group(RS-6-3-1024k) from > DN directly, and choose 6(b0-b5) blocks to decode other 3(b6', b7', b8') > blocks. And find the longest common sequenece(LCS) between b6'(decoded) and > b6(read from DN)(b7'/b7 and b8'/b8). > After selecting 6 blocks of the block group in combinations one time and > iterating through all cases, I find one case that the length of LCS is the > block length - 64KB, 64KB is just the length of ByteBuffer used by > StripedBlockReader. So the corrupt reconstruction block is made by a dirty > buffer. > The following log snippet(only show 2 of 28 cases) is my check program > output. In my case, I known the 3th block is corrupt, so need other 5 blocks > to decode another 3 blocks, then find the 1th block's LCS substring is block > length - 64kb. > It means (0,1,2,4,5,6)th blocks were used to reconstruct 3th block, and the > dirty buffer was used before read the 1th block. > Must be noted that StripedBlockReader read from the offset 0 of the 1th block > after used the dirty buffer. > EDITED for readability. > {code:java} > decode from block[0, 2, 3, 4, 5, 7] to generate block[1', 6', 8'] > Check the first 131072 bytes between block[1] and block[1'], the longest > common substring length is 4 > Check the first 131072 bytes between block[6] and block[6'], the longest > common substring length is 4 > Check the first 131072 bytes between block[8] and block[8'], the longest > common substring length is 4 > decode from block[0, 2, 3, 4, 5, 6] to generate block[1', 7', 8'] > Check the first 131072 bytes between block[1] and block[1'], the longest > common substring length is 65536 > CHECK AGAIN: all 27262976 bytes between block[1] and block[1'], the longest > common substring length is 27197440 # this one > Check the first 131072 bytes between block[7] and block[7'], the longest > common substring length is 4 > Check the first 131072 bytes between block[8] and block[8'], the longest > common substring length is 4{code} > Now I know the dirty buffer causes reconstruction block error, but how does > the dirty buffer come about? > After digging into the code and DN log, I found this following DN log is the > root reason. 
> {code:java} > [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel > java.nio.channels.SocketChannel[connected local=/:52586 > remote=/:50010]. 18 millis timeout left. > [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped > block: BP-714356632--1519726836856:blk_-YY_3472979393 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolEx
[jira] [Updated] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error
[ https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Toshihiko Uchida updated HDFS-15240: Attachment: org.apache.hadoop.hdfs.TestReconstructStripedFile-output.txt > Erasure Coding: dirty buffer causes reconstruction block error > -- > > Key: HDFS-15240 > URL: https://issues.apache.org/jira/browse/HDFS-15240 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding >Reporter: HuangTao >Assignee: HuangTao >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15240.001.patch, HDFS-15240.002.patch, > HDFS-15240.003.patch, HDFS-15240.004.patch, HDFS-15240.005.patch, > HDFS-15240.006.patch, image-2020-07-16-15-56-38-608.png, > org.apache.hadoop.hdfs.TestReconstructStripedFile-output.txt, > org.apache.hadoop.hdfs.TestReconstructStripedFile.txt, > test-HDFS-15240.006.patch > > > When read some lzo files we found some blocks were broken. > I read back all internal blocks(b0-b8) of the block group(RS-6-3-1024k) from > DN directly, and choose 6(b0-b5) blocks to decode other 3(b6', b7', b8') > blocks. And find the longest common sequenece(LCS) between b6'(decoded) and > b6(read from DN)(b7'/b7 and b8'/b8). > After selecting 6 blocks of the block group in combinations one time and > iterating through all cases, I find one case that the length of LCS is the > block length - 64KB, 64KB is just the length of ByteBuffer used by > StripedBlockReader. So the corrupt reconstruction block is made by a dirty > buffer. > The following log snippet(only show 2 of 28 cases) is my check program > output. In my case, I known the 3th block is corrupt, so need other 5 blocks > to decode another 3 blocks, then find the 1th block's LCS substring is block > length - 64kb. > It means (0,1,2,4,5,6)th blocks were used to reconstruct 3th block, and the > dirty buffer was used before read the 1th block. > Must be noted that StripedBlockReader read from the offset 0 of the 1th block > after used the dirty buffer. > EDITED for readability. > {code:java} > decode from block[0, 2, 3, 4, 5, 7] to generate block[1', 6', 8'] > Check the first 131072 bytes between block[1] and block[1'], the longest > common substring length is 4 > Check the first 131072 bytes between block[6] and block[6'], the longest > common substring length is 4 > Check the first 131072 bytes between block[8] and block[8'], the longest > common substring length is 4 > decode from block[0, 2, 3, 4, 5, 6] to generate block[1', 7', 8'] > Check the first 131072 bytes between block[1] and block[1'], the longest > common substring length is 65536 > CHECK AGAIN: all 27262976 bytes between block[1] and block[1'], the longest > common substring length is 27197440 # this one > Check the first 131072 bytes between block[7] and block[7'], the longest > common substring length is 4 > Check the first 131072 bytes between block[8] and block[8'], the longest > common substring length is 4{code} > Now I know the dirty buffer causes reconstruction block error, but how does > the dirty buffer come about? > After digging into the code and DN log, I found this following DN log is the > root reason. > {code:java} > [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel > java.nio.channels.SocketChannel[connected local=/:52586 > remote=/:50010]. 18 millis timeout left. 
> [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped > block: BP-714356632--1519726836856:blk_-YY_3472979393 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) {code} > Reading from DN may timeout(hold by a future(F)) and output the INFO log, but > the futures that contains the future(F) is cleared, > {c
[jira] [Updated] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error
[ https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Toshihiko Uchida updated HDFS-15240: Attachment: org.apache.hadoop.hdfs.TestReconstructStripedFile.txt > Erasure Coding: dirty buffer causes reconstruction block error > -- > > Key: HDFS-15240 > URL: https://issues.apache.org/jira/browse/HDFS-15240 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding >Reporter: HuangTao >Assignee: HuangTao >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15240.001.patch, HDFS-15240.002.patch, > HDFS-15240.003.patch, HDFS-15240.004.patch, HDFS-15240.005.patch, > HDFS-15240.006.patch, image-2020-07-16-15-56-38-608.png, > org.apache.hadoop.hdfs.TestReconstructStripedFile-output.txt, > org.apache.hadoop.hdfs.TestReconstructStripedFile.txt, > test-HDFS-15240.006.patch > > > When read some lzo files we found some blocks were broken. > I read back all internal blocks(b0-b8) of the block group(RS-6-3-1024k) from > DN directly, and choose 6(b0-b5) blocks to decode other 3(b6', b7', b8') > blocks. And find the longest common sequenece(LCS) between b6'(decoded) and > b6(read from DN)(b7'/b7 and b8'/b8). > After selecting 6 blocks of the block group in combinations one time and > iterating through all cases, I find one case that the length of LCS is the > block length - 64KB, 64KB is just the length of ByteBuffer used by > StripedBlockReader. So the corrupt reconstruction block is made by a dirty > buffer. > The following log snippet(only show 2 of 28 cases) is my check program > output. In my case, I known the 3th block is corrupt, so need other 5 blocks > to decode another 3 blocks, then find the 1th block's LCS substring is block > length - 64kb. > It means (0,1,2,4,5,6)th blocks were used to reconstruct 3th block, and the > dirty buffer was used before read the 1th block. > Must be noted that StripedBlockReader read from the offset 0 of the 1th block > after used the dirty buffer. > EDITED for readability. > {code:java} > decode from block[0, 2, 3, 4, 5, 7] to generate block[1', 6', 8'] > Check the first 131072 bytes between block[1] and block[1'], the longest > common substring length is 4 > Check the first 131072 bytes between block[6] and block[6'], the longest > common substring length is 4 > Check the first 131072 bytes between block[8] and block[8'], the longest > common substring length is 4 > decode from block[0, 2, 3, 4, 5, 6] to generate block[1', 7', 8'] > Check the first 131072 bytes between block[1] and block[1'], the longest > common substring length is 65536 > CHECK AGAIN: all 27262976 bytes between block[1] and block[1'], the longest > common substring length is 27197440 # this one > Check the first 131072 bytes between block[7] and block[7'], the longest > common substring length is 4 > Check the first 131072 bytes between block[8] and block[8'], the longest > common substring length is 4{code} > Now I know the dirty buffer causes reconstruction block error, but how does > the dirty buffer come about? > After digging into the code and DN log, I found this following DN log is the > root reason. > {code:java} > [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel > java.nio.channels.SocketChannel[connected local=/:52586 > remote=/:50010]. 18 millis timeout left. 
> [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped > block: BP-714356632--1519726836856:blk_-YY_3472979393 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) {code} > Reading from DN may timeout(hold by a future(F)) and output the INFO log, but > the futures that contains the future(F) is cleared, > {code:jav
[jira] [Updated] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error
[ https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Toshihiko Uchida updated HDFS-15240: Attachment: test-HDFS-15240.006.patch > Erasure Coding: dirty buffer causes reconstruction block error > -- > > Key: HDFS-15240 > URL: https://issues.apache.org/jira/browse/HDFS-15240 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding >Reporter: HuangTao >Assignee: HuangTao >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15240.001.patch, HDFS-15240.002.patch, > HDFS-15240.003.patch, HDFS-15240.004.patch, HDFS-15240.005.patch, > HDFS-15240.006.patch, image-2020-07-16-15-56-38-608.png, > test-HDFS-15240.006.patch > > > When read some lzo files we found some blocks were broken. > I read back all internal blocks(b0-b8) of the block group(RS-6-3-1024k) from > DN directly, and choose 6(b0-b5) blocks to decode other 3(b6', b7', b8') > blocks. And find the longest common sequenece(LCS) between b6'(decoded) and > b6(read from DN)(b7'/b7 and b8'/b8). > After selecting 6 blocks of the block group in combinations one time and > iterating through all cases, I find one case that the length of LCS is the > block length - 64KB, 64KB is just the length of ByteBuffer used by > StripedBlockReader. So the corrupt reconstruction block is made by a dirty > buffer. > The following log snippet(only show 2 of 28 cases) is my check program > output. In my case, I known the 3th block is corrupt, so need other 5 blocks > to decode another 3 blocks, then find the 1th block's LCS substring is block > length - 64kb. > It means (0,1,2,4,5,6)th blocks were used to reconstruct 3th block, and the > dirty buffer was used before read the 1th block. > Must be noted that StripedBlockReader read from the offset 0 of the 1th block > after used the dirty buffer. > EDITED for readability. > {code:java} > decode from block[0, 2, 3, 4, 5, 7] to generate block[1', 6', 8'] > Check the first 131072 bytes between block[1] and block[1'], the longest > common substring length is 4 > Check the first 131072 bytes between block[6] and block[6'], the longest > common substring length is 4 > Check the first 131072 bytes between block[8] and block[8'], the longest > common substring length is 4 > decode from block[0, 2, 3, 4, 5, 6] to generate block[1', 7', 8'] > Check the first 131072 bytes between block[1] and block[1'], the longest > common substring length is 65536 > CHECK AGAIN: all 27262976 bytes between block[1] and block[1'], the longest > common substring length is 27197440 # this one > Check the first 131072 bytes between block[7] and block[7'], the longest > common substring length is 4 > Check the first 131072 bytes between block[8] and block[8'], the longest > common substring length is 4{code} > Now I know the dirty buffer causes reconstruction block error, but how does > the dirty buffer come about? > After digging into the code and DN log, I found this following DN log is the > root reason. > {code:java} > [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel > java.nio.channels.SocketChannel[connected local=/:52586 > remote=/:50010]. 18 millis timeout left. 
> [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped > block: BP-714356632--1519726836856:blk_-YY_3472979393 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) {code} > Reading from DN may timeout(hold by a future(F)) and output the INFO log, but > the futures that contains the future(F) is cleared, > {code:java} > return new StripingChunkReadResult(futures.remove(future), > StripingChunkReadResult.CANCELLED); {code} > futures.remove(future) cause NPE. So
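The NullPointerException path quoted in the HDFS-15240 description, where futures.remove(future) returns null because the map of pending reads was already cleared after a timeout, can be reproduced with a minimal, self-contained sketch. The class and field names below are hypothetical stand-ins for the pattern described, not the HDFS classes themselves.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LateFutureDemo {
  // Stand-in for a result object whose constructor takes the chunk index of the future.
  static class ChunkReadResult {
    final int index;                                 // unboxing a null Integer here throws NPE
    ChunkReadResult(Integer index) { this.index = index; }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    Map<Future<Void>, Integer> futures = new HashMap<>();

    Future<Void> f = pool.submit(() -> { Thread.sleep(50); return null; });
    futures.put(f, 3);                               // track the pending read and its chunk index

    futures.clear();                                 // a timeout round already gave up on this read

    try {
      f.get();                                       // the late read still completes
      new ChunkReadResult(futures.remove(f));        // remove() returns null -> NPE when unboxing
    } catch (NullPointerException e) {
      System.out.println("hit the NPE pattern described above: " + e);
    } finally {
      pool.shutdown();
    }
  }
}
{code}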
[jira] [Updated] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop
[ https://issues.apache.org/jira/browse/HDFS-15624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-15624: Labels: pull-request-available release-blocker (was: pull-request-available) > Fix the SetQuotaByStorageTypeOp problem after updating hadoop > --- > > Key: HDFS-15624 > URL: https://issues.apache.org/jira/browse/HDFS-15624 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0 >Reporter: YaYun Wang >Priority: Major > Labels: pull-request-available, release-blocker > Time Spent: 6h > Remaining Estimate: 0h > > HDFS-15025 adds a new storage Type NVDIMM, changes the ordinal() of the enum > of StorageType. And, setting the quota by storageType depends on the > ordinal(), therefore, it may cause the setting of quota to be invalid after > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop
[ https://issues.apache.org/jira/browse/HDFS-15624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-15624: Affects Version/s: 3.4.0 > Fix the SetQuotaByStorageTypeOp problem after updating hadoop > --- > > Key: HDFS-15624 > URL: https://issues.apache.org/jira/browse/HDFS-15624 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0 >Reporter: YaYun Wang >Priority: Major > Labels: pull-request-available > Time Spent: 6h > Remaining Estimate: 0h > > HDFS-15025 adds a new storage Type NVDIMM, changes the ordinal() of the enum > of StorageType. And, setting the quota by storageType depends on the > ordinal(), therefore, it may cause the setting of quota to be invalid after > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15643) EC: Fix checksum computation in case of native encoders
[ https://issues.apache.org/jira/browse/HDFS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223575#comment-17223575 ] Ayush Saxena commented on HDFS-15643: - Thanx [~aajisaka], I made the change to reduce one copy operation as suggested by you here : https://github.com/apache/hadoop/pull/2424 > EC: Fix checksum computation in case of native encoders > --- > > Key: HDFS-15643 > URL: https://issues.apache.org/jira/browse/HDFS-15643 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Blocker > Labels: pull-request-available > Attachments: Test-Fix-01.patch, > TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, > org.apache.hadoop.hdfs.TestFileChecksum-output.txt, > org.apache.hadoop.hdfs.TestFileChecksum.txt > > Time Spent: 3h 50m > Remaining Estimate: 0h > > There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases > {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The > following is a sample of the stack trace in two of them Query7 and Query8. > {code:bash} > org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail > to get block checksum for > LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK], > > DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK], > > DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK], > > DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK], > > DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK], > > DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; > indices=[1, 2, 3, 4, 5, 6, 7, 8]} > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916) > at > org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} > > {code:bash} > Error Message > `/striped/stripedFileChecksum1': Fail to get block checksum for > LocatedStripedBlock{BP-1299291876-172.17.0.3-1602771356932:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:42217,DS-6c29e4b7-e4f1-4
[jira] [Updated] (HDFS-15643) EC: Fix checksum computation in case of native encoders
[ https://issues.apache.org/jira/browse/HDFS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-15643: Summary: EC: Fix checksum computation in case of native encoders (was: TestFileChecksumCompositeCrc fails intermittently) > EC: Fix checksum computation in case of native encoders > --- > > Key: HDFS-15643 > URL: https://issues.apache.org/jira/browse/HDFS-15643 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Blocker > Labels: pull-request-available > Attachments: Test-Fix-01.patch, > TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, > org.apache.hadoop.hdfs.TestFileChecksum-output.txt, > org.apache.hadoop.hdfs.TestFileChecksum.txt > > Time Spent: 3h 50m > Remaining Estimate: 0h > > There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases > {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The > following is a sample of the stack trace in two of them Query7 and Query8. > {code:bash} > org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail > to get block checksum for > LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK], > > DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK], > > DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK], > > DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK], > > DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK], > > DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; > indices=[1, 2, 3, 4, 5, 6, 7, 8]} > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916) > at > org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} > > {code:bash} > Error Message > `/striped/stripedFileChecksum1': Fail to get block checksum for > LocatedStripedBlock{BP-1299291876-172.17.0.3-1602771356932:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:42217,DS-6c29e4b7-e4f1-4302-ad23-fb078f37d783,DISK], > > DatanodeInfoWithStorage[127.0.0.1
[jira] [Commented] (HDFS-15643) EC: Fix checksum computation in case of native encoders
[ https://issues.apache.org/jira/browse/HDFS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223567#comment-17223567 ] Akira Ajisaka commented on HDFS-15643: -- [~touchida] told me HDFS-15237 is related to this issue. Link HDFS-15237 > EC: Fix checksum computation in case of native encoders > --- > > Key: HDFS-15643 > URL: https://issues.apache.org/jira/browse/HDFS-15643 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Blocker > Labels: pull-request-available > Attachments: Test-Fix-01.patch, > TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, > org.apache.hadoop.hdfs.TestFileChecksum-output.txt, > org.apache.hadoop.hdfs.TestFileChecksum.txt > > Time Spent: 3h 50m > Remaining Estimate: 0h > > There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases > {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The > following is a sample of the stack trace in two of them Query7 and Query8. > {code:bash} > org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail > to get block checksum for > LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK], > > DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK], > > DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK], > > DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK], > > DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK], > > DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; > indices=[1, 2, 3, 4, 5, 6, 7, 8]} > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916) > at > org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} > > {code:bash} > Error Message > `/striped/stripedFileChecksum1': Fail to get block checksum for > LocatedStripedBlock{BP-1299291876-172.17.0.3-1602771356932:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:42217,DS-6c29e4b7-e4f1-4302-ad23-fb078f37d783,DISK], > > DatanodeInfoWithStorage[127.
[jira] [Work logged] (HDFS-15548) Allow configuring DISK/ARCHIVE storage types on same device mount
[ https://issues.apache.org/jira/browse/HDFS-15548?focusedWorklogId=506630&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506630 ] ASF GitHub Bot logged work on HDFS-15548: - Author: ASF GitHub Bot Created on: 30/Oct/20 10:37 Start Date: 30/Oct/20 10:37 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #2288: URL: https://github.com/apache/hadoop/pull/2288#issuecomment-719476635 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 28m 53s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | buf | 0m 0s | | buf was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | | 0m 0s | [test4tests](test4tests) | The patch appears to include 3 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 54s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 21m 26s | | trunk passed | | +1 :green_heart: | compile | 4m 10s | | trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 | | +1 :green_heart: | compile | 3m 48s | | trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 | | +1 :green_heart: | checkstyle | 1m 14s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 17s | | trunk passed | | +1 :green_heart: | shadedclient | 18m 20s | | branch has no errors when building and testing our client artifacts. | | +1 :green_heart: | javadoc | 1m 37s | | trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 | | +1 :green_heart: | javadoc | 2m 0s | | trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 | | +0 :ok: | spotbugs | 3m 5s | | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 :green_heart: | findbugs | 5m 32s | | trunk passed | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 27s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 2s | | the patch passed | | +1 :green_heart: | compile | 4m 5s | | the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 | | +1 :green_heart: | cc | 4m 5s | | the patch passed | | +1 :green_heart: | javac | 4m 5s | | the patch passed | | +1 :green_heart: | compile | 3m 40s | | the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 | | +1 :green_heart: | cc | 3m 40s | | the patch passed | | +1 :green_heart: | javac | 3m 40s | | the patch passed | | -0 :warning: | checkstyle | 1m 5s | [/diff-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2288/11/artifact/out/diff-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 699 unchanged - 0 fixed = 700 total (was 699) | | +1 :green_heart: | mvnsite | 2m 1s | | the patch passed | | +1 :green_heart: | whitespace | 0m 0s | | The patch has no whitespace issues. | | +1 :green_heart: | xml | 0m 1s | | The patch has no ill-formed XML file. | | +1 :green_heart: | shadedclient | 14m 21s | | patch has no errors when building and testing our client artifacts. 
| | -1 :x: | javadoc | 0m 49s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2288/11/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt) | hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1. | | +1 :green_heart: | javadoc | 1m 55s | | the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 | | +1 :green_heart: | findbugs | 5m 37s | | the patch passed | _ Other Tests _ | | +1 :green_heart: | unit | 2m 20s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 97m 52s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2288/11/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 41s | | The patch does not generate ASF License warnings. | | | | 242m 29s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.TestGetFileChecksum | | | hadoop.hdfs.TestFileChecksumComposi
[jira] [Commented] (HDFS-15651) Client could not obtain block when DN CommandProcessingThread exit
[ https://issues.apache.org/jira/browse/HDFS-15651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223549#comment-17223549 ] Hadoop QA commented on HDFS-15651: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 21s{color} | | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 2s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 22s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 20s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 18s{color} | | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 21s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 3m 11s{color} | | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 8s{color} | | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 13s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s{color} | | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 13s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 6s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} blanks {color} | {color:green} 0m 0s{color} | | {color:green} The patch has no blanks issues. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 41s{color} | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/278/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt] | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 52 unchanged - 0 fixed = 53 total (was 52) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 46s{color} | | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 18s{color} | | {color:green} t
[jira] [Commented] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223534#comment-17223534 ] Akira Ajisaka commented on HDFS-15643: -- {quote}But might be we can go with the first approach, seems little safe, in case of EC if the index gets mixed up in any way, the data gets corrupted. {quote} Agreed. Let's refactor in a separate jira. > TestFileChecksumCompositeCrc fails intermittently > - > > Key: HDFS-15643 > URL: https://issues.apache.org/jira/browse/HDFS-15643 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Blocker > Labels: pull-request-available > Attachments: Test-Fix-01.patch, > TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, > org.apache.hadoop.hdfs.TestFileChecksum-output.txt, > org.apache.hadoop.hdfs.TestFileChecksum.txt > > Time Spent: 3h 50m > Remaining Estimate: 0h > > There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases > {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The > following is a sample of the stack trace in two of them Query7 and Query8. > {code:bash} > org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail > to get block checksum for > LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK], > > DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK], > > DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK], > > DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK], > > DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK], > > DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; > indices=[1, 2, 3, 4, 5, 6, 7, 8]} > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916) > at > org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} > > {code:bash} > Error Message > `/striped/stripedFileChecksum1': Fail to get block checksum for > LocatedStripedBlock{BP-1299291876-172.17.0.3-1602771356932:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[Da
[jira] [Updated] (HDFS-15641) DataNode could meet deadlock if invoke refreshNameNode
[ https://issues.apache.org/jira/browse/HDFS-15641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He updated HDFS-15641: --- Fix Version/s: 3.2.2 cherry-pick to branch-3.2.2 and verify at local, Thanks [~wanghongbing] and [~ferhui]. > DataNode could meet deadlock if invoke refreshNameNode > -- > > Key: HDFS-15641 > URL: https://issues.apache.org/jira/browse/HDFS-15641 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Hongbing Wang >Assignee: Hongbing Wang >Priority: Critical > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Attachments: HDFS-15641.001.patch, HDFS-15641.002.patch, > HDFS-15641.003.patch, deadlock.png, deadlock_fixed.png, jstack.log > > > DataNode could meet deadlock when invoke `hdfs dfsadmin -refreshNamenodes > hostname:50020` to register a new namespace in federation env. > The jstack is shown in jstack.log > The specific process is shown in Figure deadlock.png -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9776) TestHAAppend#testMultipleAppendsDuringCatchupTailing is flaky
[ https://issues.apache.org/jira/browse/HDFS-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He updated HDFS-9776: -- Fix Version/s: 3.2.2 cherry-pick to branch-3.2.2 and verify at local, Thanks [~ahussein]. > TestHAAppend#testMultipleAppendsDuringCatchupTailing is flaky > - > > Key: HDFS-9776 > URL: https://issues.apache.org/jira/browse/HDFS-9776 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Vinayakumar B >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Attachments: TestHAAppend.testMultipleAppendsDuringCatchupTailing.log > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Initial analysys of Recent test failure in > {{TestHAAppend#testMultipleAppendsDuringCatchupTailing}} > [here|https://builds.apache.org/job/PreCommit-HDFS-Build/14420/testReport/org.apache.hadoop.hdfs.server.namenode.ha/TestHAAppend/testMultipleAppendsDuringCatchupTailing/] > > has found that, if the Active NameNode goes down immediately after truncate > operation, but before BlockRecovery command sent to datanode, > Then this block will never be truncated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15461) TestDFSClientRetries#testGetFileChecksum fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He updated HDFS-15461: --- Fix Version/s: 3.2.2 cherry-pick to branch-3.2.2 and verify at local, Thanks [~ahussein]. > TestDFSClientRetries#testGetFileChecksum fails intermittently > - > > Key: HDFS-15461 > URL: https://issues.apache.org/jira/browse/HDFS-15461 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: dfsclient, test >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3 > > Time Spent: 1.5h > Remaining Estimate: 0h > > {{TestDFSClientRetries.testGetFileChecksum}} fails intermittently on hadoop > trunk > {code:bash} > [INFO] Running org.apache.hadoop.hdfs.TestGetFileChecksum > [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 10.491 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestGetFileChecksum > [ERROR] testGetFileChecksum(org.apache.hadoop.hdfs.TestGetFileChecksum) Time > elapsed: 4.248 s <<< ERROR! > java.io.IOException: Failed to replace a bad datanode on the existing > pipeline due to no more good datanodes being available to try. (Nodes: > current=[DatanodeInfoWithStorage[127.0.0.1:52468,DS-e35b6720-8ac2-4e5e-98df-306985da6924,DISK], > > DatanodeInfoWithStorage[127.0.0.1:52472,DS-91ec34d5-3f0a-494e-aed6-b01fa0131d8a,DISK]], > > original=[DatanodeInfoWithStorage[127.0.0.1:52472,DS-91ec34d5-3f0a-494e-aed6-b01fa0131d8a,DISK], > > DatanodeInfoWithStorage[127.0.0.1:52468,DS-e35b6720-8ac2-4e5e-98df-306985da6924,DISK]]). > The current failed datanode replacement policy is DEFAULT, and a client may > configure this via > 'dfs.client.block.write.replace-datanode-on-failure.policy' in its > configuration. > at > org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1304) > at > org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1372) > at > org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1598) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1499) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1481) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:719) > [INFO] > [INFO] Results: > [INFO] > [ERROR] Errors: > [ERROR] TestGetFileChecksum.testGetFileChecksum » IO Failed to replace a > bad datanode ... > [INFO] > [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0 > [INFO] > [ERROR] There are test failures. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223521#comment-17223521 ] Ayush Saxena commented on HDFS-15643: - A simple change could be at L143:
{code:java}
-      outputData = Arrays.copyOf(targetBuffer.array(), remainingLen);
+      outputData = Arrays.copyOf(outputData, remainingLen);
{code}
With that, the number of copies should get reduced to 2 for the native encoders; for the non-native encoder it would still be 1. To reduce one more copy as well, the method can be refactored something like this:
{noformat}
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/StripedBlockChecksumReconstructor.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/StripedBlockChecksumReconstructor.java
index b2e64966a18..848d8dee654 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/StripedBlockChecksumReconstructor.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/StripedBlockChecksumReconstructor.java
@@ -86,8 +86,7 @@ public void reconstruct() throws IOException {
       reconstructTargets(toReconstructLen);

       // step3: calculate checksum
-      checksumDataLen += checksumWithTargetOutput(
-          targetBuffer.array(), toReconstructLen);
+      checksumDataLen += checksumWithTargetOutput(toReconstructLen);

       updatePositionInBlock(toReconstructLen);
       requestedLen -= toReconstructLen;
@@ -132,7 +131,7 @@ protected DataOutputBuffer getChecksumWriter() {
     return checksumWriter;
   }

-  private long checksumWithTargetOutput(byte[] outputData, int toReconstructLen)
+  private long checksumWithTargetOutput(int toReconstructLen)
       throws IOException {
     long checksumDataLength = 0;
     // Calculate partial block checksum. There are two cases.
@@ -140,7 +139,7 @@ private long checksumWithTargetOutput(byte[] outputData, int toReconstructLen)
     // case-2) length of data bytes which is less than bytesPerCRC
     if (requestedLen <= toReconstructLen) {
       int remainingLen = Math.toIntExact(requestedLen);
-      outputData = Arrays.copyOf(targetBuffer.array(), remainingLen);
+      byte[] outputData = getBufferArray(targetBuffer, remainingLen);

       int partialLength = remainingLen % getChecksum().getBytesPerChecksum();
@@ -175,6 +174,7 @@ private long checksumWithTargetOutput(byte[] outputData, int toReconstructLen)
       // calculated checksum for the requested length, return checksum length.
       return checksumDataLength;
     }
+    byte[] outputData = getBufferArray(targetBuffer, targetBuffer.remaining());

     getChecksum().calculateChunkedSums(outputData, 0, outputData.length,
         checksumBuf, 0);
@@ -207,4 +207,18 @@ private void clearBuffers() {
   public long getChecksumDataLen() {
     return checksumDataLen;
   }
-}
+
+  public static byte[] getBufferArray(ByteBuffer targetBuffer,
+      int remainingLen) {
+    byte[] buff = new byte[remainingLen];
+    if (targetBuffer.hasArray()) {
+      buff = targetBuffer.array();
+      if (remainingLen != targetBuffer.remaining()) {
+        buff = Arrays.copyOf(buff, remainingLen);
+      }
+    } else {
+      targetBuffer.slice().get(buff);
+    }
+    return buff;
+  }
+}
{noformat}
But might be we can go with the first approach, seems little safe, in case of EC if the index gets mixed up in any way, the data gets corrupted. 
Let me know your thoughts, Will update accordingly > TestFileChecksumCompositeCrc fails intermittently > - > > Key: HDFS-15643 > URL: https://issues.apache.org/jira/browse/HDFS-15643 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Blocker > Labels: pull-request-available > Attachments: Test-Fix-01.patch, > TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, > org.apache.hadoop.hdfs.TestFileChecksum-output.txt, > org.apache.hadoop.hdfs.TestFileChecksum.txt > > Time Spent: 3h 50m > Remaining Estimate: 0h > > There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases > {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The > following is a sample of the stack trace in two of them Query7 and Query8. > {code:bash} > org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail > to get block checksum for > LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], > > Dat
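For background on why the refactor above goes through a helper rather than calling targetBuffer.array() directly: when a native encoder is in use the target buffer can be a direct ByteBuffer, and a direct buffer exposes no backing array. The following is a minimal, self-contained sketch of that distinction; the class and method names are purely illustrative and are not part of the patch above.
{code:java}
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative only: shows why a copy helper must handle heap and direct buffers differently.
public class BufferCopyDemo {
  // Hypothetical helper mirroring the idea behind getBufferArray() in the proposed patch.
  static byte[] firstBytes(ByteBuffer buf, int len) {
    byte[] out = new byte[len];
    if (buf.hasArray()) {
      // Heap buffer: the backing array is directly accessible.
      System.arraycopy(buf.array(), buf.arrayOffset() + buf.position(), out, 0, len);
    } else {
      // Direct buffer: buf.array() would throw UnsupportedOperationException,
      // so read through a slice, which leaves the original position untouched.
      buf.slice().get(out);
    }
    return out;
  }

  public static void main(String[] args) {
    ByteBuffer heap = ByteBuffer.wrap("0123456789".getBytes());
    ByteBuffer direct = ByteBuffer.allocateDirect(10);
    direct.put("0123456789".getBytes());
    direct.flip();
    System.out.println(Arrays.toString(firstBytes(heap, 4)));    // [48, 49, 50, 51]
    System.out.println(Arrays.toString(firstBytes(direct, 4)));  // [48, 49, 50, 51]
  }
}
{code}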
[jira] [Updated] (HDFS-15459) TestBlockTokenWithDFSStriped fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He updated HDFS-15459: --- Fix Version/s: 3.2.2 cherry-pick to branch-3.2.2 and verify at local, Thanks [~ahussein]. > TestBlockTokenWithDFSStriped fails intermittently > - > > Key: HDFS-15459 > URL: https://issues.apache.org/jira/browse/HDFS-15459 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: test > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Attachments: HDFS-15459.001.patch, > TestBlockTokenWithDFSStriped.testRead.log > > > {{TestBlockTokenWithDFSStriped}} fails intermittently on trunk with a NPE. I > have intuition that this failure is caused by another Unit tests timing out. > {code:bash} > [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 94.448 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped > [ERROR] > testRead(org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped) > Time elapsed: 9.455 s <<< ERROR! > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS.isBlockTokenExpired(TestBlockTokenWithDFS.java:633) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped.isBlockTokenExpired(TestBlockTokenWithDFSStriped.java:139) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS.doTestRead(TestBlockTokenWithDFS.java:508) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped.testRead(TestBlockTokenWithDFSStriped.java:92) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15644) Failed volumes can cause DNs to stop block reporting
[ https://issues.apache.org/jira/browse/HDFS-15644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He updated HDFS-15644: --- Fix Version/s: 3.2.2 cherry-pick to branch-3.2.2 and verify at local, Thanks [~ahussein]. > Failed volumes can cause DNs to stop block reporting > > > Key: HDFS-15644 > URL: https://issues.apache.org/jira/browse/HDFS-15644 > Project: Hadoop HDFS > Issue Type: Bug > Components: block placement, datanode >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: refactor > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3 > > Attachments: HDFS-15644-branch-2.10.002.patch, HDFS-15644.001.patch, > HDFS-15644.002.patch > > > [~daryn] found a corner case where removing failed volumes can cause an NPE in > [FsDataSetImpl.getBlockReports()|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L1939]. > +Scenario:+ > * Inside {{Datanode#HandleVolumeFailures()}}, removing a failed volume is a > 2-step process. > ** First it is removed from the volumes list. > ** Only later are the replicas scrubbed from the volume map. > * A concurrent thread generating blockReports may read the replicaMap and > resolve a non-existing VolumeID. > He made a fix for that and we have been using it on our clusters since > Hadoop-2.7. > By analyzing the code, the bug is still applicable to Trunk. > * The path Datanode#removeVolumes() is safe because the two-step process in > {{FsDataImpl.removeVolumes()}} > [FsDatasetImpl.java#L577|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L577] > is protected by {{datasetWriteLock}}. > * The path Datanode#handleVolumeFailures() is not safe because the failed > volume is removed from the list without acquiring > {{datasetWriteLock}}. [FsVolumList#239|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java#L239] > The race condition can cause the caller of getBlockReports() to throw an NPE if > the RUR is referring to a volume that has already been removed > [FsDatasetImpl.java#L1976|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L1976]. > {code:java} > case RUR: > ReplicaInfo orig = b.getOriginalReplica(); > builders.get(volStorageID).add(orig); > break; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
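To make the race above concrete, here is a minimal, self-contained sketch of the locking pattern being described; the class and field names are hypothetical stand-ins, not the actual FsDatasetImpl/FsVolumeList code.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only: a toy stand-in for the volume list and replica map.
class VolumeRegistry {
  private final Map<String, String> volumeById = new ConcurrentHashMap<>();     // volumeId -> mount path
  private final Map<Long, String> volumeIdByBlock = new ConcurrentHashMap<>();  // blockId  -> volumeId
  private final ReentrantReadWriteLock datasetLock = new ReentrantReadWriteLock();

  // Safe removal: the volume entry and its replicas disappear atomically with
  // respect to readers, so a block report never resolves a dangling volumeId.
  void removeFailedVolume(String volumeId) {
    datasetLock.writeLock().lock();
    try {
      volumeById.remove(volumeId);                          // step 1: drop the volume
      volumeIdByBlock.values().removeIf(volumeId::equals);  // step 2: scrub its replicas
    } finally {
      datasetLock.writeLock().unlock();
    }
  }

  // Reader path, analogous to generating a block report: if removal happened
  // outside the lock, volumeById.get(volId) could return null here and a
  // later dereference would fail with an NPE.
  String mountForBlock(long blockId) {
    datasetLock.readLock().lock();
    try {
      String volId = volumeIdByBlock.get(blockId);
      return volId == null ? null : volumeById.get(volId);
    } finally {
      datasetLock.readLock().unlock();
    }
  }
}
{code}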
[jira] [Updated] (HDFS-15622) Deleted blocks linger in the replications queue
[ https://issues.apache.org/jira/browse/HDFS-15622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He updated HDFS-15622: --- Fix Version/s: 3.2.2 cherry-pick to branch-3.2.2 and verify at local, Thanks [~ahussein]. > Deleted blocks linger in the replications queue > --- > > Key: HDFS-15622 > URL: https://issues.apache.org/jira/browse/HDFS-15622 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Attachments: HDFS-15622.001.patch, HDFS-15622.002.patch > > > We had an incident where, after resolving a missing-blocks incident by > restarting two dead nodes, there were still 8 blocks reported missing, but the list was > empty. Metasave shows the 8 blocks are "orphaned", meaning the files were > already deleted. It is unclear why they were left in the replication queue. > * The containing node was flaky and started and stopped multiple times. > * The block allocation didn't work well due to the cluster-level storage > space exhaustion. > * The NN was in safe mode. > Triggering a full block report from the node didn't have any effect. It will > clear up if a failover happens, as the repl queue will be reinitialized. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15639) [JDK 11] Fix Javadoc errors in hadoop-hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-15639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223512#comment-17223512 ] Xiaoqiao He commented on HDFS-15639: cherry-pick to branch-3.2.2 and verify at local, Thanks [~aajisaka] and [~tasanuma]. > [JDK 11] Fix Javadoc errors in hadoop-hdfs-client > - > > Key: HDFS-15639 > URL: https://issues.apache.org/jira/browse/HDFS-15639 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: pull-request-available > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Time Spent: 1.5h > Remaining Estimate: 0h > > This is caused by HDFS-15567. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15639) [JDK 11] Fix Javadoc errors in hadoop-hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-15639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He updated HDFS-15639: --- Fix Version/s: 3.2.2 > [JDK 11] Fix Javadoc errors in hadoop-hdfs-client > - > > Key: HDFS-15639 > URL: https://issues.apache.org/jira/browse/HDFS-15639 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: pull-request-available > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Time Spent: 1.5h > Remaining Estimate: 0h > > This is caused by HDFS-15567. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15618) Improve datanode shutdown latency
[ https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He updated HDFS-15618: --- Fix Version/s: 3.2.2 > Improve datanode shutdown latency > - > > Key: HDFS-15618 > URL: https://issues.apache.org/jira/browse/HDFS-15618 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Attachments: HDFS-15618-branch-3.3.004.patch, HDFS-15618.001.patch, > HDFS-15618.002.patch, HDFS-15618.003.patch, HDFS-15618.004.patch > > > Datanode shutdown has very high latency: the block scanner waits up to 5 > minutes to join each VolumeScanner thread. > Since the scanners are daemon threads and do not alter the block content, it > is safe not to wait for them on Datanode shutdown. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
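As a rough sketch of the behaviour change being proposed (the names below are hypothetical and this is not the actual BlockScanner/VolumeScanner code): shutdown can signal the daemon scanner thread and wait only briefly, rather than several minutes per volume.
{code:java}
// Illustrative only: bound the wait for a daemon scanner thread on shutdown.
public class ScannerShutdownDemo {
  public static void main(String[] args) throws InterruptedException {
    Thread volumeScanner = new Thread(() -> {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          Thread.sleep(1000);   // stand-in for scanning work
        }
      } catch (InterruptedException e) {
        // exit the scan loop on shutdown
      }
    }, "VolumeScanner-demo");
    volumeScanner.setDaemon(true);   // daemon thread: it cannot keep the JVM alive
    volumeScanner.start();

    // Shutdown path: signal the scanner, then wait only briefly instead of minutes.
    volumeScanner.interrupt();
    volumeScanner.join(1000);        // bounded wait; safe because the scanner never modifies block data
    System.out.println("shutdown complete, scanner alive=" + volumeScanner.isAlive());
  }
}
{code}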
[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency
[ https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223513#comment-17223513 ] Xiaoqiao He commented on HDFS-15618: cherry-pick to branch-3.2.2 and verify at local, Thanks [~ahussein] and [~kihwal]. > Improve datanode shutdown latency > - > > Key: HDFS-15618 > URL: https://issues.apache.org/jira/browse/HDFS-15618 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Attachments: HDFS-15618-branch-3.3.004.patch, HDFS-15618.001.patch, > HDFS-15618.002.patch, HDFS-15618.003.patch, HDFS-15618.004.patch > > > Datanode shutdown has very high latency: the block scanner waits up to 5 > minutes to join each VolumeScanner thread. > Since the scanners are daemon threads and do not alter the block content, it > is safe not to wait for them on Datanode shutdown. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15627) Audit log deletes before collecting blocks
[ https://issues.apache.org/jira/browse/HDFS-15627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He updated HDFS-15627: --- Fix Version/s: 3.2.2 > Audit log deletes before collecting blocks > -- > > Key: HDFS-15627 > URL: https://issues.apache.org/jira/browse/HDFS-15627 > Project: Hadoop HDFS > Issue Type: Bug > Components: logging, namenode >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Attachments: HDFS-15627.001.patch > > > Deletes currently collect blocks in the write lock, write the edit, > do the incremental block delete, and finally +audit log+. It should be: collect blocks, > edit log, +audit log+, incremental delete. Once the edit is durable it is > consistent to audit log the delete. There is no sense in deferring the audit > into the indeterminate future. > The problem occurred when the server hung due to large deletes and it was not > easy to identify the cause. It should have been easily identifiable as > the first delete logged after the hang. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
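The proposed ordering can be sketched as follows; the class and method names are hypothetical stand-ins and do not reflect the actual NameNode code.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative ordering only: hypothetical helpers, not the real namenode API.
public class DeleteOrderingSketch {
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();

  public void deletePath(String src) {
    List<Long> collectedBlocks;
    fsLock.writeLock().lock();
    try {
      collectedBlocks = collectBlocks(src);  // 1. collect blocks under the write lock
      logDeleteEdit(src);                    // 2. write (and sync) the edit
    } finally {
      fsLock.writeLock().unlock();
    }
    // 3. audit log right away: once the edit is durable the delete is
    //    consistent, so the audit entry need not be deferred.
    logAuditEvent("delete", src);
    // 4. incremental block deletion runs last, off the critical path.
    incrementallyDeleteBlocks(collectedBlocks);
  }

  // Stubs standing in for the real work.
  private List<Long> collectBlocks(String src) { return new ArrayList<>(); }
  private void logDeleteEdit(String src) { System.out.println("edit: delete " + src); }
  private void logAuditEvent(String cmd, String src) { System.out.println("audit: " + cmd + " " + src); }
  private void incrementallyDeleteBlocks(List<Long> blocks) { System.out.println("deleting " + blocks.size() + " blocks"); }

  public static void main(String[] args) {
    new DeleteOrderingSketch().deletePath("/tmp/big-dir");
  }
}
{code}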
[jira] [Commented] (HDFS-15627) Audit log deletes before collecting blocks
[ https://issues.apache.org/jira/browse/HDFS-15627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223509#comment-17223509 ] Xiaoqiao He commented on HDFS-15627: commit this to branch-3.2.2 and verify it clean. > Audit log deletes before collecting blocks > -- > > Key: HDFS-15627 > URL: https://issues.apache.org/jira/browse/HDFS-15627 > Project: Hadoop HDFS > Issue Type: Bug > Components: logging, namenode >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Attachments: HDFS-15627.001.patch > > > Deletes currently collect blocks in the write lock, write the edit, > do the incremental block delete, and finally +audit log+. It should be: collect blocks, > edit log, +audit log+, incremental delete. Once the edit is durable it is > consistent to audit log the delete. There is no sense in deferring the audit > into the indeterminate future. > The problem occurred when the server hung due to large deletes and it was not > easy to identify the cause. It should have been easily identifiable as > the first delete logged after the hang. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15641) DataNode could meet deadlock if invoke refreshNameNode
[ https://issues.apache.org/jira/browse/HDFS-15641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He reassigned HDFS-15641: -- Assignee: Hongbing Wang (was: Xiaoqiao He) > DataNode could meet deadlock if invoke refreshNameNode > -- > > Key: HDFS-15641 > URL: https://issues.apache.org/jira/browse/HDFS-15641 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Hongbing Wang >Assignee: Hongbing Wang >Priority: Critical > Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Attachments: HDFS-15641.001.patch, HDFS-15641.002.patch, > HDFS-15641.003.patch, deadlock.png, deadlock_fixed.png, jstack.log > > > DataNode could meet deadlock when invoke `hdfs dfsadmin -refreshNamenodes > hostname:50020` to register a new namespace in federation env. > The jstack is shown in jstack.log > The specific process is shown in Figure deadlock.png -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15641) DataNode could meet deadlock if invoke refreshNameNode
[ https://issues.apache.org/jira/browse/HDFS-15641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He reassigned HDFS-15641: -- Assignee: Xiaoqiao He (was: Hongbing Wang) > DataNode could meet deadlock if invoke refreshNameNode > -- > > Key: HDFS-15641 > URL: https://issues.apache.org/jira/browse/HDFS-15641 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Hongbing Wang >Assignee: Xiaoqiao He >Priority: Critical > Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Attachments: HDFS-15641.001.patch, HDFS-15641.002.patch, > HDFS-15641.003.patch, deadlock.png, deadlock_fixed.png, jstack.log > > > DataNode could meet deadlock when invoke `hdfs dfsadmin -refreshNamenodes > hostname:50020` to register a new namespace in federation env. > The jstack is shown in jstack.log > The specific process is shown in Figure deadlock.png -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223482#comment-17223482 ] Akira Ajisaka commented on HDFS-15643: -- The patch copies the data in memory three times if \{{requestedLen <= toReconstructLen}}: * L90 -> L216: Copy from direct byte buffer from bytes[] via getBufferArray() * L143 -> L216: Copy from direct byte buffer from bytes[] via getBufferArray() * L143: Copy from bytes[] to bytes[] via Arrays.copyOf() The copy operations may be reduced. > TestFileChecksumCompositeCrc fails intermittently > - > > Key: HDFS-15643 > URL: https://issues.apache.org/jira/browse/HDFS-15643 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Blocker > Labels: pull-request-available > Attachments: Test-Fix-01.patch, > TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, > org.apache.hadoop.hdfs.TestFileChecksum-output.txt, > org.apache.hadoop.hdfs.TestFileChecksum.txt > > Time Spent: 3h 50m > Remaining Estimate: 0h > > There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases > {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The > following is a sample of the stack trace in two of them Query7 and Query8. > {code:bash} > org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail > to get block checksum for > LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK], > > DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK], > > DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK], > > DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK], > > DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK], > > DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; > indices=[1, 2, 3, 4, 5, 6, 7, 8]} > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916) > at > org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} > > {code:bash} > Error Message > `/striped/stripedFileChecksum1': Fail to get block checksum for > LocatedStripe
[jira] [Commented] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223477#comment-17223477 ] Akira Ajisaka commented on HDFS-15643: -- I tried the above patch above and the test passed in my docker environment. Thank you [~ayushtkn] > TestFileChecksumCompositeCrc fails intermittently > - > > Key: HDFS-15643 > URL: https://issues.apache.org/jira/browse/HDFS-15643 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Blocker > Labels: pull-request-available > Attachments: Test-Fix-01.patch, > TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, > org.apache.hadoop.hdfs.TestFileChecksum-output.txt, > org.apache.hadoop.hdfs.TestFileChecksum.txt > > Time Spent: 3h 50m > Remaining Estimate: 0h > > There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases > {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The > following is a sample of the stack trace in two of them Query7 and Query8. > {code:bash} > org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail > to get block checksum for > LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK], > > DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK], > > DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK], > > DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK], > > DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK], > > DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; > indices=[1, 2, 3, 4, 5, 6, 7, 8]} > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916) > at > org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > 
at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} > > {code:bash} > Error Message > `/striped/stripedFileChecksum1': Fail to get block checksum for > LocatedStripedBlock{BP-1299291876-172.17.0.3-1602771356932:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:42217,DS-6c29e4b7-e4f1-4302-ad23-fb078f37d783,DISK], > > DatanodeInfoWi
[jira] [Comment Edited] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223477#comment-17223477 ] Akira Ajisaka edited comment on HDFS-15643 at 10/30/20, 7:37 AM: - I tried the above patch and the test passed in my docker environment. Thank you [~ayushtkn] was (Author: ajisakaa): I tried the above patch above and the test passed in my docker environment. Thank you [~ayushtkn] > TestFileChecksumCompositeCrc fails intermittently > - > > Key: HDFS-15643 > URL: https://issues.apache.org/jira/browse/HDFS-15643 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Blocker > Labels: pull-request-available > Attachments: Test-Fix-01.patch, > TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, > org.apache.hadoop.hdfs.TestFileChecksum-output.txt, > org.apache.hadoop.hdfs.TestFileChecksum.txt > > Time Spent: 3h 50m > Remaining Estimate: 0h > > There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases > {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The > following is a sample of the stack trace in two of them Query7 and Query8. > {code:bash} > org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail > to get block checksum for > LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK], > > DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK], > > DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK], > > DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK], > > DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK], > > DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; > indices=[1, 2, 3, 4, 5, 6, 7, 8]} > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916) > at > org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} > > {code:bash} > Error Message > `/striped/stripedFileChecksum1': Fail to get block checksum for > LocatedStripedBlock{BP-1299291876-172.17.0.3-1602771356932:blk_-9223372036854775792_100
[jira] [Commented] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223460#comment-17223460 ] Akira Ajisaka commented on HDFS-15643: -- Thank you [~ayushtkn] for pointing out. I'll try https://issues.apache.org/jira/secure/attachment/13014377/Test-Fix-01.patch > TestFileChecksumCompositeCrc fails intermittently > - > > Key: HDFS-15643 > URL: https://issues.apache.org/jira/browse/HDFS-15643 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Blocker > Labels: pull-request-available > Attachments: Test-Fix-01.patch, > TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, > org.apache.hadoop.hdfs.TestFileChecksum-output.txt, > org.apache.hadoop.hdfs.TestFileChecksum.txt > > Time Spent: 3h 50m > Remaining Estimate: 0h > > There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases > {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The > following is a sample of the stack trace in two of them Query7 and Query8. > {code:bash} > org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail > to get block checksum for > LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK], > > DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK], > > DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK], > > DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK], > > DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK], > > DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; > indices=[1, 2, 3, 4, 5, 6, 7, 8]} > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916) > at > org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} > > {code:bash} > Error Message > `/striped/stripedFileChecksum1': Fail to get block checksum for > LocatedStripedBlock{BP-1299291876-172.17.0.3-1602771356932:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:42217,DS-6c29e4b7-e4f1-4302-ad23-fb078f37d783,