[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242483#comment-15242483 ] Kai Zheng commented on HDFS-7859: - bq. IIUC, persistence is more necessary when we support custom schemas, isn't it? I think you're right. For the builtin schema and policies, IIRC, there was a consideration that we still need to persist them to survive software upgrades (since the builtin ones may change). bq. Do we have any plan to implement HDFS-7337? I think many of the considerations originally targeted by that issue have already been implemented elsewhere, so the only thing left is custom codec and schema support. I don't think there is a strong requirement for this feature, but we could perhaps implement it in phase II. > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242464#comment-15242464 ] Rakesh R commented on HDFS-7859: The proposed patch in this JIRA handles saving and loading the schema in the fsimage/editlog. IIUC, persistence is more necessary when we support custom schemas, isn't it? I can see we are still discussing ways to support HDFS-7337 and have not yet reached a common agreement. Do we have any plan to implement HDFS-7337? > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-10297) Increase default balance bandwidth and concurrent moves
John Zhuge created HDFS-10297: - Summary: Increase default balance bandwidth and concurrent moves Key: HDFS-10297 URL: https://issues.apache.org/jira/browse/HDFS-10297 Project: Hadoop HDFS Issue Type: Improvement Components: balancer & mover Affects Versions: 2.6.0 Reporter: John Zhuge Assignee: John Zhuge Priority: Minor Adjust the default values to better support the current level of customer host and network configurations. Increase the default for property {{dfs.datanode.balance.bandwidthPerSec}} from 1 MB/s to 10 MB/s. Applies to the DN. 10 MB/s is about 10% of a GbE link. Increase the default for property {{dfs.datanode.balance.max.concurrent.moves}} from 5 to 50. Applies to both the DN and the Balancer. For comparison, the default number of DN receiver threads is 4096 and the default number of balancer mover threads is 1000. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
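For reference, the two properties as they would appear in hdfs-site.xml with the proposed values (a sketch; {{dfs.datanode.balance.bandwidthPerSec}} is expressed in bytes per second, and the shipped defaults at the time were 1048576 and 5):

```xml
<!-- hdfs-site.xml: proposed new defaults from this issue (sketch) -->
<property>
  <name>dfs.datanode.balance.bandwidthPerSec</name>
  <!-- 10 MB/s in bytes; the old default was 1048576 (1 MB/s) -->
  <value>10485760</value>
</property>
<property>
  <name>dfs.datanode.balance.max.concurrent.moves</name>
  <!-- up from the old default of 5; read by both DN and Balancer -->
  <value>50</value>
</property>
```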
[jira] [Commented] (HDFS-10216) distcp -diff relative path exception
[ https://issues.apache.org/jira/browse/HDFS-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242399#comment-15242399 ] Takashi Ohnishi commented on HDFS-10216: Thank you [~jingzhao] for committing! Thank you [~jzhuge] for the helpful review!! > distcp -diff relative path exception > > > Key: HDFS-10216 > URL: https://issues.apache.org/jira/browse/HDFS-10216 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 2.8.0 >Reporter: John Zhuge >Assignee: Takashi Ohnishi > Fix For: 2.9.0 > > Attachments: HDFS-10216.1.patch, HDFS-10216.2.patch, > HDFS-10216.3.patch, HDFS-10216.4.patch > > > Got this exception when running {{distcp -diff}} with relative paths: > {code} > $ hadoop distcp -update -diff s1 s2 d1 d2 > 16/03/25 09:45:40 INFO tools.DistCp: Input Options: > DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, > ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', > copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[d1], > targetPath=d2, targetPathExists=true, preserveRawXattrs=false, > filtersFile='null'} > 16/03/25 09:45:40 INFO client.RMProxy: Connecting to ResourceManager at > jzhuge-balancer-1.vpc.cloudera.com/172.26.21.70:8032 > 16/03/25 09:45:41 ERROR tools.DistCp: Exception encountered > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at org.apache.hadoop.fs.Path.initialize(Path.java:206) > at org.apache.hadoop.fs.Path.<init>(Path.java:197) > at > org.apache.hadoop.tools.SimpleCopyListing.getPathWithSchemeAndAuthority(SimpleCopyListing.java:193) > at > org.apache.hadoop.tools.SimpleCopyListing.addToFileListing(SimpleCopyListing.java:202) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListingWithSnapshotDiff(SimpleCopyListing.java:243) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:172) > at 
org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86) > at > org.apache.hadoop.tools.DistCp.createInputFileListingWithDiff(DistCp.java:388) > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:164) > at org.apache.hadoop.tools.DistCp.run(DistCp.java:123) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.tools.DistCp.main(DistCp.java:436) > Caused by: java.net.URISyntaxException: Relative path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at java.net.URI.checkPath(URI.java:1804) > at java.net.URI.<init>(URI.java:752) > at org.apache.hadoop.fs.Path.initialize(Path.java:203) > ... 11 more > {code} > But these commands worked: > * Absolute path: {{hadoop distcp -update -diff s1 s2 /user/systest/d1 > /user/systest/d2}} > * No {{-diff}}: {{hadoop distcp -update d1 d2}} > However, everything was fine when I ran {{hadoop distcp -update -diff s1 s2 > d1 d2}} again. I am not sure the problem only exists with option {{-diff}}. > Trying to reproduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
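The root cause is visible outside Hadoop: Path.initialize() hands scheme, authority, and path separately to java.net.URI, and URI rejects a relative path (one not starting with "/") whenever a scheme is present. A minimal sketch, reusing the hostname from the log purely for illustration:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class RelativePathDemo {
    public static void main(String[] args) {
        try {
            // Mirrors how Hadoop's Path.initialize() assembles a URI from
            // scheme, authority, and path. A path component not starting
            // with "/" trips java.net.URI's internal checkPath().
            new URI("hdfs", "jzhuge-balancer-1.vpc.cloudera.com:8020",
                    "./d1/.snapshot/s2", null, null);
            System.out.println("no exception");
        } catch (URISyntaxException e) {
            System.out.println("URISyntaxException: " + e.getReason());
        }
    }
}
```

This is also why the absolute-path invocation worked: there the path component starts with "/" and passes URI's check.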
[jira] [Updated] (HDFS-9820) Improve distcp to support efficient restore to an earlier snapshot
[ https://issues.apache.org/jira/browse/HDFS-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-9820: Description: A common use scenario (scenario 1): # create snapshot sx in clusterX, # do some experiments in clusterX, which creates some files. # throw away the files changed and go back to sx. Another scenario (scenario 2): there is a production cluster and a backup cluster, and we periodically sync up the data from the production cluster to the backup cluster with distcp. The cluster in scenario 1 could be the backup cluster in scenario 2. For scenario 1: HDFS-4167 intends to restore HDFS to the most recent snapshot, and there are some complexities and challenges. Before that jira is implemented, we count on distcp to copy from the snapshot to the current state. However, the performance of this operation could be very bad because we have to go through all files even if only a few files changed. For scenario 2: HDFS-7535 improved distcp performance by avoiding copying files that changed name since the last backup. On top of HDFS-7535, HDFS-8828 improved distcp performance when copying data from the source to the target cluster, by only copying files changed since the last backup. The way it works is to use snapshot diff to find out all changed files, and copy only the changed files. See https://blog.cloudera.com/blog/2015/12/distcp-performance-improvements-in-apache-hadoop/ This jira proposes a variation of HDFS-8828: find out the files changed in the target cluster since the last snapshot sx, and copy them from snapshot sx of either the source or the target cluster, to restore the target cluster's current state to sx. Specifically, if a file/dir is - renamed, rename it back - created in target cluster, delete it - modified, put it on the copy list - run distcp with the copy list, copying from the source cluster's corresponding snapshot This could be a new command line switch -rdiff in distcp. 
As a native restore feature, HDFS-4167 would still be ideal to have. However, HDFS-9820 would hopefully be easier to implement before HDFS-4167 is in place. was: HDFS-4167 intends to restore HDFS to the most recent snapshot, and there are some complexities and challenges. HDFS-7535 improved distcp performance by avoiding copying files that changed name since the last backup. On top of HDFS-7535, HDFS-8828 improved distcp performance when copying data from the source to the target cluster, by only copying files changed since the last backup. The way it works is to use snapshot diff to find out all changed files, and copy only the changed files. See https://blog.cloudera.com/blog/2015/12/distcp-performance-improvements-in-apache-hadoop/ This jira proposes a variation of HDFS-8828: find out the files changed in the target cluster since the last snapshot sx, and copy them from the source cluster's same snapshot sx, to restore the target cluster to sx. If a file/dir is - renamed, rename it back - created in target cluster, delete it - modified, put it on the copy list - run distcp with the copy list, copying from the source cluster's corresponding snapshot This could be a new command line switch -rdiff in distcp. HDFS-4167 would still be nice to have. It just seems to me that HDFS-9820 would hopefully be easier to implement. > Improve distcp to support efficient restore to an earlier snapshot > -- > > Key: HDFS-9820 > URL: https://issues.apache.org/jira/browse/HDFS-9820 > Project: Hadoop HDFS > Issue Type: New Feature > Components: distcp >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-9820.001.patch, HDFS-9820.002.patch, > HDFS-9820.003.patch, HDFS-9820.004.patch > > > A common use scenario (scenario 1): > # create snapshot sx in clusterX, > # do some experiments in clusterX, which creates some files. > # throw away the files changed and go back to sx. 
> Another scenario (scenario 2) is, there is a production cluster and a backup > cluster, we periodically sync up the data from production cluster to the > backup cluster with distcp. > The cluster in scenario 1 could be the backup cluster in scenario 2. > For scenario 1: > HDFS-4167 intends to restore HDFS to the most recent snapshot, and there are > some complexity and challenges. Before that jira is implemented, we count on > distcp to copy from snapshot to the current state. However, the performance > of this operation could be very bad because we have to go through all files > even if we only changed a few files. > For scenario 2: > HDFS-7535 improved distcp performance by avoiding copying files that changed > name since last backup. > On top of HDFS-7535, HDFS-8828 improved distcp performance when copying data > from source to target cluster, by only copying changed files since
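The rename-back / delete / copy-list classification proposed above can be sketched as follows. Entry and planRestore are hypothetical stand-ins for illustration only; a real implementation would consume HDFS snapshot-diff entries (SnapshotDiffReport) and drive distcp rather than build strings:

```java
import java.util.ArrayList;
import java.util.List;

public class RdiffPlan {
    // Hypothetical, simplified diff entry; the real HDFS type is
    // SnapshotDiffReport.DiffReportEntry and carries byte[] paths.
    static class Entry {
        final String type;     // "RENAME", "CREATE", or "MODIFY"
        final String path;     // current path in the target cluster
        final String oldPath;  // pre-rename path (RENAME entries only)
        Entry(String type, String path, String oldPath) {
            this.type = type; this.path = path; this.oldPath = oldPath;
        }
    }

    // Turn a target-side snapshot diff (sx -> current state) into the
    // restore plan described above: undo renames, delete creations,
    // and re-copy modified files from the snapshot.
    static List<String> planRestore(List<Entry> diff) {
        List<String> plan = new ArrayList<>();
        for (Entry e : diff) {
            switch (e.type) {
                case "RENAME": plan.add("rename " + e.path + " -> " + e.oldPath); break;
                case "CREATE": plan.add("delete " + e.path); break;
                case "MODIFY": plan.add("copy " + e.path + " from snapshot sx"); break;
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        List<Entry> diff = new ArrayList<>();
        diff.add(new Entry("RENAME", "/d/b", "/d/a"));
        diff.add(new Entry("CREATE", "/d/tmp", null));
        diff.add(new Entry("MODIFY", "/d/f", null));
        for (String step : planRestore(diff)) System.out.println(step);
    }
}
```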
[jira] [Commented] (HDFS-10224) Implement an asynchronous DistributedFileSystem
[ https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242299#comment-15242299 ] Hadoop QA commented on HDFS-10224: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 57s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 49s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 19s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 3m 19s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 1s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 4s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 4s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 1s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 1s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 7s {color} | {color:red} root: patch generated 10 new + 143 unchanged - 2 fixed = 153 total (was 145) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 54s {color} | {color:red} hadoop-common-project/hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 4m 49s {color} | {color:red} hadoop-common-project_hadoop-common-jdk1.8.0_77 with JDK v1.8.0_77 generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 19s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 8m 29s {color} | {color:red} hadoop-common-project_hadoop-common-jdk1.7.0_95 with JDK v1.7.0_95 generated 1 new + 13 unchanged - 0 fixed = 14 total (was 13) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 18s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 7s {color} | {color:green} hadoop-common in the patch passed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 50s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 53s
[jira] [Commented] (HDFS-10258) Erasure Coding: support small cluster whose #DataNode < # (Blocks in a BlockGroup)
[ https://issues.apache.org/jira/browse/HDFS-10258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242287#comment-15242287 ] Li Bo commented on HDFS-10258: -- Thanks to Kai for the idea; I will try to find a solution with the lowest cost. > Erasure Coding: support small cluster whose #DataNode < # (Blocks in a > BlockGroup) > -- > > Key: HDFS-10258 > URL: https://issues.apache.org/jira/browse/HDFS-10258 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Li Bo >Assignee: Li Bo > > Currently EC does not support small clusters whose datanode count is > smaller than the number of blocks in a block group. This sub-task will solve > this problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9940) Balancer should not use property name dfs.datanode.balance.max.concurrent.moves
[ https://issues.apache.org/jira/browse/HDFS-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Zhuge updated HDFS-9940: - Summary: Balancer should not use property name dfs.datanode.balance.max.concurrent.moves (was: Balancer should not use property name ) > Balancer should not use property name > dfs.datanode.balance.max.concurrent.moves > --- > > Key: HDFS-9940 > URL: https://issues.apache.org/jira/browse/HDFS-9940 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer & mover >Affects Versions: 2.6.0 >Reporter: John Zhuge >Assignee: John Zhuge >Priority: Minor > Labels: supportability > Fix For: 2.8.0 > > > It is very confusing for both Balancer and Datanode to use the same property > {{dfs.datanode.balance.max.concurrent.moves}}. It is especially so for the > Balancer because the property has "datanode" in the name string. Many > customers forget to set the property for the Balancer. > Change the Balancer to use a new property > {{dfs.balancer.max.concurrent.moves}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9940) Balancer should not use property name
[ https://issues.apache.org/jira/browse/HDFS-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Zhuge updated HDFS-9940: - Summary: Balancer should not use property name (was: Rename dfs.balancer.max.concurrent.moves to avoid confusion) > Balancer should not use property name > -- > > Key: HDFS-9940 > URL: https://issues.apache.org/jira/browse/HDFS-9940 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer & mover >Affects Versions: 2.6.0 >Reporter: John Zhuge >Assignee: John Zhuge >Priority: Minor > Labels: supportability > Fix For: 2.8.0 > > > It is very confusing for both Balancer and Datanode to use the same property > {{dfs.datanode.balance.max.concurrent.moves}}. It is especially so for the > Balancer because the property has "datanode" in the name string. Many > customers forget to set the property for the Balancer. > Change the Balancer to use a new property > {{dfs.balancer.max.concurrent.moves}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10284) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242208#comment-15242208 ] Mingliang Liu commented on HDFS-10284: -- I see no related failing tests. > o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode > fails intermittently > - > > Key: HDFS-10284 > URL: https://issues.apache.org/jira/browse/HDFS-10284 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.9.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Minor > Attachments: HDFS-10284.000.patch, HDFS-10284.001.patch > > > *Stacktrace* > {code} > org.mockito.exceptions.misusing.UnfinishedStubbingException: > Unfinished stubbing detected here: > -> at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > E.g. thenReturn() may be missing. > Examples of correct stubbing: > when(mock.isOk()).thenReturn(true); > when(mock.isOk()).thenThrow(exception); > doThrow(exception).when(mock).someVoidMethod(); > Hints: > 1. missing thenReturn() > 2. although stubbed methods may return mocks, you cannot inline mock > creation (mock()) call inside a thenReturn method (see issue 53) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > {code} > Sample failing pre-commit UT: > https://builds.apache.org/job/PreCommit-HDFS-Build/15153/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10284) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242203#comment-15242203 ] Hadoop QA commented on HDFS-10284: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 47s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 13s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 40s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 164m 14s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | | hadoop.hdfs.server.namenode.ha.TestInitializeSharedEdits | | | hadoop.hdfs.TestReadStripedFileWithMissingBlocks | | JDK v1.8.0_77 Timed out junit tests | org.apache.hadoop.hdfs.TestWriteReadStripedFile | | | org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding | | JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeUUID | | | hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer | | |
[jira] [Commented] (HDFS-10293) StripedFileTestUtil#readAll flaky
[ https://issues.apache.org/jira/browse/HDFS-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242189#comment-15242189 ] Mingliang Liu commented on HDFS-10293: -- I don't see related failing tests. Still, I'm surprised that we have so many intermittently failing tests. I have not seen a pre-commit run that passes all UTs for a while. > StripedFileTestUtil#readAll flaky > - > > Key: HDFS-10293 > URL: https://issues.apache.org/jira/browse/HDFS-10293 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, test >Affects Versions: 3.0.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-10293.000.patch > > > The flaky test helper method causes several UTs to fail intermittently. > For example, the > {{TestDFSStripedOutputStreamWithFailure#testAddBlockWhenNoSufficientParityNumOfNodes}} > timed out in a recent run (see > [exception|https://builds.apache.org/job/PreCommit-HDFS-Build/15158/testReport/org.apache.hadoop.hdfs/TestDFSStripedOutputStreamWithFailure/testAddBlockWhenNoSufficientParityNumOfNodes/]), > which can be easily reproduced locally. > Debugging the code, chances are that the helper method is stuck in an > infinite loop. We need a fix to make the test robust. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
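The suspected failure mode, a read loop that never reacts to EOF, is a classic bug. This is not the actual StripedFileTestUtil code, just a sketch of a readAll helper that cannot spin forever because it treats a -1 return as a hard stop:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadAllDemo {
    // A robust readAll: keeps reading until expectedLen bytes arrive,
    // but fails fast on EOF (-1) instead of looping indefinitely.
    static byte[] readAll(InputStream in, int expectedLen) throws IOException {
        byte[] buf = new byte[expectedLen];
        int off = 0;
        while (off < expectedLen) {
            int n = in.read(buf, off, expectedLen - off);
            if (n < 0) {
                // Premature EOF: surface an error rather than spin.
                throw new IOException("premature EOF at offset " + off);
            }
            off += n;
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes("UTF-8");
        byte[] out = readAll(new ByteArrayInputStream(data), data.length);
        System.out.println(new String(out, "UTF-8"));
    }
}
```

A loop that instead ignores or retries on n < 0 never advances off, which would match the timeout symptom described in the issue.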
[jira] [Commented] (HDFS-10289) Balancer configures DNs directly
[ https://issues.apache.org/jira/browse/HDFS-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242183#comment-15242183 ] John Zhuge commented on HDFS-10289: --- Split into 2 tasks: * HDFS-10294: Balancer configures DN properties, still with NN APIs * HDFS-10295: Switch to DN APIs and only configure the DNs involved in the balancing HDFS-10295 is a further enhancement, so it is conceivable for a simpler HDFS-10294 to be accepted first. > Balancer configures DNs directly > > > Key: HDFS-10289 > URL: https://issues.apache.org/jira/browse/HDFS-10289 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer & mover >Affects Versions: 2.6.0 >Reporter: John Zhuge >Assignee: John Zhuge >Priority: Critical > > Balancer directly configures the 2 balance-related properties > (bandwidthPerSec and concurrentMoves) on the DNs involved. > Details: > * Before each balancing iteration, set the properties on all DNs involved in > the current iteration. > * The DN property changes will not survive restart. > * Balancer gets the property values from command line or its config file. > * Need new DN APIs to query and set the 2 properties. > * No need to edit the config file on each DN or run {{hdfs dfsadmin > -setBalancerBandwidth}} to configure every DN in the cluster. > Pros: > * Improve ease of use because all configurations are done at one place, the > balancer. We saw many customers often forgot to set concurrentMoves properly > since it is required on both DN and Balancer. > * Support new DNs added between iterations > * Handle DN restarts between iterations > * May be able to dynamically adjust the thresholds in different iterations. > Don't know how useful though. > Cons: > * New DN property API > * A malicious/misconfigured balancer may overwhelm DNs. {{hdfs dfsadmin > -setBalancerBandwidth}} has the same issue. Also Balancer can only be run by > admin. 
> Questions: > * Can we create {{BalancerConcurrentMovesCommand}} similar to > {{BalancerBandwidthCommand}}? Can Balancer use them directly without going > through NN? > One proposal to implement HDFS-7466 calls for an API to query DN properties. > DN Conf Servlet returns all config properties. It does not return individual > property and it does not return the value set by {{hdfs dfsadmin > -setBalancerBandwidth}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-10295) Balancer configures DN properties directly on-demand
John Zhuge created HDFS-10295: - Summary: Balancer configures DN properties directly on-demand Key: HDFS-10295 URL: https://issues.apache.org/jira/browse/HDFS-10295 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.7.0 Reporter: John Zhuge Assignee: John Zhuge This is a further enhancement to HDFS-10294. Instead of using NN APIs, use new DN APIs to query and set necessary properties on the DNs involved. Details: * Before each balancing iteration, set the properties on all DNs involved in the current iteration. * Need new DN APIs to query and set the balancing properties. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-10296) FileContext.getDelegationTokens() fails to obtain KMS delegation token
Andreas Neumann created HDFS-10296: -- Summary: FileContext.getDelegationTokens() fails to obtain KMS delegation token Key: HDFS-10296 URL: https://issues.apache.org/jira/browse/HDFS-10296 Project: Hadoop HDFS Issue Type: Bug Components: encryption Affects Versions: 2.6.0 Environment: CDH 5.6 with a Java KMS Reporter: Andreas Neumann This little program demonstrates the problem: With FileSystem, we can get both the HDFS and the kms-dt token, whereas with FileContext, we can only obtain the HDFS delegation token. {code}
public class SimpleTest {
  public static void main(String[] args) throws IOException {
    YarnConfiguration hConf = new YarnConfiguration();
    String renewer = "renewer";
    FileContext fc = FileContext.getFileContext(hConf);
    List<Token<?>> tokens = fc.getDelegationTokens(new Path("/"), renewer);
    for (Token<?> token : tokens) {
      System.out.println("Token from FC: " + token);
    }
    FileSystem fs = FileSystem.get(hConf);
    for (Token<?> token : fs.addDelegationTokens(renewer, new Credentials())) {
      System.out.println("Token from FS: " + token);
    }
  }
}
{code} Sample output (host/user name x'ed out): {noformat} Token from FC: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:xxx, Ident: (HDFS_DELEGATION_TOKEN token 49 for xxx) Token from FS: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:xxx, Ident: (HDFS_DELEGATION_TOKEN token 50 for xxx) Token from FS: Kind: kms-dt, Service: xx.xx.xx.xx:16000, Ident: 00 04 63 64 61 70 07 72 65 6e 65 77 65 72 00 8a 01 54 16 96 c2 95 8a 01 54 3a a3 46 95 0e 02 {noformat} Apparently FileContext does not return the KMS token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10294) Balancer configures DN properties
[ https://issues.apache.org/jira/browse/HDFS-10294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Zhuge updated HDFS-10294: -- Priority: Major (was: Critical) > Balancer configures DN properties > - > > Key: HDFS-10294 > URL: https://issues.apache.org/jira/browse/HDFS-10294 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer & mover >Affects Versions: 2.6.0 >Reporter: John Zhuge >Assignee: John Zhuge > > Balancer configures the 2 balance-related properties (bandwidthPerSec and > concurrentMoves) using NN API {{get/setBalancerBandwidth}} and the new > {{get/setBalancerConcurrentMoves}}. > Details: > * Upon the start of the balancer, set the DN properties. > * Use NN API to query and set the 2 properties. There might be a slight delay > for the property changes to be propagated to all DNs. > * The DN property changes will not survive restart. > * Balancer gets the property values from command line or its config file. > * No need to edit the config file on each DN or run {{hdfs dfsadmin > -setBalancerBandwidth}} to configure every DN in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10293) StripedFileTestUtil#readAll flaky
[ https://issues.apache.org/jira/browse/HDFS-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242159#comment-15242159 ] Hadoop QA commented on HDFS-10293: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 22s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 10s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 1m 1s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 31s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 27s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 132m 15s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 111m 30s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 275m 2s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeUUID | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | | | hadoop.hdfs.TestPersistBlocks | | | hadoop.hdfs.server.blockmanagement.TestBlockManager | | | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations | | | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | | hadoop.hdfs.security.TestDelegationTokenForProxyUser | | | hadoop.hdfs.TestFileAppend | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | | | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes | | |
[jira] [Created] (HDFS-10294) Balancer configures DN properties
John Zhuge created HDFS-10294: - Summary: Balancer configures DN properties Key: HDFS-10294 URL: https://issues.apache.org/jira/browse/HDFS-10294 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer & mover Affects Versions: 2.6.0 Reporter: John Zhuge Assignee: John Zhuge Priority: Critical Balancer configures the 2 balance-related properties (bandwidthPerSec and concurrentMoves) using NN API {{get/setBalancerBandwidth}} and the new {{get/setBalancerConcurrentMoves}}. Details: * Upon the start of the balancer, set the DN properties. * Use NN API to query and set the 2 properties. There might be a slight delay for the property changes to be propagated to all DNs. * The DN property changes will not survive restart. * Balancer gets the property values from command line or its config file. * No need to edit the config file on each DN or run {{hdfs dfsadmin -setBalancerBandwidth}} to configure every DN in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9820) Improve distcp to support efficient restore to an earlier snapshot
[ https://issues.apache.org/jira/browse/HDFS-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242150#comment-15242150 ] Yongjun Zhang commented on HDFS-9820: - Hi [~jingzhao], Thanks for proposing an offline discussion; I was thinking the same :-) Just shared contact info. Because of the similarity to HDFS-7535/HDFS-8828, the change can indeed be small (I have tried). In the latest patch, some changes try to address the asymmetric output (HDFS-10263) by always using the forward snapshot diff; other changes reorganize the code for better readability. For completeness, it would be appreciated if you could reply to the comments I made in my prior update. Thanks. > Improve distcp to support efficient restore to an earlier snapshot > -- > > Key: HDFS-9820 > URL: https://issues.apache.org/jira/browse/HDFS-9820 > Project: Hadoop HDFS > Issue Type: New Feature > Components: distcp >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-9820.001.patch, HDFS-9820.002.patch, > HDFS-9820.003.patch, HDFS-9820.004.patch > > > HDFS-4167 intends to restore HDFS to the most recent snapshot, and there are > some complexity and challenges. > HDFS-7535 improved distcp performance by avoiding copying files that changed > name since last backup. > On top of HDFS-7535, HDFS-8828 improved distcp performance when copying data > from source to target cluster, by only copying changed files since last > backup. The way it works is use snapshot diff to find out all files changed, > and copy the changed files only. > See > https://blog.cloudera.com/blog/2015/12/distcp-performance-improvements-in-apache-hadoop/ > This jira is to propose a variation of HDFS-8828, to find out the files > changed in target cluster since last snapshot sx, and copy these from the > source target's same snapshot sx, to restore target cluster to sx. 
> If a file/dir is > - renamed, rename it back > - created in target cluster, delete it > - modified, put it to the copy list > - run distcp with the copy list, copy from the source cluster's corresponding > snapshot > This could be a new command line switch -rdiff in distcp. > HDFS-4167 would still be nice to have. It just seems to me that HDFS-9820 > would hopefully be easier to implement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
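The restore rules listed above (renamed: rename it back; created: delete it; modified: put it on the copy list) can be sketched in a self-contained way. The diff-entry shape below is a deliberate simplification of HDFS's real SnapshotDiffReport, and all names are illustrative, not the actual distcp implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class RdiffSketch {
    // Simplified diff entry for the forward diff from snapshot sx to the
    // current state of the target cluster.
    static class Entry {
        final String type;     // "RENAME", "CREATE", or "MODIFY"
        final String path;     // path in sx (or the created path for CREATE)
        final String newPath;  // post-rename path; only meaningful for RENAME
        Entry(String type, String path, String newPath) {
            this.type = type; this.path = path; this.newPath = newPath;
        }
    }

    // Turn the forward diff into restore operations, per the description:
    // renamed -> rename it back; created -> delete it; modified -> copy list.
    static List<String> restoreOps(List<Entry> diff) {
        List<String> ops = new ArrayList<>();
        for (Entry e : diff) {
            if ("RENAME".equals(e.type)) {
                ops.add("rename " + e.newPath + " -> " + e.path);
            } else if ("CREATE".equals(e.type)) {
                ops.add("delete " + e.path);
            } else if ("MODIFY".equals(e.type)) {
                ops.add("copy-from-source-snapshot " + e.path);
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        List<Entry> diff = new ArrayList<>();
        diff.add(new Entry("RENAME", "/x", "/y"));
        diff.add(new Entry("CREATE", "/newfile", null));
        diff.add(new Entry("MODIFY", "/data", null));
        System.out.println(restoreOps(diff));
    }
}
```

Applying these operations (then running distcp over the copy list against the source cluster's sx) is the essence of the proposed -rdiff switch.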
[jira] [Commented] (HDFS-9905) TestWebHdfsTimeouts fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242139#comment-15242139 ] Masatake Iwasaki commented on HDFS-9905: [~jojochuang], can you update the patch to address comments above? > TestWebHdfsTimeouts fails occasionally > -- > > Key: HDFS-9905 > URL: https://issues.apache.org/jira/browse/HDFS-9905 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.7.3 >Reporter: Kihwal Lee >Assignee: Wei-Chiu Chuang > Attachments: HDFS-9905.001.patch > > > When checking for a timeout, it does get {{SocketTimeoutException}}, but the > message sometimes does not contain "connect timed out". Since the original > exception is not logged, we do not know details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9905) TestWebHdfsTimeouts fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242127#comment-15242127 ] Masatake Iwasaki commented on HDFS-9905: bq. I don't know what would cause SocketTimeoutException to give a null message instead of the expected Read timed out. Though the underlying implementations of [PlainSocketImpl|http://hg.openjdk.java.net/jdk7u/jdk7u/jdk/file/34c594b52b73/src/solaris/native/java/net/PlainSocketImpl.c] and [SocketInputStream|http://hg.openjdk.java.net/jdk7u/jdk7u/jdk/file/34c594b52b73/src/solaris/native/java/net/SocketInputStream.c] throw SocketTimeoutException with the expected message, a SocketTimeoutException without a message could be thrown by {{SocksSocketImpl#remainingMillis}} before reaching those code paths if the connect timeout is set to a very small value. I'm +1 on the fix of {{WebHdfsFileSystem#AbstractRunner#runWithRetry}} suggested by [~eepayne] in addition to 001. > TestWebHdfsTimeouts fails occasionally > -- > > Key: HDFS-9905 > URL: https://issues.apache.org/jira/browse/HDFS-9905 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.7.3 >Reporter: Kihwal Lee >Assignee: Wei-Chiu Chuang > Attachments: HDFS-9905.001.patch > > > When checking for a timeout, it does get {{SocketTimeoutException}}, but the > message sometimes does not contain "connect timed out". Since the original > exception is not logged, we do not know details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
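The failure mode discussed above (a SocketTimeoutException whose message is null rather than "connect timed out") suggests the test's check should tolerate a missing message. A minimal sketch of such a check, independent of the actual HDFS-9905 patch (the method name is illustrative):

```java
import java.net.SocketTimeoutException;

public class TimeoutMessageSketch {
    // Accept a null message as a connect timeout too, since
    // SocksSocketImpl#remainingMillis can throw SocketTimeoutException
    // without a message when the connect timeout is set very small.
    static boolean looksLikeConnectTimeout(SocketTimeoutException e) {
        String msg = e.getMessage();
        return msg == null || msg.contains("connect timed out");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeConnectTimeout(new SocketTimeoutException("connect timed out"))); // true
        System.out.println(looksLikeConnectTimeout(new SocketTimeoutException()));                    // true: null message
        System.out.println(looksLikeConnectTimeout(new SocketTimeoutException("Read timed out")));    // false
    }
}
```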
[jira] [Commented] (HDFS-9820) Improve distcp to support efficient restore to an earlier snapshot
[ https://issues.apache.org/jira/browse/HDFS-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242112#comment-15242112 ] Jing Zhao commented on HDFS-9820: - [~yzhangal], I have a very good understanding of HDFS-10263, but I'm not sure if you understand my point about why this issue can be solved in a much easier way... Please let me know if you want an offline discussion. > Improve distcp to support efficient restore to an earlier snapshot > -- > > Key: HDFS-9820 > URL: https://issues.apache.org/jira/browse/HDFS-9820 > Project: Hadoop HDFS > Issue Type: New Feature > Components: distcp >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-9820.001.patch, HDFS-9820.002.patch, > HDFS-9820.003.patch, HDFS-9820.004.patch > > > HDFS-4167 intends to restore HDFS to the most recent snapshot, and there are > some complexity and challenges. > HDFS-7535 improved distcp performance by avoiding copying files that changed > name since last backup. > On top of HDFS-7535, HDFS-8828 improved distcp performance when copying data > from source to target cluster, by only copying changed files since last > backup. The way it works is use snapshot diff to find out all files changed, > and copy the changed files only. > See > https://blog.cloudera.com/blog/2015/12/distcp-performance-improvements-in-apache-hadoop/ > This jira is to propose a variation of HDFS-8828, to find out the files > changed in target cluster since last snapshot sx, and copy these from the > source target's same snapshot sx, to restore target cluster to sx. > If a file/dir is > - renamed, rename it back > - created in target cluster, delete it > - modified, put it to the copy list > - run distcp with the copy list, copy from the source cluster's corresponding > snapshot > This could be a new command line switch -rdiff in distcp. > HDFS-4167 would still be nice to have. It just seems to me that HDFS-9820 > would hopefully be easier to implement. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10207) Support enable Hadoop IPC backoff without namenode restart
[ https://issues.apache.org/jira/browse/HDFS-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242094#comment-15242094 ] Xiaoyu Yao commented on HDFS-10207: --- One additional comment on the patch: backoff reconfiguration is added for all of the client/service/lifeline ports. This is not necessary, as we never want to back off RPC requests on the service and lifeline ports. We only need to support reconfiguring backoff for the client RPC port. > Support enable Hadoop IPC backoff without namenode restart > -- > > Key: HDFS-10207 > URL: https://issues.apache.org/jira/browse/HDFS-10207 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Xiaobing Zhou > Attachments: HDFS-10207-HDFS-9000.000.patch, > HDFS-10207-HDFS-9000.001.patch, HDFS-10207-HDFS-9000.002.patch, > HDFS-10207-HDFS-9000.003.patch > > > It will be useful to allow changing {{ipc.#port#.backoff.enable}} without a > namenode restart to protect namenode from being overloaded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10224) Implement an asynchronous DistributedFileSystem
[ https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242061#comment-15242061 ] Xiaobing Zhou commented on HDFS-10224: -- v001 is posted. FutureX is renamed to AsyncX and unit tests are added. [~szetszwo], thank you for the review. > Implement an asynchronous DistributedFileSystem > --- > > Key: HDFS-10224 > URL: https://issues.apache.org/jira/browse/HDFS-10224 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: fs >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > Attachments: HDFS-10224-HDFS-9924.000.patch, > HDFS-10224-HDFS-9924.001.patch, HDFS-10224-and-HADOOP-12909.000.patch > > > This is proposed to implement an asynchronous DistributedFileSystem based on > AsyncFileSystem APIs in HADOOP-12910. In addition, rename is implemented as > well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10224) Implement an asynchronous DistributedFileSystem
[ https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HDFS-10224: - Attachment: HDFS-10224-HDFS-9924.001.patch > Implement an asynchronous DistributedFileSystem > --- > > Key: HDFS-10224 > URL: https://issues.apache.org/jira/browse/HDFS-10224 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: fs >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > Attachments: HDFS-10224-HDFS-9924.000.patch, > HDFS-10224-HDFS-9924.001.patch, HDFS-10224-and-HADOOP-12909.000.patch > > > This is proposed to implement an asynchronous DistributedFileSystem based on > AsyncFileSystem APIs in HADOOP-12910. In addition, rename is implemented as > well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9820) Improve distcp to support efficient restore to an earlier snapshot
[ https://issues.apache.org/jira/browse/HDFS-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242030#comment-15242030 ] Yongjun Zhang commented on HDFS-9820: - Many thanks [~jingzhao]. Good discussion! 1. {quote} No. This is incorrect. We allow distcp -diff s1 .. "s2" can be done after the copy. See TestDistCpSync#testSyncWithCurrent as an example {quote} If some changes are made while we are running distcp, or afterwards but before s2 is created, then what is copied is not exactly the content of s2. Right? 2. {quote} This assumption must be verified before the new distcp. Currently we do a snapshot diff report on target (between from and ".") to check. This check cannot be dropped as in your current patch. {quote} I certainly agree that we should do the checking. I emphasized assumptions I and II in my last comment. However, since the checking can only be done at the beginning of distcp, if some changes are made before s2 is created, they will be missed by the checking. So I think we need to document that no changes should be made while we do this operation. 3. {quote} I mean "" or "." should never be used as the fromState in distcp -diff command, otherwise we have no way to verify there is no change happening on target. So we actually should use "s2" here. {quote} Then in the case HDFS-9820 tries to solve, are you suggesting we create a snapshot s2 first (for the sake of doing a check), before reverting back to s1? The issue described in #2 above also applies. 4. {quote} This is also wrong. In command line "." is the alias of the current state. {quote} I saw distcp was using {{""}}; maybe we should stick to using {{"."}}. 5. {quote} For any modification/creation happening under a renamed directory, the diff report always uses the paths before the rename (as reported by HDFS-10263). prepareDiffList changes these paths to new paths after the rename, but when applying the reverse diff, we do not need to do this. 
{quote} Renaming x in s1 to y in s2 means that x is the original name before the rename, as reported in snapshotDiff(s1, s2), where s1 is fromSS and s2 is toSS. When we look at the reversion, the rename operation becomes renaming y in s2 to x in s1, so y should be the original name before the rename, as I expect to see in the report of snapshotDiff(s2, s1), where s2 is fromSS and s1 is toSS. However, snapshotDiff(s2, s1) still uses the names in s1 as the original names (x in this case; I really expect it to be y), though it does change the order of the operands compared with snapshotDiff(s1, s2). This is the issue I reported in HDFS-10263; you can see some examples there. Basically I expect snapshotDiff(fromSS, toSS) to use the names in fromSS. In the reversion case, it's the "." state. This is the symmetry I was referring to. Does this explanation make sense? Thanks again! > Improve distcp to support efficient restore to an earlier snapshot > -- > > Key: HDFS-9820 > URL: https://issues.apache.org/jira/browse/HDFS-9820 > Project: Hadoop HDFS > Issue Type: New Feature > Components: distcp >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-9820.001.patch, HDFS-9820.002.patch, > HDFS-9820.003.patch, HDFS-9820.004.patch > > > HDFS-4167 intends to restore HDFS to the most recent snapshot, and there are > some complexity and challenges. > HDFS-7535 improved distcp performance by avoiding copying files that changed > name since last backup. > On top of HDFS-7535, HDFS-8828 improved distcp performance when copying data > from source to target cluster, by only copying changed files since last > backup. The way it works is use snapshot diff to find out all files changed, > and copy the changed files only. 
> See > https://blog.cloudera.com/blog/2015/12/distcp-performance-improvements-in-apache-hadoop/ > This jira is to propose a variation of HDFS-8828, to find out the files > changed in target cluster since last snapshot sx, and copy these from the > source target's same snapshot sx, to restore target cluster to sx. > If a file/dir is > - renamed, rename it back > - created in target cluster, delete it > - modified, put it to the copy list > - run distcp with the copy list, copy from the source cluster's corresponding > snapshot > This could be a new command line switch -rdiff in distcp. > HDFS-4167 would still be nice to have. It just seems to me that HDFS-9820 > would hopefully be easier to implement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9820) Improve distcp to support efficient restore to an earlier snapshot
[ https://issues.apache.org/jira/browse/HDFS-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241943#comment-15241943 ] Jing Zhao commented on HDFS-9820: - bq. One small correction: before we do the incremental copy, we create a snapshot s2 on source cluster first No. This is incorrect. We allow {{distcp -diff s1 .}}. "s2" can be done after the copy. See TestDistCpSync#testSyncWithCurrent as an example. bq. We assume that no changes have been made at target cluster after s1 was created before we do incremental copy in this case (assumption I) This assumption must be verified before the new distcp. Currently we do a snapshot diff report on target (between {{from}} and ".") to check. This check cannot be dropped as in your current patch. bq. Do you mean if "" ever appear as one parameter of -diff I mean "" or "." should *never* be used as the {{fromState}} in {{distcp -diff}} command, otherwise we have no way to verify there is no change happening on target. So we actually should use "s2" here. bq. Because "" is just an alias of current state "snapshot" This is also wrong. In command line "." is the alias of the current state. bq. -diff is what I feel more intuitive bq. But if this is what you prefer, we can relax the order requirement, and let "" means revert operation. Would you please confirm? What I mean is: we should always use "-diff ", but instead of using ".", we should use "s2". No change is necessary on {{DistCpOptions}}. bq. bypass DistCpSync#prepareDiffList For any modification/creation happening under a renamed directory, the diff report always uses the paths before the rename (as reported by HDFS-10263). {{prepareDiffList}} changes these paths to new paths after the rename, but when applying the reverse diff, we do not need to do this. 
> Improve distcp to support efficient restore to an earlier snapshot > -- > > Key: HDFS-9820 > URL: https://issues.apache.org/jira/browse/HDFS-9820 > Project: Hadoop HDFS > Issue Type: New Feature > Components: distcp >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-9820.001.patch, HDFS-9820.002.patch, > HDFS-9820.003.patch, HDFS-9820.004.patch > > > HDFS-4167 intends to restore HDFS to the most recent snapshot, and there are > some complexity and challenges. > HDFS-7535 improved distcp performance by avoiding copying files that changed > name since last backup. > On top of HDFS-7535, HDFS-8828 improved distcp performance when copying data > from source to target cluster, by only copying changed files since last > backup. The way it works is use snapshot diff to find out all files changed, > and copy the changed files only. > See > https://blog.cloudera.com/blog/2015/12/distcp-performance-improvements-in-apache-hadoop/ > This jira is to propose a variation of HDFS-8828, to find out the files > changed in target cluster since last snapshot sx, and copy these from the > source target's same snapshot sx, to restore target cluster to sx. > If a file/dir is > - renamed, rename it back > - created in target cluster, delete it > - modified, put it to the copy list > - run distcp with the copy list, copy from the source cluster's corresponding > snapshot > This could be a new command line switch -rdiff in distcp. > HDFS-4167 would still be nice to have. It just seems to me that HDFS-9820 > would hopefully be easier to implement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10283) o.a.h.hdfs.server.namenode.TestFSImageWithSnapshot#testSaveLoadImageWithAppending fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241932#comment-15241932 ] Mingliang Liu commented on HDFS-10283: -- The failing tests are not related, as the changes only touch the test case {{TestFSImageWithSnapshot#testSaveLoadImageWithAppending}}, which passed in the pre-commit run. I also ran it locally ~10 times and it was good. > o.a.h.hdfs.server.namenode.TestFSImageWithSnapshot#testSaveLoadImageWithAppending > fails intermittently > -- > > Key: HDFS-10283 > URL: https://issues.apache.org/jira/browse/HDFS-10283 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-10283.000.patch > > > The test fails with exception as following: > {code} > java.io.IOException: Failed to replace a bad datanode on the existing > pipeline due to no more good datanodes being available to try. (Nodes: > current=[DatanodeInfoWithStorage[127.0.0.1:47227,DS-dd109c14-79e5-4380-ac5e-4434cd7e25b5,DISK], > > DatanodeInfoWithStorage[127.0.0.1:56949,DS-6c0be75e-a78c-41b9-bfd0-7ee0cdefaa0e,DISK]], > > original=[DatanodeInfoWithStorage[127.0.0.1:47227,DS-dd109c14-79e5-4380-ac5e-4434cd7e25b5,DISK], > > DatanodeInfoWithStorage[127.0.0.1:56949,DS-6c0be75e-a78c-41b9-bfd0-7ee0cdefaa0e,DISK]]). > The current failed datanode replacement policy is DEFAULT, and a client may > configure this via > 'dfs.client.block.write.replace-datanode-on-failure.policy' in its > configuration. 
> at > org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1162) > at > org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1232) > at > org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1423) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1338) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1321) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:599) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10284) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10284: - Attachment: HDFS-10284.001.patch > o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode > fails intermittently > - > > Key: HDFS-10284 > URL: https://issues.apache.org/jira/browse/HDFS-10284 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.9.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Minor > Attachments: HDFS-10284.000.patch, HDFS-10284.001.patch > > > *Stacktrace* > {code} > org.mockito.exceptions.misusing.UnfinishedStubbingException: > Unfinished stubbing detected here: > -> at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > E.g. thenReturn() may be missing. > Examples of correct stubbing: > when(mock.isOk()).thenReturn(true); > when(mock.isOk()).thenThrow(exception); > doThrow(exception).when(mock).someVoidMethod(); > Hints: > 1. missing thenReturn() > 2. although stubbed methods may return mocks, you cannot inline mock > creation (mock()) call inside a thenReturn method (see issue 53) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > {code} > Sample failing pre-commit UT: > https://builds.apache.org/job/PreCommit-HDFS-Build/15153/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10281) o.a.h.hdfs.server.namenode.ha.TestPendingCorruptDnMessages fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241905#comment-15241905 ] Hudson commented on HDFS-10281: --- FAILURE: Integrated in Hadoop-trunk-Commit #9616 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9616/]) HDFS-10281. TestPendingCorruptDnMessages fails intermittently. (kihwal: rev b9c9d03591a49be31f3fbc738d01a31700bfdbc4) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestPendingCorruptDnMessages.java > o.a.h.hdfs.server.namenode.ha.TestPendingCorruptDnMessages fails > intermittently > --- > > Key: HDFS-10281 > URL: https://issues.apache.org/jira/browse/HDFS-10281 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.8.0 > > Attachments: HDFS-10281.000.patch, HDFS-10281.001.patch > > > In our daily UT test, we found the > {{TestPendingCorruptDnMessages#testChangedStorageId}} failed intermittently, > see following information: > *Error Message* > expected:<1> but was:<0> > *Stacktrace* > {code} > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages.getRegisteredDatanodeUid(TestPendingCorruptDnMessages.java:124) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages.testChangedStorageId(TestPendingCorruptDnMessages.java:103) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10284) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241897#comment-15241897 ] Mingliang Liu commented on HDFS-10284: -- I found the {{BlockManagerSafeMode$SafeModeMonitor#canLeave}} is not checking the {{namesystem#inTransitionToActive()}}, while it should. I think according to the fix of [HDFS-10192], we should add this check to prevent the {{smmthread}} from calling {{leaveSafeMode()}}. > o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode > fails intermittently > - > > Key: HDFS-10284 > URL: https://issues.apache.org/jira/browse/HDFS-10284 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.9.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Minor > Attachments: HDFS-10284.000.patch > > > *Stacktrace* > {code} > org.mockito.exceptions.misusing.UnfinishedStubbingException: > Unfinished stubbing detected here: > -> at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > E.g. thenReturn() may be missing. > Examples of correct stubbing: > when(mock.isOk()).thenReturn(true); > when(mock.isOk()).thenThrow(exception); > doThrow(exception).when(mock).someVoidMethod(); > Hints: > 1. missing thenReturn() > 2. although stubbed methods may return mocks, you cannot inline mock > creation (mock()) call inside a thenReturn method (see issue 53) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > {code} > Sample failing pre-commit UT: > https://builds.apache.org/job/PreCommit-HDFS-Build/15153/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
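The fix described above (the safe-mode monitor's leave check should also consult {{namesystem#inTransitionToActive()}}) reduces to a simple predicate. This is a hedged model of that logic, not the actual BlockManagerSafeMode code:

```java
public class SafeModeLeaveSketch {
    // Model of the monitor's decision: leave safe mode only when the block
    // thresholds are met AND the namesystem is no longer transitioning to
    // active (the missing check noted in the comment, per HDFS-10192).
    static boolean canLeave(boolean thresholdsMet, boolean inTransitionToActive) {
        return thresholdsMet && !inTransitionToActive;
    }

    public static void main(String[] args) {
        System.out.println(canLeave(true, false));  // true: safe to leave
        System.out.println(canLeave(true, true));   // false: still transitioning
        System.out.println(canLeave(false, false)); // false: thresholds not met
    }
}
```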
[jira] [Commented] (HDFS-9820) Improve distcp to support efficient restore to an earlier snapshot
[ https://issues.apache.org/jira/browse/HDFS-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241868#comment-15241868 ] Yongjun Zhang commented on HDFS-9820: - Thanks a lot [~jingzhao]! My thoughts to share: 1. {quote} Let's say we first have snapshot s1 on both source and target (and the source and the target have been synced). Then we make some changes on the source, do a forward incremental distcp copy to apply the changes to the target. Based on our assumption, before the next incremental copy, we will create a snapshot s2 on both the source and the target. {quote} This is HDFS-7535/HDFS-8828. One small correction: before we do the incremental copy, we create a snapshot s2 on the source cluster first, find the snapshot diff between s1 and s2, apply this diff to the target cluster, and finally create s2 on the target cluster. We assume that no changes have been made on the target cluster after s1 was created and before we do the incremental copy (*assumption I*). 2. Do you mean that if {{""}} ever appears as one parameter of {{-diff}}, then it's a revert operation, and otherwise it's a forward operation? In theory, we could copy incremental changes from the source cluster to the destination cluster without creating a new snapshot (s2 in our example). Say, after s1 is made in the source cluster, s1 is sync-ed to the target cluster, and s1 is also created in the target cluster, we could interpret {{distcp -diff s1 "" source target}} as incrementally copying the changes made after s1 in the source cluster to the target, right? Because {{""}} is just an alias for the current-state "snapshot", I personally feel it's more intuitive to rely on the parameter order, and let {{-diff s1 s2}} mean the forward change from s1 to s2 and {{-diff s2 s1}} mean the revert change from s2 to s1. Say, assume a cluster is already at state s2: if we do {{-diff s1 s2}}, it would be a no-op; if we do {{-diff s2 s1}}, it means to go back to s1. In other words, {{-diff }} is what I feel is more intuitive. 
But if this is what you prefer, we can relax the order requirement and let {{""}} mean a revert operation. Would you please confirm? And would you please let me know whether comment #1 in my previous reply makes sense to you? 3. I don't quite follow what you meant by "bypass DistCpSync#prepareDiffList". Some more details would help. Many thanks. > Improve distcp to support efficient restore to an earlier snapshot > -- > > Key: HDFS-9820 > URL: https://issues.apache.org/jira/browse/HDFS-9820 > Project: Hadoop HDFS > Issue Type: New Feature > Components: distcp >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-9820.001.patch, HDFS-9820.002.patch, > HDFS-9820.003.patch, HDFS-9820.004.patch > > > HDFS-4167 intends to restore HDFS to the most recent snapshot, and there are > some complexity and challenges. > HDFS-7535 improved distcp performance by avoiding copying files that changed > name since last backup. > On top of HDFS-7535, HDFS-8828 improved distcp performance when copying data > from source to target cluster, by only copying changed files since last > backup. The way it works is use snapshot diff to find out all files changed, > and copy the changed files only. > See > https://blog.cloudera.com/blog/2015/12/distcp-performance-improvements-in-apache-hadoop/ > This jira is to propose a variation of HDFS-8828, to find out the files > changed in target cluster since last snapshot sx, and copy these from the > source target's same snapshot sx, to restore target cluster to sx. > If a file/dir is > - renamed, rename it back > - created in target cluster, delete it > - modified, put it to the copy list > - run distcp with the copy list, copy from the source cluster's corresponding > snapshot > This could be a new command line switch -rdiff in distcp. > HDFS-4167 would still be nice to have. It just seems to me that HDFS-9820 > would hopefully be easier to implement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10281) o.a.h.hdfs.server.namenode.ha.TestPendingCorruptDnMessages fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241859#comment-15241859 ] Mingliang Liu commented on HDFS-10281: -- Thanks for the review and commit, [~kihwal]. > o.a.h.hdfs.server.namenode.ha.TestPendingCorruptDnMessages fails > intermittently > --- > > Key: HDFS-10281 > URL: https://issues.apache.org/jira/browse/HDFS-10281 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.8.0 > > Attachments: HDFS-10281.000.patch, HDFS-10281.001.patch > > > In our daily UT test, we found the > {{TestPendingCorruptDnMessages#testChangedStorageId}} failed intermittently, > see following information: > *Error Message* > expected:<1> but was:<0> > *Stacktrace* > {code} > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages.getRegisteredDatanodeUid(TestPendingCorruptDnMessages.java:124) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages.testChangedStorageId(TestPendingCorruptDnMessages.java:103) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10281) o.a.h.hdfs.server.namenode.ha.TestPendingCorruptDnMessages fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-10281: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) I've committed this to trunk, branch-2 and branch-2.8. Thanks for fixing this, [~liuml07]. > o.a.h.hdfs.server.namenode.ha.TestPendingCorruptDnMessages fails > intermittently > --- > > Key: HDFS-10281 > URL: https://issues.apache.org/jira/browse/HDFS-10281 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.8.0 > > Attachments: HDFS-10281.000.patch, HDFS-10281.001.patch > > > In our daily UT test, we found the > {{TestPendingCorruptDnMessages#testChangedStorageId}} failed intermittently, > see following information: > *Error Message* > expected:<1> but was:<0> > *Stacktrace* > {code} > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages.getRegisteredDatanodeUid(TestPendingCorruptDnMessages.java:124) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages.testChangedStorageId(TestPendingCorruptDnMessages.java:103) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9905) TestWebHdfsTimeouts fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241844#comment-15241844 ] Eric Payne commented on HDFS-9905: -- bq. java.net.SocksSocketImpl is possible to throw SocketTimeoutException with null message. We seem not to be able to expect that SocketTimeoutException always contains message such as "Read timed out" or "connect timed out". bq. Use GenericTestUtils.assertExceptionContains instead of Assert.assertEquals so that if the string doesn't match, it logs the exception. Thanks, [~iwasakims] and [~jojochuang], for your work on this issue. I don't know what would cause {{SocketTimeoutException}} to give a null message instead of the expected {{Read timed out}}. However, your point about the original stack trace being lost is a very good one: bq. the exception object was reinterpreted in the exception handling, so the original stack trace was lost. In {{WebHdfsFileSystem#AbstractRunner#runWithRetry}}, the code that recreates the exception with the node name should also propagate the stack trace:
{code}
ioe = ioe.getClass().getConstructor(String.class)
    .newInstance(node + ": " + ioe.getMessage());
{code}
should be:
{code}
IOException newIoe = ioe.getClass().getConstructor(String.class)
    .newInstance(node + ": " + ioe.getMessage());
newIoe.setStackTrace(ioe.getStackTrace());
ioe = newIoe;
{code}
I can open a separate JIRA for this if you want. > TestWebHdfsTimeouts fails occasionally > -- > > Key: HDFS-9905 > URL: https://issues.apache.org/jira/browse/HDFS-9905 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.7.3 >Reporter: Kihwal Lee >Assignee: Wei-Chiu Chuang > Attachments: HDFS-9905.001.patch > > > When checking for a timeout, it does get {{SocketTimeoutException}}, but the > message sometimes does not contain "connect timed out". Since the original > exception is not logged, we do not know details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
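A self-contained sketch of the suggested change (the {{wrapWithNode}} helper name is made up for illustration; in {{WebHdfsFileSystem}} the wrapping happens inline in {{runWithRetry}}):

```java
import java.io.IOException;

public class WrapDemo {
    // Hypothetical helper mirroring the suggested fix: rebuild the exception
    // with the node name prefixed to the message, then copy the original
    // stack trace over so the failure location is not lost.
    static IOException wrapWithNode(String node, IOException ioe) throws Exception {
        IOException newIoe = ioe.getClass().getConstructor(String.class)
            .newInstance(node + ": " + ioe.getMessage());
        newIoe.setStackTrace(ioe.getStackTrace());
        return newIoe;
    }
}
```

The reflective constructor call preserves the concrete subclass (e.g. {{SocketTimeoutException}}), and {{Throwable#setStackTrace}} carries the original frames into the new object.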
[jira] [Commented] (HDFS-10281) o.a.h.hdfs.server.namenode.ha.TestPendingCorruptDnMessages fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241836#comment-15241836 ] Kihwal Lee commented on HDFS-10281: --- The test failures don't seem to be related, as this patch only touched one test which didn't fail here. + lgtm > o.a.h.hdfs.server.namenode.ha.TestPendingCorruptDnMessages fails > intermittently > --- > > Key: HDFS-10281 > URL: https://issues.apache.org/jira/browse/HDFS-10281 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-10281.000.patch, HDFS-10281.001.patch > > > In our daily UT test, we found the > {{TestPendingCorruptDnMessages#testChangedStorageId}} failed intermittently, > see following information: > *Error Message* > expected:<1> but was:<0> > *Stacktrace* > {code} > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages.getRegisteredDatanodeUid(TestPendingCorruptDnMessages.java:124) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages.testChangedStorageId(TestPendingCorruptDnMessages.java:103) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10292) Add block id when client got Unable to close file exception
[ https://issues.apache.org/jira/browse/HDFS-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241815#comment-15241815 ] Mingliang Liu commented on HDFS-10292: -- A space between "block" and {{last}} will be better, but it's fine without it. > Add block id when client got Unable to close file exception > --- > > Key: HDFS-10292 > URL: https://issues.apache.org/jira/browse/HDFS-10292 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.2 >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Minor > Fix For: 2.8.0 > > Attachments: HDFS-10292.patch > > > Add block id when client got Unable to close file exception,, It's good to > have block id, for better debugging purpose. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10293) StripedFileTestUtil#readAll flaky
[ https://issues.apache.org/jira/browse/HDFS-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241811#comment-15241811 ] Jing Zhao commented on HDFS-10293: -- +1 pending Jenkins. > StripedFileTestUtil#readAll flaky > - > > Key: HDFS-10293 > URL: https://issues.apache.org/jira/browse/HDFS-10293 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, test >Affects Versions: 3.0.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-10293.000.patch > > > The flaky test helper method cause several UT test failing intermittently. > For example, the > {{TestDFSStripedOutputStreamWithFailure#testAddBlockWhenNoSufficientParityNumOfNodes}} > timed out in a recent run (see > [exception|https://builds.apache.org/job/PreCommit-HDFS-Build/15158/testReport/org.apache.hadoop.hdfs/TestDFSStripedOutputStreamWithFailure/testAddBlockWhenNoSufficientParityNumOfNodes/]), > which can be easily reproduced locally. > Debugging at the code, chances are that the helper method is stuck in an > infinite loop. We need a fix to make the test robust. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10281) o.a.h.hdfs.server.namenode.ha.TestPendingCorruptDnMessages fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241808#comment-15241808 ] Hadoop QA commented on HDFS-10281: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 36s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 10s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 26s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 22s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 1s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 3s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 1m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 16s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 1s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 43s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 78m 32s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 22s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 119m 55s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.TestReadStripedFileWithMissingBlocks | | | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | | | hadoop.hdfs.server.datanode.TestDataNodeUUID | | | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | | hadoop.hdfs.security.TestDelegationTokenForProxyUser | | | hadoop.hdfs.TestFileAppend | | | hadoop.hdfs.TestDFSUpgradeFromImage | | | hadoop.hdfs.server.datanode.TestDataNodeMetrics | | JDK v1.8.0_77 Timed out junit tests | org.apache.hadoop.hdfs.TestDFSClientRetries | | | org.apache.hadoop.hdfs.TestLeaseRecovery | | |
[jira] [Commented] (HDFS-10284) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241806#comment-15241806 ] Mingliang Liu commented on HDFS-10284: -- [~brahmareddy], would you review this for me? Thanks. > o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode > fails intermittently > - > > Key: HDFS-10284 > URL: https://issues.apache.org/jira/browse/HDFS-10284 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.9.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Minor > Attachments: HDFS-10284.000.patch > > > *Stacktrace* > {code} > org.mockito.exceptions.misusing.UnfinishedStubbingException: > Unfinished stubbing detected here: > -> at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > E.g. thenReturn() may be missing. > Examples of correct stubbing: > when(mock.isOk()).thenReturn(true); > when(mock.isOk()).thenThrow(exception); > doThrow(exception).when(mock).someVoidMethod(); > Hints: > 1. missing thenReturn() > 2. although stubbed methods may return mocks, you cannot inline mock > creation (mock()) call inside a thenReturn method (see issue 53) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > {code} > Sample failing pre-commit UT: > https://builds.apache.org/job/PreCommit-HDFS-Build/15153/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10281) o.a.h.hdfs.server.namenode.ha.TestPendingCorruptDnMessages fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241802#comment-15241802 ] Mingliang Liu commented on HDFS-10281: -- Thanks for studying the pre-commit build issue. Let's wait for the test result. I ran it ~10 times locally and it was good. > o.a.h.hdfs.server.namenode.ha.TestPendingCorruptDnMessages fails > intermittently > --- > > Key: HDFS-10281 > URL: https://issues.apache.org/jira/browse/HDFS-10281 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-10281.000.patch, HDFS-10281.001.patch > > > In our daily UT test, we found the > {{TestPendingCorruptDnMessages#testChangedStorageId}} failed intermittently, > see following information: > *Error Message* > expected:<1> but was:<0> > *Stacktrace* > {code} > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages.getRegisteredDatanodeUid(TestPendingCorruptDnMessages.java:124) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages.testChangedStorageId(TestPendingCorruptDnMessages.java:103) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10281) o.a.h.hdfs.server.namenode.ha.TestPendingCorruptDnMessages fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241781#comment-15241781 ] Kihwal Lee commented on HDFS-10281: --- https://builds.apache.org/job/PreCommit-HDFS-Build/15143/console It did try to build first time but failed. {noformat} error: pathspec 'trunk' did not match any file(s) known to git. ERROR: git checkout --force trunk is failing {noformat} This one seems to be working okay.. https://builds.apache.org/job/PreCommit-HDFS-Build/15166/console > o.a.h.hdfs.server.namenode.ha.TestPendingCorruptDnMessages fails > intermittently > --- > > Key: HDFS-10281 > URL: https://issues.apache.org/jira/browse/HDFS-10281 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-10281.000.patch, HDFS-10281.001.patch > > > In our daily UT test, we found the > {{TestPendingCorruptDnMessages#testChangedStorageId}} failed intermittently, > see following information: > *Error Message* > expected:<1> but was:<0> > *Stacktrace* > {code} > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages.getRegisteredDatanodeUid(TestPendingCorruptDnMessages.java:124) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages.testChangedStorageId(TestPendingCorruptDnMessages.java:103) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10292) Add block id when client got Unable to close file exception
[ https://issues.apache.org/jira/browse/HDFS-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241778#comment-15241778 ] Hudson commented on HDFS-10292: --- FAILURE: Integrated in Hadoop-trunk-Commit #9615 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9615/]) HDFS-10292. Add block id when client got Unable to close file exception. (kihwal: rev 2c155afe2736a5571bbb3bdfb2fe6f9709227229) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java > Add block id when client got Unable to close file exception > --- > > Key: HDFS-10292 > URL: https://issues.apache.org/jira/browse/HDFS-10292 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.2 >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Minor > Fix For: 2.8.0 > > Attachments: HDFS-10292.patch > > > Add block id when client got Unable to close file exception,, It's good to > have block id, for better debugging purpose. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10292) Add block id when client got Unable to close file exception
[ https://issues.apache.org/jira/browse/HDFS-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-10292: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed it to trunk, branch-2 and branch-2.8. Thanks for the patch, [~brahmareddy]. > Add block id when client got Unable to close file exception > --- > > Key: HDFS-10292 > URL: https://issues.apache.org/jira/browse/HDFS-10292 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.2 >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Minor > Fix For: 2.8.0 > > Attachments: HDFS-10292.patch > > > Add block id when client got Unable to close file exception,, It's good to > have block id, for better debugging purpose. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10292) Add block id when client got Unable to close file exception
[ https://issues.apache.org/jira/browse/HDFS-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241761#comment-15241761 ] Kihwal Lee commented on HDFS-10292: --- +1 lgtm. I will commit it shortly. > Add block id when client got Unable to close file exception > --- > > Key: HDFS-10292 > URL: https://issues.apache.org/jira/browse/HDFS-10292 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.2 >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Minor > Attachments: HDFS-10292.patch > > > Add block id when client got Unable to close file exception,, It's good to > have block id, for better debugging purpose. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10293) StripedFileTestUtil#readAll flaky
[ https://issues.apache.org/jira/browse/HDFS-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10293: - Attachment: HDFS-10293.000.patch The code is as follows:
{code}
static int readAll(FSDataInputStream in, byte[] buf) throws IOException {
  int readLen = 0;
  int ret;
  while ((ret = in.read(buf, readLen, buf.length - readLen)) >= 0
      && readLen <= buf.length) {
    readLen += ret;
  }
  return readLen;
}
{code}
If {{readLen}} equals {{buf.length}}, then {{buf.length - readLen}} is zero, and {{in.read()}} simply returns zero without reading from the stream. In this case, no exception is thrown, and the code is stuck in the while-loop. One possible fix is to tighten the condition to {{(ret = in.read(buf, readLen, buf.length - readLen)) > 0 && readLen < buf.length}}. A probably better fix is to use {{IOUtils.readFully()}}, which throws an IOException if it reads a premature EOF from the input stream; see the v0 patch. > StripedFileTestUtil#readAll flaky > - > > Key: HDFS-10293 > URL: https://issues.apache.org/jira/browse/HDFS-10293 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, test >Affects Versions: 3.0.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-10293.000.patch > > > The flaky test helper method cause several UT test failing intermittently. > For example, the > {{TestDFSStripedOutputStreamWithFailure#testAddBlockWhenNoSufficientParityNumOfNodes}} > timed out in a recent run (see > [exception|https://builds.apache.org/job/PreCommit-HDFS-Build/15158/testReport/org.apache.hadoop.hdfs/TestDFSStripedOutputStreamWithFailure/testAddBlockWhenNoSufficientParityNumOfNodes/]), > which can be easily reproduced locally. > Debugging at the code, chances are that the helper method is stuck in an > infinite loop. We need a fix to make the test robust. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
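The hang can be reproduced against a plain {{java.io.InputStream}}: per the {{InputStream}} contract, {{read(buf, off, 0)}} returns 0 (not -1), even at EOF, so the condition {{ret >= 0}} never becomes false once the buffer is full. A minimal sketch with an iteration cap standing in for the infinite loop, alongside a tightened variant:

```java
import java.io.IOException;
import java.io.InputStream;

public class ReadAllDemo {
    // Same loop shape as the flaky helper: once readLen == buf.length,
    // read(buf, readLen, 0) keeps returning 0 (never -1), so ret >= 0 stays
    // true and the loop spins forever. The cap makes the spin observable.
    static int buggyReadAll(InputStream in, byte[] buf, int maxIters) throws IOException {
        int readLen = 0;
        int ret;
        int iters = 0;
        while ((ret = in.read(buf, readLen, buf.length - readLen)) >= 0
                && readLen <= buf.length) {
            readLen += ret;
            if (++iters >= maxIters) {
                return -1; // would never terminate without this guard
            }
        }
        return readLen;
    }

    // Tightened variant: stop once the buffer is full or EOF is reached.
    static int fixedReadAll(InputStream in, byte[] buf) throws IOException {
        int readLen = 0;
        int ret;
        while (readLen < buf.length
                && (ret = in.read(buf, readLen, buf.length - readLen)) > 0) {
            readLen += ret;
        }
        return readLen;
    }
}
```

Note the tightened loop silently returns a short count on premature EOF, whereas {{IOUtils.readFully()}} in the v0 patch surfaces it as an IOException, which is better for a test helper.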
[jira] [Updated] (HDFS-10293) StripedFileTestUtil#readAll flaky
[ https://issues.apache.org/jira/browse/HDFS-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10293: - Status: Patch Available (was: Open) > StripedFileTestUtil#readAll flaky > - > > Key: HDFS-10293 > URL: https://issues.apache.org/jira/browse/HDFS-10293 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, test >Affects Versions: 3.0.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-10293.000.patch > > > The flaky test helper method cause several UT test failing intermittently. > For example, the > {{TestDFSStripedOutputStreamWithFailure#testAddBlockWhenNoSufficientParityNumOfNodes}} > timed out in a recent run (see > [exception|https://builds.apache.org/job/PreCommit-HDFS-Build/15158/testReport/org.apache.hadoop.hdfs/TestDFSStripedOutputStreamWithFailure/testAddBlockWhenNoSufficientParityNumOfNodes/]), > which can be easily reproduced locally. > Debugging at the code, chances are that the helper method is stuck in an > infinite loop. We need a fix to make the test robust. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10293) StripedFileTestUtil#readAll flaky
[ https://issues.apache.org/jira/browse/HDFS-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10293: - Issue Type: Sub-task (was: Bug) Parent: HDFS-8031 > StripedFileTestUtil#readAll flaky > - > > Key: HDFS-10293 > URL: https://issues.apache.org/jira/browse/HDFS-10293 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, test >Affects Versions: 3.0.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > > The flaky test helper method cause several UT test failing intermittently. > For example, the > {{TestDFSStripedOutputStreamWithFailure#testAddBlockWhenNoSufficientParityNumOfNodes}} > timed out in a recent run (see > [exception|https://builds.apache.org/job/PreCommit-HDFS-Build/15158/testReport/org.apache.hadoop.hdfs/TestDFSStripedOutputStreamWithFailure/testAddBlockWhenNoSufficientParityNumOfNodes/]), > which can be easily reproduced locally. > Debugging at the code, chances are that the helper method is stuck in an > infinite loop. We need a fix to make the test robust. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-10293) StripedFileTestUtil#readAll flaky
Mingliang Liu created HDFS-10293: Summary: StripedFileTestUtil#readAll flaky Key: HDFS-10293 URL: https://issues.apache.org/jira/browse/HDFS-10293 Project: Hadoop HDFS Issue Type: Bug Components: erasure-coding, test Affects Versions: 3.0.0 Reporter: Mingliang Liu Assignee: Mingliang Liu The flaky test helper method causes several UTs to fail intermittently. For example, the {{TestDFSStripedOutputStreamWithFailure#testAddBlockWhenNoSufficientParityNumOfNodes}} timed out in a recent run (see [exception|https://builds.apache.org/job/PreCommit-HDFS-Build/15158/testReport/org.apache.hadoop.hdfs/TestDFSStripedOutputStreamWithFailure/testAddBlockWhenNoSufficientParityNumOfNodes/]), which can be easily reproduced locally. Debugging the code, chances are that the helper method is stuck in an infinite loop. We need a fix to make the test robust. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10207) Support enable Hadoop IPC backoff without namenode restart
[ https://issues.apache.org/jira/browse/HDFS-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241716#comment-15241716 ] Xiaoyu Yao commented on HDFS-10207: --- [~xiaobingo], can you rebase the patch on trunk, as it won't apply now? Also check whether testNameNodeGetReconfigurableProperties needs to be updated with the new reconfigurable property. Thanks! > Support enable Hadoop IPC backoff without namenode restart > -- > > Key: HDFS-10207 > URL: https://issues.apache.org/jira/browse/HDFS-10207 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Xiaobing Zhou > Attachments: HDFS-10207-HDFS-9000.000.patch, > HDFS-10207-HDFS-9000.001.patch, HDFS-10207-HDFS-9000.002.patch, > HDFS-10207-HDFS-9000.003.patch > > > It will be useful to allow changing {{ipc.#port#.backoff.enable}} without a > namenode restart to protect namenode from being overloaded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10263) Provide symmetric entries in reversed snapshot diff report
[ https://issues.apache.org/jira/browse/HDFS-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241707#comment-15241707 ] John Zhuge commented on HDFS-10263: --- Thanks [~yzhangal] for the great report. {{SnapshotDiffReport}} API should be stable and well-documented, not only for {{distcp -diff}}, but also for other polyglot applications, in order for it to be better adopted. It may be a good starting point to beef up {{TestSnapshotDiffReport}} with a complete set of behavior based unit tests. > Provide symmetric entries in reversed snapshot diff report > -- > > Key: HDFS-10263 > URL: https://issues.apache.org/jira/browse/HDFS-10263 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode, snapshots >Reporter: Yongjun Zhang >Assignee: Jing Zhao > > Steps to reproduce: > 1. Take a snapshot s1 at: > {code} > drwxr-xr-x - yzhang supergroup 0 2016-04-05 14:48 /target/bar > -rw-r--r-- 1 yzhang supergroup 1024 2016-04-05 14:48 /target/bar/f1 > drwxr-xr-x - yzhang supergroup 0 2016-04-05 14:48 /target/foo > -rw-r--r-- 1 yzhang supergroup 1024 2016-04-05 14:48 /target/foo/f1 > {code} > 2. 
Make the following change: > {code} > private int changeData7(Path dir) throws Exception { > final Path foo = new Path(dir, "foo"); > final Path foo2 = new Path(dir, "foo2"); > final Path foo_f1 = new Path(foo, "f1"); > final Path foo2_f2 = new Path(foo2, "f2"); > final Path foo2_f1 = new Path(foo2, "f1"); > final Path foo_d1 = new Path(foo, "d1"); > final Path foo_d1_f3 = new Path(foo_d1, "f3"); > int numDeletedAndModified = 0; > dfs.rename(foo, foo2); > dfs.delete(foo2_f1, true); > > DFSTestUtil.createFile(dfs, foo_f1, BLOCK_SIZE, DATA_NUM, 0L); > DFSTestUtil.appendFile(dfs, foo_f1, (int) BLOCK_SIZE); > dfs.rename(foo_f1, foo2_f2); > numDeletedAndModified += 1; // "M ./foo" > DFSTestUtil.createFile(dfs, foo_d1_f3, BLOCK_SIZE, DATA_NUM, 0L); > return numDeletedAndModified; > } > {code} > that results in > {code} > drwxr-xr-x - yzhang supergroup 0 2016-04-05 14:48 /target/bar > -rw-r--r-- 1 yzhang supergroup 1024 2016-04-05 14:48 /target/bar/f1 > drwxr-xr-x - yzhang supergroup 0 2016-04-05 14:48 /target/foo > drwxr-xr-x - yzhang supergroup 0 2016-04-05 14:48 /target/foo/d1 > -rw-r--r-- 1 yzhang supergroup 1024 2016-04-05 14:48 /target/foo/d1/f3 > drwxr-xr-x - yzhang supergroup 0 2016-04-05 14:48 /target/foo2 > -rw-r--r-- 1 yzhang supergroup 2048 2016-04-05 14:48 /target/foo2/f2 > {code} > 3. take snapshot s2 here > 4. 
Do the following to revert the change done in step 2: > {code} > private int revertChangeData7(Path dir) throws Exception { > final Path foo = new Path(dir, "foo"); > final Path foo2 = new Path(dir, "foo2"); > final Path foo_f1 = new Path(foo, "f1"); > final Path foo2_f2 = new Path(foo2, "f2"); > final Path foo2_f1 = new Path(foo2, "f1"); > final Path foo_d1 = new Path(foo, "d1"); > final Path foo_d1_f3 = new Path(foo_d1, "f3"); > int numDeletedAndModified = 0; > > dfs.delete(foo_d1, true); > dfs.rename(foo2_f2, foo_f1); > > dfs.delete(foo, true); > > DFSTestUtil.createFile(dfs, foo2_f1, BLOCK_SIZE, DATA_NUM, 0L); > DFSTestUtil.appendFile(dfs, foo2_f1, (int) BLOCK_SIZE); > dfs.rename(foo2, foo); > > return numDeletedAndModified; > } > {code} > which gives the following results: > {code} > drwxr-xr-x - yzhang supergroup 0 2016-04-05 14:48 /target/bar > -rw-r--r-- 1 yzhang supergroup 1024 2016-04-05 14:48 /target/bar/f1 > drwxr-xr-x - yzhang supergroup 0 2016-04-05 14:48 /target/foo > -rw-r--r-- 1 yzhang supergroup 2048 2016-04-05 14:48 /target/foo/f1 > {code} > 5. Take snapshot s3 here. > Below are the diffs between the snapshots: > {code} > s1-s2: Difference between snapshot s1 and snapshot s2 under directory /target: > M . > + ./foo > R ./foo -> ./foo2 > M ./foo > + ./foo/f2 > - ./foo/f1 > s2-s1: Difference between snapshot s2 and snapshot s1 under directory /target: > M . > - ./foo > R ./foo2 -> ./foo > M ./foo > - ./foo/f2 > + ./foo/f1 > s2-s3: Difference between snapshot s2 and snapshot s3 under directory /target: > M . > - ./foo > R ./foo2 -> ./foo > M ./foo2 > + ./foo2/f1 > - ./foo2/f2 > s3-s2: Difference between snapshot s3 and snapshot s2 under directory /target: > M . > + ./foo > R ./foo -> ./foo2 > M ./foo2 > - ./foo2/f1 > + ./foo2/f2 > {code} > The s2-s1 diff is supposed to be the same as s2-s3, because the change > from s2 to s3 is an exact reversion of the change from s1 to s2. We can see >
[jira] [Commented] (HDFS-7499) Add NFSv4 + Kerberos / client authentication support
[ https://issues.apache.org/jira/browse/HDFS-7499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241677#comment-15241677 ] John Zhuge commented on HDFS-7499: -- Wonder whether we can add an HDFS based layout type for [pNFS|http://tools.ietf.org/html/rfc5661#section-12], similar to [Object-Based Parallel NFS (pNFS) Operations|http://tools.ietf.org/html/rfc5664]. The storage protocol can be a C/C++ based {{DFSClient}}. > Add NFSv4 + Kerberos / client authentication support > > > Key: HDFS-7499 > URL: https://issues.apache.org/jira/browse/HDFS-7499 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 2.4.0 > Environment: HDP2.1 >Reporter: Hari Sekhon > > We have a requirement for secure file share access to HDFS on a kerberized > cluster. > This is spun off from HDFS-7488 where adding Kerberos to the front end client > was considered, I believe this would require NFSv4 support? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10286) Fix TestDFSAdmin#testNameNodeGetReconfigurableProperties
[ https://issues.apache.org/jira/browse/HDFS-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241668#comment-15241668 ] Hudson commented on HDFS-10286: --- FAILURE: Integrated in Hadoop-trunk-Commit #9613 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9613/]) HDFS-10286. Fix TestDFSAdmin#testNameNodeGetReconfigurableProperties. (xyao: rev 809226752dd109e16956038017dece16ada6ee0f) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdmin.java > Fix TestDFSAdmin#testNameNodeGetReconfigurableProperties > > > Key: HDFS-10286 > URL: https://issues.apache.org/jira/browse/HDFS-10286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Xiaobing Zhou > Fix For: 2.9.0 > > Attachments: HDFS-10286.000.patch > > > HDFS-10209 introduced a new reconfigurable property, which requires an > update to the validation in > TestDFSAdmin#testNameNodeGetReconfigurableProperties. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10286) Fix TestDFSAdmin#testNameNodeGetReconfigurableProperties
[ https://issues.apache.org/jira/browse/HDFS-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-10286: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.9.0 Status: Resolved (was: Patch Available) Thanks [~xiaobingo] for the contribution. I committed the patch to trunk and branch-2. > Fix TestDFSAdmin#testNameNodeGetReconfigurableProperties > > > Key: HDFS-10286 > URL: https://issues.apache.org/jira/browse/HDFS-10286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Xiaobing Zhou > Fix For: 2.9.0 > > Attachments: HDFS-10286.000.patch > > > HDFS-10209 introduced a new reconfigurable property, which requires an > update to the validation in > TestDFSAdmin#testNameNodeGetReconfigurableProperties. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9820) Improve distcp to support efficient restore to an earlier snapshot
[ https://issues.apache.org/jira/browse/HDFS-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241643#comment-15241643 ] Jing Zhao commented on HDFS-9820: - Let's say we first have snapshot s1 on both the source and the target (and the source and the target have been synced). Then we make some changes on the source, and do a forward incremental distcp copy to apply the changes to the target. Based on our assumption, before the next incremental copy, we will create a snapshot s2 on both the source and the target. Let's say at this time you want to restore the target back to s1. Theoretically we only need to do "distcp -diff s2 s1", where s2 is the {{from}} snapshot and s1 is the {{to}} snapshot. Note there is no diff between the current state and s2 on the target. Only after verifying this can we continue the incremental distcp. Because of the lack of HDFS-10263, we need to make slight changes when applying the reversed diff, i.e., to bypass {{DistCpSync#prepareDiffList}}. This requires the distcp tool to understand that s2 is actually after s1, and we can call {{getListing}} against {{path-of-snapshottable-dir/.snapshot}} to achieve this. We should allow users to pass in two snapshots in any order. The only restriction here is that the {{from}} snapshot must also exist in the target cluster and there is no difference between this snapshot and the current state in the target cluster. > Improve distcp to support efficient restore to an earlier snapshot > -- > > Key: HDFS-9820 > URL: https://issues.apache.org/jira/browse/HDFS-9820 > Project: Hadoop HDFS > Issue Type: New Feature > Components: distcp >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-9820.001.patch, HDFS-9820.002.patch, > HDFS-9820.003.patch, HDFS-9820.004.patch > > > HDFS-4167 intends to restore HDFS to the most recent snapshot, and there are > some complexities and challenges. 
> HDFS-7535 improved distcp performance by avoiding copying files that changed > name since last backup. > On top of HDFS-7535, HDFS-8828 improved distcp performance when copying data > from source to target cluster, by only copying changed files since last > backup. The way it works is to use snapshot diff to find out all changed files, > and copy only the changed files. > See > https://blog.cloudera.com/blog/2015/12/distcp-performance-improvements-in-apache-hadoop/ > This jira is to propose a variation of HDFS-8828, to find out the files > changed in the target cluster since last snapshot sx, and copy these from the > same snapshot sx on the source cluster, to restore the target cluster to sx. > If a file/dir is > - renamed, rename it back > - created in target cluster, delete it > - modified, put it to the copy list > - run distcp with the copy list, copy from the source cluster's corresponding > snapshot > This could be a new command line switch -rdiff in distcp. > HDFS-4167 would still be nice to have. It just seems to me that HDFS-9820 > would hopefully be easier to implement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
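The snapshot-ordering check described in the comment above (list {{path-of-snapshottable-dir/.snapshot}} to learn which of two snapshots was taken first) can be sketched with local files. Everything below is illustrative: {{SnapshotOrderSketch}} and {{earlierSnapshot}} are hypothetical names, and {{java.io.File}} stands in for {{FileSystem#listStatus}} on the {{.snapshot}} directory:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.Comparator;

public class SnapshotOrderSketch {
    // Returns whichever of the two named snapshots was created first,
    // judged by listing the snapshot directory and sorting by mtime.
    static String earlierSnapshot(File dotSnapshotDir, String a, String b) {
        File[] entries = dotSnapshotDir.listFiles();
        Arrays.sort(entries, Comparator.comparingLong(File::lastModified));
        for (File f : entries) {
            if (f.getName().equals(a) || f.getName().equals(b)) {
                return f.getName();  // first match in mtime order is the older one
            }
        }
        throw new IllegalArgumentException("snapshots not found");
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("dot-snapshot").toFile();
        File s1 = new File(dir, "s1");
        File s2 = new File(dir, "s2");
        s1.mkdir();
        s2.mkdir();
        s1.setLastModified(1_000_000L);  // s1 taken first
        s2.setLastModified(2_000_000L);
        assert earlierSnapshot(dir, "s1", "s2").equals("s1");
    }
}
```

With an ordering check like this, the tool can accept the two snapshot names in either order and decide for itself whether it is applying a forward or a reversed diff.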
[jira] [Commented] (HDFS-10287) MiniDFSCluster should implement AutoCloseable
[ https://issues.apache.org/jira/browse/HDFS-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241634#comment-15241634 ] Mingliang Liu commented on HDFS-10287: -- This will be a good improvement and users of {{MiniDFSCluster}} will be grateful. If backward compatibility is a concern and {{close()}} is idempotent (seems true), implementing {{Closeable}} can be an alternative. > MiniDFSCluster should implement AutoCloseable > - > > Key: HDFS-10287 > URL: https://issues.apache.org/jira/browse/HDFS-10287 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 2.7.0 >Reporter: John Zhuge >Assignee: John Zhuge >Priority: Trivial > > {{MiniDFSCluster}} should implement {{AutoCloseable}} in order to support > [try-with-resources|https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html]. > It will make test code a little cleaner and more reliable. > Since {{AutoCloseable}} is only in Java 1.7 or later, this can not be > backported to Hadoop version prior to 2.7. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
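The try-with-resources usage motivating this request can be sketched with a stand-in resource. {{FakeCluster}} below is hypothetical; {{MiniDFSCluster}} does not yet implement {{AutoCloseable}}, which is exactly what the JIRA proposes:

```java
public class AutoCloseSketch {
    // Hypothetical stand-in for a MiniDFSCluster that implements AutoCloseable;
    // close() would delegate to shutdown() in the real class.
    static class FakeCluster implements AutoCloseable {
        boolean running = true;
        @Override public void close() { running = false; }
    }

    public static void main(String[] args) {
        FakeCluster observed;
        try (FakeCluster cluster = new FakeCluster()) {
            observed = cluster;
            // a test body would use the running cluster here
        }
        // close() ran automatically, even if the body had thrown
        assert !observed.running;
    }
}
```

This is why test code gets cleaner and more reliable: the cluster is shut down on every exit path without an explicit try/finally.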
[jira] [Commented] (HDFS-10216) distcp -diff relative path exception
[ https://issues.apache.org/jira/browse/HDFS-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241630#comment-15241630 ] Hudson commented on HDFS-10216: --- FAILURE: Integrated in Hadoop-trunk-Commit #9612 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9612/]) HDFS-10216. Distcp -diff throws exception when handling relative path. (jing9: rev 404f57f328b00a42ec8b952ad08cd7a80207c7f2) * hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java * hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java > distcp -diff relative path exception > > > Key: HDFS-10216 > URL: https://issues.apache.org/jira/browse/HDFS-10216 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 2.8.0 >Reporter: John Zhuge >Assignee: Takashi Ohnishi > Fix For: 2.9.0 > > Attachments: HDFS-10216.1.patch, HDFS-10216.2.patch, > HDFS-10216.3.patch, HDFS-10216.4.patch > > > Got this exception when running {{distcp -diff}} with relative paths: > {code} > $ hadoop distcp -update -diff s1 s2 d1 d2 > 16/03/25 09:45:40 INFO tools.DistCp: Input Options: > DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, > ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', > copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[d1], > targetPath=d2, targetPathExists=true, preserveRawXattrs=false, > filtersFile='null'} > 16/03/25 09:45:40 INFO client.RMProxy: Connecting to ResourceManager at > jzhuge-balancer-1.vpc.cloudera.com/172.26.21.70:8032 > 16/03/25 09:45:41 ERROR tools.DistCp: Exception encountered > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at org.apache.hadoop.fs.Path.initialize(Path.java:206) > at org.apache.hadoop.fs.Path.(Path.java:197) > at > 
org.apache.hadoop.tools.SimpleCopyListing.getPathWithSchemeAndAuthority(SimpleCopyListing.java:193) > at > org.apache.hadoop.tools.SimpleCopyListing.addToFileListing(SimpleCopyListing.java:202) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListingWithSnapshotDiff(SimpleCopyListing.java:243) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:172) > at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86) > at > org.apache.hadoop.tools.DistCp.createInputFileListingWithDiff(DistCp.java:388) > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:164) > at org.apache.hadoop.tools.DistCp.run(DistCp.java:123) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.tools.DistCp.main(DistCp.java:436) > Caused by: java.net.URISyntaxException: Relative path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at java.net.URI.checkPath(URI.java:1804) > at java.net.URI.(URI.java:752) > at org.apache.hadoop.fs.Path.initialize(Path.java:203) > ... 11 more > {code} > But these commands worked: > * Absolute path: {{hadoop distcp -update -diff s1 s2 /user/systest/d1 > /user/systest/d2}} > * No {{-diff}}: {{hadoop distcp -update d1 d2}} > However, everything was fine when I ran {{hadoop distcp -update -diff s1 s2 > d1 d2}} again. I am not sure the problem only exists with option {{-diff}}. > Trying to reproduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
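The root cause in the stack trace is visible with plain {{java.net.URI}}: an absolute URI (scheme plus authority) whose path does not start with '/' is rejected by {{URI.checkPath}}, which is what happens when a relative Path like {{d1}} is glued onto {{hdfs://host:8020}}. A minimal reproduction, with a made-up host:port:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class RelativeUriSketch {
    public static void main(String[] args) {
        boolean rejected = false;
        try {
            // Path "./d1/.snapshot/s2" is relative, so this absolute URI is invalid,
            // mirroring the "hdfs://...:8020./d1/.snapshot/s2" in the stack trace.
            new URI("hdfs", "host:8020", "./d1/.snapshot/s2", null, null);
        } catch (URISyntaxException e) {
            rejected = e.getMessage().contains("Relative path in absolute URI");
        }
        assert rejected;
    }
}
```

This also explains why the absolute-path invocation worked: a path starting with '/' produces a valid URI, so the listing code never hits {{checkPath}}'s rejection.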
[jira] [Commented] (HDFS-10280) Document new dfsadmin command -evictWriters
[ https://issues.apache.org/jira/browse/HDFS-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241629#comment-15241629 ] Hudson commented on HDFS-10280: --- FAILURE: Integrated in Hadoop-trunk-Commit #9612 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9612/]) HDFS-10280. Document new dfsadmin command -evictWriters. Contributed by (kihwal: rev c970f1d00525e4273075cff7406dcbd71305abd5) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md > Document new dfsadmin command -evictWriters > --- > > Key: HDFS-10280 > URL: https://issues.apache.org/jira/browse/HDFS-10280 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 2.8.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: supportability > Fix For: 2.8.0 > > Attachments: HDFS-10280.001.patch > > > HDFS-9945 added a new dfsadmin command -evictWriters, which is great. > However I noticed typing {{dfs dfsadmin}} does not show a command line help > summary. It is shown only when I type {{dfs dfsadmin -help}}. > Also, it would be great to document it in {{HDFS Commands Guide}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10281) o.a.h.hdfs.server.namenode.ha.TestPendingCorruptDnMessages fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10281: - Attachment: HDFS-10281.001.patch Thanks [~jnp] for reviewing. Re-upload the patch for triggering Jenkins. No changes in the patch. > o.a.h.hdfs.server.namenode.ha.TestPendingCorruptDnMessages fails > intermittently > --- > > Key: HDFS-10281 > URL: https://issues.apache.org/jira/browse/HDFS-10281 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-10281.000.patch, HDFS-10281.001.patch > > > In our daily UT test, we found the > {{TestPendingCorruptDnMessages#testChangedStorageId}} failed intermittently, > see following information: > *Error Message* > expected:<1> but was:<0> > *Stacktrace* > {code} > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages.getRegisteredDatanodeUid(TestPendingCorruptDnMessages.java:124) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages.testChangedStorageId(TestPendingCorruptDnMessages.java:103) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9905) TestWebHdfsTimeouts fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241612#comment-15241612 ] Masatake Iwasaki commented on HDFS-9905: ... s/TestWebHdfsTimeouts#runWithRetry/WebHdfsFileSystem#runWithRetry/ > TestWebHdfsTimeouts fails occasionally > -- > > Key: HDFS-9905 > URL: https://issues.apache.org/jira/browse/HDFS-9905 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.7.3 >Reporter: Kihwal Lee >Assignee: Wei-Chiu Chuang > Attachments: HDFS-9905.001.patch > > > When checking for a timeout, it does get {{SocketTimeoutException}}, but the > message sometimes does not contain "connect timed out". Since the original > exception is not logged, we do not know details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10280) Document new dfsadmin command -evictWriters
[ https://issues.apache.org/jira/browse/HDFS-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-10280: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed to trunk, branch-2 and branch-2.8. Thanks for reporting and fixing this, [~jojochuang]. > Document new dfsadmin command -evictWriters > --- > > Key: HDFS-10280 > URL: https://issues.apache.org/jira/browse/HDFS-10280 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 2.8.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: supportability > Fix For: 2.8.0 > > Attachments: HDFS-10280.001.patch > > > HDFS-9945 added a new dfsadmin command -evictWriters, which is great. > However I noticed typing {{dfs dfsadmin}} does not show a command line help > summary. It is shown only when I type {{dfs dfsadmin -help}}. > Also, it would be great to document it in {{HDFS Commands Guide}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9905) TestWebHdfsTimeouts fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241602#comment-15241602 ] Masatake Iwasaki commented on HDFS-9905: s/TestWebHdfsTimeouts#runWithRetry/WebHdfsTimeouts#runWithRetry/ > TestWebHdfsTimeouts fails occasionally > -- > > Key: HDFS-9905 > URL: https://issues.apache.org/jira/browse/HDFS-9905 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.7.3 >Reporter: Kihwal Lee >Assignee: Wei-Chiu Chuang > Attachments: HDFS-9905.001.patch > > > When checking for a timeout, it does get {{SocketTimeoutException}}, but the > message sometimes does not contain "connect timed out". Since the original > exception is not logged, we do not know details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10280) Document new dfsadmin command -evictWriters
[ https://issues.apache.org/jira/browse/HDFS-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241584#comment-15241584 ] Kihwal Lee commented on HDFS-10280: --- +1 lgtm > Document new dfsadmin command -evictWriters > --- > > Key: HDFS-10280 > URL: https://issues.apache.org/jira/browse/HDFS-10280 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 2.8.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: supportability > Attachments: HDFS-10280.001.patch > > > HDFS-9945 added a new dfsadmin command -evictWriters, which is great. > However I noticed typing {{dfs dfsadmin}} does not show a command line help > summary. It is shown only when I type {{dfs dfsadmin -help}}. > Also, it would be great to document it in {{HDFS Commands Guide}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9905) TestWebHdfsTimeouts fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241574#comment-15241574 ] Kihwal Lee commented on HDFS-9905: -- [~eepayne], do you by any chance have any idea about this test failures? > TestWebHdfsTimeouts fails occasionally > -- > > Key: HDFS-9905 > URL: https://issues.apache.org/jira/browse/HDFS-9905 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.7.3 >Reporter: Kihwal Lee >Assignee: Wei-Chiu Chuang > Attachments: HDFS-9905.001.patch > > > When checking for a timeout, it does get {{SocketTimeoutException}}, but the > message sometimes does not contain "connect timed out". Since the original > exception is not logged, we do not know details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10216) distcp -diff relative path exception
[ https://issues.apache.org/jira/browse/HDFS-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-10216: - Priority: Major (was: Critical) > distcp -diff relative path exception > > > Key: HDFS-10216 > URL: https://issues.apache.org/jira/browse/HDFS-10216 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 2.8.0 >Reporter: John Zhuge >Assignee: Takashi Ohnishi > Fix For: 2.9.0 > > Attachments: HDFS-10216.1.patch, HDFS-10216.2.patch, > HDFS-10216.3.patch, HDFS-10216.4.patch > > > Got this exception when running {{distcp -diff}} with relative paths: > {code} > $ hadoop distcp -update -diff s1 s2 d1 d2 > 16/03/25 09:45:40 INFO tools.DistCp: Input Options: > DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, > ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', > copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[d1], > targetPath=d2, targetPathExists=true, preserveRawXattrs=false, > filtersFile='null'} > 16/03/25 09:45:40 INFO client.RMProxy: Connecting to ResourceManager at > jzhuge-balancer-1.vpc.cloudera.com/172.26.21.70:8032 > 16/03/25 09:45:41 ERROR tools.DistCp: Exception encountered > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at org.apache.hadoop.fs.Path.initialize(Path.java:206) > at org.apache.hadoop.fs.Path.(Path.java:197) > at > org.apache.hadoop.tools.SimpleCopyListing.getPathWithSchemeAndAuthority(SimpleCopyListing.java:193) > at > org.apache.hadoop.tools.SimpleCopyListing.addToFileListing(SimpleCopyListing.java:202) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListingWithSnapshotDiff(SimpleCopyListing.java:243) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:172) > at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86) > at > 
org.apache.hadoop.tools.DistCp.createInputFileListingWithDiff(DistCp.java:388) > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:164) > at org.apache.hadoop.tools.DistCp.run(DistCp.java:123) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.tools.DistCp.main(DistCp.java:436) > Caused by: java.net.URISyntaxException: Relative path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at java.net.URI.checkPath(URI.java:1804) > at java.net.URI.(URI.java:752) > at org.apache.hadoop.fs.Path.initialize(Path.java:203) > ... 11 more > {code} > But theses commands worked: > * Absolute path: {{hadoop distcp -update -diff s1 s2 /user/systest/d1 > /user/systest/d2}} > * No {{-diff}}: {{hadoop distcp -update d1 d2}} > However, everything was fine when I ran {{hadoop distcp -update -diff s1 s2 > d1 d2}} again. I am not sure the problem only exists with option {{-diff}}. > Trying to reproduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10216) distcp -diff relative path exception
[ https://issues.apache.org/jira/browse/HDFS-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-10216: - Resolution: Fixed Fix Version/s: 2.9.0 Target Version/s: (was: 2.8.0) Status: Resolved (was: Patch Available) +1. I've committed this to trunk and branch-2. Thanks for the fix, [~bwtakacy]. Thanks for reporting the issue and reviewing the patch, [~jzhuge]. > distcp -diff relative path exception > > > Key: HDFS-10216 > URL: https://issues.apache.org/jira/browse/HDFS-10216 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 2.8.0 >Reporter: John Zhuge >Assignee: Takashi Ohnishi >Priority: Critical > Fix For: 2.9.0 > > Attachments: HDFS-10216.1.patch, HDFS-10216.2.patch, > HDFS-10216.3.patch, HDFS-10216.4.patch > > > Got this exception when running {{distcp -diff}} with relative paths: > {code} > $ hadoop distcp -update -diff s1 s2 d1 d2 > 16/03/25 09:45:40 INFO tools.DistCp: Input Options: > DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, > ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', > copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[d1], > targetPath=d2, targetPathExists=true, preserveRawXattrs=false, > filtersFile='null'} > 16/03/25 09:45:40 INFO client.RMProxy: Connecting to ResourceManager at > jzhuge-balancer-1.vpc.cloudera.com/172.26.21.70:8032 > 16/03/25 09:45:41 ERROR tools.DistCp: Exception encountered > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at org.apache.hadoop.fs.Path.initialize(Path.java:206) > at org.apache.hadoop.fs.Path.(Path.java:197) > at > org.apache.hadoop.tools.SimpleCopyListing.getPathWithSchemeAndAuthority(SimpleCopyListing.java:193) > at > org.apache.hadoop.tools.SimpleCopyListing.addToFileListing(SimpleCopyListing.java:202) > at > 
org.apache.hadoop.tools.SimpleCopyListing.doBuildListingWithSnapshotDiff(SimpleCopyListing.java:243) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:172) > at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86) > at > org.apache.hadoop.tools.DistCp.createInputFileListingWithDiff(DistCp.java:388) > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:164) > at org.apache.hadoop.tools.DistCp.run(DistCp.java:123) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.tools.DistCp.main(DistCp.java:436) > Caused by: java.net.URISyntaxException: Relative path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at java.net.URI.checkPath(URI.java:1804) > at java.net.URI.(URI.java:752) > at org.apache.hadoop.fs.Path.initialize(Path.java:203) > ... 11 more > {code} > But these commands worked: > * Absolute path: {{hadoop distcp -update -diff s1 s2 /user/systest/d1 > /user/systest/d2}} > * No {{-diff}}: {{hadoop distcp -update d1 d2}} > However, everything was fine when I ran {{hadoop distcp -update -diff s1 s2 > d1 d2}} again. I am not sure the problem only exists with option {{-diff}}. > Trying to reproduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10216) distcp -diff relative path exception
[ https://issues.apache.org/jira/browse/HDFS-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-10216: - Affects Version/s: (was: 2.6.0) 2.8.0 > distcp -diff relative path exception > > > Key: HDFS-10216 > URL: https://issues.apache.org/jira/browse/HDFS-10216 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 2.8.0 >Reporter: John Zhuge >Assignee: Takashi Ohnishi >Priority: Critical > Fix For: 2.9.0 > > Attachments: HDFS-10216.1.patch, HDFS-10216.2.patch, > HDFS-10216.3.patch, HDFS-10216.4.patch > > > Got this exception when running {{distcp -diff}} with relative paths: > {code} > $ hadoop distcp -update -diff s1 s2 d1 d2 > 16/03/25 09:45:40 INFO tools.DistCp: Input Options: > DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, > ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', > copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[d1], > targetPath=d2, targetPathExists=true, preserveRawXattrs=false, > filtersFile='null'} > 16/03/25 09:45:40 INFO client.RMProxy: Connecting to ResourceManager at > jzhuge-balancer-1.vpc.cloudera.com/172.26.21.70:8032 > 16/03/25 09:45:41 ERROR tools.DistCp: Exception encountered > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at org.apache.hadoop.fs.Path.initialize(Path.java:206) > at org.apache.hadoop.fs.Path.(Path.java:197) > at > org.apache.hadoop.tools.SimpleCopyListing.getPathWithSchemeAndAuthority(SimpleCopyListing.java:193) > at > org.apache.hadoop.tools.SimpleCopyListing.addToFileListing(SimpleCopyListing.java:202) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListingWithSnapshotDiff(SimpleCopyListing.java:243) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:172) > at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86) > at 
> org.apache.hadoop.tools.DistCp.createInputFileListingWithDiff(DistCp.java:388) > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:164) > at org.apache.hadoop.tools.DistCp.run(DistCp.java:123) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.tools.DistCp.main(DistCp.java:436) > Caused by: java.net.URISyntaxException: Relative path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at java.net.URI.checkPath(URI.java:1804) > at java.net.URI.(URI.java:752) > at org.apache.hadoop.fs.Path.initialize(Path.java:203) > ... 11 more > {code} > But these commands worked: > * Absolute path: {{hadoop distcp -update -diff s1 s2 /user/systest/d1 > /user/systest/d2}} > * No {{-diff}}: {{hadoop distcp -update d1 d2}} > However, everything was fine when I ran {{hadoop distcp -update -diff s1 s2 > d1 d2}} again. I am not sure the problem only exists with option {{-diff}}. > Trying to reproduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9905) TestWebHdfsTimeouts fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241569#comment-15241569 ] Masatake Iwasaki commented on HDFS-9905: {code} private static final int SHORT_SOCKET_TIMEOUT = 5; {code} I had to decrease the value of SHORT_SOCKET_TIMEOUT to reproduce the issue in a few tries on my environment. Maybe just increasing the value to 20 or 30 is enough to avoid the issue even on heavily loaded build servers. > TestWebHdfsTimeouts fails occasionally > -- > > Key: HDFS-9905 > URL: https://issues.apache.org/jira/browse/HDFS-9905 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.7.3 >Reporter: Kihwal Lee >Assignee: Wei-Chiu Chuang > Attachments: HDFS-9905.001.patch > > > When checking for a timeout, it does get {{SocketTimeoutException}}, but the > message sometimes does not contain "connect timed out". Since the original > exception is not logged, we do not know details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10284) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241567#comment-15241567 ] Mingliang Liu commented on HDFS-10284: -- Failing tests are not related. > o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode > fails intermittently > - > > Key: HDFS-10284 > URL: https://issues.apache.org/jira/browse/HDFS-10284 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.9.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Minor > Attachments: HDFS-10284.000.patch > > > *Stacktrace* > {code} > org.mockito.exceptions.misusing.UnfinishedStubbingException: > Unfinished stubbing detected here: > -> at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > E.g. thenReturn() may be missing. > Examples of correct stubbing: > when(mock.isOk()).thenReturn(true); > when(mock.isOk()).thenThrow(exception); > doThrow(exception).when(mock).someVoidMethod(); > Hints: > 1. missing thenReturn() > 2. although stubbed methods may return mocks, you cannot inline mock > creation (mock()) call inside a thenReturn method (see issue 53) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > {code} > Sample failing pre-commit UT: > https://builds.apache.org/job/PreCommit-HDFS-Build/15153/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9905) TestWebHdfsTimeouts fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241555#comment-15241555 ] Masatake Iwasaki commented on HDFS-9905: Hmm... the stacktrace printed by {{GenericTestUtils.assertExceptionContains}} does not show the root cause because {{TestWebHdfsTimeouts#runWithRetry}} recreates the SocketTimeoutException to add the host address to the message. I got the following stack by commenting out the exception-recreating part of {{TestWebHdfsTimeouts#runWithRetry}}. [java.net.SocksSocketImpl|http://hg.openjdk.java.net/jdk7u/jdk7u/jdk/file/34c594b52b73/src/share/classes/java/net/SocksSocketImpl.java#l103] can throw a SocketTimeoutException with a null message. We cannot expect that a SocketTimeoutException message always contains text such as "Read timed out" or "connect timed out". {noformat} java.lang.AssertionError: Expected to find ': Read timed out' but got unexpected exception:java.net.SocketTimeoutException at java.net.SocksSocketImpl.remainingMillis(SocksSocketImpl.java:111) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at sun.net.NetworkClient.doConnect(NetworkClient.java:175) at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) at sun.net.www.http.HttpClient.(HttpClient.java:211) at sun.net.www.http.HttpClient.New(HttpClient.java:308) at sun.net.www.http.HttpClient.New(HttpClient.java:326) at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169) at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105) at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999) at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:684) at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:637) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:709) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:555) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:586) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:582) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1466) at org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts.testAuthUrlReadTimeout(TestWebHdfsTimeouts.java:198) {noformat} > TestWebHdfsTimeouts fails occasionally > -- > > Key: HDFS-9905 > URL: https://issues.apache.org/jira/browse/HDFS-9905 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.7.3 >Reporter: Kihwal Lee >Assignee: Wei-Chiu Chuang > Attachments: HDFS-9905.001.patch > > > When checking for a timeout, it does get {{SocketTimeoutException}}, but the > message sometimes does not contain "connect timed out". Since the original > exception is not logged, we do not know details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
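The key point above, that a {{SocketTimeoutException}} may carry no detail message at all, can be shown with the JDK alone. This is a sketch illustrating why message-based assertions are fragile, not the actual test code:

```java
import java.net.SocketTimeoutException;

public class NullTimeoutMessage {
    public static void main(String[] args) {
        // SocksSocketImpl.remainingMillis() uses the no-arg constructor,
        // so the detail message is null rather than "connect timed out".
        SocketTimeoutException e = new SocketTimeoutException();
        System.out.println(e.getMessage()); // null

        // Matching on the message text can fail spuriously (or NPE);
        // the exception type itself is the reliable signal of a timeout.
        System.out.println(e instanceof java.io.InterruptedIOException); // true
    }
}
```

This suggests the test should assert on the exception class (and perhaps log the original exception for diagnosis) instead of requiring a particular message substring.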
[jira] [Commented] (HDFS-10282) The VolumeScanner should warn about replica files which are misplaced
[ https://issues.apache.org/jira/browse/HDFS-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241515#comment-15241515 ] Colin Patrick McCabe commented on HDFS-10282: - Thanks, [~kihwal]. > The VolumeScanner should warn about replica files which are misplaced > - > > Key: HDFS-10282 > URL: https://issues.apache.org/jira/browse/HDFS-10282 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.9.0 > > Attachments: HDFS-10282.001.patch, HDFS-10282.002.patch > > > The VolumeScanner should warn about replica files which are misplaced -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10280) Document new dfsadmin command -evictWriters
[ https://issues.apache.org/jira/browse/HDFS-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241514#comment-15241514 ] Wei-Chiu Chuang commented on HDFS-10280: I had to leave a checkstyle warning in place because the existing code already violates the rule and I only re-indented it. We could fix the existing code as well, but I see little value in that. > Document new dfsadmin command -evictWriters > --- > > Key: HDFS-10280 > URL: https://issues.apache.org/jira/browse/HDFS-10280 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 2.8.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: supportability > Attachments: HDFS-10280.001.patch > > > HDFS-9945 added a new dfsadmin command -evictWriters, which is great. > However I noticed typing {{dfs dfsadmin}} does not show a command line help > summary. It is shown only when I type {{dfs dfsadmin -help}}. > Also, it would be great to document it in {{HDFS Commands Guide}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9349) Support reconfiguring fs.protected.directories without NN restart
[ https://issues.apache.org/jira/browse/HDFS-9349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241455#comment-15241455 ] Arpit Agarwal commented on HDFS-9349: - Look for commit ddfe677. > Support reconfiguring fs.protected.directories without NN restart > - > > Key: HDFS-9349 > URL: https://issues.apache.org/jira/browse/HDFS-9349 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > Fix For: 2.9.0 > > Attachments: HDFS-9349-HDFS-9000.003.patch, > HDFS-9349-HDFS-9000.004.patch, HDFS-9349-HDFS-9000.005.patch, > HDFS-9349-HDFS-9000.006.patch, HDFS-9349-HDFS-9000.007.patch, > HDFS-9349-HDFS-9000.008.patch, HDFS-9349.001.patch, HDFS-9349.002.patch > > > This is to reconfigure > {code} > fs.protected.directories > {code} > without restarting NN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10292) Add block id when client got Unable to close file exception
[ https://issues.apache.org/jira/browse/HDFS-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-10292: Affects Version/s: 2.7.2 > Add block id when client got Unable to close file exception > --- > > Key: HDFS-10292 > URL: https://issues.apache.org/jira/browse/HDFS-10292 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.2 >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Minor > Attachments: HDFS-10292.patch > > > Add the block id when the client gets an Unable to close file exception. It's good to > have the block id for better debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10280) Document new dfsadmin command -evictWriters
[ https://issues.apache.org/jira/browse/HDFS-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241426#comment-15241426 ] Hadoop QA commented on HDFS-10280: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 1m 47s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 228 unchanged - 0 fixed = 229 total (was 228) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 59s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 29s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 184m 53s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl | | | hadoop.fs.TestWebHdfsFileContextMainOperations | | | hadoop.hdfs.tools.TestDFSAdmin | | | hadoop.hdfs.security.TestDelegationTokenForProxyUser | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | | hadoop.hdfs.TestFileAppend | | | hadoop.hdfs.TestReadStripedFileWithMissingBlocks | | JDK
[jira] [Commented] (HDFS-10289) Balancer configures DNs directly
[ https://issues.apache.org/jira/browse/HDFS-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241396#comment-15241396 ] John Zhuge commented on HDFS-10289: --- Thanks [~anu], I will take a look. > Balancer configures DNs directly > > > Key: HDFS-10289 > URL: https://issues.apache.org/jira/browse/HDFS-10289 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer & mover >Affects Versions: 2.6.0 >Reporter: John Zhuge >Assignee: John Zhuge >Priority: Critical > > Balancer directly configures the 2 balance-related properties > (bandwidthPerSec and concurrentMoves) on the DNs involved. > Details: > * Before each balancing iteration, set the properties on all DNs involved in > the current iteration. > * The DN property changes will not survive restart. > * Balancer gets the property values from command line or its config file. > * Need new DN APIs to query and set the 2 properties. > * No need to edit the config file on each DN or run {{hdfs dfsadmin > -setBalancerBandwidth}} to configure every DN in the cluster. > Pros: > * Improve ease of use because all configuration is done in one place, the > balancer. We have seen many customers forget to set concurrentMoves properly > since it is required on both DN and Balancer. > * Support new DNs added between iterations > * Handle DN restarts between iterations > * May be able to dynamically adjust the thresholds in different iterations. > Don't know how useful though. 
It does not return an individual > property, and it does not return the value set by {{hdfs dfsadmin > -setBalancerBandwidth}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10216) distcp -diff relative path exception
[ https://issues.apache.org/jira/browse/HDFS-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Zhuge updated HDFS-10216: -- Hadoop Flags: Reviewed +1 LGTM > distcp -diff relative path exception > > > Key: HDFS-10216 > URL: https://issues.apache.org/jira/browse/HDFS-10216 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 2.6.0 >Reporter: John Zhuge >Assignee: Takashi Ohnishi >Priority: Critical > Attachments: HDFS-10216.1.patch, HDFS-10216.2.patch, > HDFS-10216.3.patch, HDFS-10216.4.patch > > > Got this exception when running {{distcp -diff}} with relative paths: > {code} > $ hadoop distcp -update -diff s1 s2 d1 d2 > 16/03/25 09:45:40 INFO tools.DistCp: Input Options: > DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, > ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', > copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[d1], > targetPath=d2, targetPathExists=true, preserveRawXattrs=false, > filtersFile='null'} > 16/03/25 09:45:40 INFO client.RMProxy: Connecting to ResourceManager at > jzhuge-balancer-1.vpc.cloudera.com/172.26.21.70:8032 > 16/03/25 09:45:41 ERROR tools.DistCp: Exception encountered > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at org.apache.hadoop.fs.Path.initialize(Path.java:206) > at org.apache.hadoop.fs.Path.(Path.java:197) > at > org.apache.hadoop.tools.SimpleCopyListing.getPathWithSchemeAndAuthority(SimpleCopyListing.java:193) > at > org.apache.hadoop.tools.SimpleCopyListing.addToFileListing(SimpleCopyListing.java:202) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListingWithSnapshotDiff(SimpleCopyListing.java:243) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:172) > at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86) > at > 
org.apache.hadoop.tools.DistCp.createInputFileListingWithDiff(DistCp.java:388) > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:164) > at org.apache.hadoop.tools.DistCp.run(DistCp.java:123) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.tools.DistCp.main(DistCp.java:436) > Caused by: java.net.URISyntaxException: Relative path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at java.net.URI.checkPath(URI.java:1804) > at java.net.URI.(URI.java:752) > at org.apache.hadoop.fs.Path.initialize(Path.java:203) > ... 11 more > {code} > But these commands worked: > * Absolute path: {{hadoop distcp -update -diff s1 s2 /user/systest/d1 > /user/systest/d2}} > * No {{-diff}}: {{hadoop distcp -update d1 d2}} > However, everything was fine when I ran {{hadoop distcp -update -diff s1 s2 > d1 d2}} again. I am not sure whether the problem exists only with option {{-diff}}. > Trying to reproduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10292) Add block id when client got Unable to close file exception
[ https://issues.apache.org/jira/browse/HDFS-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241378#comment-15241378 ] Hadoop QA commented on HDFS-10292: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 59s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} 
javadoc {color} | {color:green} 0m 42s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 39s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_77. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 24s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 33m 46s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:fbe3e86 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12798748/HDFS-10292.patch | | JIRA Issue | HDFS-10292 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 349fbc280626 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Commented] (HDFS-10289) Balancer configures DNs directly
[ https://issues.apache.org/jira/browse/HDFS-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241375#comment-15241375 ] Anu Engineer commented on HDFS-10289: - As part of the diskBalancer work we have added an API that might be useful for you. There is a DN RPC in the HDFS-1312 branch which allows you to query generic properties from the DataNode. Please look at HDFS-9647 if you are interested. If you find this API useful, you are most welcome to use HDFS-1312 to develop this feature. Unfortunately the API is named getDiskBalancerSetting or DiskBalancerSettingRequestProto. You might want to rename that call to getDatanodeSetting or something to that effect to make it generic. > Balancer configures DNs directly > > > Key: HDFS-10289 > URL: https://issues.apache.org/jira/browse/HDFS-10289 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer & mover >Affects Versions: 2.6.0 >Reporter: John Zhuge >Assignee: John Zhuge >Priority: Critical > > Balancer directly configures the 2 balance-related properties > (bandwidthPerSec and concurrentMoves) on the DNs involved. > Details: > * Before each balancing iteration, set the properties on all DNs involved in > the current iteration. > * The DN property changes will not survive restart. > * Balancer gets the property values from command line or its config file. > * Need new DN APIs to query and set the 2 properties. > * No need to edit the config file on each DN or run {{hdfs dfsadmin > -setBalancerBandwidth}} to configure every DN in the cluster. > Pros: > * Improve ease of use because all configuration is done in one place, the > balancer. We have seen many customers forget to set concurrentMoves properly > since it is required on both DN and Balancer. > * Support new DNs added between iterations > * Handle DN restarts between iterations > * May be able to dynamically adjust the thresholds in different iterations. > Don't know how useful though. 
> Cons: > * New DN property API > * A malicious/misconfigured balancer may overwhelm DNs. {{hdfs dfsadmin > -setBalancerBandwidth}} has the same issue. Also Balancer can only be run by > admin. > Questions: > * Can we create {{BalancerConcurrentMovesCommand}} similar to > {{BalancerBandwidthCommand}}? Can Balancer use them directly without going > through NN? > One proposal to implement HDFS-7466 calls for an API to query DN properties. > DN Conf Servlet returns all config properties. It does not return an individual > property, and it does not return the value set by {{hdfs dfsadmin > -setBalancerBandwidth}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9820) Improve distcp to support efficient restore to an earlier snapshot
[ https://issues.apache.org/jira/browse/HDFS-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241365#comment-15241365 ] Yongjun Zhang commented on HDFS-9820: - Hi [~jingzhao], Thanks a lot for your review and comments! Here is my reply, in the same order as your questions; I hope it makes sense to you: # Without the HDFS-10263 fix, internally I always use the forward snapshot diff and do the transformation from there. I am not sure whether your first question suggests we still use the reversed diff (without the HDFS-10263 fix) and translate the result to be symmetric with the forward snapshot diff (same as what HDFS-10263 would have achieved). If so, because the result still needs another (existing) transformation as we currently do, that would cause the complexity I referred to in HDFS-10263. # We now use {{-diff "" }} on the command line to get the same behavior as {{-rdiff }} in the last patch rev. Due to the lack of HDFS-10263, I swapped the source and target internally (and added the {{useRdiff}} flag to indicate the swapping), and always use the forward snapshot diff. # It seems you mean we should allow the user to pass snapshot names in any order, either {{-diff s1 s2}} or {{-diff s2 s1}}, and let the program order s1 and s2? What I was thinking was that we need the order the user passed to indicate whether we are doing a forward diff (HDFS-8828) or a reverse diff (HDFS-9820). Thus {{-diff s1 s2}} and {{-diff s2 s1}} mean different things to me. I may have misunderstood you though. In addition, after HDFS-10263 is in place, we can make the implementation more symmetric (HDFS-8828 vs HDFS-9820). Thanks much. 
> Improve distcp to support efficient restore to an earlier snapshot > -- > > Key: HDFS-9820 > URL: https://issues.apache.org/jira/browse/HDFS-9820 > Project: Hadoop HDFS > Issue Type: New Feature > Components: distcp >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-9820.001.patch, HDFS-9820.002.patch, > HDFS-9820.003.patch, HDFS-9820.004.patch > > > HDFS-4167 intends to restore HDFS to the most recent snapshot, and there are > some complexities and challenges. > HDFS-7535 improved distcp performance by avoiding copying files that changed > name since last backup. > On top of HDFS-7535, HDFS-8828 improved distcp performance when copying data > from source to target cluster, by only copying changed files since last > backup. The way it works is to use snapshot diff to find out all files changed, > and copy the changed files only. > See > https://blog.cloudera.com/blog/2015/12/distcp-performance-improvements-in-apache-hadoop/ > This jira is to propose a variation of HDFS-8828, to find out the files > changed in target cluster since last snapshot sx, and copy these from the > source cluster's same snapshot sx, to restore target cluster to sx. > If a file/dir is > - renamed, rename it back > - created in target cluster, delete it > - modified, put it on the copy list > - run distcp with the copy list, copy from the source cluster's corresponding > snapshot > This could be a new command line switch -rdiff in distcp. > HDFS-4167 would still be nice to have. It just seems to me that HDFS-9820 > would hopefully be easier to implement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
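The restore steps in the description above (undo renames, delete creations, queue modified files for copy from the source snapshot) can be sketched as a pure planning function. Everything here — the {{Change}} type, {{planRestore}}, and the action strings — is hypothetical illustration of the proposed algorithm, not distcp's actual API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RestoreSketch {
    enum ChangeType { RENAME, CREATE, MODIFY }

    static class Change {
        final ChangeType type;
        final String path;     // current path in the target cluster
        final String oldPath;  // path at snapshot sx (renames only)
        Change(ChangeType type, String path, String oldPath) {
            this.type = type; this.path = path; this.oldPath = oldPath;
        }
    }

    // Turn a target-cluster snapshot diff (sx -> current) into restore actions:
    // undo renames, delete creations, and queue modified files for distcp
    // from the source cluster's snapshot sx.
    static List<String> planRestore(List<Change> diff) {
        List<String> actions = new ArrayList<>();
        List<String> copyList = new ArrayList<>();
        for (Change c : diff) {
            switch (c.type) {
                case RENAME: actions.add("rename " + c.path + " back to " + c.oldPath); break;
                case CREATE: actions.add("delete " + c.path); break;
                case MODIFY: copyList.add(c.path); break;
            }
        }
        actions.add("distcp from source snapshot sx: " + copyList);
        return actions;
    }

    public static void main(String[] args) {
        List<Change> diff = Arrays.asList(
            new Change(ChangeType.RENAME, "/d1/b", "/d1/a"),
            new Change(ChangeType.CREATE, "/d1/tmp", null),
            new Change(ChangeType.MODIFY, "/d1/c", null));
        for (String a : planRestore(diff)) {
            System.out.println(a);
        }
    }
}
```

A real implementation would also have to order the rename/delete operations carefully (e.g. a created directory may contain renamed children), which is part of the complexity the comments above discuss.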
[jira] [Updated] (HDFS-10292) Add block id when client got Unable to close file exception
[ https://issues.apache.org/jira/browse/HDFS-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-10292: Priority: Minor (was: Major) > Add block id when client got Unable to close file exception > --- > > Key: HDFS-10292 > URL: https://issues.apache.org/jira/browse/HDFS-10292 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Minor > Attachments: HDFS-10292.patch > > > Add the block id when the client gets an Unable to close file exception. It's good to > have the block id for better debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10292) Add block id when client got Unable to close file exception
[ https://issues.apache.org/jira/browse/HDFS-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-10292: Status: Patch Available (was: Open) > Add block id when client got Unable to close file exception > --- > > Key: HDFS-10292 > URL: https://issues.apache.org/jira/browse/HDFS-10292 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Attachments: HDFS-10292.patch > > > Add the block ID when the client gets an "Unable to close file" exception. It's good to > have the block ID, for better debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10292) Add block id when client got Unable to close file exception
[ https://issues.apache.org/jira/browse/HDFS-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-10292: Attachment: HDFS-10292.patch Uploaded the patch, kindly review. > Add block id when client got Unable to close file exception > --- > > Key: HDFS-10292 > URL: https://issues.apache.org/jira/browse/HDFS-10292 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Attachments: HDFS-10292.patch > > > Add the block ID when the client gets an "Unable to close file" exception. It's good to > have the block ID, for better debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-10292) Add block id when client got Unable to close file exception
Brahma Reddy Battula created HDFS-10292: --- Summary: Add block id when client got Unable to close file exception Key: HDFS-10292 URL: https://issues.apache.org/jira/browse/HDFS-10292 Project: Hadoop HDFS Issue Type: Improvement Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Add the block ID when the client gets an "Unable to close file" exception. It's good to have the block ID, for better debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-9271) Implement basic NN operations
[ https://issues.apache.org/jira/browse/HDFS-9271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer reassigned HDFS-9271: - Assignee: James Clampffer (was: Aliaksei Sandryhaila) > Implement basic NN operations > - > > Key: HDFS-9271 > URL: https://issues.apache.org/jira/browse/HDFS-9271 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Bob Hansen >Assignee: James Clampffer > > Expose via C and C++ API: > * mkdirs > * rename > * delete > * stat > * chmod > * chown > * getListing > * setOwner > * fsync -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10282) The VolumeScanner should warn about replica files which are misplaced
[ https://issues.apache.org/jira/browse/HDFS-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241136#comment-15241136 ] Hudson commented on HDFS-10282: --- FAILURE: Integrated in Hadoop-trunk-Commit #9611 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9611/]) HDFS-10282. The VolumeScanner should warn about replica files which are (kihwal: rev 0d1c1152f1ce2706f92109bfbdff0d62e98e6797) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImplTestUtils.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/FsDatasetTestUtils.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/VolumeScanner.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DirectoryScanner.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockScanner.java > The VolumeScanner should warn about replica files which are misplaced > - > > Key: HDFS-10282 > URL: https://issues.apache.org/jira/browse/HDFS-10282 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.9.0 > > Attachments: HDFS-10282.001.patch, HDFS-10282.002.patch > > > The VolumeScanner should warn about replica files which are misplaced -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10280) Document new dfsadmin command -evictWriters
[ https://issues.apache.org/jira/browse/HDFS-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241110#comment-15241110 ] Kihwal Lee commented on HDFS-10280: --- Kicked the build manually. > Document new dfsadmin command -evictWriters > --- > > Key: HDFS-10280 > URL: https://issues.apache.org/jira/browse/HDFS-10280 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 2.8.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: supportability > Attachments: HDFS-10280.001.patch > > > HDFS-9945 added a new dfsadmin command -evictWriters, which is great. > However I noticed typing {{dfs dfsadmin}} does not show a command line help > summary. It is shown only when I type {{dfs dfsadmin -help}}. > Also, it would be great to document it in {{HDFS Commands Guide}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10282) The VolumeScanner should warn about replica files which are misplaced
[ https://issues.apache.org/jira/browse/HDFS-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-10282: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.9.0 Status: Resolved (was: Patch Available) > The VolumeScanner should warn about replica files which are misplaced > - > > Key: HDFS-10282 > URL: https://issues.apache.org/jira/browse/HDFS-10282 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.9.0 > > Attachments: HDFS-10282.001.patch, HDFS-10282.002.patch > > > The VolumeScanner should warn about replica files which are misplaced -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10282) The VolumeScanner should warn about replica files which are misplaced
[ https://issues.apache.org/jira/browse/HDFS-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241100#comment-15241100 ] Kihwal Lee commented on HDFS-10282: --- Committed this to trunk and branch-2. > The VolumeScanner should warn about replica files which are misplaced > - > > Key: HDFS-10282 > URL: https://issues.apache.org/jira/browse/HDFS-10282 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.9.0 > > Attachments: HDFS-10282.001.patch, HDFS-10282.002.patch > > > The VolumeScanner should warn about replica files which are misplaced -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10282) The VolumeScanner should warn about replica files which are misplaced
[ https://issues.apache.org/jira/browse/HDFS-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241087#comment-15241087 ] Kihwal Lee commented on HDFS-10282: --- +1 > The VolumeScanner should warn about replica files which are misplaced > - > > Key: HDFS-10282 > URL: https://issues.apache.org/jira/browse/HDFS-10282 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-10282.001.patch, HDFS-10282.002.patch > > > The VolumeScanner should warn about replica files which are misplaced -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10216) distcp -diff relative path exception
[ https://issues.apache.org/jira/browse/HDFS-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241080#comment-15241080 ] Hadoop QA commented on HDFS-10216: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 16s {color} | {color:green} hadoop-distcp in the patch passed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 39s {color} | {color:green} hadoop-distcp in the patch passed with JDK v1.7.0_95. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 29m 4s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:fbe3e86 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12798718/HDFS-10216.4.patch | | JIRA Issue | HDFS-10216 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux b748750ed614 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / df18b6e9 | | Default Java | 1.7.0_95 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_77
[jira] [Commented] (HDFS-10258) Erasure Coding: support small cluster whose #DataNode < # (Blocks in a BlockGroup)
[ https://issues.apache.org/jira/browse/HDFS-10258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241069#comment-15241069 ] Kai Zheng commented on HDFS-10258: -- Thanks [~libo-intel] for the consideration. It's good to support a smaller cluster because it will allow someone to play with erasure coding with only a few nodes. However, I'm not sure if it's good to change much logic to allow #DataNode < # (Blocks in a BlockGroup) (if fortunately we don't have to, fine) because it may not make so much sense. Instead, we can consider leveraging the 3+2 schema and policy. > Erasure Coding: support small cluster whose #DataNode < # (Blocks in a > BlockGroup) > -- > > Key: HDFS-10258 > URL: https://issues.apache.org/jira/browse/HDFS-10258 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Li Bo >Assignee: Li Bo > > Currently EC does not support small clusters whose datanode count is > smaller than the number of blocks in a block group. This subtask will solve > this problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
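The fallback suggested above (prefer a narrower schema such as RS-3-2 on a small cluster, instead of relaxing the #DataNode >= #(blocks in a block group) invariant) can be sketched as a tiny chooser. This is an illustrative helper, not the real `ErasureCodingPolicy` API:

```java
// Hypothetical sketch: pick the widest erasure coding schema whose block
// group (data units + parity units) still fits on the available datanodes.
public class EcPolicyChooser {
    // {data units, parity units}, widest first.
    static final int[][] POLICIES = {
        {6, 3},  // RS-6-3: needs at least 9 datanodes
        {3, 2},  // RS-3-2: needs at least 5 datanodes
    };

    // Returns {data, parity} of the first schema that fits,
    // or null if even the narrowest one does not.
    static int[] choose(int numDataNodes) {
        for (int[] p : POLICIES) {
            if (numDataNodes >= p[0] + p[1]) {
                return p;
            }
        }
        return null;
    }
}
```

So a 5-node cluster would get RS-3-2 rather than a special-cased RS-6-3 placement, which is the trade-off Kai is suggesting.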
[jira] [Commented] (HDFS-10291) TestShortCircuitLocalRead failing
[ https://issues.apache.org/jira/browse/HDFS-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241066#comment-15241066 ] Steve Loughran commented on HDFS-10291: --- 1. the test does a short-circuited read of a 13 byte file {{doTestShortCircuitRead(false, size= 13, readOffset= 0)}} 2. it creates 13 bytes of data and saves it: {{byte[] fileData = AppendTestUtil.randomBytes(seed, size)}} 3. a dest buffer is created for this {code} byte[] actual = new byte[expected.length-readOffset]; {code} 4. it does some small reads: {code} //Read a small number of bytes first. int nread = stm.read(actual, 0, 3); nread += stm.read(actual, nread, 2); //Read across chunk boundary nread += stm.read(actual, nread, 517);// ** HERE ** {code} The exception is raised because the code asks to read 517 bytes into a buffer 13 bytes long. This breaks IOStream's rules: you can't ask for more than you have space for. It says that clearly in the IOStream API spec; what was added in HADOOP-12994 was the checking of passing in too big a length or negative offsets. I think this is a bug in the test. Whatever it is trying to do, it shouldn't be trying to do it on such a small buffer. What's interesting though is when you delve into the code: the block reader logic doesn't look at the length of the read at all. That is, it appears to fill up the entire byte array passed in, from the offset supplied, stopping at the end of the buffer or file, whichever comes first. Which is something that other code (i.e. production code) could be relying on. They shouldn't, as the code will break when working with any FS other than HDFS, but there is a risk that they might. What to do? # retain checks, fix test. # log at warning and shrink len parameter when passed down. People shouldn't be doing this, but HDFS will reluctantly let you. 
> TestShortCircuitLocalRead failing > - > > Key: HDFS-10291 > URL: https://issues.apache.org/jira/browse/HDFS-10291 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > > {{TestShortCircuitLocalRead}} failing as length of read is considered off end > of buffer. There's an off-by-one error somewhere in the test or the new > validation code -- This message was sent by Atlassian JIRA (v6.3.4#6332)
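The bounds check discussed in this issue can be sketched in a few lines. This is a simplified illustration of the kind of validation HADOOP-12994 introduced (the real check lives in `FSInputStream.validatePositionedReadArgs`; the method name here is an assumption for the sketch):

```java
// Sketch of the read-argument validation: a read of `length` bytes into
// `buffer` starting at `offset` must fit entirely inside the buffer.
public class ReadArgsCheck {
    static void validateReadArgs(byte[] buffer, int offset, int length) {
        if (length < 0) {
            throw new IllegalArgumentException("length is negative");
        }
        if (offset < 0) {
            throw new IllegalArgumentException("offset is negative");
        }
        if (length > buffer.length - offset) {
            // This is the case the failing test trips: 517 bytes requested
            // into a 13-byte destination buffer.
            throw new IndexOutOfBoundsException(
                "Requested more bytes than destination buffer size");
        }
    }
}
```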
[jira] [Commented] (HDFS-10291) TestShortCircuitLocalRead failing
[ https://issues.apache.org/jira/browse/HDFS-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241038#comment-15241038 ] Steve Loughran commented on HDFS-10291: --- {code} java.lang.IndexOutOfBoundsException: Requested more bytes than destination buffer size at org.apache.hadoop.fs.FSInputStream.validatePositionedReadArgs(FSInputStream.java:107) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:975) at java.io.DataInputStream.read(DataInputStream.java:149) at org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead.checkFileContent(TestShortCircuitLocalRead.java:157) at org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead.doTestShortCircuitReadImpl(TestShortCircuitLocalRead.java:286) at org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead.doTestShortCircuitRead(TestShortCircuitLocalRead.java:241) at org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead.testSmallFileLocalRead(TestShortCircuitLocalRead.java:308) {code} > TestShortCircuitLocalRead failing > - > > Key: HDFS-10291 > URL: https://issues.apache.org/jira/browse/HDFS-10291 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > > {{TestShortCircuitLocalRead}} failing as length of read is considered off end > of buffer. There's an off-by-one error somewhere in the test or the new > validation code -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-10291) TestShortCircuitLocalRead failing
Steve Loughran created HDFS-10291: - Summary: TestShortCircuitLocalRead failing Key: HDFS-10291 URL: https://issues.apache.org/jira/browse/HDFS-10291 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.8.0 Reporter: Steve Loughran Assignee: Steve Loughran {{TestShortCircuitLocalRead}} failing as length of read is considered off end of buffer. There's an off-by-one error somewhere in the test or the new validation code -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10216) distcp -diff relative path exception
[ https://issues.apache.org/jira/browse/HDFS-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takashi Ohnishi updated HDFS-10216: --- Status: Patch Available (was: Open) > distcp -diff relative path exception > > > Key: HDFS-10216 > URL: https://issues.apache.org/jira/browse/HDFS-10216 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 2.6.0 >Reporter: John Zhuge >Assignee: Takashi Ohnishi >Priority: Critical > Attachments: HDFS-10216.1.patch, HDFS-10216.2.patch, > HDFS-10216.3.patch, HDFS-10216.4.patch > > > Got this exception when running {{distcp -diff}} with relative paths: > {code} > $ hadoop distcp -update -diff s1 s2 d1 d2 > 16/03/25 09:45:40 INFO tools.DistCp: Input Options: > DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, > ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', > copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[d1], > targetPath=d2, targetPathExists=true, preserveRawXattrs=false, > filtersFile='null'} > 16/03/25 09:45:40 INFO client.RMProxy: Connecting to ResourceManager at > jzhuge-balancer-1.vpc.cloudera.com/172.26.21.70:8032 > 16/03/25 09:45:41 ERROR tools.DistCp: Exception encountered > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at org.apache.hadoop.fs.Path.initialize(Path.java:206) > at org.apache.hadoop.fs.Path.(Path.java:197) > at > org.apache.hadoop.tools.SimpleCopyListing.getPathWithSchemeAndAuthority(SimpleCopyListing.java:193) > at > org.apache.hadoop.tools.SimpleCopyListing.addToFileListing(SimpleCopyListing.java:202) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListingWithSnapshotDiff(SimpleCopyListing.java:243) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:172) > at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86) > at > 
org.apache.hadoop.tools.DistCp.createInputFileListingWithDiff(DistCp.java:388) > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:164) > at org.apache.hadoop.tools.DistCp.run(DistCp.java:123) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.tools.DistCp.main(DistCp.java:436) > Caused by: java.net.URISyntaxException: Relative path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at java.net.URI.checkPath(URI.java:1804) > at java.net.URI.(URI.java:752) > at org.apache.hadoop.fs.Path.initialize(Path.java:203) > ... 11 more > {code} > But these commands worked: > * Absolute path: {{hadoop distcp -update -diff s1 s2 /user/systest/d1 > /user/systest/d2}} > * No {{-diff}}: {{hadoop distcp -update d1 d2}} > However, everything was fine when I ran {{hadoop distcp -update -diff s1 s2 > d1 d2}} again. I am not sure the problem only exists with option {{-diff}}. > Trying to reproduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
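The root failure in the stack trace above can be reproduced outside HDFS with plain `java.net.URI`: when a URI has a scheme, its path must be absolute, so combining an authority with a relative path like `./d1/...` is rejected in `URI.checkPath` with exactly this message. A minimal demonstration:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Demonstrates the "Relative path in absolute URI" rule that the distcp
// listing code trips over when handed a relative path. The hostname below
// is just a placeholder.
public class RelativeUriDemo {
    static boolean isAccepted(String scheme, String authority, String path) {
        try {
            new URI(scheme, authority, path, null, null);
            return true;
        } catch (URISyntaxException e) {
            // e.g. "Relative path in absolute URI: hdfs://host:8020./d1/..."
            return false;
        }
    }
}
```

This is why qualifying the relative input paths to absolute ones before building the snapshot-diff copy listing fixes the exception.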
[jira] [Updated] (HDFS-10216) distcp -diff relative path exception
[ https://issues.apache.org/jira/browse/HDFS-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takashi Ohnishi updated HDFS-10216: --- Attachment: HDFS-10216.4.patch > distcp -diff relative path exception > > > Key: HDFS-10216 > URL: https://issues.apache.org/jira/browse/HDFS-10216 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 2.6.0 >Reporter: John Zhuge >Assignee: Takashi Ohnishi >Priority: Critical > Attachments: HDFS-10216.1.patch, HDFS-10216.2.patch, > HDFS-10216.3.patch, HDFS-10216.4.patch > > > Got this exception when running {{distcp -diff}} with relative paths: > {code} > $ hadoop distcp -update -diff s1 s2 d1 d2 > 16/03/25 09:45:40 INFO tools.DistCp: Input Options: > DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, > ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', > copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[d1], > targetPath=d2, targetPathExists=true, preserveRawXattrs=false, > filtersFile='null'} > 16/03/25 09:45:40 INFO client.RMProxy: Connecting to ResourceManager at > jzhuge-balancer-1.vpc.cloudera.com/172.26.21.70:8032 > 16/03/25 09:45:41 ERROR tools.DistCp: Exception encountered > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at org.apache.hadoop.fs.Path.initialize(Path.java:206) > at org.apache.hadoop.fs.Path.(Path.java:197) > at > org.apache.hadoop.tools.SimpleCopyListing.getPathWithSchemeAndAuthority(SimpleCopyListing.java:193) > at > org.apache.hadoop.tools.SimpleCopyListing.addToFileListing(SimpleCopyListing.java:202) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListingWithSnapshotDiff(SimpleCopyListing.java:243) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:172) > at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86) > at > 
org.apache.hadoop.tools.DistCp.createInputFileListingWithDiff(DistCp.java:388) > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:164) > at org.apache.hadoop.tools.DistCp.run(DistCp.java:123) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.tools.DistCp.main(DistCp.java:436) > Caused by: java.net.URISyntaxException: Relative path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at java.net.URI.checkPath(URI.java:1804) > at java.net.URI.(URI.java:752) > at org.apache.hadoop.fs.Path.initialize(Path.java:203) > ... 11 more > {code} > But these commands worked: > * Absolute path: {{hadoop distcp -update -diff s1 s2 /user/systest/d1 > /user/systest/d2}} > * No {{-diff}}: {{hadoop distcp -update d1 d2}} > However, everything was fine when I ran {{hadoop distcp -update -diff s1 s2 > d1 d2}} again. I am not sure the problem only exists with option {{-diff}}. > Trying to reproduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10216) distcp -diff relative path exception
[ https://issues.apache.org/jira/browse/HDFS-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241012#comment-15241012 ] Takashi Ohnishi commented on HDFS-10216: {quote} * Line 709 is too long * Line 710-711 can be merged into 1 line: new DistCp(...).execute() {quote} All right. I have fixed them in the v4 patch. I have also added verification of the distcp result. > distcp -diff relative path exception > > > Key: HDFS-10216 > URL: https://issues.apache.org/jira/browse/HDFS-10216 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 2.6.0 >Reporter: John Zhuge >Assignee: Takashi Ohnishi >Priority: Critical > Attachments: HDFS-10216.1.patch, HDFS-10216.2.patch, > HDFS-10216.3.patch > > > Got this exception when running {{distcp -diff}} with relative paths: > {code} > $ hadoop distcp -update -diff s1 s2 d1 d2 > 16/03/25 09:45:40 INFO tools.DistCp: Input Options: > DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, > ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', > copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[d1], > targetPath=d2, targetPathExists=true, preserveRawXattrs=false, > filtersFile='null'} > 16/03/25 09:45:40 INFO client.RMProxy: Connecting to ResourceManager at > jzhuge-balancer-1.vpc.cloudera.com/172.26.21.70:8032 > 16/03/25 09:45:41 ERROR tools.DistCp: Exception encountered > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at org.apache.hadoop.fs.Path.initialize(Path.java:206) > at org.apache.hadoop.fs.Path.(Path.java:197) > at > org.apache.hadoop.tools.SimpleCopyListing.getPathWithSchemeAndAuthority(SimpleCopyListing.java:193) > at > org.apache.hadoop.tools.SimpleCopyListing.addToFileListing(SimpleCopyListing.java:202) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListingWithSnapshotDiff(SimpleCopyListing.java:243) > 
at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:172) > at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86) > at > org.apache.hadoop.tools.DistCp.createInputFileListingWithDiff(DistCp.java:388) > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:164) > at org.apache.hadoop.tools.DistCp.run(DistCp.java:123) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.tools.DistCp.main(DistCp.java:436) > Caused by: java.net.URISyntaxException: Relative path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at java.net.URI.checkPath(URI.java:1804) > at java.net.URI.(URI.java:752) > at org.apache.hadoop.fs.Path.initialize(Path.java:203) > ... 11 more > {code} > But these commands worked: > * Absolute path: {{hadoop distcp -update -diff s1 s2 /user/systest/d1 > /user/systest/d2}} > * No {{-diff}}: {{hadoop distcp -update d1 d2}} > However, everything was fine when I ran {{hadoop distcp -update -diff s1 s2 > d1 d2}} again. I am not sure the problem only exists with option {{-diff}}. > Trying to reproduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10216) distcp -diff relative path exception
[ https://issues.apache.org/jira/browse/HDFS-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takashi Ohnishi updated HDFS-10216: --- Status: Open (was: Patch Available) > distcp -diff relative path exception > > > Key: HDFS-10216 > URL: https://issues.apache.org/jira/browse/HDFS-10216 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 2.6.0 >Reporter: John Zhuge >Assignee: Takashi Ohnishi >Priority: Critical > Attachments: HDFS-10216.1.patch, HDFS-10216.2.patch, > HDFS-10216.3.patch > > > Got this exception when running {{distcp -diff}} with relative paths: > {code} > $ hadoop distcp -update -diff s1 s2 d1 d2 > 16/03/25 09:45:40 INFO tools.DistCp: Input Options: > DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, > ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', > copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[d1], > targetPath=d2, targetPathExists=true, preserveRawXattrs=false, > filtersFile='null'} > 16/03/25 09:45:40 INFO client.RMProxy: Connecting to ResourceManager at > jzhuge-balancer-1.vpc.cloudera.com/172.26.21.70:8032 > 16/03/25 09:45:41 ERROR tools.DistCp: Exception encountered > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at org.apache.hadoop.fs.Path.initialize(Path.java:206) > at org.apache.hadoop.fs.Path.(Path.java:197) > at > org.apache.hadoop.tools.SimpleCopyListing.getPathWithSchemeAndAuthority(SimpleCopyListing.java:193) > at > org.apache.hadoop.tools.SimpleCopyListing.addToFileListing(SimpleCopyListing.java:202) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListingWithSnapshotDiff(SimpleCopyListing.java:243) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:172) > at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86) > at > 
org.apache.hadoop.tools.DistCp.createInputFileListingWithDiff(DistCp.java:388) > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:164) > at org.apache.hadoop.tools.DistCp.run(DistCp.java:123) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.tools.DistCp.main(DistCp.java:436) > Caused by: java.net.URISyntaxException: Relative path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at java.net.URI.checkPath(URI.java:1804) > at java.net.URI.(URI.java:752) > at org.apache.hadoop.fs.Path.initialize(Path.java:203) > ... 11 more > {code} > But these commands worked: > * Absolute path: {{hadoop distcp -update -diff s1 s2 /user/systest/d1 > /user/systest/d2}} > * No {{-diff}}: {{hadoop distcp -update d1 d2}} > However, everything was fine when I ran {{hadoop distcp -update -diff s1 s2 > d1 d2}} again. I am not sure the problem only exists with option {{-diff}}. > Trying to reproduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete
[ https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240996#comment-15240996 ] Walter Su commented on HDFS-9412: - {{TestBalancer}} passes locally. +1 for the last patch. > getBlocks occupies FSLock and takes too long to complete > > > Key: HDFS-9412 > URL: https://issues.apache.org/jira/browse/HDFS-9412 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: He Tianyi >Assignee: He Tianyi > Attachments: HDFS-9412..patch, HDFS-9412.0001.patch, > HDFS-9412.0002.patch > > > {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and then may take a > long time to complete (probably several seconds, if the number of blocks is too > large). > During this period, other threads attempting to acquire the write lock will wait. > In an extreme case, RPC handlers are occupied by one reader thread calling > {{getBlocks}} and all other threads waiting for the write lock, so the RPC server appears > hung. Unfortunately, this tends to happen in a heavily loaded cluster, since > read operations come and go fast (they do not need to wait), leaving write > operations waiting. > Looks like we can optimize this as the DN block report did in the past, by > splitting the operation into smaller sub-operations, and letting other threads do > their work between sub-operations. The whole result is still returned at once, > though (one thing different from the DN block report). > I am not sure whether this will work. Any better idea? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
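The chunking idea described in the issue (do the work in smaller sub-operations, releasing the lock between them so queued writers can make progress, but still return one result) can be sketched generically. The names are illustrative, not the NameNode's actual locking code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of lock-chunked iteration: process the input in fixed-size chunks,
// re-acquiring the read lock per chunk so writers can interleave between
// chunks, while the caller still receives the whole result at once.
public class ChunkedGetBlocks {
    static final int CHUNK_SIZE = 1000;

    static <T> List<T> collectInChunks(ReentrantReadWriteLock lock,
                                       List<T> blocks) {
        List<T> result = new ArrayList<>();
        int i = 0;
        while (i < blocks.size()) {
            lock.readLock().lock();
            try {
                int end = Math.min(i + CHUNK_SIZE, blocks.size());
                // One sub-operation performed under the lock.
                result.addAll(blocks.subList(i, end));
                i = end;
            } finally {
                lock.readLock().unlock(); // writers may run between chunks
            }
        }
        return result;
    }
}
```

Note the trade-off the real change must handle: dropping the lock between chunks means the view is no longer atomic, so concurrent mutations can be observed partway through, which is acceptable for the Balancer's use of getBlocks but not for every caller.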