[jira] [Resolved] (HDFS-9173) Erasure Coding: Lease recovery for striped file
[ https://issues.apache.org/jira/browse/HDFS-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA resolved HDFS-9173. - Resolution: Fixed License issue was fixed by HDFS-9582. Closing. > Erasure Coding: Lease recovery for striped file > --- > > Key: HDFS-9173 > URL: https://issues.apache.org/jira/browse/HDFS-9173 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Walter Su >Assignee: Walter Su > Fix For: 3.0.0 > > Attachments: HDFS-9173.00.wip.patch, HDFS-9173.01.patch, > HDFS-9173.02.step125.patch, HDFS-9173.03.patch, HDFS-9173.04.patch, > HDFS-9173.05.patch, HDFS-9173.06.patch, HDFS-9173.07.patch, > HDFS-9173.08.patch, HDFS-9173.09.patch, HDFS-9173.09.patch, HDFS-9173.10.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7866) Erasure coding: NameNode manages EC schemas
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HDFS-7866: - Attachment: HDFS-7866.6.patch The v6 patch adds {{TestErasureCodingPolicies::testMultiplePoliciesCoExist}} to test using different EC policies within a cluster. > Erasure coding: NameNode manages EC schemas > --- > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch > > > This is to extend NameNode to load, list and sync predefined EC schemas in > an authorized and controlled manner. The provided facilities will be used to > implement DFSAdmin commands so admins can list available EC schemas and then > choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete
[ https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066038#comment-15066038 ] Hadoop QA commented on HDFS-9412: -
(x) *-1 overall*
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 8m 7s | trunk passed |
| +1 | compile | 0m 44s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 43s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 17s | trunk passed |
| +1 | mvnsite | 0m 54s | trunk passed |
| +1 | mvneclipse | 0m 14s | trunk passed |
| +1 | findbugs | 1m 59s | trunk passed |
| +1 | javadoc | 1m 11s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 50s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 51s | the patch passed |
| +1 | compile | 0m 44s | the patch passed with JDK v1.8.0_66 |
| -1 | javac | 6m 35s | hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66 with JDK v1.8.0_66 generated 1 new issues (was 32, now 32). |
| +1 | javac | 0m 44s | the patch passed |
| +1 | compile | 0m 44s | the patch passed with JDK v1.7.0_91 |
| -1 | javac | 7m 19s | hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new issues (was 34, now 34). |
| +1 | javac | 0m 44s | the patch passed |
| -1 | checkstyle | 0m 17s | Patch generated 1 new checkstyle issues in hadoop-hdfs-project/hadoop-hdfs (total was 161, now 161). |
| +1 | mvnsite | 0m 53s | the patch passed |
| +1 | mvneclipse | 0m 14s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 2m 8s | the patch passed |
| +1 | javadoc | 1m 9s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 51s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 56m 15s | hadoop-hdfs in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 54m 1s | hadoop-hdfs in the patch failed with JDK v1.7.0_91. |
| +1 | asflicense | 0m 27s | Patch does not generate ASF License warnings. |
| | | 138m 21s | |
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hado
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages EC schemas
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066031#comment-15066031 ] Rui Li commented on HDFS-7866: -- Thanks Kai for your comments. bq. However, please don't set it as default policy or change the default policy at all. Yeah, I didn't mean to change the default policy. The change in the patch is temporary, to make the tests pick the new policy. bq. Would you share the details about making use of the remaining bits in the replication field? Sure. The HeaderFormat is a 64-bit long and its layout is {{[4-bit storagePolicyID][1-bit isStriped][11-bit replication][48-bit preferredBlockSize]}}. The patch doesn't change the layout -- it just uses the 11-bit replication field to store the policy ID. Therefore it can support up to {{2^11-1=2047}} different policies. I think that'll be enough for all the combinations of codecs, cell sizes and layouts. bq. do we add/have tests to test multiple policies can work at the same time in a cluster Good point. I'll add a test to cover this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
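The header layout described above can be sketched in plain Java. This is a hypothetical illustration of the bit packing, not the actual Hadoop HeaderFormat class; names and the main() values are invented for the example.

```java
public class HeaderFormatSketch {
    static final int BLOCK_SIZE_BITS = 48;   // [48-bit preferredBlockSize]
    static final int REPLICATION_BITS = 11;  // [11-bit replication]

    // Pack the four fields into one 64-bit header, low bits to high:
    // preferredBlockSize, replication (or EC policy ID), isStriped, storagePolicyID.
    static long pack(int storagePolicyId, boolean isStriped,
                     int replicationOrPolicyId, long preferredBlockSize) {
        long header = preferredBlockSize & ((1L << BLOCK_SIZE_BITS) - 1);
        header |= (long) (replicationOrPolicyId & 0x7FF) << BLOCK_SIZE_BITS;
        header |= (isStriped ? 1L : 0L) << (BLOCK_SIZE_BITS + REPLICATION_BITS);
        header |= (long) (storagePolicyId & 0xF) << (BLOCK_SIZE_BITS + REPLICATION_BITS + 1);
        return header;
    }

    // For a striped file, the 11-bit field holds the EC policy ID,
    // so at most 2^11 - 1 = 2047 distinct policies fit.
    static int getReplicationOrPolicyId(long header) {
        return (int) ((header >>> BLOCK_SIZE_BITS) & 0x7FF);
    }

    public static void main(String[] args) {
        // Striped file with (hypothetical) EC policy ID 5 and a 128 MB block size.
        long h = pack(0, true, 5, 128L * 1024 * 1024);
        System.out.println(getReplicationOrPolicyId(h)); // 5
    }
}
```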
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages EC schemas
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066024#comment-15066024 ] Kai Zheng commented on HDFS-7866: - Thanks Rui for the great work and for moving this forward! bq. Add a new system EC policy RS-3-2 and set it as the default policy. The 3-2 schema is nothing specific and just for testing. And we can revert to our current default policy (or make it configurable) when the patch is ready. It would be good to have the new built-in policy RS-3-2, considering it will help a lot in testing and in experimental trials of EC on a small cluster (5 nodes). [~zhz], what do you think of this? However, please don't set it as the default policy or change the default policy at all. bq. EC policy can be retrieved either by name or by ID. The ID is stored in HEADER::replication in InodeFile, as suggested by the TODO comment. This sounds good. Would you share the details about making use of the remaining bits in the replication field? How many policies will it support that way? How many policies would we have if we estimate assuming 4+ codecs (XOR, RS, LRC, Hitchhiker) and their derivations, 10+ cell sizes (64KB, 128KB, 256KB, ..., 32MB and so on), and the EC form (striping or contiguous)? Consuming all the remaining bits for this purpose (not sure how you did it) may not be a good idea, considering there might be some other usages. bq. Lots of modifications to the tests to make them work with multiple policies. This may better fit as a follow-on task. But to verify the patch I have to make the tests pass here. I thought fixing these tests was essential for this work, and I'm glad you have made it work. I haven't yet looked into the large patch; do we add/have tests to verify that multiple policies can work at the same time in a cluster? 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
[ https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066009#comment-15066009 ] GAO Rui commented on HDFS-9494: --- [~rakeshr], thank you very much for your kind review and advice. Let's wait for [~szetszwo] and [~jingzhao]'s comment to see if it's OK to commit the latest patch :) > Parallel optimization of DFSStripedOutputStream#flushAllInternals( ) > > > Key: HDFS-9494 > URL: https://issues.apache.org/jira/browse/HDFS-9494 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: GAO Rui >Assignee: GAO Rui >Priority: Minor > Attachments: HDFS-9494-origin-trunk.00.patch, > HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, > HDFS-9494-origin-trunk.03.patch > > > Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and > wait for flushInternal( ) in sequence. So the runtime flow is like: > {code} > Streamer0#flushInternal( ) > Streamer0#waitForAckedSeqno( ) > Streamer1#flushInternal( ) > Streamer1#waitForAckedSeqno( ) > … > Streamer8#flushInternal( ) > Streamer8#waitForAckedSeqno( ) > {code} > It could be better to trigger all the streamers to flushInternal( ) and > wait for all of them to return from waitForAckedSeqno( ), and then > flushAllInternals( ) returns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
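The optimization proposed in this issue can be sketched as follows. The {{Streamer}} interface here is a hypothetical stand-in for one streamer's flushInternal()/waitForAckedSeqno() pair, not the real DFSStripedOutputStream internals; only the trigger-all-then-wait-all shape is the point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFlushSketch {
    // Hypothetical stand-in for one streamer's flush-and-wait operation.
    interface Streamer {
        void flushInternal() throws Exception;
    }

    // Current behavior: flush and wait one streamer at a time, in sequence.
    static void flushSequential(List<Streamer> streamers) throws Exception {
        for (Streamer s : streamers) {
            s.flushInternal();
        }
    }

    // Proposed behavior: trigger all flushes first, then wait for all of
    // them, so the waits overlap instead of adding up.
    static void flushAllInternals(List<Streamer> streamers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(streamers.size());
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Streamer s : streamers) {
                pending.add(pool.submit(() -> { s.flushInternal(); return null; }));
            }
            for (Future<?> f : pending) {
                f.get(); // returns only when that streamer is done; surfaces failures
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

With nine streamers each waiting on acks, the total time becomes roughly the slowest streamer's wait rather than the sum of all nine.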
[jira] [Commented] (HDFS-9403) Erasure coding: some EC tests are missing timeout
[ https://issues.apache.org/jira/browse/HDFS-9403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066001#comment-15066001 ] GAO Rui commented on HDFS-9403: --- Hi [~zhz], I just found out that, according to the maven settings, we currently use JUnit 4.11 on the trunk branch. So we can only create an org.junit.rules.Timeout instance via its constructor: {code} @Rule public Timeout globalTimeout = new Timeout(6); {code} But to make the test code more readable, we might prefer {{Timeout.seconds(60)}}; this is not supported until JUnit 4.12 ( http://junit.org/apidocs/org/junit/rules/Timeout.html#seconds(long) ). In this situation, should we use {{new Timeout(6)}}, which has been marked as *Deprecated*, or should we try to upgrade the JUnit version? > Erasure coding: some EC tests are missing timeout > - > > Key: HDFS-9403 > URL: https://issues.apache.org/jira/browse/HDFS-9403 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, test >Affects Versions: 3.0.0 >Reporter: Zhe Zhang >Assignee: GAO Rui >Priority: Minor > Attachments: HDFS-9403-origin-trunk.00.patch, > HDFS-9403-origin-trunk.01.patch > > > EC data writing pipeline is still being worked on, and bugs could introduce > program hang. We should add a timeout for all tests involving striped > writing. I see at least the following: > * {{TestErasureCodingPolicies}} > * {{TestFileStatusWithECPolicy}} > * {{TestDFSStripedOutputStream}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9403) Erasure coding: some EC tests are missing timeout
[ https://issues.apache.org/jira/browse/HDFS-9403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065993#comment-15065993 ] GAO Rui commented on HDFS-9403: --- Thank you very much for teaching me about the JUnit Rule. I think we could generally set the timeout rule for test classes to 1 min. And for test classes that include a test method with a timeout longer than 1 min, we could set the class's timeout rule to match the longest test-method timeout. I will upload a new patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
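For intuition, what a class-level timeout rule enforces can be sketched in plain Java. This is a simplified sketch, not JUnit's actual implementation: run the test body on a worker thread and give up once the limit passes.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutRuleSketch {
    // Returns true if the test body completes within the limit; otherwise
    // interrupts it and returns false (a real rule would fail the test).
    static boolean finishesWithin(Runnable testBody, long limitMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<?> result = worker.submit(testBody);
        try {
            result.get(limitMillis, TimeUnit.MILLISECONDS);
            return true;                 // test finished in time
        } catch (TimeoutException e) {
            result.cancel(true);         // interrupt the hung test
            return false;
        } catch (InterruptedException | ExecutionException e) {
            return false;                // treat other failures as "did not pass"
        } finally {
            worker.shutdownNow();
        }
    }
}
```

This is why a single class-wide limit must be at least as long as the slowest test method in the class, as discussed above.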
[jira] [Updated] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete
[ https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Tianyi updated HDFS-9412: Status: Patch Available (was: Open) > getBlocks occupies FSLock and takes too long to complete > > > Key: HDFS-9412 > URL: https://issues.apache.org/jira/browse/HDFS-9412 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: He Tianyi >Assignee: He Tianyi > Attachments: HDFS-9412..patch > > > {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and then may take a > long time to complete (probably several seconds, if the number of blocks is too > large). > During this period, other threads attempting to acquire the write lock will wait. > In an extreme case, RPC handlers are occupied by one reader thread calling > {{getBlocks}} with all other threads waiting for the write lock, and the RPC server > appears hung. Unfortunately, this tends to happen in heavily loaded clusters, since > read operations come and go quickly (they do not need to wait), leaving write > operations waiting. > Looks like we can optimize this the way the DN block report was optimized in the > past: split the operation into smaller sub-operations, and let other threads do > their work between each sub-operation. The whole result is still returned at once, > though (one thing different from the DN block report). > I am not sure whether this will work. Any better idea? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
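The lock-chunking idea proposed in the description can be sketched as follows. Names, the Integer stand-in for blocks, and the chunk size are all invented for illustration; only the acquire/release-per-chunk pattern is the point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;

public class ChunkedGetBlocksSketch {
    // Assumed chunk size, for illustration only.
    static final int CHUNK = 1000;

    // Scan the block list in chunks, releasing the FS read lock between
    // chunks so queued writers can make progress, while still returning the
    // whole result at once (as the description proposes).
    static List<Integer> getBlocksChunked(List<Integer> allBlocks, ReadWriteLock fsLock) {
        List<Integer> result = new ArrayList<>();
        for (int start = 0; start < allBlocks.size(); start += CHUNK) {
            fsLock.readLock().lock();
            try {
                int end = Math.min(start + CHUNK, allBlocks.size());
                result.addAll(allBlocks.subList(start, end));
            } finally {
                fsLock.readLock().unlock(); // a waiting writer may acquire here
            }
        }
        return result;
    }
}
```

The trade-off, as the description hints, is that the result may be assembled from slightly different snapshots of the namespace, since writers can modify state between chunks.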
[jira] [Updated] (HDFS-9569) Log the name of the fsimage being loaded for better supportability
[ https://issues.apache.org/jira/browse/HDFS-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-9569: Attachment: HDFS-9569.004.patch > Log the name of the fsimage being loaded for better supportability > -- > > Key: HDFS-9569 > URL: https://issues.apache.org/jira/browse/HDFS-9569 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Labels: supportability > Fix For: 2.7.3 > > Attachments: HDFS-9569.001.patch, HDFS-9569.002.patch, > HDFS-9569.003.patch, HDFS-9569.004.patch > > > When NN starts to load fsimage, it does > {code} > void loadFSImageFile(FSNamesystem target, MetaRecoveryContext recovery, > FSImageFile imageFile, StartupOption startupOption) throws IOException { > LOG.debug("Planning to load image :\n" + imageFile); > .. > long txId = loader.getLoadedImageTxId(); > LOG.info("Loaded image for txid " + txId + " from " + curFile); > {code} > A debug msg is issued at the beginning with the fsimage file name, then at > the end an info msg is issued after loading. > If the fsimage loading failed due to corrupted fsimage (see HDFS-9406), we > don't see the first msg. It'd be helpful to always be able to see from NN > logs what fsimage file it's loading. > Two improvements: > 1. Change the above debug to info > 2. If exception happens when loading fsimage, be sure to report the fsimage > name being loaded in the error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9569) Log the name of the fsimage being loaded for better supportability
[ https://issues.apache.org/jira/browse/HDFS-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065965#comment-15065965 ] Yongjun Zhang commented on HDFS-9569: - Hi Chris, I think your main concern at this point is not logging the stack trace twice; my main concern has been to ensure that the error shows both the stack trace and the file name. I think separating the error logging from the exception throwing may not give a clear cause-effect relationship. There are different reasons for failing to load an fsimage. To be consistent, I think we can wrap the reason as a cause in an IOException, and let the code that catches the exception find the cause and deal with it accordingly. I'm uploading rev 4, which wraps the cause in an IOException and ensures the stack trace is printed only once. I wonder whether it will work for you. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
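The cause-wrapping approach described in this comment can be sketched as follows. The method names and the simulated failure are hypothetical; only the pattern of naming the fsimage file in the message while preserving the original exception as the cause is the point.

```java
import java.io.IOException;

public class ImageLoadSketch {
    // Wrap whatever failed during loading as the cause of an IOException
    // that names the fsimage file, so the caller can report the file name
    // and the original stack trace exactly once.
    static void loadFSImageFile(String imageFile) throws IOException {
        try {
            loadImage(imageFile);
        } catch (RuntimeException e) {
            throw new IOException("Failed to load image from " + imageFile, e);
        }
    }

    // Simulated loader that always fails, standing in for real fsimage parsing.
    private static void loadImage(String imageFile) {
        throw new IllegalStateException("corrupted fsimage");
    }
}
```

A caller logging this IOException once gets both pieces of information, since the cause's stack trace is printed as part of the wrapper's.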
[jira] [Updated] (HDFS-9508) Fix NPE in MiniKMS.start()
[ https://issues.apache.org/jira/browse/HDFS-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-9508: -- Assignee: (was: Wei-Chiu Chuang) > Fix NPE in MiniKMS.start() > -- > > Key: HDFS-9508 > URL: https://issues.apache.org/jira/browse/HDFS-9508 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Wei-Chiu Chuang > Labels: supportability > > Sometimes the KMS resource file cannot be loaded. When this happens, an > InputStream variable will be null, which subsequently throws an NPE. > This is a supportability JIRA that makes the error message more explicit and > explains why the NPE is thrown, ultimately leading us to understand why the > resource files cannot be loaded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9569) Log the name of the fsimage being loaded for better supportability
[ https://issues.apache.org/jira/browse/HDFS-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065941#comment-15065941 ] Hadoop QA commented on HDFS-9569: -
(x) *-1 overall*
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 1s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 19m 28s | trunk passed |
| +1 | compile | 2m 42s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 1m 55s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 41s | trunk passed |
| +1 | mvnsite | 2m 19s | trunk passed |
| +1 | mvneclipse | 0m 33s | trunk passed |
| +1 | findbugs | 4m 23s | trunk passed |
| +1 | javadoc | 3m 5s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 4m 16s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 2m 17s | the patch passed |
| +1 | compile | 2m 37s | the patch passed with JDK v1.8.0_66 |
| -1 | javac | 16m 42s | hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66 with JDK v1.8.0_66 generated 2 new issues (was 32, now 32). |
| +1 | javac | 2m 37s | the patch passed |
| +1 | compile | 1m 49s | the patch passed with JDK v1.7.0_91 |
| -1 | javac | 18m 32s | hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91 with JDK v1.7.0_91 generated 2 new issues (was 34, now 34). |
| +1 | javac | 1m 49s | the patch passed |
| +1 | checkstyle | 0m 39s | the patch passed |
| +1 | mvnsite | 2m 17s | the patch passed |
| +1 | mvneclipse | 0m 31s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| +1 | findbugs | 5m 8s | the patch passed |
| +1 | javadoc | 2m 58s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 4m 18s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 182m 53s | hadoop-hdfs in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 177m 33s | hadoop-hdfs in the patch failed with JDK v1.7.0_91. |
| -1 | asflicense | 1m 6s | Patch generated 1 ASF License warnings. |
| | | 431m 9s | |
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestDFSUpgradeFromImage |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
| | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |
| | hadoop.hdfs.ser
[jira] [Updated] (HDFS-9583) TestBlockReplacement#testDeletedBlockWhenAddBlockIsInEdit occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-9583: -- Description: https://builds.apache.org/job/Hadoop-Hdfs-trunk/2647/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBlockReplacement/testDeletedBlockWhenAddBlockIsInEdit/ Looking at the code, the test expects that replacing a block from one data node to another should issue a delete request to FsDatasetAsyncDiskService.deleteAsync(), which should have print log "Scheduling ... file ... for deletion", and it waits for 3 seconds. However, it never occurred. I think the test needs a better way to determine if the delete request is executed, rather than using a fix time out. was: https://builds.apache.org/job/Hadoop-Hdfs-trunk/2647/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBlockReplacement/testDeletedBlockWhenAddBlockIsInEdit/ Looking at the code, the test expects that replacing a block from one data node to another will issue a delete request to FsDatasetAsyncDiskService.deleteAsync(), which should have print log "Scheduling ... file ... for deletion", and it waits for 3 seconds. However, it never occurred. I think the test needs a better way to determine if the delete request is executed, rather than using a fix time out. > TestBlockReplacement#testDeletedBlockWhenAddBlockIsInEdit occasionally fails > > > Key: HDFS-9583 > URL: https://issues.apache.org/jira/browse/HDFS-9583 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Wei-Chiu Chuang > > https://builds.apache.org/job/Hadoop-Hdfs-trunk/2647/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBlockReplacement/testDeletedBlockWhenAddBlockIsInEdit/ > Looking at the code, the test expects that replacing a block from one data > node to another should issue a delete request to > FsDatasetAsyncDiskService.deleteAsync(), which should have print log > "Scheduling ... file ... 
for deletion", and it waits for 3 seconds. However, > it never occurred. > I think the test needs a better way to determine if the delete request is > executed, rather than using a fix time out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
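The fix this report suggests, polling for the condition instead of sleeping a fixed 3 seconds, can be sketched as follows. This is a from-scratch sketch similar in spirit to Hadoop's GenericTestUtils.waitFor, not that utility itself; the parameter names are invented.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class WaitForSketch {
    // Poll a condition (e.g. "the 'Scheduling ... for deletion' log line
    // appeared") until it holds or a deadline passes. Returns true as soon
    // as the condition is observed, so a fast run finishes fast, while a
    // slow run still gets the full timeout instead of a fixed sleep.
    static boolean waitFor(BooleanSupplier condition, long checkEveryMs, long timeoutMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (System.nanoTime() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(checkEveryMs);
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }
}
```

A test using this pattern fails only when the deletion truly never happens within a generous deadline, rather than whenever it happens to take longer than 3 seconds.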
[jira] [Updated] (HDFS-9583) TestBlockReplacement#testDeletedBlockWhenAddBlockIsInEdit occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-9583: -- Description: https://builds.apache.org/job/Hadoop-Hdfs-trunk/2647/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBlockReplacement/testDeletedBlockWhenAddBlockIsInEdit/ Looking at the code, the test expects that replacing a block from one data node to another issues a delete request to FsDatasetAsyncDiskService.deleteAsync(), which should have print log "Scheduling ... file ... for deletion", and it waited for 3 seconds. However, it never occurred. I think the test needs a better way to determine if the delete request is executed, rather than using a fixed time out. was: https://builds.apache.org/job/Hadoop-Hdfs-trunk/2647/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBlockReplacement/testDeletedBlockWhenAddBlockIsInEdit/ Looking at the code, the test expects that replacing a block from one data node to another issues a delete request to FsDatasetAsyncDiskService.deleteAsync(), which should have print log "Scheduling ... file ... for deletion", and it waited for 3 seconds. However, it never occurred. I think the test needs a better way to determine if the delete request is executed, rather than using a fix time out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9583) TestBlockReplacement#testDeletedBlockWhenAddBlockIsInEdit occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-9583: -- Description: https://builds.apache.org/job/Hadoop-Hdfs-trunk/2647/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBlockReplacement/testDeletedBlockWhenAddBlockIsInEdit/ Looking at the code, the test expects that replacing a block from one data node to another issues a delete request to FsDatasetAsyncDiskService.deleteAsync(), which should have print log "Scheduling ... file ... for deletion", and it waited for 3 seconds. However, it never occurred. I think the test needs a better way to determine if the delete request is executed, rather than using a fix time out. was: https://builds.apache.org/job/Hadoop-Hdfs-trunk/2647/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBlockReplacement/testDeletedBlockWhenAddBlockIsInEdit/ Looking at the code, the test expects that replacing a block from one data node to another should issue a delete request to FsDatasetAsyncDiskService.deleteAsync(), which should have print log "Scheduling ... file ... for deletion", and it waits for 3 seconds. However, it never occurred. I think the test needs a better way to determine if the delete request is executed, rather than using a fix time out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9583) TestBlockReplacement#testDeletedBlockWhenAddBlockIsInEdit occasionally fails
Wei-Chiu Chuang created HDFS-9583: - Summary: TestBlockReplacement#testDeletedBlockWhenAddBlockIsInEdit occasionally fails Key: HDFS-9583 URL: https://issues.apache.org/jira/browse/HDFS-9583 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Environment: Jenkins Reporter: Wei-Chiu Chuang https://builds.apache.org/job/Hadoop-Hdfs-trunk/2647/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBlockReplacement/testDeletedBlockWhenAddBlockIsInEdit/ Looking at the code, the test expects that replacing a block from one data node to another will issue a delete request to FsDatasetAsyncDiskService.deleteAsync(), which should print the log message "Scheduling ... file ... for deletion", and it waits for 3 seconds. However, the message never appeared. I think the test needs a better way to determine whether the delete request has been executed, rather than using a fixed timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
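The "better way" the report asks for is usually a condition poll rather than a fixed sleep. Below is a minimal, dependency-free sketch in the spirit of Hadoop's GenericTestUtils.waitFor; the class name WaitFor and its exact signature are illustrative, not the actual Hadoop test API:

```java
import java.util.function.BooleanSupplier;

// Sketch of a poll-until-condition helper: instead of sleeping a fixed
// 3 seconds and hoping the delete was scheduled, the test would poll a
// positive signal (e.g. a log capture or dataset state) until it appears
// or a generous deadline passes.
public class WaitFor {
  public static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        // Fail loudly with context rather than silently timing out.
        throw new IllegalStateException(
            "Timed out after " + timeoutMs + " ms waiting for condition");
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting", e);
      }
    }
  }
}
```

A test would then wait on "delete request observed" directly, so a slow machine only makes the test slower, not flaky.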
[jira] [Commented] (HDFS-9569) Log the name of the fsimage being loaded for better supportability
[ https://issues.apache.org/jira/browse/HDFS-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065902#comment-15065902 ] Chris Nauroth commented on HDFS-9569: - How about this? {code} } catch (IllegalReservedPathException e) { LOG.error("Failed to load image from " + imageFile); throw e; {code} Notice this doesn't pass {{e}} to {{LOG.error}}. That way, we'll get a log message with the file name, but it won't double-print the stack trace. > Log the name of the fsimage being loaded for better supportability > -- > > Key: HDFS-9569 > URL: https://issues.apache.org/jira/browse/HDFS-9569 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Labels: supportability > Fix For: 2.7.3 > > Attachments: HDFS-9569.001.patch, HDFS-9569.002.patch, > HDFS-9569.003.patch > > > When NN starts to load fsimage, it does > {code} > void loadFSImageFile(FSNamesystem target, MetaRecoveryContext recovery, > FSImageFile imageFile, StartupOption startupOption) throws IOException { > LOG.debug("Planning to load image :\n" + imageFile); > .. > long txId = loader.getLoadedImageTxId(); > LOG.info("Loaded image for txid " + txId + " from " + curFile); > {code} > A debug msg is issued at the beginning with the fsimage file name, then at > the end an info msg is issued after loading. > If the fsimage loading failed due to corrupted fsimage (see HDFS-9406), we > don't see the first msg. It'd be helpful to always be able to see from NN > logs what fsimage file it's loading. > Two improvements: > 1. Change the above debug to info > 2. If exception happens when loading fsimage, be sure to report the fsimage > name being loaded in the error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
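The double-printing point above can be demonstrated with a runnable sketch, using java.util.logging as a dependency-free stand-in for the NameNode's LOG (class and method names here are illustrative): logging only the message records the file name at the log site, while passing the throwable also prints the stack trace there, duplicating what the top-level handler prints when the rethrown exception escapes.

```java
import java.io.ByteArrayOutputStream;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

public class LogOnce {
  // Run a logging action against a logger whose output is captured.
  private static String capture(Consumer<Logger> action) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    Logger log = Logger.getLogger("LogOnce");
    log.setUseParentHandlers(false);
    StreamHandler handler = new StreamHandler(buf, new SimpleFormatter());
    log.addHandler(handler);
    action.accept(log);
    handler.flush();
    log.removeHandler(handler);
    return buf.toString();
  }

  // Message only: the file name is recorded, no stack trace at the log site.
  public static String messageOnly(String imageFile, Exception e) {
    return capture(log -> log.severe("Failed to load image from " + imageFile));
  }

  // Message plus throwable: the stack trace is printed here AND again when
  // the rethrown exception reaches the top level -- the duplication the
  // review asks to avoid.
  public static String messageWithThrowable(String imageFile, Exception e) {
    return capture(log ->
        log.log(Level.SEVERE, "Failed to load image from " + imageFile, e));
  }
}
```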
[jira] [Commented] (HDFS-9569) Log the name of the fsimage being loaded for better supportability
[ https://issues.apache.org/jira/browse/HDFS-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065900#comment-15065900 ] Yongjun Zhang commented on HDFS-9569: - Hi [~cnauroth], Thanks again for your review and comments. About your suggested change (option 1) {code} } catch (IllegalReservedPathException e) { throw e; {code} The intention of my last change (option 2) was to include the fsimage file name in the exception; the change suggested above won't do that. That's why I logged an error before rethrowing, so the log message includes the imageFile name: {code} } catch (IllegalReservedPathException e) { LOG.error("Failed to load image from " + imageFile, e); throw e; {code} Another way to do it is (option 3): {code} } catch (IllegalReservedPathException e) { throw new IllegalReservedPathException("Failed to load image from " + imageFile + " (" + e.getMessage() + ")", e); } ... {code} If we want to have the imageFile name in the exception message, which option would you prefer? Thanks. > Log the name of the fsimage being loaded for better supportability > -- > > Key: HDFS-9569 > URL: https://issues.apache.org/jira/browse/HDFS-9569 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Labels: supportability > Fix For: 2.7.3 > > Attachments: HDFS-9569.001.patch, HDFS-9569.002.patch, > HDFS-9569.003.patch > > > When NN starts to load fsimage, it does > {code} > void loadFSImageFile(FSNamesystem target, MetaRecoveryContext recovery, > FSImageFile imageFile, StartupOption startupOption) throws IOException { > LOG.debug("Planning to load image :\n" + imageFile); > .. > long txId = loader.getLoadedImageTxId(); > LOG.info("Loaded image for txid " + txId + " from " + curFile); > {code} > A debug msg is issued at the beginning with the fsimage file name, then at > the end an info msg is issued after loading. 
> If the fsimage loading failed due to corrupted fsimage (see HDFS-9406), we > don't see the first msg. It'd be helpful to always be able to see from NN > logs what fsimage file it's loading. > Two improvements: > 1. Change the above debug to info > 2. If exception happens when loading fsimage, be sure to report the fsimage > name being loaded in the error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
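Option 3 above (wrapping the exception so its message carries the fsimage name) can be sketched with a stand-in exception class; the class and helper names below mirror the comment but are illustrative, not the actual HDFS code, and the fsimage file name in the usage is hypothetical:

```java
public class WrapDemo {
  // Stand-in for the IllegalReservedPathException introduced in the patch.
  public static class IllegalReservedPathException extends Exception {
    public IllegalReservedPathException(String msg) { super(msg); }
    public IllegalReservedPathException(String msg, Throwable cause) { super(msg, cause); }
  }

  // Option 3: rethrow a new exception whose message includes the fsimage
  // file name, chaining the original as the cause so no detail is lost.
  public static IllegalReservedPathException withImageFile(
      String imageFile, IllegalReservedPathException e) {
    return new IllegalReservedPathException(
        "Failed to load image from " + imageFile + " (" + e.getMessage() + ")", e);
  }
}
```

The trade-off versus option 2 is that the file name travels with the exception itself, so any handler that later logs or displays it sees the name without extra log-site code.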
[jira] [Commented] (HDFS-9582) TestLeaseRecoveryStriped file missing Apache License header and not well formatted
[ https://issues.apache.org/jira/browse/HDFS-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065898#comment-15065898 ] Hudson commented on HDFS-9582: -- FAILURE: Integrated in Hadoop-trunk-Commit #9006 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9006/]) HDFS-9582. TestLeaseRecoveryStriped file missing Apache License header (umamahesh: rev 52ad9125b8fbf1b4a92ef30969f0ec4c5f9d9852) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecoveryStriped.java > TestLeaseRecoveryStriped file missing Apache License header and not well > formatted > -- > > Key: HDFS-9582 > URL: https://issues.apache.org/jira/browse/HDFS-9582 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-9582-Trunk.00.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9582) TestLeaseRecoveryStriped file missing Apache License header and not well formatted
[ https://issues.apache.org/jira/browse/HDFS-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-9582: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Committed to trunk. Test failures unrelated to this change. > TestLeaseRecoveryStriped file missing Apache License header and not well > formatted > -- > > Key: HDFS-9582 > URL: https://issues.apache.org/jira/browse/HDFS-9582 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-9582-Trunk.00.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9569) Log the name of the fsimage being loaded for better supportability
[ https://issues.apache.org/jira/browse/HDFS-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065871#comment-15065871 ] Chris Nauroth commented on HDFS-9569: - [~yzhangal], thank you for the update. This looks better. It still ends up logging the error and full stack trace twice: once within the {{loadFSImage}} loop, and once again when the exception gets rethrown, escapes to top level, and exits the process. Sorry to nit-pick, but I think eliminating the redundant logging will better help operators identify what they need to do next. I suggest catching and rethrowing {{IllegalReservedPathException}} without logging from within the loop: {code} ... } catch (IllegalReservedPathException e) { throw e; } catch (Exception e) { LOG.error("Failed to load image from " + imageFile, e); ... {code} Also, the test can be simplified to catch {{IllegalReservedPathException}} directly: {code} ... } catch (IllegalReservedPathException e) { GenericTestUtils.assertExceptionContains( "reserved path component in this version", e); } finally { ... {code} I'll be +1 after those changes and another Jenkins run. I'll also give [~kihwal] a chance to comment again before I commit anything, since he did the earlier review. > Log the name of the fsimage being loaded for better supportability > -- > > Key: HDFS-9569 > URL: https://issues.apache.org/jira/browse/HDFS-9569 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Labels: supportability > Fix For: 2.7.3 > > Attachments: HDFS-9569.001.patch, HDFS-9569.002.patch, > HDFS-9569.003.patch > > > When NN starts to load fsimage, it does > {code} > void loadFSImageFile(FSNamesystem target, MetaRecoveryContext recovery, > FSImageFile imageFile, StartupOption startupOption) throws IOException { > LOG.debug("Planning to load image :\n" + imageFile); > .. 
> long txId = loader.getLoadedImageTxId(); > LOG.info("Loaded image for txid " + txId + " from " + curFile); > {code} > A debug msg is issued at the beginning with the fsimage file name, then at > the end an info msg is issued after loading. > If the fsimage loading failed due to corrupted fsimage (see HDFS-9406), we > don't see the first msg. It'd be helpful to always be able to see from NN > logs what fsimage file it's loading. > Two improvements: > 1. Change the above debug to info > 2. If exception happens when loading fsimage, be sure to report the fsimage > name being loaded in the error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
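The test simplification suggested above relies on GenericTestUtils.assertExceptionContains from org.apache.hadoop.test; a minimal, dependency-free stand-in of that helper looks like this (sketch only, not the Hadoop implementation):

```java
public class AssertContains {
  // Fail with a descriptive AssertionError unless the throwable's message
  // contains the expected fragment -- the behavior the simplified test
  // catch-block depends on.
  public static void assertExceptionContains(String fragment, Throwable t) {
    String msg = t.getMessage();
    if (msg == null || !msg.contains(fragment)) {
      throw new AssertionError(
          "Expected message containing \"" + fragment + "\" but was: " + msg);
    }
  }
}
```

Catching IllegalReservedPathException directly and asserting on its message keeps the test precise: any other exception type still propagates and fails the test with its own stack trace.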
[jira] [Commented] (HDFS-9569) Log the name of the fsimage being loaded for better supportability
[ https://issues.apache.org/jira/browse/HDFS-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065835#comment-15065835 ] Yongjun Zhang commented on HDFS-9569: - Hi [~cnauroth], Thanks for reviewing and for the good comments! Theoretically, different fsimages could be corrupted in different ways, so trying them all and printing an error for each individual fsimage might be ok. Printing the same error for each fsimage would not be a big problem IMO. But I agree that the reserved path exception tends to be the same for all fsimages to be tried, and we can keep the old control flow of throwing the exception at the first fsimage tried. I uploaded rev 3 with your suggestion of introducing IllegalReservedPathException, which I think is a good idea. Would you please take a look? Thanks. > Log the name of the fsimage being loaded for better supportability > -- > > Key: HDFS-9569 > URL: https://issues.apache.org/jira/browse/HDFS-9569 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Labels: supportability > Fix For: 2.7.3 > > Attachments: HDFS-9569.001.patch, HDFS-9569.002.patch, > HDFS-9569.003.patch > > > When NN starts to load fsimage, it does > {code} > void loadFSImageFile(FSNamesystem target, MetaRecoveryContext recovery, > FSImageFile imageFile, StartupOption startupOption) throws IOException { > LOG.debug("Planning to load image :\n" + imageFile); > .. > long txId = loader.getLoadedImageTxId(); > LOG.info("Loaded image for txid " + txId + " from " + curFile); > {code} > A debug msg is issued at the beginning with the fsimage file name, then at > the end an info msg is issued after loading. > If the fsimage loading failed due to corrupted fsimage (see HDFS-9406), we > don't see the first msg. It'd be helpful to always be able to see from NN > logs what fsimage file it's loading. > Two improvements: > 1. Change the above debug to info > 2. 
If exception happens when loading fsimage, be sure to report the fsimage > name being loaded in the error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9569) Log the name of the fsimage being loaded for better supportability
[ https://issues.apache.org/jira/browse/HDFS-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-9569: Attachment: HDFS-9569.003.patch > Log the name of the fsimage being loaded for better supportability > -- > > Key: HDFS-9569 > URL: https://issues.apache.org/jira/browse/HDFS-9569 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Labels: supportability > Fix For: 2.7.3 > > Attachments: HDFS-9569.001.patch, HDFS-9569.002.patch, > HDFS-9569.003.patch > > > When NN starts to load fsimage, it does > {code} > void loadFSImageFile(FSNamesystem target, MetaRecoveryContext recovery, > FSImageFile imageFile, StartupOption startupOption) throws IOException { > LOG.debug("Planning to load image :\n" + imageFile); > .. > long txId = loader.getLoadedImageTxId(); > LOG.info("Loaded image for txid " + txId + " from " + curFile); > {code} > A debug msg is issued at the beginning with the fsimage file name, then at > the end an info msg is issued after loading. > If the fsimage loading failed due to corrupted fsimage (see HDFS-9406), we > don't see the first msg. It'd be helpful to always be able to see from NN > logs what fsimage file it's loading. > Two improvements: > 1. Change the above debug to info > 2. If exception happens when loading fsimage, be sure to report the fsimage > name being loaded in the error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9569) Log the name of the fsimage being loaded for better supportability
[ https://issues.apache.org/jira/browse/HDFS-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-9569: Attachment: (was: HDFS-9569.003.patch) > Log the name of the fsimage being loaded for better supportability > -- > > Key: HDFS-9569 > URL: https://issues.apache.org/jira/browse/HDFS-9569 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Labels: supportability > Fix For: 2.7.3 > > Attachments: HDFS-9569.001.patch, HDFS-9569.002.patch > > > When NN starts to load fsimage, it does > {code} > void loadFSImageFile(FSNamesystem target, MetaRecoveryContext recovery, > FSImageFile imageFile, StartupOption startupOption) throws IOException { > LOG.debug("Planning to load image :\n" + imageFile); > .. > long txId = loader.getLoadedImageTxId(); > LOG.info("Loaded image for txid " + txId + " from " + curFile); > {code} > A debug msg is issued at the beginning with the fsimage file name, then at > the end an info msg is issued after loading. > If the fsimage loading failed due to corrupted fsimage (see HDFS-9406), we > don't see the first msg. It'd be helpful to always be able to see from NN > logs what fsimage file it's loading. > Two improvements: > 1. Change the above debug to info > 2. If exception happens when loading fsimage, be sure to report the fsimage > name being loaded in the error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9569) Log the name of the fsimage being loaded for better supportability
[ https://issues.apache.org/jira/browse/HDFS-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-9569: Attachment: HDFS-9569.003.patch > Log the name of the fsimage being loaded for better supportability > -- > > Key: HDFS-9569 > URL: https://issues.apache.org/jira/browse/HDFS-9569 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Labels: supportability > Fix For: 2.7.3 > > Attachments: HDFS-9569.001.patch, HDFS-9569.002.patch, > HDFS-9569.003.patch > > > When NN starts to load fsimage, it does > {code} > void loadFSImageFile(FSNamesystem target, MetaRecoveryContext recovery, > FSImageFile imageFile, StartupOption startupOption) throws IOException { > LOG.debug("Planning to load image :\n" + imageFile); > .. > long txId = loader.getLoadedImageTxId(); > LOG.info("Loaded image for txid " + txId + " from " + curFile); > {code} > A debug msg is issued at the beginning with the fsimage file name, then at > the end an info msg is issued after loading. > If the fsimage loading failed due to corrupted fsimage (see HDFS-9406), we > don't see the first msg. It'd be helpful to always be able to see from NN > logs what fsimage file it's loading. > Two improvements: > 1. Change the above debug to info > 2. If exception happens when loading fsimage, be sure to report the fsimage > name being loaded in the error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9582) TestLeaseRecoveryStriped file missing Apache License header and not well formatted
[ https://issues.apache.org/jira/browse/HDFS-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065823#comment-15065823 ] Hadoop QA commented on HDFS-9582: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 43s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 40s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 55s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 7s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | 
{color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 40s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 0s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 0s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 47s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 22s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 186m 12s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 168m 47s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 19s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 426m 15s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestDFSUpgradeFromImage | | | hadoop.hdfs.TestBlockReaderLocal | | | hadoop.hdfs.server.mover.TestMover | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.hdfs.server.datanode.TestBlockScanner | | | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover | | | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | | | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.security.TestDelegationTokenForProxyUser | | | hadoop.hdfs.TestLocalDFS | | | hadoop.hdfs.server.datanode.TestDataNodeVolu
[jira] [Updated] (HDFS-9505) HDFS Architecture documentation needs to be refreshed.
[ https://issues.apache.org/jira/browse/HDFS-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-9505: --- Attachment: HDFS-9505.002.patch I attached the updated patch as 002. * made it clearer that moving files to the trash is a feature of the FS shell. * moved the contents of the "HDFS Trash Management" section under the "File Deletes and Undeletes" section. * moved part of the contents about the trash feature to the FileSystem Shell guide. * removed the description about WebDAV and added the NFS gateway instead. > HDFS Architecture documentation needs to be refreshed. > -- > > Key: HDFS-9505 > URL: https://issues.apache.org/jira/browse/HDFS-9505 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Chris Nauroth >Assignee: Masatake Iwasaki >Priority: Minor > Attachments: HDFS-9505.001.patch, HDFS-9505.002.patch > > > The HDFS Architecture document is out of date with respect to the current > design of the system. > http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html > There are multiple false statements and omissions of recent features. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HDFS-9173) Erasure Coding: Lease recovery for striped file
[ https://issues.apache.org/jira/browse/HDFS-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA reopened HDFS-9173: - > Erasure Coding: Lease recovery for striped file > --- > > Key: HDFS-9173 > URL: https://issues.apache.org/jira/browse/HDFS-9173 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Walter Su >Assignee: Walter Su > Fix For: 3.0.0 > > Attachments: HDFS-9173.00.wip.patch, HDFS-9173.01.patch, > HDFS-9173.02.step125.patch, HDFS-9173.03.patch, HDFS-9173.04.patch, > HDFS-9173.05.patch, HDFS-9173.06.patch, HDFS-9173.07.patch, > HDFS-9173.08.patch, HDFS-9173.09.patch, HDFS-9173.09.patch, HDFS-9173.10.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9173) Erasure Coding: Lease recovery for striped file
[ https://issues.apache.org/jira/browse/HDFS-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065788#comment-15065788 ] Akira AJISAKA commented on HDFS-9173: - Hi [~zhz], I'm seeing ASF license warning on trunk. Would you please fix it? > Erasure Coding: Lease recovery for striped file > --- > > Key: HDFS-9173 > URL: https://issues.apache.org/jira/browse/HDFS-9173 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Walter Su >Assignee: Walter Su > Fix For: 3.0.0 > > Attachments: HDFS-9173.00.wip.patch, HDFS-9173.01.patch, > HDFS-9173.02.step125.patch, HDFS-9173.03.patch, HDFS-9173.04.patch, > HDFS-9173.05.patch, HDFS-9173.06.patch, HDFS-9173.07.patch, > HDFS-9173.08.patch, HDFS-9173.09.patch, HDFS-9173.09.patch, HDFS-9173.10.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9580) TestComputeInvalidateWork#testDatanodeReRegistration failed due to unexpected number of invalidate blocks.
[ https://issues.apache.org/jira/browse/HDFS-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-9580: -- Attachment: HDFS-9580.001.patch Rev01 patch: resolves the race condition by ensuring the file is properly replicated before shutting down the data nodes. > TestComputeInvalidateWork#testDatanodeReRegistration failed due to unexpected > number of invalidate blocks. > -- > > Key: HDFS-9580 > URL: https://issues.apache.org/jira/browse/HDFS-9580 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode, test >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-9580.001.patch > > > The failure appeared in the trunk jenkins job. > https://builds.apache.org/job/Hadoop-Hdfs-trunk/2646/ > {noformat} > Error Message > Expected invalidate blocks to be the number of DNs expected:<3> but was:<2> > Stacktrace > java.lang.AssertionError: Expected invalidate blocks to be the number of DNs > expected:<3> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160) > {noformat} > I think there could be a race condition between creating a file and shutting > down data nodes, which failed the test. 
> {noformat} > 2015-12-19 07:11:02,765 [PacketResponder: > BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, > type=LAST_IN_PIPELINE, downstreams=0:[]] INFO datanode.DataNode > (BlockReceiver.java:run(1404)) - PacketResponder: > BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, > type=LAST_IN_PIPELINE, downstreams=0:[] terminating > 2015-12-19 07:11:02,768 [PacketResponder: > BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, > type=HAS_DOWNSTREAM_IN_PIPELINE] INFO DataNode.clienttrace > (BlockReceiver.java:finalizeBlock(1431)) - src: /127.0.0.1:45655, dest: > /127.0.0.1:54890, bytes: 134217728, op: HDFS_WRITE, cliID: > DFSClient_NONMAPREDUCE_147911011_935, offset: 0, srvID: > 6a13ec05-e1c1-4086-8a4d-d5a09636afcd, blockid: > BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, duration: > 954174423 > 2015-12-19 07:11:02,768 [PacketResponder: > BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, > type=HAS_DOWNSTREAM_IN_PIPELINE] INFO datanode.DataNode > (BlockReceiver.java:run(1404)) - PacketResponder: > BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, > type=HAS_DOWNSTREAM_IN_PIPELINE terminating > 2015-12-19 07:11:02,772 [PacketResponder: > BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, > type=HAS_DOWNSTREAM_IN_PIPELINE] INFO DataNode.clienttrace > (BlockReceiver.java:finalizeBlock(1431)) - src: /127.0.0.1:33252, dest: > /127.0.0.1:54426, bytes: 134217728, op: HDFS_WRITE, cliID: > DFSClient_NONMAPREDUCE_147911011_935, offset: 0, srvID: > d81751db-02a9-48fe-b697-77623048784b, blockid: > BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, duration: > 957463510 > 2015-12-19 07:11:02,772 [PacketResponder: > BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, > type=HAS_DOWNSTREAM_IN_PIPELINE] INFO datanode.DataNode > (BlockReceiver.java:run(1404)) - PacketResponder: > BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, > 
type=HAS_DOWNSTREAM_IN_PIPELINE terminating > 2015-12-19 07:11:02,782 [IPC Server handler 4 on 36404] INFO > blockmanagement.BlockManager > (BlockManager.java:checkBlocksProperlyReplicated(3871)) - BLOCK* > blk_1073741825_1001 is not COMPLETE (ucState = COMMITTED, replication# = 0 < > minimum = 1) in file /testRR > 2015-12-19 07:11:02,783 [IPC Server handler 4 on 36404] INFO > namenode.EditLogFileOutputStream > (EditLogFileOutputStream.java:flushAndSync(200)) - Nothing to flush > 2015-12-19 07:11:02,783 [IPC Server handler 4 on 36404] INFO > namenode.EditLogFileOutputStream > (EditLogFileOutputStream.java:flushAndSync(200)) - Nothing to flush > 2015-12-19 07:11:03,190 [IPC Server handler 8 on 36404] INFO > hdfs.StateChange (FSNamesystem.java:completeFile(2557)) - DIR* completeFile: > /testRR is closed by DFSClient_NONMAPREDUCE_147911011_935 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9575) Use byte array for internal block indices in a striped block
[ https://issues.apache.org/jira/browse/HDFS-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-9575: -- Hadoop Flags: Reviewed +1 patch looks good. Thanks! > Use byte array for internal block indices in a striped block > > > Key: HDFS-9575 > URL: https://issues.apache.org/jira/browse/HDFS-9575 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-9575.000.patch, HDFS-9575.001.patch > > > Currently we use inconsistent data types to specify the internal block > indices. E.g., we use short[] in {{BlockECRecoveryInfo}}, int[] in > {{LocatedStripedBlock}}, and byte[] in {{StripedBlockWithLocations}}. We will > use this jira to unify the data type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
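The unification described in HDFS-9575 amounts to a narrowing conversion: internal block indices of a striped block are small (at most dataUnits + parityUnits - 1, e.g. 8 for a 6+3 schema), so a byte[] suffices and saves memory over int[] or short[]. A hedged sketch of such a conversion helper (the class and method names are illustrative, not the HDFS code):

```java
public class StripedIndices {
  // Convert int[] block indices to the unified byte[] form, rejecting any
  // value a byte cannot hold. In practice striped-block indices are tiny
  // (e.g. 0..8 for RS-6-3), so the narrowing is safe; the range check
  // guards against misuse.
  public static byte[] toByteArray(int[] indices) {
    byte[] out = new byte[indices.length];
    for (int i = 0; i < indices.length; i++) {
      if (indices[i] < 0 || indices[i] > Byte.MAX_VALUE) {
        throw new IllegalArgumentException("Block index out of byte range: " + indices[i]);
      }
      out[i] = (byte) indices[i];
    }
    return out;
  }
}
```

Converging on one representation also removes the lossy short-to-int-to-byte round trips between the classes the description lists.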
[jira] [Updated] (HDFS-9582) TestLeaseRecoveryStriped file missing Apache License header and not well formatted
[ https://issues.apache.org/jira/browse/HDFS-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-9582: -- Status: Patch Available (was: Open) > TestLeaseRecoveryStriped file missing Apache License header and not well > formatted > -- > > Key: HDFS-9582 > URL: https://issues.apache.org/jira/browse/HDFS-9582 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Minor > Attachments: HDFS-9582-Trunk.00.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)