[jira] [Commented] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.
[ https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15797508#comment-15797508 ] Akira Ajisaka commented on HDFS-11180: -- Filed HDFS-11290 for tracking this issue. > Intermittent deadlock in NameNode when failover happens. > > > Key: HDFS-11180 > URL: https://issues.apache.org/jira/browse/HDFS-11180 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 >Reporter: Abhishek Modi >Assignee: Akira Ajisaka >Priority: Blocker > Labels: high-availability > Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2, 2.6.6 > > Attachments: HDFS-11180-branch-2.01.patch, > HDFS-11180-branch-2.6.01.patch, HDFS-11180-branch-2.7.01.patch, > HDFS-11180-branch-2.8.01.patch, HDFS-11180.00.patch, HDFS-11180.01.patch, > HDFS-11180.02.patch, HDFS-11180.03.patch, HDFS-11180.04.patch, jstack.log > > > It happens due to metrics getting updated at the same time that a failover > is happening. Please find attached the jstack taken at that point in time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-11290) TestFSNameSystemMBean should wait until the cache is cleared
Akira Ajisaka created HDFS-11290: Summary: TestFSNameSystemMBean should wait until the cache is cleared Key: HDFS-11290 URL: https://issues.apache.org/jira/browse/HDFS-11290 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 2.8.0 Reporter: Akira Ajisaka TestFSNamesystemMBean#testWithFSNamesystemWriteLock and #testWithFSEditLogLock get metrics after locking FSNamesystem/FSEditLog, but when the metrics are cached, the tests succeed even if fetching the metrics would have to acquire the locks. The tests should wait until the cache is cleared. This issue was reported by [~xkrogen] in HDFS-11180. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
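A minimal sketch of the waiting idea, assuming the test can simply outlast the JMX cache TTL before reading the MBean again. The class name, TTL constant, and sleep-based approach here are illustrative assumptions, not the committed fix:
{code}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxCacheAwareReader {
  // Assumed TTL; the real value depends on the metrics system configuration.
  static final long JMX_CACHE_TTL_MS = 1000;

  static Object readUncached(String attribute) throws Exception {
    MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
    ObjectName name =
        new ObjectName("Hadoop:service=NameNode,name=FSNamesystemState");
    // Wait out the cached snapshot so the next read recomputes the metric
    // (and therefore really contends for the FSNamesystem/FSEditLog locks).
    Thread.sleep(JMX_CACHE_TTL_MS + 100);
    return mbs.getAttribute(name, attribute);
  }
}
{code}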
[jira] [Commented] (HDFS-10675) Datanode support to read from external stores.
[ https://issues.apache.org/jira/browse/HDFS-10675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15797485#comment-15797485 ] Jiajia Li commented on HDFS-10675: -- Hi [~virajith], we are interested in this feature. The v3 patch is not based on the latest trunk code; do you have time to update it? > Datanode support to read from external stores. > --- > > Key: HDFS-10675 > URL: https://issues.apache.org/jira/browse/HDFS-10675 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Virajith Jalaparti > Attachments: HDFS-10675-HDFS-9806.001.patch, > HDFS-10675-HDFS-9806.002.patch, HDFS-10675-HDFS-9806.003.patch > > > This JIRA introduces a new {{PROVIDED}} {{StorageType}} to represent external > stores, along with enabling the Datanode to read from such stores using a > {{ProvidedReplica}} and a {{ProvidedVolume}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.
[ https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15797465#comment-15797465 ] Akira Ajisaka commented on HDFS-11180: -- bq. I also found the reason that the test is currently succeeding. It appears that the JMX cache is being populated before the lock is taken on the FSEditLog, then once the lock is taken the metrics are able to be read because they are cached (and so the original method requiring synchronization is not used). I confirmed this using the logs available and also if you add a Thread.sleep(1) (equivalent to the default JMX cache TTL) at the start of the synchronization block in branch-2.7 the test will fail. I couldn't reproduce this, but I agree with you. Let's update the tests in a separate jira. > Intermittent deadlock in NameNode when failover happens. > > > Key: HDFS-11180 > URL: https://issues.apache.org/jira/browse/HDFS-11180 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 >Reporter: Abhishek Modi >Assignee: Akira Ajisaka >Priority: Blocker > Labels: high-availability > Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2, 2.6.6 > > Attachments: HDFS-11180-branch-2.01.patch, > HDFS-11180-branch-2.6.01.patch, HDFS-11180-branch-2.7.01.patch, > HDFS-11180-branch-2.8.01.patch, HDFS-11180.00.patch, HDFS-11180.01.patch, > HDFS-11180.02.patch, HDFS-11180.03.patch, HDFS-11180.04.patch, jstack.log > > > It happens due to metrics getting updated at the same time that a failover > is happening. Please find attached the jstack taken at that point in time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
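A sketch of the repro described above, with assumed names (the lock object, TTL constant, and MBean server reference are illustrative):
{code}
// Inside a test such as testWithFSEditLogLock: hold the lock, let the JMX
// cache expire, then read the MBean. The read now has to recompute under
// the held lock, so a lock-taking metrics getter makes the test fail.
synchronized (fsn.getEditLog()) {      // fsn: the test's FSNamesystem
  Thread.sleep(JMX_CACHE_TTL_MS);      // assumed TTL constant, see above
  Object ignored = mbs.getAttribute(
      new ObjectName("Hadoop:service=NameNode,name=FSNamesystemState"),
      "NumLiveDataNodes");             // blocks if the getter needs the lock
}
{code}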
[jira] [Updated] (HDFS-11289) [SPS]: Make SPS movement monitor timeouts configurable
[ https://issues.apache.org/jira/browse/HDFS-11289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-11289: --- Status: Patch Available (was: Open) > [SPS]: Make SPS movement monitor timeouts configurable > -- > > Key: HDFS-11289 > URL: https://issues.apache.org/jira/browse/HDFS-11289 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS-10285 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-11289-HDFS-10285-00.patch > > > Currently the SPS tracking monitor timeouts are hardcoded. This is the JIRA for > making them configurable. > {code} > // TODO: below selfRetryTimeout and checkTimeout can be configurable later > // Now, the default values of selfRetryTimeout and checkTimeout are 30mins > // and 5mins respectively > this.storageMovementsMonitor = new BlockStorageMovementAttemptedItems( > 5 * 60 * 1000, 30 * 60 * 1000, storageMovementNeeded); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11289) [SPS]: Make SPS movement monitor timeouts configurable
[ https://issues.apache.org/jira/browse/HDFS-11289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-11289: --- Attachment: HDFS-11289-HDFS-10285-00.patch Attached a simple patch making the mentioned values configurable. Please review. > [SPS]: Make SPS movement monitor timeouts configurable > -- > > Key: HDFS-11289 > URL: https://issues.apache.org/jira/browse/HDFS-11289 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS-10285 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-11289-HDFS-10285-00.patch > > > Currently the SPS tracking monitor timeouts are hardcoded. This is the JIRA for > making them configurable. > {code} > // TODO: below selfRetryTimeout and checkTimeout can be configurable later > // Now, the default values of selfRetryTimeout and checkTimeout are 30mins > // and 5mins respectively > this.storageMovementsMonitor = new BlockStorageMovementAttemptedItems( > 5 * 60 * 1000, 30 * 60 * 1000, storageMovementNeeded); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
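A rough sketch of what the configurable version could look like, assuming a Configuration instance {{conf}} is in scope as it is in the NameNode code. The config key names and their defaults here are assumptions, not necessarily what the attached patch uses:
{code}
// Hypothetical keys; the committed patch may name them differently.
public static final String DFS_SPS_RECHECK_TIMEOUT_KEY =
    "dfs.storage.policy.satisfier.recheck.timeout.millis";
public static final long DFS_SPS_RECHECK_TIMEOUT_DEFAULT = 5 * 60 * 1000;
public static final String DFS_SPS_SELF_RETRY_TIMEOUT_KEY =
    "dfs.storage.policy.satisfier.self.retry.timeout.millis";
public static final long DFS_SPS_SELF_RETRY_TIMEOUT_DEFAULT = 30 * 60 * 1000;

// Read the two timeouts from configuration instead of hardcoding them,
// keeping the original argument order (checkTimeout, selfRetryTimeout).
long checkTimeout = conf.getLong(
    DFS_SPS_RECHECK_TIMEOUT_KEY, DFS_SPS_RECHECK_TIMEOUT_DEFAULT);
long selfRetryTimeout = conf.getLong(
    DFS_SPS_SELF_RETRY_TIMEOUT_KEY, DFS_SPS_SELF_RETRY_TIMEOUT_DEFAULT);
this.storageMovementsMonitor = new BlockStorageMovementAttemptedItems(
    checkTimeout, selfRetryTimeout, storageMovementNeeded);
{code}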
[jira] [Updated] (HDFS-11289) [SPS] Make SPS movement monitor timeouts configurable
[ https://issues.apache.org/jira/browse/HDFS-11289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-11289: --- Summary: [SPS] Make SPS movement monitor timeouts configurable (was: Make SPS movement monitor timeouts configurable) > [SPS] Make SPS movement monitor timeouts configurable > - > > Key: HDFS-11289 > URL: https://issues.apache.org/jira/browse/HDFS-11289 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS-10285 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > > Currently the SPS tracking monitor timeouts are hardcoded. This is the JIRA for > making them configurable. > {code} > // TODO: below selfRetryTimeout and checkTimeout can be configurable later > // Now, the default values of selfRetryTimeout and checkTimeout are 30mins > // and 5mins respectively > this.storageMovementsMonitor = new BlockStorageMovementAttemptedItems( > 5 * 60 * 1000, 30 * 60 * 1000, storageMovementNeeded); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11289) [SPS]: Make SPS movement monitor timeouts configurable
[ https://issues.apache.org/jira/browse/HDFS-11289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-11289: --- Summary: [SPS]: Make SPS movement monitor timeouts configurable (was: [SPS] Make SPS movement monitor timeouts configurable) > [SPS]: Make SPS movement monitor timeouts configurable > -- > > Key: HDFS-11289 > URL: https://issues.apache.org/jira/browse/HDFS-11289 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS-10285 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > > Currently the SPS tracking monitor timeouts are hardcoded. This is the JIRA for > making them configurable. > {code} > // TODO: below selfRetryTimeout and checkTimeout can be configurable later > // Now, the default values of selfRetryTimeout and checkTimeout are 30mins > // and 5mins respectively > this.storageMovementsMonitor = new BlockStorageMovementAttemptedItems( > 5 * 60 * 1000, 30 * 60 * 1000, storageMovementNeeded); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-6874) Add GET_BLOCK_LOCATIONS operation to HttpFS
[ https://issues.apache.org/jira/browse/HDFS-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-6874: -- Description: The GETFILEBLOCKLOCATIONS operation is missing in HttpFS, though it is already supported in WebHDFS. For a GETFILEBLOCKLOCATIONS request, org.apache.hadoop.fs.http.server.HttpFSServer currently returns BAD_REQUEST: ... case GETFILEBLOCKLOCATIONS: { response = Response.status(Response.Status.BAD_REQUEST).build(); break; } was: GET_BLOCK_LOCATIONS operation is missing in HttpFS, which is already supported in WebHDFS. For the request of GETFILEBLOCKLOCATIONS in org.apache.hadoop.fs.http.server.HttpFSServer, BAD_REQUEST is returned so far: ... case GETFILEBLOCKLOCATIONS: { response = Response.status(Response.Status.BAD_REQUEST).build(); break; } > Add GET_BLOCK_LOCATIONS operation to HttpFS > --- > > Key: HDFS-6874 > URL: https://issues.apache.org/jira/browse/HDFS-6874 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.4.1, 2.7.3 >Reporter: Gao Zhong Liang >Assignee: Weiwei Yang > Labels: BB2015-05-TBR > Attachments: HDFS-6874-1.patch, HDFS-6874-branch-2.6.0.patch, > HDFS-6874.02.patch, HDFS-6874.03.patch, HDFS-6874.patch > > > The GETFILEBLOCKLOCATIONS operation is missing in HttpFS, though it is already > supported in WebHDFS. For a GETFILEBLOCKLOCATIONS request, > org.apache.hadoop.fs.http.server.HttpFSServer currently returns BAD_REQUEST: > ... > case GETFILEBLOCKLOCATIONS: { > response = Response.status(Response.Status.BAD_REQUEST).build(); > break; > } > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
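For illustration, a sketch of what serving the operation could look like in HttpFSServer's switch, modeled loosely on the neighboring cases; the FSFileBlockLocations executor is hypothetical and the surrounding variable names ({{path}}, {{user}}, {{params}}) are assumed from that file:
{code}
case GETFILEBLOCKLOCATIONS: {
  Long offset = params.get(OffsetParam.NAME, OffsetParam.class);
  Long len = params.get(LenParam.NAME, LenParam.class);
  // FSFileBlockLocations is a hypothetical FileSystemAccess executor that
  // would call FileSystem#getFileBlockLocations(path, offset, len) and
  // serialize the resulting BlockLocation[] to JSON.
  FSOperations.FSFileBlockLocations command =
      new FSOperations.FSFileBlockLocations(path, offset, len);
  Map json = fsExecute(user, command);
  response = Response.ok(json).type(MediaType.APPLICATION_JSON).build();
  break;
}
{code}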
[jira] [Updated] (HDFS-6874) Add GETFILEBLOCKLOCATIONS operation to HttpFS
[ https://issues.apache.org/jira/browse/HDFS-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-6874: -- Summary: Add GETFILEBLOCKLOCATIONS operation to HttpFS (was: Add GET_BLOCK_LOCATIONS operation to HttpFS) > Add GETFILEBLOCKLOCATIONS operation to HttpFS > - > > Key: HDFS-6874 > URL: https://issues.apache.org/jira/browse/HDFS-6874 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.4.1, 2.7.3 >Reporter: Gao Zhong Liang >Assignee: Weiwei Yang > Labels: BB2015-05-TBR > Attachments: HDFS-6874-1.patch, HDFS-6874-branch-2.6.0.patch, > HDFS-6874.02.patch, HDFS-6874.03.patch, HDFS-6874.patch > > > The GETFILEBLOCKLOCATIONS operation is missing in HttpFS, though it is already > supported in WebHDFS. For a GETFILEBLOCKLOCATIONS request, > org.apache.hadoop.fs.http.server.HttpFSServer currently returns BAD_REQUEST: > ... > case GETFILEBLOCKLOCATIONS: { > response = Response.status(Response.Status.BAD_REQUEST).build(); > break; > } > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11286) GETFILESTATUS, RENAME logic breaking due to incomplete path argument
[ https://issues.apache.org/jira/browse/HDFS-11286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sampada Dehankar updated HDFS-11286: Description: We use ADLS to store customer data; to access the data from our containers, HttpFS Server-Client is used. HttpFS functions like GETFILESTATUS and RENAME expect the absolute 'path' of the file(s) as the argument. But when the request is received at the server from the HttpFS client, the server forwards only the relative path rather than the absolute path to ADLS. This breaks the logic for the GETFILESTATUS and RENAME functions. Steps to reproduce the GETFILESTATUS Command Bug: Run the following command from the client: Example 1: hadoop fs -ls adl_scheme://account/folderA/folderB/ Server logs show only the relative path "folderA/folderB/" is forwarded to ADLS. Example 2: hadoop fs -ls adl_scheme://account/folderX/folderY/SampleFile Server logs show only the relative path "folderX/folderY/SampleFile" is forwarded to ADLS. Fix: Prepend the ADLS scheme and account name to the path. So the paths in example 1 and example 2 would look like 'adl_scheme://account/folderA/folderB/' and 'adl_scheme://account/folderX/folderY/SampleFile' respectively. Steps to reproduce the RENAME Command Bug: Run the following command from the client: Example 1: hadoop fs -mv /folderA/oldFileName /folderA/newFileName Server logs show only the relative old file path "folderA/oldFileName" and the new file path "adl_scheme://account/folderA/newFileName" are forwarded to ADLS. Fix: Prepend the ADLS scheme and account name to the old file name path. was: We use ADLS to store customer data and to access the data from our containers HttpFS Server-Client is used. HttpFS functions like GETFILESTATUS, RENAME expect absolute 'path' of the file(s) as the argument. But when the request is received at the server from HttpFs Client, the server is forwarding only the relative path rather than absolute path to ADLS. This is breaking the logic for GETFILESTATUS, RENAME functions. Steps to reproduce GETFILESTATUS Command Bug: Run the following command from the client: Example 1: hadoop fs -ls adl_scheme://account/folderA/folderB/ Server logs show only the relative path "folderA/folderB/" is forwarded to ADLS. Example 2: hadoop fs -ls adl_scheme://account/folderX/folderY/SampleFile Server logs show only the relative path "folderX/folderY/SampleFile" is forwarded to ADLS. Fix: Prepend the ADLS scheme and account name to the path. So the path in example 1 and example 2 would look like this 'adl_scheme://account/folderA/folderB/' and 'adl_scheme://account/folderX/folderY/SampleFile' respectively. We have the fix ready and currently it is in the testing phase. Steps to reproduce RENAME Command Bug: Run the following command from the client: Example 1: hadoop fs -mv /folderA/oldFileName /folderA/newFileName Server logs show only the relative old file path "folderA/oldFileName" and new File path "adl_scheme://account/folderA/newFileName" is forwarded to ADLS. Fix: Prepend the ADLS scheme and account name to the old file name path. We have the fix ready and currently it is in the testing phase. 
> GETFILESTATUS, RENAME logic breaking due to incomplete path argument > - > > Key: HDFS-11286 > URL: https://issues.apache.org/jira/browse/HDFS-11286 > Project: Hadoop HDFS > Issue Type: Bug > Components: httpfs >Affects Versions: 2.7.1 > Environment: Windows >Reporter: Sampada Dehankar > Labels: bugfix, patch > > We use ADLS to store customer data; to access the data from our containers, > HttpFS Server-Client is used. HttpFS functions like GETFILESTATUS and RENAME > expect the absolute 'path' of the file(s) as the argument. But when the request > is received at the server from the HttpFS client, the server forwards only > the relative path rather than the absolute path to ADLS. This breaks the > logic for the GETFILESTATUS and RENAME functions. > > Steps to reproduce the GETFILESTATUS Command Bug: > > Run the following command from the client: > > Example 1: > hadoop fs -ls adl_scheme://account/folderA/folderB/ > Server logs show only the relative path "folderA/folderB/" is forwarded to > ADLS. > > Example 2: > hadoop fs -ls adl_scheme://account/folderX/folderY/SampleFile > Server logs show only the relative path "folderX/folderY/SampleFile" is > forwarded to ADLS. > > Fix: > Prepend the ADLS scheme and account name to the path. So the paths in example > 1 and example 2 would look like 'adl_scheme://account/folderA/folderB/' > and 'adl_scheme://account/folderX/folderY/SampleFile' respectively. > > Steps to reproduce the RENAME Command Bug: > > Run the following command from the client: > > Example 1: > hadoop fs -mv /folderA/oldFileName /folderA/newFileName > > Server logs show only the relative old file path "folderA/oldFileName" and > the new file path "adl_scheme://account/folderA/newFileName" are forwarded to > ADLS. > > Fix: > > Prepend the ADLS scheme and account name to the old file name path. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
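The described fix, sketched with made-up helper names (the real patch may differ):
{code}
import java.net.URI;

class AdlPathUtil {
  /**
   * Prepend the store's scheme and account (authority) to a relative path,
   * e.g. ("adl_scheme://account", "folderA/folderB/") becomes
   * "adl_scheme://account/folderA/folderB/".
   */
  static String toAbsolute(URI storeUri, String path) {
    String relative = path.startsWith("/") ? path.substring(1) : path;
    return storeUri.getScheme() + "://" + storeUri.getAuthority()
        + "/" + relative;
  }
}
{code}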
[jira] [Commented] (HDFS-11286) GETFILESTATUS, RENAME logic breaking due to incomplete path argument
[ https://issues.apache.org/jira/browse/HDFS-11286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15797272#comment-15797272 ] Sampada Dehankar commented on HDFS-11286: - We have the fix ready and currently it is in the testing phase. > GETFILESTATUS, RENAME logic breaking due to incomplete path argument > - > > Key: HDFS-11286 > URL: https://issues.apache.org/jira/browse/HDFS-11286 > Project: Hadoop HDFS > Issue Type: Bug > Components: httpfs >Affects Versions: 2.7.1 > Environment: Windows >Reporter: Sampada Dehankar > Labels: bugfix, patch > > We use ADLS to store customer data; to access the data from our containers, > HttpFS Server-Client is used. HttpFS functions like GETFILESTATUS and RENAME > expect the absolute 'path' of the file(s) as the argument. But when the request > is received at the server from the HttpFS client, the server forwards only > the relative path rather than the absolute path to ADLS. This breaks the > logic for the GETFILESTATUS and RENAME functions. > > Steps to reproduce the GETFILESTATUS Command Bug: > > Run the following command from the client: > > Example 1: > hadoop fs -ls adl_scheme://account/folderA/folderB/ > Server logs show only the relative path "folderA/folderB/" is forwarded to > ADLS. > > Example 2: > hadoop fs -ls adl_scheme://account/folderX/folderY/SampleFile > Server logs show only the relative path "folderX/folderY/SampleFile" is > forwarded to ADLS. > > Fix: > Prepend the ADLS scheme and account name to the path. So the paths in example > 1 and example 2 would look like 'adl_scheme://account/folderA/folderB/' > and 'adl_scheme://account/folderX/folderY/SampleFile' respectively. > > Steps to reproduce the RENAME Command Bug: > > Run the following command from the client: > > Example 1: > hadoop fs -mv /folderA/oldFileName /folderA/newFileName > > Server logs show only the relative old file path "folderA/oldFileName" and > the new file path "adl_scheme://account/folderA/newFileName" are forwarded to > ADLS. > > Fix: > > Prepend the ADLS scheme and account name to the old file name path. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11156) Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API
[ https://issues.apache.org/jira/browse/HDFS-11156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15797163#comment-15797163 ] Weiwei Yang commented on HDFS-11156: Sure [~andrew.wang], I just uploaded the patch for branch-2. The failure was because branch-2 builds with JDK 7 while trunk uses JDK 8, which allows a local class to access local variables and parameters of the enclosing block that are final or _effectively final_; see more in the doc [here|http://docs.oracle.com/javase/tutorial/java/javaOO/localclasses.html#accessing-members-of-an-enclosing-class]. Thank you > Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API > > > Key: HDFS-11156 > URL: https://issues.apache.org/jira/browse/HDFS-11156 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 2.7.3 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: BlockLocationProperties_JSON_Schema.jpg, > BlockLocations_JSON_Schema.jpg, FileStatuses_JSON_Schema.jpg, > HDFS-11156-branch-2.01.patch, HDFS-11156.01.patch, HDFS-11156.02.patch, > HDFS-11156.03.patch, HDFS-11156.04.patch, HDFS-11156.05.patch, > HDFS-11156.06.patch, HDFS-11156.07.patch, HDFS-11156.08.patch, > HDFS-11156.09.patch, HDFS-11156.10.patch, HDFS-11156.11.patch, > HDFS-11156.12.patch, HDFS-11156.13.patch, HDFS-11156.14.patch, > HDFS-11156.15.patch, HDFS-11156.16.patch, Output_JSON_format_v10.jpg, > SampleResponse_JSON.jpg > > > The following WebHDFS REST API > {code} > http://:/webhdfs/v1/?op=GET_BLOCK_LOCATIONS&offset=0&length=1 > {code} > will get a response like > {code} > { > "LocatedBlocks" : { > "fileLength" : 1073741824, > "isLastBlockComplete" : true, > "isUnderConstruction" : false, > "lastLocatedBlock" : { ... }, > "locatedBlocks" : [ {...} ] > } > } > {code} > This represents *o.a.h.h.p.LocatedBlocks*. However, according to the > *FileSystem* API, > {code} > public BlockLocation[] getFileBlockLocations(Path p, long start, long len) > {code} > clients would expect an array of BlockLocation. This mismatch should be > fixed. Marked as Incompatible change as this will change the output of the > GET_BLOCK_LOCATIONS API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
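A minimal illustration of the language difference mentioned above (not code from the patch):
{code}
void demo() {
  String name = "trunk";            // never reassigned: "effectively final"
  Runnable r = new Runnable() {     // local (anonymous) class
    @Override
    public void run() {
      System.out.println(name);     // compiles on JDK 8; JDK 7 requires an
    }                               // explicit 'final String name = ...'
  };
  r.run();
}
{code}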
[jira] [Updated] (HDFS-11156) Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API
[ https://issues.apache.org/jira/browse/HDFS-11156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-11156: --- Attachment: HDFS-11156-branch-2.01.patch > Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API > > > Key: HDFS-11156 > URL: https://issues.apache.org/jira/browse/HDFS-11156 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 2.7.3 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: BlockLocationProperties_JSON_Schema.jpg, > BlockLocations_JSON_Schema.jpg, FileStatuses_JSON_Schema.jpg, > HDFS-11156-branch-2.01.patch, HDFS-11156.01.patch, HDFS-11156.02.patch, > HDFS-11156.03.patch, HDFS-11156.04.patch, HDFS-11156.05.patch, > HDFS-11156.06.patch, HDFS-11156.07.patch, HDFS-11156.08.patch, > HDFS-11156.09.patch, HDFS-11156.10.patch, HDFS-11156.11.patch, > HDFS-11156.12.patch, HDFS-11156.13.patch, HDFS-11156.14.patch, > HDFS-11156.15.patch, HDFS-11156.16.patch, Output_JSON_format_v10.jpg, > SampleResponse_JSON.jpg > > > The following WebHDFS REST API > {code} > http://:/webhdfs/v1/?op=GET_BLOCK_LOCATIONS&offset=0&length=1 > {code} > will get a response like > {code} > { > "LocatedBlocks" : { > "fileLength" : 1073741824, > "isLastBlockComplete" : true, > "isUnderConstruction" : false, > "lastLocatedBlock" : { ... }, > "locatedBlocks" : [ {...} ] > } > } > {code} > This represents *o.a.h.h.p.LocatedBlocks*. However, according to the > *FileSystem* API, > {code} > public BlockLocation[] getFileBlockLocations(Path p, long start, long len) > {code} > clients would expect an array of BlockLocation. This mismatch should be > fixed. Marked as Incompatible change as this will change the output of the > GET_BLOCK_LOCATIONS API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11279) Cleanup unused DataNode#checkDiskErrorAsync()
[ https://issues.apache.org/jira/browse/HDFS-11279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796984#comment-15796984 ] Hudson commented on HDFS-11279: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11069 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11069/]) HDFS-11279. Cleanup unused DataNode#checkDiskErrorAsync(). Contributed (xyao: rev 87bb1c49bb25f75b040028b1cebe3bc5251836d1) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/checker/DatasetVolumeChecker.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeHotSwapVolumes.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/checker/TestDatasetVolumeChecker.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java > Cleanup unused DataNode#checkDiskErrorAsync() > - > > Key: HDFS-11279 > URL: https://issues.apache.org/jira/browse/HDFS-11279 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Xiaoyu Yao >Assignee: Hanisha Koneru >Priority: Minor > Fix For: 3.0.0-alpha2 > > Attachments: HDFS-11279.000.patch, HDFS-11279.001.patch, > HDFS-11279.002.patch > > > After HDFS-11274, we will not trigger checking all datanode volumes upon IO > failure on a single volume. This makes the original implementation > DataNode#checkDiskErrorAsync and DatasetVolumeChecker#checkAllVolumesAsync() > not used in any of the production code. > This ticket is opened to remove these unused code and related tests if any. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11279) Cleanup unused DataNode#checkDiskErrorAsync()
[ https://issues.apache.org/jira/browse/HDFS-11279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-11279: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha2 Status: Resolved (was: Patch Available) Thanks [~hanishakoneru] for the contribution and all for the reviews. I've committed the v002 patch (with the "." added at the end of the comments, to fix the checkstyle issue from the last Jenkins run) to trunk. > Cleanup unused DataNode#checkDiskErrorAsync() > - > > Key: HDFS-11279 > URL: https://issues.apache.org/jira/browse/HDFS-11279 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Xiaoyu Yao >Assignee: Hanisha Koneru >Priority: Minor > Fix For: 3.0.0-alpha2 > > Attachments: HDFS-11279.000.patch, HDFS-11279.001.patch, > HDFS-11279.002.patch > > > After HDFS-11274, we will not trigger checking all datanode volumes upon IO > failure on a single volume. This makes the original implementation > DataNode#checkDiskErrorAsync and DatasetVolumeChecker#checkAllVolumesAsync() > not used in any of the production code. > This ticket is opened to remove these unused code and related tests if any. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11279) Cleanup unused DataNode#checkDiskErrorAsync()
[ https://issues.apache.org/jira/browse/HDFS-11279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796906#comment-15796906 ] Xiaoyu Yao commented on HDFS-11279: --- Thanks [~hanishakoneru] for updating the patch. Patch v002 looks good to me. I will commit it shortly. > Cleanup unused DataNode#checkDiskErrorAsync() > - > > Key: HDFS-11279 > URL: https://issues.apache.org/jira/browse/HDFS-11279 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Xiaoyu Yao >Assignee: Hanisha Koneru >Priority: Minor > Attachments: HDFS-11279.000.patch, HDFS-11279.001.patch, > HDFS-11279.002.patch > > > After HDFS-11274, we will not trigger checking all datanode volumes upon IO > failure on a single volume. This makes the original implementation > DataNode#checkDiskErrorAsync and DatasetVolumeChecker#checkAllVolumesAsync() > not used in any of the production code. > This ticket is opened to remove these unused code and related tests if any. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11194) Maintain aggregated peer performance metrics on NameNode
[ https://issues.apache.org/jira/browse/HDFS-11194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796899#comment-15796899 ] Xiaoyu Yao commented on HDFS-11194: --- TestHeartbeatHandling.java Line 60: is the 300_000 a typo or a special usage of the timeout rule? {code} public Timeout testTimeout = new Timeout(300_000); {code} TestSlowPeerTracker.java Line 54: same as above. > Maintain aggregated peer performance metrics on NameNode > > > Key: HDFS-11194 > URL: https://issues.apache.org/jira/browse/HDFS-11194 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0 >Reporter: Xiaobing Zhou >Assignee: Arpit Agarwal > Attachments: HDFS-11194.01.patch, HDFS-11194.02.patch, > HDFS-11194.03.patch > > > The metrics collected in HDFS-10917 should be reported to and aggregated on > NameNode as part of heartbeat messages. This will make it easy to expose them > through JMX to users who are interested in them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
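It is not a typo: since Java 7, underscores are legal separators in numeric literals, so both declarations below describe the same five-minute timeout.
{code}
// 300_000 and 300000 are identical int literals (300 seconds).
public Timeout withUnderscore = new Timeout(300_000);
public Timeout plain = new Timeout(300000);
{code}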
[jira] [Commented] (HDFS-11284) [SPS]: Avoid running SPS under safemode and fix issues in target node choosing.
[ https://issues.apache.org/jira/browse/HDFS-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796898#comment-15796898 ] Yuanbo Liu commented on HDFS-11284: --- [~umamaheswararao] Thanks for your response. {code} When DN send movement result as failure, NN will take care to retry {code} Thanks for pointing this out; I'll go through that code and understand the retry mechanism. {code} Any way, go ahead with HDFS-11150 please. There were some test failure related to that, can you please check? {code} Yes, I'm working on this patch now. I also found another issue that makes me refactor part of the code of {{StoragePolicySatisfier#computeBlockMovingInfos}}. I'll elaborate on it in the JIRA conversation. The major part of the persistence code has been finished for a while, but the restart testing helped me find some issues that need to be addressed, so basically my work now is focused on making the test cases stable. > [SPS]: Avoid running SPS under safemode and fix issues in target node > choosing. > --- > > Key: HDFS-11284 > URL: https://issues.apache.org/jira/browse/HDFS-11284 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Yuanbo Liu >Assignee: Yuanbo Liu > Attachments: TestSatisfier.java > > > Recently I've found that in some conditions SPS is not stable: > * SPS runs under safe mode. > * There are some overlapping nodes in the chosen target nodes. > * The real replication number of the block doesn't match the replication factor. > For example, the real replication is 2 while the replication factor is 3. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11273) Move TransferFsImage#doGetUrl function to a Util class
[ https://issues.apache.org/jira/browse/HDFS-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796829#comment-15796829 ] Jing Zhao commented on HDFS-11273: -- Thanks for the patch, [~hkoneru]! The patch looks good to me. Just two nits: # We can use this chance to clean up the imports of TransferFsImage and Util. # In Util.java, CONTENT_LENGTH, MD5_HEADER, and deleteTmpFiles do not need to be public. Besides, we can add a little more detail in the description to explain why moving the code is necessary. > Move TransferFsImage#doGetUrl function to a Util class > -- > > Key: HDFS-11273 > URL: https://issues.apache.org/jira/browse/HDFS-11273 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Attachments: HDFS-11273.000.patch > > > TransferFsImage#doGetUrl function is required for JournalNode syncing as > well. We can move the code to a Utility class to avoid duplication of code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7784) load fsimage in parallel
[ https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796824#comment-15796824 ] Gang Xie commented on HDFS-7784: About the GC activity, the following is the jstat output. It did cause some long GC pauses, but compared to the GC activity during full block report, it looks OK.
jstat -gcutil 10885 5000 1000
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00 100.00  67.94  89.32  69.63    313  188.870     3    3.130  192.000
  0.00 100.00  67.94  89.32  69.63    313  188.870     3    3.130  192.000
  0.00 100.00  67.95  89.32  69.63    313  188.870     3    3.130  192.000
  0.00 100.00  81.32  89.32  70.61    313  188.870     3    3.130  192.000
100.00   0.00  19.44  89.68  70.62    314  192.495     3    3.130  195.626
  0.00  64.43  60.41  90.04  70.62    315  192.938     3    3.130  196.068
 56.75   7.26 100.00  90.27  70.62    317  193.167     3    3.130  196.297
  2.27   0.00  43.16  90.38  70.62    318  193.653     3    3.130  196.783
  0.00   0.68  91.15  90.38  70.62    319  193.729     3    3.130  196.859
  0.00   0.05  38.53  90.38  70.62    321  193.875     3    3.130  197.005
  0.01   0.00  82.04  90.38  70.62    322  193.951     3    3.130  197.081
  0.00   0.00  19.95  90.38  70.62    324  194.084     3    3.130  197.214
  0.00   0.00   0.00  90.38  70.62    326  194.235     4    3.130  197.365
  0.00   0.00  98.27  90.33  70.62    326  194.235     4    5.240  199.475
  0.00   0.00  40.11  90.27  70.62    328  194.372     4    5.240  199.612
  0.00   0.00  90.25  90.20  70.62    329  194.449     4    5.240  199.689
  0.00   0.00  30.08  90.13  70.62    331  194.605     4    5.240  199.845
  0.00   0.00  74.21  90.05  70.62    332  194.676     4    5.240  199.916
  0.00   0.00  14.04  89.95  70.62    334  194.819     4    5.240  200.059
  0.00   0.00  62.17  89.85  70.62    335  194.894     4    5.240  200.134
  0.00   0.00   4.01  89.79  70.62    337  195.042     4    5.240  200.282
  0.00   0.00  48.13  89.74  60.00    338  195.116     4    5.240  200.356
  0.00   0.00  80.22  89.74  60.00    339  195.192     5    5.241  200.433
  0.00   0.00   4.01  89.74  60.00    341  195.349     5    5.241  200.590
  0.00   0.00  24.07  89.74  60.00    342  195.423     5    5.241  200.664
  0.00   0.00  50.14  89.74  60.00    343  195.498     5    5.241  200.739
  0.00   0.00  96.27  89.74  60.00    344  195.571     5    5.241  200.813
  0.00   0.00  38.11  89.74  60.00    346  195.708     5    5.241  200.949
  0.00   0.00  86.24  89.74  60.00    347  195.785     5    5.241  201.026
Total time for which application threads were stopped: 1.6167710 seconds
Total time for which application threads were stopped: 9.6578530 seconds
Total time for which application threads were stopped: 1.0820690 seconds
Total time for which application threads were stopped: 1.1189530 seconds
Total time for which application threads were stopped: 1.2096840 seconds
Total time for which application threads were stopped: 8.6128080 seconds
Total time for which application threads were stopped: 7.5763860 seconds
Total time for which application threads were stopped: 2.1393520 seconds
Total time for which application threads were stopped: 1.9607400 seconds
Total time for which application threads were stopped: 3.0785030 seconds
Total time for which application threads were stopped: 2.7774960 seconds
Total time for which application threads were stopped: 4.5180250 seconds
Total time for which application threads were stopped: 1.9637590 seconds
Total time for which application threads were stopped: 1.8422970 seconds
Total time for which application threads were stopped: 1.9868880 seconds
Total time for which application threads were stopped: 2.2927440 seconds
Total time for which application threads were stopped: 2.7141160 seconds
Total time for which application threads were stopped: 2.9030460 seconds
Total time for which application threads were stopped: 5.2282350 seconds
Total time for which application threads were stopped: 3.6261510 seconds
Total time for which application threads were stopped: 2.1100760 seconds
> load fsimage in parallel > > > Key: HDFS-7784 > URL: https://issues.apache.org/jira/browse/HDFS-7784 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Walter Su >Assignee: Walter Su >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-7784.001.patch, test-20150213.pdf > > > When a single NameNode has a huge number of files, without using federation, the > startup/restart speed is slow. The fsimage loading step takes most of the time. > fsimage loading can be separated into two parts: deserialization and object > construction (mostly map insertion). Deserialization takes most of the CPU time. > So we can do deserialization in parallel, and add to the hashmap serially. It > will significantly reduce the NN start time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7784) load fsimage in parallel
[ https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796809#comment-15796809 ] Gang Xie commented on HDFS-7784: The hardware info: CPU: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz with 24 cores. Mem: cat /proc/meminfo
MemTotal: 131749888 kB
MemFree: 9390596 kB
Buffers: 171080 kB
Cached: 23657816 kB
SwapCached: 0 kB
Active: 119711620 kB
Inactive: 381236 kB
Active(anon): 96186924 kB
Inactive(anon): 81452 kB
Active(file): 23524696 kB
Inactive(file): 299784 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 108 kB
Writeback: 0 kB
AnonPages: 96264056 kB
Mapped: 26604 kB
Shmem: 4412 kB
Slab: 728272 kB
SReclaimable: 673344 kB
SUnreclaim: 54928 kB
KernelStack: 5392 kB
PageTables: 192256 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 65874944 kB
Committed_AS: 107921484 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 488704 kB
VmallocChunk: 34289747040 kB
HardwareCorrupted: 4 kB
AnonHugePages: 90095616 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 8192 kB
DirectMap2M: 2015232 kB
DirectMap1G: 132120576 kB
And the disks are HDDs. > load fsimage in parallel > > > Key: HDFS-7784 > URL: https://issues.apache.org/jira/browse/HDFS-7784 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Walter Su >Assignee: Walter Su >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-7784.001.patch, test-20150213.pdf > > > When a single NameNode has a huge number of files, without using federation, the > startup/restart speed is slow. The fsimage loading step takes most of the time. > fsimage loading can be separated into two parts: deserialization and object > construction (mostly map insertion). Deserialization takes most of the CPU time. > So we can do deserialization in parallel, and add to the hashmap serially. It > will significantly reduce the NN start time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7784) load fsimage in parallel
[ https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796804#comment-15796804 ] Kai Zheng commented on HDFS-7784: - OOO today for a customer visit; please expect a delayed response. Thanks. > load fsimage in parallel > > > Key: HDFS-7784 > URL: https://issues.apache.org/jira/browse/HDFS-7784 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Walter Su >Assignee: Walter Su >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-7784.001.patch, test-20150213.pdf > > > When a single NameNode has a huge number of files, without using federation, the > startup/restart speed is slow. The fsimage loading step takes most of the time. > fsimage loading can be separated into two parts: deserialization and object > construction (mostly map insertion). Deserialization takes most of the CPU time. > So we can do deserialization in parallel, and add to the hashmap serially. It > will significantly reduce the NN start time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7784) load fsimage in parallel
[ https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796797#comment-15796797 ] Gang Xie commented on HDFS-7784: The JVM settings: -Xmx102400m -Xms102400m -Xmn5508m -XX:MaxDirectMemorySize=3686m -XX:MaxPermSize=1024m -XX:+PrintGCApplicationStoppedTime -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:SurvivorRatio=6 -XX:+UseCMSCompactAtFullCollection -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled -XX:+UseNUMA -XX:+CMSClassUnloadingEnabled -XX:CMSMaxAbortablePrecleanTime=1 -XX:TargetSurvivorRatio=80 -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=100 -XX:GCLogFileSize=128m -XX:CMSWaitDuration=8000 -XX:+CMSScavengeBeforeRemark -XX:ConcGCThreads=16 -XX:ParallelGCThreads=16 -XX:+CMSConcurrentMTEnabled -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking -XX:MaxTenuringThreshold=3 -XX:+ParallelRefProcEnabled -XX:-OmitStackTraceInFastThrow > load fsimage in parallel > > > Key: HDFS-7784 > URL: https://issues.apache.org/jira/browse/HDFS-7784 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Walter Su >Assignee: Walter Su >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-7784.001.patch, test-20150213.pdf > > > When a single NameNode has a huge number of files, without using federation, the > startup/restart speed is slow. The fsimage loading step takes most of the time. > fsimage loading can be separated into two parts: deserialization and object > construction (mostly map insertion). Deserialization takes most of the CPU time. > So we can do deserialization in parallel, and add to the hashmap serially. It > will significantly reduce the NN start time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN
[ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796738#comment-15796738 ] Brahma Reddy Battula commented on HDFS-11280: - As the changes are only in hadoop-hdfs-client, tests did not run on the hadoop-hdfs project, hence Jenkins did not catch this. Actually I suggested earlier that we should at least run the parent project's test cases (cd to the parent project), but it did not happen.
{noformat}
unit test pre-reqs:
cd /testptch/hadoop/hadoop-common-project/hadoop-common
mvn -Dmaven.repo.local=/home/jenkins/yetus-m2/hadoop-trunk-patch-0 install -DskipTests -Pnative -Drequire.libwebhdfs -Drequire.snappy -Drequire.openssl -Drequire.fuse -Drequire.test.libhadoop -Pyarn-ui > /testptch/hadoop/patchprocess/maven-unit-prereq-hadoop-common-project_hadoop-common-install.txt 2>&1
cd /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-client
mvn -Dmaven.repo.local=/home/jenkins/yetus-m2/hadoop-trunk-patch-0 -Ptest-patch -Pparallel-tests -P!shelltest -Pnative -Drequire.libwebhdfs -Drequire.snappy -Drequire.openssl -Drequire.fuse -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/hadoop/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-client.txt 2>&1
Elapsed: 1m 1s
hadoop-hdfs-client in the patch passed.
{noformat}
> Allow WebHDFS to reuse HTTP connections to NN > - > > Key: HDFS-11280 > URL: https://issues.apache.org/jira/browse/HDFS-11280 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: HDFS-11280.for.2.7.and.below.patch, > HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, > HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.5.patch, > HDFS-11280.for.2.8.and.beyond.patch > > > WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. > When we use webhdfs as the source in distcp, this uses up all ephemeral > ports on the client side since all closed connections continue to occupy the > port with TIME_WAIT status for some time. > According to http://tinyurl.com/java7-http-keepalive, we should call > conn.getInputStream().close() instead to make sure the connection is kept > alive. This will get rid of the ephemeral port problem. > Manual steps used to verify the bug fix: > 1. Build original hadoop jar. > 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows a big number (100s). > 3. Build hadoop jar with this diff. > 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows 0. > 5. The explanation: distcp's client side does a lot of directory scanning, > which would create and close a lot of connections to the namenode HTTP port. > Reference: > 2.7 and below: > https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743 > 2.8 and above: > https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
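A minimal sketch of the difference described in the issue (the URL is a placeholder): closing just the response stream after draining it lets the JVM return the socket to its keep-alive pool, while disconnect() closes the socket and leaves the port in TIME_WAIT.
{code}
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import org.apache.hadoop.io.IOUtils;

class KeepAliveDemo {
  static void fetch() throws Exception {
    URL url = new URL("http://namenode:50070/webhdfs/v1/tmp?op=LISTSTATUS");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    InputStream in = conn.getInputStream();
    try {
      IOUtils.copyBytes(in, System.out, 4096, false);  // drain the response
    } finally {
      in.close();           // connection goes back to the keep-alive cache
      // conn.disconnect(); // would close the socket -> TIME_WAIT churn
    }
  }
}
{code}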
[jira] [Commented] (HDFS-11218) Add option to skip open files during HDFS Snapshots
[ https://issues.apache.org/jira/browse/HDFS-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796734#comment-15796734 ] Manoj Govindassamy commented on HDFS-11218: --- We are working on design proposals. One of the proposals I am pondering is to tweak the snapshot record and make it believe that the open, being-written files were never created when the snapshot was taken with the skipOpenFiles option. But this brings in more problems for the file append and file truncate cases, and for all previous snapshots. The other proposal is to add extra information to snapshot records for the open files, which can be used later when run along with the SnapshotDiff command (refer: HDFS-11220). Once the full proposals are ready, I will share them here for public review and then work on a patch based on the review comments. > Add option to skip open files during HDFS Snapshots > --- > > Key: HDFS-11218 > URL: https://issues.apache.org/jira/browse/HDFS-11218 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > > *Problem:* > When there are files being written and HDFS Snapshots are taken in > parallel, the snapshots do capture all these files, but these being-written > files in the snapshots do not have their point-in-time file length captured. > At the time of file close or any other metadata modification operation on > a file that was previously open, HDFS reconciles the file length and > records the modification in the last taken snapshot. All the previously taken > snapshots continue to have the same open file with no modification recorded. > So, all those previous snapshots end up using the final modification record > in the next available snapshot. > *Proposal:* > The HDFS snapshot design goal was to have O(M) space usage for snapshots, where M > is the number of file modifications. So, it would be very expensive to record > modifications for all the open files in all the snapshots. For applications > that do not want to capture incomplete / partially written binary files > in the snapshots, it would be preferable to have an extra option to skip open > files. This way, they don't have to worry about restoring inconsistent files > from the snapshots. > {noformat} > hdfs dfs -createSnapshot -skipOpenFiles > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11273) Move TransferFsImage#doGetUrl function to a Util class
[ https://issues.apache.org/jira/browse/HDFS-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-11273: - Issue Type: Improvement (was: Bug) > Move TransferFsImage#doGetUrl function to a Util class > -- > > Key: HDFS-11273 > URL: https://issues.apache.org/jira/browse/HDFS-11273 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Attachments: HDFS-11273.000.patch > > > TransferFsImage#doGetUrl function is required for JournalNode syncing as > well. We can move the code to a Utility class to avoid duplication of code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11273) Move TransferFsImage#doGetUrl function to a Util class
[ https://issues.apache.org/jira/browse/HDFS-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-11273: - Component/s: (was: hdfs) > Move TransferFsImage#doGetUrl function to a Util class > -- > > Key: HDFS-11273 > URL: https://issues.apache.org/jira/browse/HDFS-11273 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Attachments: HDFS-11273.000.patch > > > TransferFsImage#doGetUrl function is required for JournalNode syncing as > well. We can move the code to a Utility class to avoid duplication of code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11194) Maintain aggregated peer performance metrics on NameNode
[ https://issues.apache.org/jira/browse/HDFS-11194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796663#comment-15796663 ] Xiaoyu Yao commented on HDFS-11194: --- Thanks [~arpitagarwal] for working on this and all for the discussion. I have the following comments on the production side changes. I am still reviewing the unit test changes and will post my comments on those soon. 1. BlockReceiver.java NIT: Line 848: "&& mirrorAddr != null" can be removed. Line 849: can be simplified with "peerMetrics.addSendPacketDownstream" 2. BPServiceActor.java Line 1146: NIT: heartbeatTime can be changed to slowPeerReportTime, or remove the parameter by hiding the monotonicNow() call inside scheduleNextSlowPeerReport(). 3. DatanodeManager.java Line 52-53: NIT: avoid import * import org.apache.hadoop.util.*; import org.apache.hadoop.util.Timer; Line 180. The comment seems incomplete. Line 212. We should instantiate slowPeerTracker only if dataNodePeerStatsEnabled is true. Line 1653-1660: NIT: can we tweak the code to avoid calling slowPeers.getSlowPeers() multiple times in the worst case, and maybe avoid the if (LOG.isDebugEnabled()) with parameterized logging? Line 1659: can we use nodeinfo.getIpcAddr() since the datanode has registered? 4. DataNodePeerMetrics.java Line 142-143: Correct me if I'm wrong, but it looks like the comment is for the stats map on Line 137. 5. DatanodeProtocol.proto Line 398-405. This is very good documentation. Can we add a field indicating the DN aggregation mechanism? This way the NN can enforce consistent aggregation across all the datanodes. This can be done in a separate ticket. 6. DFSConfigKeys.java Line 677: add documentation for dfs.datanode.slow.peers.report.interval? We can open a separate ticket for it. 7. RollingAverage.java Great catch on some missing synchronized on rollOverAvgs. NIT: Line 264: missing @param for minSamples 8. SlowNodeDetector.java Line 99-108: We can make this an interface to allow different aggregation methods (median, 90th percentile) for outlier detection. This can be done in a separate ticket. We can also use the Median/Percentile classes from Apache Commons to implement different aggregations. Line 127: we need to guard the tracing with if (LOG.isTraceEnabled()) to avoid the implicit sorted.toString() overhead. 9. SlowPeerReports.java Line 44: NIT: typo consistenly -> consistently Line 144: NIT: the documentation needs updating to match the code, which returns a sorted set of strings rather than a map. Line 190: Can we make MAX_NODES_TO_REPORT configurable? This can be fixed in a separate ticket. > Maintain aggregated peer performance metrics on NameNode > > > Key: HDFS-11194 > URL: https://issues.apache.org/jira/browse/HDFS-11194 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0 >Reporter: Xiaobing Zhou >Assignee: Arpit Agarwal > Attachments: HDFS-11194.01.patch, HDFS-11194.02.patch, > HDFS-11194.03.patch > > > The metrics collected in HDFS-10917 should be reported to and aggregated on > NameNode as part of heartbeat messages. This will make it easy to expose them > through JMX to users who are interested in them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
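For the logging nits in (3) and (8), a small sketch of the suggested pattern; the names are taken from the review, and the exact call sites are assumed:
{code}
// Fetch once instead of calling slowPeers.getSlowPeers() repeatedly.
Set<String> reported = slowPeers.getSlowPeers();
// Parameterized logging: the message is only formatted if DEBUG is on,
// so no explicit isDebugEnabled() guard is needed here.
LOG.debug("DataNode {} reported slow peers: {}", nodeinfo, reported);
// Keep an explicit guard where building the argument itself is costly,
// e.g. the implicit sorted.toString() flagged in SlowNodeDetector.
if (LOG.isTraceEnabled()) {
  LOG.trace("Sorted stats: {}", sorted);
}
{code}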
[jira] [Commented] (HDFS-11287) Storage class member storageDirs should be private to avoid unprotected access by derived classes
[ https://issues.apache.org/jira/browse/HDFS-11287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796624#comment-15796624 ] Hadoop QA commented on HDFS-11287: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 138 unchanged - 1 fixed = 138 total (was 139) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 68m 22s{color} | {color:green} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 95m 18s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HDFS-11287 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12845442/HDFS-11287.01.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux bef23c53f950 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / ebdd2e0 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/18012/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18012/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Storage class member storageDirs should be private to avoid unprotected > access by derived classes > - > > Key: HDFS-11287 > URL: https://issues.apache.org/jira/browse/HDFS-11287 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >
[jira] [Comment Edited] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796527#comment-15796527 ] Manoj Govindassamy edited comment on HDFS-9391 at 1/4/17 12:03 AM: --- Sure, will do 1 & 2. 3. >> For the EnteringMaintenanceNodes page, it uses maintenanceOnlyReplicas to >> describe Blocks with no live replicas. Should we use >> OutOfServiceOnlyReplicas? Thanks for bringing this up, [~mingma]. There are some inconsistencies even with the "Decommissioning" page, and I would like to get clarification on that number as well. * HDFS-9390 updated the {{Decommissioning}} page to use {{getOutOfServiceOnlyReplicas()}} instead of {{getDecommissionOnlyReplicas()}} * But getOutOfServiceOnlyReplicas(), which got introduced as part of HDFS-9390, included all Maintenance and Decommission replicas. Effectively, the page has been showing all "out of service" replicas, even though the page name is "Decommissioning". Excerpts from Patch v02: {code} if ((liveReplicas == 0) && (num.decommissionedAndDecommissioning() > 0)) { decommissionOnlyReplicas++; } if ((liveReplicas == 0) && (num.maintenanceReplicas() > 0)) { maintenanceOnlyReplicas++; } if ((liveReplicas == 0) && (num.outOfServiceReplicas() > 0)) { outOfServiceOnlyReplicas++; } {code} * So, what should the "Decommissioning" page actually show? In the patch v02 uploaded here, I made this page include decommission-related replicas only, and not all out-of-service replicas. * Now coming to the "Entering Maintenance" page, exactly which replicas should be included here? If we show "OutOfServiceOnlyReplicas", then it will include all decommissioning-related replicas as well. So, I am using "maintenanceOnlyReplicas" for this page. Do you still believe showing all "OutOfServiceOnlyReplicas" would be better here? Please let me know. was (Author: manojg): >> For the EnteringMaintenanceNodes page, it uses maintenanceOnlyReplicas to >> describe Blocks with no live replicas. Should we use >> OutOfServiceOnlyReplicas? Thanks for bringing this up, [~mingma]. There are some inconsistencies even with the "Decommissioning" page, and I would like to get clarification on that number as well. * HDFS-9390 updated the {{Decommissioning}} page to use {{getOutOfServiceOnlyReplicas()}} instead of {{getDecommissionOnlyReplicas()}} * But getOutOfServiceOnlyReplicas(), which got introduced as part of HDFS-9390, included all Maintenance and Decommission replicas. Effectively, the page has been showing all "out of service" replicas, even though the page name is "Decommissioning". Excerpts from Patch v02: {code} if ((liveReplicas == 0) && (num.decommissionedAndDecommissioning() > 0)) { decommissionOnlyReplicas++; } if ((liveReplicas == 0) && (num.maintenanceReplicas() > 0)) { maintenanceOnlyReplicas++; } if ((liveReplicas == 0) && (num.outOfServiceReplicas() > 0)) { outOfServiceOnlyReplicas++; } {code} * So, what should the "Decommissioning" page actually show? In the patch v02 uploaded here, I made this page include decommission-related replicas only, and not all out-of-service replicas. * Now coming to the "Entering Maintenance" page, exactly which replicas should be included here? If we show "OutOfServiceOnlyReplicas", then it will include all decommissioning-related replicas as well. So, I am using "maintenanceOnlyReplicas" for this page. Do you still believe showing all "OutOfServiceOnlyReplicas" would be better here? 
> Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > HDFS-9391.02.patch, Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN
[ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796571#comment-15796571 ] Hanisha Koneru commented on HDFS-11280: --- Verified that the following unit tests are passing now. - TestWebHDFSXAttr - TestWebHDFS - TestWebHdfsTokens - TestWebHdfsWithRestCsrfPreventionFilter - TestAuditLogs > Allow WebHDFS to reuse HTTP connections to NN > - > > Key: HDFS-11280 > URL: https://issues.apache.org/jira/browse/HDFS-11280 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: HDFS-11280.for.2.7.and.below.patch, > HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, > HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.5.patch, > HDFS-11280.for.2.8.and.beyond.patch > > > WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. > When we use webhdfs as the source in distcp, this used up all ephemeral > ports on the client side since all closed connections continue to occupy the > port with TIME_WAIT status for some time. > According to http://tinyurl.com/java7-http-keepalive, we should call > conn.getInputStream().close() instead to make sure the connection is kept > alive. This will get rid of the ephemeral port problem. > Manual steps used to verify the bug fix: > 1. Build original hadoop jar. > 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows a big number (100s). > 3. Build hadoop jar with this diff. > 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows 0. > 5. The explanation: distcp's client side does a lot of directory scanning, > which would create and close a lot of connections to the namenode HTTP port. > Reference: > 2.7 and below: > https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743 > 2.8 and above: > https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
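As a rough sketch of the keep-alive behavior the description above relies on (illustrative only, not the patch itself): with {{HttpURLConnection}}, fully draining and closing the response stream lets the JVM return the underlying socket to its keep-alive cache for reuse, whereas {{disconnect()}} closes the socket outright and leaves it in TIME_WAIT on the client.
{code}
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class KeepAliveSketch {
  static void getAndRelease(URL url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    // Read and discard the whole body, then close the stream. This lets
    // the socket go back to the JVM's keep-alive cache so the next
    // request to the same host:port can reuse the connection.
    try (InputStream in = conn.getInputStream()) {
      byte[] buf = new byte[4096];
      while (in.read(buf) != -1) {
        // discard
      }
    }
    // By contrast, conn.disconnect() tears the socket down, leaving it
    // in TIME_WAIT on the client and consuming an ephemeral port.
  }
}
{code}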
[jira] [Commented] (HDFS-11218) Add option to skip open files during HDFS Snapshots
[ https://issues.apache.org/jira/browse/HDFS-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796535#comment-15796535 ] churro morales commented on HDFS-11218: --- This seems quite useful. Are you guys working on this patch currently? > Add option to skip open files during HDFS Snapshots > --- > > Key: HDFS-11218 > URL: https://issues.apache.org/jira/browse/HDFS-11218 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > > *Problem:* > When there are files being written and HDFS Snapshots are taken in > parallel, the Snapshots do capture all these files, but the files still being > written do not have their point-in-time file length captured in the Snapshots. > At the time of file close, or of any other metadata modification operation on > a file that was previously open, HDFS reconciles the file length and > records the modification in the last taken Snapshot. All the previously taken > Snapshots continue to have the same open file with no modification recorded. > So, all those previous snapshots end up using the final modification record > in the next available snapshot. > *Proposal:* > The HDFS Snapshot design goal was to have O(M) space usage for Snapshots, where M > is the number of file modifications. So, it would be very expensive to record > modifications for all the open files in all the snapshots. For applications > that do not want to capture incomplete / partially written binary files > in the snapshots, it would be preferable to have an extra option to skip open > files. This way, they don't have to worry about restoring inconsistent files > from the snapshots. > {noformat} > hdfs dfs -createSnapshot -skipOpenFiles > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796527#comment-15796527 ] Manoj Govindassamy commented on HDFS-9391: -- >> For the EnteringMaintenanceNodes page, it uses maintenanceOnlyReplicas to >> describe Blocks with no live replicas. Should we use >> OutOfServiceOnlyReplicas? Thanks for bringing this up, [~mingma]. There are some inconsistencies even with the "Decommissioning" page, and I would like to get clarification on that number as well. * HDFS-9390 updated the {{Decommissioning}} page to use {{getOutOfServiceOnlyReplicas()}} instead of {{getDecommissionOnlyReplicas()}} * But getOutOfServiceOnlyReplicas(), which got introduced as part of HDFS-9390, included all Maintenance and Decommission replicas. Effectively, the page has been showing all "out of service" replicas, even though the page name is "Decommissioning". Excerpts from Patch v02: {code} if ((liveReplicas == 0) && (num.decommissionedAndDecommissioning() > 0)) { decommissionOnlyReplicas++; } if ((liveReplicas == 0) && (num.maintenanceReplicas() > 0)) { maintenanceOnlyReplicas++; } if ((liveReplicas == 0) && (num.outOfServiceReplicas() > 0)) { outOfServiceOnlyReplicas++; } {code} * So, what should the "Decommissioning" page actually show? In the patch v02 uploaded here, I made this page include decommission-related replicas only, and not all out-of-service replicas. * Now coming to the "Entering Maintenance" page, exactly which replicas should be included here? If we show "OutOfServiceOnlyReplicas", then it will include all decommissioning-related replicas as well. So, I am using "maintenanceOnlyReplicas" for this page. Do you still believe showing all "OutOfServiceOnlyReplicas" would be better here? > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > HDFS-9391.02.patch, Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11279) Cleanup unused DataNode#checkDiskErrorAsync()
[ https://issues.apache.org/jira/browse/HDFS-11279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796525#comment-15796525 ] Hadoop QA commented on HDFS-11279: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 28s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 234 unchanged - 1 fixed = 235 total (was 235) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 66m 11s{color} | {color:green} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 92m 14s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HDFS-11279 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12845433/HDFS-11279.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux a26552b1bf4f 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / ebdd2e0 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/18008/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/18008/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18008/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Cleanup unused DataNode#checkDiskErrorAsync() > - > > Key: HDFS-11279 > URL: https://issues.apache.org/jira/browse/HDFS-11279 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Xiaoyu Yao >Assignee: Hanisha Koneru >Priority: Minor > At
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796463#comment-15796463 ] Manoj Govindassamy commented on HDFS-9391: -- Sure, makes sense. Will do this. > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > HDFS-9391.02.patch, Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-11284) [SPS]: Avoid running SPS under safemode and fix issues in target node choosing.
[ https://issues.apache.org/jira/browse/HDFS-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796425#comment-15796425 ] Uma Maheswara Rao G edited comment on HDFS-11284 at 1/3/17 10:46 PM: - Hi [~yuanbo], the retry will not happen in the DN itself. When the DN sends a movement result back as failure, the NN will take care of retrying (HDFS-11029). At that time, if the NN finds all existing blocks already satisfied, those items will be skipped and not sent for movement. If they still need satisfaction, they will be sent again with newly chosen src and target nodes. The default retry time is 30 mins. (The higher timeout was chosen because sometimes the DN itself takes a long time to send results back due to slow-processing nodes, and the NN would then unnecessarily go for a retry. This can be refined further during testing.) Hope this helps you understand it better. {quote} Agree, I will go back to HDFS-11150. Since #2 has been addressed, the last issue seems belong to retry mechanism. I'm thinking about removing/changing this JIRA. {quote} Please keep this JIRA open until you agree on the reason. Can you confirm one point from your logs: whether the block was deleted due to over-replication while the same node was used for movement (as the movement was scheduled before)? If that's the case, the behavior should be fine. Also, can you confirm that the remaining block movements were successful (by looking at the logs)? Anyway, please go ahead with HDFS-11150. There were some test failures related to that; can you please check? Thanks a lot for putting in the effort. was (Author: umamaheswararao): Hi [~yuanbo], the retry will not happen in the DN itself. When the DN sends a movement result back as failure, the NN will take care of retrying. At that time, if the NN finds all existing blocks already satisfied, those items will be skipped and not sent for movement. If they still need satisfaction, they will be sent again with newly chosen src and target nodes. The default retry time is 30 mins. (The higher timeout was chosen because sometimes the DN itself takes a long time to send results back due to slow-processing nodes, and the NN would then unnecessarily go for a retry. This can be refined further during testing.) Hope this helps you understand it better. {quote} Agree, I will go back to HDFS-11150. Since #2 has been addressed, the last issue seems belong to retry mechanism. I'm thinking about removing/changing this JIRA. {quote} Please keep this JIRA open until you agree on the reason. Can you confirm one point from your logs: whether the block was deleted due to over-replication while the same node was used for movement (as the movement was scheduled before)? If that's the case, the behavior should be fine. Also, can you confirm that the remaining block movements were successful (by looking at the logs)? Anyway, please go ahead with HDFS-11150. There were some test failures related to that; can you please check? Thanks a lot for putting in the effort. > [SPS]: Avoid running SPS under safemode and fix issues in target node > choosing. > --- > > Key: HDFS-11284 > URL: https://issues.apache.org/jira/browse/HDFS-11284 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Yuanbo Liu >Assignee: Yuanbo Liu > Attachments: TestSatisfier.java > > > Recently I've found that in some conditions SPS is not stable: > * SPS runs under safe mode. > * There are some overlapping nodes among the chosen target nodes. > * The real replication number of a block doesn't match the replication factor. > For example, the real replication is 2 while the replication factor is 3. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11288) Manually allow block replication/deletion in Safe Mode
[ https://issues.apache.org/jira/browse/HDFS-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796435#comment-15796435 ] Esteban Gutierrez commented on HDFS-11288: -- I think the safe mode semantics shouldn't be changed; probably the easiest approach is to have something like a single-user (or admin-group) switch for that matter. > Manually allow block replication/deletion in Safe Mode > -- > > Key: HDFS-11288 > URL: https://issues.apache.org/jira/browse/HDFS-11288 > Project: Hadoop HDFS > Issue Type: New Feature > Components: namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Lukas Majercak > > Currently, the Safe Mode does not allow block replication/deletion, which > makes sense, especially on startup, as we do not want to replicate blocks > unnecessarily. > An issue we have seen in our clusters, though, is the NameNode getting > overwhelmed with the amount of needed replications; in such a case, we would > like to be able to manually set the NN to a state in which R/Ws to the FS > are disallowed but the NN continues replicating/deleting blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-11289) Make SPS movement monitor timeouts configurable
Uma Maheswara Rao G created HDFS-11289: -- Summary: Make SPS movement monitor timeouts configurable Key: HDFS-11289 URL: https://issues.apache.org/jira/browse/HDFS-11289 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-10285 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Currently the SPS tracking monitor timeouts are hardcoded. This is the JIRA for making them configurable. {code} // TODO: below selfRetryTimeout and checkTimeout can be configurable later // Now, the default values of selfRetryTimeout and checkTimeout are 30mins // and 5mins respectively this.storageMovementsMonitor = new BlockStorageMovementAttemptedItems( 5 * 60 * 1000, 30 * 60 * 1000, storageMovementNeeded); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
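A minimal sketch of what the configurable version could look like. The config key names and defaults below are hypothetical placeholders (the real ones would be defined in DFSConfigKeys as part of this JIRA); the constructor call mirrors the hardcoded snippet quoted above.
{code}
// Hypothetical config keys/defaults, for illustration only; the actual
// names would be added to DFSConfigKeys by this JIRA.
public static final String DFS_SPS_RECHECK_TIMEOUT_MILLIS_KEY =
    "dfs.storage.policy.satisfier.recheck.timeout.millis";
public static final long DFS_SPS_RECHECK_TIMEOUT_MILLIS_DEFAULT =
    5 * 60 * 1000L;
public static final String DFS_SPS_SELF_RETRY_TIMEOUT_MILLIS_KEY =
    "dfs.storage.policy.satisfier.self.retry.timeout.millis";
public static final long DFS_SPS_SELF_RETRY_TIMEOUT_MILLIS_DEFAULT =
    30 * 60 * 1000L;

// The hardcoded values are then replaced with Configuration lookups:
this.storageMovementsMonitor = new BlockStorageMovementAttemptedItems(
    conf.getLong(DFS_SPS_RECHECK_TIMEOUT_MILLIS_KEY,
        DFS_SPS_RECHECK_TIMEOUT_MILLIS_DEFAULT),
    conf.getLong(DFS_SPS_SELF_RETRY_TIMEOUT_MILLIS_KEY,
        DFS_SPS_SELF_RETRY_TIMEOUT_MILLIS_DEFAULT),
    storageMovementNeeded);
{code}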
[jira] [Commented] (HDFS-11284) [SPS]: Avoid running SPS under safemode and fix issues in target node choosing.
[ https://issues.apache.org/jira/browse/HDFS-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796425#comment-15796425 ] Uma Maheswara Rao G commented on HDFS-11284: Hi [~yuanbo], the retry will not happen in the DN itself. When the DN sends a movement result back as failure, the NN will take care of retrying. At that time, if the NN finds all existing blocks already satisfied, those items will be skipped and not sent for movement. If they still need satisfaction, they will be sent again with newly chosen src and target nodes. The default retry time is 30 mins. (The higher timeout was chosen because sometimes the DN itself takes a long time to send results back due to slow-processing nodes, and the NN would then unnecessarily go for a retry. This can be refined further during testing.) Hope this helps you understand it better. {quote} Agree, I will go back to HDFS-11150. Since #2 has been addressed, the last issue seems belong to retry mechanism. I'm thinking about removing/changing this JIRA. {quote} Please keep this JIRA open until you agree on the reason. Can you confirm one point from your logs: whether the block was deleted due to over-replication while the same node was used for movement (as the movement was scheduled before)? If that's the case, the behavior should be fine. Also, can you confirm that the remaining block movements were successful (by looking at the logs)? Anyway, please go ahead with HDFS-11150. There were some test failures related to that; can you please check? Thanks a lot for putting in the effort. > [SPS]: Avoid running SPS under safemode and fix issues in target node > choosing. > --- > > Key: HDFS-11284 > URL: https://issues.apache.org/jira/browse/HDFS-11284 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Yuanbo Liu >Assignee: Yuanbo Liu > Attachments: TestSatisfier.java > > > Recently I've found that in some conditions SPS is not stable: > * SPS runs under safe mode. > * There are some overlapping nodes among the chosen target nodes. > * The real replication number of a block doesn't match the replication factor. > For example, the real replication is 2 while the replication factor is 3. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN
[ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796417#comment-15796417 ] Hadoop QA commented on HDFS-11280: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 14s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-client: The patch generated 1 new + 62 unchanged - 0 fixed = 63 total (was 62) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 3s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 25m 29s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HDFS-11280 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12845438/HDFS-11280.for.2.8.and.beyond.5.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux aba63058763e 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / ebdd2e0 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/18011/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/18011/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18011/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Allow WebHDFS to reuse HTTP connections to NN > - > > Key: HDFS-11280 > URL: https://issues.apache.org/jira/browse/HDFS-11280 >
[jira] [Updated] (HDFS-11287) Storage class member storageDirs should be private to avoid unprotected access by derived classes
[ https://issues.apache.org/jira/browse/HDFS-11287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-11287: -- Affects Version/s: 3.0.0-alpha1 Target Version/s: 3.0.0-alpha2 Status: Patch Available (was: Open) > Storage class member storageDirs should be private to avoid unprotected > access by derived classes > - > > Key: HDFS-11287 > URL: https://issues.apache.org/jira/browse/HDFS-11287 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11287.01.patch > > > The HDFS-11267 fix made the abstract class Storage.java member variable > storageDirs thread safe so that all its derived classes, like NNStorage, > JNStorage, and DataStorage, will not face any ConcurrentModificationException when > there are volume add/remove and listing operations running in parallel. The > rebase of the fix missed a few changes from the original patch. This JIRA is to > address the addendum needed for the HDFS-11267 commits. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11287) Storage class member storageDirs should be private to avoid unprotected access by derived classes
[ https://issues.apache.org/jira/browse/HDFS-11287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-11287: -- Attachment: HDFS-11287.01.patch Attached v01 patch to address the following: * Made Storage#storageDirs a private final member * BlockPoolSliceStorage and DataStorage now access the parent Storage directories via getStorageDirs() [~eddyxu], can you please take a look at the patch? > Storage class member storageDirs should be private to avoid unprotected > access by derived classes > - > > Key: HDFS-11287 > URL: https://issues.apache.org/jira/browse/HDFS-11287 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11287.01.patch > > > The HDFS-11267 fix made the abstract class Storage.java member variable > storageDirs thread safe so that all its derived classes, like NNStorage, > JNStorage, and DataStorage, will not face any ConcurrentModificationException when > there are volume add/remove and listing operations running in parallel. The > rebase of the fix missed a few changes from the original patch. This JIRA is to > address the addendum needed for the HDFS-11267 commits. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN
[ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796369#comment-15796369 ] Zheng Shao commented on HDFS-11280: --- The test failures were caused by calling "conn.getInputStream()" in the HTTP redirect case. Since HTTP redirects are slow anyway and won't cause the ephemeral port problem, I will skip that change for now. I've uploaded a new patch that passes all the previously failing tests. > Allow WebHDFS to reuse HTTP connections to NN > - > > Key: HDFS-11280 > URL: https://issues.apache.org/jira/browse/HDFS-11280 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: HDFS-11280.for.2.7.and.below.patch, > HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, > HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.5.patch, > HDFS-11280.for.2.8.and.beyond.patch > > > WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. > When we use webhdfs as the source in distcp, this used up all ephemeral > ports on the client side since all closed connections continue to occupy the > port with TIME_WAIT status for some time. > According to http://tinyurl.com/java7-http-keepalive, we should call > conn.getInputStream().close() instead to make sure the connection is kept > alive. This will get rid of the ephemeral port problem. > Manual steps used to verify the bug fix: > 1. Build original hadoop jar. > 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows a big number (100s). > 3. Build hadoop jar with this diff. > 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows 0. > 5. The explanation: distcp's client side does a lot of directory scanning, > which would create and close a lot of connections to the namenode HTTP port. > Reference: > 2.7 and below: > https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743 > 2.8 and above: > https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN
[ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HDFS-11280: -- Attachment: (was: HDFS-11280.for.2.7.and.below.patch) > Allow WebHDFS to reuse HTTP connections to NN > - > > Key: HDFS-11280 > URL: https://issues.apache.org/jira/browse/HDFS-11280 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: HDFS-11280.for.2.7.and.below.patch, > HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, > HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.5.patch, > HDFS-11280.for.2.8.and.beyond.patch > > > WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. > When we use webhdfs as the source in distcp, this used up all ephemeral > ports on the client side since all closed connections continue to occupy the > port with TIME_WAIT status for some time. > According to http://tinyurl.com/java7-http-keepalive, we should call > conn.getInputStream().close() instead to make sure the connection is kept > alive. This will get rid of the ephemeral port problem. > Manual steps used to verify the bug fix: > 1. Build original hadoop jar. > 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows a big number (100s). > 3. Build hadoop jar with this diff. > 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows 0. > 5. The explanation: distcp's client side does a lot of directory scanning, > which would create and close a lot of connections to the namenode HTTP port. > Reference: > 2.7 and below: > https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743 > 2.8 and above: > https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN
[ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HDFS-11280: -- Status: Patch Available (was: Open) > Allow WebHDFS to reuse HTTP connections to NN > - > > Key: HDFS-11280 > URL: https://issues.apache.org/jira/browse/HDFS-11280 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1, 2.6.5, 2.7.3 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: HDFS-11280.for.2.7.and.below.patch, > HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, > HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.5.patch, > HDFS-11280.for.2.8.and.beyond.patch > > > WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. > When we use webhdfs as the source in distcp, this used up all ephemeral > ports on the client side since all closed connections continue to occupy the > port with TIME_WAIT status for some time. > According to http://tinyurl.com/java7-http-keepalive, we should call > conn.getInputStream().close() instead to make sure the connection is kept > alive. This will get rid of the ephemeral port problem. > Manual steps used to verify the bug fix: > 1. Build original hadoop jar. > 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows a big number (100s). > 3. Build hadoop jar with this diff. > 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows 0. > 5. The explanation: distcp's client side does a lot of directory scanning, > which would create and close a lot of connections to the namenode HTTP port. > Reference: > 2.7 and below: > https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743 > 2.8 and above: > https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN
[ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HDFS-11280: -- Attachment: HDFS-11280.for.2.8.and.beyond.5.patch > Allow WebHDFS to reuse HTTP connections to NN > - > > Key: HDFS-11280 > URL: https://issues.apache.org/jira/browse/HDFS-11280 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: HDFS-11280.for.2.7.and.below.patch, > HDFS-11280.for.2.7.and.below.patch, HDFS-11280.for.2.8.and.beyond.2.patch, > HDFS-11280.for.2.8.and.beyond.3.patch, HDFS-11280.for.2.8.and.beyond.4.patch, > HDFS-11280.for.2.8.and.beyond.5.patch, HDFS-11280.for.2.8.and.beyond.patch > > > WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. > When we use webhdfs as the source in distcp, this used up all ephemeral > ports on the client side since all closed connections continue to occupy the > port with TIME_WAIT status for some time. > According to http://tinyurl.com/java7-http-keepalive, we should call > conn.getInputStream().close() instead to make sure the connection is kept > alive. This will get rid of the ephemeral port problem. > Manual steps used to verify the bug fix: > 1. Build original hadoop jar. > 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows a big number (100s). > 3. Build hadoop jar with this diff. > 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows 0. > 5. The explanation: distcp's client side does a lot of directory scanning, > which would create and close a lot of connections to the namenode HTTP port. > Reference: > 2.7 and below: > https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743 > 2.8 and above: > https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN
[ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HDFS-11280: -- Status: Open (was: Patch Available) > Allow WebHDFS to reuse HTTP connections to NN > - > > Key: HDFS-11280 > URL: https://issues.apache.org/jira/browse/HDFS-11280 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1, 2.6.5, 2.7.3 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: HDFS-11280.for.2.7.and.below.patch, > HDFS-11280.for.2.7.and.below.patch, HDFS-11280.for.2.8.and.beyond.2.patch, > HDFS-11280.for.2.8.and.beyond.3.patch, HDFS-11280.for.2.8.and.beyond.4.patch, > HDFS-11280.for.2.8.and.beyond.5.patch, HDFS-11280.for.2.8.and.beyond.patch > > > WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. > When we use webhdfs as the source in distcp, this used up all ephemeral > ports on the client side since all closed connections continue to occupy the > port with TIME_WAIT status for some time. > According to http://tinyurl.com/java7-http-keepalive, we should call > conn.getInputStream().close() instead to make sure the connection is kept > alive. This will get rid of the ephemeral port problem. > Manual steps used to verify the bug fix: > 1. Build original hadoop jar. > 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows a big number (100s). > 3. Build hadoop jar with this diff. > 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows 0. > 5. The explanation: distcp's client side does a lot of directory scanning, > which would create and close a lot of connections to the namenode HTTP port. > Reference: > 2.7 and below: > https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743 > 2.8 and above: > https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11279) Cleanup unused DataNode#checkDiskErrorAsync()
[ https://issues.apache.org/jira/browse/HDFS-11279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796316#comment-15796316 ] Hadoop QA commented on HDFS-11279: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 30s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 234 unchanged - 1 fixed = 235 total (was 235) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 74m 2s{color} | {color:green} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}102m 50s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HDFS-11279 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12845427/HDFS-11279.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux d84651fce929 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 591fb15 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/18007/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/18007/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18007/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Cleanup unused DataNode#checkDiskErrorAsync() > - > > Key: HDFS-11279 > URL: https://issues.apache.org/jira/browse/HDFS-11279 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Xiaoyu Yao >Assignee: Hanisha Koneru >Priority: Minor > At
[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN
[ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796310#comment-15796310 ] Hadoop QA commented on HDFS-11280: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} HDFS-11280 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-11280 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12845434/HDFS-11280.for.2.7.and.below.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18009/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Allow WebHDFS to reuse HTTP connections to NN > - > > Key: HDFS-11280 > URL: https://issues.apache.org/jira/browse/HDFS-11280 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: HDFS-11280.for.2.7.and.below.patch, > HDFS-11280.for.2.7.and.below.patch, HDFS-11280.for.2.8.and.beyond.2.patch, > HDFS-11280.for.2.8.and.beyond.3.patch, HDFS-11280.for.2.8.and.beyond.4.patch, > HDFS-11280.for.2.8.and.beyond.patch > > > WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. > When we use webhdfs as the source in distcp, this used up all ephemeral > ports on the client side since all closed connections continue to occupy the > port with TIME_WAIT status for some time. > According to http://tinyurl.com/java7-http-keepalive, we should call > conn.getInputStream().close() instead to make sure the connection is kept > alive. This will get rid of the ephemeral port problem. > Manual steps used to verify the bug fix: > 1. Build original hadoop jar. > 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows a big number (100s). > 3. Build hadoop jar with this diff. > 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows 0. > 5. The explanation: distcp's client side does a lot of directory scanning, > which would create and close a lot of connections to the namenode HTTP port. > Reference: > 2.7 and below: > https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743 > 2.8 and above: > https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-11282) Document the missing metrics of DataNode Volume IO operations
[ https://issues.apache.org/jira/browse/HDFS-11282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796299#comment-15796299 ] Arpit Agarwal edited comment on HDFS-11282 at 1/3/17 9:56 PM: -- Thanks [~linyiqun]. Comments below. # The rendered table looks wrong. The {{|}} need to be escaped. See !metrics-rendered.png|width=900! # Can you add that enabling per-volume metrics may have a performance impact? # "Average Time" should be replaced with "Mean Time" in descriptions to be accurate. # The description for the count metrics and the rate metrics is the same, e.g. TotalMetadataOperations and MetadataOperationRateNumOps. I am not sure that is correct. # We can say that metadata operations include stat, list, mkdir, delete, move, open and posix_fadvise. # The description _Average time of file io error operations in milliseconds_ could be improved. It measures the mean time in milliseconds from the start of an operation to hitting a failure (assuming failures were seen on that volume). was (Author: arpitagarwal): Thanks [~linyiqun]. Comments below. # The rendered table looks wrong. The {{|}} need to be escaped. See !metrics-rendered.png! # Can you add that enabling per-volume metrics may have a performance impact? # "Average Time" should be replaced with "Mean Time" in descriptions to be accurate. # The description for the count metrics and the rate metrics is the same, e.g. TotalMetadataOperations and MetadataOperationRateNumOps. I am not sure that is correct. # We can say that metadata operations include stat, list, mkdir, delete, move, open and posix_fadvise. # The description _Average time of file io error operations in milliseconds_ could be improved. It measures the mean time in milliseconds from the start of an operation to hitting a failure (assuming failures were seen on that volume). > Document the missing metrics of DataNode Volume IO operations > - > > Key: HDFS-11282 > URL: https://issues.apache.org/jira/browse/HDFS-11282 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 3.0.0-alpha2 >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Attachments: HDFS-11282.001.patch, HDFS-11282.002.patch, > metrics-rendered.png > > > HDFS-10959 added many metrics for datanode volume IO operations, but they > haven't been documented. This JIRA addresses that. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11282) Document the missing metrics of DataNode Volume IO operations
[ https://issues.apache.org/jira/browse/HDFS-11282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-11282: - Attachment: metrics-rendered.png Thanks [~linyiqun]. Comments below. # The rendered table looks wrong. The {{|}} need to be escaped. See !metrics-rendered.png! # Can you add that enabling per-volume metrics may have a performance impact? # "Average Time" should be replaced with "Mean Time" in descriptions to be accurate. # The description for the count metrics and the rate metric is the same, e.g. TotalMetadataOperations and MetadataOperationRateNumOps. I am not sure that is correct. # We can say that metadata operations include stat, list, mkdir, delete, move, open and posix_fadvise. # The description _Average time of file io error operations in milliseconds_ could be improved. It measures the mean time in milliseconds from the start of an operation to hitting a failure (assuming failures were seen on that volume). > Document the missing metrics of DataNode Volume IO operations > - > > Key: HDFS-11282 > URL: https://issues.apache.org/jira/browse/HDFS-11282 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 3.0.0-alpha2 >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Attachments: HDFS-11282.001.patch, HDFS-11282.002.patch, > metrics-rendered.png > > > HDFS-10959 added many metrics for DataNode volume IO operations, but they > haven't been documented. This JIRA addresses that. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
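On point 4 above, background that may help the doc wording: in Hadoop's metrics2 library a single {{MutableRate}} field exports both a {{...NumOps}} counter and a {{...AvgTime}} gauge, so the standalone count metric and the rate metric do need distinct descriptions. A hedged sketch (class and field names illustrative, not the HDFS-10959 code):
{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "Per-volume IO metrics (illustrative)", context = "dfs")
class VolumeIoMetricsSketch {
  @Metric("Total number of metadata operations")
  MutableCounterLong totalMetadataOperations;

  // A MutableRate exports MetadataOperationRateNumOps (operation count)
  // and MetadataOperationRateAvgTime (mean latency over the interval).
  @Metric("Rate of metadata operations")
  MutableRate metadataOperationRate;

  void recordMetadataOp(long elapsedMillis) {
    totalMetadataOperations.incr();
    metadataOperationRate.add(elapsedMillis);
  }
}
{code}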
[jira] [Updated] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN
[ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HDFS-11280: -- Status: Patch Available (was: Open) > Allow WebHDFS to reuse HTTP connections to NN > - > > Key: HDFS-11280 > URL: https://issues.apache.org/jira/browse/HDFS-11280 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1, 2.6.5, 2.7.3 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: HDFS-11280.for.2.7.and.below.patch, > HDFS-11280.for.2.7.and.below.patch, HDFS-11280.for.2.8.and.beyond.2.patch, > HDFS-11280.for.2.8.and.beyond.3.patch, HDFS-11280.for.2.8.and.beyond.4.patch, > HDFS-11280.for.2.8.and.beyond.patch > > > WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. > When we use webhdfs as the source in distcp, this used up all ephemeral > ports on the client side since all closed connections continue to occupy the > port with TIME_WAIT status for some time. > According to http://tinyurl.com/java7-http-keepalive, we should call > conn.getInputStream().close() instead to make sure the connection is kept > alive. This will get rid of the ephemeral port problem. > Manual steps used to verify the bug fix: > 1. Build original hadoop jar. > 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows a big number (100s). > 3. Build hadoop jar with this diff. > 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows 0. > 5. The explanation: distcp's client side does a lot of directory scanning, > which would create and close a lot of connections to the namenode HTTP port. > Reference: > 2.7 and below: > https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743 > 2.8 and above: > https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN
[ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HDFS-11280: -- Attachment: HDFS-11280.for.2.7.and.below.patch > Allow WebHDFS to reuse HTTP connections to NN > - > > Key: HDFS-11280 > URL: https://issues.apache.org/jira/browse/HDFS-11280 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: HDFS-11280.for.2.7.and.below.patch, > HDFS-11280.for.2.7.and.below.patch, HDFS-11280.for.2.8.and.beyond.2.patch, > HDFS-11280.for.2.8.and.beyond.3.patch, HDFS-11280.for.2.8.and.beyond.4.patch, > HDFS-11280.for.2.8.and.beyond.patch > > > WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. > When we use webhdfs as the source in distcp, this used up all ephemeral > ports on the client side since all closed connections continue to occupy the > port with TIME_WAIT status for some time. > According to http://tinyurl.com/java7-http-keepalive, we should call > conn.getInputStream().close() instead to make sure the connection is kept > alive. This will get rid of the ephemeral port problem. > Manual steps used to verify the bug fix: > 1. Build original hadoop jar. > 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows a big number (100s). > 3. Build hadoop jar with this diff. > 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows 0. > 5. The explanation: distcp's client side does a lot of directory scanning, > which would create and close a lot of connections to the namenode HTTP port. > Reference: > 2.7 and below: > https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743 > 2.8 and above: > https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN
[ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HDFS-11280: -- Status: Open (was: Patch Available) > Allow WebHDFS to reuse HTTP connections to NN > - > > Key: HDFS-11280 > URL: https://issues.apache.org/jira/browse/HDFS-11280 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1, 2.6.5, 2.7.3 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: HDFS-11280.for.2.7.and.below.patch, > HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, > HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.patch > > > WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. > When we use webhdfs as the source in distcp, this used up all ephemeral > ports on the client side since all closed connections continue to occupy the > port with TIME_WAIT status for some time. > According to http://tinyurl.com/java7-http-keepalive, we should call > conn.getInputStream().close() instead to make sure the connection is kept > alive. This will get rid of the ephemeral port problem. > Manual steps used to verify the bug fix: > 1. Build original hadoop jar. > 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows a big number (100s). > 3. Build hadoop jar with this diff. > 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows 0. > 5. The explanation: distcp's client side does a lot of directory scanning, > which would create and close a lot of connections to the namenode HTTP port. > Reference: > 2.7 and below: > https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743 > 2.8 and above: > https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11279) Cleanup unused DataNode#checkDiskErrorAsync()
[ https://issues.apache.org/jira/browse/HDFS-11279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-11279: -- Attachment: HDFS-11279.002.patch Thank you [~xyao] and [~liuml07]. I have removed the unused import and added the javadoc back in v02. > Cleanup unused DataNode#checkDiskErrorAsync() > - > > Key: HDFS-11279 > URL: https://issues.apache.org/jira/browse/HDFS-11279 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Xiaoyu Yao >Assignee: Hanisha Koneru >Priority: Minor > Attachments: HDFS-11279.000.patch, HDFS-11279.001.patch, > HDFS-11279.002.patch > > > After HDFS-11274, we will not trigger checking all datanode volumes upon IO > failure on a single volume. This makes the original implementation > DataNode#checkDiskErrorAsync and DatasetVolumeChecker#checkAllVolumesAsync() > not used in any of the production code. > This ticket is opened to remove these unused code and related tests if any. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11279) Cleanup unused DataNode#checkDiskErrorAsync()
[ https://issues.apache.org/jira/browse/HDFS-11279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796224#comment-15796224 ] Mingliang Liu commented on HDFS-11279: -- {{import java.io.File;}} in DataNode seems unnecessary? It will be good if we keep the javadoc for public test helper method {{getVolume()}}. Other than that it looks good to me. > Cleanup unused DataNode#checkDiskErrorAsync() > - > > Key: HDFS-11279 > URL: https://issues.apache.org/jira/browse/HDFS-11279 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Xiaoyu Yao >Assignee: Hanisha Koneru >Priority: Minor > Attachments: HDFS-11279.000.patch, HDFS-11279.001.patch > > > After HDFS-11274, we will not trigger checking all datanode volumes upon IO > failure on a single volume. This makes the original implementation > DataNode#checkDiskErrorAsync and DatasetVolumeChecker#checkAllVolumesAsync() > not used in any of the production code. > This ticket is opened to remove these unused code and related tests if any. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11288) Manually allow block replication/deletion in Safe Mode
[ https://issues.apache.org/jira/browse/HDFS-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796189#comment-15796189 ] Lukas Majercak commented on HDFS-11288: --- Will send a proposed change momentarily. > Manually allow block replication/deletion in Safe Mode > -- > > Key: HDFS-11288 > URL: https://issues.apache.org/jira/browse/HDFS-11288 > Project: Hadoop HDFS > Issue Type: New Feature > Components: namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Lukas Majercak > > Currently, the Safe Mode does not allow block replication/deletion, which > makes sense, especially on startup, as we do not want to replicate blocks > unnecessarily. > An issue we have seen in our clusters, though, is the NameNode getting > overwhelmed with the amount of needed replications; in that case, we would > like to be able to manually set the NN to a state in which R/Ws to the FS > are disallowed but the NN continues replicating/deleting blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-11288) Manually allow block replication/deletion in Safe Mode
Lukas Majercak created HDFS-11288: - Summary: Manually allow block replication/deletion in Safe Mode Key: HDFS-11288 URL: https://issues.apache.org/jira/browse/HDFS-11288 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 3.0.0-alpha1 Reporter: Lukas Majercak Currently, the Safe Mode does not allow block replication/deletion, which makes sense, especially on startup, as we do not want to replicate blocks unnecessarily. An issue we have seen in our clusters, though, is the NameNode getting overwhelmed with the amount of needed replications; in that case, we would like to be able to manually set the NN to a state in which R/Ws to the FS are disallowed but the NN continues replicating/deleting blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11279) Cleanup unused DataNode#checkDiskErrorAsync()
[ https://issues.apache.org/jira/browse/HDFS-11279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796134#comment-15796134 ] Xiaoyu Yao commented on HDFS-11279: --- Thanks [~hanishakoneru] for the update. +1 pending Jenkins. There is an unused import in DataNode.java, which we can fix at commit time to save a Jenkins run. > Cleanup unused DataNode#checkDiskErrorAsync() > - > > Key: HDFS-11279 > URL: https://issues.apache.org/jira/browse/HDFS-11279 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Xiaoyu Yao >Assignee: Hanisha Koneru >Priority: Minor > Attachments: HDFS-11279.000.patch, HDFS-11279.001.patch > > > After HDFS-11274, we will not trigger checking all datanode volumes upon IO > failure on a single volume. This makes the original implementation > DataNode#checkDiskErrorAsync and DatasetVolumeChecker#checkAllVolumesAsync() > not used in any of the production code. > This ticket is opened to remove these unused code and related tests if any. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-11287) Storage class member storageDirs should be private to avoid unprotected access by derived classes
Manoj Govindassamy created HDFS-11287: - Summary: Storage class member storageDirs should be private to avoid unprotected access by derived classes Key: HDFS-11287 URL: https://issues.apache.org/jira/browse/HDFS-11287 Project: Hadoop HDFS Issue Type: Bug Reporter: Manoj Govindassamy Assignee: Manoj Govindassamy The HDFS-11267 fix made the abstract class Storage.java member variable storageDirs thread safe, so that all its derived classes, like NNStorage, JNStorage, and DataStorage, do not face ConcurrentModificationException when volume add/remove and listing operations run in parallel. The rebase of that fix missed a few changes from the original patch. This jira addresses the addendum needed for the HDFS-11267 commits. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11279) Cleanup unused DataNode#checkDiskErrorAsync()
[ https://issues.apache.org/jira/browse/HDFS-11279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-11279: -- Attachment: HDFS-11279.001.patch Thank you [~xyao] for reviewing the patch. I have addressed your comments in patch v01. > Cleanup unused DataNode#checkDiskErrorAsync() > - > > Key: HDFS-11279 > URL: https://issues.apache.org/jira/browse/HDFS-11279 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Xiaoyu Yao >Assignee: Hanisha Koneru >Priority: Minor > Attachments: HDFS-11279.000.patch, HDFS-11279.001.patch > > > After HDFS-11274, we will not trigger checking all datanode volumes upon IO > failure on a single volume. This makes the original implementation > DataNode#checkDiskErrorAsync and DatasetVolumeChecker#checkAllVolumesAsync() > not used in any of the production code. > This ticket is opened to remove these unused code and related tests if any. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-4169) Add per-disk latency metrics to DataNode
[ https://issues.apache.org/jira/browse/HDFS-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao resolved HDFS-4169. -- Resolution: Duplicate Fix Version/s: 3.0.0-alpha2 Target Version/s: (was: ) > Add per-disk latency metrics to DataNode > > > Key: HDFS-4169 > URL: https://issues.apache.org/jira/browse/HDFS-4169 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.0.0-alpha1 >Reporter: Todd Lipcon >Assignee: Xiaoyu Yao > Fix For: 3.0.0-alpha2 > > > Currently, if one of the drives on the DataNode is slow, it's hard to > determine what the issue is. This can happen due to a failing disk, bad > controller, etc. It would be preferable to expose per-drive metrics/jmx with > latency statistics about how long reads/writes are taking. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN
[ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795967#comment-15795967 ] Haohui Mai commented on HDFS-11280: --- Any ideas why Jenkins is still happy? > Allow WebHDFS to reuse HTTP connections to NN > - > > Key: HDFS-11280 > URL: https://issues.apache.org/jira/browse/HDFS-11280 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: HDFS-11280.for.2.7.and.below.patch, > HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, > HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.patch > > > WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. > When we use webhdfs as the source in distcp, this used up all ephemeral > ports on the client side since all closed connections continue to occupy the > port with TIME_WAIT status for some time. > According to http://tinyurl.com/java7-http-keepalive, we should call > conn.getInputStream().close() instead to make sure the connection is kept > alive. This will get rid of the ephemeral port problem. > Manual steps used to verify the bug fix: > 1. Build original hadoop jar. > 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows a big number (100s). > 3. Build hadoop jar with this diff. > 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows 0. > 5. The explanation: distcp's client side does a lot of directory scanning, > which would create and close a lot of connections to the namenode HTTP port. > Reference: > 2.7 and below: > https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743 > 2.8 and above: > https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-4169) Add per-disk latency metrics to DataNode
[ https://issues.apache.org/jira/browse/HDFS-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795959#comment-15795959 ] Jitendra Nath Pandey commented on HDFS-4169: This seems to be a duplicate of HDFS-10959. > Add per-disk latency metrics to DataNode > > > Key: HDFS-4169 > URL: https://issues.apache.org/jira/browse/HDFS-4169 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.0.0-alpha1 >Reporter: Todd Lipcon >Assignee: Xiaoyu Yao > > Currently, if one of the drives on the DataNode is slow, it's hard to > determine what the issue is. This can happen due to a failing disk, bad > controller, etc. It would be preferable to expose per-drive metrics/jmx with > latency statistics about how long reads/writes are taking. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11156) Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API
[ https://issues.apache.org/jira/browse/HDFS-11156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795913#comment-15795913 ] Andrew Wang commented on HDFS-11156: Sorry about that, I've reverted the branch-2 patch. [~cheersyang] want to provide a branch-2 patch? > Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API > > > Key: HDFS-11156 > URL: https://issues.apache.org/jira/browse/HDFS-11156 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 2.7.3 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: BlockLocationProperties_JSON_Schema.jpg, > BlockLocations_JSON_Schema.jpg, FileStatuses_JSON_Schema.jpg, > HDFS-11156.01.patch, HDFS-11156.02.patch, HDFS-11156.03.patch, > HDFS-11156.04.patch, HDFS-11156.05.patch, HDFS-11156.06.patch, > HDFS-11156.07.patch, HDFS-11156.08.patch, HDFS-11156.09.patch, > HDFS-11156.10.patch, HDFS-11156.11.patch, HDFS-11156.12.patch, > HDFS-11156.13.patch, HDFS-11156.14.patch, HDFS-11156.15.patch, > HDFS-11156.16.patch, Output_JSON_format_v10.jpg, SampleResponse_JSON.jpg > > > Following webhdfs REST API > {code} > http://:/webhdfs/v1/?op=GET_BLOCK_LOCATIONS&offset=0&length=1 > {code} > will get a response like > {code} > { > "LocatedBlocks" : { > "fileLength" : 1073741824, > "isLastBlockComplete" : true, > "isUnderConstruction" : false, > "lastLocatedBlock" : { ... }, > "locatedBlocks" : [ {...} ] > } > } > {code} > This represents for *o.a.h.h.p.LocatedBlocks*. However according to > *FileSystem* API, > {code} > public BlockLocation[] getFileBlockLocations(Path p, long start, long len) > {code} > clients would expect an array of BlockLocation. This mismatch should be > fixed. Marked as Incompatible change as this will change the output of the > GET_BLOCK_LOCATIONS API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11156) Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API
[ https://issues.apache.org/jira/browse/HDFS-11156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795897#comment-15795897 ] Daryn Sharp commented on HDFS-11156: This broke branch-2 due to inner class accessing a non-final. > Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API > > > Key: HDFS-11156 > URL: https://issues.apache.org/jira/browse/HDFS-11156 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 2.7.3 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: BlockLocationProperties_JSON_Schema.jpg, > BlockLocations_JSON_Schema.jpg, FileStatuses_JSON_Schema.jpg, > HDFS-11156.01.patch, HDFS-11156.02.patch, HDFS-11156.03.patch, > HDFS-11156.04.patch, HDFS-11156.05.patch, HDFS-11156.06.patch, > HDFS-11156.07.patch, HDFS-11156.08.patch, HDFS-11156.09.patch, > HDFS-11156.10.patch, HDFS-11156.11.patch, HDFS-11156.12.patch, > HDFS-11156.13.patch, HDFS-11156.14.patch, HDFS-11156.15.patch, > HDFS-11156.16.patch, Output_JSON_format_v10.jpg, SampleResponse_JSON.jpg > > > Following webhdfs REST API > {code} > http://:/webhdfs/v1/?op=GET_BLOCK_LOCATIONS&offset=0&length=1 > {code} > will get a response like > {code} > { > "LocatedBlocks" : { > "fileLength" : 1073741824, > "isLastBlockComplete" : true, > "isUnderConstruction" : false, > "lastLocatedBlock" : { ... }, > "locatedBlocks" : [ {...} ] > } > } > {code} > This represents for *o.a.h.h.p.LocatedBlocks*. However according to > *FileSystem* API, > {code} > public BlockLocation[] getFileBlockLocations(Path p, long start, long len) > {code} > clients would expect an array of BlockLocation. This mismatch should be > fixed. Marked as Incompatible change as this will change the output of the > GET_BLOCK_LOCATIONS API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795843#comment-15795843 ] Ming Ma commented on HDFS-9391: --- Thanks Manoj. Yep let us keep the existing property as Eddy mentioned. * In {{getMaintenanceOnlyReplicas}} the check of {{if (!isDecommissionInProgress() && !isEnteringMaintenance())}} only needs to check for maintenance part. * It seems you will need to add {{In Maintenance & dead}} to match the addition of {{nodes[i].state = "down-maintenance";}}. * For the {{EnteringMaintenanceNodes}} page, it uses {{maintenanceOnlyReplicas}} to describe {{Blocks with no live replicas}}. Should we use {{OutOfServiceOnlyReplicas}}? > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > HDFS-9391.02.patch, Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11156) Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API
[ https://issues.apache.org/jira/browse/HDFS-11156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795771#comment-15795771 ] Hudson commented on HDFS-11156: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11063 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11063/]) HDFS-11156. Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API. (wang: rev 7fcc73fc0d248aae1edbd4e1514c5818f6198928) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/resources/GetOpParam.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/JsonUtilClient.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHDFS.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/JsonUtil.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/WebHDFS.md * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java > Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API > > > Key: HDFS-11156 > URL: https://issues.apache.org/jira/browse/HDFS-11156 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 2.7.3 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: BlockLocationProperties_JSON_Schema.jpg, > BlockLocations_JSON_Schema.jpg, FileStatuses_JSON_Schema.jpg, > HDFS-11156.01.patch, HDFS-11156.02.patch, HDFS-11156.03.patch, > HDFS-11156.04.patch, HDFS-11156.05.patch, HDFS-11156.06.patch, > HDFS-11156.07.patch, HDFS-11156.08.patch, HDFS-11156.09.patch, > HDFS-11156.10.patch, HDFS-11156.11.patch, HDFS-11156.12.patch, > HDFS-11156.13.patch, HDFS-11156.14.patch, HDFS-11156.15.patch, > HDFS-11156.16.patch, Output_JSON_format_v10.jpg, SampleResponse_JSON.jpg > > > Following webhdfs REST API > {code} > http://:/webhdfs/v1/?op=GET_BLOCK_LOCATIONS&offset=0&length=1 > {code} > will get a response like > {code} > { > "LocatedBlocks" : { > "fileLength" : 1073741824, > "isLastBlockComplete" : true, > "isUnderConstruction" : false, > "lastLocatedBlock" : { ... }, > "locatedBlocks" : [ {...} ] > } > } > {code} > This represents for *o.a.h.h.p.LocatedBlocks*. However according to > *FileSystem* API, > {code} > public BlockLocation[] getFileBlockLocations(Path p, long start, long len) > {code} > clients would expect an array of BlockLocation. This mismatch should be > fixed. Marked as Incompatible change as this will change the output of the > GET_BLOCK_LOCATIONS API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11279) Cleanup unused DataNode#checkDiskErrorAsync()
[ https://issues.apache.org/jira/browse/HDFS-11279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795742#comment-15795742 ] Xiaoyu Yao commented on HDFS-11279: --- Thanks [~hanishakoneru] for working on this. The patch looks pretty good to me. I just have two NITs, +1 otherwise. DataNode.java NIT: Line 50, the import can be moved down to follow alphabetical order: import org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl; NIT: Line 2470, can you add @VisibleForTesting for DataNode#getVolume? Or I would suggest moving DataNode#getVolume(File basePath) into a test utility class such as DataNodeTestUtils#getVolume {code} public static FsVolumeImpl getVolume(DataNode dn, File basePath) {code} > Cleanup unused DataNode#checkDiskErrorAsync() > - > > Key: HDFS-11279 > URL: https://issues.apache.org/jira/browse/HDFS-11279 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Xiaoyu Yao >Assignee: Hanisha Koneru >Priority: Minor > Attachments: HDFS-11279.000.patch > > > After HDFS-11274, we will not trigger checking all datanode volumes upon IO > failure on a single volume. This makes the original implementation > DataNode#checkDiskErrorAsync and DatasetVolumeChecker#checkAllVolumesAsync() > not used in any of the production code. > This ticket is opened to remove these unused code and related tests if any. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
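To make the second NIT concrete, a sketch of the suggested DataNodeTestUtils helper, assuming the {{DataNode#getVolume(File)}} accessor from the v00 patch exists as described (the committed version may differ):
{code:java}
import java.io.File;
import com.google.common.annotations.VisibleForTesting;
import org.apache.hadoop.hdfs.server.datanode.DataNode;
import org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl;

public class DataNodeTestUtilsSketch {
  // Test-scope lookup: resolves the volume serving basePath without
  // exposing the search logic on the production DataNode API.
  @VisibleForTesting
  public static FsVolumeImpl getVolume(DataNode dn, File basePath) {
    return dn.getVolume(basePath);
  }
}
{code}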
[jira] [Commented] (HDFS-11076) Add unit test for extended Acls
[ https://issues.apache.org/jira/browse/HDFS-11076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795720#comment-15795720 ] Chen Liang commented on HDFS-11076: --- I was not aware of FSAclBaseTest. Thanks [~cnauroth] for the reference! I've taken a quick look, and it appears to me that scenarios 1 & 2 are indeed covered there, but I'm not sure about 3, 4 & 5. While I'm double checking on this, [~arpitagarwal] and [~liuml07], do you have any comments? > Add unit test for extended Acls > --- > > Key: HDFS-11076 > URL: https://issues.apache.org/jira/browse/HDFS-11076 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Chen Liang >Assignee: Chen Liang > Fix For: 2.8.0, 3.0.0-alpha2 > > Attachments: HDFS-11076.001.patch, HDFS-11076.002.patch, > HDFS-11076.003.patch > > > This JIRA tries to add unit tests for extended ACLs in HDFS, to cover the > following scenarios: > # the default ACL of parent directory should be inherited by newly created > child directory and file > # the access ACL of parent directory should not be inherited by newly created > child directory and file > # changing the default ACL of parent directory should not change the ACL of > existing child directory and file > # child directory can add more default ACL in addition to the ACL inherited > from parent directory > # child directory can also restrict ACL based on the ACL inherited from > parent directory -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11156) Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API
[ https://issues.apache.org/jira/browse/HDFS-11156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-11156: --- Resolution: Fixed Fix Version/s: 3.0.0-alpha2 2.9.0 Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2, thanks for the contribution [~cheersyang]! > Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API > > > Key: HDFS-11156 > URL: https://issues.apache.org/jira/browse/HDFS-11156 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 2.7.3 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: BlockLocationProperties_JSON_Schema.jpg, > BlockLocations_JSON_Schema.jpg, FileStatuses_JSON_Schema.jpg, > HDFS-11156.01.patch, HDFS-11156.02.patch, HDFS-11156.03.patch, > HDFS-11156.04.patch, HDFS-11156.05.patch, HDFS-11156.06.patch, > HDFS-11156.07.patch, HDFS-11156.08.patch, HDFS-11156.09.patch, > HDFS-11156.10.patch, HDFS-11156.11.patch, HDFS-11156.12.patch, > HDFS-11156.13.patch, HDFS-11156.14.patch, HDFS-11156.15.patch, > HDFS-11156.16.patch, Output_JSON_format_v10.jpg, SampleResponse_JSON.jpg > > > Following webhdfs REST API > {code} > http://:/webhdfs/v1/?op=GET_BLOCK_LOCATIONS&offset=0&length=1 > {code} > will get a response like > {code} > { > "LocatedBlocks" : { > "fileLength" : 1073741824, > "isLastBlockComplete" : true, > "isUnderConstruction" : false, > "lastLocatedBlock" : { ... }, > "locatedBlocks" : [ {...} ] > } > } > {code} > This represents for *o.a.h.h.p.LocatedBlocks*. However according to > *FileSystem* API, > {code} > public BlockLocation[] getFileBlockLocations(Path p, long start, long len) > {code} > clients would expect an array of BlockLocation. This mismatch should be > fixed. Marked as Incompatible change as this will change the output of the > GET_BLOCK_LOCATIONS API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11156) Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API
[ https://issues.apache.org/jira/browse/HDFS-11156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795644#comment-15795644 ] Andrew Wang commented on HDFS-11156: Sorry for the delay, I was on vacation. +1 LGTM, thanks for the hard work and sticking with this JIRA! I'll commit shortly. > Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API > > > Key: HDFS-11156 > URL: https://issues.apache.org/jira/browse/HDFS-11156 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 2.7.3 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: BlockLocationProperties_JSON_Schema.jpg, > BlockLocations_JSON_Schema.jpg, FileStatuses_JSON_Schema.jpg, > HDFS-11156.01.patch, HDFS-11156.02.patch, HDFS-11156.03.patch, > HDFS-11156.04.patch, HDFS-11156.05.patch, HDFS-11156.06.patch, > HDFS-11156.07.patch, HDFS-11156.08.patch, HDFS-11156.09.patch, > HDFS-11156.10.patch, HDFS-11156.11.patch, HDFS-11156.12.patch, > HDFS-11156.13.patch, HDFS-11156.14.patch, HDFS-11156.15.patch, > HDFS-11156.16.patch, Output_JSON_format_v10.jpg, SampleResponse_JSON.jpg > > > Following webhdfs REST API > {code} > http://:/webhdfs/v1/?op=GET_BLOCK_LOCATIONS&offset=0&length=1 > {code} > will get a response like > {code} > { > "LocatedBlocks" : { > "fileLength" : 1073741824, > "isLastBlockComplete" : true, > "isUnderConstruction" : false, > "lastLocatedBlock" : { ... }, > "locatedBlocks" : [ {...} ] > } > } > {code} > This represents for *o.a.h.h.p.LocatedBlocks*. However according to > *FileSystem* API, > {code} > public BlockLocation[] getFileBlockLocations(Path p, long start, long len) > {code} > clients would expect an array of BlockLocation. This mismatch should be > fixed. Marked as Incompatible change as this will change the output of the > GET_BLOCK_LOCATIONS API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
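For context on the *FileSystem* API referenced in the description, a minimal client-side sketch of the call the new op backs (host, port, and path are placeholders):
{code:java}
import java.net.URI;
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WebHdfsBlockLocationsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("webhdfs://namenode:50070"), conf);
    // Clients expect a BlockLocation[] here, which is the mismatch the
    // old GET_BLOCK_LOCATIONS JSON output did not satisfy.
    BlockLocation[] locs = fs.getFileBlockLocations(new Path("/data/file"), 0, 1L << 30);
    for (BlockLocation loc : locs) {
      System.out.println("offset=" + loc.getOffset() + " length=" + loc.getLength()
          + " hosts=" + Arrays.toString(loc.getHosts()));
    }
    fs.close();
  }
}
{code}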
[jira] [Commented] (HDFS-11285) Dead DataNodes keep a long time in (Dead, DECOMMISSION_INPROGRESS), and never transition to (Dead, DECOMMISSIONED)
[ https://issues.apache.org/jira/browse/HDFS-11285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795630#comment-15795630 ] Andrew Wang commented on HDFS-11285: Hi [~cltlfcjin], that code wasn't removed by the HDFS-7411 refactor, it was moved into HeartbeatManager by HDFS-7725. There's also TestDecommissionStatus#testDecommissionDeadDN which tests this case. Could you provide a unit test that reproduces your issue? > Dead DataNodes keep a long time in (Dead, DECOMMISSION_INPROGRESS), and never > transition to (Dead, DECOMMISSIONED) > -- > > Key: HDFS-11285 > URL: https://issues.apache.org/jira/browse/HDFS-11285 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Lantao Jin > > We have seen the use case of decommissioning DataNodes that are already dead > or unresponsive, and not expected to rejoin the cluster. In a large cluster, > we saw more than 100 nodes that were dead and decommissioning, and their {panel} Under > replicated blocks {panel} and {panel} Blocks with no live replicas {panel} counts were all ZERO. Actually it was fixed in > [HDFS-7374|https://issues.apache.org/jira/browse/HDFS-7374]. After that fix, we > could refreshNodes twice to eliminate this case. But it seems the fix was lost in the > refactor [HDFS-7411|https://issues.apache.org/jira/browse/HDFS-7411]. We > are using a Hadoop version based on 2.7.1, and only the operations below can > transition the status from {panel} Dead, DECOMMISSION_INPROGRESS {panel} to > {panel} Dead, DECOMMISSIONED {panel}: > # Retire it from hdfs-exclude > # refreshNodes > # Re-add it to hdfs-exclude > # refreshNodes > So, why was this code removed in the new DecommissionManager after the refactor? > {code:java} > if (!node.isAlive) { > LOG.info("Dead node " + node + " is decommissioned immediately."); > node.setDecommissioned(); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795401#comment-15795401 ] Colin P. McCabe commented on HDFS-6994: --- Hi [~rvs], thanks for the note. It would be nice to get this upstreamed in Hadoop itself so that we could stop splitting our efforts between so many different native client initiatives. I think it would be a nice project for someone who wanted to get more involved with the Hadoop community. I think the biggest goals are maintainability, maintainability, and maintainability. :) I'm less concerned about making it blindingly fast or featureful, since I think that can be added later after we have gotten the coding style synced up with the rest of Hadoop. > libhdfs3 - A native C/C++ HDFS client > - > > Key: HDFS-6994 > URL: https://issues.apache.org/jira/browse/HDFS-6994 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Zhanwei Wang > Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch > > > Hi All > I just got the permission to open source libhdfs3, which is a native C/C++ > HDFS client based on the Hadoop RPC protocol and HDFS Data Transfer Protocol. > libhdfs3 provides the libhdfs style C interface and a C++ interface. It supports > both Hadoop RPC versions 8 and 9, Namenode HA, and Kerberos > authentication. > libhdfs3 is currently used by HAWQ of Pivotal. > I'd like to integrate libhdfs3 into the HDFS source code to benefit others. > You can find the libhdfs3 code on github: > https://github.com/Pivotal-Data-Attic/pivotalrd-libhdfs3 > http://pivotal-data-attic.github.io/pivotalrd-libhdfs3/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7784) load fsimage in parallel
[ https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795396#comment-15795396 ] Kihwal Lee commented on HDFS-7784: -- [~xiegang112], when you have a chance to test the performance, please also share the jvm GC setting and the hardware spec (e.g. how many cores, as it affects the GC performance). It will be even better if you can measure the GC activities before and after. If everything looks positive, people will certainly be interested. > load fsimage in parallel > > > Key: HDFS-7784 > URL: https://issues.apache.org/jira/browse/HDFS-7784 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Walter Su >Assignee: Walter Su >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-7784.001.patch, test-20150213.pdf > > > When a single Namenode has a huge number of files and federation is not used, the > startup/restart speed is slow. The fsimage loading step takes most of the > time. fsimage loading can be separated into two parts: deserialization and object > construction (mostly map insertion). Deserialization takes most of the CPU > time. So we can do deserialization in parallel and add to the hashmap serially. > It will significantly reduce the NN start time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
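A generic sketch of the split the description proposes: deserialization fanned out across threads, map insertion kept serial. The section and record types here are illustrative stand-ins, not the attached patch:
{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLoadSketch {
  // Stand-in for the CPU-heavy decode of one fsimage section into
  // (inodeId, inode) records.
  static List<long[]> deserialize(byte[] section) {
    return new ArrayList<>();
  }

  public static void main(String[] args) throws Exception {
    List<byte[]> sections = new ArrayList<>(); // placeholder section bytes
    ExecutorService pool = Executors.newFixedThreadPool(
        Runtime.getRuntime().availableProcessors());
    List<Future<List<long[]>>> parts = new ArrayList<>();
    for (byte[] s : sections) {
      parts.add(pool.submit(() -> deserialize(s))); // parallel, CPU-bound
    }
    Map<Long, long[]> inodeMap = new HashMap<>();
    for (Future<List<long[]>> part : parts) {
      for (long[] rec : part.get()) {
        inodeMap.put(rec[0], rec); // hashmap insertion stays single-threaded
      }
    }
    pool.shutdown();
  }
}
{code}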
[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN
[ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795189#comment-15795189 ] Hudson commented on HDFS-11280: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11062 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11062/]) Revert "HDFS-11280. Allow WebHDFS to reuse HTTP connections to NN. (brahma: rev b31e1951e044b2c6f6e88a007a8c175941ddd674) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java > Allow WebHDFS to reuse HTTP connections to NN > - > > Key: HDFS-11280 > URL: https://issues.apache.org/jira/browse/HDFS-11280 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: HDFS-11280.for.2.7.and.below.patch, > HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, > HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.patch > > > WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. > When we use webhdfs as the source in distcp, this used up all ephemeral > ports on the client side since all closed connections continue to occupy the > port with TIME_WAIT status for some time. > According to http://tinyurl.com/java7-http-keepalive, we should call > conn.getInputStream().close() instead to make sure the connection is kept > alive. This will get rid of the ephemeral port problem. > Manual steps used to verify the bug fix: > 1. Build original hadoop jar. > 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows a big number (100s). > 3. Build hadoop jar with this diff. > 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows 0. > 5. The explanation: distcp's client side does a lot of directory scanning, > which would create and close a lot of connections to the namenode HTTP port. > Reference: > 2.7 and below: > https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743 > 2.8 and above: > https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN
[ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-11280: Status: Patch Available (was: Reopened) Reverted as it broke the test cases. > Allow WebHDFS to reuse HTTP connections to NN > - > > Key: HDFS-11280 > URL: https://issues.apache.org/jira/browse/HDFS-11280 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1, 2.6.5, 2.7.3 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: HDFS-11280.for.2.7.and.below.patch, > HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, > HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.patch > > > WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. > When we use webhdfs as the source in distcp, this used up all ephemeral > ports on the client side since all closed connections continue to occupy the > port with TIME_WAIT status for some time. > According to http://tinyurl.com/java7-http-keepalive, we should call > conn.getInputStream().close() instead to make sure the connection is kept > alive. This will get rid of the ephemeral port problem. > Manual steps used to verify the bug fix: > 1. Build original hadoop jar. > 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows a big number (100s). > 3. Build hadoop jar with this diff. > 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows 0. > 5. The explanation: distcp's client side does a lot of directory scanning, > which would create and close a lot of connections to the namenode HTTP port. > Reference: > 2.7 and below: > https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743 > 2.8 and above: > https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN
[ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-11280: Fix Version/s: (was: 3.0.0-alpha2) (was: 2.7.4) (was: 2.9.0) (was: 2.8.0) > Allow WebHDFS to reuse HTTP connections to NN > - > > Key: HDFS-11280 > URL: https://issues.apache.org/jira/browse/HDFS-11280 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: HDFS-11280.for.2.7.and.below.patch, > HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, > HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.patch > > > WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. > When we use webhdfs as the source in distcp, this used up all ephemeral > ports on the client side since all closed connections continue to occupy the > port with TIME_WAIT status for some time. > According to http://tinyurl.com/java7-http-keepalive, we should call > conn.getInputStream().close() instead to make sure the connection is kept > alive. This will get rid of the ephemeral port problem. > Manual steps used to verify the bug fix: > 1. Build original hadoop jar. > 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows a big number (100s). > 3. Build hadoop jar with this diff. > 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows 0. > 5. The explanation: distcp's client side does a lot of directory scanning, > which would create and close a lot of connections to the namenode HTTP port. > Reference: > 2.7 and below: > https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743 > 2.8 and above: > https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN
[ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reopened HDFS-11280: - > Allow WebHDFS to reuse HTTP connections to NN > - > > Key: HDFS-11280 > URL: https://issues.apache.org/jira/browse/HDFS-11280 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1 >Reporter: Zheng Shao >Assignee: Zheng Shao > Fix For: 2.8.0, 2.9.0, 2.7.4, 3.0.0-alpha2 > > Attachments: HDFS-11280.for.2.7.and.below.patch, > HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, > HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.patch > > > WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. > When we use webhdfs as the source in distcp, this used up all ephemeral > ports on the client side since all closed connections continue to occupy the > port with TIME_WAIT status for some time. > According to http://tinyurl.com/java7-http-keepalive, we should call > conn.getInputStream().close() instead to make sure the connection is kept > alive. This will get rid of the ephemeral port problem. > Manual steps used to verify the bug fix: > 1. Build original hadoop jar. > 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows a big number (100s). > 3. Build hadoop jar with this diff. > 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | > grep -c 50070" on the local machine shows 0. > 5. The explanation: distcp's client side does a lot of directory scanning, > which would create and close a lot of connections to the namenode HTTP port. > Reference: > 2.7 and below: > https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743 > 2.8 and above: > https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11170) Add create API in filesystem public class to support assign parameter through builder
[ https://issues.apache.org/jira/browse/HDFS-11170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zhou updated HDFS-11170: Status: Patch Available (was: Open) > Add create API in filesystem public class to support assign parameter through > builder > - > > Key: HDFS-11170 > URL: https://issues.apache.org/jira/browse/HDFS-11170 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: SammiChen >Assignee: Wei Zhou > Attachments: HDFS-11170-00.patch > > > FileSystem class supports multiple create functions to help user create file. > Some create functions has many parameters, it's hard for user to exactly > remember these parameters and their orders. This task is to add builder > based create functions to help user more easily create file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11170) Add create API in filesystem public class to support assign parameter through builder
[ https://issues.apache.org/jira/browse/HDFS-11170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zhou updated HDFS-11170: Attachment: HDFS-11170-00.patch This initial patch. Thanks [~Sammi] for the great help! > Add create API in filesystem public class to support assign parameter through > builder > - > > Key: HDFS-11170 > URL: https://issues.apache.org/jira/browse/HDFS-11170 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: SammiChen >Assignee: Wei Zhou > Attachments: HDFS-11170-00.patch > > > FileSystem class supports multiple create functions to help user create file. > Some create functions has many parameters, it's hard for user to exactly > remember these parameters and their orders. This task is to add builder > based create functions to help user more easily create file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
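To illustrate the API shape the description asks for, a hedged sketch of builder-based file creation; the entry point and setter names are illustrative, not necessarily what the attached patch defines:
{code:java}
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class CreateBuilderSketch {
  // The caller names only the options it cares about; the rest default,
  // so there is no long positional parameter list to memorize.
  static FSDataOutputStream createWithBuilder(FileSystem fs) throws Exception {
    return fs.createFile(new Path("/tmp/example")) // hypothetical builder entry point
        .permission(FsPermission.getFileDefault())
        .overwrite(true)
        .replication((short) 3)
        .blockSize(128L * 1024 * 1024)
        .build();
  }
}
{code}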
[jira] [Created] (HDFS-11286) GETFILESTATUS, RENAME logic breaking due to incomplete path argument
Sampada Dehankar created HDFS-11286: --- Summary: GETFILESTATUS, RENAME logic breaking due to incomplete path argument Key: HDFS-11286 URL: https://issues.apache.org/jira/browse/HDFS-11286 Project: Hadoop HDFS Issue Type: Bug Components: httpfs Affects Versions: 2.7.1 Environment: Windows Reporter: Sampada Dehankar We use ADLS to store customer data, and HttpFS server/client is used to access the data from our containers. HttpFS functions like GETFILESTATUS and RENAME expect the absolute 'path' of the file(s) as the argument. But when the request is received at the server from the HttpFS client, the server forwards only the relative path rather than the absolute path to ADLS. This breaks the logic of the GETFILESTATUS and RENAME functions. Steps to reproduce GETFILESTATUS Command Bug: Run the following command from the client: Example 1: hadoop fs -ls adl_scheme://account/folderA/folderB/ Server logs show only the relative path "folderA/folderB/" is forwarded to ADLS. Example 2: hadoop fs -ls adl_scheme://account/folderX/folderY/SampleFile Server logs show only the relative path "folderX/folderY/SampleFile" is forwarded to ADLS. Fix: Prepend the ADLS scheme and account name to the path. So the paths in example 1 and example 2 would look like 'adl_scheme://account/folderA/folderB/' and 'adl_scheme://account/folderX/folderY/SampleFile' respectively. We have the fix ready and currently it is in the testing phase. Steps to reproduce RENAME Command Bug: Run the following command from the client: Example 1: hadoop fs -mv /folderA/oldFileName /folderA/newFileName Server logs show that only the relative old file path "folderA/oldFileName" and the new file path "adl_scheme://account/folderA/newFileName" are forwarded to ADLS. Fix: Prepend the ADLS scheme and account name to the old file name path. We have the fix ready and currently it is in the testing phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
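A minimal sketch of the fix described above (helper name hypothetical; the real change lives in the HttpFS server's path handling):
{code:java}
public class AbsolutePathSketch {
  // Prepend scheme://account when the client sent a store-relative path.
  static String toAbsolute(String scheme, String account, String path) {
    String prefix = scheme + "://" + account;
    if (path.startsWith(prefix)) {
      return path; // already absolute
    }
    return path.startsWith("/") ? prefix + path : prefix + "/" + path;
  }

  public static void main(String[] args) {
    // "adl_scheme"/"account" mirror the placeholders in the report above.
    System.out.println(toAbsolute("adl_scheme", "account", "folderA/folderB/"));
    // -> adl_scheme://account/folderA/folderB/
  }
}
{code}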
[jira] [Commented] (HDFS-11191) Datanode Capacity is misleading if the dfs.datanode.data.dir is configured with two directories from the same file system.
[ https://issues.apache.org/jira/browse/HDFS-11191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794776#comment-15794776 ] Hadoop QA commented on HDFS-11191: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 28s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 7s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 37s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 36s{color} | {color:orange} hadoop-hdfs-project: The patch generated 3 new + 386 unchanged - 7 fixed = 389 total (was 393) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 9s{color} | {color:green} hadoop-hdfs-client in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 45s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}116m 50s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes | | | hadoop.hdfs.TestDecommission | | | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes | | | hadoop.hdfs.server.namenode.TestFileLimit | | | hadoop.hdfs.web.TestWebHdfsWithRestCsrfPreventionFilter | | | hadoop.hdfs.TestSmallBlock | | | hadoop.hdfs.TestDFSStripedInputStream | | | hadoop.hdfs.TestReplication | | | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS | | | hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks | | | hadoop.hdfs.TestFileCreation | | | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | | | hadoop.hdfs.TestInjectionForSimulatedStorage | | | hadoop.hdfs.server.balancer.TestBalancer | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.TestDFSRSDefault10x4StripedInputStream | | | hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer | | | hadoop.hdfs.web.TestWebHDFS | | | hadoo
[jira] [Created] (HDFS-11285) Dead DataNodes keep a long time in (Dead, DECOMMISSION_INPROGRESS), and never transition to (Dead, DECOMMISSIONED)
Lantao Jin created HDFS-11285: - Summary: Dead DataNodes keep a long time in (Dead, DECOMMISSION_INPROGRESS), and never transition to (Dead, DECOMMISSIONED) Key: HDFS-11285 URL: https://issues.apache.org/jira/browse/HDFS-11285 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.1 Reporter: Lantao Jin We have seen the use case of decommissioning DataNodes that are already dead or unresponsive and are not expected to rejoin the cluster. In a large cluster, we found more than 100 nodes that were dead and decommissioning, while their {panel} Under replicated blocks {panel} and {panel} Blocks with no live replicas {panel} were all ZERO. This was actually fixed in [HDFS-7374|https://issues.apache.org/jira/browse/HDFS-7374]; after that fix, running refreshNodes twice eliminates this case. But it seems the fix was lost in the [HDFS-7411|https://issues.apache.org/jira/browse/HDFS-7411] refactoring. We are using a Hadoop version based on 2.7.1, and only the operations below can transition the status from {panel} Dead, DECOMMISSION_INPROGRESS {panel} to {panel} Dead, DECOMMISSIONED {panel}: # Remove it from hdfs-exclude # refreshNodes # Re-add it to hdfs-exclude # refreshNodes So, why was this code removed in the refactored DecommissionManager? {code:java} if (!node.isAlive) { LOG.info("Dead node " + node + " is decommissioned immediately."); node.setDecommissioned(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
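For context, a rough sketch of how the removed check might be restored in the refactored DecommissionManager. Apart from {{node.isAlive}}, {{LOG.info}}, and {{setDecommissioned()}}, which appear in the snippet above, the surrounding method and field names are illustrative stand-ins rather than the actual 2.7 code:

{code:java}
// Hypothetical placement inside the decommission-start path. Only
// node.isAlive and node.setDecommissioned() come from the quoted snippet;
// startDecommission()/pendingNodes are illustrative stand-ins.
void startDecommission(DatanodeDescriptor node) {
  if (!node.isAlive) {
    // A dead node has no replicas to drain, so it can be marked
    // DECOMMISSIONED immediately instead of staying INPROGRESS forever.
    LOG.info("Dead node " + node + " is decommissioned immediately.");
    node.setDecommissioned();
    return;
  }
  // Live nodes follow the normal path: flip the admin state and let the
  // decommission monitor track replication progress until it completes.
  node.startDecommission();
  pendingNodes.add(node);
}
{code}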
[jira] [Updated] (HDFS-11191) Datanode Capacity is misleading if the dfs.datanode.data.dir is configured with two directories from the same file system.
[ https://issues.apache.org/jira/browse/HDFS-11191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-11191: --- Attachment: HDFS-11191.04.patch > Datanode Capacity is misleading if the dfs.datanode.data.dir is configured > with two directories from the same file system. > -- > > Key: HDFS-11191 > URL: https://issues.apache.org/jira/browse/HDFS-11191 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.5.0 > Environment: SLES 11SP3 > HDP 2.5.0 >Reporter: Deepak Chander >Assignee: Weiwei Yang > Labels: capacity, datanode, storage, user-experience > Attachments: HDFS-11191.01.patch, HDFS-11191.02.patch, > HDFS-11191.03.patch, HDFS-11191.04.patch > > > In the command "hdfs dfsadmin -report", the Configured Capacity is misleading > if the dfs.datanode.data.dir is configured with two directories from the same > file system. > hdfs@kimtest1:~> hdfs dfsadmin -report > Configured Capacity: 239942369274 (223.46 GB) > Present Capacity: 207894724602 (193.62 GB) > DFS Remaining: 207894552570 (193.62 GB) > DFS Used: 172032 (168 KB) > DFS Used%: 0.00% > Under replicated blocks: 0 > Blocks with corrupt replicas: 0 > Missing blocks: 0 > Missing blocks (with replication factor 1): 0 > - > Live datanodes (3): > Name: 172.26.79.87:50010 (kimtest3) > Hostname: kimtest3 > Decommission Status : Normal > Configured Capacity: 79980789758 (74.49 GB) > DFS Used: 57344 (56 KB) > Non DFS Used: 9528000512 (8.87 GB) > DFS Remaining: 70452731902 (65.61 GB) > DFS Used%: 0.00% > DFS Remaining%: 88.09% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 2 > Last contact: Tue Nov 29 06:59:02 PST 2016 > Name: 172.26.80.38:50010 (kimtest4) > Hostname: kimtest4 > Decommission Status : Normal > Configured Capacity: 79980789758 (74.49 GB) > DFS Used: 57344 (56 KB) > Non DFS Used: 13010952192 (12.12 GB) > DFS Remaining: 66969780222 (62.37 GB) > DFS Used%: 0.00% > DFS Remaining%: 83.73% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 2 > Last contact: Tue Nov 29 06:59:02 PST 2016 > Name: 172.26.79.86:50010 (kimtest2) > Hostname: kimtest2 > Decommission Status : Normal > Configured Capacity: 79980789758 (74.49 GB) > DFS Used: 57344 (56 KB) > Non DFS Used: 9508691968 (8.86 GB) > DFS Remaining: 70472040446 (65.63 GB) > DFS Used%: 0.00% > DFS Remaining%: 88.11% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 2 > Last contact: Tue Nov 29 06:59:02 PST 2016 > If you see my datanode root file system size, it's only 38 GB > kimtest3:~ # df -h / > Filesystem Size Used Avail Use% Mounted on > /dev/mapper/system-root 38G 2.6G 33G 8% / > kimtest4:~ # df -h / > Filesystem Size Used Avail Use% Mounted on > /dev/mapper/system-root 38G 4.2G 32G 12% / > kimtest2:~ # df -h / > Filesystem Size Used Avail Use% Mounted on > /dev/mapper/system-root 38G 2.6G 33G 8% / > The below is from the hdfs-site.xml file > <property> > <name>dfs.datanode.data.dir</name> > <value>file:///grid/hadoop/hdfs/dn, file:///grid1/hadoop/hdfs/dn</value> > </property> > I have removed the other directory grid1 and restarted the datanode process. 
> <property> > <name>dfs.datanode.data.dir</name> > <value>file:///grid/hadoop/hdfs/dn</value> > </property> > Now the size is reflected correctly > hdfs@kimtest1:/grid> hdfs dfsadmin -report > Configured Capacity: 119971184637 (111.73 GB) > Present Capacity: 103947243517 (96.81 GB) > DFS Remaining: 103947157501 (96.81 GB) > DFS Used: 86016 (84 KB) > DFS Used%: 0.00% > Under replicated blocks: 0 > Blocks with corrupt replicas: 0 > Missing blocks: 0 > Missing blocks (with replication factor 1): 0 > - > Live datanodes (3): > Name: 172.26.79.87:50010 (kimtest3) > Hostname: kimtest3 > Decommission Status : Normal > Configured Capacity: 39990394879 (37.24 GB) > DFS Used: 28672 (28 KB) > Non DFS Used: 4764057600 (4.44 GB) > DFS Remaining: 35226308607 (32.81 GB) > DFS Used%: 0.00% > DFS Remaining%: 88.09% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 2 > Last contact: Tue Nov 29 07:34:02 PST 2016 > Name: 172.26.80.38:50010 (kimtest4) > Hostname: kimtest4 > Decommission Status : Normal > Configured Capacity: 39990394879 (37.24 GB) > DFS Used: 28672 (28 KB) > Non DFS Used: 6505525248 (6.06 GB) > DFS Remaining: 33484840959 (31.19 GB) > DFS Used%: 0.0
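The reporter's workaround above drops one data directory entirely. An alternative direction, sketched below with plain java.nio rather than the actual DataNode volume code, is to count each underlying file system only once when summing configured capacity; the class and method names are illustrative, not from the attached HDFS-11191 patches:

{code:java}
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class DedupCapacitySketch {
  // Sum configured capacity across data directories, counting each
  // underlying file system (FileStore) only once.
  static long dedupedCapacity(String... dataDirs) throws IOException {
    Set<String> seenStores = new HashSet<>();
    long total = 0;
    for (String dir : dataDirs) {
      FileStore store = Files.getFileStore(Paths.get(dir));
      if (seenStores.add(store.name())) { // first directory on this store
        total += store.getTotalSpace();
      } // later directories on the same store add nothing
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    // With /grid and /grid1 on the same mount, capacity is counted once,
    // not twice as in the dfsadmin report above.
    System.out.println(dedupedCapacity("/grid/hadoop/hdfs/dn",
        "/grid1/hadoop/hdfs/dn"));
  }
}
{code}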