[jira] [Commented] (HDFS-12758) Ozone: Correcting assertEquals argument order in test cases

2017-11-08 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245206#comment-16245206
 ] 

Chen Liang commented on HDFS-12758:
---

+1 on the v00 patch. The failed tests are unrelated. I've committed this to the 
feature branch. Thanks [~bharatviswa] for the contribution! 

> Ozone: Correcting assertEquals argument order in test cases
> ---
>
> Key: HDFS-12758
> URL: https://issues.apache.org/jira/browse/HDFS-12758
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Bharat Viswanadham
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-12758-HDFS-7240.00.patch
>
>
> In a few test cases, the arguments to {{Assert.assertEquals}} are swapped. Below 
> is the list of classes and test cases where this should be corrected.
> {noformat}
> hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/ozone/ksm/TestKeySpaceManager.java
>  testChangeVolumeQuota - line: 187, 197 & 204
> hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/ozone/web/TestDistributedOzoneVolumes.java
>  testCreateVolumes - line: 91
>  testCreateVolumesWithQuota - line: 103
>  testCreateVolumesWithInvalidQuota - line: 115
>  testCreateVolumesWithInvalidUser - line: 129
>  testCreateVolumesWithOutAdminRights - line: 144
>  testCreateVolumesInLoop - line: 156
> hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/ozone/web/client/TestKeys.java
>  runTestPutKey - line: 239 & 246
>  runTestPutAndListKey - line: 428, 429, 451, 452, 458 & 459
> hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/ozone/container/transport/server/TestContainerServer.java
>  testClientServerWithContainerDispatcher - line: 219
> hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/ozone/container/ContainerTestHelper.java
>  verifyGetKey - line: 491
> hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/ozone/container/common/impl/TestContainerPersistence.java
>  testUpdateContainer - line: 776, 778, 794, 796, 821 & 823
> hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/ozone/container/common/TestEndPoint.java
>  testGetVersion - line: 122 & 124
>  testRegister - line: 215
> hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/ozone/container/replication/TestContainerReplicationManager.java
>  testDetectSingleContainerReplica - line: 168
> hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/ozone/scm/TestXceiverClientManager.java
>  testCaching - line: 82, 91, 96 & 97
>  testFreeByReference - line: 120, 130 & 137
>  testFreeByEviction - line: 165, 170, 177 & 185
> hadoop-hdfs-project/hadoop-hdfs-client/src/test/java/org/apache/hadoop/ozone/TestOzoneAcls.java
>  testAclValues - line: 111, 112, 113, 116, 117, 118, 121, 122, 123, 126, 127, 
> 128, 131, 132, 133, 136, 137 & 138
> hadoop-tools/hadoop-ozone/src/test/java/org/apache/hadoop/fs/ozone/TestOzoneFileInterfaces.java
>  testFileSystemInit - line: 102
>  testOzFsReadWrite - line: 123
>  testDirectory - line: 135, 138 & 139
> {noformat}
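The listed calls all pass the actual value first. To make the confusion concrete, here is a runnable sketch using a minimal stand-in for JUnit's {{Assert.assertEquals}} (the helper below is illustrative, not the actual JUnit code): the first argument is reported as "expected" and the second as "actual", so swapping them produces a misleading failure message.

```java
// Minimal stand-in for JUnit's Assert.assertEquals failure reporting, to show
// why argument order matters. This helper is illustrative, not JUnit itself.
public class AssertOrderDemo {

    static String failureMessage(Object expected, Object actual) {
        return "expected:<" + expected + "> but was:<" + actual + ">";
    }

    public static void main(String[] args) {
        long quotaReturnedByServer = 1024;   // the actual value under test

        // Correct order: expected value first, actual value second.
        System.out.println(failureMessage(2048, quotaReturnedByServer));
        // prints: expected:<2048> but was:<1024>

        // Swapped order: the report blames the wrong side on failure.
        System.out.println(failureMessage(quotaReturnedByServer, 2048));
        // prints: expected:<1024> but was:<2048>
    }
}
```

The assertion passes or fails identically either way; only the diagnostic message differs, which is why swapped arguments tend to survive until someone reads a confusing test failure.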



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12793) Ozone : TestSCMCli is failing consistently

2017-11-08 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12793:
--
Attachment: HDFS-12793-HDFS-7240.001.patch

> Ozone : TestSCMCli is failing consistently
> --
>
> Key: HDFS-12793
> URL: https://issues.apache.org/jira/browse/HDFS-12793
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ozone
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12793-HDFS-7240.001.patch
>
>
> In the Jenkins builds of HDFS-12787 and HDFS-12758, the same three tests 
> in {{TestSCMCli}} failed: {{testCloseContainer}}, 
> {{testDeleteContainer}} and {{testInfoContainer}}. I tested locally; these 
> three tests have been failing consistently.






[jira] [Commented] (HDFS-12793) Ozone : TestSCMCli is failing consistently

2017-11-08 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245245#comment-16245245
 ] 

Chen Liang commented on HDFS-12793:
---

The failures of the three tests were all caused by closing a container, then 
checking the status of the container, and finding that it was still in the open 
state (in {{ContainerMapping#closeContainer}}). The cause seems to be that in 
{{updateContainerState}}, after updating the container status, the method should 
return {{updatedContainer.getState();}} instead of the original state.
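The bug pattern can be sketched in a few lines of plain Java (names are simplified stand-ins, not the actual ContainerMapping code): after a state transition, the method must return the state as it is after the update, not the state captured before it.

```java
// Illustrative sketch of the bug described above: returning the pre-update
// state makes a caller that just closed a container still see OPEN.
public class ContainerStateDemo {
    enum State { OPEN, CLOSED }

    static State currentState = State.OPEN;

    // Buggy: captures the state before the transition and returns that.
    static State updateStateBuggy(State newState) {
        State before = currentState;
        currentState = newState;
        return before;               // caller sees OPEN after a close
    }

    // Fixed: returns the state after the transition.
    static State updateStateFixed(State newState) {
        currentState = newState;
        return currentState;
    }

    public static void main(String[] args) {
        currentState = State.OPEN;
        System.out.println(updateStateBuggy(State.CLOSED));  // OPEN (misleading)
        currentState = State.OPEN;
        System.out.println(updateStateFixed(State.CLOSED));  // CLOSED
    }
}
```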

> Ozone : TestSCMCli is failing consistently
> --
>
> Key: HDFS-12793
> URL: https://issues.apache.org/jira/browse/HDFS-12793
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ozone
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12793-HDFS-7240.001.patch
>
>
> In the Jenkins builds of HDFS-12787 and HDFS-12758, the same three tests 
> in {{TestSCMCli}} failed: {{testCloseContainer}}, 
> {{testDeleteContainer}} and {{testInfoContainer}}. I tested locally; these 
> three tests have been failing consistently.






[jira] [Updated] (HDFS-12793) Ozone : TestSCMCli is failing consistently

2017-11-08 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12793:
--
Status: Patch Available  (was: Open)

> Ozone : TestSCMCli is failing consistently
> --
>
> Key: HDFS-12793
> URL: https://issues.apache.org/jira/browse/HDFS-12793
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ozone
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12793-HDFS-7240.001.patch
>
>
> In the Jenkins builds of HDFS-12787 and HDFS-12758, the same three tests 
> in {{TestSCMCli}} failed: {{testCloseContainer}}, 
> {{testDeleteContainer}} and {{testInfoContainer}}. I tested locally; these 
> three tests have been failing consistently.






[jira] [Commented] (HDFS-12791) NameNode Fsck http Connection can timeout for directories with multiple levels

2017-11-09 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246944#comment-16246944
 ] 

Chen Liang commented on HDFS-12791:
---

Thanks for working on this [~msingh]! +1 on the v001 patch; the failed tests 
all passed in my local run. Will commit this shortly.

> NameNode Fsck http Connection can timeout for directories with multiple levels
> --
>
> Key: HDFS-12791
> URL: https://issues.apache.org/jira/browse/HDFS-12791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Attachments: HDFS-12791.001.patch
>
>
> Currently the http connection is flushed for every 100 files; however, if 
> there are multiple levels of directories in the namespace, flushing will 
> be postponed until multiple directory levels have been traversed. This 
> connection timeout can be avoided if both files and directories are 
> considered in the flushing check.
> {code}
> if (showprogress && (replRes.totalFiles + ecRes.totalFiles) % 100 == 0) {
>   out.println();
>   out.flush();
> }
> {code}
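The proposed direction amounts to making the periodic flush depend on the combined count of files and directories. A runnable sketch of that condition (the counter names below are illustrative assumptions, not the actual fsck result fields):

```java
// Sketch of the proposed flush condition: count directories as well as files,
// so a traversal spending a long time inside directory levels still flushes
// the HTTP response periodically. Counter names are illustrative.
public class FsckFlushDemo {

    // Flush every 100 traversed items (files + directories).
    static boolean shouldFlush(boolean showprogress, long files, long dirs) {
        long items = files + dirs;
        return showprogress && items > 0 && items % 100 == 0;
    }

    public static void main(String[] args) {
        // 100 directories and zero files: a files-only check would never
        // fire here, but the combined check triggers a flush.
        System.out.println(shouldFlush(true, 0, 100));   // true
        System.out.println(shouldFlush(true, 0, 50));    // false
    }
}
```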






[jira] [Updated] (HDFS-12791) NameNode Fsck http Connection can timeout for directories with multiple levels

2017-11-09 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12791:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> NameNode Fsck http Connection can timeout for directories with multiple levels
> --
>
> Key: HDFS-12791
> URL: https://issues.apache.org/jira/browse/HDFS-12791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Attachments: HDFS-12791.001.patch
>
>
> Currently the http connection is flushed for every 100 files; however, if 
> there are multiple levels of directories in the namespace, flushing will 
> be postponed until multiple directory levels have been traversed. This 
> connection timeout can be avoided if both files and directories are 
> considered in the flushing check.
> {code}
> if (showprogress && (replRes.totalFiles + ecRes.totalFiles) % 100 == 0) {
>   out.println();
>   out.flush();
> }
> {code}






[jira] [Commented] (HDFS-12791) NameNode Fsck http Connection can timeout for directories with multiple levels

2017-11-09 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246948#comment-16246948
 ] 

Chen Liang commented on HDFS-12791:
---

Committed to trunk, thanks Mukul for the contribution!

> NameNode Fsck http Connection can timeout for directories with multiple levels
> --
>
> Key: HDFS-12791
> URL: https://issues.apache.org/jira/browse/HDFS-12791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Attachments: HDFS-12791.001.patch
>
>
> Currently the http connection is flushed for every 100 files; however, if 
> there are multiple levels of directories in the namespace, flushing will 
> be postponed until multiple directory levels have been traversed. This 
> connection timeout can be avoided if both files and directories are 
> considered in the flushing check.
> {code}
> if (showprogress && (replRes.totalFiles + ecRes.totalFiles) % 100 == 0) {
>   out.println();
>   out.flush();
> }
> {code}






[jira] [Commented] (HDFS-12804) Use slf4j instead of log4j in FSEditLog

2017-11-14 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16252875#comment-16252875
 ] 

Chen Liang commented on HDFS-12804:
---

Thanks for working on this [~msingh]! The patch LGTM, just one thing: could you 
please verify that the failed tests are unrelated? I see {{TestUnbuffer}} 
failed because it expects a specific error message.

> Use slf4j instead of log4j in FSEditLog
> ---
>
> Key: HDFS-12804
> URL: https://issues.apache.org/jira/browse/HDFS-12804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Attachments: HDFS-12804.001.patch, HDFS-12804.002.patch, 
> HDFS-12804.003.patch
>
>
> FSEditLog uses log4j; this jira will update the logging to use slf4j.
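A key benefit of the slf4j style is its parameterized "{}" logging, which defers message construction until the level is known to be enabled, instead of concatenating strings at every call site. A plain-Java sketch of the idea (the tiny logger below is illustrative, not slf4j itself):

```java
// Plain-Java sketch (no external jars) of slf4j-style parameterized logging:
// the format string and argument are passed separately and only combined
// when the level is actually enabled.
public class ParamLoggingDemo {

    static boolean debugEnabled = false;

    static String formatIfEnabled(String format, Object arg) {
        if (!debugEnabled) {
            return null;  // no message string is built when the level is off
        }
        return format.replace("{}", String.valueOf(arg));
    }

    public static void main(String[] args) {
        debugEnabled = true;
        System.out.println(formatIfEnabled("logging edit op {}", "OP_ADD"));
        // prints: logging edit op OP_ADD
    }
}
```

With log4j-style concatenation ({{LOG.debug("op: " + op)}}), the message is built even when debug is off unless each call site adds an {{isDebugEnabled()}} guard; the parameterized form removes the need for those guards.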






[jira] [Commented] (HDFS-12799) Ozone: SCM: Close containers: extend SCMCommandResponseProto with SCMCloseContainerCmdResponseProto

2017-11-20 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259729#comment-16259729
 ] 

Chen Liang commented on HDFS-12799:
---

Thanks for working on this [~elek]. Returning the current state does seem to be 
a bug and was causing test failures, so I fixed it in HDFS-12793. As for 
creating a container, have you tried calling 
{{cluster.getStorageContainerManager().allocateContainer}}, followed by two 
{{mapping.updateContainerState}} calls, just like 
{{TestContainerMapping#createContainer}} does?

> Ozone: SCM: Close containers: extend SCMCommandResponseProto with 
> SCMCloseContainerCmdResponseProto
> ---
>
> Key: HDFS-12799
> URL: https://issues.apache.org/jira/browse/HDFS-12799
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
> Attachments: HDFS-12799-HDFS-7240.001.patch
>
>
> This issue is about extending the HB response protocol between SCM and DN 
> with a command to ask the datanode to close a container. (This is just about 
> extending the protocol, not about fixing the implementation of SCM to handle 
> the state transitions.)






[jira] [Commented] (HDFS-12826) Document Saying the RPC port, But it's required IPC port in Balancer Document.

2017-11-20 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259763#comment-16259763
 ] 

Chen Liang commented on HDFS-12826:
---

Thanks [~peruguusha] for the patch! Would it be a bit more precise to go the 
other way and change {{ipc}} to {{rpc}} instead?

> Document Saying the RPC port, But it's required IPC port in Balancer Document.
> --
>
> Key: HDFS-12826
> URL: https://issues.apache.org/jira/browse/HDFS-12826
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover, documentation
>Affects Versions: 3.0.0-beta1
>Reporter: Harshakiran Reddy
>Assignee: usharani
>Priority: Minor
> Attachments: HDFS-12826.patch
>
>
> In {{Adding a new Namenode to an existing HDFS cluster}}, the refreshNamenodes 
> command requires the IPC port, but the documentation says the RPC port.
> http://hadoop.apache.org/docs/r3.0.0-beta1/hadoop-project-dist/hadoop-hdfs/Federation.html#Balancer
> {noformat} 
> bin>:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin 
> -refreshNamenodes host-name:65110
> refreshNamenodes: Unknown protocol: 
> org.apache.hadoop.hdfs.protocol.ClientDatanodeProtocol
> bin.:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin 
> -refreshNamenodes
> Usage: hdfs dfsadmin [-refreshNamenodes datanode-host:ipc_port]
> bin>:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin 
> -refreshNamenodes host-name:50077
> bin>:~/hdfsdata/HA/install/hadoop/datanode/bin>
> {noformat} 






[jira] [Commented] (HDFS-12804) Use slf4j instead of log4j in FSEditLog

2017-11-20 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259818#comment-16259818
 ] 

Chen Liang commented on HDFS-12804:
---

Thanks [~msingh] for the update. I tested locally as well; {{TestUnbuffer}} and 
{{TestBalancerRPCDelay}} did fail even without the patch. I've committed the 
v003 patch to trunk. Thanks Mukul for the contribution!

> Use slf4j instead of log4j in FSEditLog
> ---
>
> Key: HDFS-12804
> URL: https://issues.apache.org/jira/browse/HDFS-12804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Attachments: HDFS-12804.001.patch, HDFS-12804.002.patch, 
> HDFS-12804.003.patch
>
>
> FSEditLog uses log4j; this jira will update the logging to use slf4j.






[jira] [Updated] (HDFS-12804) Use slf4j instead of log4j in FSEditLog

2017-11-20 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12804:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Use slf4j instead of log4j in FSEditLog
> ---
>
> Key: HDFS-12804
> URL: https://issues.apache.org/jira/browse/HDFS-12804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Attachments: HDFS-12804.001.patch, HDFS-12804.002.patch, 
> HDFS-12804.003.patch
>
>
> FSEditLog uses log4j; this jira will update the logging to use slf4j.






[jira] [Updated] (HDFS-12791) NameNode Fsck http Connection can timeout for directories with multiple levels

2017-11-21 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12791:
--
Fix Version/s: 3.1.0

> NameNode Fsck http Connection can timeout for directories with multiple levels
> --
>
> Key: HDFS-12791
> URL: https://issues.apache.org/jira/browse/HDFS-12791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Fix For: 3.1.0
>
> Attachments: HDFS-12791.001.patch
>
>
> Currently the http connection is flushed for every 100 files; however, if 
> there are multiple levels of directories in the namespace, flushing will 
> be postponed until multiple directory levels have been traversed. This 
> connection timeout can be avoided if both files and directories are 
> considered in the flushing check.
> {code}
> if (showprogress && (replRes.totalFiles + ecRes.totalFiles) % 100 == 0) {
>   out.println();
>   out.flush();
> }
> {code}






[jira] [Commented] (HDFS-12791) NameNode Fsck http Connection can timeout for directories with multiple levels

2017-11-21 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261474#comment-16261474
 ] 

Chen Liang commented on HDFS-12791:
---

[~zhz] Sorry I should have updated it. I believe it should be 3.1.0, I have 
updated the JIRA. ([~msingh] Please correct me if I'm wrong).

> NameNode Fsck http Connection can timeout for directories with multiple levels
> --
>
> Key: HDFS-12791
> URL: https://issues.apache.org/jira/browse/HDFS-12791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Fix For: 3.1.0
>
> Attachments: HDFS-12791.001.patch
>
>
> Currently the http connection is flushed for every 100 files; however, if 
> there are multiple levels of directories in the namespace, flushing will 
> be postponed until multiple directory levels have been traversed. This 
> connection timeout can be avoided if both files and directories are 
> considered in the flushing check.
> {code}
> if (showprogress && (replRes.totalFiles + ecRes.totalFiles) % 100 == 0) {
>   out.println();
>   out.flush();
> }
> {code}






[jira] [Commented] (HDFS-12799) Ozone: SCM: Close containers: extend SCMCommandResponseProto with SCMCloseContainerCmdResponseProto

2017-11-28 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269775#comment-16269775
 ] 

Chen Liang commented on HDFS-12799:
---

Thanks [~elek] for the update! Looks like the v002 patch needs to be rebased 
again. Also, one minor comment: could you please add a debug-level log to the 
new branch in {{HeartbeatEndpointTask#processResponse}}, right before 
this.context.addCommand?

> Ozone: SCM: Close containers: extend SCMCommandResponseProto with 
> SCMCloseContainerCmdResponseProto
> ---
>
> Key: HDFS-12799
> URL: https://issues.apache.org/jira/browse/HDFS-12799
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
> Attachments: HDFS-12799-HDFS-7240.001.patch, 
> HDFS-12799-HDFS-7240.002.patch
>
>
> This issue is about extending the HB response protocol between SCM and DN 
> with a command to ask the datanode to close a container. (This is just about 
> extending the protocol, not about fixing the implementation of SCM to handle 
> the state transitions.)






[jira] [Updated] (HDFS-12000) Ozone: Container : Add key versioning support-1

2017-11-29 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12000:
--
Attachment: HDFS-12000-HDFS-7240.007.patch

Updated to the v007 patch. I rewrote many places, as a lot of code has changed 
since the previous patch; posting v007 to trigger Jenkins while I do more 
testing locally.

Please note that this is still a work in progress. This patch focuses only on 
the server side. Specifically, it adds versioning to blocks and persists the 
information about older versions to the meta store. From the client side, reads 
and writes remain hidden from versioning, i.e. a write still always rewrites 
the whole key, and a read still always reads only the most recently committed 
version of the key. Will follow up with another JIRA for further read/write 
changes. 

> Ozone: Container : Add key versioning support-1
> ---
>
> Key: HDFS-12000
> URL: https://issues.apache.org/jira/browse/HDFS-12000
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Chen Liang
>  Labels: OzonePostMerge
> Attachments: HDFS-12000-HDFS-7240.001.patch, 
> HDFS-12000-HDFS-7240.002.patch, HDFS-12000-HDFS-7240.003.patch, 
> HDFS-12000-HDFS-7240.004.patch, HDFS-12000-HDFS-7240.005.patch, 
> HDFS-12000-HDFS-7240.007.patch, OzoneVersion.001.pdf
>
>
> The rest interface of ozone supports versioning of keys. This support comes 
> from the containers and how chunks are managed to support this feature. This 
> JIRA tracks that feature. Will post a detailed design doc so that we can talk 
> about this feature.






[jira] [Commented] (HDFS-12877) Add open(PathHandle) with default buffersize

2017-11-30 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273190#comment-16273190
 ] 

Chen Liang commented on HDFS-12877:
---

The changes look good to me, but the failed tests seem related. Could you 
please take a look?

> Add open(PathHandle) with default buffersize
> 
>
> Key: HDFS-12877
> URL: https://issues.apache.org/jira/browse/HDFS-12877
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.0
>Reporter: Chris Douglas
>Assignee: Chris Douglas
>Priority: Trivial
> Attachments: HDFS-12877.00.patch, HDFS-12877.01.patch
>
>
> HDFS-7878 added an overload for {{FileSystem::open}} that requires the user 
> to provide a buffer size when opening by {{PathHandle}}. Similar to 
> {{open(Path)}}, it'd be convenient to have another overload that takes the 
> default from the config.






[jira] [Commented] (HDFS-12838) Ozone: Optimize number of allocated block rpc by aggregating multiple block allocation requests

2017-11-30 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273354#comment-16273354
 ] 

Chen Liang commented on HDFS-12838:
---

Thanks for working on this [~msingh]! I think this is a very good improvement, 
and it looks pretty good to me overall. Just one comment: it seems that when 
{{KeyManagerImpl#openKey}} passes in a {{requestedSize}} of 0, it ends up 
making an unnecessary call that does nothing. Maybe we should check for and 
skip this case, also for better code clarity.

> Ozone: Optimize number of allocated block rpc by aggregating multiple block 
> allocation requests
> ---
>
> Key: HDFS-12838
> URL: https://issues.apache.org/jira/browse/HDFS-12838
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Fix For: HDFS-7240
>
> Attachments: HDFS-12838-HDFS-7240.001.patch, 
> HDFS-12838-HDFS-7240.002.patch, HDFS-12838-HDFS-7240.003.patch
>
>
> Currently KeySpaceManager allocates multiple blocks by sending multiple block 
> allocation requests over the RPC. This can be optimized by aggregating 
> multiple block allocation requests into one rpc.
> {code}
>   while (requestedSize > 0) {
> long allocateSize = Math.min(scmBlockSize, requestedSize);
> AllocatedBlock allocatedBlock =
> scmBlockClient.allocateBlock(allocateSize, type, factor);
> KsmKeyLocationInfo subKeyInfo = new KsmKeyLocationInfo.Builder()
> .setContainerName(allocatedBlock.getPipeline().getContainerName())
> .setBlockID(allocatedBlock.getKey())
> .setShouldCreateContainer(allocatedBlock.getCreateContainer())
> .setIndex(idx++)
> .setLength(allocateSize)
> .setOffset(0)
> .build();
> locations.add(subKeyInfo);
> requestedSize -= allocateSize;
>   }
> {code}
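The optimization amounts to computing all the needed block sizes and fetching them in a single round trip instead of one {{allocateBlock}} RPC per block. A hedged sketch (the {{allocateBlocks}} method below is an assumption modeled on the loop above, not the real ScmBlockLocationProtocol API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of batched block allocation: one (simulated) call
// returns every block needed to cover requestedSize, each at most
// scmBlockSize long, replacing the per-block RPC loop shown above.
public class BatchAllocationDemo {

    static List<Long> allocateBlocks(long requestedSize, long scmBlockSize) {
        List<Long> blockSizes = new ArrayList<>();
        while (requestedSize > 0) {
            long allocateSize = Math.min(scmBlockSize, requestedSize);
            blockSizes.add(allocateSize);
            requestedSize -= allocateSize;
        }
        return blockSizes;
    }

    public static void main(String[] args) {
        // A 650 MB key with 256 MB blocks needs 3 blocks; with batching this
        // is one round trip instead of three.
        System.out.println(BatchAllocationDemo.allocateBlocks(650, 256));
        // prints: [256, 256, 138]
    }
}
```

On the server side the same loop runs once per request instead of once per block, so the per-RPC overhead (and lock contention in the block manager) is paid once per key rather than once per block.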






[jira] [Updated] (HDFS-12000) Ozone: Container : Add key versioning support-1

2017-11-30 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12000:
--
Status: Patch Available  (was: In Progress)

> Ozone: Container : Add key versioning support-1
> ---
>
> Key: HDFS-12000
> URL: https://issues.apache.org/jira/browse/HDFS-12000
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Chen Liang
>  Labels: OzonePostMerge
> Attachments: HDFS-12000-HDFS-7240.001.patch, 
> HDFS-12000-HDFS-7240.002.patch, HDFS-12000-HDFS-7240.003.patch, 
> HDFS-12000-HDFS-7240.004.patch, HDFS-12000-HDFS-7240.005.patch, 
> HDFS-12000-HDFS-7240.007.patch, OzoneVersion.001.pdf
>
>
> The rest interface of ozone supports versioning of keys. This support comes 
> from the containers and how chunks are managed to support this feature. This 
> JIRA tracks that feature. Will post a detailed design doc so that we can talk 
> about this feature.






[jira] [Created] (HDFS-12879) Ozone : add scm init command to document.

2017-11-30 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12879:
-

 Summary: Ozone : add scm init command to document.
 Key: HDFS-12879
 URL: https://issues.apache.org/jira/browse/HDFS-12879
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ozone
Reporter: Chen Liang
Priority: Minor


When an Ozone cluster is initialized, before starting SCM through {{hdfs 
--daemon start scm}}, the command {{hdfs scm -init}} needs to be called first. 
But it seems this command is not documented. We should add this note to the 
documentation.






[jira] [Updated] (HDFS-12879) Ozone : add scm init command to document.

2017-11-30 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12879:
--
Labels: newbie  (was: )

> Ozone : add scm init command to document.
> -
>
> Key: HDFS-12879
> URL: https://issues.apache.org/jira/browse/HDFS-12879
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ozone
>Reporter: Chen Liang
>Priority: Minor
>  Labels: newbie
>
> When an Ozone cluster is initialized, before starting SCM through {{hdfs 
> --daemon start scm}}, the command {{hdfs scm -init}} needs to be called 
> first. But it seems this command is not documented. We should add this note 
> to the documentation.






[jira] [Updated] (HDFS-12879) Ozone : add scm init command to document.

2017-11-30 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12879:
--
Issue Type: Sub-task  (was: Improvement)
Parent: HDFS-7240

> Ozone : add scm init command to document.
> -
>
> Key: HDFS-12879
> URL: https://issues.apache.org/jira/browse/HDFS-12879
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Chen Liang
>Priority: Minor
>  Labels: newbie
>
> When an Ozone cluster is initialized, before starting SCM through {{hdfs 
> --daemon start scm}}, the command {{hdfs scm -init}} needs to be called 
> first. But it seems this command is not documented. We should add this note 
> to the documentation.






[jira] [Commented] (HDFS-12879) Ozone : add scm init command to document.

2017-11-30 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273609#comment-16273609
 ] 

Chen Liang commented on HDFS-12879:
---

There is another typo in {{OzoneCommandShell.md}} that I think can be fixed as 
part of this change: {{-listtBucket}} should be {{-listBucket}}.

> Ozone : add scm init command to document.
> -
>
> Key: HDFS-12879
> URL: https://issues.apache.org/jira/browse/HDFS-12879
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Chen Liang
>Priority: Minor
>  Labels: newbie
>
> When an Ozone cluster is initialized, before starting SCM through {{hdfs 
> --daemon start scm}}, the command {{hdfs scm -init}} needs to be called 
> first. But this command does not seem to be documented. We should add a 
> note about it to the documentation.






[jira] [Commented] (HDFS-12877) Add open(PathHandle) with default buffersize

2017-11-30 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273618#comment-16273618
 ] 

Chen Liang commented on HDFS-12877:
---

Thanks [~chris.douglas] for the quick update! Makes sense to me. +1 on v02 
patch.

> Add open(PathHandle) with default buffersize
> 
>
> Key: HDFS-12877
> URL: https://issues.apache.org/jira/browse/HDFS-12877
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.0
>Reporter: Chris Douglas
>Assignee: Chris Douglas
>Priority: Trivial
> Attachments: HDFS-12877.00.patch, HDFS-12877.01.patch, 
> HDFS-12877.02.patch
>
>
> HDFS-7878 added an overload for {{FileSystem::open}} that requires the user 
> to provide a buffer size when opening by {{PathHandle}}. Similar to 
> {{open(Path)}}, it'd be convenient to have another overload that takes the 
> default from the config.






[jira] [Commented] (HDFS-12799) Ozone: SCM: Close containers: extend SCMCommandResponseProto with SCMCloseContainerCmdResponseProto

2017-12-01 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274836#comment-16274836
 ] 

Chen Liang commented on HDFS-12799:
---

Thanks [~elek] for the update! Could you please fix the checkstyle warnings? +1 
after it's fixed.

> Ozone: SCM: Close containers: extend SCMCommandResponseProto with 
> SCMCloseContainerCmdResponseProto
> ---
>
> Key: HDFS-12799
> URL: https://issues.apache.org/jira/browse/HDFS-12799
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
> Attachments: HDFS-12799-HDFS-7240.001.patch, 
> HDFS-12799-HDFS-7240.002.patch, HDFS-12799-HDFS-7240.003.patch
>
>
> This issue is about extending the heartbeat (HB) response protocol between 
> SCM and DN with a command that asks the datanode to close a container. (This 
> is only about extending the protocol, not about fixing the SCM implementation 
> to handle the state transitions.)






[jira] [Updated] (HDFS-12000) Ozone: Container : Add key versioning support-1

2017-12-01 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12000:
--
Attachment: HDFS-12000-HDFS-7240.008.patch

Some of the failed tests are related. Posting the v008 patch to fix them.

> Ozone: Container : Add key versioning support-1
> ---
>
> Key: HDFS-12000
> URL: https://issues.apache.org/jira/browse/HDFS-12000
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Chen Liang
>  Labels: OzonePostMerge
> Attachments: HDFS-12000-HDFS-7240.001.patch, 
> HDFS-12000-HDFS-7240.002.patch, HDFS-12000-HDFS-7240.003.patch, 
> HDFS-12000-HDFS-7240.004.patch, HDFS-12000-HDFS-7240.005.patch, 
> HDFS-12000-HDFS-7240.007.patch, HDFS-12000-HDFS-7240.008.patch, 
> OzoneVersion.001.pdf
>
>
> The rest interface of ozone supports versioning of keys. This support comes 
> from the containers and how chunks are managed to support this feature. This 
> JIRA tracks that feature. Will post a detailed design doc so that we can talk 
> about this feature.






[jira] [Commented] (HDFS-12745) Ozone: XceiverClientManager should cache objects based on pipeline name

2017-12-05 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279198#comment-16279198
 ] 

Chen Liang commented on HDFS-12745:
---

1. This is more of a question: will PipelineManager be accessed by multiple 
threads? If so, do we need synchronization around activePipelines?

2. I wonder whether there is a situation where a pipeline should be removed from 
the activePipelines list. Also, would it be better to make it private rather 
than protected? 

3. It looks like activePipelines may contain pipelines with different types and 
factors. If so, there might be two corner cases in findOpenPipeline() 
(please correct me if I'm wrong). 
case a. Say we have three pipelines (only looking at factors here):
\[A(factor=1), B(1), C(3)\]
(1). The current index is on A and we look for a factor=1 pipeline: we return A, 
and the next current index is B.
(2). Now we look for a factor=3 pipeline: we skip B, move to C, and return C; 
the next current index is A.
(3). We look for a factor=1 pipeline again: A has factor=1, so we return A, and 
the next current index is B.
We have now returned A twice but never B. If we keep repeating (2) and (3), all 
factor=1 container requests will go to A, never B.
case b. If we have, say, 100 pipelines but only 1 with factor=1, and all 
requests are for factor=1, then every request may have to skip the other 99 
factor=3 pipelines just to reach the only one that satisfies it.

An alternative might be to maintain a separate list for each (factor, type) 
combination. This could be done with a nested map: key=factor, with the value 
being another map whose key=type and whose value is a list of pipelines.
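To make the suggestion concrete, the map-of-maps alternative could look 
something like the minimal sketch below. The class and method names here 
(PipelineSelector, findOpenPipeline's signature) are hypothetical, not the 
actual PipelineManager API; the point is only that round-robin selection within 
a (type, factor) bucket never has to skip pipelines of a different shape.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one pipeline list per (replication type, factor)
// pair, so selection never skews toward one pipeline or scans past
// pipelines of a different factor.
public class PipelineSelector {
    // type -> (factor -> pipelines of that shape)
    private final Map<String, Map<Integer, List<String>>> pipelines = new HashMap<>();
    // type -> (factor -> next round-robin index)
    private final Map<String, Map<Integer, Integer>> nextIndex = new HashMap<>();

    public void addPipeline(String type, int factor, String name) {
        pipelines.computeIfAbsent(type, t -> new HashMap<>())
                 .computeIfAbsent(factor, f -> new ArrayList<>())
                 .add(name);
        nextIndex.computeIfAbsent(type, t -> new HashMap<>())
                 .putIfAbsent(factor, 0);
    }

    // Round-robin within the (type, factor) bucket only.
    public String findOpenPipeline(String type, int factor) {
        List<String> list = pipelines.getOrDefault(type, Collections.emptyMap())
                                     .getOrDefault(factor, Collections.emptyList());
        if (list.isEmpty()) {
            return null;
        }
        int idx = nextIndex.get(type).get(factor);
        nextIndex.get(type).put(factor, (idx + 1) % list.size());
        return list.get(idx);
    }
}
```

With this structure, interleaved factor=1 and factor=3 requests alternate 
between A and B instead of repeatedly landing on A, addressing both corner 
cases above.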

> Ozone: XceiverClientManager should cache objects based on pipeline name
> ---
>
> Key: HDFS-12745
> URL: https://issues.apache.org/jira/browse/HDFS-12745
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Fix For: HDFS-7240
>
> Attachments: HDFS-12745-HDFS-7240.001.patch, 
> HDFS-12745-HDFS-7240.002.patch, HDFS-12745-HDFS-7240.003.patch, 
> HDFS-12745-HDFS-7240.004.patch, HDFS-12745-HDFS-7240.005.patch, 
> HDFS-12745-HDFS-7240.006.patch, HDFS-12745-HDFS-7240.007.patch
>
>
> With just the standalone pipeline, a new pipeline was created for each and 
> every container.
> This code can be optimized so that pipelines are created less frequently. 
> Caching using pipeline names will help with Ratis clients as well.
> a) Remove Container name from Pipeline object.
> b) XceiverClientManager should cache objects based on pipeline name
> c) XceiverClient and XceiverServer should be renamed to 
> XceiverClientStandAlone & XceiverServerRatis
> d) StandAlone pipeline should have notion of re-using pipeline objects.






[jira] [Comment Edited] (HDFS-12745) Ozone: XceiverClientManager should cache objects based on pipeline name

2017-12-05 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279198#comment-16279198
 ] 

Chen Liang edited comment on HDFS-12745 at 12/5/17 9:19 PM:


Thanks [~msingh] for working on this! I think this is a very important 
improvement. I have some comments below, all about {{PipelineManager}}

1. This is more of a question: will PipelineManager be accessed by multiple 
threads? If so, do we need synchronization around activePipelines?

2. I wonder whether there is a situation where a pipeline should be removed from 
the activePipelines list. Also, would it be better to make it private rather 
than protected? 

3. It looks like activePipelines may contain pipelines with different types and 
factors. If so, there might be two corner cases in findOpenPipeline() 
(please correct me if I'm wrong). 
case a. Say we have three pipelines (only looking at factors here):
\[A(factor=1), B(1), C(3)\]
(1). The current index is on A and we look for a factor=1 pipeline: we return A, 
and the next current index is B.
(2). Now we look for a factor=3 pipeline: we skip B, move to C, and return C; 
the next current index is A.
(3). We look for a factor=1 pipeline again: A has factor=1, so we return A, and 
the next current index is B.
We have now returned A twice but never B. If we keep repeating (2) and (3), all 
factor=1 container requests will go to A, never B.
case b. If we have, say, 100 pipelines but only 1 with factor=1, and all 
requests are for factor=1, then every request may have to skip the other 99 
factor=3 pipelines just to reach the only one that satisfies it.

An alternative might be to maintain a separate list for each (factor, type) 
combination. This could be done with a nested map: key=factor, with the value 
being another map whose key=type and whose value is a list of pipelines.


was (Author: vagarychen):
1. This is more of a question: will PipelineManager be accessed by multiple 
threads? If so, do we need synchronization around activePipelines?

2. I wonder whether there is a situation where a pipeline should be removed from 
the activePipelines list. Also, would it be better to make it private rather 
than protected? 

3. It looks like activePipelines may contain pipelines with different types and 
factors. If so, there might be two corner cases in findOpenPipeline() 
(please correct me if I'm wrong). 
case a. Say we have three pipelines (only looking at factors here):
\[A(factor=1), B(1), C(3)\]
(1). The current index is on A and we look for a factor=1 pipeline: we return A, 
and the next current index is B.
(2). Now we look for a factor=3 pipeline: we skip B, move to C, and return C; 
the next current index is A.
(3). We look for a factor=1 pipeline again: A has factor=1, so we return A, and 
the next current index is B.
We have now returned A twice but never B. If we keep repeating (2) and (3), all 
factor=1 container requests will go to A, never B.
case b. If we have, say, 100 pipelines but only 1 with factor=1, and all 
requests are for factor=1, then every request may have to skip the other 99 
factor=3 pipelines just to reach the only one that satisfies it.

An alternative might be to maintain a separate list for each (factor, type) 
combination. This could be done with a nested map: key=factor, with the value 
being another map whose key=type and whose value is a list of pipelines.

> Ozone: XceiverClientManager should cache objects based on pipeline name
> ---
>
> Key: HDFS-12745
> URL: https://issues.apache.org/jira/browse/HDFS-12745
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Fix For: HDFS-7240
>
> Attachments: HDFS-12745-HDFS-7240.001.patch, 
> HDFS-12745-HDFS-7240.002.patch, HDFS-12745-HDFS-7240.003.patch, 
> HDFS-12745-HDFS-7240.004.patch, HDFS-12745-HDFS-7240.005.patch, 
> HDFS-12745-HDFS-7240.006.patch, HDFS-12745-HDFS-7240.007.patch
>
>
> With just the standalone pipeline, a new pipeline was created for each and 
> every container.
> This code can be optimized so that pipelines are created less frequently. 
> Caching using pipeline names will help with Ratis clients as well.
> a) Remove Container name from Pipeline object.
> b) XceiverClientManager should cache objects based on pipeline name
> c) XceiverClient and XceiverServer should be renamed to 
> XceiverClientStandAlone & XceiverServerRatis
> d) StandAlone pipeline should have notion of re-using pipeline objects.




[jira] [Assigned] (HDFS-12751) Ozone: SCM: update container allocated size to container db for all the open containers in ContainerStateManager#close

2017-12-05 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang reassigned HDFS-12751:
-

Assignee: Chen Liang

> Ozone: SCM: update container allocated size to container db for all the open 
> containers in ContainerStateManager#close
> --
>
> Key: HDFS-12751
> URL: https://issues.apache.org/jira/browse/HDFS-12751
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Chen Liang
>
> Container allocated size is maintained in memory by 
> {{ContainerStateManager}}, this has to be updated in container db when we 
> shutdown SCM. {{ContainerStateManager#close}} will be called during SCM 
> shutdown, so updating allocated size for all the open containers should be 
> done here.






[jira] [Comment Edited] (HDFS-12751) Ozone: SCM: update container allocated size to container db for all the open containers in ContainerStateManager#close

2017-12-05 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279360#comment-16279360
 ] 

Chen Liang edited comment on HDFS-12751 at 12/5/17 11:22 PM:
-

I looked into the code a little bit. It appears that 
{{containerStateManager}} is a member of {{ContainerMapping}}. When 
{{ContainerMapping#updateContainerState}} gets called, it is always 
{{ContainerMapping}} itself that writes the updated status to the metadata db 
(container.db in this case), as in the following lines:
{code}
ContainerInfo updatedContainer = containerStateManager
  .updateContainerState(containerInfo, event);
containerStore.put(dbKey, updatedContainer.getProtobuf().toByteArray());
{code}
This looks like the only place {{allocatedSize}} gets updated, so I think the 
allocated size is already written to the container db whenever an update 
happens, and we don't need to update it on close(). What do you 
think [~nandakumar131]? Am I missing anything? 

I also noticed that when {{StorageContainerManager}} shuts down, it calls 
close on {{ContainerMapping}}, but {{ContainerMapping#close}} does not call 
{{containerStateManager.close()}} at all. This seems fine to me though, because 
{{ContainerMapping}} writes to the meta store by itself, so it seems okay 
that {{ContainerStateManager#close()}} does not do anything with the current 
implementation.


was (Author: vagarychen):
I looked into the code a little bit. It appears that 
{{containerStateManager}} is a member of {{ContainerMapping}}. When 
{{ContainerMapping#updateContainerState}} gets called, it is always 
{{ContainerMapping}} itself that writes the updated status to the metadata db 
(container.db in this case), as in the following lines:
{code}
ContainerInfo updatedContainer = containerStateManager
  .updateContainerState(containerInfo, event);
containerStore.put(dbKey, updatedContainer.getProtobuf().toByteArray());
{code}
This looks like the only place {{allocatedSize}} gets updated, so I think the 
allocated size is already written to the container db whenever an update 
happens, and we don't need to update it on close(). What do you 
think [~nandakumar131]? Am I missing anything? 

I also noticed that when {{StorageContainerManager}} shuts down, it calls 
close on {{ContainerMapping}}, but {{ContainerMapping#close}} does not call 
{{containerStateManager.close()}} at all. This seems fine to me though, because 
{{ContainerMapping}} writes to the meta store by itself, so it seems okay 
that {{ContainerStateManager#close()}} does not do anything.

> Ozone: SCM: update container allocated size to container db for all the open 
> containers in ContainerStateManager#close
> --
>
> Key: HDFS-12751
> URL: https://issues.apache.org/jira/browse/HDFS-12751
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Chen Liang
>
> Container allocated size is maintained in memory by 
> {{ContainerStateManager}}, this has to be updated in container db when we 
> shutdown SCM. {{ContainerStateManager#close}} will be called during SCM 
> shutdown, so updating allocated size for all the open containers should be 
> done here.






[jira] [Commented] (HDFS-12751) Ozone: SCM: update container allocated size to container db for all the open containers in ContainerStateManager#close

2017-12-05 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279360#comment-16279360
 ] 

Chen Liang commented on HDFS-12751:
---

I looked into the code a little bit. It appears that 
{{containerStateManager}} is a member of {{ContainerMapping}}. When 
{{ContainerMapping#updateContainerState}} gets called, it is always 
{{ContainerMapping}} itself that writes the updated status to the metadata db 
(container.db in this case), as in the following lines:
{code}
ContainerInfo updatedContainer = containerStateManager
  .updateContainerState(containerInfo, event);
containerStore.put(dbKey, updatedContainer.getProtobuf().toByteArray());
{code}
This looks like the only place {{allocatedSize}} gets updated, so I think the 
allocated size is already written to the container db whenever an update 
happens, and we don't need to update it on close(). What do you 
think [~nandakumar131]? Am I missing anything? 

I also noticed that when {{StorageContainerManager}} shuts down, it calls 
close on {{ContainerMapping}}, but {{ContainerMapping#close}} does not call 
{{containerStateManager.close()}} at all. This seems fine to me though, because 
{{ContainerMapping}} writes to the meta store by itself, so it seems okay 
that {{ContainerStateManager#close()}} does not do anything.

> Ozone: SCM: update container allocated size to container db for all the open 
> containers in ContainerStateManager#close
> --
>
> Key: HDFS-12751
> URL: https://issues.apache.org/jira/browse/HDFS-12751
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Chen Liang
>
> Container allocated size is maintained in memory by 
> {{ContainerStateManager}}, this has to be updated in container db when we 
> shutdown SCM. {{ContainerStateManager#close}} will be called during SCM 
> shutdown, so updating allocated size for all the open containers should be 
> done here.






[jira] [Updated] (HDFS-12890) Ozone: XceiverClient should have upper bound on async requests

2017-12-05 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12890:
--
Summary: Ozone: XceiverClient should have upper bound on async requests  
(was: XceiverClient should have upper bound on async requests)

> Ozone: XceiverClient should have upper bound on async requests
> --
>
> Key: HDFS-12890
> URL: https://issues.apache.org/jira/browse/HDFS-12890
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: HDFS-7240
>Affects Versions: HDFS-7240
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
> Fix For: HDFS-7240
>
> Attachments: HDFS-12890-HDFS-7240.001.patch, 
> HDFS-12890-HDFS-7240.002.patch
>
>
> XceiverClientRatis maintains an upper bound on the number of outstanding 
> async requests. XceiverClient should also impose an upper bound on the 
> number of outstanding async write requests received from clients.
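A common way to bound outstanding async requests is a counting semaphore that 
is acquired before submission and released on completion. The sketch below is 
hypothetical (BoundedAsyncSender and its methods are illustrative names, not 
the actual XceiverClient change in the patch):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of capping in-flight async requests with a semaphore.
public class BoundedAsyncSender {
    private final Semaphore permits;
    private final ExecutorService executor = Executors.newFixedThreadPool(4);

    public BoundedAsyncSender(int maxOutstanding) {
        this.permits = new Semaphore(maxOutstanding);
    }

    // Blocks the caller once maxOutstanding requests are in flight,
    // providing back-pressure instead of unbounded queueing.
    public Future<String> sendAsync(String request) throws InterruptedException {
        permits.acquire();
        return executor.submit(() -> {
            try {
                return "ack:" + request;  // stand-in for the actual async RPC
            } finally {
                permits.release();        // free the slot when the reply lands
            }
        });
    }

    public void shutdown() {
        executor.shutdown();
    }
}
```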






[jira] [Updated] (HDFS-11923) Stress test of DFSNetworkTopology

2017-12-06 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-11923:
--
Attachment: HDFS-11923.003.patch

Instead of running the test indefinitely, the v003 patch changes it to run a 
fixed number of operations. Thanks [~ajayydv] for the suggestion.

> Stress test of DFSNetworkTopology
> -
>
> Key: HDFS-11923
> URL: https://issues.apache.org/jira/browse/HDFS-11923
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11923.001.patch, HDFS-11923.002.patch, 
> HDFS-11923.003.patch
>
>
> I wrote a stress test with {{DFSNetworkTopology}} to verify its correctness 
> under huge number of datanode changes e.g., data node insert/delete, storage 
> addition/removal etc. The goal is to show that the topology maintains the 
> correct counters at all times. The test is written so that, unless manually 
> terminated, it keeps randomly performing these operations nonstop (and 
> because of this, the test is ignored in the patch).
> My local run lasted 40 minutes before I stopped it; it involved more than one 
> million datanode changes, and no error happened. We believe this should be 
> sufficient to show the correctness of {{DFSNetworkTopology}}.






[jira] [Work started] (HDFS-12751) Ozone: SCM: update container allocated size to container db for all the open containers in ContainerStateManager#close

2017-12-06 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-12751 started by Chen Liang.
-
> Ozone: SCM: update container allocated size to container db for all the open 
> containers in ContainerStateManager#close
> --
>
> Key: HDFS-12751
> URL: https://issues.apache.org/jira/browse/HDFS-12751
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Chen Liang
>
> Container allocated size is maintained in memory by 
> {{ContainerStateManager}}, this has to be updated in container db when we 
> shutdown SCM. {{ContainerStateManager#close}} will be called during SCM 
> shutdown, so updating allocated size for all the open containers should be 
> done here.






[jira] [Commented] (HDFS-12899) Ozone: SCM: BlockManagerImpl close is called twice during StorageContainerManager#stop

2017-12-06 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280836#comment-16280836
 ] 

Chen Liang commented on HDFS-12899:
---

Thanks [~nandakumar131] for the catch! The patch LGTM, pending Jenkins.

> Ozone: SCM: BlockManagerImpl close is called twice during 
> StorageContainerManager#stop
> --
>
> Key: HDFS-12899
> URL: https://issues.apache.org/jira/browse/HDFS-12899
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Nanda kumar
> Attachments: HDFS-12899-HDFS-7240.000.patch
>
>
> As part of {{StorageContainerManager#stop}}, we are calling 
> {{scmBlockManager#stop}} which will internally do {{BlockManagerImpl#close}} 
> and again explicitly we are calling {{scmBlockManager#close}} using 
> {{IOUtils.cleanupWithLogger(LOG, scmBlockManager)}}. This causes 
> {{RocksDBStore#close}} to be called twice, which in turn does 
> {{MBeans#unregister}} twice, resulting in the following exception trace 
> (WARN) during the second call
> {noformat}
> 2017-12-06 22:30:06,316 [main] WARN  util.MBeans 
> (MBeans.java:unregister(137)) - Error unregistering 
> Hadoop:service=Ozone,name=RocksDbStore,dbName=block.db
> javax.management.InstanceNotFoundException: 
> Hadoop:service=Ozone,name=RocksDbStore,dbName=block.db
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
>   at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
>   at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:135)
>   at org.apache.hadoop.utils.RocksDBStore.close(RocksDBStore.java:368)
>   at 
> org.apache.hadoop.ozone.scm.block.BlockManagerImpl.close(BlockManagerImpl.java:506)
>   at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:278)
>   at 
> org.apache.hadoop.ozone.scm.StorageContainerManager.stop(StorageContainerManager.java:900)
> stacktrace truncated--
> 2017-12-06 22:30:06,317 [main] WARN  util.MBeans 
> (MBeans.java:unregister(137)) - Error unregistering 
> Hadoop:service=Ozone,name=RocksDbStore,dbName=deletedBlock.db
> javax.management.InstanceNotFoundException: 
> Hadoop:service=Ozone,name=RocksDbStore,dbName=deletedBlock.db
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
>   at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
>   at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:135)
>   at org.apache.hadoop.utils.RocksDBStore.close(RocksDBStore.java:368)
>   at 
> org.apache.hadoop.ozone.scm.block.DeletedBlockLogImpl.close(DeletedBlockLogImpl.java:326)
>   at 
> org.apache.hadoop.ozone.scm.block.BlockManagerImpl.close(BlockManagerImpl.java:509)
>   at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:278)
>   at 
> org.apache.hadoop.ozone.scm.StorageContainerManager.stop(StorageContainerManager.java:900)
> stacktrace truncated--
> {noformat}
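Independent of the fix in the patch, a common guard against this class of bug 
is making close() idempotent, so a second call (here, the explicit 
{{IOUtils.cleanupWithLogger}} after {{scmBlockManager#stop}} already closed 
the store) becomes a no-op. A hypothetical sketch, not the actual 
RocksDBStore code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: an idempotent close() so double-closing never
// unregisters the MBean twice. unregisterCalls stands in for the
// MBeans.unregister side effect.
public class IdempotentStore implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    int unregisterCalls = 0;

    @Override
    public void close() {
        if (!closed.compareAndSet(false, true)) {
            return;  // already closed; skip the second unregister
        }
        unregisterCalls++;  // would call MBeans.unregister(...) here
    }
}
```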






[jira] [Commented] (HDFS-12751) Ozone: SCM: update container allocated size to container db for all the open containers in ContainerStateManager#close

2017-12-07 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282358#comment-16282358
 ] 

Chen Liang commented on HDFS-12751:
---

Thanks [~nandakumar131] for the follow-up!

bq. We do not update allocatedBytes here

But it looks to me like it does update allocated bytes, as in the following 
code from {{ContainerStateManager#updateContainerState}}. The {{info}} variable 
is the new container info passed in to {{updateContainerState}}.
{code}
  ContainerInfo containerInfo = new ContainerInfo.Builder()
  .setContainerName(info.getContainerName())
  .setState(newState)
  .setPipeline(info.getPipeline())
  .setAllocatedBytes(info.getAllocatedBytes())
  .setUsedBytes(info.getUsedBytes())
  .setNumberOfKeys(info.getNumberOfKeys())
  .setStateEnterTime(Time.monotonicNow())
  .setOwner(info.getOwner())
  .build();
{code}

bq. Not exactly. Whenever we allocate block...

I did miss this part earlier, thanks for pointing it out! But it appears to me 
that this is what BlockManager perceives as the bytes that might get allocated, 
not necessarily the bytes actually allocated on the container; e.g. the client 
may terminate before ever talking to the container. The allocatedBytes we get 
from the block report seems to be the more accurate number, precisely the 
number of bytes the container sees when sending the report, and the block 
report is what triggers the {{updateContainerState}} code path. So I feel that 
persisting the number we get from updateContainerState is already the better 
choice.

Additionally, I'm under the impression that {{ContainerMapping}} is the class 
that interacts with the container store, while {{ContainerStateManager}} is 
purely an in-memory state representation that (currently) does not read from or 
write to the container meta store at all. It seems to me that we should keep 
this abstraction: keep {{ContainerStateManager}} away from container.db and let 
only {{ContainerMapping}} do the container metadata management.

So although we are indeed missing the {{allocatedSize}} update from the 
allocateBlock code path, I would prefer to leave this part as-is. What do you 
think?

Nonetheless, containerStateManager.close() not being called does look like a 
bug; I will upload a patch later.

> Ozone: SCM: update container allocated size to container db for all the open 
> containers in ContainerStateManager#close
> --
>
> Key: HDFS-12751
> URL: https://issues.apache.org/jira/browse/HDFS-12751
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Chen Liang
>
> Container allocated size is maintained in memory by 
> {{ContainerStateManager}}, this has to be updated in container db when we 
> shutdown SCM. {{ContainerStateManager#close}} will be called during SCM 
> shutdown, so updating allocated size for all the open containers should be 
> done here.






[jira] [Updated] (HDFS-12000) Ozone: Container : Add key versioning support-1

2017-12-08 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12000:
--
Attachment: HDFS-12000-HDFS-7240.009.patch

Thanks [~xyao] for the review and the comments! Updated the v009 patch.

bq. I think it will work when we only have one version, i.e., latest version. 
Correct me if I'm wrong, say we have K1 (B1V1, B2V2), with 
getBlocksLatestVersionOnly, are we going to get B2V2 instead of (B1V1, B2V2)?

Exactly right: for now, getBlocksLatestVersionOnly() will only return the 
blocks from the most recent version, B2V2 only in your case. In the next few 
steps, my plan for multiple versions is to augment the APIs to specify which 
historical version to read. For example, with an API call such as read(K1, 
version=1), it will ignore B2V2 and only look at B1V1.
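The per-version filtering described above can be sketched as follows. This is a hedged illustration: `BlockInfo` and `blocksForVersion` are hypothetical stand-ins, not the actual Ozone classes.

```java
import java.util.ArrayList;
import java.util.List;

public class VersionFilter {
    // Hypothetical stand-in for a block-location entry; the real Ozone
    // class carries more fields (container name, offsets, etc.).
    public static class BlockInfo {
        final String blockId;
        final long createVersion; // version in which this block was added

        public BlockInfo(String blockId, long createVersion) {
            this.blockId = blockId;
            this.createVersion = createVersion;
        }
    }

    // Return only the blocks visible at the requested version: every block
    // whose createVersion does not exceed the requested version.
    public static List<String> blocksForVersion(List<BlockInfo> all, long version) {
        List<String> result = new ArrayList<>();
        for (BlockInfo b : all) {
            if (b.createVersion <= version) {
                result.add(b.blockId);
            }
        }
        return result;
    }
}
```

With blocks B1 (version 1) and B2 (version 2), requesting version 1 yields only B1, matching the read(K1, version=1) example above.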

bq. Line 173: NIT: more comment does not valid any more

Fixed

Additionally, I have one major concern right now that I'm looking for advice on: 
I believe the dominating majority of reads will be for only the most recent 
version, in which case always iterating over all the blocks, including old 
versions, can become inefficient. Any comments on this are appreciated.

bq. Line 268: should this openVersion as part of the request so that the client 
can request open certain version? It is ok to assume open the latest version 
for now. Maybe add a TODO for next JIRA on this feature.

Yes, as mentioned above, in a follow-up JIRA there will be an API that allows 
requesting a specific version, old or recent. It will likely use this field. 
Added a TODO note to the comments; will follow up in the next JIRAs.

bq. Line 111: Is there a reason why the KsyKeyLocationInfo#Builder does not 
support setCreateVersion? Do we expect it to be set directly on the 
KsyKeyLocationInfo afterwards?

I found that when a block is allocated, it gets allocated first and then the 
version is set, based on whether it's appended to the current version or added 
as a new version. I think that conceptually a block itself does not have the 
notion of a version. So, yes, I leave the block builder without setting the 
version; the caller should set it after creating the block.
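As a hedged sketch of that pattern (hypothetical names, not the actual KsmKeyLocationInfo builder): the builder omits the version, and the caller assigns it after allocation once it knows whether the block extends the current version or starts a new one.

```java
public class BlockBuilderDemo {
    // Illustrative block type: the builder does not know the version, so
    // createVersion is set by the caller after building.
    public static class Block {
        private final String blockId;
        private long createVersion = -1; // unset until the caller decides

        private Block(String blockId) {
            this.blockId = blockId;
        }

        public void setCreateVersion(long v) {
            this.createVersion = v;
        }

        public long getCreateVersion() {
            return createVersion;
        }

        public static class Builder {
            private String blockId;

            public Builder setBlockId(String id) {
                this.blockId = id;
                return this;
            }

            // Note: no setCreateVersion here, on purpose.
            public Block build() {
                return new Block(blockId);
            }
        }
    }

    // The caller allocates the block first, then assigns the version it
    // decided on (append to current version vs. start a new version).
    public static Block allocate(String id, long targetVersion) {
        Block b = new Block.Builder().setBlockId(id).build();
        b.setCreateVersion(targetVersion);
        return b;
    }
}
```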

bq. Line 22: NIT: unused imports

Fixed

bq. Line 58: if the version starts from 0, the special handling for 
currentVersion==-1 is not needed. Can you confirm?

You are right. Thanks for the catch. Fixed

bq. Line 30: can the open version be committed without close, something like 
hsync to populate the write without closing file.

We don't need to commit the open version, as it is only used when opening a key 
to disambiguate preallocated blocks.

More specifically, when a key is opened, depending on whether a size is 
specified, KeyManager *may or may not* have pre-allocated some blocks and 
returned them in the open session. If pre-allocation did happen, the returned 
latest version is the version to write to. But if pre-allocation did not happen, 
the returned latest version is actually an old, already-committed version that 
should not be written to. The only purpose of this open version field is to 
distinguish these two cases. This value gets checked once on the client when 
loading pre-allocated blocks, and is never used afterwards. So I think we don't 
need to commit it.
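The client-side check described here reduces to a single comparison; the sketch below is a hypothetical simplification, not the actual Ozone client code.

```java
public class OpenVersionCheck {
    // Decide, on the client side, whether the blocks of the latest version
    // returned by an open-key call are pre-allocated (safe to write) or
    // belong to an already-committed version (must not be overwritten).
    // openVersion is the version the open call reported; latestVersion is
    // the newest version present in the returned key info. Both names are
    // illustrative.
    public static boolean latestBlocksArePreallocated(long openVersion,
                                                      long latestVersion) {
        // If pre-allocation happened, the open call created a new latest
        // version equal to openVersion; otherwise the latest version is an
        // older, committed one and differs from openVersion.
        return openVersion == latestVersion;
    }
}
```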


> Ozone: Container : Add key versioning support-1
> ---
>
> Key: HDFS-12000
> URL: https://issues.apache.org/jira/browse/HDFS-12000
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Chen Liang
>  Labels: OzonePostMerge
> Attachments: HDFS-12000-HDFS-7240.001.patch, 
> HDFS-12000-HDFS-7240.002.patch, HDFS-12000-HDFS-7240.003.patch, 
> HDFS-12000-HDFS-7240.004.patch, HDFS-12000-HDFS-7240.005.patch, 
> HDFS-12000-HDFS-7240.007.patch, HDFS-12000-HDFS-7240.008.patch, 
> HDFS-12000-HDFS-7240.009.patch, OzoneVersion.001.pdf
>
>
> The rest interface of ozone supports versioning of keys. This support comes 
> from the containers and how chunks are managed to support this feature. This 
> JIRA tracks that feature. Will post a detailed design doc so that we can talk 
> about this feature.






[jira] [Updated] (HDFS-12626) Ozone : delete open key entries that will no longer be closed

2017-12-11 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12626:
--
Attachment: HDFS-12626-HDFS-7240.005.patch

Thanks [~xyao] for the review and the comments! All fixed in v005 patch except:

bq. Line 201: Do we update the keyInfo with the modification time when the 
block is written to the container as well?

If I understand this question correctly, then I believe writing a block to a 
container is a process between the client and the containers; KSM is not 
involved in this process, so it does not know when a particular block write is 
done.

> Ozone : delete open key entries that will no longer be closed
> -
>
> Key: HDFS-12626
> URL: https://issues.apache.org/jira/browse/HDFS-12626
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12626-HDFS-7240.001.patch, 
> HDFS-12626-HDFS-7240.002.patch, HDFS-12626-HDFS-7240.003.patch, 
> HDFS-12626-HDFS-7240.004.patch, HDFS-12626-HDFS-7240.005.patch
>
>
> HDFS-12543 introduced the notion of "open key" where when a key is opened, an 
> open key entry gets persisted, only after client calls a close will this 
> entry be made visible. One issue is that if the client does not call close 
> (e.g. failed), then that open key entry will never be deleted from meta data. 
> This JIRA tracks this issue.






[jira] [Updated] (HDFS-12000) Ozone: Container : Add key versioning support-1

2017-12-11 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12000:
--
Attachment: HDFS-12000-HDFS-7240.010.patch

Seems Jenkins failed on the v009 patch due to some issue unrelated to the patch; 
resubmitting the v009 patch as v010 to trigger another build.

> Ozone: Container : Add key versioning support-1
> ---
>
> Key: HDFS-12000
> URL: https://issues.apache.org/jira/browse/HDFS-12000
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Chen Liang
>  Labels: OzonePostMerge
> Attachments: HDFS-12000-HDFS-7240.001.patch, 
> HDFS-12000-HDFS-7240.002.patch, HDFS-12000-HDFS-7240.003.patch, 
> HDFS-12000-HDFS-7240.004.patch, HDFS-12000-HDFS-7240.005.patch, 
> HDFS-12000-HDFS-7240.007.patch, HDFS-12000-HDFS-7240.008.patch, 
> HDFS-12000-HDFS-7240.009.patch, HDFS-12000-HDFS-7240.010.patch, 
> OzoneVersion.001.pdf
>
>
> The rest interface of ozone supports versioning of keys. This support comes 
> from the containers and how chunks are managed to support this feature. This 
> JIRA tracks that feature. Will post a detailed design doc so that we can talk 
> about this feature.






[jira] [Updated] (HDFS-12626) Ozone : delete open key entries that will no longer be closed

2017-12-12 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12626:
--
Attachment: HDFS-12626-HDFS-7240.006.patch

Thanks [~xyao] for the review! Post v006 patch to fix the test.

> Ozone : delete open key entries that will no longer be closed
> -
>
> Key: HDFS-12626
> URL: https://issues.apache.org/jira/browse/HDFS-12626
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12626-HDFS-7240.001.patch, 
> HDFS-12626-HDFS-7240.002.patch, HDFS-12626-HDFS-7240.003.patch, 
> HDFS-12626-HDFS-7240.004.patch, HDFS-12626-HDFS-7240.005.patch, 
> HDFS-12626-HDFS-7240.006.patch
>
>
> HDFS-12543 introduced the notion of "open key" where when a key is opened, an 
> open key entry gets persisted, only after client calls a close will this 
> entry be made visible. One issue is that if the client does not call close 
> (e.g. failed), then that open key entry will never be deleted from meta data. 
> This JIRA tracks this issue.






[jira] [Updated] (HDFS-12000) Ozone: Container : Add key versioning support-1

2017-12-12 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12000:
--
Attachment: HDFS-12000-HDFS-7240.011.patch

Fixed the checkstyle warning. The javadoc and ASF license issues seem unrelated; 
the failed tests also seem unrelated. All passed locally except for the two 
consistently failing tests {{TestUnbuffer}} and {{TestBalancerRPCDelay}}.

> Ozone: Container : Add key versioning support-1
> ---
>
> Key: HDFS-12000
> URL: https://issues.apache.org/jira/browse/HDFS-12000
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Chen Liang
>  Labels: OzonePostMerge
> Attachments: HDFS-12000-HDFS-7240.001.patch, 
> HDFS-12000-HDFS-7240.002.patch, HDFS-12000-HDFS-7240.003.patch, 
> HDFS-12000-HDFS-7240.004.patch, HDFS-12000-HDFS-7240.005.patch, 
> HDFS-12000-HDFS-7240.007.patch, HDFS-12000-HDFS-7240.008.patch, 
> HDFS-12000-HDFS-7240.009.patch, HDFS-12000-HDFS-7240.010.patch, 
> HDFS-12000-HDFS-7240.011.patch, OzoneVersion.001.pdf
>
>
> The rest interface of ozone supports versioning of keys. This support comes 
> from the containers and how chunks are managed to support this feature. This 
> JIRA tracks that feature. Will post a detailed design doc so that we can talk 
> about this feature.






[jira] [Commented] (HDFS-12751) Ozone: SCM: update container allocated size to container db for all the open containers in ContainerStateManager#close

2017-12-12 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288193#comment-16288193
 ] 

Chen Liang commented on HDFS-12751:
---

Thanks [~nandakumar131] for the clarification! I totally missed {{usedBytes}}. 
Then this makes sense. I will upload a patch soon.

> Ozone: SCM: update container allocated size to container db for all the open 
> containers in ContainerStateManager#close
> --
>
> Key: HDFS-12751
> URL: https://issues.apache.org/jira/browse/HDFS-12751
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Chen Liang
>
> Container allocated size is maintained in memory by 
> {{ContainerStateManager}}, this has to be updated in container db when we 
> shutdown SCM. {{ContainerStateManager#close}} will be called during SCM 
> shutdown, so updating allocated size for all the open containers should be 
> done here.






[jira] [Updated] (HDFS-12751) Ozone: SCM: update container allocated size to container db for all the open containers in ContainerStateManager#close

2017-12-12 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12751:
--
Attachment: HDFS-12751-HDFS-7240.001.patch

Post v001 patch.

> Ozone: SCM: update container allocated size to container db for all the open 
> containers in ContainerStateManager#close
> --
>
> Key: HDFS-12751
> URL: https://issues.apache.org/jira/browse/HDFS-12751
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Chen Liang
> Attachments: HDFS-12751-HDFS-7240.001.patch
>
>
> Container allocated size is maintained in memory by 
> {{ContainerStateManager}}, this has to be updated in container db when we 
> shutdown SCM. {{ContainerStateManager#close}} will be called during SCM 
> shutdown, so updating allocated size for all the open containers should be 
> done here.






[jira] [Updated] (HDFS-12751) Ozone: SCM: update container allocated size to container db for all the open containers in ContainerStateManager#close

2017-12-12 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12751:
--
Status: Patch Available  (was: In Progress)

> Ozone: SCM: update container allocated size to container db for all the open 
> containers in ContainerStateManager#close
> --
>
> Key: HDFS-12751
> URL: https://issues.apache.org/jira/browse/HDFS-12751
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Chen Liang
> Attachments: HDFS-12751-HDFS-7240.001.patch
>
>
> Container allocated size is maintained in memory by 
> {{ContainerStateManager}}, this has to be updated in container db when we 
> shutdown SCM. {{ContainerStateManager#close}} will be called during SCM 
> shutdown, so updating allocated size for all the open containers should be 
> done here.






[jira] [Commented] (HDFS-12000) Ozone: Container : Add key versioning support-1

2017-12-12 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288500#comment-16288500
 ] 

Chen Liang commented on HDFS-12000:
---

Thanks [~xyao] for checking the tests! I ran the tests you mentioned locally, 
along with the failed tests from the latest Jenkins run. All tests passed except 
for {{TestOzoneRpcClient.testPutKeyRatisThreeNodes}}, which fails even without 
the patch.

> Ozone: Container : Add key versioning support-1
> ---
>
> Key: HDFS-12000
> URL: https://issues.apache.org/jira/browse/HDFS-12000
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Chen Liang
>  Labels: OzonePostMerge
> Attachments: HDFS-12000-HDFS-7240.001.patch, 
> HDFS-12000-HDFS-7240.002.patch, HDFS-12000-HDFS-7240.003.patch, 
> HDFS-12000-HDFS-7240.004.patch, HDFS-12000-HDFS-7240.005.patch, 
> HDFS-12000-HDFS-7240.007.patch, HDFS-12000-HDFS-7240.008.patch, 
> HDFS-12000-HDFS-7240.009.patch, HDFS-12000-HDFS-7240.010.patch, 
> HDFS-12000-HDFS-7240.011.patch, OzoneVersion.001.pdf
>
>
> The rest interface of ozone supports versioning of keys. This support comes 
> from the containers and how chunks are managed to support this feature. This 
> JIRA tracks that feature. Will post a detailed design doc so that we can talk 
> about this feature.






[jira] [Updated] (HDFS-12751) Ozone: SCM: update container allocated size to container db for all the open containers in ContainerStateManager#close

2017-12-13 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12751:
--
Attachment: HDFS-12751-HDFS-7240.002.patch

Fixed checkstyle; the failed tests are unrelated and passed in a local run.

> Ozone: SCM: update container allocated size to container db for all the open 
> containers in ContainerStateManager#close
> --
>
> Key: HDFS-12751
> URL: https://issues.apache.org/jira/browse/HDFS-12751
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Chen Liang
> Attachments: HDFS-12751-HDFS-7240.001.patch, 
> HDFS-12751-HDFS-7240.002.patch
>
>
> Container allocated size is maintained in memory by 
> {{ContainerStateManager}}, this has to be updated in container db when we 
> shutdown SCM. {{ContainerStateManager#close}} will be called during SCM 
> shutdown, so updating allocated size for all the open containers should be 
> done here.






[jira] [Updated] (HDFS-12265) Ozone : better handling of operation fail due to chill mode

2017-12-13 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12265:
--
Release Note:   (was: Looks like this has been handled as part of 
HDFS-12387, close this JIRA.)

> Ozone : better handling of operation fail due to chill mode
> ---
>
> Key: HDFS-12265
> URL: https://issues.apache.org/jira/browse/HDFS-12265
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Chen Liang
>Priority: Minor
>  Labels: OzonePostMerge
>
> Currently if someone tries to create a container while SCM is in chill mode, 
> there will be exception of INTERNAL_ERROR, which is not very informative and 
> can be confusing for debugging.
> We should make it easier to identify problems caused by chill mode. For 
> example, we may detect if SCM is in chill mode and report back to client in 
> some way, such that the client can backup and try again later.
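A hedged sketch of the back-off-and-retry behavior the description suggests. `ChillModeException` is hypothetical here; defining how SCM actually reports chill mode is exactly what this JIRA proposes.

```java
public class ChillModeRetryDemo {
    // Hypothetical exception; the JIRA only proposes that SCM report chill
    // mode distinctly instead of a generic INTERNAL_ERROR.
    public static class ChillModeException extends Exception {}

    public interface Op {
        void run() throws ChillModeException;
    }

    // Retry the operation with exponential backoff while SCM reports chill
    // mode, as the description suggests a client could. Returns the number
    // of attempts it took to succeed (or maxAttempts if it never did).
    public static int retryWithBackoff(Op op, int maxAttempts) {
        long waitMs = 1;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                op.run();
                return attempt;           // operation succeeded
            } catch (ChillModeException e) {
                try {
                    Thread.sleep(waitMs); // back off before retrying
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return attempt;
                }
                waitMs *= 2;              // exponential backoff
            }
        }
        return maxAttempts;
    }
}
```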






[jira] [Resolved] (HDFS-12265) Ozone : better handling of operation fail due to chill mode

2017-12-13 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang resolved HDFS-12265.
---
  Resolution: Fixed
Release Note: Looks like this has been handled as part of HDFS-12387, close 
this JIRA.

> Ozone : better handling of operation fail due to chill mode
> ---
>
> Key: HDFS-12265
> URL: https://issues.apache.org/jira/browse/HDFS-12265
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Chen Liang
>Priority: Minor
>  Labels: OzonePostMerge
>
> Currently if someone tries to create a container while SCM is in chill mode, 
> there will be exception of INTERNAL_ERROR, which is not very informative and 
> can be confusing for debugging.
> We should make it easier to identify problems caused by chill mode. For 
> example, we may detect if SCM is in chill mode and report back to client in 
> some way, such that the client can backup and try again later.






[jira] [Commented] (HDFS-12265) Ozone : better handling of operation fail due to chill mode

2017-12-13 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289691#comment-16289691
 ] 

Chen Liang commented on HDFS-12265:
---

Looks like this has been handled as part of HDFS-12387; closing this JIRA.

> Ozone : better handling of operation fail due to chill mode
> ---
>
> Key: HDFS-12265
> URL: https://issues.apache.org/jira/browse/HDFS-12265
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Chen Liang
>Priority: Minor
>  Labels: OzonePostMerge
>
> Currently if someone tries to create a container while SCM is in chill mode, 
> there will be exception of INTERNAL_ERROR, which is not very informative and 
> can be confusing for debugging.
> We should make it easier to identify problems caused by chill mode. For 
> example, we may detect if SCM is in chill mode and report back to client in 
> some way, such that the client can backup and try again later.






[jira] [Updated] (HDFS-12000) Ozone: Container : Add key versioning support-1

2017-12-13 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12000:
--
Attachment: HDFS-12000-HDFS-7240.012.patch

Posted the v012 patch to rebase.

> Ozone: Container : Add key versioning support-1
> ---
>
> Key: HDFS-12000
> URL: https://issues.apache.org/jira/browse/HDFS-12000
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Chen Liang
>  Labels: OzonePostMerge
> Attachments: HDFS-12000-HDFS-7240.001.patch, 
> HDFS-12000-HDFS-7240.002.patch, HDFS-12000-HDFS-7240.003.patch, 
> HDFS-12000-HDFS-7240.004.patch, HDFS-12000-HDFS-7240.005.patch, 
> HDFS-12000-HDFS-7240.007.patch, HDFS-12000-HDFS-7240.008.patch, 
> HDFS-12000-HDFS-7240.009.patch, HDFS-12000-HDFS-7240.010.patch, 
> HDFS-12000-HDFS-7240.011.patch, HDFS-12000-HDFS-7240.012.patch, 
> OzoneVersion.001.pdf
>
>
> The rest interface of ozone supports versioning of keys. This support comes 
> from the containers and how chunks are managed to support this feature. This 
> JIRA tracks that feature. Will post a detailed design doc so that we can talk 
> about this feature.






[jira] [Updated] (HDFS-12881) Output streams closed with IOUtils suppressing write errors

2017-12-14 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12881:
--
Fix Version/s: (was: 3.0.1)

> Output streams closed with IOUtils suppressing write errors
> ---
>
> Key: HDFS-12881
> URL: https://issues.apache.org/jira/browse/HDFS-12881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jason Lowe
>Assignee: Ajay Kumar
> Fix For: 3.1.0
>
> Attachments: HDFS-12881.001.patch, HDFS-12881.002.patch, 
> HDFS-12881.003.patch, HDFS-12881.004.patch
>
>
> There are a few places in HDFS code that are closing an output stream with 
> IOUtils.cleanupWithLogger like this:
> {code}
>   try {
> ...write to outStream...
>   } finally {
> IOUtils.cleanupWithLogger(LOG, outStream);
>   }
> {code}
> This suppresses any IOException that occurs during the close() method which 
> could lead to partial/corrupted output without throwing a corresponding 
> exception.  The code should either use try-with-resources or explicitly close 
> the stream within the try block so the exception thrown during close() is 
> properly propagated as exceptions during write operations are.
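As a hedged sketch of the try-with-resources alternative (`FailingStream` is a stand-in, not HDFS code): the IOException thrown by close() propagates to the caller instead of being suppressed by the cleanup helper.

```java
import java.io.IOException;
import java.io.OutputStream;

public class CloseDemo {
    // Stand-in stream whose close() fails, mimicking a flush error that
    // would surface only when the stream is closed.
    static class FailingStream extends OutputStream {
        @Override
        public void write(int b) {
            // pretend to buffer the byte
        }

        @Override
        public void close() throws IOException {
            throw new IOException("flush-on-close failed");
        }
    }

    // With try-with-resources, close() runs automatically on scope exit
    // and its IOException propagates, unlike IOUtils.cleanupWithLogger,
    // which would log and swallow it.
    public static boolean closeErrorPropagates() {
        try (OutputStream out = new FailingStream()) {
            out.write(42);
            return false; // never returned: close() throws on scope exit
        } catch (IOException e) {
            return true;  // the close() failure surfaced as an exception
        }
    }
}
```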






[jira] [Updated] (HDFS-12881) Output streams closed with IOUtils suppressing write errors

2017-12-14 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12881:
--
Fix Version/s: 3.0.1

> Output streams closed with IOUtils suppressing write errors
> ---
>
> Key: HDFS-12881
> URL: https://issues.apache.org/jira/browse/HDFS-12881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jason Lowe
>Assignee: Ajay Kumar
> Fix For: 3.1.0, 3.0.1
>
> Attachments: HDFS-12881.001.patch, HDFS-12881.002.patch, 
> HDFS-12881.003.patch, HDFS-12881.004.patch
>
>
> There are a few places in HDFS code that are closing an output stream with 
> IOUtils.cleanupWithLogger like this:
> {code}
>   try {
> ...write to outStream...
>   } finally {
> IOUtils.cleanupWithLogger(LOG, outStream);
>   }
> {code}
> This suppresses any IOException that occurs during the close() method which 
> could lead to partial/corrupted output without throwing a corresponding 
> exception.  The code should either use try-with-resources or explicitly close 
> the stream within the try block so the exception thrown during close() is 
> properly propagated as exceptions during write operations are.






[jira] [Created] (HDFS-12925) Ozone: Container : Add key versioning support-2

2017-12-14 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12925:
-

 Summary: Ozone: Container : Add key versioning support-2
 Key: HDFS-12925
 URL: https://issues.apache.org/jira/browse/HDFS-12925
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Chen Liang
Assignee: Chen Liang


One component of versioning is assembling the read IO vector (please see 
section 4.2 of the versioning design doc in HDFS-12000 for the details). This 
JIRA adds the util functions that take a list of blocks from different versions 
and properly generate the read vector for the requested version.
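A hedged, slot-based simplification of that read-vector assembly (the real util works on byte ranges per the design doc; all names here are hypothetical): for each logical position in the key, the reader picks the block from the highest version not exceeding the requested one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class ReadVectorDemo {
    // Hypothetical block entry: slot is the block's logical position in
    // the key, version is the key version in which it was written.
    public static class BlockEntry {
        final String blockId;
        final int slot;
        final long version;

        public BlockEntry(String blockId, int slot, long version) {
            this.blockId = blockId;
            this.slot = slot;
            this.version = version;
        }
    }

    // For each slot, pick the block from the highest version that does not
    // exceed the requested version: newer versions override older ones,
    // while older blocks still fill slots the newer versions never rewrote.
    public static List<String> readVector(List<BlockEntry> blocks, long requested) {
        SortedMap<Integer, BlockEntry> chosen = new TreeMap<>();
        for (BlockEntry b : blocks) {
            if (b.version > requested) {
                continue; // block not visible at the requested version
            }
            BlockEntry cur = chosen.get(b.slot);
            if (cur == null || b.version > cur.version) {
                chosen.put(b.slot, b);
            }
        }
        List<String> vector = new ArrayList<>();
        for (BlockEntry b : chosen.values()) {
            vector.add(b.blockId);
        }
        return vector;
    }
}
```

For example, if version 2 rewrote only the first block of a two-block key, reading version 1 returns the two original blocks while reading version 2 returns the rewritten first block plus the untouched second block.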






[jira] [Updated] (HDFS-12925) Ozone: Container : Add key versioning support-2

2017-12-14 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12925:
--
Attachment: HDFS-12925-HDFS-7240.001.patch

Post v001 patch.

> Ozone: Container : Add key versioning support-2
> ---
>
> Key: HDFS-12925
> URL: https://issues.apache.org/jira/browse/HDFS-12925
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12925-HDFS-7240.001.patch
>
>
> One component for versioning is assembling read IO vector, (please see 4.2 
> section of the versioning design doc HDFS-12000 for the detail). This JIRA 
> adds the util functions that takes a list with blocks from different versions 
> and properly generate the read vector for the requested version.






[jira] [Updated] (HDFS-12925) Ozone: Container : Add key versioning support-2

2017-12-14 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12925:
--
Status: Patch Available  (was: Open)

> Ozone: Container : Add key versioning support-2
> ---
>
> Key: HDFS-12925
> URL: https://issues.apache.org/jira/browse/HDFS-12925
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12925-HDFS-7240.001.patch
>
>
> One component for versioning is assembling read IO vector, (please see 4.2 
> section of the versioning design doc HDFS-12000 for the detail). This JIRA 
> adds the util functions that takes a list with blocks from different versions 
> and properly generate the read vector for the requested version.






[jira] [Updated] (HDFS-12925) Ozone: Container : Add key versioning support-2

2017-12-14 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12925:
--
Attachment: HDFS-12925-HDFS-7240.002.patch

The output link from Jenkins seems dead, and I'm not sure why the build failed; 
my local build worked fine. Resubmitting as the v002 patch.

> Ozone: Container : Add key versioning support-2
> ---
>
> Key: HDFS-12925
> URL: https://issues.apache.org/jira/browse/HDFS-12925
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12925-HDFS-7240.001.patch, 
> HDFS-12925-HDFS-7240.002.patch
>
>
> One component for versioning is assembling read IO vector, (please see 4.2 
> section of the versioning design doc HDFS-12000 for the detail). This JIRA 
> adds the util functions that takes a list with blocks from different versions 
> and properly generate the read vector for the requested version.






[jira] [Commented] (HDFS-12917) Fix description errors in testErasureCodingConf.xml

2017-12-14 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291742#comment-16291742
 ] 

Chen Liang commented on HDFS-12917:
---

Thanks [~candychencan] for the catch! Just one NIT: "which have an..." to 
"which has an..."?

> Fix description errors in testErasureCodingConf.xml
> ---
>
> Key: HDFS-12917
> URL: https://issues.apache.org/jira/browse/HDFS-12917
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: chencan
> Attachments: HADOOP-12917.patch
>
>
> In testErasureCodingConf.xml, there are two cases whose description should be 
> "getPolicy : get EC policy information at specified path, whick have an EC 
> Policy".






[jira] [Updated] (HDFS-12925) Ozone: Container : Add key versioning support-2

2017-12-14 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12925:
--
Description: One component for versioning is assembling the read IO vector 
(please see section 4.2 of the [versioning design 
doc|https://issues.apache.org/jira/secure/attachment/12877154/OzoneVersion.001.pdf]
 under HDFS-12000 for details). This JIRA adds the util functions that take a 
list of blocks from different versions and properly generate the read vector 
for the requested version.  (was: One component for versioning is 
assembling read IO vector, (please see 4.2 section of the versioning design doc 
HDFS-12000 for the detail). This JIRA adds the util functions that takes a list 
with blocks from different versions and properly generate the read vector for 
the requested version.)

> Ozone: Container : Add key versioning support-2
> ---
>
> Key: HDFS-12925
> URL: https://issues.apache.org/jira/browse/HDFS-12925
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12925-HDFS-7240.001.patch, 
> HDFS-12925-HDFS-7240.002.patch
>
>
> One component for versioning is assembling the read IO vector (please see 
> section 4.2 of the [versioning design 
> doc|https://issues.apache.org/jira/secure/attachment/12877154/OzoneVersion.001.pdf]
>  under HDFS-12000 for details). This JIRA adds the util functions that take 
> a list of blocks from different versions and properly generate the read 
> vector for the requested version.






[jira] [Commented] (HDFS-12799) Ozone: SCM: Close containers: extend SCMCommandResponseProto with SCMCloseContainerCmdResponseProto

2017-12-15 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293035#comment-16293035
 ] 

Chen Liang commented on HDFS-12799:
---

Thanks [~elek] for the update! Looks like there were compilation failures; the 
patch might need to be rebased. Would you mind taking a look? Thanks!

> Ozone: SCM: Close containers: extend SCMCommandResponseProto with 
> SCMCloseContainerCmdResponseProto
> ---
>
> Key: HDFS-12799
> URL: https://issues.apache.org/jira/browse/HDFS-12799
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
> Attachments: HDFS-12799-HDFS-7240.001.patch, 
> HDFS-12799-HDFS-7240.002.patch, HDFS-12799-HDFS-7240.003.patch, 
> HDFS-12799-HDFS-7240.004.patch
>
>
> This issue is about extending the HB response protocol between SCM and DN 
> with a command to ask the datanode to close a container. (This is just about 
> extending the protocol, not about fixing the implementation of SCM to handle 
> the state transitions.)






[jira] [Updated] (HDFS-12925) Ozone: Container : Add key versioning support-2

2017-12-15 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12925:
--
Attachment: HDFS-12925-HDFS-7240.003.patch

The v003 patch fixes the checkstyle and javadoc issues; the findbugs warnings 
are not introduced by this patch. The failed tests all passed locally, except 
for the consistently failing {{TestOzoneRpcClient}}.

> Ozone: Container : Add key versioning support-2
> ---
>
> Key: HDFS-12925
> URL: https://issues.apache.org/jira/browse/HDFS-12925
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12925-HDFS-7240.001.patch, 
> HDFS-12925-HDFS-7240.002.patch, HDFS-12925-HDFS-7240.003.patch
>
>
> One component for versioning is assembling the read IO vector (please see 
> section 4.2 of the [versioning design 
> doc|https://issues.apache.org/jira/secure/attachment/12877154/OzoneVersion.001.pdf]
>  under HDFS-12000 for details). This JIRA adds the util functions that take 
> a list of blocks from different versions and properly generate the read 
> vector for the requested version.






[jira] [Comment Edited] (HDFS-12925) Ozone: Container : Add key versioning support-2

2017-12-15 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293273#comment-16293273
 ] 

Chen Liang edited comment on HDFS-12925 at 12/15/17 9:31 PM:
-

The v003 patch fixes the checkstyle and javadoc issues; the findbugs warnings 
are not introduced by this patch. The failed tests all passed locally, except 
for {{TestOzoneRpcClient}}, which fails consistently even without the patch.


was (Author: vagarychen):
v003 patch fixes the checkstyle and javadoc issue, findbug warnings are not 
introduced by this patch. The failed tests all passed locally, except for the 
consistently failing test {{TestOzoneRpcClient}}

> Ozone: Container : Add key versioning support-2
> ---
>
> Key: HDFS-12925
> URL: https://issues.apache.org/jira/browse/HDFS-12925
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12925-HDFS-7240.001.patch, 
> HDFS-12925-HDFS-7240.002.patch, HDFS-12925-HDFS-7240.003.patch
>
>
> One component for versioning is assembling the read IO vector (please see 
> section 4.2 of the [versioning design 
> doc|https://issues.apache.org/jira/secure/attachment/12877154/OzoneVersion.001.pdf]
>  under HDFS-12000 for details). This JIRA adds the util functions that take 
> a list of blocks from different versions and properly generate the read 
> vector for the requested version.






[jira] [Assigned] (HDFS-12917) Fix description errors in testErasureCodingConf.xml

2017-12-15 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang reassigned HDFS-12917:
-

Assignee: chencan

> Fix description errors in testErasureCodingConf.xml
> ---
>
> Key: HDFS-12917
> URL: https://issues.apache.org/jira/browse/HDFS-12917
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: chencan
>Assignee: chencan
> Attachments: HADOOP-12917.002.patch, HADOOP-12917.patch
>
>
> In testErasureCodingConf.xml, there are two test cases whose description 
> reads "getPolicy : get EC policy information at specified path, whick have an 
> EC Policy".






[jira] [Commented] (HDFS-12917) Fix description errors in testErasureCodingConf.xml

2017-12-15 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293301#comment-16293301
 ] 

Chen Liang commented on HDFS-12917:
---

Thanks [~candychencan] for the updated patch! +1 on the v002 patch; I've 
committed it to trunk (and I've changed the assignee of this JIRA to you). 
Thanks for your contribution!

> Fix description errors in testErasureCodingConf.xml
> ---
>
> Key: HDFS-12917
> URL: https://issues.apache.org/jira/browse/HDFS-12917
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: chencan
>Assignee: chencan
> Attachments: HADOOP-12917.002.patch, HADOOP-12917.patch
>
>
> In testErasureCodingConf.xml, there are two test cases whose description 
> reads "getPolicy : get EC policy information at specified path, whick have an 
> EC Policy".






[jira] [Updated] (HDFS-12917) Fix description errors in testErasureCodingConf.xml

2017-12-15 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12917:
--
  Resolution: Fixed
Target Version/s: 3.1.0
  Status: Resolved  (was: Patch Available)

> Fix description errors in testErasureCodingConf.xml
> ---
>
> Key: HDFS-12917
> URL: https://issues.apache.org/jira/browse/HDFS-12917
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: chencan
>Assignee: chencan
> Attachments: HADOOP-12917.002.patch, HADOOP-12917.patch
>
>
> In testErasureCodingConf.xml, there are two test cases whose description 
> reads "getPolicy : get EC policy information at specified path, whick have an 
> EC Policy".






[jira] [Updated] (HDFS-12925) Ozone: Container : Add key versioning support-2

2017-12-15 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12925:
--
Attachment: HDFS-12925-HDFS-7240.004.patch

The latest Jenkins build failed again; resubmitting the v003 patch as v004 to 
trigger another run.

> Ozone: Container : Add key versioning support-2
> ---
>
> Key: HDFS-12925
> URL: https://issues.apache.org/jira/browse/HDFS-12925
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12925-HDFS-7240.001.patch, 
> HDFS-12925-HDFS-7240.002.patch, HDFS-12925-HDFS-7240.003.patch, 
> HDFS-12925-HDFS-7240.004.patch
>
>
> One component for versioning is assembling the read IO vector (please see 
> section 4.2 of the [versioning design 
> doc|https://issues.apache.org/jira/secure/attachment/12877154/OzoneVersion.001.pdf]
>  under HDFS-12000 for details). This JIRA adds the util functions that take 
> a list of blocks from different versions and properly generate the read 
> vector for the requested version.






[jira] [Commented] (HDFS-12932) Confusing LOG message for block replication

2017-12-18 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295772#comment-16295772
 ] 

Chen Liang commented on HDFS-12932:
---

Thanks [~csun] for the catch! I think maybe it's better to add a third branch 
to catch the {{==}} case and just log a message saying the replication remains 
unchanged at that value, because the current code always outputs a message for 
all three cases of {{==}}, {{<}} and {{>}}, and I think it's probably better 
not to change that here.
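A minimal sketch of the suggested third branch, written as a helper that returns the would-be log message; the message wording and method names here are illustrative, not the actual FSDirectory code:

```java
/** Sketch of the three-branch replication log message suggested above. */
public class ReplicationLogSketch {

  /** Returns the message the suggested code would log for each case. */
  public static String replicationMessage(short oldBR, short targetReplication,
      String path) {
    if (oldBR > targetReplication) {
      return String.format("Decreasing replication from %d to %d for %s",
          oldBR, targetReplication, path);
    } else if (oldBR < targetReplication) {
      return String.format("Increasing replication from %d to %d for %s",
          oldBR, targetReplication, path);
    } else {
      // New branch: replication unchanged, avoiding the confusing
      // "Increasing replication from 3 to 3" message.
      return String.format("Replication remains unchanged at %d for %s",
          oldBR, path);
    }
  }

  public static void main(String[] args) {
    // The case from the reported log: old and target replication both 3.
    System.out.println(replicationMessage((short) 3, (short) 3, "/foo"));
  }
}
```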

> Confusing LOG message for block replication
> ---
>
> Key: HDFS-12932
> URL: https://issues.apache.org/jira/browse/HDFS-12932
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.8.3
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-12932.0.patch
>
>
> In our cluster we see large number of log messages such as the following:
> {code}
> 2017-12-15 22:55:54,603 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication 
> from 3 to 3 for 
> {code}
> This is a little confusing since "from 3 to 3" is not "increasing". Digging 
> into it, it seems related to this piece of code:
> {code}
> if (oldBR != -1) {
>   if (oldBR > targetReplication) {
> FSDirectory.LOG.info("Decreasing replication from {} to {} for {}",
>  oldBR, targetReplication, iip.getPath());
>   } else {
> FSDirectory.LOG.info("Increasing replication from {} to {} for {}",
>  oldBR, targetReplication, iip.getPath());
>   }
> }
> {code}
> Perhaps a {{oldBR == targetReplication}} case is missing?






[jira] [Commented] (HDFS-12799) Ozone: SCM: Close containers: extend SCMCommandResponseProto with SCMCloseContainerCmdResponseProto

2017-12-19 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297290#comment-16297290
 ] 

Chen Liang commented on HDFS-12799:
---

Thanks [~elek] for the update! +1 on the v005 patch; the failed tests and the 
findbugs warning are unrelated. Will commit it shortly.

> Ozone: SCM: Close containers: extend SCMCommandResponseProto with 
> SCMCloseContainerCmdResponseProto
> ---
>
> Key: HDFS-12799
> URL: https://issues.apache.org/jira/browse/HDFS-12799
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
> Attachments: HDFS-12799-HDFS-7240.001.patch, 
> HDFS-12799-HDFS-7240.002.patch, HDFS-12799-HDFS-7240.003.patch, 
> HDFS-12799-HDFS-7240.004.patch, HDFS-12799-HDFS-7240.005.patch
>
>
> This issue is about extending the HB response protocol between SCM and DN 
> with a command to ask the datanode to close a container. (This is just about 
> extending the protocol, not about fixing the implementation of SCM to handle 
> the state transitions.)






[jira] [Commented] (HDFS-12799) Ozone: SCM: Close containers: extend SCMCommandResponseProto with SCMCloseContainerCmdResponseProto

2017-12-19 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297297#comment-16297297
 ] 

Chen Liang commented on HDFS-12799:
---

Committed to the feature branch, thanks [~elek] for the contribution!

> Ozone: SCM: Close containers: extend SCMCommandResponseProto with 
> SCMCloseContainerCmdResponseProto
> ---
>
> Key: HDFS-12799
> URL: https://issues.apache.org/jira/browse/HDFS-12799
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
> Fix For: ozone
>
> Attachments: HDFS-12799-HDFS-7240.001.patch, 
> HDFS-12799-HDFS-7240.002.patch, HDFS-12799-HDFS-7240.003.patch, 
> HDFS-12799-HDFS-7240.004.patch, HDFS-12799-HDFS-7240.005.patch
>
>
> This issue is about extending the HB response protocol between SCM and DN 
> with a command to ask the datanode to close a container. (This is just about 
> extending the protocol, not about fixing the implementation of SCM to handle 
> the state transitions.)






[jira] [Updated] (HDFS-12799) Ozone: SCM: Close containers: extend SCMCommandResponseProto with SCMCloseContainerCmdResponseProto

2017-12-19 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12799:
--
   Resolution: Fixed
Fix Version/s: ozone
   Status: Resolved  (was: Patch Available)

> Ozone: SCM: Close containers: extend SCMCommandResponseProto with 
> SCMCloseContainerCmdResponseProto
> ---
>
> Key: HDFS-12799
> URL: https://issues.apache.org/jira/browse/HDFS-12799
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
> Fix For: ozone
>
> Attachments: HDFS-12799-HDFS-7240.001.patch, 
> HDFS-12799-HDFS-7240.002.patch, HDFS-12799-HDFS-7240.003.patch, 
> HDFS-12799-HDFS-7240.004.patch, HDFS-12799-HDFS-7240.005.patch
>
>
> This issue is about extending the HB response protocol between SCM and DN 
> with a command to ask the datanode to close a container. (This is just about 
> extending the protocol, not about fixing the implementation of SCM to handle 
> the state transitions.)






[jira] [Commented] (HDFS-12940) Ozone: KSM: TestKeySpaceManager#testExpiredOpenKey fails occasionally

2017-12-19 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297507#comment-16297507
 ] 

Chen Liang commented on HDFS-12940:
---

Thanks [~nandakumar131] for the catch! But when I ran the test locally, I 
still occasionally got an assertion failure on line 1140, which implies that a 
key that should have been deleted still exists. I suspect that in addition to 
the issue you fixed, there can be another issue caused by the other tests: 
since the cleanup service is started when the mini-cluster starts, and multiple 
tests run against it, the exact time of a cleanup check can be arbitrary. So 
it's possible that the second {{Thread.sleep(2000);}} may not actually include 
a cleanup check, given that the cleanup check interval is set to 3 sec.

If this is true, then I think there are two ways to fix it:
1. Instead of sleeping for 2000 ms, first sleep 2000 ms, then create key 5, 
then sleep another 2000 ms. Then for keys 1~4 the sleep is 4000 ms, which is 
guaranteed to have included a check. While for key 5 the sleep is 2000 ms, so 
even if there is a check after its creation, the check won't remove the key, 
because key 5 is guaranteed to have lived <2000 ms by the checking time. We 
may also want to decrease the check interval and cleanup threshold to make 
this test run faster.
2. Make testExpiredOpenKey a separate unit test class, so that we avoid the 
impact of the other tests.
Either way looks good to me.
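The timing argument behind option 1 can be checked with a little arithmetic: with a 3-second check period, a 2-second sleep window may contain no cleanup check at all, while a 4-second window must contain at least one. A small hypothetical sketch (not test code from the patch):

```java
/** Sketch of the sleep-window vs. cleanup-check-period argument above. */
public class CleanupWindowSketch {

  /**
   * Counts cleanup checks (running at t = period, 2*period, ...) that fall
   * inside the window (start, start + len].
   */
  public static int checksInWindow(double start, double len, double period) {
    int count = 0;
    for (double t = period; t <= start + len; t += period) {
      if (t > start) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // With a 3 s check period, a 2 s sleep can miss every check...
    System.out.println(checksInWindow(3.5, 2.0, 3.0)); // prints 0
    // ...but a 4 s sleep always contains at least one, since 4 > 3.
    System.out.println(checksInWindow(3.5, 4.0, 3.0)); // prints 1
  }
}
```

Since the window length 4000 ms exceeds the 3000 ms period, a check is guaranteed regardless of where the window starts; 2000 ms carries no such guarantee.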

> Ozone: KSM: TestKeySpaceManager#testExpiredOpenKey fails occasionally
> -
>
> Key: HDFS-12940
> URL: https://issues.apache.org/jira/browse/HDFS-12940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Nanda kumar
>Assignee: Nanda kumar
> Attachments: HDFS-12940-HDFS-7240.000.patch
>
>
> {{TestKeySpaceManager#testExpiredOpenKey}} is flaky.
> In {{testExpiredOpenKey}} we are opening four keys for writing and waiting 
> for them to expire (without committing). Verification/Assertion is done by 
> querying {{MiniOzoneCluster}} and matching the count. Since the {{cluster}} 
> instance of {{MiniOzoneCluster}} is shared between test-cases in 
> {{TestKeySpaceManager}}, we should not rely on the count. The verification 
> should only happen by matching the keyNames and not with the count.






[jira] [Comment Edited] (HDFS-12940) Ozone: KSM: TestKeySpaceManager#testExpiredOpenKey fails occasionally

2017-12-19 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297507#comment-16297507
 ] 

Chen Liang edited comment on HDFS-12940 at 12/19/17 10:06 PM:
--

Thanks [~nandakumar131] for the catch! But when I ran the test locally, I 
still occasionally got an assertion failure on line 1140, which implies that a 
key that should have been deleted still exists. I suspect that in addition to 
the issue you fixed, there can be another issue caused by the other tests: 
since the cleanup service is started when the mini-cluster starts, and multiple 
tests run against it, the exact time of a cleanup check can be arbitrary. So 
it's possible that the second {{Thread.sleep(2000);}} may not actually include 
a cleanup check, given that the cleanup check interval is set to 3 sec.

If this is true, then I think there are two ways to fix it:
1. Instead of sleeping for 2000 ms, first sleep 2000 ms, then create key 5, 
then sleep another 2000 ms. Then for keys 1~4 the sleep is 4000 ms, which is 
guaranteed to include a check that cleans up keys 1~4. While for key 5 the 
sleep is 2000 ms, so even if there is a check after its creation, the check 
won't remove the key, because key 5 is guaranteed to have lived <2000 ms by 
the checking time. We may also want to decrease the check interval and cleanup 
threshold to make this test run faster.
2. Make testExpiredOpenKey a separate unit test class, so that we avoid the 
impact of the other tests.
Either way looks good to me.


was (Author: vagarychen):
Thanks [~nandakumar131] for the catch! But when I ran test locally, I still 
occasionally got assertion on line 1140, which implies that a key that should 
have been deleted still exists. I suspect that in addition to the issue you 
fixed, there can be another issue due to other tests: since the cleanup service 
got started at the start of minicluster, as multiple tests went on, the exact 
time of clean up check can be arbitrary. So it's possible that the second 
{{Thread.sleep(2000);}} may not actually include a cleanup check, given that 
the clean up check interval is set to 3 sec.

If this is true, then I think there are two ways to fix:
1. instead of sleeping for 2000, first sleep 2000, then create key 5, then 
sleep another 2000. Then for key1~4, the sleep is 4000, which is guaranteed to 
have included a check. While for key 5, the sleep is 2000, so even if there is 
a check after its creation, it guarantees the check won't remove key because 
key 5 is guaranteed to have lived <2000 by the checking time. We may also want 
to decrease the check interval and cleanup threshold to make this test run 
faster.
2. make testExpiredOpenKey a separate unit test class, then we avoid the impact 
from the other tests.
Either way looks good to me.

> Ozone: KSM: TestKeySpaceManager#testExpiredOpenKey fails occasionally
> -
>
> Key: HDFS-12940
> URL: https://issues.apache.org/jira/browse/HDFS-12940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Nanda kumar
>Assignee: Nanda kumar
> Attachments: HDFS-12940-HDFS-7240.000.patch
>
>
> {{TestKeySpaceManager#testExpiredOpenKey}} is flaky.
> In {{testExpiredOpenKey}} we are opening four keys for writing and waiting 
> for them to expire (without committing). Verification/Assertion is done by 
> querying {{MiniOzoneCluster}} and matching the count. Since the {{cluster}} 
> instance of {{MiniOzoneCluster}} is shared between test-cases in 
> {{TestKeySpaceManager}}, we should not rely on the count. The verification 
> should only happen by matching the keyNames and not with the count.






[jira] [Commented] (HDFS-12751) Ozone: SCM: update container allocated size to container db for all the open containers in ContainerStateManager#close

2017-12-20 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298970#comment-16298970
 ] 

Chen Liang commented on HDFS-12751:
---

Thanks [~nandakumar131] for the comment!

bq. 1. Getting all the containers from ContainerStateManager for updating 
allocated bytes.

Thanks for pointing this out, and especially for linking the JIRAs. Let's 
revisit this when HDFS-12522 is done; for now I think this should be fine, as 
it should only affect the time it takes to shut down SCM. I will file 
follow-up JIRAs later.

bq. 2. Exposing setter method for allocated bytes - setAllocatedBytes

Thanks for bringing up this point. I wonder, though: is there a code path 
where SCM will read and rely on a {{ContainerInfo}} instance from the client? 
I was under the impression that the client only reads {{ContainerInfo}}, and 
that whatever the client does with this object will not be picked up by the 
server side.



> Ozone: SCM: update container allocated size to container db for all the open 
> containers in ContainerStateManager#close
> --
>
> Key: HDFS-12751
> URL: https://issues.apache.org/jira/browse/HDFS-12751
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Chen Liang
> Fix For: HDFS-7240
>
> Attachments: HDFS-12751-HDFS-7240.001.patch, 
> HDFS-12751-HDFS-7240.002.patch
>
>
> Container allocated size is maintained in memory by 
> {{ContainerStateManager}}; this has to be updated in the container db when we 
> shut down SCM. {{ContainerStateManager#close}} will be called during SCM 
> shutdown, so updating the allocated size for all the open containers should 
> be done here.






[jira] [Created] (HDFS-12954) Ozone: Container : Add key versioning support-3

2017-12-20 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12954:
-

 Summary: Ozone: Container : Add key versioning support-3
 Key: HDFS-12954
 URL: https://issues.apache.org/jira/browse/HDFS-12954
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Chen Liang
Assignee: Chen Liang


A new version of a key effectively overwrites some consecutive range of bytes 
within the key's overall offset range. For each version, we need to keep track 
of exactly what that range is in order for the IO vector to work.

Currently, since we only write from the start (offset = 0), the offset range 
of a version extends up to the key data size at the time the version gets 
committed. But we currently keep only a single key data size variable (see 
{{KeyManagerImpl#commitKey}}); we need to know the corresponding key data size 
for each version. This JIRA is to add tracking of the offset range for each 
version.
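A hedged sketch of what per-version size tracking could look like, replacing the single key data size variable; all names here are invented for illustration, not the KeyManagerImpl API:

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical sketch: a data size recorded per committed version. */
public class VersionSizeSketch {

  // version -> bytes written from offset 0 by that version.
  private final NavigableMap<Long, Long> sizePerVersion = new TreeMap<>();

  /** On commit, record how many bytes the version covers from offset 0. */
  public void commitVersion(long version, long dataSize) {
    sizePerVersion.put(version, dataSize);
  }

  /**
   * Key size visible when reading at the given version: the maximum extent
   * written by any version up to and including it (an older, longer version
   * still supplies the tail bytes).
   */
  public long sizeAt(long version) {
    long max = 0;
    for (Map.Entry<Long, Long> e
        : sizePerVersion.headMap(version, true).entrySet()) {
      max = Math.max(max, e.getValue());
    }
    return max;
  }

  public static void main(String[] args) {
    VersionSizeSketch key = new VersionSizeSketch();
    key.commitVersion(1, 1024);  // v1 writes 1024 bytes
    key.commitVersion(2, 512);   // v2 rewrites only the first 512 bytes
    // The key is still 1024 bytes at v2; the tail comes from v1.
    System.out.println(key.sizeAt(1) + " " + key.sizeAt(2)); // prints "1024 1024"
  }
}
```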







[jira] [Created] (HDFS-12958) Ozone: remove setAllocatedBytes method in ContainerInfo

2017-12-21 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12958:
-

 Summary: Ozone: remove setAllocatedBytes method in ContainerInfo
 Key: HDFS-12958
 URL: https://issues.apache.org/jira/browse/HDFS-12958
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Priority: Minor


We may want to remove the {{setAllocatedBytes}} method from {{ContainerInfo}} 
and keep all fields of {{ContainerInfo}} immutable, so that the client won't 
accidentally change a {{ContainerInfo}} instance and rely on the changed 
instance.

An alternative to having {{setAllocatedBytes}} is to always create a new 
{{ContainerInfo}} instance whenever it needs to be changed.

This is based on [this 
comment|https://issues.apache.org/jira/browse/HDFS-12751?focusedCommentId=16299750&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16299750]
 from HDFS-12751.
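The copy-on-change alternative can be sketched as follows; the class below is a minimal stand-in for {{ContainerInfo}}, not its real field set or API:

```java
/** Minimal sketch of an immutable info object with copy-on-change. */
public final class ImmutableInfoSketch {

  private final String name;
  private final long allocatedBytes;

  public ImmutableInfoSketch(String name, long allocatedBytes) {
    this.name = name;
    this.allocatedBytes = allocatedBytes;
  }

  public long getAllocatedBytes() {
    return allocatedBytes;
  }

  /** Instead of a setter, return a new instance with the updated field. */
  public ImmutableInfoSketch withAllocatedBytes(long newAllocatedBytes) {
    return new ImmutableInfoSketch(name, newAllocatedBytes);
  }

  public static void main(String[] args) {
    ImmutableInfoSketch a = new ImmutableInfoSketch("c1", 100);
    ImmutableInfoSketch b = a.withAllocatedBytes(200);
    // The original instance is untouched, so a client holding `a`
    // cannot accidentally mutate state the server relies on.
    System.out.println(a.getAllocatedBytes() + " " + b.getAllocatedBytes()); // prints "100 200"
  }
}
```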






[jira] [Commented] (HDFS-12751) Ozone: SCM: update container allocated size to container db for all the open containers in ContainerStateManager#close

2017-12-21 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16300395#comment-16300395
 ] 

Chen Liang commented on HDFS-12751:
---

Thanks for the clarification [~nandakumar131]. Filed HDFS-12958 as a follow-up.

> Ozone: SCM: update container allocated size to container db for all the open 
> containers in ContainerStateManager#close
> --
>
> Key: HDFS-12751
> URL: https://issues.apache.org/jira/browse/HDFS-12751
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Chen Liang
> Fix For: HDFS-7240
>
> Attachments: HDFS-12751-HDFS-7240.001.patch, 
> HDFS-12751-HDFS-7240.002.patch
>
>
> Container allocated size is maintained in memory by 
> {{ContainerStateManager}}; this has to be updated in the container db when we 
> shut down SCM. {{ContainerStateManager#close}} will be called during SCM 
> shutdown, so updating the allocated size for all the open containers should 
> be done here.






[jira] [Assigned] (HDFS-12958) Ozone: remove setAllocatedBytes method in ContainerInfo

2017-12-21 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang reassigned HDFS-12958:
-

Assignee: Chen Liang

> Ozone: remove setAllocatedBytes method in ContainerInfo
> ---
>
> Key: HDFS-12958
> URL: https://issues.apache.org/jira/browse/HDFS-12958
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Minor
>
> We may want to remove the {{setAllocatedBytes}} method from {{ContainerInfo}} 
> and keep all fields of {{ContainerInfo}} immutable, so that the client won't 
> accidentally change a {{ContainerInfo}} instance and rely on the changed 
> instance.
> An alternative to having {{setAllocatedBytes}} is to always create a new 
> {{ContainerInfo}} instance whenever it needs to be changed.
> This is based on [this 
> comment|https://issues.apache.org/jira/browse/HDFS-12751?focusedCommentId=16299750&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16299750]
>  from HDFS-12751.






[jira] [Commented] (HDFS-12951) Incorrect javadoc in SaslDataTransferServer.java#receive

2017-12-21 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16300466#comment-16300466
 ] 

Chen Liang commented on HDFS-12951:
---

Thanks [~msingh] for the catch; the failed tests and findbugs warnings are 
unrelated. +1 on the v001 patch, I've committed it to trunk. Thanks for the 
contribution.

> Incorrect javadoc in SaslDataTransferServer.java#receive 
> -
>
> Key: HDFS-12951
> URL: https://issues.apache.org/jira/browse/HDFS-12951
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Fix For: 3.1.0
>
> Attachments: HDFS-12951.001.patch
>
>
> The javadoc for the receive method incorrectly includes "int" in the 4th 
> parameter tag. This should be corrected to remove the int.
> {code}
>   /**
>* Receives SASL negotiation from a peer on behalf of a server.
>*
>* @param peer connection peer
>* @param underlyingOut connection output stream
>* @param underlyingIn connection input stream
>* @param int xferPort data transfer port of DataNode accepting connection
>* @param datanodeId ID of DataNode accepting connection
>* @return new pair of streams, wrapped after SASL negotiation
>* @throws IOException for any error
>*/
> {code}






[jira] [Updated] (HDFS-12951) Incorrect javadoc in SaslDataTransferServer.java#receive

2017-12-21 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12951:
--
   Resolution: Fixed
Fix Version/s: 3.1.0
   Status: Resolved  (was: Patch Available)

> Incorrect javadoc in SaslDataTransferServer.java#receive 
> -
>
> Key: HDFS-12951
> URL: https://issues.apache.org/jira/browse/HDFS-12951
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Fix For: 3.1.0
>
> Attachments: HDFS-12951.001.patch
>
>
> The javadoc for the {{receive}} method incorrectly includes "int" in the 4th 
> {{@param}} tag. This should be corrected by removing the "int".
> {code}
>   /**
>* Receives SASL negotiation from a peer on behalf of a server.
>*
>* @param peer connection peer
>* @param underlyingOut connection output stream
>* @param underlyingIn connection input stream
>* @param int xferPort data transfer port of DataNode accepting connection
>* @param datanodeId ID of DataNode accepting connection
>* @return new pair of streams, wrapped after SASL negotiation
>* @throws IOException for any error
>*/
> {code}






[jira] [Updated] (HDFS-12958) Ozone: remove setAllocatedBytes method in ContainerInfo

2017-12-21 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12958:
--
Status: Patch Available  (was: Open)

> Ozone: remove setAllocatedBytes method in ContainerInfo
> ---
>
> Key: HDFS-12958
> URL: https://issues.apache.org/jira/browse/HDFS-12958
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Minor
> Attachments: HDFS-12958-HDFS-7240.001.patch
>
>
> We may want to remove the {{setAllocatedBytes}} method from {{ContainerInfo}} 
> and keep all fields of {{ContainerInfo}} immutable, so that clients won't 
> accidentally change a {{ContainerInfo}} and rely on the changed instance.
> An alternative to having {{setAllocatedBytes}} is to always create a new 
> {{ContainerInfo}} instance whenever it needs to be changed.
> This is based on [this 
> comment|https://issues.apache.org/jira/browse/HDFS-12751?focusedCommentId=16299750&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16299750]
>  from HDFS-12751.
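A minimal sketch of the copy-on-write alternative described above, using a simplified hypothetical class (the field and method names are illustrative, not the actual Ozone {{ContainerInfo}}):

```java
// Simplified, hypothetical sketch of an immutable ContainerInfo: instead of a
// setAllocatedBytes mutator, a copy method returns a new instance with the
// updated value, so callers can never observe a change to an existing instance.
public final class ContainerInfoSketch {
    private final String containerName;
    private final long allocatedBytes;

    public ContainerInfoSketch(String containerName, long allocatedBytes) {
        this.containerName = containerName;
        this.allocatedBytes = allocatedBytes;
    }

    public long getAllocatedBytes() {
        return allocatedBytes;
    }

    // Copy-on-write replacement for setAllocatedBytes.
    public ContainerInfoSketch withAllocatedBytes(long newAllocatedBytes) {
        return new ContainerInfoSketch(containerName, newAllocatedBytes);
    }

    public static void main(String[] args) {
        ContainerInfoSketch original = new ContainerInfoSketch("c1", 0L);
        ContainerInfoSketch updated = original.withAllocatedBytes(1024L);
        // The original instance is untouched.
        System.out.println(original.getAllocatedBytes()); // 0
        System.out.println(updated.getAllocatedBytes());  // 1024
    }
}
```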






[jira] [Updated] (HDFS-12958) Ozone: remove setAllocatedBytes method in ContainerInfo

2017-12-21 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12958:
--
Attachment: HDFS-12958-HDFS-7240.001.patch

Post v001 patch. [~nandakumar131] would you mind taking a look? Thanks!

> Ozone: remove setAllocatedBytes method in ContainerInfo
> ---
>
> Key: HDFS-12958
> URL: https://issues.apache.org/jira/browse/HDFS-12958
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Minor
> Attachments: HDFS-12958-HDFS-7240.001.patch
>
>
> We may want to remove the {{setAllocatedBytes}} method from {{ContainerInfo}} 
> and keep all fields of {{ContainerInfo}} immutable, so that clients won't 
> accidentally change a {{ContainerInfo}} and rely on the changed instance.
> An alternative to having {{setAllocatedBytes}} is to always create a new 
> {{ContainerInfo}} instance whenever it needs to be changed.
> This is based on [this 
> comment|https://issues.apache.org/jira/browse/HDFS-12751?focusedCommentId=16299750&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16299750]
>  from HDFS-12751.






[jira] [Work started] (HDFS-12954) Ozone: Container : Add key versioning support-3

2017-12-21 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-12954 started by Chen Liang.
-
> Ozone: Container : Add key versioning support-3
> ---
>
> Key: HDFS-12954
> URL: https://issues.apache.org/jira/browse/HDFS-12954
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12954-HDFS-7240.001.patch
>
>
> A new version of a key effectively overwrites some consecutive range of 
> bytes within the entire key offset range. For each version, we need to record 
> exactly what that range is in order for the IO vector to work.
> Currently, since we only write from the start (offset = 0), the offset range 
> of a version extends only up to the key data size when the version gets 
> committed. But we currently keep only a single key data size variable (see 
> {{KeyManagerImpl#commitKey}}); we need to know the corresponding key data 
> size for each version. This JIRA is to track the offset range of each 
> version.






[jira] [Updated] (HDFS-12954) Ozone: Container : Add key versioning support-3

2017-12-21 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12954:
--
Attachment: HDFS-12954-HDFS-7240.001.patch

> Ozone: Container : Add key versioning support-3
> ---
>
> Key: HDFS-12954
> URL: https://issues.apache.org/jira/browse/HDFS-12954
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12954-HDFS-7240.001.patch
>
>
> A new version of a key effectively overwrites some consecutive range of 
> bytes within the entire key offset range. For each version, we need to record 
> exactly what that range is in order for the IO vector to work.
> Currently, since we only write from the start (offset = 0), the offset range 
> of a version extends only up to the key data size when the version gets 
> committed. But we currently keep only a single key data size variable (see 
> {{KeyManagerImpl#commitKey}}); we need to know the corresponding key data 
> size for each version. This JIRA is to track the offset range of each 
> version.






[jira] [Commented] (HDFS-12940) Ozone: KSM: TestKeySpaceManager#testExpiredOpenKey fails occasionally

2018-01-02 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308660#comment-16308660
 ] 

Chen Liang commented on HDFS-12940:
---

[~nandakumar131] how often did you see these tests fail? I ran 
{{TestKeySpaceManager}} about twenty times; the only failure I saw was a single 
run where {{testExpiredOpenKey}} failed on 
{{Assert.assertFalse(removed.contains(keyName));}}.

If {{testWriteSize}} and {{testListKeys}} fail with {{error:KEY_NOT_FOUND}}, 
I would guess it is because these tests are slow and take several seconds to 
run, and since {{OZONE_OPEN_KEY_CLEANUP_SERVICE_INTERVAL_SECONDS}} is also only 
a couple of seconds, those keys got cleaned up in the meantime. If that is the 
case, it is probably best to move {{testExpiredOpenKey}} into a separate test 
class and leave {{OZONE_OPEN_KEY_CLEANUP_SERVICE_INTERVAL_SECONDS}} in 
{{TestKeySpaceManager}} at its default value, which is very long.
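The verification style the description below suggests, matching key names rather than counts so a shared cluster doesn't skew results, could look like this hedged sketch. Plain collections stand in for the Ozone client API; the class and method names are illustrative:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class KeyNameAssertionSketch {
    // Fragile: depends on how many keys other tests left behind in the
    // shared cluster.
    static boolean verifyByCount(List<String> expiredKeys, int expectedCount) {
        return expiredKeys.size() == expectedCount;
    }

    // Robust: checks only that the keys this test opened are present.
    static boolean verifyByNames(List<String> expiredKeys, Set<String> ourKeys) {
        return new HashSet<>(expiredKeys).containsAll(ourKeys);
    }

    public static void main(String[] args) {
        // "leftover" simulates a key left open by another test case.
        List<String> expired = List.of("leftover", "key1", "key2");
        Set<String> ourKeys = Set.of("key1", "key2");
        System.out.println(verifyByCount(expired, 2));       // false: count skewed
        System.out.println(verifyByNames(expired, ourKeys)); // true
    }
}
```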

> Ozone: KSM: TestKeySpaceManager#testExpiredOpenKey fails occasionally
> -
>
> Key: HDFS-12940
> URL: https://issues.apache.org/jira/browse/HDFS-12940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Nanda kumar
>Assignee: Nanda kumar
> Attachments: HDFS-12940-HDFS-7240.000.patch, 
> HDFS-12940-HDFS-7240.001.patch
>
>
> {{TestKeySpaceManager#testExpiredOpenKey}} is flaky.
> In {{testExpiredOpenKey}} we open four keys for writing and wait for 
> them to expire (without committing). Verification/assertion is done by 
> querying {{MiniOzoneCluster}} and matching the count. Since the {{cluster}} 
> instance of {{MiniOzoneCluster}} is shared between test-cases in 
> {{TestKeySpaceManager}}, we should not rely on the count; the verification 
> should match the keyNames, not the count.






[jira] [Commented] (HDFS-12980) Ozone: SCM: Restructuring container state transition and events

2018-01-02 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308761#comment-16308761
 ] 

Chen Liang commented on HDFS-12980:
---

Thanks for working on this [~nandakumar131]! Looking at the state 
machine, it seems there is no longer an edge from OPEN to DELETING. Does this 
mean we will no longer be able to delete an open container, but have to wait 
until it is closed?

> Ozone: SCM: Restructuring container state transition and events
> ---
>
> Key: HDFS-12980
> URL: https://issues.apache.org/jira/browse/HDFS-12980
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Nanda kumar
> Attachments: HDFS-12980-HDFS-7240.000.patch
>
>
> Existing state transition of containers
> {noformat}
> ALLOCATED --> CREATING --> OPEN 
> -> PENDING_CLOSE --> CLOSED
>   (BEGIN_CREATE) |  (COMPLETE_CREATE)|
> (FULL_CONTAINER)(CLOSE)
>  |   |
>  |   |
>  | (TIMEOUT) | (DELETE)
>  |   |
>  +-> DELETING <--+
> |
> |
> | (CLEANUP)
> |
> DELETED
> {noformat}
> We don't have support for deleting a CLOSED container in this.
> *Proposal*
> {noformat}
>  
> [ALLOCATED]--->[CREATING]->[OPEN]-->[CLOSING]--->[CLOSED]
> (CREATE) |(CREATED)   (FINALIZE)  (CLOSE)|
>  |   |
>  |   |
>  |(TIMEOUT)  (DELETE)|
>  |   |
>  +--> [DELETING] <---+
>|
>|
>   (CLEANUP)|
>|
>[DELETED]
> {noformat}
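The proposed transitions can be sketched as a transition table. This is an illustrative stand-alone model of the diagram above, not the actual SCM state-machine code:

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

public class ContainerStateSketch {
    enum State { ALLOCATED, CREATING, OPEN, CLOSING, CLOSED, DELETING, DELETED }

    // Legal transitions from the proposed diagram:
    // ALLOCATED -(CREATE)-> CREATING -(CREATED)-> OPEN -(FINALIZE)-> CLOSING
    //   -(CLOSE)-> CLOSED; CREATING -(TIMEOUT)-> DELETING;
    // CLOSED -(DELETE)-> DELETING -(CLEANUP)-> DELETED.
    static final Map<State, EnumSet<State>> NEXT = new EnumMap<>(State.class);
    static {
        NEXT.put(State.ALLOCATED, EnumSet.of(State.CREATING));
        NEXT.put(State.CREATING, EnumSet.of(State.OPEN, State.DELETING));
        NEXT.put(State.OPEN, EnumSet.of(State.CLOSING));
        NEXT.put(State.CLOSING, EnumSet.of(State.CLOSED));
        NEXT.put(State.CLOSED, EnumSet.of(State.DELETING));
        NEXT.put(State.DELETING, EnumSet.of(State.DELETED));
        NEXT.put(State.DELETED, EnumSet.noneOf(State.class));
    }

    static boolean canTransition(State from, State to) {
        return NEXT.get(from).contains(to);
    }

    public static void main(String[] args) {
        System.out.println(canTransition(State.OPEN, State.CLOSING));  // true
        // Note there is no OPEN -> DELETING edge in the proposal.
        System.out.println(canTransition(State.OPEN, State.DELETING)); // false
    }
}
```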






[jira] [Commented] (HDFS-12980) Ozone: SCM: Restructuring container state transition and events

2018-01-02 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308880#comment-16308880
 ] 

Chen Liang commented on HDFS-12980:
---

Thanks for the illustration [~anu]! Still, is there a reason why we 
don't allow a {{DELETE}} on a container that is already created and is in the 
{{OPEN}} state? It looks fine to me to have an edge from the {{OPEN}} state (and 
probably also {{CLOSING}}) to the {{DELETING}} state, while still keeping 
{{delete}} idempotent.

> Ozone: SCM: Restructuring container state transition and events
> ---
>
> Key: HDFS-12980
> URL: https://issues.apache.org/jira/browse/HDFS-12980
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Nanda kumar
> Attachments: HDFS-12980-HDFS-7240.000.patch
>
>
> Existing state transition of containers
> {noformat}
> ALLOCATED --> CREATING --> OPEN 
> -> PENDING_CLOSE --> CLOSED
>   (BEGIN_CREATE) |  (COMPLETE_CREATE)|
> (FULL_CONTAINER)(CLOSE)
>  |   |
>  |   |
>  | (TIMEOUT) | (DELETE)
>  |   |
>  +-> DELETING <--+
> |
> |
> | (CLEANUP)
> |
> DELETED
> {noformat}
> We don't have support for deleting a CLOSED container in this.
> *Proposal*
> {noformat}
>  
> [ALLOCATED]--->[CREATING]->[OPEN]-->[CLOSING]--->[CLOSED]
> (CREATE) |(CREATED)   (FINALIZE)  (CLOSE)|
>  |   |
>  |   |
>  |(TIMEOUT)  (DELETE)|
>  |   |
>  +--> [DELETING] <---+
>|
>|
>   (CLEANUP)|
>|
>[DELETED]
> {noformat}






[jira] [Commented] (HDFS-12980) Ozone: SCM: Restructuring container state transition and events

2018-01-02 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308903#comment-16308903
 ] 

Chen Liang commented on HDFS-12980:
---

Thanks [~anu] for the follow-up explanation! I was also thinking it might 
actually be the right thing to only allow deleting containers that are in an 
immutable state. Makes sense to me.

> Ozone: SCM: Restructuring container state transition and events
> ---
>
> Key: HDFS-12980
> URL: https://issues.apache.org/jira/browse/HDFS-12980
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Nanda kumar
> Attachments: HDFS-12980-HDFS-7240.000.patch
>
>
> Existing state transition of containers
> {noformat}
> ALLOCATED --> CREATING --> OPEN 
> -> PENDING_CLOSE --> CLOSED
>   (BEGIN_CREATE) |  (COMPLETE_CREATE)|
> (FULL_CONTAINER)(CLOSE)
>  |   |
>  |   |
>  | (TIMEOUT) | (DELETE)
>  |   |
>  +-> DELETING <--+
> |
> |
> | (CLEANUP)
> |
> DELETED
> {noformat}
> We don't have support for deleting a CLOSED container in this.
> *Proposal*
> {noformat}
>  
> [ALLOCATED]--->[CREATING]->[OPEN]-->[CLOSING]--->[CLOSED]
> (CREATE) |(CREATED)   (FINALIZE)  (CLOSE)|
>  |   |
>  |   |
>  |(TIMEOUT)  (DELETE)|
>  |   |
>  +--> [DELETING] <---+
>|
>|
>   (CLEANUP)|
>|
>[DELETED]
> {noformat}






[jira] [Commented] (HDFS-15567) [SBN Read] HDFS should expose msync() API to allow downstream applications call it explicetly.

2020-10-08 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210499#comment-17210499
 ] 

Chen Liang commented on HDFS-15567:
---

Thanks for working on this [~shv]! Some comments:

1. Currently, calling {{AbstractFileSystem}}'s {{msync}} throws 
UnsupportedOperationException. I was wondering whether it should throw 
UnsupportedOperationException or just be a no-op (similarly for 
{{FileSystem}}'s {{msync}}). I think making it a no-op might be better; any 
thoughts?
 2. Is the change in {{MiniDFSCluster.java}} really needed?
 3. {{testMsyncFileContext}} has a LOG.info call that seems unnecessary. Also, 
it looks like it only tests FileContext; should we also test FileSystem?
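The no-op vs. throw trade-off in point 1 can be sketched with a minimal hypothetical file-system base class. These are illustrative stand-ins, not the actual Hadoop {{FileSystem}} / {{DistributedFileSystem}} classes:

```java
public class MsyncDefaultSketch {
    // Hypothetical stand-in for an abstract file-system base class.
    abstract static class BaseFs {
        // Option A: default no-op -- callers may invoke msync() unconditionally,
        // with no effect on file systems that have no Active/Observer split.
        void msync() { /* no-op by default */ }

        // Option B (not taken here): throwing UnsupportedOperationException
        // instead would force callers to catch or type-check first.
    }

    static class LocalLikeFs extends BaseFs { }

    static class ObserverAwareFs extends BaseFs {
        boolean synced = false;
        @Override
        void msync() { synced = true; } // would sync state ID with the Active NN
    }

    public static void main(String[] args) {
        BaseFs[] all = { new LocalLikeFs(), new ObserverAwareFs() };
        for (BaseFs fs : all) {
            fs.msync(); // safe for every implementation under option A
        }
        System.out.println(((ObserverAwareFs) all[1]).synced); // true
    }
}
```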

> [SBN Read] HDFS should expose msync() API to allow downstream applications 
> call it explicetly.
> --
>
> Key: HDFS-15567
> URL: https://issues.apache.org/jira/browse/HDFS-15567
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, hdfs-client
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15567.001.patch, HDFS-15567.002.patch
>
>
> Consistent Reads from Standby (HDFS-13688) introduced the {{msync()}} API, which 
> updates the client's state ID with the current state of the Active NameNode to 
> guarantee consistency of subsequent calls to an ObserverNode. Currently this 
> API is exposed via {{DFSClient}} only, which makes it hard for applications 
> to access {{msync()}}. One workaround is something like this:
> {code}
> if(fs instanceof DistributedFileSystem) {
>   ((DistributedFileSystem)fs).getClient().msync();
> }
> {code}
> This should be exposed both for {{FileSystem}} and {{FileContext}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15567) [SBN Read] HDFS should expose msync() API to allow downstream applications call it explicetly.

2020-10-12 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212685#comment-17212685
 ] 

Chen Liang commented on HDFS-15567:
---

Thanks [~shv]. Makes sense, +1 on v002 patch.

> [SBN Read] HDFS should expose msync() API to allow downstream applications 
> call it explicetly.
> --
>
> Key: HDFS-15567
> URL: https://issues.apache.org/jira/browse/HDFS-15567
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, hdfs-client
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15567.001.patch, HDFS-15567.002.patch
>
>
> Consistent Reads from Standby (HDFS-13688) introduced the {{msync()}} API, which 
> updates the client's state ID with the current state of the Active NameNode to 
> guarantee consistency of subsequent calls to an ObserverNode. Currently this 
> API is exposed via {{DFSClient}} only, which makes it hard for applications 
> to access {{msync()}}. One workaround is something like this:
> {code}
> if(fs instanceof DistributedFileSystem) {
>   ((DistributedFileSystem)fs).getClient().msync();
> }
> {code}
> This should be exposed both for {{FileSystem}} and {{FileContext}}.






[jira] [Commented] (HDFS-15665) Balancer logging improvement

2020-11-02 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17225011#comment-17225011
 ] 

Chen Liang commented on HDFS-15665:
---

Thanks for working on this [~shv]! The v001 patch looks good to me. Just two 
minor comments:

1. The {{getInt}} call at Balancer.java line 286 seems redundant; no variable 
takes that value.
2. At Balancer.java lines 663 and 665, would it be better to merge the two 
LOG.info calls into one?

> Balancer logging improvement
> 
>
> Key: HDFS-15665
> URL: https://issues.apache.org/jira/browse/HDFS-15665
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15665.001.patch
>
>
> It would be good to have Balancer log all relevant configuration parameters 
> on each iteration along with some data, which reflects its progress and the 
> amount of resources it involves.






[jira] [Commented] (HDFS-15665) Balancer logging improvement

2020-11-03 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17225636#comment-17225636
 ] 

Chen Liang commented on HDFS-15665:
---

Thanks for the clarification [~shv], +1 on the v002 patch.

> Balancer logging improvement
> 
>
> Key: HDFS-15665
> URL: https://issues.apache.org/jira/browse/HDFS-15665
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15665.001.patch, HDFS-15665.002.patch
>
>
> It would be good to have Balancer log all relevant configuration parameters 
> on each iteration along with some data, which reflects its progress and the 
> amount of resources it involves.






[jira] [Updated] (HDFS-14272) [SBN read] ObserverReadProxyProvider should sync with active txnID on startup

2020-12-15 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14272:
--
Fix Version/s: 2.10.0
   3.2.1
   3.1.3

> [SBN read] ObserverReadProxyProvider should sync with active txnID on startup
> -
>
> Key: HDFS-14272
> URL: https://issues.apache.org/jira/browse/HDFS-14272
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
> Environment: CDH6.1 (Hadoop 3.0.x) + Consistency Reads from Standby + 
> SSL + Kerberos + RPC encryption
>Reporter: Wei-Chiu Chuang
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 2.10.0, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14272.000.patch, HDFS-14272.001.patch, 
> HDFS-14272.002.patch
>
>
> It is typical for integration tests to create some files and then check their 
> existence. For example, like the following simple bash script:
> {code:java}
> # hdfs dfs -touchz /tmp/abc
> # hdfs dfs -ls /tmp/abc
> {code}
> The test executes HDFS bash command sequentially, but it may fail with 
> Consistent Standby Read because the -ls does not find the file.
> Analysis: the second bash command, while launched sequentially after the 
> first one, is not aware of the state id returned from the first bash command. 
> So the ObserverNode wouldn't wait for the edits to propagate, and the command 
> fails.
> I've got a cluster where the Observer has tens of seconds of RPC latency, and 
> this becomes very annoying. (I am still trying to figure out why this 
> Observer has such a long RPC latency. But that's another story.)






[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-10-09 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947906#comment-16947906
 ] 

Chen Liang commented on HDFS-14509:
---

Hey [~John Smith], after reading through the previous comments again, I'm under 
the impression that you have a 2.x patch. If that is correct, would you mind 
sharing a branch-2 patch? If it is just a matter of casting to 
{{DataInputStream}}, I could put up a patch as well.

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch, HDFS-14509-002.patch, 
> HDFS-14509-003.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need 
> to upgrade the NN first, so there will be an intermediate state where the NN 
> is 3.x and the DN is 2.x. At that moment, if a client reads (or writes) a 
> block, it will get a block token from the NN and then deliver the token to 
> the DN, which verifies the token. But the verification in the code is currently:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(new 
> ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>   + " doesn't have the correct token password");
> }
> }
> {code} 
> And {{retrievePassword(id)}} is:
> {code} 
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code} 
> So, if the NN's identifier adds new fields, the DN will lose those fields and 
> compute the wrong password.
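The mismatch can be demonstrated with a hedged sketch: if the DN re-serializes only the identifier fields it knows, the bytes it feeds into the HMAC differ from what the NN signed, so the computed password differs. HmacSHA1 and the pipe-delimited strings are chosen purely for illustration; the real block token code has its own algorithm and serialization:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class TokenPasswordSketch {
    // Analogous to createPassword(identifier.getBytes(), key.getKey()).
    static byte[] createPassword(byte[] identifier, byte[] key) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(key, "HmacSHA1"));
        return mac.doFinal(identifier);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = "secret-block-key".getBytes(StandardCharsets.UTF_8);

        // NN (3.x) serializes the identifier with a new trailing field.
        byte[] nnIdentifier =
            "expiry|user|blockId|newField".getBytes(StandardCharsets.UTF_8);
        byte[] nnPassword = createPassword(nnIdentifier, key);

        // DN (2.x) re-serializes only the fields it knows, dropping the new one.
        byte[] dnIdentifier =
            "expiry|user|blockId".getBytes(StandardCharsets.UTF_8);
        byte[] dnPassword = createPassword(dnIdentifier, key);

        // The passwords differ, so checkAccess throws InvalidToken.
        System.out.println(Arrays.equals(nnPassword, dnPassword)); // false
    }
}
```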






[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-10-10 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948927#comment-16948927
 ] 

Chen Liang commented on HDFS-14509:
---

Thanks for the update [~John Smith]! The change makes sense to me. I think we 
should be able to remove the {{instanceof}} check, since for block access tokens 
the stream always seems to be a {{DataInputStream}}, but I also don't like 
casting without a check. I also verified the two failed tests; both passed in my 
local run. Committed to branch-2. Thanks for the contribution, Yuxuan!

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch, HDFS-14509-002.patch, 
> HDFS-14509-003.patch, HDFS-14509-branch-2.001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need 
> to upgrade the NN first, so there will be an intermediate state where the NN 
> is 3.x and the DN is 2.x. At that moment, if a client reads (or writes) a 
> block, it will get a block token from the NN and then deliver the token to 
> the DN, which verifies the token. But the verification in the code is currently:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(new 
> ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>   + " doesn't have the correct token password");
> }
> }
> {code} 
> And {{retrievePassword(id)}} is:
> {code} 
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code} 
> So, if the NN's identifier adds new fields, the DN will lose those fields and 
> compute the wrong password.






[jira] [Updated] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-10-10 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14509:
--
   Fix Version/s: 3.2.2
  3.1.4
  3.3.0
  2.10.0
Target Version/s: 3.1.3, 2.10.0, 3.3.0, 3.2.2  (was: 2.10.0, 3.3.0, 3.1.3, 
3.2.2)
Assignee: Yuxuan Wang
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Fix For: 2.10.0, 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14509-001.patch, HDFS-14509-002.patch, 
> HDFS-14509-003.patch, HDFS-14509-branch-2.001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need 
> to upgrade the NN first, so there will be an intermediate state where the NN 
> is 3.x and the DN is 2.x. At that moment, if a client reads (or writes) a 
> block, it will get a block token from the NN and then deliver the token to 
> the DN, which verifies the token. But the verification in the code is currently:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(new 
> ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>   + " doesn't have the correct token password");
> }
> }
> {code} 
> And {{retrievePassword(id)}} is:
> {code} 
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code} 
> So, if the NN's identifier adds new fields, the DN will lose those fields and 
> compute the wrong password.






[jira] [Commented] (HDFS-13081) Datanode#checkSecureConfig should allow SASL and privileged HTTP

2019-10-15 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952282#comment-16952282
 ] 

Chen Liang commented on HDFS-13081:
---

Hey folks, any plan to backport this to branch-2? I can try to do the backport 
if there are no objections/concerns.

> Datanode#checkSecureConfig should allow SASL and privileged HTTP
> 
>
> Key: HDFS-13081
> URL: https://issues.apache.org/jira/browse/HDFS-13081
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, security
>Affects Versions: 3.0.0
>Reporter: Xiaoyu Yao
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 3.1.0, 3.0.3
>
> Attachments: HDFS-13081.000.patch, HDFS-13081.001.patch, 
> HDFS-13081.002.patch, HDFS-13081.003.patch, HDFS-13081.004.patch, 
> HDFS-13081.005.patch, HDFS-13081.006.patch
>
>
> Datanode#checkSecureConfig currently checks the following to determine if 
> secure datanode is enabled: 
>  # The server has bound to privileged ports for RPC and HTTP via 
> SecureDataNodeStarter.
>  # The configuration enables SASL on DataTransferProtocol and HTTPS (no plain 
> HTTP) for the HTTP server. 
> Authentication of the DataNode RPC server can be done either via SASL 
> handshake or a JSVC/privileged RPC port. 
> This guarantees authentication of the DataNode RPC server before a client 
> transmits a secret, such as a block access token. 
> Authentication of the HTTP server can also be done either via HTTPS/SSL or a 
> JSVC/privileged HTTP port. This guarantees authentication of the DataNode HTTP 
> server before a client transmits a secret, such as a delegation token.
> This ticket is open to allow privileged HTTP as an alternative to HTTPS to 
> work with SASL based RPC protection.
>  
> cc: [~cnauroth] , [~daryn], [~jnpandey] for additional feedback.
>  






[jira] [Commented] (HDFS-14937) [SBN read] ObserverReadProxyProvider should throw InterruptException

2019-10-28 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961248#comment-16961248
 ] 

Chen Liang commented on HDFS-14937:
---

Thanks for reporting this [~xuzq_zander]. The v001 patch makes sense to me. Can 
we add a log message though, to explicitly indicate it was interrupted? Also, 
checking {{Thread.currentThread().isInterrupted()}} seems unusual; is it 
possible to check for, say, something like {{InterruptedException}} instead?

> [SBN read] ObserverReadProxyProvider should throw InterruptException
> 
>
> Key: HDFS-14937
> URL: https://issues.apache.org/jira/browse/HDFS-14937
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: xuzq
>Assignee: xuzq
>Priority: Major
> Attachments: HDFS-14937-trunk-001.patch
>
>
> ObserverReadProxyProvider should throw an InterruptedException immediately if 
> an Observer catches an InterruptedException while invoking.
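The idea above can be sketched as follows. This is a hedged illustration, not the actual ObserverReadProxyProvider code; the `Call` interface and method names are invented for the example, and only the interrupt-propagation behavior reflects the ticket:

```java
import java.io.IOException;
import java.util.List;

// Sketch: when trying Observers in turn, an InterruptedException should
// abort the loop immediately instead of failing over to the next Observer.
public class ObserverInvokeSketch {

    // Stand-in for an RPC invocation against one Observer proxy.
    interface Call {
        Object run() throws Exception;
    }

    static Object invokeOnObservers(List<Call> observers) throws Exception {
        for (Call observer : observers) {
            try {
                return observer.run();
            } catch (InterruptedException e) {
                // Propagate immediately; do not try the next Observer.
                throw e;
            } catch (Exception e) {
                // Any other failure: fall through to the next Observer.
            }
        }
        throw new IOException("No Observer could serve the call");
    }
}
```

With this shape, an ordinary failure on one Observer still fails over to the next, while an interrupt surfaces to the caller right away.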






[jira] [Updated] (HDFS-13541) NameNode Port based selective encryption

2019-10-28 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-13541:
--
Release Note: 
This feature allows HDFS to selectively enforce encryption for both RPC 
(NameNode) and data transfer (DataNode). With this feature enabled, NameNode 
can listen on multiple ports, and different ports can have different security 
configurations. Depending on which NameNode port clients connect to, the RPC 
calls and the subsequent data transfer will enforce the security configuration 
corresponding to that NameNode port. This can help when there is a requirement 
to enforce different security policies depending on where the clients are 
connecting from.

This can be enabled by setting `hadoop.security.saslproperties.resolver.class` 
to `org.apache.hadoop.security.IngressPortBasedResolver`, adding the 
additional NameNode auxiliary ports via 
`dfs.namenode.rpc-address.auxiliary-ports`, and configuring the security of 
individual ports via `ingress.port.sasl.configured.ports`.
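A configuration fragment tying these three settings together might look like the sketch below. Only the property names come from this release note; the auxiliary port number (9001) and the placement of each property are illustrative assumptions:

```xml
<!-- Illustrative fragment; port 9001 is a made-up example value. -->
<property>
  <name>hadoop.security.saslproperties.resolver.class</name>
  <value>org.apache.hadoop.security.IngressPortBasedResolver</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.auxiliary-ports</name>
  <value>9001</value>
</property>
<property>
  <name>ingress.port.sasl.configured.ports</name>
  <value>9001</value>
</property>
```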

> NameNode Port based selective encryption
> 
>
> Key: HDFS-13541
> URL: https://issues.apache.org/jira/browse/HDFS-13541
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode, security
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: release-blocker
> Attachments: HDFS-13541-branch-2.001.patch, 
> HDFS-13541-branch-2.002.patch, HDFS-13541-branch-2.003.patch, 
> HDFS-13541-branch-3.1.001.patch, HDFS-13541-branch-3.1.002.patch, 
> HDFS-13541-branch-3.2.001.patch, HDFS-13541-branch-3.2.002.patch, NameNode 
> Port based selective encryption-v1.pdf
>
>
> Here at LinkedIn, one issue we face is that we need to enforce different 
> security requirements based on the location of the client relative to the 
> cluster. Specifically, for clients from outside the data center, regulation 
> requires that all traffic be encrypted. But for clients within the same data 
> center, unencrypted connections are preferred to avoid the high encryption 
> overhead. 
> HADOOP-10221 introduced a pluggable SASL resolver, on top of which 
> HADOOP-10335 introduced WhitelistBasedResolver, which solves the same 
> problem. However, we found it difficult to fit into our environment for 
> several reasons. In this JIRA, on top of the pluggable SASL resolver, *we 
> propose a different approach of running RPC on two ports on the NameNode, 
> where the two ports enforce encrypted and unencrypted connections 
> respectively, and the subsequent DataNode access simply follows the same 
> encryption behaviour*. Then, by blocking the unencrypted port at the data 
> center firewall, we can completely block unencrypted external access.






[jira] [Created] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-10-29 Thread Chen Liang (Jira)
Chen Liang created HDFS-14941:
-

 Summary: Potential editlog race condition can cause corrupted file
 Key: HDFS-14941
 URL: https://issues.apache.org/jira/browse/HDFS-14941
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Chen Liang


Recently we encountered an issue where, after a failover, the NameNode 
complained about corrupted files/missing blocks. The blocks did recover after 
full block reports, so the blocks were not actually missing. After further 
investigation, we believe this is what happened:

First of all, on the SbN, it is possible to receive block reports before the 
corresponding edit tailing has happened, in which case the SbN postpones 
processing the DN block report, handled by the guarding logic below:
{code:java}
  if (shouldPostponeBlocksFromFuture &&
  namesystem.isGenStampInFuture(iblk)) {
queueReportedBlock(storageInfo, iblk, reportedState,
QUEUE_REASON_FUTURE_GENSTAMP);
continue;
  }
{code}
Basically, if the reported block has a future generation stamp, the DN report 
gets requeued.

However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
{code:java}
  // allocate new block, record block locations in INode.
  newBlock = createNewBlock();
  INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
  saveAllocatedBlock(src, inodesInPath, newBlock, targets);

  persistNewBlock(src, pendingFile);
  offset = pendingFile.computeFileSize();
{code}
The line
 {{newBlock = createNewBlock();}}
 would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump the generation stamp 
on the Standby,
 while the following line
 {{persistNewBlock(src, pendingFile);}}
 would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
the Standby.

The race condition is this: imagine the Standby has just processed 
{{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they happen to fall 
in different segments). Now a block report with the new generation stamp comes 
in.

Since the genstamp bump has already been processed, the reported block is no 
longer considered a future block, so the guarding logic passes. But the block 
has not actually been added to the block map, because the second edit is yet 
to be tailed. The block then gets added to the invalidated block list, and we 
saw messages like:
{code:java}
BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
{code}
Even worse, since this IBR is effectively lost, the NameNode has no 
information about this block until the next full block report. So after a 
failover, the NN marks it as corrupt.

This issue won't happen, though, if both edit entries get tailed together, so 
that no IBR processing can happen in between. But in our case, we set the edit 
tailing interval very low (to allow Standby reads), so under high workload 
there is a much higher chance that the two entries are tailed separately, 
causing the issue.
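The sequence of events can be modeled with a toy sketch. The class and field names below are illustrative, not the real HDFS ones; only the ordering of genstamp bump, block report, and block add mirrors the description:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the Standby race: the genstamp bump is tailed, the block
// add is not, and an IBR with the new genstamp arrives in between.
public class EditlogRaceDemo {
    long appliedGenStamp = 100;              // highest genstamp tailed so far
    Set<Long> blockMap = new HashSet<>();    // blocks known to the namespace
    Set<Long> postponed = new HashSet<>();   // reports queued as "future"
    Set<Long> invalidated = new HashSet<>(); // "does not belong to any file"

    // Guarding logic: postpone reports whose genstamp is from the future.
    void processReportedBlock(long blockId, long genStamp) {
        if (genStamp > appliedGenStamp) {
            postponed.add(blockId);          // QUEUE_REASON_FUTURE_GENSTAMP
        } else if (!blockMap.contains(blockId)) {
            invalidated.add(blockId);        // report effectively lost
        }
    }

    public static void main(String[] args) {
        EditlogRaceDemo sbn = new EditlogRaceDemo();

        sbn.appliedGenStamp = 101;           // OP_SET_GENSTAMP_V2 tailed
        // ... OP_ADD_BLOCK for block 42 sits in the next segment, not yet tailed.

        sbn.processReportedBlock(42, 101);   // IBR with the new genstamp

        // Guard passes (genstamp no longer "future"), but the block is not
        // in the block map, so it is invalidated instead of recorded.
        System.out.println("postponed=" + sbn.postponed
                + " invalidated=" + sbn.invalidated);

        sbn.blockMap.add(42L);               // OP_ADD_BLOCK tailed too late
    }
}
```

Running this prints `postponed=[] invalidated=[42]`: the report slipped past the future-genstamp guard yet could not be matched to a file, exactly the lost-IBR case described above.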






[jira] [Assigned] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-10-29 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang reassigned HDFS-14941:
-

Assignee: Chen Liang

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>






[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-10-29 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962543#comment-16962543
 ] 

Chen Liang commented on HDFS-14941:
---

ping [~shv], [~xkrogen], [~csun] FYI.

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>





