[jira] [Updated] (HDFS-15651) Client could not obtain block when DN CommandProcessingThread exit

2020-11-01 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15651:
---
Attachment: HDFS-15651.002.patch

> Client could not obtain block when DN CommandProcessingThread exit
> --
>
> Key: HDFS-15651
> URL: https://issues.apache.org/jira/browse/HDFS-15651
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yiqun Lin
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15651.001.patch, HDFS-15651.002.patch, 
> HDFS-15651.patch
>
>
> In our cluster, we applied the HDFS-14997 improvement.
>  We find one case that CommandProcessingThread will exit due to OOM error. 
> OOM error was caused by our one abnormal application that running on this DN 
> node.
> {noformat}
> 2020-10-18 10:27:12,604 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Command processor 
> encountered fatal exception and exit.
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:717)
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:173)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:222)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2005)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:671)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:617)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1247)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.access$1000(BPServiceActor.java:1194)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread$3.run(BPServiceActor.java:1299)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1221)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1208)
> {noformat}
> Here the main point is that CommandProcessingThread crashed will lead a very 
> bad impact. All the NN response commands will not be processed by DN side.
> We enabled the block token to access the data, but here the DN command 
> DNA_ACCESSKEYUPDATE is not processed on time by DN. And then we see lots of 
> Sasl error due to key expiration in DN log:
> {noformat}
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password 
> [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't 
> re-compute password for block_token_identifier (expiryDate=xxx, keyId=xx, 
> userId=xxx, blockPoolId=, blockId=xxx, access modes=[READ]), since the 
> required block key (keyID=xxx) doesn't exist.]
> {noformat}
>  
> For the impact in client side, our users receive lots of 'could not obtain 
> block' error  with BlockMissingException.
> CommandProcessingThread is a critical thread, it should always be running.
> {code:java}
>   /**
>* CommandProcessingThread that process commands asynchronously.
>*/
>   class CommandProcessingThread extends Thread {
> private final BPServiceActor actor;
> private final BlockingQueue queue;
> ...
> @Override
> public void run() {
>   try {
> processQueue();
>   } catch (Throwable t) {
> LOG.error("{} encountered fatal exception and exit.", getName(), t);  
>  <=== should not exit this thread
>   }
> }
> {code}
> Once a unexpected error happened, a better handing should be:
>  * catch the exception, appropriately deal with the error and let 
> processQueue continue to run
>  or
>  * exit the DN process to let admin user investigate this



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15651) Client could not obtain block when DN CommandProcessingThread exit

2020-10-29 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15651:
---
Attachment: HDFS-15651.001.patch

> Client could not obtain block when DN CommandProcessingThread exit
> --
>
> Key: HDFS-15651
> URL: https://issues.apache.org/jira/browse/HDFS-15651
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yiqun Lin
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15651.001.patch, HDFS-15651.patch
>
>
> In our cluster, we applied the HDFS-14997 improvement.
>  We find one case that CommandProcessingThread will exit due to OOM error. 
> OOM error was caused by our one abnormal application that running on this DN 
> node.
> {noformat}
> 2020-10-18 10:27:12,604 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Command processor 
> encountered fatal exception and exit.
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:717)
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:173)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:222)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2005)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:671)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:617)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1247)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.access$1000(BPServiceActor.java:1194)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread$3.run(BPServiceActor.java:1299)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1221)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1208)
> {noformat}
> Here the main point is that CommandProcessingThread crashed will lead a very 
> bad impact. All the NN response commands will not be processed by DN side.
> We enabled the block token to access the data, but here the DN command 
> DNA_ACCESSKEYUPDATE is not processed on time by DN. And then we see lots of 
> Sasl error due to key expiration in DN log:
> {noformat}
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password 
> [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't 
> re-compute password for block_token_identifier (expiryDate=xxx, keyId=xx, 
> userId=xxx, blockPoolId=, blockId=xxx, access modes=[READ]), since the 
> required block key (keyID=xxx) doesn't exist.]
> {noformat}
>  
> For the impact in client side, our users receive lots of 'could not obtain 
> block' error  with BlockMissingException.
> CommandProcessingThread is a critical thread, it should always be running.
> {code:java}
>   /**
>* CommandProcessingThread that process commands asynchronously.
>*/
>   class CommandProcessingThread extends Thread {
> private final BPServiceActor actor;
> private final BlockingQueue queue;
> ...
> @Override
> public void run() {
>   try {
> processQueue();
>   } catch (Throwable t) {
> LOG.error("{} encountered fatal exception and exit.", getName(), t);  
>  <=== should not exit this thread
>   }
> }
> {code}
> Once a unexpected error happened, a better handing should be:
>  * catch the exception, appropriately deal with the error and let 
> processQueue continue to run
>  or
>  * exit the DN process to let admin user investigate this



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15651) Client could not obtain block when DN CommandProcessingThread exit

2020-10-27 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17221284#comment-17221284
 ] 

Aiphago commented on HDFS-15651:


[^HDFS-15651.patch]  I think let BPServiceActor exit is better when occur error 
due to hardware.

> Client could not obtain block when DN CommandProcessingThread exit
> --
>
> Key: HDFS-15651
> URL: https://issues.apache.org/jira/browse/HDFS-15651
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yiqun Lin
>Priority: Major
> Attachments: HDFS-15651.patch
>
>
> In our cluster, we applied the HDFS-14997 improvement.
>  We find one case that CommandProcessingThread will exit due to OOM error. 
> OOM error was caused by our one abnormal application that running on this DN 
> node.
> {noformat}
> 2020-10-18 10:27:12,604 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Command processor 
> encountered fatal exception and exit.
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:717)
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:173)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:222)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2005)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:671)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:617)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1247)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.access$1000(BPServiceActor.java:1194)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread$3.run(BPServiceActor.java:1299)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1221)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1208)
> {noformat}
> Here the main point is that CommandProcessingThread crashed will lead a very 
> bad impact. All the NN response commands will not be processed by DN side.
> We enabled the block token to access the data, but here the DN command 
> DNA_ACCESSKEYUPDATE is not processed on time by DN. And then we see lots of 
> Sasl error due to key expiration in DN log:
> {noformat}
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password 
> [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't 
> re-compute password for block_token_identifier (expiryDate=xxx, keyId=xx, 
> userId=xxx, blockPoolId=, blockId=xxx, access modes=[READ]), since the 
> required block key (keyID=xxx) doesn't exist.]
> {noformat}
>  
> For the impact in client side, our users receive lots of 'could not obtain 
> block' error  with BlockMissingException.
> CommandProcessingThread is a critical thread, it should always be running.
> {code:java}
>   /**
>* CommandProcessingThread that process commands asynchronously.
>*/
>   class CommandProcessingThread extends Thread {
> private final BPServiceActor actor;
> private final BlockingQueue queue;
> ...
> @Override
> public void run() {
>   try {
> processQueue();
>   } catch (Throwable t) {
> LOG.error("{} encountered fatal exception and exit.", getName(), t);  
>  <=== should not exit this thread
>   }
> }
> {code}
> Once a unexpected error happened, a better handing should be:
>  * catch the exception, appropriately deal with the error and let 
> processQueue continue to run
>  or
>  * exit the DN process to let admin user investigate this



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15651) Client could not obtain block when DN CommandProcessingThread exit

2020-10-27 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15651:
---
Attachment: HDFS-15651.patch

> Client could not obtain block when DN CommandProcessingThread exit
> --
>
> Key: HDFS-15651
> URL: https://issues.apache.org/jira/browse/HDFS-15651
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yiqun Lin
>Priority: Major
> Attachments: HDFS-15651.patch
>
>
> In our cluster, we applied the HDFS-14997 improvement.
>  We find one case that CommandProcessingThread will exit due to OOM error. 
> OOM error was caused by our one abnormal application that running on this DN 
> node.
> {noformat}
> 2020-10-18 10:27:12,604 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Command processor 
> encountered fatal exception and exit.
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:717)
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:173)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:222)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2005)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:671)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:617)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1247)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.access$1000(BPServiceActor.java:1194)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread$3.run(BPServiceActor.java:1299)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1221)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1208)
> {noformat}
> Here the main point is that CommandProcessingThread crashed will lead a very 
> bad impact. All the NN response commands will not be processed by DN side.
> We enabled the block token to access the data, but here the DN command 
> DNA_ACCESSKEYUPDATE is not processed on time by DN. And then we see lots of 
> Sasl error due to key expiration in DN log:
> {noformat}
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password 
> [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't 
> re-compute password for block_token_identifier (expiryDate=xxx, keyId=xx, 
> userId=xxx, blockPoolId=, blockId=xxx, access modes=[READ]), since the 
> required block key (keyID=xxx) doesn't exist.]
> {noformat}
>  
> For the impact in client side, our users receive lots of 'could not obtain 
> block' error  with BlockMissingException.
> CommandProcessingThread is a critical thread, it should always be running.
> {code:java}
>   /**
>* CommandProcessingThread that process commands asynchronously.
>*/
>   class CommandProcessingThread extends Thread {
> private final BPServiceActor actor;
> private final BlockingQueue queue;
> ...
> @Override
> public void run() {
>   try {
> processQueue();
>   } catch (Throwable t) {
> LOG.error("{} encountered fatal exception and exit.", getName(), t);  
>  <=== should not exit this thread
>   }
> }
> {code}
> Once a unexpected error happened, a better handing should be:
>  * catch the exception, appropriately deal with the error and let 
> processQueue continue to run
>  or
>  * exit the DN process to let admin user investigate this



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock

2020-09-16 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196824#comment-17196824
 ] 

Aiphago edited comment on HDFS-15382 at 9/16/20, 9:35 AM:
--

This is a sample base our 2.7 version, main logic is similar, but apply to 
trunk need some work todo.


was (Author: aiphag0):
This is a sample base our 2.7 version, main logic is similar, but apply to 
trunck need some work todo.

> Split FsDatasetImpl from blockpool lock to blockpool volume lock 
> -
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: HDFS-15382-sample.patch, image-2020-06-02-1.png, 
> image-2020-06-03-1.png
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock

2020-09-16 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196824#comment-17196824
 ] 

Aiphago edited comment on HDFS-15382 at 9/16/20, 9:35 AM:
--

This is a sample base our 2.7 version, main logic is similar, but apply to 
trunck need some work todo.


was (Author: aiphag0):
This is a sample base our 2.7 version, mian logic is similar, but apply to 
trunck need some work todo.

> Split FsDatasetImpl from blockpool lock to blockpool volume lock 
> -
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: HDFS-15382-sample.patch, image-2020-06-02-1.png, 
> image-2020-06-03-1.png
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock

2020-09-16 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196824#comment-17196824
 ] 

Aiphago edited comment on HDFS-15382 at 9/16/20, 9:33 AM:
--

This is a sample base our 2.7 version, mian logic is similar, but apply to 
trunck need some work todo.


was (Author: aiphag0):
This is a sample base our 2.7 version, mian logic is similar, but apply to 
trunck need some wrok todo.

> Split FsDatasetImpl from blockpool lock to blockpool volume lock 
> -
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: HDFS-15382-sample.patch, image-2020-06-02-1.png, 
> image-2020-06-03-1.png
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock

2020-09-16 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15382:
---
Attachment: HDFS-15382-sample.patch

> Split FsDatasetImpl from blockpool lock to blockpool volume lock 
> -
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: HDFS-15382-sample.patch, image-2020-06-02-1.png, 
> image-2020-06-03-1.png
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock

2020-09-16 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196824#comment-17196824
 ] 

Aiphago commented on HDFS-15382:


This is a sample base our 2.7 version, mian logic is similar, but apply to 
trunck need some wrok todo.

> Split FsDatasetImpl from blockpool lock to blockpool volume lock 
> -
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: HDFS-15382-sample.patch, image-2020-06-02-1.png, 
> image-2020-06-03-1.png
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock

2020-06-03 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124767#comment-17124767
 ] 

Aiphago edited comment on HDFS-15382 at 6/3/20, 8:45 AM:
-

!image-2020-06-03-1.png|width=628,height=378!

The simple lock model like this.Parts of implemented as follows
 * As for finalizeReplica(),append(),createRbw()First get BlockPoolLock 
read lock,and then get BlockPoolLock-volume-lock write lock.
 * As for getStoredBlock(),getMetaDataInputStream()First get BlockPoolLock 
read lock,and the then get BlockPoolLock-volume-lock read lock.
 * As for deepCopyReplica(),getBlockReports() get the BlockPoolLock read lock.
 * As for delete hold the BlockPoolLock write lock.
 * The change of replicaMap's Gset change to sync to make thread safe.And 
replicaMap itself is the same as HDFS-15180  only control blockpool lock


was (Author: aiphag0):
!image-2020-06-03-1.png|width=628,height=378!

The simple lock model like this.Parts of implemented as follows
 # As for finalizeReplica(),append(),createRbw()First get BlockPoolLock 
read lock,and then get BlockPoolLock-volume-lock write lock.

 # As for getStoredBlock(),getMetaDataInputStream()First get BlockPoolLock 
read lock,and the then get BlockPoolLock-volume-lock read lock.

 # As for deepCopyReplica(),getBlockReports() get the BlockPoolLock read lock.

 # As for delete hold the BlockPoolLock write lock.
 # The change of replicaMap's Gset change to sync to make thread safe.And 
replicaMap itself is the same as HDFS-15180  only control blockpool lock

> Split FsDatasetImpl from blockpool lock to blockpool volume lock 
> -
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: image-2020-06-02-1.png, image-2020-06-03-1.png
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock

2020-06-03 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124767#comment-17124767
 ] 

Aiphago commented on HDFS-15382:


!image-2020-06-03-1.png|width=628,height=378!

The simple lock model like this.Parts of implemented as follows
 # As for finalizeReplica(),append(),createRbw()First get BlockPoolLock 
read lock,and then get BlockPoolLock-volume-lock write lock.

 # As for getStoredBlock(),getMetaDataInputStream()First get BlockPoolLock 
read lock,and the then get BlockPoolLock-volume-lock read lock.

 # As for deepCopyReplica(),getBlockReports() get the BlockPoolLock read lock.

 # As for delete hold the BlockPoolLock write lock.
 # The change of replicaMap's Gset change to sync to make thread safe.And 
replicaMap itself is the same as HDFS-15180  only control blockpool lock

> Split FsDatasetImpl from blockpool lock to blockpool volume lock 
> -
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: image-2020-06-02-1.png, image-2020-06-03-1.png
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock

2020-06-03 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15382:
---
Attachment: image-2020-06-03-1.png

> Split FsDatasetImpl from blockpool lock to blockpool volume lock 
> -
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: image-2020-06-02-1.png, image-2020-06-03-1.png
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock

2020-06-03 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15382:
---
Attachment: (was: 2020-06-03.png)

> Split FsDatasetImpl from blockpool lock to blockpool volume lock 
> -
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: image-2020-06-02-1.png
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock

2020-06-03 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15382:
---
Attachment: 2020-06-03.png

> Split FsDatasetImpl from blockpool lock to blockpool volume lock 
> -
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: 2020-06-03.png, image-2020-06-02-1.png
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock

2020-06-03 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124736#comment-17124736
 ] 

Aiphago edited comment on HDFS-15382 at 6/3/20, 8:12 AM:
-

{quote}it is confused with {{ReplicaCachingGetSpaceUsed}}, IIUC, 
{{ReplicaCachingGetSpaceUsed}} is calculated in memory directly rather than 
sync info from disk, right? so why is it related to this changes? Also the log 
print is based on our internal version rather than branch trunk, some notes 
could be better.
{quote}
ReplicaCachingGetSpaceUsed copy replica from FsDataSetImpl most of time is 
spend in wait FsDataSetImpl lock,so 'Copy replica infos' time spend can reflect 
the time wait for the lock.


was (Author: aiphag0):
{quote}it is confused with {{ReplicaCachingGetSpaceUsed}}, IIUC, 
{{ReplicaCachingGetSpaceUsed}} is calculated in memory directly rather than 
sync info from disk, right? so why is it related to this changes? Also the log 
print is based on our internal version rather than branch trunk, some notes 
could be better.
{quote}
{{ReplicaCachingGetSpaceUsed copy replica from FsDataSetImpl most of time is 
spend in wait }}{{FsDataSetImpl lock,so 'Copy replica infos' time spend can 
reflect the time wait for the lock.}}{{}}

> Split FsDatasetImpl from blockpool lock to blockpool volume lock 
> -
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: image-2020-06-02-1.png
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock

2020-06-03 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124736#comment-17124736
 ] 

Aiphago commented on HDFS-15382:


{quote}it is confused with {{ReplicaCachingGetSpaceUsed}}, IIUC, 
{{ReplicaCachingGetSpaceUsed}} is calculated in memory directly rather than 
sync info from disk, right? so why is it related to this changes? Also the log 
print is based on our internal version rather than branch trunk, some notes 
could be better.
{quote}
{{ReplicaCachingGetSpaceUsed copy replica from FsDataSetImpl most of time is 
spend in wait }}{{FsDataSetImpl lock,so 'Copy replica infos' time spend can 
reflect the time wait for the lock.}}{{}}

> Split FsDatasetImpl from blockpool lock to blockpool volume lock 
> -
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: image-2020-06-02-1.png
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock

2020-06-01 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15382:
---
Summary: Split FsDatasetImpl from blockpool lock to blockpool volume lock   
(was: Split DataNode FsDatasetImpl lock to blockpool volume lock )

> Split FsDatasetImpl from blockpool lock to blockpool volume lock 
> -
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: image-2020-06-02-1.png
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15382) Split DataNode FsDatasetImpl lock to blockpool volume lock

2020-06-01 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123379#comment-17123379
 ] 

Aiphago commented on HDFS-15382:


After improve our du in cache copy time is very low.And we make improve just 
copy replica in each BlockPoolSlice.

 
{code:java}
2020-06-02 12:44:16,586 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
 Copy replica infos, blockPoolId: BP-xxx, replicas size: 665, duration: 0ms
2020-06-02 12:44:16,586 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
 Refresh dfs used, bpid: BP-xxx,replicas size: 665, dfsUsed: 15925882188 on 
volume: DS-4f1f820a-460f-4fa9-89be-49caed604a52, duration: 0ms , isopen 
hardlink false
2020-06-02 12:44:16,586 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
 Copy replica infos, blockPoo
lId: BP-xxx, replicas size: 699, duration: 1ms
2020-06-02 12:44:16,586 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
 Copy replica infos, blockPoo
lId: BP-xxx, replicas size: 698, duration: 1ms
2020-06-02 12:44:16,587 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
 Copy replica infos, blockPoo
lId: BP-xxx, replicas size: 638, duration: 0ms
2020-06-02 12:44:16,587 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
 Refresh dfs used, bpid: BP-xxx, replicas size: 638, dfsUsed: 16519661992 on 
volume: DS-b2eb6423-d0bd-493e-a102-d317e55815ce, duration: 0ms , isopen 
hardlink false
2020-06-02 12:44:16,588 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
 Copy replica infos, blockPoo
lId: BP-xxx, replicas size: 644, duration: 0ms
2020-06-02 12:44:16,588 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
 Refresh dfs used, bpid: BP-xxx, replicas size: 644, dfsUsed: 16636348641 on 
volume: DS-83a2deeb-2389-4036-9f13-df61fc6b35f6, duration: 0ms , isopen 
hardlink false
2020-06-02 12:44:16,588 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
 Copy replica infos, BP-xxx, 
replicas size: 663, duration: 0ms{code}
 

> Split DataNode FsDatasetImpl lock to blockpool volume lock 
> ---
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: image-2020-06-02-1.png
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15382) Split DataNode FsDatasetImpl lock to blockpool volume lock

2020-06-01 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123372#comment-17123372
 ] 

Aiphago commented on HDFS-15382:


Now we deploy in our produce cluster use this patch for some days.Here is the 
metric by random choose some dn.The metric we add before upgrade with this 
patch.The unit is ms.

!image-2020-06-02-1.png|width=923,height=233!

> Split DataNode FsDatasetImpl lock to blockpool volume lock 
> ---
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: image-2020-06-02-1.png
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15382) Split DataNode FsDatasetImpl lock to blockpool volume lock

2020-06-01 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15382:
---
Attachment: image-2020-06-02-1.png

> Split DataNode FsDatasetImpl lock to blockpool volume lock 
> ---
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: image-2020-06-02-1.png
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15382) Split DataNode FsDatasetImpl lock to blockpool volume lock

2020-06-01 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15382:
---
Affects Version/s: (was: 3.2.1)
   (was: 2.9.2)

> Split DataNode FsDatasetImpl lock to blockpool volume lock 
> ---
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15382) Split DataNode FsDatasetImpl lock to blockpool volume lock

2020-06-01 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15382:
---
Target Version/s:   (was: 3.2.0, 2.9.2)

> Split DataNode FsDatasetImpl lock to blockpool volume lock 
> ---
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.9.2, 3.2.1
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15382) Split DataNode FsDatasetImpl lock to blockpool volume lock

2020-06-01 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15382:
---
Attachment: (was: image-2020-06-02.png)

> Split DataNode FsDatasetImpl lock to blockpool volume lock 
> ---
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.9.2, 3.2.1
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15382) Split DataNode FsDatasetImpl lock to blockpool volume lock

2020-06-01 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15382:
---
Attachment: image-2020-06-02.png

> Split DataNode FsDatasetImpl lock to blockpool volume lock 
> ---
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.9.2, 3.2.1
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: image-2020-06-02.png
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDFS-15382) Split DataNode FsDatasetImpl lock to blockpool volume lock

2020-06-01 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-15382 started by Aiphago.
--
> Split DataNode FsDatasetImpl lock to blockpool volume lock 
> ---
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15382) Split DataNode FsDatasetImpl lock to blockpool volume lock

2020-06-01 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15382:
---
Affects Version/s: 2.9.2
   3.2.1

> Split DataNode FsDatasetImpl lock to blockpool volume lock 
> ---
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.9.2, 3.2.1
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15382) Split DataNode FsDatasetImpl lock to blockpool volume lock

2020-06-01 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15382:
---
Fix Version/s: 2.9.2
   3.2.1

> Split DataNode FsDatasetImpl lock to blockpool volume lock 
> ---
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 2.9.2, 3.2.1
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15382) Split DataNode FsDatasetImpl lock to blockpool volume lock

2020-06-01 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15382:
---
Fix Version/s: (was: 2.9.2)

> Split DataNode FsDatasetImpl lock to blockpool volume lock 
> ---
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
>
> In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
> heavy load and will block other request which in same blockpool but different 
> volume.So we split lock to two leval to avoid this happend.And to improve 
> datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15382) Split DataNode FsDatasetImpl lock to blockpool volume lock

2020-06-01 Thread Aiphago (Jira)
Aiphago created HDFS-15382:
--

 Summary: Split DataNode FsDatasetImpl lock to blockpool volume 
lock 
 Key: HDFS-15382
 URL: https://issues.apache.org/jira/browse/HDFS-15382
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Aiphago
Assignee: Aiphago


In HDFS-15180 we split lock to blockpool grain size.But when one volume is in 
heavy load and will block other request which in same blockpool but different 
volume.So we split lock to two leval to avoid this happend.And to improve 
datanode performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.

2020-04-09 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080192#comment-17080192
 ] 

Aiphago commented on HDFS-15180:


Hi [~zhuqi] , thanks for your feedback.I think GenerationStamp may be change 
before we split block pool lock.And in our version we use wrtie lock in 
DataNode.transferReplicaForPipelineRecovery,this is diff the patch and may 
related to this problem.

>  DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
> ---
>
> Key: HDFS-15180
> URL: https://issues.apache.org/jira/browse/HDFS-15180
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15180.001.patch, HDFS-15180.002.patch, 
> HDFS-15180.003.patch, HDFS-15180.004.patch, 
> image-2020-03-10-17-22-57-391.png, image-2020-03-10-17-31-58-830.png, 
> image-2020-03-10-17-34-26-368.png, image-2020-04-09-11-20-36-459.png
>
>
> Now the FsDatasetImpl datasetLock is heavy, when their are many namespaces in 
> big cluster. If we can split the FsDatasetImpl datasetLock via blockpool. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.

2020-03-24 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065536#comment-17065536
 ] 

Aiphago commented on HDFS-15180:


ping [~sodonnell] ,  [~linyiqun], [~weichiu] Any advice ?Thanks.
 

>  DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
> ---
>
> Key: HDFS-15180
> URL: https://issues.apache.org/jira/browse/HDFS-15180
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15180.001.patch, HDFS-15180.002.patch, 
> HDFS-15180.003.patch, HDFS-15180.004.patch, 
> image-2020-03-10-17-22-57-391.png, image-2020-03-10-17-31-58-830.png, 
> image-2020-03-10-17-34-26-368.png
>
>
> Now the FsDatasetImpl datasetLock is heavy, when their are many namespaces in 
> big cluster. If we can split the FsDatasetImpl datasetLock via blockpool. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.

2020-03-22 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064192#comment-17064192
 ] 

Aiphago commented on HDFS-15180:


Fix the problem in UT

>  DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
> ---
>
> Key: HDFS-15180
> URL: https://issues.apache.org/jira/browse/HDFS-15180
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15180.001.patch, HDFS-15180.002.patch, 
> HDFS-15180.003.patch, HDFS-15180.004.patch, 
> image-2020-03-10-17-22-57-391.png, image-2020-03-10-17-31-58-830.png, 
> image-2020-03-10-17-34-26-368.png
>
>
> Now the FsDatasetImpl datasetLock is heavy, when their are many namespaces in 
> big cluster. If we can split the FsDatasetImpl datasetLock via blockpool. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.

2020-03-22 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15180:
---
Attachment: HDFS-15180.004.patch

>  DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
> ---
>
> Key: HDFS-15180
> URL: https://issues.apache.org/jira/browse/HDFS-15180
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15180.001.patch, HDFS-15180.002.patch, 
> HDFS-15180.003.patch, HDFS-15180.004.patch, 
> image-2020-03-10-17-22-57-391.png, image-2020-03-10-17-31-58-830.png, 
> image-2020-03-10-17-34-26-368.png
>
>
> Now the FsDatasetImpl datasetLock is heavy, when their are many namespaces in 
> big cluster. If we can split the FsDatasetImpl datasetLock via blockpool. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.

2020-03-21 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063806#comment-17063806
 ] 

Aiphago commented on HDFS-15180:


ok,I'll fix later

>  DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
> ---
>
> Key: HDFS-15180
> URL: https://issues.apache.org/jira/browse/HDFS-15180
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15180.001.patch, HDFS-15180.002.patch, 
> HDFS-15180.003.patch, image-2020-03-10-17-22-57-391.png, 
> image-2020-03-10-17-31-58-830.png, image-2020-03-10-17-34-26-368.png
>
>
> Now the FsDatasetImpl datasetLock is heavy, when their are many namespaces in 
> big cluster. If we can split the FsDatasetImpl datasetLock via blockpool. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.

2020-03-21 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15180:
---
Attachment: HDFS-15180.003.patch

>  DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
> ---
>
> Key: HDFS-15180
> URL: https://issues.apache.org/jira/browse/HDFS-15180
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15180.001.patch, HDFS-15180.002.patch, 
> HDFS-15180.003.patch, image-2020-03-10-17-22-57-391.png, 
> image-2020-03-10-17-31-58-830.png, image-2020-03-10-17-34-26-368.png
>
>
> Now the FsDatasetImpl datasetLock is heavy, when their are many namespaces in 
> big cluster. If we can split the FsDatasetImpl datasetLock via blockpool. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.

2020-03-20 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063204#comment-17063204
 ] 

Aiphago commented on HDFS-15180:


Hi [~zhuqi], thanks for valuable suggestions.

Change the lock style use try() without finally{}.

Change transferReplicaForPipelineRecovery to read lock.

Wait UT result.[^HDFS-15180.002.patch]

>  DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
> ---
>
> Key: HDFS-15180
> URL: https://issues.apache.org/jira/browse/HDFS-15180
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15180.001.patch, HDFS-15180.002.patch, 
> image-2020-03-10-17-22-57-391.png, image-2020-03-10-17-31-58-830.png, 
> image-2020-03-10-17-34-26-368.png
>
>
> Now the FsDatasetImpl datasetLock is heavy, when their are many namespaces in 
> big cluster. If we can split the FsDatasetImpl datasetLock via blockpool. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.

2020-03-20 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15180:
---
Attachment: HDFS-15180.002.patch

>  DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
> ---
>
> Key: HDFS-15180
> URL: https://issues.apache.org/jira/browse/HDFS-15180
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15180.001.patch, HDFS-15180.002.patch, 
> image-2020-03-10-17-22-57-391.png, image-2020-03-10-17-31-58-830.png, 
> image-2020-03-10-17-34-26-368.png
>
>
> Now the FsDatasetImpl datasetLock is heavy, when their are many namespaces in 
> big cluster. If we can split the FsDatasetImpl datasetLock via blockpool. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.

2020-03-13 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15180:
---
Attachment: HDFS-15180.001.patch

>  DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
> ---
>
> Key: HDFS-15180
> URL: https://issues.apache.org/jira/browse/HDFS-15180
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
> Attachments: HDFS-15180.001.patch, image-2020-03-10-17-22-57-391.png, 
> image-2020-03-10-17-31-58-830.png, image-2020-03-10-17-34-26-368.png
>
>
> Now the FsDatasetImpl datasetLock is heavy, when their are many namespaces in 
> big cluster. If we can split the FsDatasetImpl datasetLock via blockpool. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.

2020-03-13 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059199#comment-17059199
 ] 

Aiphago commented on HDFS-15180:


Hi [~zhuqi] ,Thanks for your proposal.And we have split dataset lock in our 
early version about 2.7 ,and gray deploy in our produce cluster for weeks.It 
looks like a good improvement in our version.But the trunck version looks big 
different from our version and have many works to do.And our idea is to split 
lock to blockpool  at first, second we try to split each blockpool lock to 
volume lock, third we try to remove remain IO in lock as HDFS-15000 say.If you 
are interesting with this we can do this together.And here is the demo 
patch,and may have some problem.

>  DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
> ---
>
> Key: HDFS-15180
> URL: https://issues.apache.org/jira/browse/HDFS-15180
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
> Attachments: image-2020-03-10-17-22-57-391.png, 
> image-2020-03-10-17-31-58-830.png, image-2020-03-10-17-34-26-368.png
>
>
> Now the FsDatasetImpl datasetLock is heavy, when their are many namespaces in 
> big cluster. If we can split the FsDatasetImpl datasetLock via blockpool. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15068) DataNode could meet deadlock if invoke refreshVolumes when register

2019-12-23 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002146#comment-17002146
 ] 

Aiphago commented on HDFS-15068:


Renew the patch,change log.debug to e.printStackTrace().[^HDFS-15068.005.patch]

> DataNode could meet deadlock if invoke refreshVolumes when register
> ---
>
> Key: HDFS-15068
> URL: https://issues.apache.org/jira/browse/HDFS-15068
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Critical
> Attachments: HDFS-15068.001.patch, HDFS-15068.002.patch, 
> HDFS-15068.003.patch, HDFS-15068.004.patch, HDFS-15068.005.patch
>
>
> DataNode could meet deadlock when invoke `dfsadmin -reconfig datanode ip:host 
> start` to trigger #refreshVolumes.
> 1. DataNod#refreshVolumes hold datanode instance ownable {{synchronizer}} 
> when enter this method first, then try to hold BPOfferService {{readlock}} 
> when `bpos.getNamespaceInfo()` in following code segment. 
> {code:java}
> for (BPOfferService bpos : blockPoolManager.getAllNamenodeThreads()) {
>   nsInfos.add(bpos.getNamespaceInfo());
> }
> {code}
> 2. BPOfferService#registrationSucceeded (which is invoked by #register when 
> DataNode start or #reregister when processCommandFromActor) hold 
> BPOfferService {{writelock}} first, then try to hold datanode instance 
> ownable {{synchronizer}} in following method.
> {code:java}
>   synchronized void bpRegistrationSucceeded(DatanodeRegistration 
> bpRegistration,
>   String blockPoolId) throws IOException {
> id = bpRegistration;
> if(!storage.getDatanodeUuid().equals(bpRegistration.getDatanodeUuid())) {
>   throw new IOException("Inconsistent Datanode IDs. Name-node returned "
>   + bpRegistration.getDatanodeUuid()
>   + ". Expecting " + storage.getDatanodeUuid());
> }
> 
> registerBlockPoolWithSecretManager(bpRegistration, blockPoolId);
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15068) DataNode could meet deadlock if invoke refreshVolumes when register

2019-12-23 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15068:
---
Attachment: HDFS-15068.005.patch

> DataNode could meet deadlock if invoke refreshVolumes when register
> ---
>
> Key: HDFS-15068
> URL: https://issues.apache.org/jira/browse/HDFS-15068
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Critical
> Attachments: HDFS-15068.001.patch, HDFS-15068.002.patch, 
> HDFS-15068.003.patch, HDFS-15068.004.patch, HDFS-15068.005.patch
>
>
> DataNode could meet deadlock when invoke `dfsadmin -reconfig datanode ip:host 
> start` to trigger #refreshVolumes.
> 1. DataNod#refreshVolumes hold datanode instance ownable {{synchronizer}} 
> when enter this method first, then try to hold BPOfferService {{readlock}} 
> when `bpos.getNamespaceInfo()` in following code segment. 
> {code:java}
> for (BPOfferService bpos : blockPoolManager.getAllNamenodeThreads()) {
>   nsInfos.add(bpos.getNamespaceInfo());
> }
> {code}
> 2. BPOfferService#registrationSucceeded (which is invoked by #register when 
> DataNode start or #reregister when processCommandFromActor) hold 
> BPOfferService {{writelock}} first, then try to hold datanode instance 
> ownable {{synchronizer}} in following method.
> {code:java}
>   synchronized void bpRegistrationSucceeded(DatanodeRegistration 
> bpRegistration,
>   String blockPoolId) throws IOException {
> id = bpRegistration;
> if(!storage.getDatanodeUuid().equals(bpRegistration.getDatanodeUuid())) {
>   throw new IOException("Inconsistent Datanode IDs. Name-node returned "
>   + bpRegistration.getDatanodeUuid()
>   + ". Expecting " + storage.getDatanodeUuid());
> }
> 
> registerBlockPoolWithSecretManager(bpRegistration, blockPoolId);
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-20 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15000:
---
Attachment: HDFS-15000.001.patch

> Improve FsDatasetImpl to avoid IO operation in datasetLock
> --
>
> Key: HDFS-15000
> URL: https://issues.apache.org/jira/browse/HDFS-15000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15000.001.patch
>
>
> As HDFS-14997 mentioned, some methods in #FsDatasetImpl such as 
> #finalizeBlock, #finalizeReplica, #createRbw includes IO operation in the 
> datasetLock, It will block some logic when IO load is very high. We should 
> reduce grain fineness or move IO operation out of datasetLock.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-20 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15000:
---
Attachment: (was: HDFS-15000.001.patch)

> Improve FsDatasetImpl to avoid IO operation in datasetLock
> --
>
> Key: HDFS-15000
> URL: https://issues.apache.org/jira/browse/HDFS-15000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Major
>
> As HDFS-14997 mentioned, some methods in #FsDatasetImpl such as 
> #finalizeBlock, #finalizeReplica, #createRbw includes IO operation in the 
> datasetLock, It will block some logic when IO load is very high. We should 
> reduce grain fineness or move IO operation out of datasetLock.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-20 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000742#comment-17000742
 ] 

Aiphago edited comment on HDFS-15000 at 12/20/19 9:21 AM:
--

Hi [~sodonnell] ,Thanks for your comment.I have some question with your idea.
{quote}I wonder if it would be possible to refactor things so we do steps 1, 2, 
3 and 5, drop the lock and then do the IO operation to actually create the 
file. In the event the IO fails, re-take the lock and clean up the volume map.
{quote}
If we do the IO after step 5,and now the volume map have the replica info but 
actually the IO may not done.If some error happend with IO,you should get lock 
again so volume map will have this replica for a long time,this may cause 
consistency problem when other thread get this  replica.

and my thought is
 # Not change the step in one method.
 # Make the IO async and release the lock wait util other thread signal this 
thread when finish the IO.
 # Keep the IO operation is in order in different methods as #finalizeBlock, 
#finalizeReplica, #createRbw.
 # after IO operation change then change the volume map.

 

 
 


was (Author: aiphag0):
Hi [~sodonnell] ,Thanks for your comment.I have some question with your idea.

??I wonder if it would be possible to refactor things so we do steps 1, 2, 3 
and 5, drop the lock and then do the IO operation to actually create the file. 
In the event the IO fails, re-take the lock and clean up the volume map.??

If we do the IO after step 5,and now the volume map have the replica info but 
actually the IO may not done.If some error happend with IO,you should get lock 
again so volume map will have this replica for a long time,this may cause 
consistency problem when other thread get this  replica.

and my thought is
 # Not change the step in one method.
 # Make the IO async and release the lock wait util other thread signal this 
thread when finish the IO.
 # Keep the IO operation is in order in different methods as #finalizeBlock, 
#finalizeReplica, #createRbw.
 # after IO operation change then change the volume map.

 

 

> Improve FsDatasetImpl to avoid IO operation in datasetLock
> --
>
> Key: HDFS-15000
> URL: https://issues.apache.org/jira/browse/HDFS-15000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15000.001.patch
>
>
> As HDFS-14997 mentioned, some methods in #FsDatasetImpl such as 
> #finalizeBlock, #finalizeReplica, #createRbw includes IO operation in the 
> datasetLock, It will block some logic when IO load is very high. We should 
> reduce grain fineness or move IO operation out of datasetLock.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-20 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15000:
---
Attachment: HDFS-15000.001.patch

> Improve FsDatasetImpl to avoid IO operation in datasetLock
> --
>
> Key: HDFS-15000
> URL: https://issues.apache.org/jira/browse/HDFS-15000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15000.001.patch
>
>
> As HDFS-14997 mentioned, some methods in #FsDatasetImpl such as 
> #finalizeBlock, #finalizeReplica, #createRbw includes IO operation in the 
> datasetLock, It will block some logic when IO load is very high. We should 
> reduce grain fineness or move IO operation out of datasetLock.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-20 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15000:
---
Attachment: (was: HDFS-15000.001.patch)

> Improve FsDatasetImpl to avoid IO operation in datasetLock
> --
>
> Key: HDFS-15000
> URL: https://issues.apache.org/jira/browse/HDFS-15000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Major
>
> As HDFS-14997 mentioned, some methods in #FsDatasetImpl such as 
> #finalizeBlock, #finalizeReplica, #createRbw includes IO operation in the 
> datasetLock, It will block some logic when IO load is very high. We should 
> reduce grain fineness or move IO operation out of datasetLock.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-20 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15000:
---
Attachment: HDFS-15000.001.patch

> Improve FsDatasetImpl to avoid IO operation in datasetLock
> --
>
> Key: HDFS-15000
> URL: https://issues.apache.org/jira/browse/HDFS-15000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15000.001.patch
>
>
> As HDFS-14997 mentioned, some methods in #FsDatasetImpl such as 
> #finalizeBlock, #finalizeReplica, #createRbw includes IO operation in the 
> datasetLock, It will block some logic when IO load is very high. We should 
> reduce grain fineness or move IO operation out of datasetLock.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-20 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000742#comment-17000742
 ] 

Aiphago commented on HDFS-15000:


Hi [~sodonnell] ,Thanks for your comment.I have some question with your idea.

??I wonder if it would be possible to refactor things so we do steps 1, 2, 3 
and 5, drop the lock and then do the IO operation to actually create the file. 
In the event the IO fails, re-take the lock and clean up the volume map.??

If we do the IO after step 5,and now the volume map have the replica info but 
actually the IO may not done.If some error happend with IO,you should get lock 
again so volume map will have this replica for a long time,this may cause 
consistency problem when other thread get this  replica.

and my thought is
 # Not change the step in one method.
 # Make the IO async and release the lock wait util other thread signal this 
thread when finish the IO.
 # Keep the IO operation is in order in different methods as #finalizeBlock, 
#finalizeReplica, #createRbw.
 # after IO operation change then change the volume map.

 

 

> Improve FsDatasetImpl to avoid IO operation in datasetLock
> --
>
> Key: HDFS-15000
> URL: https://issues.apache.org/jira/browse/HDFS-15000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Major
>
> As HDFS-14997 mentioned, some methods in #FsDatasetImpl such as 
> #finalizeBlock, #finalizeReplica, #createRbw includes IO operation in the 
> datasetLock, It will block some logic when IO load is very high. We should 
> reduce grain fineness or move IO operation out of datasetLock.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-20 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15000:
---
Attachment: (was: HDFS-15000.001.patch)

> Improve FsDatasetImpl to avoid IO operation in datasetLock
> --
>
> Key: HDFS-15000
> URL: https://issues.apache.org/jira/browse/HDFS-15000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Major
>
> As HDFS-14997 mentioned, some methods in #FsDatasetImpl such as 
> #finalizeBlock, #finalizeReplica, #createRbw includes IO operation in the 
> datasetLock, It will block some logic when IO load is very high. We should 
> reduce grain fineness or move IO operation out of datasetLock.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15068) DataNode could meet deadlock if invoke refreshVolumes when register

2019-12-19 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000679#comment-17000679
 ] 

Aiphago commented on HDFS-15068:


Hi [~elgoiri] ,thanks for valuable suggestions.Have test the unit without 
patch,and fix the problem as your suggestions.Renew the 
patch.[^HDFS-15068.004.patch]

> DataNode could meet deadlock if invoke refreshVolumes when register
> ---
>
> Key: HDFS-15068
> URL: https://issues.apache.org/jira/browse/HDFS-15068
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Critical
> Attachments: HDFS-15068.001.patch, HDFS-15068.002.patch, 
> HDFS-15068.003.patch, HDFS-15068.004.patch
>
>
> DataNode could meet deadlock when invoke `dfsadmin -reconfig datanode ip:host 
> start` to trigger #refreshVolumes.
> 1. DataNod#refreshVolumes hold datanode instance ownable {{synchronizer}} 
> when enter this method first, then try to hold BPOfferService {{readlock}} 
> when `bpos.getNamespaceInfo()` in following code segment. 
> {code:java}
> for (BPOfferService bpos : blockPoolManager.getAllNamenodeThreads()) {
>   nsInfos.add(bpos.getNamespaceInfo());
> }
> {code}
> 2. BPOfferService#registrationSucceeded (which is invoked by #register when 
> DataNode start or #reregister when processCommandFromActor) hold 
> BPOfferService {{writelock}} first, then try to hold datanode instance 
> ownable {{synchronizer}} in following method.
> {code:java}
>   synchronized void bpRegistrationSucceeded(DatanodeRegistration 
> bpRegistration,
>   String blockPoolId) throws IOException {
> id = bpRegistration;
> if(!storage.getDatanodeUuid().equals(bpRegistration.getDatanodeUuid())) {
>   throw new IOException("Inconsistent Datanode IDs. Name-node returned "
>   + bpRegistration.getDatanodeUuid()
>   + ". Expecting " + storage.getDatanodeUuid());
> }
> 
> registerBlockPoolWithSecretManager(bpRegistration, blockPoolId);
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15068) DataNode could meet deadlock if invoke refreshVolumes when register

2019-12-19 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15068:
---
Attachment: HDFS-15068.004.patch

> DataNode could meet deadlock if invoke refreshVolumes when register
> ---
>
> Key: HDFS-15068
> URL: https://issues.apache.org/jira/browse/HDFS-15068
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Critical
> Attachments: HDFS-15068.001.patch, HDFS-15068.002.patch, 
> HDFS-15068.003.patch, HDFS-15068.004.patch
>
>
> DataNode could meet deadlock when invoke `dfsadmin -reconfig datanode ip:host 
> start` to trigger #refreshVolumes.
> 1. DataNod#refreshVolumes hold datanode instance ownable {{synchronizer}} 
> when enter this method first, then try to hold BPOfferService {{readlock}} 
> when `bpos.getNamespaceInfo()` in following code segment. 
> {code:java}
> for (BPOfferService bpos : blockPoolManager.getAllNamenodeThreads()) {
>   nsInfos.add(bpos.getNamespaceInfo());
> }
> {code}
> 2. BPOfferService#registrationSucceeded (which is invoked by #register when 
> DataNode start or #reregister when processCommandFromActor) hold 
> BPOfferService {{writelock}} first, then try to hold datanode instance 
> ownable {{synchronizer}} in following method.
> {code:java}
>   synchronized void bpRegistrationSucceeded(DatanodeRegistration 
> bpRegistration,
>   String blockPoolId) throws IOException {
> id = bpRegistration;
> if(!storage.getDatanodeUuid().equals(bpRegistration.getDatanodeUuid())) {
>   throw new IOException("Inconsistent Datanode IDs. Name-node returned "
>   + bpRegistration.getDatanodeUuid()
>   + ". Expecting " + storage.getDatanodeUuid());
> }
> 
> registerBlockPoolWithSecretManager(bpRegistration, blockPoolId);
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-18 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999776#comment-16999776
 ] 

Aiphago edited comment on HDFS-15000 at 12/19/19 6:32 AM:
--

submit demo patch.The main idea is to make IO opreate(the opreate may have 
order depend) without lock and async and keep the opreate is in order.Any 
suggestions or problems?
 


was (Author: aiphag0):
submit demo patch.The main idea is to make IO opreate(the opreate may have 
order depend) without lock and async and keep the opreate is in order.

> Improve FsDatasetImpl to avoid IO operation in datasetLock
> --
>
> Key: HDFS-15000
> URL: https://issues.apache.org/jira/browse/HDFS-15000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15000.001.patch
>
>
> As HDFS-14997 mentioned, some methods in #FsDatasetImpl such as 
> #finalizeBlock, #finalizeReplica, #createRbw includes IO operation in the 
> datasetLock, It will block some logic when IO load is very high. We should 
> reduce grain fineness or move IO operation out of datasetLock.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-18 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999776#comment-16999776
 ] 

Aiphago commented on HDFS-15000:


submit demo patch.The main idea is to make IO opreate(the opreate may have 
order depend) without lock and async and keep the opreate is in order.

> Improve FsDatasetImpl to avoid IO operation in datasetLock
> --
>
> Key: HDFS-15000
> URL: https://issues.apache.org/jira/browse/HDFS-15000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15000.001.patch
>
>
> As HDFS-14997 mentioned, some methods in #FsDatasetImpl such as 
> #finalizeBlock, #finalizeReplica, #createRbw includes IO operation in the 
> datasetLock, It will block some logic when IO load is very high. We should 
> reduce grain fineness or move IO operation out of datasetLock.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15000) Improve FsDatasetImpl to avoid IO operation in datasetLock

2019-12-18 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15000:
---
Attachment: HDFS-15000.001.patch

> Improve FsDatasetImpl to avoid IO operation in datasetLock
> --
>
> Key: HDFS-15000
> URL: https://issues.apache.org/jira/browse/HDFS-15000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15000.001.patch
>
>
> As HDFS-14997 mentioned, some methods in #FsDatasetImpl such as 
> #finalizeBlock, #finalizeReplica, #createRbw includes IO operation in the 
> datasetLock, It will block some logic when IO load is very high. We should 
> reduce grain fineness or move IO operation out of datasetLock.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15068) DataNode could meet deadlock if invoke refreshVolumes when register

2019-12-18 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999736#comment-16999736
 ] 

Aiphago commented on HDFS-15068:


Hi [~hexiaoqiao],thanks for valuable advice,renew the 
patch.[^HDFS-15068.003.patch]
 

> DataNode could meet deadlock if invoke refreshVolumes when register
> ---
>
> Key: HDFS-15068
> URL: https://issues.apache.org/jira/browse/HDFS-15068
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Critical
> Attachments: HDFS-15068.001.patch, HDFS-15068.002.patch, 
> HDFS-15068.003.patch
>
>
> DataNode could meet deadlock when invoke `dfsadmin -reconfig datanode ip:host 
> start` to trigger #refreshVolumes.
> 1. DataNod#refreshVolumes hold datanode instance ownable {{synchronizer}} 
> when enter this method first, then try to hold BPOfferService {{readlock}} 
> when `bpos.getNamespaceInfo()` in following code segment. 
> {code:java}
> for (BPOfferService bpos : blockPoolManager.getAllNamenodeThreads()) {
>   nsInfos.add(bpos.getNamespaceInfo());
> }
> {code}
> 2. BPOfferService#registrationSucceeded (which is invoked by #register when 
> DataNode start or #reregister when processCommandFromActor) hold 
> BPOfferService {{writelock}} first, then try to hold datanode instance 
> ownable {{synchronizer}} in following method.
> {code:java}
>   synchronized void bpRegistrationSucceeded(DatanodeRegistration 
> bpRegistration,
>   String blockPoolId) throws IOException {
> id = bpRegistration;
> if(!storage.getDatanodeUuid().equals(bpRegistration.getDatanodeUuid())) {
>   throw new IOException("Inconsistent Datanode IDs. Name-node returned "
>   + bpRegistration.getDatanodeUuid()
>   + ". Expecting " + storage.getDatanodeUuid());
> }
> 
> registerBlockPoolWithSecretManager(bpRegistration, blockPoolId);
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15068) DataNode could meet deadlock if invoke refreshVolumes when register

2019-12-18 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15068:
---
Attachment: HDFS-15068.003.patch

> DataNode could meet deadlock if invoke refreshVolumes when register
> ---
>
> Key: HDFS-15068
> URL: https://issues.apache.org/jira/browse/HDFS-15068
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Critical
> Attachments: HDFS-15068.001.patch, HDFS-15068.002.patch, 
> HDFS-15068.003.patch
>
>
> DataNode could meet deadlock when invoke `dfsadmin -reconfig datanode ip:host 
> start` to trigger #refreshVolumes.
> 1. DataNod#refreshVolumes hold datanode instance ownable {{synchronizer}} 
> when enter this method first, then try to hold BPOfferService {{readlock}} 
> when `bpos.getNamespaceInfo()` in following code segment. 
> {code:java}
> for (BPOfferService bpos : blockPoolManager.getAllNamenodeThreads()) {
>   nsInfos.add(bpos.getNamespaceInfo());
> }
> {code}
> 2. BPOfferService#registrationSucceeded (which is invoked by #register when 
> DataNode start or #reregister when processCommandFromActor) hold 
> BPOfferService {{writelock}} first, then try to hold datanode instance 
> ownable {{synchronizer}} in following method.
> {code:java}
>   synchronized void bpRegistrationSucceeded(DatanodeRegistration 
> bpRegistration,
>   String blockPoolId) throws IOException {
> id = bpRegistration;
> if(!storage.getDatanodeUuid().equals(bpRegistration.getDatanodeUuid())) {
>   throw new IOException("Inconsistent Datanode IDs. Name-node returned "
>   + bpRegistration.getDatanodeUuid()
>   + ". Expecting " + storage.getDatanodeUuid());
> }
> 
> registerBlockPoolWithSecretManager(bpRegistration, blockPoolId);
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15068) DataNode could meet deadlock if invoke refreshVolumes when register

2019-12-17 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15068:
---
Attachment: HDFS-15068.002.patch

> DataNode could meet deadlock if invoke refreshVolumes when register
> ---
>
> Key: HDFS-15068
> URL: https://issues.apache.org/jira/browse/HDFS-15068
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Critical
> Attachments: HDFS-15068.001.patch, HDFS-15068.002.patch
>
>
> DataNode could meet deadlock when invoke `dfsadmin -reconfig datanode ip:host 
> start` to trigger #refreshVolumes.
> 1. DataNod#refreshVolumes hold datanode instance ownable {{synchronizer}} 
> when enter this method first, then try to hold BPOfferService {{readlock}} 
> when `bpos.getNamespaceInfo()` in following code segment. 
> {code:java}
> for (BPOfferService bpos : blockPoolManager.getAllNamenodeThreads()) {
>   nsInfos.add(bpos.getNamespaceInfo());
> }
> {code}
> 2. BPOfferService#registrationSucceeded (which is invoked by #register when 
> DataNode start or #reregister when processCommandFromActor) hold 
> BPOfferService {{writelock}} first, then try to hold datanode instance 
> ownable {{synchronizer}} in following method.
> {code:java}
>   synchronized void bpRegistrationSucceeded(DatanodeRegistration 
> bpRegistration,
>   String blockPoolId) throws IOException {
> id = bpRegistration;
> if(!storage.getDatanodeUuid().equals(bpRegistration.getDatanodeUuid())) {
>   throw new IOException("Inconsistent Datanode IDs. Name-node returned "
>   + bpRegistration.getDatanodeUuid()
>   + ". Expecting " + storage.getDatanodeUuid());
> }
> 
> registerBlockPoolWithSecretManager(bpRegistration, blockPoolId);
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15068) DataNode could meet deadlock if invoke refreshVolumes when register

2019-12-17 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998860#comment-16998860
 ] 

Aiphago commented on HDFS-15068:


Thansk [~hexiaoqiao] for the review,I Release the patch with unit test 
[^HDFS-15068.002.patch]

> DataNode could meet deadlock if invoke refreshVolumes when register
> ---
>
> Key: HDFS-15068
> URL: https://issues.apache.org/jira/browse/HDFS-15068
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Critical
> Attachments: HDFS-15068.001.patch, HDFS-15068.002.patch
>
>
> DataNode could meet deadlock when invoke `dfsadmin -reconfig datanode ip:host 
> start` to trigger #refreshVolumes.
> 1. DataNod#refreshVolumes hold datanode instance ownable {{synchronizer}} 
> when enter this method first, then try to hold BPOfferService {{readlock}} 
> when `bpos.getNamespaceInfo()` in following code segment. 
> {code:java}
> for (BPOfferService bpos : blockPoolManager.getAllNamenodeThreads()) {
>   nsInfos.add(bpos.getNamespaceInfo());
> }
> {code}
> 2. BPOfferService#registrationSucceeded (which is invoked by #register when 
> DataNode start or #reregister when processCommandFromActor) hold 
> BPOfferService {{writelock}} first, then try to hold datanode instance 
> ownable {{synchronizer}} in following method.
> {code:java}
>   synchronized void bpRegistrationSucceeded(DatanodeRegistration 
> bpRegistration,
>   String blockPoolId) throws IOException {
> id = bpRegistration;
> if(!storage.getDatanodeUuid().equals(bpRegistration.getDatanodeUuid())) {
>   throw new IOException("Inconsistent Datanode IDs. Name-node returned "
>   + bpRegistration.getDatanodeUuid()
>   + ". Expecting " + storage.getDatanodeUuid());
> }
> 
> registerBlockPoolWithSecretManager(bpRegistration, blockPoolId);
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15068) DataNode could meet deadlock if invoke refreshVolumes when register

2019-12-16 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-15068:
---
Attachment: HDFS-15068.001.patch

> DataNode could meet deadlock if invoke refreshVolumes when register
> ---
>
> Key: HDFS-15068
> URL: https://issues.apache.org/jira/browse/HDFS-15068
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Critical
> Attachments: HDFS-15068.001.patch
>
>
> DataNode could meet deadlock when invoke `dfsadmin -reconfig datanode ip:host 
> start` to trigger #refreshVolumes.
> 1. DataNod#refreshVolumes hold datanode instance ownable {{synchronizer}} 
> when enter this method first, then try to hold BPOfferService {{readlock}} 
> when `bpos.getNamespaceInfo()` in following code segment. 
> {code:java}
> for (BPOfferService bpos : blockPoolManager.getAllNamenodeThreads()) {
>   nsInfos.add(bpos.getNamespaceInfo());
> }
> {code}
> 2. BPOfferService#registrationSucceeded (which is invoked by #register when 
> DataNode start or #reregister when processCommandFromActor) hold 
> BPOfferService {{writelock}} first, then try to hold datanode instance 
> ownable {{synchronizer}} in following method.
> {code:java}
>   synchronized void bpRegistrationSucceeded(DatanodeRegistration 
> bpRegistration,
>   String blockPoolId) throws IOException {
> id = bpRegistration;
> if(!storage.getDatanodeUuid().equals(bpRegistration.getDatanodeUuid())) {
>   throw new IOException("Inconsistent Datanode IDs. Name-node returned "
>   + bpRegistration.getDatanodeUuid()
>   + ". Expecting " + storage.getDatanodeUuid());
> }
> 
> registerBlockPoolWithSecretManager(bpRegistration, blockPoolId);
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15068) DataNode could meet deadlock if invoke refreshVolumes when register

2019-12-16 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997922#comment-16997922
 ] 

Aiphago commented on HDFS-15068:


I change the lock order of refreshVolumes()  and here is the 
patch.[^HDFS-15068.001.patch]

> DataNode could meet deadlock if invoke refreshVolumes when register
> ---
>
> Key: HDFS-15068
> URL: https://issues.apache.org/jira/browse/HDFS-15068
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Critical
> Attachments: HDFS-15068.001.patch
>
>
> DataNode could meet deadlock when invoke `dfsadmin -reconfig datanode ip:host 
> start` to trigger #refreshVolumes.
> 1. DataNod#refreshVolumes hold datanode instance ownable {{synchronizer}} 
> when enter this method first, then try to hold BPOfferService {{readlock}} 
> when `bpos.getNamespaceInfo()` in following code segment. 
> {code:java}
> for (BPOfferService bpos : blockPoolManager.getAllNamenodeThreads()) {
>   nsInfos.add(bpos.getNamespaceInfo());
> }
> {code}
> 2. BPOfferService#registrationSucceeded (which is invoked by #register when 
> DataNode start or #reregister when processCommandFromActor) hold 
> BPOfferService {{writelock}} first, then try to hold datanode instance 
> ownable {{synchronizer}} in following method.
> {code:java}
>   synchronized void bpRegistrationSucceeded(DatanodeRegistration 
> bpRegistration,
>   String blockPoolId) throws IOException {
> id = bpRegistration;
> if(!storage.getDatanodeUuid().equals(bpRegistration.getDatanodeUuid())) {
>   throw new IOException("Inconsistent Datanode IDs. Name-node returned "
>   + bpRegistration.getDatanodeUuid()
>   + ". Expecting " + storage.getDatanodeUuid());
> }
> 
> registerBlockPoolWithSecretManager(bpRegistration, blockPoolId);
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-27 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984170#comment-16984170
 ] 

Aiphago commented on HDFS-14986:


Thanks a lot for the review [~linyiqun].

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Affects Versions: 2.10.0
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.3.0, 2.10.1, 2.11.0
>
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, 
> HDFS-14986.006.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-27 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-14986:
---
Attachment: HDFS-14986.006.patch

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, 
> HDFS-14986.006.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-27 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983513#comment-16983513
 ] 

Aiphago commented on HDFS-14986:


Good advice,I change the retrytimes to 10 and close the stream in while 
loop.[^HDFS-14986.006.patch]

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, 
> HDFS-14986.006.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-26 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983213#comment-16983213
 ] 

Aiphago commented on HDFS-14986:


Hi [~linyiqun],Thank you for your valuable advice.I improve the patch as your 
comment.Addition I rename shouldInitRefresh to shouldFirstRefresh order to easy 
distinguish.[^HDFS-14986.005.patch]

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-26 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-14986:
---
Attachment: HDFS-14986.005.patch

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-22 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980042#comment-16980042
 ] 

Aiphago commented on HDFS-14986:


Hi 
!https://issues.apache.org/jira/secure/useravatar?size=xsmall&ownerId=linyiqun&avatarId=25258!
 [~linyiqun],I get your idea.But this may have a small problem.Because when use 
FSCachingGetSpaceUsed invoke init() and when 

used < 0,so pass the frist refresh().This will make used = 0 until default 10 
min until next refresh().So I add runImmediately to slove this problem.And to 
prevent other subClass not refresh() twice at first du.So this is my idea in 
patch 004.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-21 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979853#comment-16979853
 ] 

Aiphago commented on HDFS-14986:


Hi [~linyiqun],Thank you for your valuable advice.And now 
setShouldInitRefresh() is only invoke in 
ReplicaCachingGetSpaceUsed.[^HDFS-14986.004.patch]

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-21 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-14986:
---
Attachment: HDFS-14986.004.patch

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-20 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978951#comment-16978951
 ] 

Aiphago commented on HDFS-14986:


hi [~jianliang.wu], I just assign this Jira to myself, please feel free to 
assign back if you would also like to work on this.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-20 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago reassigned HDFS-14986:
--

Assignee: Aiphago  (was: Ryan Wu)

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-20 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978950#comment-16978950
 ] 

Aiphago commented on HDFS-14986:


Hi [~linyiqun],Thank you for your valuable advice.In previous patch I modify 
CachingGetSpaceUsed#init(),but this will influences the subclass of 
CachingGetSpaceUsed like DU.So I add a filter,and now the related unit tests 
can pass.[^HDFS-14986.003.patch]

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-20 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-14986:
---
Attachment: HDFS-14986.003.patch

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-19 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-14986:
---
Attachment: HDFS-14986.002.patch

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-19 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977328#comment-16977328
 ] 

Aiphago commented on HDFS-14986:


Hi [~linyiqun],I updata the patch.Can you review this 
again?[^HDFS-14986.002.patch]

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-17 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976301#comment-16976301
 ] 

Aiphago commented on HDFS-14986:


Thanks for your advice, I'll fix later.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Attachments: HDFS-14986.001.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-14 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974888#comment-16974888
 ] 

Aiphago commented on HDFS-14986:


Here is the patch for the trunk can [~leosun08] [~hexiaoqiao] [~linyiqun] 
review this one? Thanks very much.[^HDFS-14986.001.patch]

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Attachments: HDFS-14986.001.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-14 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-14986:
---
Attachment: HDFS-14986.001.patch

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Attachments: HDFS-14986.001.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-13 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973916#comment-16973916
 ] 

Aiphago commented on HDFS-14986:


I means I can fix the dead lock problem with add dataset lock,my first comment 
have say the solution.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-13 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973906#comment-16973906
 ] 

Aiphago commented on HDFS-14986:


We use the branch before 2.8 with the synchronized,and we fix the deadlock 
problem.And I think it's better to and add dataset lock at 
FsDatasetImpl#deepCopyReplica in trunk,and the method to slove deadlock problem 
is the same.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-13 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973321#comment-16973321
 ] 

Aiphago commented on HDFS-14986:


"DataNode: [... #283 daemon prio=5 os_prio=0 tid=0x7fd9826a7800 
nid=0x7463 in Object.wait() [0x7fd949616000]
 java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0x0006b375a1d0> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$2)
 at java.lang.Thread.join(Thread.java:1249)
 - locked <0x0006b375a1d0> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$2)
 at java.lang.Thread.join(Thread.java:1323)
 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.addBlockPool(FsVolumeList.java:423)
 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.addBlockPool(FsDatasetImpl.java:2509)
 - locked <0x0006b33a3b10> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
 at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1388)
 at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:311)
 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:232)
 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:720)

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-13 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973317#comment-16973317
 ] 

Aiphago commented on HDFS-14986:


I would like to submit a patch to try to fix this bug later.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-13 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973314#comment-16973314
 ] 

Aiphago commented on HDFS-14986:


In -HDFS-14313- we found same concurrent problem before base our hadoop 
version.So we solve it by add dataset lock at FsDatasetImpl#deepCopyReplica().  
But this may cause deadlock problem when restart datanode.When BlockPoolSlice 
init first time,the BlockPoolSlice#loadDfsUsed() may return -1.And the 
ReplicaCachingGetSpaceUsed#init() will blocking because of 
FsDatasetImpl#deepCopyReplica() can't get the dataset lock,and the thread can't 
release dataset lock the same time.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric

2019-09-17 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932097#comment-16932097
 ] 

Aiphago commented on HDFS-14836:


Hi [~jojochuang],I run TestDFSUpgradeWithHA on local with the patch,and I  pass 
the junit tests

> FileIoProvider should not increase FileIoErrors metric in datanode volume 
> metric
> 
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Minor
> Attachments: HDFS-14836-trunk-001.patch, HDFS-14836.patch
>
>
> I found that  FileIoErrors metric will increase in 
> BlockSender.sendPacket(),when use fileIoProvider.transferToSocketFully().But 
> in https://issues.apache.org/jira/browse/HDFS-2054 the Exception has been 
> ignore like "Broken pipe" and "Connection reset" .
> So should do a filter when fileIoProvider increase FileIoErrors count ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric

2019-09-14 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929875#comment-16929875
 ] 

Aiphago commented on HDFS-14836:


Hi [~jojochuang] I update the code can you help review this again? Thanks very 
much.

> FileIoProvider should not increase FileIoErrors metric in datanode volume 
> metric
> 
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Minor
> Attachments: HDFS-14836-trunk-001.patch, HDFS-14836.patch
>
>
> I found that  FileIoErrors metric will increase in 
> BlockSender.sendPacket(),when use fileIoProvider.transferToSocketFully().But 
> in https://issues.apache.org/jira/browse/HDFS-2054 the Exception has been 
> ignore like "Broken pipe" and "Connection reset" .
> So should do a filter when fileIoProvider increase FileIoErrors count ?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric

2019-09-14 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-14836:
---
Attachment: HDFS-14836-trunk-001.patch

> FileIoProvider should not increase FileIoErrors metric in datanode volume 
> metric
> 
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Minor
> Attachments: HDFS-14836-trunk-001.patch, HDFS-14836.patch
>
>
> I found that  FileIoErrors metric will increase in 
> BlockSender.sendPacket(),when use fileIoProvider.transferToSocketFully().But 
> in https://issues.apache.org/jira/browse/HDFS-2054 the Exception has been 
> ignore like "Broken pipe" and "Connection reset" .
> So should do a filter when fileIoProvider increase FileIoErrors count ?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric

2019-09-11 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928173#comment-16928173
 ] 

Aiphago commented on HDFS-14836:


Thanks [~jojochuang] for the comment.
{quote}it would be the best if we can avoid string-matching exception messages. 
"Broken pipe" and "Connection reset" are usually thrown as a SocketException. 
Would it make sense to check exception class name instead? Better, catch 
SocketException and do not call onFailure().
{quote}
if we just check exception class name instead, the range maybe too big.I means 
maybe there are some other SocketException and not match "Broken pipe" and 
"Connection reset" .So why we not keep consistency to  -HDFS-2054 .-
{quote}for socket related exceptions, you don't want to call onFailure(); 
however, those exceptions should be re-thrown too. They should not be ignored 
silently.
{quote}
it's a good suggestion,I will change later.

> FileIoProvider should not increase FileIoErrors metric in datanode volume 
> metric
> 
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Minor
> Attachments: HDFS-14836.patch
>
>
> I found that  FileIoErrors metric will increase in 
> BlockSender.sendPacket(),when use fileIoProvider.transferToSocketFully().But 
> in https://issues.apache.org/jira/browse/HDFS-2054 the Exception has been 
> ignore like "Broken pipe" and "Connection reset" .
> So should do a filter when fileIoProvider increase FileIoErrors count ?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric

2019-09-11 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-14836:
---
Affects Version/s: (was: 2.9.1)

> FileIoProvider should not increase FileIoErrors metric in datanode volume 
> metric
> 
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Minor
> Attachments: HDFS-14836.patch
>
>
> I found that  FileIoErrors metric will increase in 
> BlockSender.sendPacket(),when use fileIoProvider.transferToSocketFully().But 
> in https://issues.apache.org/jira/browse/HDFS-2054 the Exception has been 
> ignore like "Broken pipe" and "Connection reset" .
> So should do a filter when fileIoProvider increase FileIoErrors count ?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric

2019-09-11 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927338#comment-16927338
 ] 

Aiphago commented on HDFS-14836:


Hi [~jojochuang] can you help review this one? Thanks very much.

> FileIoProvider should not increase FileIoErrors metric in datanode volume 
> metric
> 
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Minor
> Attachments: HDFS-14836.patch
>
>
> I found that  FileIoErrors metric will increase in 
> BlockSender.sendPacket(),when use fileIoProvider.transferToSocketFully().But 
> in https://issues.apache.org/jira/browse/HDFS-2054 the Exception has been 
> ignore like "Broken pipe" and "Connection reset" .
> So should do a filter when fileIoProvider increase FileIoErrors count ?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric

2019-09-11 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-14836:
---
Attachment: HDFS-14836.patch

> FileIoProvider should not increase FileIoErrors metric in datanode volume 
> metric
> 
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Minor
> Attachments: HDFS-14836.patch
>
>
> I found that  FileIoErrors metric will increase in 
> BlockSender.sendPacket(),when use fileIoProvider.transferToSocketFully().But 
> in https://issues.apache.org/jira/browse/HDFS-2054 the Exception has been 
> ignore like "Broken pipe" and "Connection reset" .
> So should do a filter when fileIoProvider increase FileIoErrors count ?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric

2019-09-09 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926300#comment-16926300
 ] 

Aiphago commented on HDFS-14836:


Hi [~jojochuang] thxs for your attention.

like  HDFS-2054  "Broken pipe" and "Connection reset" is cause by client rather 
than datanode , and datanode may increment a lot of FileIoErrors counter 
,because of this Exception. So I think it's better to do a filter.

 

> FileIoProvider should not increase FileIoErrors metric in datanode volume 
> metric
> 
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Minor
>
> I found that  FileIoErrors metric will increase in 
> BlockSender.sendPacket(),when use fileIoProvider.transferToSocketFully().But 
> in https://issues.apache.org/jira/browse/HDFS-2054 the Exception has been 
> ignore like "Broken pipe" and "Connection reset" .
> So should do a filter when fileIoProvider increase FileIoErrors count ?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric

2019-09-09 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-14836:
---
Summary: FileIoProvider should not increase FileIoErrors metric in datanode 
volume metric  (was: FileIoProvider will increase FileIoErrors metric in 
datanode volume metric)

> FileIoProvider should not increase FileIoErrors metric in datanode volume 
> metric
> 
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Aiphago
>Priority: Minor
>
> I found that  FileIoErrors metric will increase in 
> BlockSender.sendPacket(),when use fileIoProvider.transferToSocketFully().But 
> in https://issues.apache.org/jira/browse/HDFS-2054 the Exception has been 
> ignore like "Broken pipe" and "Connection reset" .
> So should do a filter when fileIoProvider increase FileIoErrors count ?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14836) FileIoProvider will increase FileIoErrors metric in datanode volume metric

2019-09-09 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-14836:
---
Description: 
I found that  FileIoErrors metric will increase in 
BlockSender.sendPacket(),when use fileIoProvider.transferToSocketFully().But in 
https://issues.apache.org/jira/browse/HDFS-2054 the Exception has been ignore 
like "Broken pipe" and "Connection reset" .

So should do a filter when fileIoProvider increase FileIoErrors count ?

  was:
I found that  FileIoErrors metric will increase in 
BlockSender.sendPacket(),when use 

fileIoProvider.transferToSocketFully().But in 
https://issues.apache.org/jira/browse/HDFS-2054 the Exception has been ignore 
like "Broken pipe" and "Connection reset" .So should do a filter when 
fileIoProvider increase FileIoErrors count ?


> FileIoProvider will increase FileIoErrors metric in datanode volume metric
> --
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Aiphago
>Priority: Minor
>
> I found that  FileIoErrors metric will increase in 
> BlockSender.sendPacket(),when use fileIoProvider.transferToSocketFully().But 
> in https://issues.apache.org/jira/browse/HDFS-2054 the Exception has been 
> ignore like "Broken pipe" and "Connection reset" .
> So should do a filter when fileIoProvider increase FileIoErrors count ?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14836) FileIoProvider will increase FileIoErrors metric in datanode volume metric

2019-09-09 Thread Aiphago (Jira)
Aiphago created HDFS-14836:
--

 Summary: FileIoProvider will increase FileIoErrors metric in 
datanode volume metric
 Key: HDFS-14836
 URL: https://issues.apache.org/jira/browse/HDFS-14836
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.9.1
Reporter: Aiphago


I found that  FileIoErrors metric will increase in 
BlockSender.sendPacket(),when use 

fileIoProvider.transferToSocketFully().But in 
https://issues.apache.org/jira/browse/HDFS-2054 the Exception has been ignore 
like "Broken pipe" and "Connection reset" .So should do a filter when 
fileIoProvider increase FileIoErrors count ?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org