[jira] [Commented] (HDFS-17523) Add fine-grained locks metrics in DataSetLockManager

2024-05-15 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846659#comment-17846659
 ] 

lei w commented on HDFS-17523:
--

[~hexiaoqiao] Thank you for your comment; a PR will be submitted later.

> Add  fine-grained locks metrics in DataSetLockManager
> -
>
> Key: HDFS-17523
> URL: https://issues.apache.org/jira/browse/HDFS-17523
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Priority: Major
>
> Currently we use fine-grained locks to manage FsDataSetImpl. But we did not 
> add lock-related metrics. In some cases, we actually need lock-holding 
> information to understand the time-consuming lock-holding of a certain 
> operation. Using this information, we can also optimize some long-term lock 
> operations as early as possible.






[jira] [Created] (HDFS-17523) Add fine-grained locks metrics in DataSetLockManager

2024-05-12 Thread lei w (Jira)
lei w created HDFS-17523:


 Summary: Add  fine-grained locks metrics in DataSetLockManager
 Key: HDFS-17523
 URL: https://issues.apache.org/jira/browse/HDFS-17523
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: lei w


Currently we use fine-grained locks to manage FsDataSetImpl, but we have not 
added any lock-related metrics. In some cases we need lock-holding information 
to understand how long a particular operation holds a lock. With this 
information we can also identify and optimize long-held lock operations as 
early as possible.
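
Below is a minimal sketch of the kind of hold-time metric this could expose, 
assuming a simple wrapper around one fine-grained lock; the class, counters 
and threshold here are illustrative, not the actual DataSetLockManager API:
{code:java}
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative sketch only: time how long a fine-grained lock is held. */
final class TimedLock {
  private final ReentrantLock lock = new ReentrantLock();
  private final LongAdder totalHeldNanos = new LongAdder();
  private final LongAdder longHeldCount = new LongAdder();
  // Assumed example threshold for "long-held" operations: 10 ms.
  private static final long SLOW_THRESHOLD_NANOS = 10_000_000L;

  void runLocked(Runnable op) {
    lock.lock();
    long start = System.nanoTime();   // measure hold time, not wait time
    try {
      op.run();
    } finally {
      long held = System.nanoTime() - start;
      lock.unlock();
      totalHeldNanos.add(held);        // aggregate lock-hold-time metric
      if (held > SLOW_THRESHOLD_NANOS) {
        longHeldCount.add(1);          // count of long-held operations
      }
    }
  }
}
{code}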






[jira] [Updated] (HDFS-17519) Reduce lease checks when the last block is in the complete state and the penultimate block is in the committed state

2024-05-10 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-17519:
-
Summary: Reduce lease checks when the last block is in the complete state 
and the penultimate block is in the committed state  (was: Reduce lease checks 
when the last block is in the complete state and the penultimate block is in 
the committed state in ones file.)

> Reduce lease checks when the last block is in the complete state and the 
> penultimate block is in the committed state
> 
>
> Key: HDFS-17519
> URL: https://issues.apache.org/jira/browse/HDFS-17519
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Priority: Major
>
> When the last block of the file is in the complete state and the penultimate 
> block is in the committed state, LeaseMonitor will continuously check the 
> lease of this file. In this case, it is usually because the DN encountered 
> some anomalies and did not report to the NN for a long time. So can we renew 
> the lease with the LeaseMonitor, and then reduce the number of checks on this 
> lease?






[jira] [Commented] (HDFS-17519) Reduce lease checks when the last block is in the complete state and the penultimate block is in the committed state in ones file.

2024-05-10 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845401#comment-17845401
 ] 

lei w commented on HDFS-17519:
--

[~ayushsaxena] Can you give me some suggestions?

> Reduce lease checks when the last block is in the complete state and the 
> penultimate block is in the committed state in ones file.
> --
>
> Key: HDFS-17519
> URL: https://issues.apache.org/jira/browse/HDFS-17519
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Priority: Major
>
> When the last block of the file is in the complete state and the penultimate 
> block is in the committed state, LeaseMonitor will continuously check the 
> lease of this file. In this case, it is usually because the DN encountered 
> some anomalies and did not report to the NN for a long time. So can we renew 
> the lease with the LeaseMonitor, and then reduce the number of checks on this 
> lease?






[jira] [Updated] (HDFS-17519) Reduce lease checks when the last block is in the complete state and the penultimate block is in the committed state in ones file.

2024-05-10 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-17519:
-
Summary: Reduce lease checks when the last block is in the complete state 
and the penultimate block is in the committed state in ones file.  (was: 
LeaseMonitor renew the lease when the last block is in the complete state and 
the penultimate block is in the committed state in ones file.)

> Reduce lease checks when the last block is in the complete state and the 
> penultimate block is in the committed state in ones file.
> --
>
> Key: HDFS-17519
> URL: https://issues.apache.org/jira/browse/HDFS-17519
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Priority: Major
>
> When the last block of the file is in the complete state and the penultimate 
> block is in the committed state, LeaseMonitor will continuously check the 
> lease of this file. In this case, it is usually because the DN encountered 
> some anomalies and did not report to the NN for a long time. So can we renew 
> the lease with the LeaseMonitor, and then reduce the number of checks on this 
> lease?






[jira] [Created] (HDFS-17519) LeaseMonitor renew the lease when the last block is in the complete state and the penultimate block is in the committed state in ones file.

2024-05-10 Thread lei w (Jira)
lei w created HDFS-17519:


 Summary: LeaseMonitor renew the lease when the last block is in 
the complete state and the penultimate block is in the committed state in ones 
file.
 Key: HDFS-17519
 URL: https://issues.apache.org/jira/browse/HDFS-17519
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: lei w


When the last block of the file is in the complete state and the penultimate 
block is in the committed state, the LeaseMonitor will continuously check the 
lease of this file. This usually happens because a DN encountered some anomaly 
and has not reported to the NN for a long time. So could we renew the lease in 
the LeaseMonitor and thereby reduce the number of checks on this lease?
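
Below is a minimal sketch of the proposed decision, assuming the lease monitor 
can see the states of the last two blocks; the enum and method names are 
illustrative, not the actual LeaseManager code:
{code:java}
/** Illustrative sketch only, not the actual LeaseManager code. */
final class LeaseCheckSketch {
  enum BlockState { COMPLETE, COMMITTED, UNDER_CONSTRUCTION }

  /** Renew and back off instead of re-checking a lease that cannot progress. */
  static boolean shouldRenewInsteadOfCheck(BlockState lastBlock,
                                           BlockState penultimateBlock) {
    // The file cannot be closed yet (the penultimate block is still COMMITTED,
    // typically because a DN has not reported for a long time), so repeated
    // lease checks achieve nothing; renewing the lease reduces the churn.
    return lastBlock == BlockState.COMPLETE
        && penultimateBlock == BlockState.COMMITTED;
  }
}
{code}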






[jira] [Created] (HDFS-17518) In the lease monitor, if a file is closed, we should sync the editslog

2024-05-10 Thread lei w (Jira)
lei w created HDFS-17518:


 Summary: In the lease monitor, if a file is closed, we should sync 
the editslog
 Key: HDFS-17518
 URL: https://issues.apache.org/jira/browse/HDFS-17518
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: lei w


In the lease monitor, if a file is closed, the checkLease method will return 
true and the edit log will not be synced. In my opinion, we should sync the 
edit log to avoid leaving the standby NameNode out of date for a long time.
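
Below is a minimal sketch of the proposed change, with the lease checks and 
the log sync modeled as plain callbacks; the structure is assumed, not copied 
from FSNamesystem:
{code:java}
import java.util.List;
import java.util.function.BooleanSupplier;

/** Illustrative sketch only, not the actual FSNamesystem code. */
final class LeaseMonitorSketch {
  /** Each supplier stands in for checkLease(); true means the file was closed. */
  static void checkLeases(List<BooleanSupplier> leaseChecks, Runnable logSync) {
    boolean closedAnyFile = false;
    for (BooleanSupplier check : leaseChecks) {
      closedAnyFile |= check.getAsBoolean();
    }
    if (closedAnyFile) {
      // Proposed addition: sync the edit log so the standby NameNode sees
      // the CloseOp without waiting for the next unrelated sync.
      logSync.run();
    }
  }
}
{code}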






[jira] [Updated] (HDFS-17408) Reduce the number of quota calculations in FSDirRenameOp

2024-03-12 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-17408:
-
Description: During the execution of the rename operation, we first 
calculate the quota for the source INode using verifyQuotaForRename, and at the 
same time, we calculate the quota for the target INode. Subsequently, in 
RenameOperation#removeSrc, RenameOperation#removeSrc4OldRename, and 
RenameOperation#addSourceToDestination, the quota for the source directory is 
calculated again. In exceptional cases, RenameOperation#restoreDst and 
RenameOperation#restoreSource will also perform quota calculations for the 
source and target directories. In fact, many of the quota calculations are 
redundant and unnecessary, so we should optimize them away.
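
Below is a minimal sketch of the optimization idea, assuming the source 
subtree's quota usage can be computed once and reused by the later steps; the 
names and the array shape are illustrative, not the actual 
FSDirRenameOp/RenameOperation code:
{code:java}
import java.util.function.Supplier;

/** Illustrative sketch only: compute the expensive quota walk once and reuse it. */
final class CachedQuotaUsage {
  private final Supplier<long[]> quotaWalk;   // stands in for computeQuotaUsage()
  private long[] cached;                      // e.g. {namespace, storagespace}

  CachedQuotaUsage(Supplier<long[]> quotaWalk) {
    this.quotaWalk = quotaWalk;
  }

  long[] get() {
    if (cached == null) {
      cached = quotaWalk.get();   // single traversal of the source subtree
    }
    return cached;                // removeSrc / restoreSource etc. would reuse this
  }
}
{code}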

> Reduce the number of quota calculations in FSDirRenameOp
> 
>
> Key: HDFS-17408
> URL: https://issues.apache.org/jira/browse/HDFS-17408
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Assignee: lei w
>Priority: Major
>  Labels: pull-request-available
>
> During the execution of the rename operation, we first calculate the quota 
> for the source INode using verifyQuotaForRename, and at the same time, we 
> calculate the quota for the target INode. Subsequently, in 
> RenameOperation#removeSrc, RenameOperation#removeSrc4OldRename, and 
> RenameOperation#addSourceToDestination, the quota for the source directory is 
> calculated again. In exceptional cases, RenameOperation#restoreDst and 
> RenameOperation#restoreSource will also perform quota calculations for the 
> source and target directories. In fact, many of the quota calculations are 
> redundant and unnecessary, so we should optimize them away.






[jira] [Created] (HDFS-17422) Enhance the stability of the unit test TestDFSAdmin

2024-03-11 Thread lei w (Jira)
lei w created HDFS-17422:


 Summary:  Enhance the stability of the unit test TestDFSAdmin
 Key: HDFS-17422
 URL: https://issues.apache.org/jira/browse/HDFS-17422
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: lei w


It has been observed that TestDFSAdmin fails frequently, for example in 
[PR-6620|https://github.com/apache/hadoop/pull/6620]. The failure occurs when 
the test method testDecommissionDataNodesReconfig asserts on the first line of 
the standard output: the check fails whenever the expected content does not 
appear on a single line. I believe we should change how the test asserts. The 
standard output printed in 
[PR-6620|https://github.com/apache/hadoop/pull/6620] is as follows:
{panel:title=TestInformation}
2024-03-11 02:36:19,442 [main] INFO  tools.TestDFSAdmin 
(TestDFSAdmin.java:testDecommissionDataNodesReconfig(1356)) - 
outsForFinishReconf first element is Reconfiguring status for node 
[127.0.0.1:41361]: started at Mon Mar 11 02:36:18 UTC 2024 and finished at Mon 
Mar 11 02:36:18 UTC 2024., all element is [Reconfiguring status for node 
[127.0.0.1:41361]: started at Mon Mar 11 02:36:18 UTC 2024 and finished at Mon 
Mar 11 02:36:18 UTC 2024., SUCCESS: Changed property 
dfs.datanode.data.transfer.bandwidthPerSec,From: "0",  To: "1000", 
Reconfiguring status for node [127.0.0.1:33073]: started at Mon Mar 11 02:36:18 
UTC 2024 and finished at Mon Mar 11 02:36:18 UTC 2024., SUCCESS: Changed 
property dfs.datanode.data.transfer.bandwidthPerSec,   From: "0",  To: 
"1000", Retrieval of reconfiguration status successful on 2 nodes, failed on 0 
nodes.], node1Addr is 127.0.0.1:41361 , node2Addr is 127.0.0.1:33073.
{panel}
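
Below is a minimal sketch of the kind of assertion change being suggested: 
check the whole captured output instead of only its first line. The expected 
substrings and variable names come from the log above; the helper and its 
placement are assumptions, not the actual TestDFSAdmin code:
{code:java}
import static org.junit.Assert.assertTrue;

import java.util.List;

/** Illustrative sketch only; a helper that could live inside TestDFSAdmin. */
final class ReconfigAssertSketch {
  static void assertReconfigOutput(List<String> outsForFinishReconf,
                                   String node1Addr, String node2Addr) {
    // Join all captured stdout lines so the assertion no longer depends on
    // which physical line the status text happens to be split onto.
    String joined = String.join("\n", outsForFinishReconf);
    assertTrue(joined.contains("Reconfiguring status for node [" + node1Addr + "]"));
    assertTrue(joined.contains("Reconfiguring status for node [" + node2Addr + "]"));
    assertTrue(joined.contains(
        "SUCCESS: Changed property dfs.datanode.data.transfer.bandwidthPerSec"));
  }
}
{code}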







[jira] [Created] (HDFS-17408) Reduce the number of quota calculations in FSDirRenameOp

2024-03-04 Thread lei w (Jira)
lei w created HDFS-17408:


 Summary: Reduce the number of quota calculations in FSDirRenameOp
 Key: HDFS-17408
 URL: https://issues.apache.org/jira/browse/HDFS-17408
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: lei w









[jira] [Updated] (HDFS-17391) Adjust the checkpoint io buffer size to the chunk size

2024-02-20 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-17391:
-
Description: 
Adjust the checkpoint io buffer size to the chunk size to reduce checkpoint 
time.
Before change:
2022-07-11 07:10:50,900 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 374700896827 to namenode at http://:50070 in 1729.465 seconds
After change:
2022-07-12 08:15:55,068 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 375717629244 to namenode at http://:50070  in 858.668 seconds


  was:Adjust the checkpoint io buffer size to the chunk size to reduce 
checkpoint time 
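
Below is a minimal sketch of the idea, assuming the image is streamed with a 
buffer sized to the transfer chunk instead of a small default; the constant 
and the stream handling are illustrative, not the actual TransferFsImage code:
{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Illustrative sketch only: copy the fsimage with a chunk-sized buffer. */
final class ImageCopySketch {
  // Assumed example value; the point is to match the transfer chunk size
  // rather than a small default of a few KB.
  static final int CHUNK_SIZE = 1 << 20; // 1 MiB

  static void copyImage(InputStream in, OutputStream out) throws IOException {
    byte[] buf = new byte[CHUNK_SIZE];
    int n;
    while ((n = in.read(buf)) > 0) {
      out.write(buf, 0, n);   // fewer, larger writes => shorter upload time
    }
    out.flush();
  }
}
{code}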


> Adjust the checkpoint io buffer size to the chunk size
> --
>
> Key: HDFS-17391
> URL: https://issues.apache.org/jira/browse/HDFS-17391
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Priority: Major
>
> Adjust the checkpoint io buffer size to the chunk size to reduce checkpoint 
> time.
> Before change:
> 2022-07-11 07:10:50,900 INFO 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
> txid 374700896827 to namenode at http://:50070 in 1729.465 seconds
> After change:
> 2022-07-12 08:15:55,068 INFO 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
> txid 375717629244 to namenode at http://:50070  in 858.668 seconds






[jira] [Created] (HDFS-17391) Adjust the checkpoint io buffer size to the chunk size

2024-02-20 Thread lei w (Jira)
lei w created HDFS-17391:


 Summary: Adjust the checkpoint io buffer size to the chunk size
 Key: HDFS-17391
 URL: https://issues.apache.org/jira/browse/HDFS-17391
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: lei w


Adjust the checkpoint io buffer size to the chunk size to reduce checkpoint 
time 






[jira] [Updated] (HDFS-17383) Datanode current block token should come from active NameNode in HA mode

2024-02-19 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-17383:
-
Summary: Datanode current block token should come from active NameNode in 
HA mode  (was: Use the block token from active NameNode to transfer block in 
DataNode)

> Datanode current block token should come from active NameNode in HA mode
> 
>
> Key: HDFS-17383
> URL: https://issues.apache.org/jira/browse/HDFS-17383
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Priority: Major
> Attachments: reproduce.diff
>
>
> We found that transfer block failed during the namenode upgrade. The specific 
> error reported was that the block token verification failed. The reason is 
> that during the datanode transfer block process, the source datanode uses its 
> own generated block token, and the keyid comes from ANN or SBN. However, 
> because the newly upgraded NN has just been started, the keyid owned by the 
> source datanode may not be owned by the target datanode, so the write fails. 
> Here's how to reproduce this situation in the attachment






[jira] [Updated] (HDFS-17383) Use the block token from active NameNode to transfer block in DataNode

2024-02-18 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-17383:
-
Description: We found that transfer block failed during the namenode 
upgrade. The specific error reported was that the block token verification 
failed. The reason is that during the datanode transfer block process, the 
source datanode uses its own generated block token, and the keyid comes from 
ANN or SBN. However, because the newly upgraded NN has just been started, the 
keyid owned by the source datanode may not be owned by the target datanode, so 
the write fails. Here's how to reproduce this situation in the attachment  
(was: We found that transfer block failed during the namenode upgrade. The 
specific error reported was that the block token verification failed. The 
reason is that during the datanode transfer block process, the source datanode 
uses its own generated block token, and the keyid comes from ANN or SBN. 
However, because the newly upgraded NN has just been started, the keyid owned 
by the source datanode may not be owned by the target datanode, so the write 
fails. Here's how to reproduce this situation.)

> Use the block token from active NameNode to transfer block in DataNode
> --
>
> Key: HDFS-17383
> URL: https://issues.apache.org/jira/browse/HDFS-17383
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Priority: Major
> Attachments: reproduce.diff
>
>
> We found that transfer block failed during the namenode upgrade. The specific 
> error reported was that the block token verification failed. The reason is 
> that during the datanode transfer block process, the source datanode uses its 
> own generated block token, and the keyid comes from ANN or SBN. However, 
> because the newly upgraded NN has just been started, the keyid owned by the 
> source datanode may not be owned by the target datanode, so the write fails. 
> Here's how to reproduce this situation in the attachment






[jira] [Updated] (HDFS-17383) Use the block token from active NameNode to transfer block in DataNode

2024-02-18 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-17383:
-
Attachment: reproduce.diff

> Use the block token from active NameNode to transfer block in DataNode
> --
>
> Key: HDFS-17383
> URL: https://issues.apache.org/jira/browse/HDFS-17383
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Priority: Major
> Attachments: reproduce.diff
>
>
> We found that transfer block failed during the namenode upgrade. The specific 
> error reported was that the block token verification failed. The reason is 
> that during the datanode transfer block process, the source datanode uses its 
> own generated block token, and the keyid comes from ANN or SBN. However, 
> because the newly upgraded NN has just been started, the keyid owned by the 
> source datanode may not be owned by the target datanode, so the write fails. 
> Here's how to reproduce this situation.






[jira] [Updated] (HDFS-17383) Use the block token from active NameNode to transfer block in DataNode

2024-02-18 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-17383:
-
Description: We found that transfer block failed during the namenode 
upgrade. The specific error reported was that the block token verification 
failed. The reason is that during the datanode transfer block process, the 
source datanode uses its own generated block token, and the keyid comes from 
ANN or SBN. However, because the newly upgraded NN has just been started, the 
keyid owned by the source datanode may not be owned by the target datanode, so 
the write fails. Here's how to reproduce this situation.

> Use the block token from active NameNode to transfer block in DataNode
> --
>
> Key: HDFS-17383
> URL: https://issues.apache.org/jira/browse/HDFS-17383
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Priority: Major
>
> We found that transfer block failed during the namenode upgrade. The specific 
> error reported was that the block token verification failed. The reason is 
> that during the datanode transfer block process, the source datanode uses its 
> own generated block token, and the keyid comes from ANN or SBN. However, 
> because the newly upgraded NN has just been started, the keyid owned by the 
> source datanode may not be owned by the target datanode, so the write fails. 
> Here's how to reproduce this situation.






[jira] [Created] (HDFS-17383) Use the block token from active NameNode to transfer block in DataNode

2024-02-18 Thread lei w (Jira)
lei w created HDFS-17383:


 Summary: Use the block token from active NameNode to transfer 
block in DataNode
 Key: HDFS-17383
 URL: https://issues.apache.org/jira/browse/HDFS-17383
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: lei w









[jira] [Updated] (HDFS-17354) Delay invoke clearStaleNamespacesInRouterStateIdContext during router start up

2024-01-24 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-17354:
-
Description: 
We should start the clear-expired-namespaces thread in the RouterRpcServer 
RUNNING phase, because StateStoreService is initialized in the initialization 
phase. Currently the router throws an IOException during startup.

{panel:title=Exception}
2024-01-09 16:27:06,939 WARN 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer: Could not 
fetch current list of namespaces.
java.io.IOException: State Store does not have an interface for MembershipStore
at 
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getStoreInterface(MembershipNamenodeResolver.java:121)
at 
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getMembershipStore(MembershipNamenodeResolver.java:102)
at 
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getNamespaces(MembershipNamenodeResolver.java:388)
at 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.clearStaleNamespacesInRouterStateIdContext(RouterRpcServer.java:434)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{panel}


  was:
We should  start clear expired namespace thread at  RouterRpcServer 
initialization phase  because StateStoreService is Initialized in  
initialization phase.  Now, router will throw IoException when start up.

{panel:title=Exception}
2024-01-09 16:27:06,939 WARN 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer: Could not 
fetch current list of namespaces.
java.io.IOException: State Store does not have an interface for MembershipStore
at 
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getStoreInterface(MembershipNamenodeResolver.java:121)
at 
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getMembershipStore(MembershipNamenodeResolver.java:102)
at 
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getNamespaces(MembershipNamenodeResolver.java:388)
at 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.clearStaleNamespacesInRouterStateIdContext(RouterRpcServer.java:434)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{panel}
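
Below is a minimal sketch of the proposal (delay the periodic cleanup until 
the RUNNING phase), with the router lifecycle modeled as plain init/start 
methods; the scheduling interval and method names are assumptions, not the 
actual RouterRpcServer code:
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch only: schedule the cleanup in start, not in init. */
final class RouterCleanupSketch {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void serviceInit() {
    // Do NOT schedule clearStaleNamespacesInRouterStateIdContext here:
    // the state store is not ready yet, so the task would only log IOExceptions.
  }

  void serviceStart(Runnable clearStaleNamespaces) {
    // Schedule only once the router has reached the RUNNING phase.
    scheduler.scheduleWithFixedDelay(clearStaleNamespaces, 0, 60, TimeUnit.SECONDS);
  }
}
{code}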



> Delay invoke  clearStaleNamespacesInRouterStateIdContext during router start 
> up
> ---
>
> Key: HDFS-17354
> URL: https://issues.apache.org/jira/browse/HDFS-17354
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Priority: Major
>
> We should  start clear expired namespace thread at  RouterRpcServer RUNNING 
> phase  because StateStoreService is Initialized in  initialization phase.  
> Now, router will throw IoException when start up.
> {panel:title=Exception}
> 2024-01-09 16:27:06,939 WARN 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer: Could not 
> fetch current list of namespaces.
> java.io.IOException: State Store does not have an interface for 
> MembershipStore
> at 
> org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getStoreInterface(MembershipNamenodeResolver.java:121)
> at 
> org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getMembershipStore(MembershipNamenodeResolver.java:102)
> at 
> org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getNamespaces(MembershipNamenodeResolver.java:388)
> at 
> 

[jira] [Updated] (HDFS-17354) Delay invoke clearStaleNamespacesInRouterStateIdContext during router start up

2024-01-24 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-17354:
-
Description: 
We should  start clear expired namespace thread at  RouterRpcServer 
initialization phase  because StateStoreService is Initialized in  
initialization phase.  Now, router will throw IoException when start up.

{panel:title=Exception}
2024-01-09 16:27:06,939 WARN 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer: Could not 
fetch current list of namespaces.
java.io.IOException: State Store does not have an interface for MembershipStore
at 
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getStoreInterface(MembershipNamenodeResolver.java:121)
at 
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getMembershipStore(MembershipNamenodeResolver.java:102)
at 
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getNamespaces(MembershipNamenodeResolver.java:388)
at 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.clearStaleNamespacesInRouterStateIdContext(RouterRpcServer.java:434)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{panel}


  was:
We should  start clear expired namespace thread at  RouterRpcServer 
initialization phase  because StateStoreService is Initialized in  
initialization phase.  Now, router will throw IoException when start up.

{panel:title=My title}
2024-01-09 16:27:06,939 WARN 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer: Could not 
fetch current list of namespaces.
java.io.IOException: State Store does not have an interface for MembershipStore
at 
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getStoreInterface(MembershipNamenodeResolver.java:121)
at 
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getMembershipStore(MembershipNamenodeResolver.java:102)
at 
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getNamespaces(MembershipNamenodeResolver.java:388)
at 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.clearStaleNamespacesInRouterStateIdContext(RouterRpcServer.java:434)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{panel}



> Delay invoke  clearStaleNamespacesInRouterStateIdContext during router start 
> up
> ---
>
> Key: HDFS-17354
> URL: https://issues.apache.org/jira/browse/HDFS-17354
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Priority: Major
>
> We should  start clear expired namespace thread at  RouterRpcServer 
> initialization phase  because StateStoreService is Initialized in  
> initialization phase.  Now, router will throw IoException when start up.
> {panel:title=Exception}
> 2024-01-09 16:27:06,939 WARN 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer: Could not 
> fetch current list of namespaces.
> java.io.IOException: State Store does not have an interface for 
> MembershipStore
> at 
> org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getStoreInterface(MembershipNamenodeResolver.java:121)
> at 
> org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getMembershipStore(MembershipNamenodeResolver.java:102)
> at 
> org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getNamespaces(MembershipNamenodeResolver.java:388)
> at 
> 

[jira] [Created] (HDFS-17354) Delay invoke clearStaleNamespacesInRouterStateIdContext during router start up

2024-01-24 Thread lei w (Jira)
lei w created HDFS-17354:


 Summary: Delay invoke  clearStaleNamespacesInRouterStateIdContext 
during router start up
 Key: HDFS-17354
 URL: https://issues.apache.org/jira/browse/HDFS-17354
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: lei w


We should  start clear expired namespace thread at  RouterRpcServer 
initialization phase  because StateStoreService is Initialized in  
initialization phase.  Now, router will throw IoException when start up.

{panel:title=My title}
2024-01-09 16:27:06,939 WARN 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer: Could not 
fetch current list of namespaces.
java.io.IOException: State Store does not have an interface for MembershipStore
at 
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getStoreInterface(MembershipNamenodeResolver.java:121)
at 
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getMembershipStore(MembershipNamenodeResolver.java:102)
at 
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getNamespaces(MembershipNamenodeResolver.java:388)
at 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.clearStaleNamespacesInRouterStateIdContext(RouterRpcServer.java:434)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{panel}







[jira] [Created] (HDFS-17339) BPServiceActor should skip cacheReport when one blockPool does not have CacheBlock on this DataNode

2024-01-15 Thread lei w (Jira)
lei w created HDFS-17339:


 Summary: BPServiceActor should skip cacheReport when one blockPool 
does not have CacheBlock on this DataNode
 Key: HDFS-17339
 URL: https://issues.apache.org/jira/browse/HDFS-17339
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: lei w


Currently the DataNode sends a cache report to every NameNode whenever the 
cache capacity is non-zero. But sometimes not every NameNode has cached blocks 
on this DataNode, so BPServiceActor should skip the cache report when a block 
pool has no cached blocks on this DataNode. This would spare the NameNode some 
unnecessary lock contention.
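
Below is a minimal sketch of the proposed check, with the cached blocks 
modeled as a per-block-pool count; the names are illustrative, not the actual 
BPServiceActor code:
{code:java}
import java.util.Map;

/** Illustrative sketch only: skip the cache report for empty block pools. */
final class CacheReportSketch {
  static boolean shouldSendCacheReport(long cacheCapacity,
                                       Map<String, Long> cachedBlocksPerPool,
                                       String blockPoolId) {
    if (cacheCapacity == 0) {
      return false;                      // existing behaviour
    }
    // Proposed addition: this block pool has nothing cached on this DataNode,
    // so reporting would only add NameNode lock contention.
    return cachedBlocksPerPool.getOrDefault(blockPoolId, 0L) > 0;
  }
}
{code}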






[jira] [Updated] (HDFS-17331) Fix Blocks are always -1 and DataNode`s version are always UNKNOWN in federationhealth.html

2024-01-14 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-17331:
-
Attachment: Before fix.png

> Fix Blocks are always -1 and DataNode`s version are always UNKNOWN in 
> federationhealth.html
> ---
>
> Key: HDFS-17331
> URL: https://issues.apache.org/jira/browse/HDFS-17331
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Priority: Major
>  Labels: pull-request-available
> Attachments: After fix.png, Before fix.png
>
>
> Blocks are always -1 and DataNode`s version are always UNKNOWN in 
> federationhealth.html






[jira] [Updated] (HDFS-17331) Fix Blocks are always -1 and DataNode`s version are always UNKNOWN in federationhealth.html

2024-01-14 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-17331:
-
Attachment: After fix.png

> Fix Blocks are always -1 and DataNode`s version are always UNKNOWN in 
> federationhealth.html
> ---
>
> Key: HDFS-17331
> URL: https://issues.apache.org/jira/browse/HDFS-17331
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Priority: Major
>  Labels: pull-request-available
> Attachments: After fix.png, Before fix.png
>
>
> Blocks are always -1 and DataNode`s version are always UNKNOWN in 
> federationhealth.html






[jira] [Updated] (HDFS-17331) Fix Blocks are always -1 and DataNode`s version are always UNKNOWN in federationhealth.html

2024-01-14 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-17331:
-
Attachment: (was: 截屏2024-01-08 下午8.17.07.png)

> Fix Blocks are always -1 and DataNode`s version are always UNKNOWN in 
> federationhealth.html
> ---
>
> Key: HDFS-17331
> URL: https://issues.apache.org/jira/browse/HDFS-17331
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Priority: Major
>  Labels: pull-request-available
>
> Blocks are always -1 and DataNode`s version are always UNKNOWN in 
> federationhealth.html






[jira] [Created] (HDFS-17331) Fix Blocks are always -1 and DataNode`s version are always UNKNOWN in federationhealth.html

2024-01-09 Thread lei w (Jira)
lei w created HDFS-17331:


 Summary: Fix Blocks are always -1 and DataNode`s version are 
always UNKNOWN in federationhealth.html
 Key: HDFS-17331
 URL: https://issues.apache.org/jira/browse/HDFS-17331
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: lei w
 Attachments: 截屏2024-01-08 下午8.17.07.png

Blocks are always -1 and the DataNode's version is always UNKNOWN in 
federationhealth.html






[jira] [Updated] (HDFS-17331) Fix Blocks are always -1 and DataNode`s version are always UNKNOWN in federationhealth.html

2024-01-09 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-17331:
-
Attachment: 截屏2024-01-08 下午8.17.07.png

> Fix Blocks are always -1 and DataNode`s version are always UNKNOWN in 
> federationhealth.html
> ---
>
> Key: HDFS-17331
> URL: https://issues.apache.org/jira/browse/HDFS-17331
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Priority: Major
> Attachments: 截屏2024-01-08 下午8.17.07.png
>
>
> Blocks are always -1 and DataNode`s version are always UNKNOWN in 
> federationhealth.html






[jira] [Commented] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2023-12-11 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795580#comment-17795580
 ] 

lei w commented on HDFS-16083:
--

[~tomscut] Is this reasonable?

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch, 
> HDFS-16083.003.patch, HDFS-16083.004.patch, HDFS-16083.005.1.patch, 
> HDFS-16083.005.patch, activeRollEdits.png
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we forbid Observer NameNode trigger  active 
> namenode log roll ? We  'dfs.ha.log-roll.period' configured is 300( 5 
> minutes) and active NameNode receives rollEditLog RPC as shown in 
> activeRollEdits.png






[jira] [Resolved] (HDFS-16102) Remove redundant iteration in BlockManager#removeBlocksAssociatedTo(...) to save time

2023-12-11 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w resolved HDFS-16102.
--
Resolution: Invalid

> Remove redundant iteration in BlockManager#removeBlocksAssociatedTo(...) to 
> save time 
> --
>
> Key: HDFS-16102
> URL: https://issues.apache.org/jira/browse/HDFS-16102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16102.001.patch
>
>
> The current logic in removeBlocksAssociatedTo(...) is as follows:
> {code:java}
>   void removeBlocksAssociatedTo(final DatanodeDescriptor node) {
> providedStorageMap.removeDatanode(node);
> for (DatanodeStorageInfo storage : node.getStorageInfos()) {
>   final Iterator it = storage.getBlockIterator();
>   //add the BlockInfos to a new collection as the
>   //returned iterator is not modifiable.
>   Collection toRemove = new ArrayList<>();
>   while (it.hasNext()) {
> toRemove.add(it.next()); // First iteration : to put blocks to 
> another collection 
>   }
>   for (BlockInfo b : toRemove) {
> removeStoredBlock(b, node); // Another iteration : to remove blocks
>   }
> }
>   // ..
>   }
> {code}
>  In fact , we can use the first iteration to achieve this logic , so should 
> we remove the redundant iteration to save time and memory?






[jira] [Resolved] (HDFS-16020) DatanodeReportType should add LIVE_NOT_DECOMMISSIONING type

2023-12-11 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w resolved HDFS-16020.
--
Resolution: Invalid

> DatanodeReportType should add LIVE_NOT_DECOMMISSIONING type
> ---
>
> Key: HDFS-16020
> URL: https://issues.apache.org/jira/browse/HDFS-16020
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer  mover, namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16020.001.patch
>
>
> Balancer builds cluster nodes by 
> getDatanodeStorageReport(DatanodeReportType.LIVE) method。If the user does not 
> specify the exclude node list, the balancer may migrate data to the DataNode 
> in the decommission state. Should we filter out nodes in the decommission 
> state by a new DatanodeReportType(LIVE_NOT_DECOMMISSIONING) regardless of 
> whether the user specifies the exclude node list ?






[jira] [Updated] (HDFS-16196) Namesystem#completeFile method will log incorrect path information when router to access

2023-12-11 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16196:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Namesystem#completeFile method will log incorrect path information when 
> router to access
> 
>
> Key: HDFS-16196
> URL: https://issues.apache.org/jira/browse/HDFS-16196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16196.001.patch
>
>
> Router not send entire path information to namenode because 
> ClientProtocol#complete method`s parameter with fileId. Then NameNode will 
> log incorrect path information. This is very confusing, should we let the 
> router pass the path information or modify the log path on  namenode?
> completeFile log as fllow:
> StateChange: DIR* completeFile: / is closed by DFSClient_NONMAPREDUC_*






[jira] [Created] (HDFS-17275) We should determine whether the block has been deleted in the block report

2023-12-05 Thread lei w (Jira)
lei w created HDFS-17275:


 Summary: We should determine whether the block has been deleted in 
the block report
 Key: HDFS-17275
 URL: https://issues.apache.org/jira/browse/HDFS-17275
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: lei w


Now we use the asynchronous MarkedDeleteBlockScrubber thread to delete blocks. 
During block report processing we may therefore do some useless block-related 
calculations for blocks that are pending deletion but have not yet been added 
to invalidateBlocks.
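
Below is a minimal sketch of the proposed check during block report 
processing, with the marked-for-deletion queue modeled as a simple set; the 
names are illustrative, not the actual BlockManager code:
{code:java}
import java.util.Set;

/** Illustrative sketch only: skip reported replicas already queued for deletion. */
final class BlockReportSketch {
  static boolean shouldProcessReportedBlock(long blockId,
                                            Set<Long> markedForDeletion) {
    // Blocks handed to the MarkedDeleteBlockScrubber but not yet moved to
    // invalidateBlocks would otherwise be processed for nothing.
    return !markedForDeletion.contains(blockId);
  }
}
{code}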






[jira] [Assigned] (HDFS-17270) Fix ZKDelegationTokenSecretManagerImpl use closed zookeeper Client to get token in some case

2023-12-01 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w reassigned HDFS-17270:


Assignee: lei w

>  Fix ZKDelegationTokenSecretManagerImpl use closed zookeeper Client  to get 
> token in some case 
> ---
>
> Key: HDFS-17270
> URL: https://issues.apache.org/jira/browse/HDFS-17270
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Assignee: lei w
>Priority: Major
> Attachments: CuratorFrameworkException
>
>
> Now, we use CuratorFramework to simplifies using ZooKeeper in 
> ZKDelegationTokenSecretManagerImpl and we always hold the same 
> zookeeperClient after initialization ZKDelegationTokenSecretManagerImpl. But 
> in some cases like network problem , CuratorFramework may close current 
> zookeeperClient and create new one. In this case , we will use  a zkclient 
> which has been closed  to get token. We encountered this situation in our 
> cluster,exception information in attachment.






[jira] [Created] (HDFS-17270) Fix ZKDelegationTokenSecretManagerImpl use closed zookeeper Client to get token in some case

2023-12-01 Thread lei w (Jira)
lei w created HDFS-17270:


 Summary:  Fix ZKDelegationTokenSecretManagerImpl use closed 
zookeeper Client  to get token in some case 
 Key: HDFS-17270
 URL: https://issues.apache.org/jira/browse/HDFS-17270
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: lei w
 Attachments: CuratorFrameworkException

Now, we use CuratorFramework to simplify using ZooKeeper in 
ZKDelegationTokenSecretManagerImpl, and we always hold the same ZooKeeper 
client after ZKDelegationTokenSecretManagerImpl is initialized. But in some 
cases, such as network problems, CuratorFramework may close the current 
ZooKeeper client and create a new one. In that case we will use a client that 
has already been closed to get the token. We encountered this situation in our 
cluster; the exception information is in the attachment.
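
Below is a minimal sketch of one way to notice the replacement, using 
Curator's connection-state listener so the token code does not keep using a 
client the framework has replaced; this is a generic Curator pattern, not the 
actual ZKDelegationTokenSecretManagerImpl fix:
{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.state.ConnectionState;

/** Illustrative sketch only: react when Curator loses or re-creates the session. */
final class ZkClientWatchSketch {
  static void watch(CuratorFramework curator, Runnable refreshCachedClient) {
    curator.getConnectionStateListenable().addListener((client, newState) -> {
      if (newState == ConnectionState.LOST
          || newState == ConnectionState.RECONNECTED) {
        // Re-resolve whatever cached ZooKeeper handle the token code holds,
        // instead of continuing to use a client that may have been closed.
        refreshCachedClient.run();
      }
    });
  }
}
{code}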






[jira] [Commented] (HDFS-16038) DataNode Unrecognized Observer Node when cluster add an observer node

2022-04-28 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529764#comment-17529764
 ] 

lei w commented on HDFS-16038:
--

Is it possible to add a switch on the NameNode? If the NameNode state is 
Observer and the switch is on, we could report the Standby state instead of 
the Observer state when responding to the DataNode. [~tomscut]
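
Below is a minimal sketch of the switch idea, with the HA state modeled 
locally; the names are illustrative, not the actual NameNode heartbeat code:
{code:java}
/** Illustrative sketch only, not the actual NameNode heartbeat code. */
final class ObserverStateSketch {
  enum HAServiceState { ACTIVE, STANDBY, OBSERVER }

  /** Pick the state to report to DataNodes that may not understand OBSERVER. */
  static HAServiceState stateForHeartbeat(HAServiceState current,
                                          boolean reportObserverAsStandby) {
    // Old DataNodes fail to parse OBSERVER ("Message missing required fields:
    // haStatus.state"), so a switch could downgrade it to STANDBY for them.
    if (reportObserverAsStandby && current == HAServiceState.OBSERVER) {
      return HAServiceState.STANDBY;
    }
    return current;
  }
}
{code}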

> DataNode Unrecognized Observer Node when cluster add an observer node
> -
>
> Key: HDFS-16038
> URL: https://issues.apache.org/jira/browse/HDFS-16038
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Priority: Critical
>  Labels: Observer
>
> When an Observer node is added to the cluster, the DataNode will not be able 
> to recognize the HAServiceState.observer, This is because we did not upgrade 
> the DataNode. Generally, it will take a long time for a big cluster to 
> upgrade the DataNode . So should we add a switch to replace the Observer 
> state with the Standby state when DataNode can not recognize the 
> HAServiceState.observer state?
> The following are some error messages of DataNode:
> {code:java}
> 11:14:31,812 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> IOException in offerService
> com.google.protobuf.InvalidProtocolBufferException: Message missing required 
> fields: haStatus.state
> at 
> com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:81)
> at 
> com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:71)
> {code}






[jira] [Commented] (HDFS-16428) Source path setted storagePolicy will cause wrong typeConsumed in rename operation

2022-01-17 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17477588#comment-17477588
 ] 

lei w commented on HDFS-16428:
--

[~LiJinglun] [~ayushtkn] [~xiaoyuyao] Looking forward to your comments!

> Source path setted storagePolicy will cause wrong typeConsumed  in rename 
> operation
> ---
>
> Key: HDFS-16428
> URL: https://issues.apache.org/jira/browse/HDFS-16428
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: lei w
>Priority: Major
> Attachments: example.txt
>
>
> When compute quota in rename operation , we use storage policy of the target 
> directory to compute src  quota usage. This will cause wrong value of 
> typeConsumed when source path was setted storage policy. I provided a unit 
> test to present this situation.






[jira] [Updated] (HDFS-16428) Source path setted storagePolicy will cause wrong typeConsumed in rename operation

2022-01-17 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16428:
-
Summary: Source path setted storagePolicy will cause wrong typeConsumed  in 
rename operation  (was: When source path setted storagePolicy in rename 
operation will cause wrong typeConsumed )

> Source path setted storagePolicy will cause wrong typeConsumed  in rename 
> operation
> ---
>
> Key: HDFS-16428
> URL: https://issues.apache.org/jira/browse/HDFS-16428
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: lei w
>Priority: Major
> Attachments: example.txt
>
>
> When compute quota in rename operation , we use storage policy of the target 
> directory to compute src  quota usage. This will cause wrong value of 
> typeConsumed when source path was setted storage policy. I provided a unit 
> test to present this situation.






[jira] [Updated] (HDFS-16428) When source path setted storagePolicy in rename operation will cause wrong typeConsumed

2022-01-17 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16428:
-
Attachment: example.txt

> When source path setted storagePolicy in rename operation will cause wrong 
> typeConsumed 
> 
>
> Key: HDFS-16428
> URL: https://issues.apache.org/jira/browse/HDFS-16428
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: lei w
>Priority: Major
> Attachments: example.txt
>
>
> When compute quota in rename operation , we use storage policy of the target 
> directory to compute src  quota usage. This will cause wrong value of 
> typeConsumed when source path was setted storage policy. I provided a unit 
> test to present this situation.






[jira] [Created] (HDFS-16428) When source path setted storagePolicy in rename operation will cause wrong typeConsumed

2022-01-17 Thread lei w (Jira)
lei w created HDFS-16428:


 Summary: When source path setted storagePolicy in rename operation 
will cause wrong typeConsumed 
 Key: HDFS-16428
 URL: https://issues.apache.org/jira/browse/HDFS-16428
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs, namenode
Reporter: lei w


When computing quota in the rename operation, we use the storage policy of the 
target directory to compute the source's quota usage. This causes a wrong 
typeConsumed value when the source path has its own storage policy set. I 
provided a unit test to demonstrate this situation.
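
Below is a minimal sketch showing why the computed typeConsumed differs 
depending on which policy is used; the policy-to-storage-type mapping and the 
numbers are simplified examples, not the real BlockStoragePolicy logic:
{code:java}
/** Illustrative sketch only: typeConsumed depends on which policy is used. */
final class RenameQuotaSketch {
  enum StorageType { DISK, SSD }

  /** Simplified: every replica of the file lands on the policy's storage type. */
  static long typeConsumed(long fileSize, int replication,
                           StorageType policyType, StorageType queriedType) {
    return policyType == queriedType ? fileSize * replication : 0;
  }

  public static void main(String[] args) {
    long size = 128L << 20;  // 128 MiB file, 3 replicas (example numbers)
    // Source directory uses an SSD-only policy; target uses a DISK policy.
    long withSourcePolicy = typeConsumed(size, 3, StorageType.SSD, StorageType.SSD);
    long withTargetPolicy = typeConsumed(size, 3, StorageType.DISK, StorageType.SSD);
    System.out.println("SSD consumed using source policy: " + withSourcePolicy);
    System.out.println("SSD consumed using target policy: " + withTargetPolicy);
  }
}
{code}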






[jira] [Assigned] (HDFS-15068) DataNode could meet deadlock if invoke refreshVolumes when register

2021-11-29 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w reassigned HDFS-15068:


Assignee: Aiphago  (was: lei w)

> DataNode could meet deadlock if invoke refreshVolumes when register
> ---
>
> Key: HDFS-15068
> URL: https://issues.apache.org/jira/browse/HDFS-15068
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-15068.001.patch, HDFS-15068.002.patch, 
> HDFS-15068.003.patch, HDFS-15068.004.patch, HDFS-15068.005.patch
>
>
> DataNode could meet deadlock when invoke `dfsadmin -reconfig datanode ip:host 
> start` to trigger #refreshVolumes.
> 1. DataNod#refreshVolumes hold datanode instance ownable {{synchronizer}} 
> when enter this method first, then try to hold BPOfferService {{readlock}} 
> when `bpos.getNamespaceInfo()` in following code segment. 
> {code:java}
> for (BPOfferService bpos : blockPoolManager.getAllNamenodeThreads()) {
>   nsInfos.add(bpos.getNamespaceInfo());
> }
> {code}
> 2. BPOfferService#registrationSucceeded (which is invoked by #register when 
> DataNode start or #reregister when processCommandFromActor) hold 
> BPOfferService {{writelock}} first, then try to hold datanode instance 
> ownable {{synchronizer}} in following method.
> {code:java}
>   synchronized void bpRegistrationSucceeded(DatanodeRegistration 
> bpRegistration,
>   String blockPoolId) throws IOException {
> id = bpRegistration;
> if(!storage.getDatanodeUuid().equals(bpRegistration.getDatanodeUuid())) {
>   throw new IOException("Inconsistent Datanode IDs. Name-node returned "
>   + bpRegistration.getDatanodeUuid()
>   + ". Expecting " + storage.getDatanodeUuid());
> }
> 
> registerBlockPoolWithSecretManager(bpRegistration, blockPoolId);
>   }
> {code}
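
For readers less familiar with this class of bug, the following is a self-contained sketch of the same lock-ordering pattern, with an object monitor standing in for the DataNode instance lock and a ReentrantReadWriteLock standing in for the BPOfferService lock; the class and method names are illustrative stand-ins, not the actual HDFS code:

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderDeadlockSketch {
  private final Object dnLock = new Object();                                   // ~ DataNode monitor
  private final ReentrantReadWriteLock bposLock = new ReentrantReadWriteLock(); // ~ BPOfferService lock

  // ~ DataNode#refreshVolumes: monitor first, then read lock.
  void refreshVolumes() {
    synchronized (dnLock) {
      bposLock.readLock().lock();
      try { /* bpos.getNamespaceInfo() */ } finally { bposLock.readLock().unlock(); }
    }
  }

  // ~ BPOfferService#registrationSucceeded: write lock first, then monitor.
  void registrationSucceeded() {
    bposLock.writeLock().lock();
    try {
      synchronized (dnLock) { /* bpRegistrationSucceeded(...) */ }
    } finally {
      bposLock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    LockOrderDeadlockSketch s = new LockOrderDeadlockSketch();
    new Thread(() -> { while (true) s.refreshVolumes(); }).start();
    new Thread(() -> { while (true) s.registrationSucceeded(); }).start();
    // With both threads running, each can end up holding one lock while
    // waiting for the other, which is exactly the deadlock described above.
  }
}
{code}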



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15068) DataNode could meet deadlock if invoke refreshVolumes when register

2021-11-29 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w reassigned HDFS-15068:


Assignee: lei w  (was: Aiphago)

> DataNode could meet deadlock if invoke refreshVolumes when register
> ---
>
> Key: HDFS-15068
> URL: https://issues.apache.org/jira/browse/HDFS-15068
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: lei w
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-15068.001.patch, HDFS-15068.002.patch, 
> HDFS-15068.003.patch, HDFS-15068.004.patch, HDFS-15068.005.patch
>
>
> DataNode could hit a deadlock when `dfsadmin -reconfig datanode ip:host 
> start` is invoked to trigger #refreshVolumes.
> 1. DataNode#refreshVolumes first holds the datanode instance ownable 
> {{synchronizer}} on entry, then tries to take the BPOfferService {{readlock}} 
> via `bpos.getNamespaceInfo()` in the following code segment. 
> {code:java}
> for (BPOfferService bpos : blockPoolManager.getAllNamenodeThreads()) {
>   nsInfos.add(bpos.getNamespaceInfo());
> }
> {code}
> 2. BPOfferService#registrationSucceeded (invoked by #register when the 
> DataNode starts, or by #reregister from processCommandFromActor) holds the 
> BPOfferService {{writelock}} first, then tries to take the datanode instance 
> ownable {{synchronizer}} in the following method.
> {code:java}
>   synchronized void bpRegistrationSucceeded(DatanodeRegistration 
> bpRegistration,
>   String blockPoolId) throws IOException {
> id = bpRegistration;
> if(!storage.getDatanodeUuid().equals(bpRegistration.getDatanodeUuid())) {
>   throw new IOException("Inconsistent Datanode IDs. Name-node returned "
>   + bpRegistration.getDatanodeUuid()
>   + ". Expecting " + storage.getDatanodeUuid());
> }
> 
> registerBlockPoolWithSecretManager(bpRegistration, blockPoolId);
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.

2021-09-16 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415995#comment-17415995
 ] 

lei w commented on HDFS-14768:
--

Hi [~gjhkael], please provide a patch for branch-3.1 and branch-3.2; we need it. Thanks!

> EC : Busy DN replica should be consider in live replica check.
> --
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, 
> HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, 
> HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, 
> HDFS-14768.006.patch, HDFS-14768.007.patch, HDFS-14768.008.patch, 
> HDFS-14768.009.patch, HDFS-14768.010.patch, HDFS-14768.011.patch, 
> HDFS-14768.jpg, guojh_UT_after_deomission.txt, 
> guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, 
> zhaoyiming_UT_beofre_deomission.txt
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2.
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8], we decommission 
> indices [3,4], and we increase the index-6 datanode's
> pendingReplicationWithoutTargets so that it becomes larger than 
> replicationStreamsHardLimit (we set 14). Then, after 
> BlockManager#chooseSourceDatanodes, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8] and the block counter is Live:7, Decommission:2. 
> In BlockManager#scheduleReconstruction, the additionalReplRequired 
> is 9 - 7 = 2. After the NameNode chooses two target datanodes, it assigns an 
> erasure-coding reconstruction task to the target datanode.
> When the datanode gets the task, it builds targetIndices from liveBlockIndices 
> and the target length; the code is below.
> {code:java}
> // simplified; the (i) arguments appear to have been lost in the original Jira formatting
> targetIndices = new short[targets.length];
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0]=6, and targetIndices[1] is always 0 from its initial value.
> The StripedReader always creates readers from the first 6 live block indices, 
> i.e. [0,1,2,3,4,5].
> Using source indices [0,1,2,3,4,5] to build target indices [6,0] triggers the 
> ISA-L bug: the data of block index 6 is corrupted (all data is zero).
> I wrote a unit test that can stably reproduce this.
> {code:java}
> // code placeholder
> private int replicationStreamsHardLimit = 
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
> numDNs = dataBlocks + parityBlocks + 10;
> @Test(timeout = 24)
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
>   .getINode4Write(ecFile.toString()).asFile();
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   //
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   BlockInfo firstBlock = fileNode.getBlocks()[0];
>   DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
>   // the first heartbeat will consume 3 replica tasks
>   for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
> BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor, new 
> Block(i),
> new DatanodeStorageInfo[]{dStorageInfos[0]});
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List decommisionNodes = new ArrayList();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   

[jira] [Updated] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error

2021-09-13 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-15240:
-
Description: 
When reading some lzo files, we found some blocks were broken.

I read back all internal blocks (b0-b8) of the block group (RS-6-3-1024k) from 
the DNs directly, chose 6 blocks (b0-b5) to decode the other 3 (b6', b7', b8'), 
and computed the longest common sequence (LCS) between b6' (decoded) and b6 (read 
from DN), and likewise for b7'/b7 and b8'/b8.

After selecting 6 blocks of the block group per combination and iterating 
through all cases, I found one case where the LCS length is the block length 
minus 64KB; 64KB is exactly the length of the ByteBuffer used by 
StripedBlockReader. So the corrupt reconstruction block is produced by a dirty 
buffer.

The following log snippet (only 2 of 28 cases shown) is my check program's 
output. In my case, I know block 3 is corrupt, so I use the other blocks to 
decode another 3 blocks, and then find that block 1's LCS length is the block 
length minus 64KB.

It means blocks (0,1,2,4,5,6) were used to reconstruct block 3, and the 
dirty buffer was used before reading block 1.

It must be noted that StripedBlockReader reads from offset 0 of block 1 
after the dirty buffer has been used.

EDITED for readability.
{code:java}
decode from block[0, 2, 3, 4, 5, 7] to generate block[1', 6', 8']
Check the first 131072 bytes between block[1] and block[1'], the longest common 
substring length is 4
Check the first 131072 bytes between block[6] and block[6'], the longest common 
substring length is 4
Check the first 131072 bytes between block[8] and block[8'], the longest common 
substring length is 4
decode from block[0, 2, 3, 4, 5, 6] to generate block[1', 7', 8']
Check the first 131072 bytes between block[1] and block[1'], the longest common 
substring length is 65536
CHECK AGAIN: all 27262976 bytes between block[1] and block[1'], the longest 
common substring length is 27197440  # this one
Check the first 131072 bytes between block[7] and block[7'], the longest common 
substring length is 4
Check the first 131072 bytes between block[8] and block[8'], the longest common 
substring length is 4{code}
Now I know the dirty buffer causes the reconstruction block error, but where 
does the dirty buffer come from?

After digging into the code and the DN log, I found that the following DN log 
is the root cause.
{code:java}
[INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel 
java.nio.channels.SocketChannel[connected local=/:52586 
remote=/:50010]. 18 millis timeout left.
[WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped block: 
BP-714356632--1519726836856:blk_-YY_3472979393
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314)
at 
org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308)
at 
org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269)
at 
org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94)
at 
org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834) {code}
Reading from a DN may time out (tracked by a future F) and emit that INFO log, 
but the futures map that contains F has already been cleared, so 
{code:java}
return new StripingChunkReadResult(futures.remove(future),
StripingChunkReadResult.CANCELLED); {code}
futures.remove(future) causes the NPE and the EC reconstruction fails. In the 
finally phase, the code snippet in *getStripedReader().close()* 
{code:java}
reconstructor.freeBuffer(reader.getReadBuffer());
reader.freeReadBuffer();
reader.closeBlockReader(); {code}
frees the buffer first, but the StripedBlockReader still holds the buffer and 
writes into it, which pollutes the buffer in the BufferPool.
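
As a standalone illustration of why freeing a buffer that a late reader still writes into corrupts later users of the pool, here is a plain-Java sketch; the queue-based pool and the thread are stand-ins, not the actual HDFS BufferPool or StripedBlockReader:

{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class DirtyBufferSketch {
  public static void main(String[] args) throws Exception {
    BlockingQueue<ByteBuffer> pool = new ArrayBlockingQueue<>(1);
    ByteBuffer buf = ByteBuffer.allocate(8);

    // Step 1: the "reconstructor" frees the buffer back to the pool while a
    // stale reader thread still holds a reference to it.
    pool.put(buf);

    Thread staleReader = new Thread(() -> {
      buf.clear();
      buf.put("garbage!".getBytes()); // late write into the already-freed buffer
    });
    staleReader.start();
    staleReader.join();

    // Step 2: the next reconstruction task takes the "clean" buffer from the
    // pool and decodes with whatever the stale reader left in it.
    ByteBuffer reused = pool.take();
    reused.flip();
    byte[] out = new byte[reused.remaining()];
    reused.get(out);
    System.out.println("decoded from polluted buffer: " + new String(out));
  }
}
{code}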

  was:
When read some lzo files we found some blocks were broken.

I read back all internal blocks(b0-b8) of the block group(RS-6-3-1024k) from DN 
directly, and choose 6(b0-b5) blocks to decode other 3(b6', b7', b8') blocks. 
And find the longest common sequenece(LCS) between b6'(decoded) and b6(read 
from DN)(b7'/b7 and b8'/b8).

After selecting 6 blocks of the block group in combinations one time and 
iterating through all cases, I find one case that the length of LCS is the 
block length - 64KB, 64KB is just the 

[jira] [Commented] (HDFS-16196) Namesystem#completeFile method will log incorrect path information when router to access

2021-09-06 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17410605#comment-17410605
 ] 

lei w commented on HDFS-16196:
--

Hi [~hexiaoqiao]. I looked at the other requests on the NameNode; they do not 
have the same problem, because the router sends the full path on requests that 
do not carry a fileId. fsync, addBlock, complete and abandonBlock do carry a 
fileId, and the path logged by addBlock and abandonBlock is resolved from the 
fileId first; only the complete method uses the path sent by the router. So I 
think we don't need to make the router send the full path for complete; we can 
resolve the path from the fileId first, as addBlock does.
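
A rough sketch of that proposal, preferring a fileId-resolved path for logging when the client-supplied src is not usable; the {{InodeResolver}} interface below is only a stand-in for the real inode lookup, not the actual FSDirectory API:

{code:java}
public class CompleteFileLogSketch {
  /** Stand-in for the inode lookup the NameNode would provide. */
  interface InodeResolver {
    String fullPathOf(long fileId); // e.g. "/home/user/part-0"
  }

  static String pathForAuditLog(String srcFromClient, long fileId,
                                InodeResolver resolver) {
    // A router-forwarded complete() carries src == "/" plus a valid fileId,
    // so prefer the fileId-resolved path when src clearly isn't the file.
    if (fileId > 0 && (srcFromClient == null || "/".equals(srcFromClient))) {
      String resolved = resolver.fullPathOf(fileId);
      if (resolved != null) {
        return resolved;
      }
    }
    return srcFromClient;
  }

  public static void main(String[] args) {
    InodeResolver resolver = id -> "/home/user/part-0";
    // Prints /home/user/part-0 instead of the confusing "/".
    System.out.println(pathForAuditLog("/", 16387L, resolver));
  }
}
{code}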


> Namesystem#completeFile method will log incorrect path information when 
> router to access
> 
>
> Key: HDFS-16196
> URL: https://issues.apache.org/jira/browse/HDFS-16196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16196.001.patch
>
>
> Router not send entire path information to namenode because 
> ClientProtocol#complete method`s parameter with fileId. Then NameNode will 
> log incorrect path information. This is very confusing, should we let the 
> router pass the path information or modify the log path on  namenode?
> completeFile log as fllow:
> StateChange: DIR* completeFile: / is closed by DFSClient_NONMAPREDUC_*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-16196) Namesystem#completeFile method will log incorrect path information when router to access

2021-09-06 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w reassigned HDFS-16196:


Assignee: lei w

> Namesystem#completeFile method will log incorrect path information when 
> router to access
> 
>
> Key: HDFS-16196
> URL: https://issues.apache.org/jira/browse/HDFS-16196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16196.001.patch
>
>
> Router not send entire path information to namenode because 
> ClientProtocol#complete method`s parameter with fileId. Then NameNode will 
> log incorrect path information. This is very confusing, should we let the 
> router pass the path information or modify the log path on  namenode?
> completeFile log as fllow:
> StateChange: DIR* completeFile: / is closed by DFSClient_NONMAPREDUC_*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16196) Namesystem#completeFile method will log incorrect path information when router to access

2021-09-06 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16196:
-
Attachment: HDFS-16196.001.patch
Status: Patch Available  (was: Open)

> Namesystem#completeFile method will log incorrect path information when 
> router to access
> 
>
> Key: HDFS-16196
> URL: https://issues.apache.org/jira/browse/HDFS-16196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16196.001.patch
>
>
> Router not send entire path information to namenode because 
> ClientProtocol#complete method`s parameter with fileId. Then NameNode will 
> log incorrect path information. This is very confusing, should we let the 
> router pass the path information or modify the log path on  namenode?
> completeFile log as fllow:
> StateChange: DIR* completeFile: / is closed by DFSClient_NONMAPREDUC_*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16196) Namesystem#completeFile method will log incorrect path information when router to access

2021-09-06 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16196:
-
Attachment: HDFS-16196.001.patch

> Namesystem#completeFile method will log incorrect path information when 
> router to access
> 
>
> Key: HDFS-16196
> URL: https://issues.apache.org/jira/browse/HDFS-16196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Priority: Minor
> Attachments: HDFS-16196.001.patch
>
>
> Router not send entire path information to namenode because 
> ClientProtocol#complete method`s parameter with fileId. Then NameNode will 
> log incorrect path information. This is very confusing, should we let the 
> router pass the path information or modify the log path on  namenode?
> completeFile log as fllow:
> StateChange: DIR* completeFile: / is closed by DFSClient_NONMAPREDUC_*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16196) Namesystem#completeFile method will log incorrect path information when router to access

2021-09-06 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16196:
-
Attachment: (was: HDFS-16196.001.patch)

> Namesystem#completeFile method will log incorrect path information when 
> router to access
> 
>
> Key: HDFS-16196
> URL: https://issues.apache.org/jira/browse/HDFS-16196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16196.001.patch
>
>
> Router not send entire path information to namenode because 
> ClientProtocol#complete method`s parameter with fileId. Then NameNode will 
> log incorrect path information. This is very confusing, should we let the 
> router pass the path information or modify the log path on  namenode?
> completeFile log as fllow:
> StateChange: DIR* completeFile: / is closed by DFSClient_NONMAPREDUC_*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16196) Namesystem#completeFile method will log incorrect path information when router to access

2021-09-06 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17410501#comment-17410501
 ] 

lei w commented on HDFS-16196:
--

OK, thanks for your comment, [~hexiaoqiao].

> Namesystem#completeFile method will log incorrect path information when 
> router to access
> 
>
> Key: HDFS-16196
> URL: https://issues.apache.org/jira/browse/HDFS-16196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Priority: Minor
>
> Router not send entire path information to namenode because 
> ClientProtocol#complete method`s parameter with fileId. Then NameNode will 
> log incorrect path information. This is very confusing, should we let the 
> router pass the path information or modify the log path on  namenode?
> completeFile log as fllow:
> StateChange: DIR* completeFile: / is closed by DFSClient_NONMAPREDUC_*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-16196) Namesystem#completeFile method will log incorrect path information when router to access

2021-09-05 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17410334#comment-17410334
 ] 

lei w edited comment on HDFS-16196 at 9/6/21, 4:17 AM:
---

Thanks [~hexiaoqiao] for the comment. I checked that the logic on trunk is the same.
The flow in RouterClientProtocol#complete is:
  RouterClientProtocol#complete(... ...) --> RouterRpcClient#invokeSingle(... 
...) --> RouterRpcClient#invokeMethod(..) --> RPC to NN. 
In RouterClientProtocol#complete(... ...) we generate the RemoteMethod 
as follows:

{code:java}
RemoteMethod method = new RemoteMethod("complete",
    new Class[] {String.class, String.class, ExtendedBlock.class,
        long.class},
    // params[0]          params[1]    params[2]  params[3]
    new RemoteParam(),    clientName,  last,      fileId);
{code}
RemoteMethod uses an Object array named params to store the parameters, and 
params[0] holds a RemoteParam instance.
Then, in RouterRpcClient#invokeSingle(... ...), the path is generated from a 
RemoteLocation instance as follows:

{code:java}
List nns =
  getNamenodesForNameservice(nsId);
 // create RemoteLocation with src path "/" , destpath "/"
  RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");
  Class proto = method.getProtocol();
  Method m = method.getMethod();
  //Because RemoteMethod params[0] store class RemoteParam instance. we 
will use RemoteLocation dest "/" as method path. please look 
router.RemoteMethod#getParams(... ...)
  Object[] params = method.getParams(loc);
  return invokeMethod(ugi, nns, proto, m, params);
{code}
 As mentioned above, NameNodeRpcServer receives "/" as the complete method's 
path and logs "/" in the audit log. This is my brief analysis.



was (Author: lei w):
Thanks [~hexiaoqiao] comment. I  checked the logic of trunk was same like this.
In RouterClientProtocol#complete logic is as follow:
  RouterClientProtocol#complete(... ...) --> RouterRpcClient#invokeSingle(... 
...) --> RouterRpcClient#invokeMethod(..) --> RPC to NN. 
 And in RouterClientProtocol#complete(... ...)  we will generate RemoteMethod 
as follows:

{code:java}
RemoteMethod method = new RemoteMethod("complete",
new Class[] {String.class, String.class, ExtendedBlock.class,
long.class},
   // params[0]params[1]params[2]  params[3]
   new RemoteParam(),  clientName,last,fileId);
{code}
RemoteMethod uses an array of object type named params to store parameters , 
and params[0] will store class RemoteParam instance.
Then in  RouterRpcClient#invokeSingle(... ...)  we will generate path by 
RemoteLocation instance as follows :

{code:java}
List nns =
  getNamenodesForNameservice(nsId);
 // create RemoteLocation with src path "/" , destpath "/"
  RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");
  Class proto = method.getProtocol();
  Method m = method.getMethod();
  //Because RemoteMethod params[0] store class RemoteParam instance. we 
will use RemoteLocation dest "/" as method path. please look 
router.RemoteMethod#getParams(... ...)
  Object[] params = method.getParams(loc);
  return invokeMethod(ugi, nns, proto, m, params);
{code}
 As mentioned above, NameNodeRpcServer will receive "/" as complete method path 
information and will log "/" in audit log.  This is my simple analysis. If you 
have any questions, thank you for your correction.


> Namesystem#completeFile method will log incorrect path information when 
> router to access
> 
>
> Key: HDFS-16196
> URL: https://issues.apache.org/jira/browse/HDFS-16196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Priority: Minor
>
> Router not send entire path information to namenode because 
> ClientProtocol#complete method`s parameter with fileId. Then NameNode will 
> log incorrect path information. This is very confusing, should we let the 
> router pass the path information or modify the log path on  namenode?
> completeFile log as fllow:
> StateChange: DIR* completeFile: / is closed by DFSClient_NONMAPREDUC_*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-16196) Namesystem#completeFile method will log incorrect path information when router to access

2021-09-05 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17410334#comment-17410334
 ] 

lei w edited comment on HDFS-16196 at 9/6/21, 3:28 AM:
---

Thanks [~hexiaoqiao] comment. I  checked the logic of trunk was same like this.
In RouterClientProtocol#complete logic is as follow:
  RouterClientProtocol#complete(... ...) --> RouterRpcClient#invokeSingle(... 
...) --> RouterRpcClient#invokeMethod(..) --> RPC to NN. 
 And in RouterClientProtocol#complete(... ...)  we will generate RemoteMethod 
as follows:

{code:java}
RemoteMethod method = new RemoteMethod("complete",
new Class[] {String.class, String.class, ExtendedBlock.class,
long.class},
   // params[0]params[1]params[2]  params[3]
   new RemoteParam(),  clientName,last,fileId);
{code}
RemoteMethod uses an array of object type named params to store parameters , 
and params[0] will store class RemoteParam instance.
Then in  RouterRpcClient#invokeSingle(... ...)  we will generate path by 
RemoteLocation instance as follows :

{code:java}
List nns =
  getNamenodesForNameservice(nsId);
 // create RemoteLocation with src path "/" , destpath "/"
  RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");
  Class proto = method.getProtocol();
  Method m = method.getMethod();
  //Because RemoteMethod params[0] store class RemoteParam instance. we 
will use RemoteLocation dest "/" as method path. please look 
router.RemoteMethod#getParams(... ...)
  Object[] params = method.getParams(loc);
  return invokeMethod(ugi, nns, proto, m, params);
{code}
 As mentioned above, NameNodeRpcServer will receive "/" as complete method path 
information and will log "/" in audit log.  This is my simple analysis. If you 
have any questions, thank you for your correction.



was (Author: lei w):
Thanks [~hexiaoqiao] comment. I  checked the logic of trunk was same like this.
In RouterClientProtocol#complete logic is as follow:
  RouterClientProtocol#complete(... ...) --> RouterRpcClient#invokeSingle(... 
...) --> RouterRpcClient#invokeMethod(..) --> RPC to NN. 
 And in RouterClientProtocol#complete(... ...)  we will generate RemoteMethod 
as follows:

{code:java}
RemoteMethod method = new RemoteMethod("complete",
new Class[] {String.class, String.class, ExtendedBlock.class,
long.class},
   // params[0]params[1]params[2]  params[3]
   new RemoteParam(),  clientName,last,fileId);
{code}
RemoteMethod uses an array of object type named params to store parameters , 
and params[0] will store class RemoteParam instance.
Then in  RouterRpcClient#invokeSingle(... ...)  we will generate path by 
RemoteLocation instance as follows :

{code:java}
List nns =
  getNamenodesForNameservice(nsId);
 // create RemoteLocation with src path "/" , destpath "/"
  RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");
  Class proto = method.getProtocol();
  Method m = method.getMethod();
  //Because RemoteMethod params[0] store class RemoteParam instance. we 
will use RemoteLocation dest "/" as method path. please look 
router.RemoteMethod#getParams(... ...)
  Object[] params = method.getParams(loc);
  return invokeMethod(ugi, nns, proto, m, params);
{code}
 As mentioned above, NameNodeRpcServer will receive "/" as complete method path 
information and will log "/" in audit log.  This is my simple analysis. If you 
have any questions, thank you for your correction.


> Namesystem#completeFile method will log incorrect path information when 
> router to access
> 
>
> Key: HDFS-16196
> URL: https://issues.apache.org/jira/browse/HDFS-16196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Priority: Minor
>
> Router not send entire path information to namenode because 
> ClientProtocol#complete method`s parameter with fileId. Then NameNode will 
> log incorrect path information. This is very confusing, should we let the 
> router pass the path information or modify the log path on  namenode?
> completeFile log as fllow:
> StateChange: DIR* completeFile: / is closed by DFSClient_NONMAPREDUC_*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16196) Namesystem#completeFile method will log incorrect path information when router to access

2021-09-05 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17410334#comment-17410334
 ] 

lei w commented on HDFS-16196:
--

Thanks [~hexiaoqiao] comment. I  checked the logic of trunk was same like this.
In RouterClientProtocol#complete logic is as follow:
  RouterClientProtocol#complete(... ...) --> RouterRpcClient#invokeSingle(... 
...) --> RouterRpcClient#invokeMethod(..) --> RPC to NN. 
 And in RouterClientProtocol#complete(... ...)  we will generate RemoteMethod 
as follows:

{code:java}
RemoteMethod method = new RemoteMethod("complete",
new Class[] {String.class, String.class, ExtendedBlock.class,
long.class},
   // params[0]params[1]params[2]  params[3]
   new RemoteParam(),  clientName,last,fileId);
{code}
RemoteMethod uses an array of object type named params to store parameters , 
and params[0] will store class RemoteParam instance.
Then in  RouterRpcClient#invokeSingle(... ...)  we will generate path by 
RemoteLocation instance as follows :

{code:java}
List nns =
  getNamenodesForNameservice(nsId);
 // create RemoteLocation with src path "/" , destpath "/"
  RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");
  Class proto = method.getProtocol();
  Method m = method.getMethod();
  //Because RemoteMethod params[0] store class RemoteParam instance. we 
will use RemoteLocation dest "/" as method path. please look 
router.RemoteMethod#getParams(... ...)
  Object[] params = method.getParams(loc);
  return invokeMethod(ugi, nns, proto, m, params);
{code}
 As mentioned above, NameNodeRpcServer will receive "/" as complete method path 
information and will log "/" in audit log.  This is my simple analysis. If you 
have any questions, thank you for your correction.


> Namesystem#completeFile method will log incorrect path information when 
> router to access
> 
>
> Key: HDFS-16196
> URL: https://issues.apache.org/jira/browse/HDFS-16196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Priority: Minor
>
> Router not send entire path information to namenode because 
> ClientProtocol#complete method`s parameter with fileId. Then NameNode will 
> log incorrect path information. This is very confusing, should we let the 
> router pass the path information or modify the log path on  namenode?
> completeFile log as fllow:
> StateChange: DIR* completeFile: / is closed by DFSClient_NONMAPREDUC_*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16196) Namesystem#completeFile method will log incorrect path information when router to access

2021-09-03 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409393#comment-17409393
 ] 

lei w commented on HDFS-16196:
--

Thanks [~ayushtkn] for the comment. I may not have made it clear. The issue is 
that when the complete method reaches the namenode via the router, the NameNode 
logs "/" as the file path rather than the file's real path, which is not 
helpful for troubleshooting.
When the request goes through the router, the NameNode logs:
2021-09-03 16:01:26,838 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
completeFile: /  is closed by DFSClient_attempt_***
When the client calls the NameNode directly, it logs:
2021-09-03 16:01:26,803 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
completeFile: /home/* /* /   is closed by DFSClient_attempt_***
So I think we can use the fileId (the complete method's parameter) to get the 
file's real path and log that instead.

> Namesystem#completeFile method will log incorrect path information when 
> router to access
> 
>
> Key: HDFS-16196
> URL: https://issues.apache.org/jira/browse/HDFS-16196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Priority: Minor
>
> Router not send entire path information to namenode because 
> ClientProtocol#complete method`s parameter with fileId. Then NameNode will 
> log incorrect path information. This is very confusing, should we let the 
> router pass the path information or modify the log path on  namenode?
> completeFile log as fllow:
> StateChange: DIR* completeFile: / is closed by DFSClient_NONMAPREDUC_*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16196) Namesystem#completeFile method will log incorrect path information when router to access

2021-08-30 Thread lei w (Jira)
lei w created HDFS-16196:


 Summary: Namesystem#completeFile method will log incorrect path 
information when router to access
 Key: HDFS-16196
 URL: https://issues.apache.org/jira/browse/HDFS-16196
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: lei w


The Router does not send the entire path information to the NameNode, because 
ClientProtocol#complete carries a fileId parameter. The NameNode therefore logs 
incorrect path information. This is very confusing; should we let the router 
pass the path information, or modify the logged path on the namenode?
The completeFile log looks as follows:
StateChange: DIR* completeFile: / is closed by DFSClient_NONMAPREDUC_*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16126) VolumePair should override hashcode() method

2021-07-13 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w resolved HDFS-16126.
--
Resolution: Invalid

>  VolumePair  should  override hashcode() method
> ---
>
> Key: HDFS-16126
> URL: https://issues.apache.org/jira/browse/HDFS-16126
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: lei w
>Priority: Minor
>
> Now  we use a map to check  one plan with more than one line of same 
> VolumePair in createWorkPlan(final VolumePair volumePair, Step step) , code 
> is as flow:
> {code:java}
> private void createWorkPlan(final VolumePair volumePair, Step step)
>   throws DiskBalancerException {
>  // ... 
> // In case we have a plan with more than
> // one line of same VolumePair
> // we compress that into one work order.
> if (workMap.containsKey(volumePair)) {//  To check use map
>   bytesToMove += workMap.get(volumePair).getBytesToCopy();
> }
>// ...
>   }
> {code}
>  I found the object volumePair is always a new object and without hashcode() 
> method, So use a map to check is invalid. Should we add  hashcode() in 
> VolumePair ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16126) VolumePair should override hashcode() method

2021-07-13 Thread lei w (Jira)
lei w created HDFS-16126:


 Summary:  VolumePair  should  override hashcode() method
 Key: HDFS-16126
 URL: https://issues.apache.org/jira/browse/HDFS-16126
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: diskbalancer
Reporter: lei w


Now we use a map to check whether one plan has more than one line with the same 
VolumePair in createWorkPlan(final VolumePair volumePair, Step step); the code is as follows:
{code:java}
private void createWorkPlan(final VolumePair volumePair, Step step)
  throws DiskBalancerException {
 // ... 
// In case we have a plan with more than
// one line of same VolumePair
// we compress that into one work order.
if (workMap.containsKey(volumePair)) {//  To check use map
  bytesToMove += workMap.get(volumePair).getBytesToCopy();
}
   // ...
  }
{code}
 I found that the volumePair object is always a new instance and the class has 
no hashCode() method, so using a map for this check is invalid. Should we add 
hashCode() to VolumePair?
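
For reference, a minimal illustration of why a HashMap key needs matching equals()/hashCode() overrides; the class below is a simplified stand-in, not the real diskbalancer VolumePair, and since the issue was resolved as Invalid the real code presumably already handles this:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class VolumePairSketch {
  // Simplified stand-in holding two volume identifiers.
  static final class VolumePair {
    private final String source;
    private final String dest;

    VolumePair(String source, String dest) {
      this.source = source;
      this.dest = dest;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof VolumePair)) return false;
      VolumePair other = (VolumePair) o;
      return source.equals(other.source) && dest.equals(other.dest);
    }

    @Override
    public int hashCode() {
      return Objects.hash(source, dest); // must be consistent with equals()
    }
  }

  public static void main(String[] args) {
    Map<VolumePair, Long> workMap = new HashMap<>();
    workMap.put(new VolumePair("/data1", "/data2"), 100L);
    // Without the overrides this lookup would miss, because every plan line
    // creates a brand-new VolumePair instance.
    System.out.println(workMap.containsKey(new VolumePair("/data1", "/data2"))); // true
  }
}
{code}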



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-07-09 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16083:
-
Attachment: HDFS-16083.005.patch
Status: Patch Available  (was: Open)

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch, 
> HDFS-16083.003.patch, HDFS-16083.004.patch, HDFS-16083.005.1.patch, 
> HDFS-16083.005.patch, activeRollEdits.png
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we forbid Observer NameNode trigger  active 
> namenode log roll ? We  'dfs.ha.log-roll.period' configured is 300( 5 
> minutes) and active NameNode receives rollEditLog RPC as shown in 
> activeRollEdits.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-07-09 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16083:
-
Status: Open  (was: Patch Available)

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch, 
> HDFS-16083.003.patch, HDFS-16083.004.patch, HDFS-16083.005.1.patch, 
> activeRollEdits.png
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we forbid Observer NameNode trigger  active 
> namenode log roll ? We  'dfs.ha.log-roll.period' configured is 300( 5 
> minutes) and active NameNode receives rollEditLog RPC as shown in 
> activeRollEdits.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-07-09 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16083:
-
Attachment: (was: HDFS-16083.005.patch)

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch, 
> HDFS-16083.003.patch, HDFS-16083.004.patch, HDFS-16083.005.1.patch, 
> activeRollEdits.png
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we forbid Observer NameNode trigger  active 
> namenode log roll ? We  'dfs.ha.log-roll.period' configured is 300( 5 
> minutes) and active NameNode receives rollEditLog RPC as shown in 
> activeRollEdits.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-07-09 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16083:
-
Attachment: HDFS-16083.005.patch

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch, 
> HDFS-16083.003.patch, HDFS-16083.004.patch, HDFS-16083.005.1.patch, 
> HDFS-16083.005.patch, activeRollEdits.png
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we forbid Observer NameNode trigger  active 
> namenode log roll ? We  'dfs.ha.log-roll.period' configured is 300( 5 
> minutes) and active NameNode receives rollEditLog RPC as shown in 
> activeRollEdits.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-07-09 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16083:
-
Attachment: (was: HDFS-16083.005.patch)

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch, 
> HDFS-16083.003.patch, HDFS-16083.004.patch, HDFS-16083.005.1.patch, 
> activeRollEdits.png
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we forbid Observer NameNode trigger  active 
> namenode log roll ? We  'dfs.ha.log-roll.period' configured is 300( 5 
> minutes) and active NameNode receives rollEditLog RPC as shown in 
> activeRollEdits.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16097) Datanode receives ipc requests will throw NPE when datanode quickly restart

2021-07-07 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16097:
-
Description: 
Datanode receives ipc requests will throw NPE when datanode quickly restart. 
This is because when DN is reStarted, BlockPool is first registered with 
blockPoolManager and then fsdataset is initialized. When BlockPool is 
registered to blockPoolManager without initializing fsdataset,  DataNode 
receives an IPC request will throw NPE, because it will call related methods 
provided by fsdataset. The stack exception is as follows:



{code:java}
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:3468)
at 
org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolServerSideTranslatorPB.initReplicaRecovery(InterDatanodeProtocolServerSideTranslatorPB.java:55)
at 
org.apache.hadoop.hdfs.protocol.proto.InterDatanodeProtocolProtos$InterDatanodeProtocolService$2.callBlockingMethod(InterDatanodeProtocolProtos.java:3105)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
{code}


The client-side stack trace is as follows:

{code:java}
 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to 
recover block (block=BP-###:blk_###, 
datanode=DatanodeInfoWithStorage[,null,null])
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:3468)
at 
org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolServerSideTranslatorPB.initReplicaRecovery(InterDatanodeProtocolServerSideTranslatorPB.java:55)
at 
org.apache.hadoop.hdfs.protocol.proto.InterDatanodeProtocolProtos$InterDatanodeProtocolService$2.callBlockingMethod(InterDatanodeProtocolProtos.java:3105)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2873)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1511)
at org.apache.hadoop.ipc.Client.call(Client.java:1457)
at org.apache.hadoop.ipc.Client.call(Client.java:1367)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy26.initReplicaRecovery(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolTranslatorPB.initReplicaRecovery(InterDatanodeProtocolTranslatorPB.java:83)
at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker.callInitReplicaRecovery(BlockRecoveryWorker.java:571)
at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker.access$400(BlockRecoveryWorker.java:57)
at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover(BlockRecoveryWorker.java:142)
at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:610)
at java.lang.Thread.run(Thread.java:748)
{code}
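
One generic way to avoid this window is to guard every IPC entry point until the dataset exists; the following is only a sketch of that idea with stand-in types, not the actual DataNode code or the attached patch:

{code:java}
import java.io.IOException;

public class DatasetGuardSketch {
  /** Stand-in for FsDatasetSpi; set by a background init thread after registration. */
  private volatile Object fsdataset;

  /** Throw a clear error instead of an NPE while startup is still in progress. */
  private Object checkDataset() throws IOException {
    Object ds = fsdataset;
    if (ds == null) {
      throw new IOException("DataNode is still initializing its dataset; retry later");
    }
    return ds;
  }

  /** ~ initReplicaRecovery and other IPC entry points would call the guard first. */
  public void initReplicaRecovery() throws IOException {
    Object ds = checkDataset();
    // ... proceed using ds ...
  }

  public static void main(String[] args) {
    DatasetGuardSketch dn = new DatasetGuardSketch();
    try {
      dn.initReplicaRecovery();
    } catch (IOException e) {
      // A descriptive exception instead of a NullPointerException.
      System.out.println("caught: " + e.getMessage());
    }
  }
}
{code}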



  was:
Datanode receives ipc requests will throw NPE when datanode quickly restart. 
This is because when DN is reStarted, BlockPool is first registered with 
blockPoolManager and then fsdataset is initialized. When BlockPool is 
registered to blockPoolManager without initializing fsdataset,  DataNode 
receives an IPC request will throw NPE, because it will call related methods 
provided by fsdataset. The stack exception is as follows:



{code:java}
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:3468)
at 
org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolServerSideTranslatorPB.initReplicaRecovery(InterDatanodeProtocolServerSideTranslatorPB.java:55)
at 
org.apache.hadoop.hdfs.protocol.proto.InterDatanodeProtocolProtos$InterDatanodeProtocolService$2.callBlockingMethod(InterDatanodeProtocolProtos.java:3105)
at 

[jira] [Commented] (HDFS-16097) Datanode receives ipc requests will throw NPE when datanode quickly restart

2021-07-02 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373287#comment-17373287
 ] 

lei w commented on HDFS-16097:
--

Thanks [~hexiaoqiao] for your comment. I have not looked into the consequences 
for a client that hits this kind of request. But judging from the code logic, 
if the DataNode is performing block recovery, the block recovery task will 
fail, and if the client calls the getReplicaVisibleLength() method of 
ClientDatanodeProtocol, the client should exit directly. 

> Datanode receives ipc requests will throw NPE when datanode quickly restart 
> 
>
> Key: HDFS-16097
> URL: https://issues.apache.org/jira/browse/HDFS-16097
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
> Environment: 
>Reporter: lei w
>Assignee: lei w
>Priority: Major
> Attachments: HDFS-16097.001.patch
>
>
> Datanode receives ipc requests will throw NPE when datanode quickly restart. 
> This is because when DN is reStarted, BlockPool is first registered with 
> blockPoolManager and then fsdataset is initialized. When BlockPool is 
> registered to blockPoolManager without initializing fsdataset,  DataNode 
> receives an IPC request will throw NPE, because it will call related methods 
> provided by fsdataset. The stack exception is as follows:
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:3468)
> at 
> org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolServerSideTranslatorPB.initReplicaRecovery(InterDatanodeProtocolServerSideTranslatorPB.java:55)
> at 
> org.apache.hadoop.hdfs.protocol.proto.InterDatanodeProtocolProtos$InterDatanodeProtocolService$2.callBlockingMethod(InterDatanodeProtocolProtos.java:3105)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16101) Remove unuse variable and IoException in ProvidedStorageMap

2021-06-30 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372310#comment-17372310
 ] 

lei w commented on HDFS-16101:
--

Thanks [~ayushtkn] for your reply.

> Remove unuse variable and IoException in ProvidedStorageMap
> ---
>
> Key: HDFS-16101
> URL: https://issues.apache.org/jira/browse/HDFS-16101
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16101.001.patch
>
>
> Remove unuse variable and IoException in ProvidedStorageMap



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16102) Remove redundant iteration in BlockManager#removeBlocksAssociatedTo(...) to save time

2021-06-30 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372307#comment-17372307
 ] 

lei w commented on HDFS-16102:
--

Thanks [~hexiaoqiao] for your reply.  I will update it.

> Remove redundant iteration in BlockManager#removeBlocksAssociatedTo(...) to 
> save time 
> --
>
> Key: HDFS-16102
> URL: https://issues.apache.org/jira/browse/HDFS-16102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16102.001.patch
>
>
> The current logic in removeBlocksAssociatedTo(...) is as follows:
> {code:java}
>   void removeBlocksAssociatedTo(final DatanodeDescriptor node) {
> providedStorageMap.removeDatanode(node);
> for (DatanodeStorageInfo storage : node.getStorageInfos()) {
>   final Iterator it = storage.getBlockIterator();
>   //add the BlockInfos to a new collection as the
>   //returned iterator is not modifiable.
>   Collection toRemove = new ArrayList<>();
>   while (it.hasNext()) {
> toRemove.add(it.next()); // First iteration : to put blocks to 
> another collection 
>   }
>   for (BlockInfo b : toRemove) {
> removeStoredBlock(b, node); // Another iteration : to remove blocks
>   }
> }
>   // ..
>   }
> {code}
>  In fact, we can do this work during the first iteration, so should we 
> remove the redundant iteration to save time and memory?
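
A small standalone sketch of the proposed single-pass shape follows; the types are stand-ins, and whether the real block iterator tolerates removal while iterating is exactly what a patch would need to verify:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SinglePassRemoveSketch {
  // Stand-in for BlockManager#removeStoredBlock(b, node).
  static void removeStoredBlock(String block) {
    System.out.println("removed " + block);
  }

  public static void main(String[] args) {
    List<String> storageBlocks =
        new ArrayList<>(Arrays.asList("blk_1", "blk_2", "blk_3"));

    // Proposed single-pass shape: act on each block during the first (and
    // only) iteration instead of copying everything into toRemove first.
    for (Iterator<String> it = storageBlocks.iterator(); it.hasNext(); ) {
      String b = it.next();
      removeStoredBlock(b);
      it.remove(); // safe here because ArrayList's iterator supports remove()
    }
  }
}
{code}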



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-06-30 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372039#comment-17372039
 ] 

lei w commented on HDFS-16083:
--

Thanks [~LiJinglun] for the reply. I took your suggestion and made some changes 
in v05. Please review again. 

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch, 
> HDFS-16083.003.patch, HDFS-16083.004.patch, HDFS-16083.005.patch, 
> activeRollEdits.png
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we forbid Observer NameNode trigger  active 
> namenode log roll ? We  'dfs.ha.log-roll.period' configured is 300( 5 
> minutes) and active NameNode receives rollEditLog RPC as shown in 
> activeRollEdits.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-06-30 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16083:
-
Attachment: HDFS-16083.005.patch

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch, 
> HDFS-16083.003.patch, HDFS-16083.004.patch, HDFS-16083.005.patch, 
> activeRollEdits.png
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we forbid Observer NameNode trigger  active 
> namenode log roll ? We  'dfs.ha.log-roll.period' configured is 300( 5 
> minutes) and active NameNode receives rollEditLog RPC as shown in 
> activeRollEdits.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16101) Remove unuse variable and IoException in ProvidedStorageMap

2021-06-30 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371998#comment-17371998
 ] 

lei w commented on HDFS-16101:
--

[~ayushsaxena] Could you give me some advice?

> Remove unuse variable and IoException in ProvidedStorageMap
> ---
>
> Key: HDFS-16101
> URL: https://issues.apache.org/jira/browse/HDFS-16101
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: lei w
>Priority: Minor
> Attachments: HDFS-16101.001.patch
>
>
> Remove unuse variable and IoException in ProvidedStorageMap



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16102) Remove redundant iteration in BlockManager#removeBlocksAssociatedTo(...) to save time

2021-06-30 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16102:
-
Attachment: HDFS-16102.001.patch

> Remove redundant iteration in BlockManager#removeBlocksAssociatedTo(...) to 
> save time 
> --
>
> Key: HDFS-16102
> URL: https://issues.apache.org/jira/browse/HDFS-16102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16102.001.patch
>
>
> The current logic in removeBlocksAssociatedTo(...) is as follows:
> {code:java}
>   void removeBlocksAssociatedTo(final DatanodeDescriptor node) {
>     providedStorageMap.removeDatanode(node);
>     for (DatanodeStorageInfo storage : node.getStorageInfos()) {
>       final Iterator<BlockInfo> it = storage.getBlockIterator();
>       //add the BlockInfos to a new collection as the
>       //returned iterator is not modifiable.
>       Collection<BlockInfo> toRemove = new ArrayList<>();
>       while (it.hasNext()) {
>         toRemove.add(it.next()); // First iteration: to put blocks to another collection
>       }
>       for (BlockInfo b : toRemove) {
>         removeStoredBlock(b, node); // Another iteration: to remove blocks
>       }
>     }
>     // ..
>   }
> {code}
>  In fact , we can use the first iteration to achieve this logic , so should 
> we remove the redundant iteration to save time and memory?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16102) Remove redundant iteration in BlockManager#removeBlocksAssociatedTo(...) to save time

2021-06-30 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16102:
-
Description: 
The current logic in removeBlocksAssociatedTo(...) is as follows:
{code:java}
  void removeBlocksAssociatedTo(final DatanodeDescriptor node) {
    providedStorageMap.removeDatanode(node);
    for (DatanodeStorageInfo storage : node.getStorageInfos()) {
      final Iterator<BlockInfo> it = storage.getBlockIterator();
      //add the BlockInfos to a new collection as the
      //returned iterator is not modifiable.
      Collection<BlockInfo> toRemove = new ArrayList<>();
      while (it.hasNext()) {
        toRemove.add(it.next()); // First iteration: to put blocks to another collection
      }

      for (BlockInfo b : toRemove) {
        removeStoredBlock(b, node); // Another iteration: to remove blocks
      }
    }
    // ..
  }
{code}
 In fact , we can use the first iteration to achieve this logic , so should we 
remove the redundant iteration to save time and memory?

  was:
The current logic in removeBlocksAssociatedTo(...) is as follows:
{code:java}
  void removeBlocksAssociatedTo(final DatanodeDescriptor node) {
    providedStorageMap.removeDatanode(node);
    for (DatanodeStorageInfo storage : node.getStorageInfos()) {
      final Iterator<BlockInfo> it = storage.getBlockIterator();
      //add the BlockInfos to a new collection as the
      //returned iterator is not modifiable.
      Collection<BlockInfo> toRemove = new ArrayList<>();
      while (it.hasNext()) {
        toRemove.add(it.next()); // First iteration: to put blocks to another collection
      }

      for (BlockInfo b : toRemove) {
        removeStoredBlock(b, node); // Another iteration: to remove blocks
      }
    }
    // ..
  }
{code}
 In fact , we can use the first iteration to achieve this logic , so should we 
remove the redundant iteration to save time?


> Remove redundant iteration in BlockManager#removeBlocksAssociatedTo(...) to 
> save time 
> --
>
> Key: HDFS-16102
> URL: https://issues.apache.org/jira/browse/HDFS-16102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16102.001.patch
>
>
> The current logic in removeBlocksAssociatedTo(...) is as follows:
> {code:java}
>   void removeBlocksAssociatedTo(final DatanodeDescriptor node) {
>     providedStorageMap.removeDatanode(node);
>     for (DatanodeStorageInfo storage : node.getStorageInfos()) {
>       final Iterator<BlockInfo> it = storage.getBlockIterator();
>       //add the BlockInfos to a new collection as the
>       //returned iterator is not modifiable.
>       Collection<BlockInfo> toRemove = new ArrayList<>();
>       while (it.hasNext()) {
>         toRemove.add(it.next()); // First iteration: to put blocks to another collection
>       }
>       for (BlockInfo b : toRemove) {
>         removeStoredBlock(b, node); // Another iteration: to remove blocks
>       }
>     }
>     // ..
>   }
> {code}
>  In fact , we can use the first iteration to achieve this logic , so should 
> we remove the redundant iteration to save time and memory?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16102) Remove redundant iteration in BlockManager#removeBlocksAssociatedTo(...) to save time

2021-06-30 Thread lei w (Jira)
lei w created HDFS-16102:


 Summary: Remove redundant iteration in 
BlockManager#removeBlocksAssociatedTo(...) to save time 
 Key: HDFS-16102
 URL: https://issues.apache.org/jira/browse/HDFS-16102
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: lei w
Assignee: lei w


The current logic in removeBlocksAssociatedTo(...) is as follows:
{code:java}
  void removeBlocksAssociatedTo(final DatanodeDescriptor node) {
    providedStorageMap.removeDatanode(node);
    for (DatanodeStorageInfo storage : node.getStorageInfos()) {
      final Iterator<BlockInfo> it = storage.getBlockIterator();
      //add the BlockInfos to a new collection as the
      //returned iterator is not modifiable.
      Collection<BlockInfo> toRemove = new ArrayList<>();
      while (it.hasNext()) {
        toRemove.add(it.next()); // First iteration: to put blocks to another collection
      }

      for (BlockInfo b : toRemove) {
        removeStoredBlock(b, node); // Another iteration: to remove blocks
      }
    }
    // ..
  }
{code}
 In fact , we can use the first iteration to achieve this logic , so should we 
remove the redundant iteration to save time?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16101) Remove unuse variable and IoException in ProvidedStorageMap

2021-06-30 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16101:
-
Attachment: HDFS-16101.001.patch

> Remove unuse variable and IoException in ProvidedStorageMap
> ---
>
> Key: HDFS-16101
> URL: https://issues.apache.org/jira/browse/HDFS-16101
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: lei w
>Priority: Minor
> Attachments: HDFS-16101.001.patch
>
>
> Remove unuse variable and IoException in ProvidedStorageMap



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16101) Remove unuse variable and IoException in ProvidedStorageMap

2021-06-30 Thread lei w (Jira)
lei w created HDFS-16101:


 Summary: Remove unuse variable and IoException in 
ProvidedStorageMap
 Key: HDFS-16101
 URL: https://issues.apache.org/jira/browse/HDFS-16101
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: lei w


Remove unuse variable and IoException in ProvidedStorageMap



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-06-29 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371403#comment-17371403
 ] 

lei w commented on HDFS-16083:
--

Added a test in HDFS-16083.003.patch.

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch, 
> HDFS-16083.003.patch, activeRollEdits.png
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we forbid Observer NameNode trigger  active 
> namenode log roll ? We  'dfs.ha.log-roll.period' configured is 300( 5 
> minutes) and active NameNode receives rollEditLog RPC as shown in 
> activeRollEdits.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-06-29 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16083:
-
Attachment: HDFS-16083.003.patch

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch, 
> HDFS-16083.003.patch, activeRollEdits.png
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we forbid Observer NameNode trigger  active 
> namenode log roll ? We  'dfs.ha.log-roll.period' configured is 300( 5 
> minutes) and active NameNode receives rollEditLog RPC as shown in 
> activeRollEdits.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16097) Datanode receives ipc requests will throw NPE when datanode quickly restart

2021-06-29 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16097:
-
Attachment: HDFS-16097.001.patch

> Datanode receives ipc requests will throw NPE when datanode quickly restart 
> 
>
> Key: HDFS-16097
> URL: https://issues.apache.org/jira/browse/HDFS-16097
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
> Environment: 
>Reporter: lei w
>Priority: Major
> Attachments: HDFS-16097.001.patch
>
>
> Datanode receives ipc requests will throw NPE when datanode quickly restart. 
> This is because when DN is reStarted, BlockPool is first registered with 
> blockPoolManager and then fsdataset is initialized. When BlockPool is 
> registered to blockPoolManager without initializing fsdataset,  DataNode 
> receives an IPC request will throw NPE, because it will call related methods 
> provided by fsdataset. The stack exception is as follows:
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:3468)
> at 
> org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolServerSideTranslatorPB.initReplicaRecovery(InterDatanodeProtocolServerSideTranslatorPB.java:55)
> at 
> org.apache.hadoop.hdfs.protocol.proto.InterDatanodeProtocolProtos$InterDatanodeProtocolService$2.callBlockingMethod(InterDatanodeProtocolProtos.java:3105)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> {code}
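
A minimal sketch of the kind of guard that would avoid this NPE is below: the IPC handler checks whether the fsdataset ('data') has been initialized and fails fast with an ordinary IOException instead of dereferencing null. The check shown here is only illustrative (the attached patch may place the guard elsewhere, for example before registering the BlockPool), and the delegation to the dataset is simplified.

{code:java}
// Sketch only: reject inter-datanode RPCs until the fsdataset has been
// initialized, so a quickly restarted DataNode returns a clear, retriable
// error instead of a NullPointerException.
@Override // InterDatanodeProtocol
public ReplicaRecoveryInfo initReplicaRecovery(RecoveringBlock rBlock)
    throws IOException {
  if (data == null) { // fsdataset not yet initialized after restart
    throw new IOException("DataNode is still starting; fsdataset is not"
        + " initialized yet, cannot recover " + rBlock.getBlock());
  }
  return data.initReplicaRecovery(rBlock); // delegation simplified here
}
{code}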



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16097) Datanode receives ipc requests will throw NPE when datanode quickly restart

2021-06-29 Thread lei w (Jira)
lei w created HDFS-16097:


 Summary: Datanode receives ipc requests will throw NPE when 
datanode quickly restart 
 Key: HDFS-16097
 URL: https://issues.apache.org/jira/browse/HDFS-16097
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
 Environment: 



Reporter: lei w


Datanode receives ipc requests will throw NPE when datanode quickly restart. 
This is because when DN is reStarted, BlockPool is first registered with 
blockPoolManager and then fsdataset is initialized. When BlockPool is 
registered to blockPoolManager without initializing fsdataset,  DataNode 
receives an IPC request will throw NPE, because it will call related methods 
provided by fsdataset. The stack exception is as follows:



{code:java}
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:3468)
at 
org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolServerSideTranslatorPB.initReplicaRecovery(InterDatanodeProtocolServerSideTranslatorPB.java:55)
at 
org.apache.hadoop.hdfs.protocol.proto.InterDatanodeProtocolProtos$InterDatanodeProtocolService$2.callBlockingMethod(InterDatanodeProtocolProtos.java:3105)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-06-24 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16083:
-
Description: When the Observer NameNode is turned on in the cluster, the 
Active NameNode will receive rollEditLog RPC requests from the Standby NameNode 
and Observer NameNode in a short time. Observer NameNode's rollEditLog request 
is a repetitive operation, so should we forbid Observer NameNode trigger  
active namenode log roll ? We  'dfs.ha.log-roll.period' configured is 300( 5 
minutes) and active NameNode receives rollEditLog RPC as shown in 
activeRollEdits.png  (was: When the Observer NameNode is turned on in the 
cluster, the Active NameNode will receive rollEditLog RPC requests from the 
Standby NameNode and Observer NameNode in a short time. Observer NameNode's 
rollEditLog request is a repetitive operation, so should we forbid Observer 
NameNode trigger  active namenode log roll ? We  'dfs.ha.log-roll.period' 
configured is 300( 5 minutes))

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch, 
> activeRollEdits.png
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we forbid Observer NameNode trigger  active 
> namenode log roll ? We  'dfs.ha.log-roll.period' configured is 300( 5 
> minutes) and active NameNode receives rollEditLog RPC as shown in 
> activeRollEdits.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-06-24 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16083:
-
Description: When the Observer NameNode is turned on in the cluster, the 
Active NameNode will receive rollEditLog RPC requests from the Standby NameNode 
and Observer NameNode in a short time. Observer NameNode's rollEditLog request 
is a repetitive operation, so should we forbid Observer NameNode trigger  
active namenode log roll ? We  'dfs.ha.log-roll.period' configured is 300( 5 
minutes)  (was: When the Observer NameNode is turned on in the cluster, the 
Active NameNode will receive rollEditLog RPC requests from the Standby NameNode 
and Observer NameNode in a short time. Observer NameNode's rollEditLog request 
is a repetitive operation, so should we forbid Observer NameNode trigger  
active namenode log roll ? We  'dfs.ha.log-roll.period' configuration
)

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch, 
> activeRollEdits.png
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we forbid Observer NameNode trigger  active 
> namenode log roll ? We  'dfs.ha.log-roll.period' configured is 300( 5 minutes)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-06-24 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16083:
-
Description: 
When the Observer NameNode is turned on in the cluster, the Active NameNode 
will receive rollEditLog RPC requests from the Standby NameNode and Observer 
NameNode in a short time. Observer NameNode's rollEditLog request is a 
repetitive operation, so should we forbid Observer NameNode trigger  active 
namenode log roll ? We  'dfs.ha.log-roll.period' configuration


  was:
When the Observer NameNode is turned on in the cluster, the Active NameNode 
will receive rollEditLog RPC requests from the Standby NameNode and Observer 
NameNode in a short time. Observer NameNode's rollEditLog request is a 
repetitive operation, so should we forbid Observer NameNode trigger  active 
namenode log roll ? We Forbid Observer NameNode trigger  active namenode log 
roll



> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch, 
> activeRollEdits.png
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we forbid Observer NameNode trigger  active 
> namenode log roll ? We  'dfs.ha.log-roll.period' configuration



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-06-24 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16083:
-
Description: 
When the Observer NameNode is turned on in the cluster, the Active NameNode 
will receive rollEditLog RPC requests from the Standby NameNode and Observer 
NameNode in a short time. Observer NameNode's rollEditLog request is a 
repetitive operation, so should we forbid Observer NameNode trigger  active 
namenode log roll ? We 


  was:
When the Observer NameNode is turned on in the cluster, the Active NameNode 
will receive rollEditLog RPC requests from the Standby NameNode and Observer 
NameNode in a short time. Observer NameNode's rollEditLog request is a 
repetitive operation, so should we proh



> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch, 
> activeRollEdits.png
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we forbid Observer NameNode trigger  active 
> namenode log roll ? We 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-06-24 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16083:
-
Description: 
When the Observer NameNode is turned on in the cluster, the Active NameNode 
will receive rollEditLog RPC requests from the Standby NameNode and Observer 
NameNode in a short time. Observer NameNode's rollEditLog request is a 
repetitive operation, so should we forbid Observer NameNode trigger  active 
namenode log roll ? We Forbid Observer NameNode trigger  active namenode log 
roll


  was:
When the Observer NameNode is turned on in the cluster, the Active NameNode 
will receive rollEditLog RPC requests from the Standby NameNode and Observer 
NameNode in a short time. Observer NameNode's rollEditLog request is a 
repetitive operation, so should we forbid Observer NameNode trigger  active 
namenode log roll ? We 



> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch, 
> activeRollEdits.png
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we forbid Observer NameNode trigger  active 
> namenode log roll ? We Forbid Observer NameNode trigger  active namenode log 
> roll



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-06-24 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16083:
-
Attachment: activeRollEdits.png

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch, 
> activeRollEdits.png
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we proh



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-06-24 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16083:
-
Description: 
When the Observer NameNode is turned on in the cluster, the Active NameNode 
will receive rollEditLog RPC requests from the Standby NameNode and Observer 
NameNode in a short time. Observer NameNode's rollEditLog request is a 
repetitive operation, so should we proh


  was:When the Observer NameNode is turned on in the cluster, the Active 
NameNode will receive rollEditLog RPC requests from the Standby NameNode and 
Observer NameNode in a short time. Observer NameNode's rollEditLog request is a 
repetitive operation, so should we prohibit Observer NameNode from triggering 
rollEditLog?


> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we proh



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16081) List a large directory, the client waits for a long time

2021-06-23 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368075#comment-17368075
 ] 

lei w commented on HDFS-16081:
--

OK, thanks [~ste...@apache.org] for the reply.

> List a large directory, the client waits for a long time
> 
>
> Key: HDFS-16081
> URL: https://issues.apache.org/jira/browse/HDFS-16081
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: lei w
>Priority: Minor
>
> When we list a large directory, we need to wait a lot of time. This is 
> because the NameNode only returns the number of files corresponding to 
> dfs.ls.limit each time, and then the client iteratively obtains the remaining 
> files. But in many scenarios, we only need to know part of the files in the 
> current directory, and then process this part of the file. After processing, 
> go to get the remaining files. So can we add a limit on the number of files 
> and return it to the client after obtaining the specified number of files  or 
> NameNode returnes files based on lock hold time instead of just relying on a 
> configuration. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-06-23 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368017#comment-17368017
 ] 

lei w commented on HDFS-16083:
--

Thanks [~LiJinglun] for the reply. I will add more information and unit tests later.

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we prohibit Observer NameNode from triggering 
> rollEditLog?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-06-23 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367906#comment-17367906
 ] 

lei w commented on HDFS-16083:
--

[~ayushsaxena], [~LiJinglun], does anyone have any suggestions?

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we prohibit Observer NameNode from triggering 
> rollEditLog?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16081) List a large directory, the client waits for a long time

2021-06-23 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367889#comment-17367889
 ] 

lei w commented on HDFS-16081:
--

Thanks [~ste...@apache.org] for the comment. We use the original DFSClient API; 
I could not find listStatusIncremental(), or did you mean listStatusInternal()? 
listStatusInternal() only returns after it has collected all the files, so it 
waits a long time when a directory contains too many files. We want to add a 
limit on the number of files and return to the client once the specified 
number of files has been obtained.
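
For comparison, a caller can already page through a large directory with the iterative listing API and stop once it has processed enough entries; each underlying call still returns at most dfs.ls.limit entries, but no further RPCs are issued after the early stop. A small self-contained sketch (the path argument and the 1000-entry cut-off are only examples):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class PartialListing {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // listStatusIterator fetches the directory in batches (dfs.ls.limit per
    // call for HDFS), so stopping early also stops the remote calls.
    RemoteIterator<FileStatus> it = fs.listStatusIterator(new Path(args[0]));
    int processed = 0;
    final int limit = 1000; // example cut-off: only the first 1000 entries
    while (it.hasNext() && processed < limit) {
      FileStatus status = it.next();
      System.out.println(status.getPath()); // process this part of the files
      processed++;
    }
  }
}
{code}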

> List a large directory, the client waits for a long time
> 
>
> Key: HDFS-16081
> URL: https://issues.apache.org/jira/browse/HDFS-16081
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: lei w
>Priority: Minor
>
> When we list a large directory, we need to wait a lot of time. This is 
> because the NameNode only returns the number of files corresponding to 
> dfs.ls.limit each time, and then the client iteratively obtains the remaining 
> files. But in many scenarios, we only need to know part of the files in the 
> current directory, and then process this part of the file. After processing, 
> go to get the remaining files. So can we add a limit on the number of files 
> and return it to the client after obtaining the specified number of files  or 
> NameNode returnes files based on lock hold time instead of just relying on a 
> configuration. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16081) List a large directory, the client waits for a long time

2021-06-23 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16081:
-
Description: When we list a large directory, we need to wait a lot of time. 
This is because the NameNode only returns the number of files corresponding to 
dfs.ls.limit each time, and then the client iteratively obtains the remaining 
files. But in many scenarios, we only need to know part of the files in the 
current directory, and then process this part of the file. After processing, go 
to get the remaining files. So can we add a limit on the number of files and 
return it to the client after obtaining the specified number of files  or 
NameNode returnes files based on lock hold time instead of just relying on a 
configuration.   (was: When we list a large directory, we need to wait a lot of 
time. This is because the NameNode only returns the number of files 
corresponding to dfs.ls.limit each time, and then the client iteratively 
obtains the remaining files. But in many scenarios, we only need to know part 
of the files in the current directory, and then process this part of the file. 
After processing, go to get the remaining files. So can we add a limit on the 
number of ls files and return it to the client after obtaining the specified 
number of files  or NameNode returnes files based on lock hold time instead of 
just relying on a configuration. )

> List a large directory, the client waits for a long time
> 
>
> Key: HDFS-16081
> URL: https://issues.apache.org/jira/browse/HDFS-16081
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: lei w
>Priority: Minor
>
> When we list a large directory, we need to wait a lot of time. This is 
> because the NameNode only returns the number of files corresponding to 
> dfs.ls.limit each time, and then the client iteratively obtains the remaining 
> files. But in many scenarios, we only need to know part of the files in the 
> current directory, and then process this part of the file. After processing, 
> go to get the remaining files. So can we add a limit on the number of files 
> and return it to the client after obtaining the specified number of files  or 
> NameNode returnes files based on lock hold time instead of just relying on a 
> configuration. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-06-22 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367858#comment-17367858
 ] 

lei w commented on HDFS-16083:
--

Thanks [~tomscut] for the comment. Changed it and added a new patch, HDFS-16083.002.patch.

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we prohibit Observer NameNode from triggering 
> rollEditLog?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-06-22 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16083:
-
Attachment: HDFS-16083.002.patch

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we prohibit Observer NameNode from triggering 
> rollEditLog?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-06-21 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16083:
-
Attachment: HDFS-16083.001.patch

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch
>
>
> When the Observer NameNode is turned on in the cluster, the Active NameNode 
> will receive rollEditLog RPC requests from the Standby NameNode and Observer 
> NameNode in a short time. Observer NameNode's rollEditLog request is a 
> repetitive operation, so should we prohibit Observer NameNode from triggering 
> rollEditLog?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-06-21 Thread lei w (Jira)
lei w created HDFS-16083:


 Summary: Forbid Observer NameNode trigger  active namenode log roll
 Key: HDFS-16083
 URL: https://issues.apache.org/jira/browse/HDFS-16083
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namanode
Reporter: lei w


When the Observer NameNode is turned on in the cluster, the Active NameNode 
will receive rollEditLog RPC requests from the Standby NameNode and Observer 
NameNode in a short time. Observer NameNode's rollEditLog request is a 
repetitive operation, so should we prohibit Observer NameNode from triggering 
rollEditLog?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16081) List a large directory, the client waits for a long time

2021-06-21 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16081:
-
Description: When we list a large directory, we need to wait a lot of time. 
This is because the NameNode only returns the number of files corresponding to 
dfs.ls.limit each time, and then the client iteratively obtains the remaining 
files. But in many scenarios, we only need to know part of the files in the 
current directory, and then process this part of the file. After processing, go 
to get the remaining files. So can we add a limit on the number of ls files and 
return it to the client after obtaining the specified number of files  or 
NameNode returnes files based on lock hold time instead of just relying on a 
configuration.   (was: When we list a large directory, we need to wait a lot of 
time. This is because the NameNode only returns the number of files 
corresponding to dfs.ls.limit each time, and then the client iteratively 
obtains the remaining files. But in many scenarios, we only need to know part 
of the files in the current directory, and then process this part of the file. 
After processing, go to get the remaining files. So can we add a limit on the 
number of ls files and return it to the client after obtaining the specified 
number of files ?)

> List a large directory, the client waits for a long time
> 
>
> Key: HDFS-16081
> URL: https://issues.apache.org/jira/browse/HDFS-16081
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: lei w
>Priority: Minor
>
> When we list a large directory, we need to wait a lot of time. This is 
> because the NameNode only returns the number of files corresponding to 
> dfs.ls.limit each time, and then the client iteratively obtains the remaining 
> files. But in many scenarios, we only need to know part of the files in the 
> current directory, and then process this part of the file. After processing, 
> go to get the remaining files. So can we add a limit on the number of ls 
> files and return it to the client after obtaining the specified number of 
> files  or NameNode returnes files based on lock hold time instead of just 
> relying on a configuration. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-16081) List a large directory, the client waits for a long time

2021-06-21 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366440#comment-17366440
 ] 

lei w edited comment on HDFS-16081 at 6/21/21, 12:46 PM:
-

Our cluster does not configure dfs.ls.limit (the default is 1000). In many 
cases there are nearly one million files in a directory, so every list 
operation has to wait several minutes.


was (Author: lei w):
Our cluster dfs.ls.limit is not configured. In many cases, there will be nearly 
one million files in a directory, so every list operation has to wait for 
several minutes.

> List a large directory, the client waits for a long time
> 
>
> Key: HDFS-16081
> URL: https://issues.apache.org/jira/browse/HDFS-16081
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: lei w
>Priority: Minor
>
> When we list a large directory, we need to wait a lot of time. This is 
> because the NameNode only returns the number of files corresponding to 
> dfs.ls.limit each time, and then the client iteratively obtains the remaining 
> files. But in many scenarios, we only need to know part of the files in the 
> current directory, and then process this part of the file. After processing, 
> go to get the remaining files. So can we add a limit on the number of ls 
> files and return it to the client after obtaining the specified number of 
> files ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16081) List a large directory, the client waits for a long time

2021-06-21 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366440#comment-17366440
 ] 

lei w commented on HDFS-16081:
--

Our cluster dfs.ls.limit is not configured. In many cases, there will be nearly 
one million files in a directory, so every list operation has to wait for 
several minutes.

> List a large directory, the client waits for a long time
> 
>
> Key: HDFS-16081
> URL: https://issues.apache.org/jira/browse/HDFS-16081
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: lei w
>Priority: Minor
>
> When we list a large directory, we need to wait a lot of time. This is 
> because the NameNode only returns the number of files corresponding to 
> dfs.ls.limit each time, and then the client iteratively obtains the remaining 
> files. But in many scenarios, we only need to know part of the files in the 
> current directory, and then process this part of the file. After processing, 
> go to get the remaining files. So can we add a limit on the number of ls 
> files and return it to the client after obtaining the specified number of 
> files ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16081) List a large directory, the client waits for a long time

2021-06-21 Thread lei w (Jira)
lei w created HDFS-16081:


 Summary: List a large directory, the client waits for a long time
 Key: HDFS-16081
 URL: https://issues.apache.org/jira/browse/HDFS-16081
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Reporter: lei w


When we list a large directory, we need to wait a lot of time. This is because 
the NameNode only returns the number of files corresponding to dfs.ls.limit 
each time, and then the client iteratively obtains the remaining files. But in 
many scenarios, we only need to know part of the files in the current 
directory, and then process this part of the file. After processing, go to get 
the remaining files. So can we add a limit on the number of ls files and return 
it to the client after obtaining the specified number of files ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16073) Remove redundant RPC requests for getFileLinkInfo in ClientNamenodeProtocolTranslatorPB

2021-06-16 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364098#comment-17364098
 ] 

lei w commented on HDFS-16073:
--

[~ayushtkn], thanks for your comment. Added HDFS-16073.001.patch to adjust it 
into a single line.

> Remove redundant RPC requests for getFileLinkInfo in 
> ClientNamenodeProtocolTranslatorPB
> ---
>
> Key: HDFS-16073
> URL: https://issues.apache.org/jira/browse/HDFS-16073
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Priority: Minor
> Attachments: HDFS-16073.001.patch, HDFS-16073.patch
>
>
> Remove redundant RPC requests for getFileLinkInfo in 
> ClientNamenodeProtocolTranslatorPB. The original logic is as follows:
> {code:java}
> @Override
> public HdfsFileStatus getFileLinkInfo(String src) throws IOException {
>   GetFileLinkInfoRequestProto req = GetFileLinkInfoRequestProto.newBuilder()
>       .setSrc(src).build();
>   try {
>     GetFileLinkInfoResponseProto result =
>         rpcProxy.getFileLinkInfo(null, req); // First getFileLinkInfo RPC request
>     return result.hasFs()
>         // Repeated getFileLinkInfo RPC request
>         ? PBHelperClient.convert(rpcProxy.getFileLinkInfo(null, req).getFs())
>         : null;
>   } catch (ServiceException e) {
>     throw ProtobufHelper.getRemoteException(e);
>   }
> }
> {code}
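
The single-RPC version is then roughly the following, reusing the response that was already fetched instead of calling getFileLinkInfo a second time:

{code:java}
// Reuse the already-fetched response instead of issuing a second
// getFileLinkInfo RPC (sketch of the one-line adjustment).
@Override
public HdfsFileStatus getFileLinkInfo(String src) throws IOException {
  GetFileLinkInfoRequestProto req = GetFileLinkInfoRequestProto.newBuilder()
      .setSrc(src).build();
  try {
    GetFileLinkInfoResponseProto result = rpcProxy.getFileLinkInfo(null, req);
    return result.hasFs() ? PBHelperClient.convert(result.getFs()) : null;
  } catch (ServiceException e) {
    throw ProtobufHelper.getRemoteException(e);
  }
}
{code}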



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16073) Remove redundant RPC requests for getFileLinkInfo in ClientNamenodeProtocolTranslatorPB

2021-06-16 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16073:
-
Attachment: HDFS-16073.001.patch

> Remove redundant RPC requests for getFileLinkInfo in 
> ClientNamenodeProtocolTranslatorPB
> ---
>
> Key: HDFS-16073
> URL: https://issues.apache.org/jira/browse/HDFS-16073
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Priority: Minor
> Attachments: HDFS-16073.001.patch, HDFS-16073.patch
>
>
> Remove redundant RPC requests for getFileLinkInfo in 
> ClientNamenodeProtocolTranslatorPB. The original logic is as follows:
> {code:java}
> @Override
> public HdfsFileStatus getFileLinkInfo(String src) throws IOException {
>   GetFileLinkInfoRequestProto req = GetFileLinkInfoRequestProto.newBuilder()
>       .setSrc(src).build();
>   try {
>     GetFileLinkInfoResponseProto result =
>         rpcProxy.getFileLinkInfo(null, req); // First getFileLinkInfo RPC request
>     return result.hasFs()
>         // Repeated getFileLinkInfo RPC request
>         ? PBHelperClient.convert(rpcProxy.getFileLinkInfo(null, req).getFs())
>         : null;
>   } catch (ServiceException e) {
>     throw ProtobufHelper.getRemoteException(e);
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16073) Remove redundant RPC requests for getFileLinkInfo in ClientNamenodeProtocolTranslatorPB

2021-06-15 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364049#comment-17364049
 ] 

lei w commented on HDFS-16073:
--

[~hexiaoqiao], [~ayushsaxena], [~zhuqi], does anyone have any suggestions?

> Remove redundant RPC requests for getFileLinkInfo in 
> ClientNamenodeProtocolTranslatorPB
> ---
>
> Key: HDFS-16073
> URL: https://issues.apache.org/jira/browse/HDFS-16073
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Priority: Minor
> Attachments: HDFS-16073.patch
>
>
> Remove redundant RPC requests for getFileLinkInfo in 
> ClientNamenodeProtocolTranslatorPB. The original logic is as follows:
> {code:java}
> @Override
> public HdfsFileStatus getFileLinkInfo(String src) throws IOException {
>   GetFileLinkInfoRequestProto req = GetFileLinkInfoRequestProto.newBuilder()
>   .setSrc(src).build();
>   try {
> GetFileLinkInfoResponseProto result = rpcProxy.getFileLinkInfo(null, 
> req);// First getFileLinkInfo RPC request
> return result.hasFs() ?
> PBHelperClient.convert(rpcProxy.getFileLinkInfo(null, req).getFs()) 
> :// Repeated getFileLinkInfo RPC request
> null;
>   } catch (ServiceException e) {
> throw ProtobufHelper.getRemoteException(e);
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16073) Remove redundant RPC requests for getFileLinkInfo in ClientNamenodeProtocolTranslatorPB

2021-06-15 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16073:
-
Attachment: HDFS-16073.patch

> Remove redundant RPC requests for getFileLinkInfo in 
> ClientNamenodeProtocolTranslatorPB
> ---
>
> Key: HDFS-16073
> URL: https://issues.apache.org/jira/browse/HDFS-16073
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Priority: Minor
> Attachments: HDFS-16073.patch
>
>
> Remove redundant RPC requests for getFileLinkInfo in 
> ClientNamenodeProtocolTranslatorPB. The original logic is as follows:
> {code:java}
> @Override
> public HdfsFileStatus getFileLinkInfo(String src) throws IOException {
>   GetFileLinkInfoRequestProto req = GetFileLinkInfoRequestProto.newBuilder()
>       .setSrc(src).build();
>   try {
>     GetFileLinkInfoResponseProto result =
>         rpcProxy.getFileLinkInfo(null, req); // First getFileLinkInfo RPC request
>     return result.hasFs()
>         // Repeated getFileLinkInfo RPC request
>         ? PBHelperClient.convert(rpcProxy.getFileLinkInfo(null, req).getFs())
>         : null;
>   } catch (ServiceException e) {
>     throw ProtobufHelper.getRemoteException(e);
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


