[jira] [Created] (HDFS-17591) RBF: Router should follow X-FRAME-OPTIONS protection setting

2024-07-25 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-17591:
---

 Summary: RBF: Router should follow X-FRAME-OPTIONS protection 
setting
 Key: HDFS-17591
 URL: https://issues.apache.org/jira/browse/HDFS-17591
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


The Router UI doesn't set an X-FRAME-OPTIONS header in its HTTP responses. The 
Router should load the value of dfs.xframe.value.

This issue was reported by Daiki Mashima.
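For reference, the existing HDFS daemons read this protection from hdfs-site.xml; a minimal sketch of the relevant configuration (property names as defined in hdfs-default.xml; the values shown are the defaults):

```xml
<!-- hdfs-site.xml: X-FRAME-OPTIONS protection that the Router should honor too -->
<property>
  <name>dfs.xframe.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.xframe.value</name>
  <value>SAMEORIGIN</value>
</property>
```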



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-17468) Update ISA-L to 2.31.0 in the build image

2024-04-15 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-17468:
---

 Summary: Update ISA-L to 2.31.0 in the build image
 Key: HDFS-17468
 URL: https://issues.apache.org/jira/browse/HDFS-17468
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


Intel ISA-L has several improvements in version 2.31.0. Let's update ISA-L in 
our build image to this version.






[jira] [Resolved] (HDFS-17435) Fix TestRouterRpc failed

2024-03-25 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17435.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Fix TestRouterRpc failed
> 
>
> Key: HDFS-17435
> URL: https://issues.apache.org/jira/browse/HDFS-17435
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> TestRouterRpc and TestRouterRpcMultiDestination are failing with the 
> following error.
> {noformat}
> [ERROR] testProxyGetBlockKeys  Time elapsed: 0.573 s  <<< ERROR!
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: jenkins is not allowed to impersonate jenkins
> {noformat}
> This is caused by testClearStaleNamespacesInRouterStateIdContext(), which was 
> introduced by HDFS-17354.






[jira] [Created] (HDFS-17441) Fix junit dependency by adding missing library in hadoop-hdfs-rbf

2024-03-25 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-17441:
---

 Summary: Fix junit dependency by adding missing library in 
hadoop-hdfs-rbf
 Key: HDFS-17441
 URL: https://issues.apache.org/jira/browse/HDFS-17441
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


We need to add some missing JUnit libraries to hadoop-hdfs-rbf.

See: 
https://issues.apache.org/jira/browse/HDFS-17370?focusedCommentId=17829747&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17829747






[jira] [Resolved] (HDFS-17354) Delay invoke clearStaleNamespacesInRouterStateIdContext during router start up

2024-03-21 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17354.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Delay invoke  clearStaleNamespacesInRouterStateIdContext during router start 
> up
> ---
>
> Key: HDFS-17354
> URL: https://issues.apache.org/jira/browse/HDFS-17354
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Assignee: lei w
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> We should start the thread that clears expired namespaces in the RouterRpcServer 
> RUNNING phase, because StateStoreService is initialized during the initialization 
> phase. Currently, the router throws an IOException at startup.
> {panel:title=Exception}
> 2024-01-09 16:27:06,939 WARN 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer: Could not 
> fetch current list of namespaces.
> java.io.IOException: State Store does not have an interface for 
> MembershipStore
> at 
> org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getStoreInterface(MembershipNamenodeResolver.java:121)
> at 
> org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getMembershipStore(MembershipNamenodeResolver.java:102)
> at 
> org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getNamespaces(MembershipNamenodeResolver.java:388)
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.clearStaleNamespacesInRouterStateIdContext(RouterRpcServer.java:434)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {panel}






[jira] [Resolved] (HDFS-17432) Fix junit dependency to enable JUnit4 tests to run in hadoop-hdfs-rbf

2024-03-21 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17432.
-
Fix Version/s: 3.4.1
   3.5.0
   Resolution: Fixed

> Fix junit dependency to enable JUnit4 tests to run in hadoop-hdfs-rbf
> -
>
> Key: HDFS-17432
> URL: https://issues.apache.org/jira/browse/HDFS-17432
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
>
> After HDFS-17370, JUnit4 tests stopped running in hadoop-hdfs-rbf. To enable 
> both JUnit4 and JUnit5 tests to run, we need to add junit-vintage-engine to 
> the hadoop-hdfs-rbf/pom.xml.
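The dependency addition being described would look roughly like this (a sketch; the version is assumed to be managed by the Hadoop parent POM, so the exact coordinates in the final patch may differ):

```xml
<!-- hadoop-hdfs-rbf/pom.xml: run JUnit4 tests on the JUnit5 platform -->
<dependency>
  <groupId>org.junit.vintage</groupId>
  <artifactId>junit-vintage-engine</artifactId>
  <scope>test</scope>
</dependency>
```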






[jira] [Created] (HDFS-17435) Fix TestRouterRpc#testClearStaleNamespacesInRouterStateIdContext() failed

2024-03-20 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-17435:
---

 Summary: Fix 
TestRouterRpc#testClearStaleNamespacesInRouterStateIdContext() failed
 Key: HDFS-17435
 URL: https://issues.apache.org/jira/browse/HDFS-17435
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Takanobu Asanuma


TestRouterRpc and TestRouterRpcMultiDestination are failing with the following 
error.
{noformat}
[ERROR] testProxyGetBlockKeys  Time elapsed: 0.573 s  <<< ERROR!
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
 User: jenkins is not allowed to impersonate jenkins
{noformat}
This is caused by testClearStaleNamespacesInRouterStateIdContext(), which was 
introduced by HDFS-17354.






[jira] [Created] (HDFS-17432) Fix junit dependency to enable JUnit4 tests to run in hadoop-hdfs-rbf

2024-03-18 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-17432:
---

 Summary: Fix junit dependency to enable JUnit4 tests to run in 
hadoop-hdfs-rbf
 Key: HDFS-17432
 URL: https://issues.apache.org/jira/browse/HDFS-17432
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


After HDFS-17370, JUnit4 tests stopped running in hadoop-hdfs-rbf. To enable 
both JUnit4 and JUnit5 tests to run, we need to add junit-vintage-engine to the 
hadoop-hdfs-rbf/pom.xml.






[jira] [Resolved] (HDFS-17361) DiskBalancer: Query command support with multiple nodes

2024-02-18 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17361.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> DiskBalancer: Query command support with multiple nodes
> ---
>
> Key: HDFS-17361
> URL: https://issues.apache.org/jira/browse/HDFS-17361
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, diskbalancer
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> As mentioned in https://issues.apache.org/jira/browse/HDFS-10821, the query 
> command will support multiple nodes.
> That means we can use the command hdfs diskbalancer -query to print the disk 
> balancer status of one or more datanodes.
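Based on the description above, usage would look like this (hostnames and the port are illustrative placeholders, not taken from the issue):

```
# Query disk balancer status on a single datanode
hdfs diskbalancer -query datanode1.example.com:9867

# With HDFS-17361, query several datanodes in one invocation (comma-separated)
hdfs diskbalancer -query datanode1.example.com:9867,datanode2.example.com:9867
```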






[jira] [Created] (HDFS-17370) Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf

2024-02-02 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-17370:
---

 Summary: Fix junit dependency for running parameterized tests in 
hadoop-hdfs-rbf
 Key: HDFS-17370
 URL: https://issues.apache.org/jira/browse/HDFS-17370
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


We need to add junit-jupiter-engine dependency for running parameterized tests 
in hadoop-hdfs-rbf.
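The missing dependency can be sketched as follows (the version is assumed to be managed by the Hadoop parent POM):

```xml
<!-- hadoop-hdfs-rbf/pom.xml: engine needed to execute JUnit5 parameterized tests -->
<dependency>
  <groupId>org.junit.jupiter</groupId>
  <artifactId>junit-jupiter-engine</artifactId>
  <scope>test</scope>
</dependency>
```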






[jira] [Resolved] (HDFS-17359) EC: recheck failed streamers should only after flushing all packets.

2024-02-01 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17359.
-
Fix Version/s: 3.3.9
   3.4.1
   3.5.0
   Resolution: Fixed

> EC: recheck failed streamers should only after flushing all packets.
> 
>
> Key: HDFS-17359
> URL: https://issues.apache.org/jira/browse/HDFS-17359
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ec
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.4.1, 3.5.0
>
>
> In method DFSStripedOutputStream#checkStreamerFailures, we have below codes:
> {code:java}
>     Set<StripedDataStreamer> newFailed = checkStreamers();
>     if (newFailed.size() == 0) {
>       return;
>     }
>     if (isNeedFlushAllPackets) {
>       // for healthy streamers, wait till all of them have fetched the new block
>       // and flushed out all the enqueued packets.
>       flushAllInternals();
>     }
>     // recheck failed streamers again after the flush
>     newFailed = checkStreamers();
> {code}
> We had better move the re-check inside the if block so that checkStreamers() is 
> not invoked a second time when no flush was performed.
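The suggested restructuring can be sketched as follows (an illustrative fragment of HDFS-internal code, not the final patch; it assumes the surrounding declarations of DFSStripedOutputStream#checkStreamerFailures):

```java
// Sketch: re-check failed streamers only when a flush actually happened,
// avoiding a redundant checkStreamers() call when no packets were flushed.
Set<StripedDataStreamer> newFailed = checkStreamers();
if (newFailed.size() == 0) {
  return;
}
if (isNeedFlushAllPackets) {
  // wait for healthy streamers to fetch the new block and flush packets
  flushAllInternals();
  // recheck failed streamers only after the flush
  newFailed = checkStreamers();
}
```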






[jira] [Reopened] (HDFS-17348) Enhance Log when checkLocations in RecoveryTaskStriped

2024-01-30 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma reopened HDFS-17348:
-

> Enhance Log when checkLocations in RecoveryTaskStriped
> --
>
> Key: HDFS-17348
> URL: https://issues.apache.org/jira/browse/HDFS-17348
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
>
> Enhance the IOException log message for easier debugging.






[jira] [Resolved] (HDFS-17348) Enhance Log when checkLocations in RecoveryTaskStriped

2024-01-30 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17348.
-
Resolution: Duplicate

I'd like to change the status to duplicate if HDFS-17358 fixes the issue.

> Enhance Log when checkLocations in RecoveryTaskStriped
> --
>
> Key: HDFS-17348
> URL: https://issues.apache.org/jira/browse/HDFS-17348
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
>
> Enhance the IOException log message for easier debugging.






[jira] [Created] (HDFS-17362) RBF: RouterObserverReadProxyProvider should use ConfiguredFailoverProxyProvider internally

2024-01-28 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-17362:
---

 Summary: RBF: RouterObserverReadProxyProvider should use 
ConfiguredFailoverProxyProvider internally
 Key: HDFS-17362
 URL: https://issues.apache.org/jira/browse/HDFS-17362
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


Currently, RouterObserverReadProxyProvider is using IPFailoverProxyProvider, 
while ObserverReadProxyProvider is using ConfiguredFailoverProxyProvider.  If 
we are to align RouterObserverReadProxyProvider with ObserverReadProxyProvider, 
RouterObserverReadProxyProvider should internally use 
ConfiguredFailoverProxyProvider.  Moreover, IPFailoverProxyProvider has an 
issue with resolving HA configurations. (For example, IPFailoverProxyProvider 
cannot resolve hdfs://router-service.)






[jira] [Resolved] (HDFS-17312) packetsReceived metric should ignore heartbeat packet

2024-01-11 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17312.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> packetsReceived metric should ignore heartbeat packet
> -
>
> Key: HDFS-17312
> URL: https://issues.apache.org/jira/browse/HDFS-17312
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.6
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> The packetsReceived metric should ignore heartbeat packets and only count data 
> packets and the last packet in a block.






[jira] [Resolved] (HDFS-17315) Optimize the namenode format code logic.

2024-01-05 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17315.
-
Fix Version/s: 3.4.0
   3.3.9
   Resolution: Fixed

> Optimize the namenode format code logic.
> 
>
> Key: HDFS-17315
> URL: https://issues.apache.org/jira/browse/HDFS-17315
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> 1. Some invalid code was deleted in 
> https://issues.apache.org/jira/browse/HDFS-17277, but there is still one line 
> of invalid code that has not been deleted.
> 2. Additionally, optimize the resource-closing logic by using 
> try-with-resources.
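The second point can be illustrated with a minimal, self-contained sketch (illustrative class names, not the actual NameNode format code): try-with-resources guarantees that close() runs even when the body returns early or throws.

```java
// Illustrative only: shows why try-with-resources simplifies resource closing.
class TrackedResource implements AutoCloseable {
    static boolean closed = false;
    @Override
    public void close() {
        closed = true; // runs automatically when the try block exits
    }
}

public class TryWithResourcesDemo {
    static boolean useResource() {
        try (TrackedResource r = new TrackedResource()) {
            return true; // close() still runs before this return completes
        }
    }

    public static void main(String[] args) {
        boolean ok = useResource();
        System.out.println("ok=" + ok + " closed=" + TrackedResource.closed);
    }
}
```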






[jira] [Resolved] (HDFS-17277) Delete invalid code logic in namenode format

2023-12-29 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17277.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Delete invalid code logic in namenode format
> 
>
> Key: HDFS-17277
> URL: https://issues.apache.org/jira/browse/HDFS-17277
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhangzhanchang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> There is invalid logic in the namenode format process.






[jira] [Resolved] (HDFS-17301) Add read and write dataXceiver threads count metrics to datanode.

2023-12-28 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17301.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Add read and write dataXceiver threads count metrics to datanode.
> -
>
> Key: HDFS-17301
> URL: https://issues.apache.org/jira/browse/HDFS-17301
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> # The DataNodeActiveXceiversCount metric contains the number of threads of all 
> Op types.
>  # In most cases, we focus more on the number of read and write dataXceiver 
> threads, so add read and write dataXceiver thread count metrics to the datanode.






[jira] [Resolved] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.

2023-12-28 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17297.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> The NameNode should remove block from the BlocksMap if the block is marked as 
> deleted.
> --
>
> Key: HDFS-17297
> URL: https://issues.apache.org/jira/browse/HDFS-17297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When call internalReleaseLease method:
> {code:java}
> boolean internalReleaseLease(
> ...
> int minLocationsNum = 1;
> if (lastBlock.isStriped()) {
>   minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
> }
> if (uc.getNumExpectedLocations() < minLocationsNum &&
> lastBlock.getNumBytes() == 0) {
>   // There is no datanode reported to this block.
>   // may be client have crashed before writing data to pipeline.
>   // This blocks doesn't need any recovery.
>   // We can remove this block and close the file.
>   pendingFile.removeLastBlock(lastBlock);
>   finalizeINodeFileUnderConstruction(src, pendingFile,
>   iip.getLatestSnapshotId(), false); 
> ...
> }
> {code}
>  if the condition `uc.getNumExpectedLocations() < minLocationsNum && 
> lastBlock.getNumBytes() == 0` is met during the execution of UNDER_RECOVERY 
> logic, the block is removed from the block list in the inode file and marked 
> as deleted. 
> However, it is not removed from the BlocksMap, which may cause a memory leak.
> Therefore it is necessary to remove the block from the BlocksMap at this 
> point as well.






[jira] [Resolved] (HDFS-17284) Fix int overflow in calculating numEcReplicatedTasks and numReplicationTasks during block recovery

2023-12-26 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17284.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Fix int overflow in calculating numEcReplicatedTasks and numReplicationTasks 
> during block recovery
> --
>
> Key: HDFS-17284
> URL: https://issues.apache.org/jira/browse/HDFS-17284
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Hualong Zhang
>Assignee: Hualong Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Fix int overflow in calculating numEcReplicatedTasks and numReplicationTasks 
> during block recovery
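The class of bug described here can be shown with a generic, self-contained sketch (illustrative names and values, not the actual NameNode computation): multiplying two ints overflows in int arithmetic before the result is widened, unless one operand is widened first.

```java
public class OverflowDemo {
    // Illustrative: a task budget computed as blocks * multiplier.
    static long budgetWrong(int blocks, int multiplier) {
        return blocks * multiplier; // overflows in int, then widens to long
    }

    static long budgetFixed(int blocks, int multiplier) {
        return (long) blocks * multiplier; // widen before multiplying
    }

    public static void main(String[] args) {
        int blocks = 1_000_000;
        int multiplier = 10_000;
        System.out.println("wrong=" + budgetWrong(blocks, multiplier));
        System.out.println("fixed=" + budgetFixed(blocks, multiplier));
    }
}
```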






[jira] [Resolved] (HDFS-17298) Fix NPE in DataNode.handleBadBlock and BlockSender

2023-12-25 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17298.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Fix NPE in DataNode.handleBadBlock and BlockSender
> --
>
> Key: HDFS-17298
> URL: https://issues.apache.org/jira/browse/HDFS-17298
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> There are some NPE issues on the DataNode side in our production environment.
> The detailed exception information is:
> {code:java}
> 2023-12-20 13:58:25,449 ERROR datanode.DataNode (DataXceiver.java:run(330)) 
> [DataXceiver for client DFSClient_NONMAPREDUCE_xxx at /xxx:41452 [Sending 
> block BP-xxx:blk_xxx]] - xxx:50010:DataXceiver error processing READ_BLOCK 
> operation  src: /xxx:41452 dst: /xxx:50010
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:301)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:607)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:298)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> NPE Code logic:
> {code:java}
> if (!fromScanner && blockScanner.isEnabled()) {
>   // data.getVolume(block) is null
>   blockScanner.markSuspectBlock(data.getVolume(block).getStorageID(),
>   block);
> } 
> {code}
> {code:java}
> 2023-12-20 13:52:18,844 ERROR datanode.DataNode (DataXceiver.java:run(330)) 
> [DataXceiver for client /xxx:61052 [Copying block BP-xxx:blk_xxx]] - 
> xxx:50010:DataXceiver error processing COPY_BLOCK operation  src: /xxx:61052 
> dst: /xxx:50010
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.handleBadBlock(DataNode.java:4045)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.copyBlock(DataXceiver.java:1163)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opCopyBlock(Receiver.java:291)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:113)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:298)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> NPE Code logic:
> {code:java}
> // Obtain a reference before reading data
> volumeRef = datanode.data.getVolume(block).obtainReference(); 
> //datanode.data.getVolume(block) is null  
> {code}
> We need to fix it.
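A minimal sketch of the kind of null guard that addresses the first trace (illustrative; it reuses the names from the quoted code and omits the surrounding DataNode logic, so it is not the actual patch):

```java
// Sketch: data.getVolume(block) can return null if the block's volume was
// removed concurrently, so check before dereferencing.
FsVolumeSpi volume = data.getVolume(block);
if (volume != null && !fromScanner && blockScanner.isEnabled()) {
  blockScanner.markSuspectBlock(volume.getStorageID(), block);
}
```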






[jira] [Resolved] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.

2023-12-18 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17294.
-
Resolution: Fixed

> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ---
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Resolved] (HDFS-17156) Client may receive old state ID which will lead to inconsistent reads

2023-08-17 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17156.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Client may receive old state ID which will lead to inconsistent reads
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Assignee: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF, pull-request-available
> Fix For: 3.4.0
>
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs:///user//.staging/job_XX/.tez/application_XX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error happens in the verifyAndCopy function inside FSDownload.java when the 
> NodeManager tries to download a file right after the file has been written to 
> HDFS. The write operation runs on the active namenode and the read operation 
> runs on the observer namenode, as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pb' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.






[jira] [Resolved] (HDFS-16967) RBF: File based state stores should allow concurrent access to the records

2023-04-04 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16967.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> RBF: File based state stores should allow concurrent access to the records
> --
>
> Key: HDFS-16967
> URL: https://issues.apache.org/jira/browse/HDFS-16967
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> File-based state store implementations (StateStoreFileImpl and 
> StateStoreFileSystemImpl) should allow updating as well as reading of the 
> state store records concurrently rather than serially. Concurrent access to 
> the record files on the HDFS-based store seems to improve the state 
> store cache loading performance by more than 10x.
> For instance, in order to maintain data integrity, when any mount table 
> record(s) is updated, the cache is reloaded. This reload operation seems to 
> be able to gain significant performance improvement by the concurrent access 
> of the mount table records.






[jira] [Resolved] (HDFS-16958) EC: Fix bug in processing EC excess redundancy

2023-03-27 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16958.
-
Resolution: Not A Problem

> EC: Fix bug in processing EC excess redundancy 
> ---
>
> Key: HDFS-16958
> URL: https://issues.apache.org/jira/browse/HDFS-16958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> When processing excess redundancy, the number of internal blocks is computed 
> by traversing `nonExcess`. This way is not accurate, because `nonExcess` 
> excludes replicas in abnormal states, such as corrupt ones, or maintenance 
> ones. `numOfTarget` may be smaller than the actual value, which will result 
> in an inaccurately generated `excessTypes`.






[jira] [Resolved] (HDFS-16903) Fix javadoc of Class LightWeightResizableGSet

2023-02-05 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16903.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Fix javadoc of Class LightWeightResizableGSet
> -
>
> Key: HDFS-16903
> URL: https://issues.apache.org/jira/browse/HDFS-16903
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Assignee: ZhangHB
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> After HDFS-16429 (Add DataSetLockManager to manage fine-grain locks for 
> FsDataSetImpl), the class LightWeightResizableGSet is thread-safe, so its 
> javadoc should be fixed accordingly.






[jira] [Resolved] (HDFS-16821) Fix regression in HDFS-13522 that enables observer reads by default.

2023-01-31 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16821.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Fix regression in HDFS-13522 that enables observer reads by default.
> 
>
> Key: HDFS-16821
> URL: https://issues.apache.org/jira/browse/HDFS-16821
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Serving reads consistently from Observer Namenodes is a feature that was 
> introduced in HDFS-12943.
> Clients opt into this feature by configuring the ObserverReadProxyProvider. 
> It is important that the opt-in is explicit because for third-party reads to 
> remain consistent, these clients then need to perform an msync before reads.
> In HDFS-13522, the ClientGSIContext is implicitly added to the DFSClient thus 
> enabling Observer reads for all clients by default. This breaks consistency 
> guarantees for clients that haven't opted into observer reads.
> [https://github.com/apache/hadoop/pull/4883/files#diff-a627e2c1f3e68235520d3c28092f4ae8a41aa4557cc530e4e6862c318be7e898R352-R354]
> We need to return to the old behavior of only using the ClientGSIContext when 
> users have explicitly opted into Observer reads.






[jira] [Resolved] (HDFS-16888) BlockManager#maxReplicationStreams, replicationStreamsHardLimit, blocksReplWorkMultiplier and PendingReconstructionBlocks#timeout should be volatile

2023-01-31 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16888.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> BlockManager#maxReplicationStreams, replicationStreamsHardLimit, 
> blocksReplWorkMultiplier and PendingReconstructionBlocks#timeout should be 
> volatile
> 
>
> Key: HDFS-16888
> URL: https://issues.apache.org/jira/browse/HDFS-16888
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> BlockManager#maxReplicationStreams, replicationStreamsHardLimit, 
> blocksReplWorkMultiplier and PendingReconstructionBlocks#timeout may be 
> written by NameNode#reconfReplicationParameters while being read by other 
> threads. 
> Thus they should be declared volatile to ensure "happens-before" 
> consistency.
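A minimal sketch of the pattern, assuming one writer thread (reconfiguration) and multiple reader threads; the field names mirror the report, but this is not the actual BlockManager code:

```java
// Reconfigurable limits written by one thread and read by others should be
// declared volatile so the write "happens-before" subsequent reads.
public class ReplicationLimits {
    private volatile int maxReplicationStreams = 2;
    private volatile int replicationStreamsHardLimit = 4;

    // Called from the reconfiguration thread.
    void reconfigure(int soft, int hard) {
        maxReplicationStreams = soft;
        replicationStreamsHardLimit = hard;
    }

    // Called from block management threads; volatile guarantees they see
    // the reconfigured values without any extra locking.
    int getMaxReplicationStreams() { return maxReplicationStreams; }
    int getReplicationStreamsHardLimit() { return replicationStreamsHardLimit; }

    public static void main(String[] args) {
        ReplicationLimits limits = new ReplicationLimits();
        limits.reconfigure(10, 20);
        System.out.println(limits.getMaxReplicationStreams()); // 10
    }
}
```

Without volatile (or a lock), the Java memory model allows reader threads to keep seeing stale values indefinitely after a reconfiguration.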






[jira] [Resolved] (HDFS-16876) Garbage collect map entries in shared RouterStateIdContext using information from namenodeResolver instead of the map of active connectionPools.

2023-01-23 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16876.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Garbage collect map entries in shared RouterStateIdContext using information 
> from namenodeResolver instead of the map of active connectionPools.
> 
>
> Key: HDFS-16876
> URL: https://issues.apache.org/jira/browse/HDFS-16876
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> An element in RouterStateIdContext#namespaceIdMap is deleted when there is no 
> connectionPool referencing the namespace. This is done by a thread in 
> ConnectionManager that cleans up stale connectionPools. I propose a less 
> aggressive approach, that is, cleaning up an entry when the router cannot 
> resolve a namenode belonging to the namespace.
> Some benefits of this approach are:
>  * Even when there are no active connections, the router still tracks a 
> recent state of the namenode. This will be beneficial for debugging.
>  * Simpler lifecycle for the map entries. The entries are long-lived.
>  * Fewer operations under the writeLock in ConnectionManager.






[jira] [Created] (HDFS-16889) Backport JIRAs related to RBF SBN to branch-3.3

2023-01-12 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-16889:
---

 Summary: Backport JIRAs related to RBF SBN to branch-3.3
 Key: HDFS-16889
 URL: https://issues.apache.org/jira/browse/HDFS-16889
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


This is an umbrella JIRA to backport RBF SBN to branch-3.3. There are some 
conflicts when trying to backport HDFS-13522 and HDFS-16767, the main 
implementations of RBF SBN. Currently, to solve the conflicts, we need to 
backport the following JIRAs sequentially. (Thanks [~simbadzina] for the 
information.)
 # HDFS-14090
 # HDFS-15417
 # HDFS-16296
 # HDFS-16302
 # HDFS-15757
 # HDFS-13274
 # Then HDFS-13522
 # HDFS-16065
 # HDFS-16313
 # HDFS-16273
 # Then HDFS-16767 + other bug fixes.






[jira] [Resolved] (HDFS-16809) EC striped block is not sufficient when doing in maintenance

2022-12-05 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16809.
-
Fix Version/s: 3.4.0
   3.3.9
   Resolution: Fixed

> EC striped block is not sufficient when doing in maintenance
> 
>
> Key: HDFS-16809
> URL: https://issues.apache.org/jira/browse/HDFS-16809
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, erasure-coding
>Reporter: dingshun
>Assignee: dingshun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> When doing maintenance, the EC striped block is not sufficiently 
> replicated, which can lead to missing blocks.






[jira] [Resolved] (HDFS-16833) NameNode should log internal EC blocks instead of the EC block group when it receives block reports

2022-11-06 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16833.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> NameNode should log internal EC blocks instead of the EC block group when it 
> receives block reports
> ---
>
> Key: HDFS-16833
> URL: https://issues.apache.org/jira/browse/HDFS-16833
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When creating an EC file, NN only logs the EC block group for each of the 
> internal EC blocks. 
> {noformat}
> // replica file
> 2022-11-04 10:38:20,124 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
> 127.0.0.1:11007 is added to blk_1073741825_1001 (size=1024)
> 2022-11-04 10:38:20,126 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
> 127.0.0.1:11004 is added to blk_1073741825_1001 (size=1024)
> 2022-11-04 10:38:20,126 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
> 127.0.0.1:11001 is added to blk_1073741825_1001 (size=1024)
> // ec file
> 2022-11-04 10:39:02,376 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
> 127.0.0.1:11008 is added to blk_-9223372036854775792_1002 (size=0)
> 2022-11-04 10:39:02,381 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
> 127.0.0.1:11000 is added to blk_-9223372036854775792_1002 (size=0)
> 2022-11-04 10:39:02,383 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
> 127.0.0.1:11001 is added to blk_-9223372036854775792_1002 (size=0)
> 2022-11-04 10:39:02,385 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
> 127.0.0.1:11007 is added to blk_-9223372036854775792_1002 (size=0)
> 2022-11-04 10:39:02,387 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
> 127.0.0.1:11009 is added to blk_-9223372036854775792_1002 (size=0)
> 2022-11-04 10:39:02,389 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
> 127.0.0.1:11004 is added to blk_-9223372036854775792_1002 (size=0)
> 2022-11-04 10:39:02,390 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
> 127.0.0.1:11006 is added to blk_-9223372036854775792_1002 (size=0)
> 2022-11-04 10:39:02,393 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
> 127.0.0.1:11002 is added to blk_-9223372036854775792_1002 (size=0)
> 2022-11-04 10:39:02,395 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
> 127.0.0.1:11003 is added to blk_-9223372036854775792_1002 (size=0)
> {noformat}






[jira] [Created] (HDFS-16833) NameNode should log internal EC blocks instead of the EC block group when it receives block reports

2022-11-03 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-16833:
---

 Summary: NameNode should log internal EC blocks instead of the EC 
block group when it receives block reports
 Key: HDFS-16833
 URL: https://issues.apache.org/jira/browse/HDFS-16833
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


When creating an EC file, NN only logs the EC block group for each of the 
internal EC blocks. 
{noformat}
// replica file
2022-11-04 10:38:20,124 [Block report processor] INFO  BlockStateChange 
(BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
127.0.0.1:11007 is added to blk_1073741825_1001 (size=1024)
2022-11-04 10:38:20,126 [Block report processor] INFO  BlockStateChange 
(BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
127.0.0.1:11004 is added to blk_1073741825_1001 (size=1024)
2022-11-04 10:38:20,126 [Block report processor] INFO  BlockStateChange 
(BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
127.0.0.1:11001 is added to blk_1073741825_1001 (size=1024)

// ec file
2022-11-04 10:39:02,376 [Block report processor] INFO  BlockStateChange 
(BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
127.0.0.1:11008 is added to blk_-9223372036854775792_1002 (size=0)
2022-11-04 10:39:02,381 [Block report processor] INFO  BlockStateChange 
(BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
127.0.0.1:11000 is added to blk_-9223372036854775792_1002 (size=0)
2022-11-04 10:39:02,383 [Block report processor] INFO  BlockStateChange 
(BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
127.0.0.1:11001 is added to blk_-9223372036854775792_1002 (size=0)
2022-11-04 10:39:02,385 [Block report processor] INFO  BlockStateChange 
(BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
127.0.0.1:11007 is added to blk_-9223372036854775792_1002 (size=0)
2022-11-04 10:39:02,387 [Block report processor] INFO  BlockStateChange 
(BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
127.0.0.1:11009 is added to blk_-9223372036854775792_1002 (size=0)
2022-11-04 10:39:02,389 [Block report processor] INFO  BlockStateChange 
(BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
127.0.0.1:11004 is added to blk_-9223372036854775792_1002 (size=0)
2022-11-04 10:39:02,390 [Block report processor] INFO  BlockStateChange 
(BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
127.0.0.1:11006 is added to blk_-9223372036854775792_1002 (size=0)
2022-11-04 10:39:02,393 [Block report processor] INFO  BlockStateChange 
(BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
127.0.0.1:11002 is added to blk_-9223372036854775792_1002 (size=0)
2022-11-04 10:39:02,395 [Block report processor] INFO  BlockStateChange 
(BlockManager.java:addStoredBlock(3633)) - BLOCK* addStoredBlock: 
127.0.0.1:11003 is added to blk_-9223372036854775792_1002 (size=0)
{noformat}






[jira] [Resolved] (HDFS-16822) HostRestrictingAuthorizationFilter should pass through requests if they don't access WebHDFS API

2022-10-26 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16822.
-
Fix Version/s: 3.4.0
   3.3.9
   Resolution: Fixed

> HostRestrictingAuthorizationFilter should pass through requests if they don't 
> access WebHDFS API
> 
>
> Key: HDFS-16822
> URL: https://issues.apache.org/jira/browse/HDFS-16822
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> After HDFS-15320, HostRestrictingAuthorizationFilter returns a 404 error if 
> the request doesn't access the WebHDFS API.
> With this change, endpoints such as /conf and /jmx are no longer 
> visible, which is very inconvenient for administrators.
> HostRestrictingAuthorizationFilter should pass through requests that don't 
> access the WebHDFS API.
> This issue is reported by [~hadachi].
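A hedged sketch of the pass-through rule: only paths under the WebHDFS prefix are subject to the host-restricting check, while endpoints like /conf and /jmx fall through. The prefix follows the usual WebHDFS URL convention; the class and method names are illustrative, not the actual filter code:

```java
// Decide whether a request should bypass the host-restricting check.
public class WebHdfsPassThrough {
    // Conventional WebHDFS REST prefix (assumption for this sketch).
    static final String WEBHDFS_PREFIX = "/webhdfs/v1";

    // true => skip the host-restricting check and pass the request along.
    static boolean passesThrough(String uriPath) {
        return uriPath == null || !uriPath.startsWith(WEBHDFS_PREFIX);
    }

    public static void main(String[] args) {
        System.out.println(passesThrough("/conf"));           // true
        System.out.println(passesThrough("/jmx"));            // true
        System.out.println(passesThrough("/webhdfs/v1/tmp")); // false
    }
}
```

Returning a 404 for /conf or /jmx (instead of passing them through) is exactly the inconvenience described above; with this rule only WebHDFS traffic is filtered.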






[jira] [Created] (HDFS-16822) HostRestrictingAuthorizationFilter should pass through requests if they don't access WebHDFS API

2022-10-25 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-16822:
---

 Summary: HostRestrictingAuthorizationFilter should pass through 
requests if they don't access WebHDFS API
 Key: HDFS-16822
 URL: https://issues.apache.org/jira/browse/HDFS-16822
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


After HDFS-15320, HostRestrictingAuthorizationFilter returns a 404 error if it 
receives a request that doesn't access the WebHDFS API.
With this change, endpoints such as /conf and /jmx are no longer visible. 
This is very inconvenient for administrators.
HostRestrictingAuthorizationFilter should pass through requests that don't 
access the WebHDFS API.






[jira] [Resolved] (HDFS-16776) Erasure Coding: The length of targets should be checked when DN gets a reconstruction task

2022-09-22 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16776.
-
Fix Version/s: 3.4.0
   3.3.9
   3.2.5
   Resolution: Fixed

> Erasure Coding: The length of targets should be checked when DN gets a 
> reconstruction task
> --
>
> Key: HDFS-16776
> URL: https://issues.apache.org/jira/browse/HDFS-16776
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kidd5368
>Assignee: Kidd5368
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9, 3.2.5
>
>
> The length of targets should be checked when DN gets an EC reconstruction 
> task. For some reasons (HDFS-14768, HDFS-16739), the length of targets can be 
> larger than additionalReplRequired, which causes some elements in targets to 
> get the default value 0. It may trigger the bug that leads to data 
> corruption, just like HDFS-14768.






[jira] [Resolved] (HDFS-16579) Fix build failure for TestBlockManager on branch-3.2

2022-05-15 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16579.
-
Fix Version/s: 3.2.4
   Resolution: Fixed

> Fix build failure for TestBlockManager on branch-3.2
> 
>
> Key: HDFS-16579
> URL: https://issues.apache.org/jira/browse/HDFS-16579
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.4
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Fix build failure for TestBlockManager on branch-3.2. See HDFS-16552.






[jira] [Resolved] (HDFS-16519) Add throttler to EC reconstruction

2022-04-22 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16519.
-
Fix Version/s: 3.4.0
   3.3.4
   Resolution: Fixed

> Add throttler to EC reconstruction
> --
>
> Key: HDFS-16519
> URL: https://issues.apache.org/jira/browse/HDFS-16519
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, ec
>Affects Versions: 3.3.1, 3.3.2
>Reporter: daimin
>Assignee: daimin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.4
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> HDFS already has throttlers for data transfer (replication) and the balancer; 
> these throttlers reduce the impact of background procedures on user 
> read/write.
> We should add a throttler to EC background reconstruction too.
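A simplified stand-in for such a throttler (not Hadoop's actual DataTransferThrottler): callers charge bytes against a per-period budget and learn how many whole periods they should wait once the budget is exhausted:

```java
// Minimal bandwidth-throttler sketch for background work such as EC
// reconstruction. Real throttlers also sleep and roll the period over
// with wall-clock time; this version only computes the wait.
public class ReconstructionThrottler {
    private final long bytesPerPeriod; // byte budget for one period
    private long usedInPeriod;         // bytes charged so far

    ReconstructionThrottler(long bytesPerPeriod) {
        this.bytesPerPeriod = bytesPerPeriod;
    }

    // Charge `bytes` against the budget. Returns 0 if they fit in the
    // current period, otherwise the number of whole periods to wait.
    long acquire(long bytes) {
        usedInPeriod += bytes;
        if (usedInPeriod <= bytesPerPeriod) {
            return 0;
        }
        return (usedInPeriod - 1) / bytesPerPeriod;
    }

    public static void main(String[] args) {
        ReconstructionThrottler t = new ReconstructionThrottler(100);
        System.out.println(t.acquire(50)); // 0: within budget
        System.out.println(t.acquire(60)); // 1: budget exceeded, wait 1 period
    }
}
```

Keeping reconstruction behind a budget like this bounds how much disk and network bandwidth the background repair can steal from foreground reads and writes.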






[jira] [Resolved] (HDFS-16552) Fix NPE for TestBlockManager

2022-04-22 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16552.
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.4
   Resolution: Fixed

> Fix NPE for TestBlockManager
> 
>
> Key: HDFS-16552
> URL: https://issues.apache.org/jira/browse/HDFS-16552
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There is an NPE in BlockManager when running 
> TestBlockManager#testSkipReconstructionWithManyBusyNodes2, because 
> NameNodeMetrics is not initialized in this unit test.
>  
> Related ci link, see 
> [this|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4209/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt].
> {code:java}
> [ERROR] Tests run: 34, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 30.088 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager
> [ERROR] 
> testSkipReconstructionWithManyBusyNodes2(org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager)
>   Time elapsed: 2.783 s  <<< ERROR!
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.scheduleReconstruction(BlockManager.java:2171)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.testSkipReconstructionWithManyBusyNodes2(TestBlockManager.java:947)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) 
> {code}
>  
>  






[jira] [Resolved] (HDFS-16510) Fix EC decommission when rack is not enough

2022-04-22 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16510.
-
Resolution: Duplicate

I'm closing this jira as it is superseded by HDFS-16456. Thanks for trying to 
solve this issue, [~cndaimin].

> Fix EC decommission when rack is not enough
> ---
>
> Key: HDFS-16510
> URL: https://issues.apache.org/jira/browse/HDFS-16510
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: block placement, ec
>Affects Versions: 3.3.1, 3.3.2
>Reporter: daimin
>Assignee: daimin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Decommissioning always fails when we decommission multiple nodes on a 
> cluster that doesn't have enough racks, for example a cluster with 6 racks 
> deploying RS-6-3.
> We find that those decommissioning nodes cover at least one rack, so it is 
> actually like decommissioning one or more racks. Rack decommission is not 
> well supported currently, especially for clusters that don't already have 
> enough racks.






[jira] [Resolved] (HDFS-16544) EC decoding failed due to invalid buffer

2022-04-20 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16544.
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.4
 Assignee: qinyuren
   Resolution: Fixed

> EC decoding failed due to invalid buffer
> 
>
> Key: HDFS-16544
> URL: https://issues.apache.org/jira/browse/HDFS-16544
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In [HDFS-16538|https://issues.apache.org/jira/browse/HDFS-16538], we 
> found an EC file decoding bug when more than one data block read failed. 
> Now we have found another bug triggered by #StatefulStripeReader.decode.
> If we read an EC file whose {*}length is more than one stripe{*}, and this 
> file has *one data block* and *the first parity block* corrupted, this error 
> will happen.
> {code:java}
> org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer found, not 
> allowing null    at 
> org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:132)
>     at 
> org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:48)
>     at 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
>     at 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
>     at 
> org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:435)
>     at 
> org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
>     at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:392)
>     at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315)
>     at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:408)
>     at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:918) 
> {code}
>  
> Let's say we use ec(6+3) and the data block[0] and the first parity block[6] 
> are corrupted.
>  # The readers for block[0] and block[6] will be closed after reading the 
> first stripe of an EC file;
>  # When the client reads the second stripe of the EC file, it will trigger 
> #prepareParityChunk for block[6]. 
>  # The decodeInputs[6] will not be constructed because the reader for 
> block[6] was closed.
>  
> {code:java}
> boolean prepareParityChunk(int index) {
>   Preconditions.checkState(index >= dataBlkNum
>   && alignedStripe.chunks[index] == null);
>   if (readerInfos[index] != null && readerInfos[index].shouldSkip) {
> alignedStripe.chunks[index] = new StripingChunk(StripingChunk.MISSING);
> // we have failed the block reader before
> return false;
>   }
>   final int parityIndex = index - dataBlkNum;
>   ByteBuffer buf = dfsStripedInputStream.getParityBuffer().duplicate();
>   buf.position(cellSize * parityIndex);
>   buf.limit(cellSize * parityIndex + (int) alignedStripe.range.spanInBlock);
>   decodeInputs[index] =
>   new ECChunk(buf.slice(), 0, (int) alignedStripe.range.spanInBlock);
>   alignedStripe.chunks[index] =
>   new StripingChunk(decodeInputs[index].getBuffer());
>   return true;
> } {code}
>  






[jira] [Resolved] (HDFS-16538) EC decoding failed due to not enough valid inputs

2022-04-18 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16538.
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.4
 Assignee: qinyuren
   Resolution: Fixed

>  EC decoding failed due to not enough valid inputs
> --
>
> Key: HDFS-16538
> URL: https://issues.apache.org/jira/browse/HDFS-16538
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, we found this error when #StripeReader.readStripe() has more 
> than one failed block read.
> We use the EC policy ec(6+3) in our cluster.
> {code:java}
> Caused by: org.apache.hadoop.HadoopIllegalArgumentException: No enough valid 
> inputs are provided, not recoverable
>         at 
> org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkInputBuffers(ByteBufferDecodingState.java:119)
>         at 
> org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:47)
>         at 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
>         at 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
>         at 
> org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:462)
>         at 
> org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
>         at 
> org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:406)
>         at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:327)
>         at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:420)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:892)
>         at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
>         at java.base/java.io.DataInputStream.read(DataInputStream.java:149) 
> {code}
>  
> {code:java}
> while (!futures.isEmpty()) {
>   try {
> StripingChunkReadResult r = StripedBlockUtil
> .getNextCompletedStripedRead(service, futures, 0);
> dfsStripedInputStream.updateReadStats(r.getReadStats());
> DFSClient.LOG.debug("Read task returned: {}, for stripe {}",
> r, alignedStripe);
> StripingChunk returnedChunk = alignedStripe.chunks[r.index];
> Preconditions.checkNotNull(returnedChunk);
> Preconditions.checkState(returnedChunk.state == StripingChunk.PENDING);
> if (r.state == StripingChunkReadResult.SUCCESSFUL) {
>   returnedChunk.state = StripingChunk.FETCHED;
>   alignedStripe.fetchedChunksNum++;
>   updateState4SuccessRead(r);
>   if (alignedStripe.fetchedChunksNum == dataBlkNum) {
> clearFutures();
> break;
>   }
> } else {
>   returnedChunk.state = StripingChunk.MISSING;
>   // close the corresponding reader
>   dfsStripedInputStream.closeReader(readerInfos[r.index]);
>   final int missing = alignedStripe.missingChunksNum;
>   alignedStripe.missingChunksNum++;
>   checkMissingBlocks();
>   readDataForDecoding();
>   readParityChunks(alignedStripe.missingChunksNum - missing);
> } {code}
> This error can be triggered by #StatefulStripeReader.decode.
> The reason is that:
>  # If more than one *data block* read fails, #readDataForDecoding will be 
> called multiple times;
>  # The *decodeInputs array* will be initialized repeatedly;
>  # The *parity data* in the *decodeInputs array*, which was filled by 
> #readParityChunks previously, will be set to null.
>  
>  
>  
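The failure mode described above can be reduced to a toy model (hypothetical names, RS(6,3) layout assumed; this is not the actual HDFS code): re-allocating the shared decodeInputs array on every failed data-block read discards parity entries filled earlier, while a guarded initialization preserves them.

```java
public class DecodeInputsSketch {
    static final int DATA = 6, PARITY = 3;    // models an RS(6,3) stripe

    // Buggy behavior: every failed data-block read re-allocates the array,
    // dropping parity entries filled earlier by readParityChunks().
    static String[] initBuggy(String[] ignoredPrevious) {
        return new String[DATA + PARITY];
    }

    // Fixed behavior: allocate once, keep existing contents on later calls.
    static String[] initFixed(String[] previous) {
        return previous != null ? previous : new String[DATA + PARITY];
    }

    public static void main(String[] args) {
        String[] inputs = initBuggy(null);
        inputs[DATA] = "parity0";             // parity chunk read into slot 6
        inputs = initBuggy(inputs);           // second failure re-initializes
        System.out.println("buggy keeps parity: " + (inputs[DATA] != null));

        inputs = initFixed(null);
        inputs[DATA] = "parity0";
        inputs = initFixed(inputs);
        System.out.println("fixed keeps parity: " + (inputs[DATA] != null));
    }
}
```

With too few surviving inputs, the decoder's checkInputBuffers then fails exactly as in the stack trace above.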



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16479) EC: NameNode should not send a reconstruction work when the source datanodes are insufficient

2022-04-13 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16479.
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.4
 Assignee: Takanobu Asanuma
   Resolution: Fixed

Resolved. Thanks for reporting the issue, [~yuanbo].

> EC: NameNode should not send a reconstruction work when the source datanodes 
> are insufficient
> -
>
> Key: HDFS-16479
> URL: https://issues.apache.org/jira/browse/HDFS-16479
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, erasure-coding
>Reporter: Yuanbo Liu
>Assignee: Takanobu Asanuma
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> We got this exception from DataNodes
> {color:#707070}java.lang.IllegalArgumentException: No enough live striped 
> blocks.{color}
> {color:#707070}        at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:141){color}
> {color:#707070}        at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.<init>(StripedReader.java:128){color}
> {color:#707070}        at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReconstructor.<init>(StripedReconstructor.java:135){color}
> {color:#707070}        at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.<init>(StripedBlockReconstructor.java:41){color}
> {color:#707070}        at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker.processErasureCodingTasks(ErasureCodingWorker.java:133){color}
> {color:#707070}        at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:796){color}
> {color:#707070}        at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:680){color}
> {color:#707070}        at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1314){color}
> {color:#707070}        at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1360){color}
> {color:#707070}        at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1287){color}
> After going through the code of ErasureCodingWork.java, we found
> {code:java}
> targets[0].getDatanodeDescriptor().addBlockToBeErasureCoded( new 
> ExtendedBlock(blockPoolId, stripedBlk), getSrcNodes(), targets, 
> getLiveBlockIndicies(), stripedBlk.getErasureCodingPolicy()); 
> {code}
>  
> the liveBusyBlockIndicies is not counted as part of liveBlockIndicies, so 
> erasure coding reconstruction sometimes fails with 'No enough live striped 
> blocks'.
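A minimal sketch of the precondition involved (hypothetical names and simplified logic; the real fix lives in the NameNode's reconstruction scheduling): when busy indices are dropped from the live-index list, the scheduler must re-check that enough sources remain before dispatching work, otherwise the DataNode-side check throws exactly as shown in the stack trace above.

```java
import java.util.List;

public class ReconstructionCheckSketch {
    static final int DATA_BLK_NUM = 6;    // RS(6,3) data units

    // Mirrors the DataNode-side precondition that threw in the report above.
    static void startReconstruction(List<Integer> liveIndices) {
        if (liveIndices.size() < DATA_BLK_NUM) {
            throw new IllegalArgumentException("No enough live striped blocks.");
        }
    }

    // NameNode-side guard: skip scheduling when sources are insufficient,
    // e.g. because busy indices were excluded from the live list.
    static boolean shouldSchedule(List<Integer> liveIndices) {
        return liveIndices.size() >= DATA_BLK_NUM;
    }

    public static void main(String[] args) {
        List<Integer> live = List.of(0, 1, 2, 4, 7);   // only 5 live sources
        System.out.println("schedule reconstruction? " + shouldSchedule(live));
    }
}
```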






[jira] [Resolved] (HDFS-16484) [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread

2022-04-12 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16484.
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.4
   Resolution: Fixed

> [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread 
> -
>
> Key: HDFS-16484
> URL: https://issues.apache.org/jira/browse/HDFS-16484
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
> Attachments: image-2022-02-25-14-35-42-255.png
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently, we ran SPS in our cluster and found this log. The 
> SPSPathIdProcessor thread enters an infinite loop and prints the same log all 
> the time.
> !image-2022-02-25-14-35-42-255.png|width=682,height=195!
> In the SPSPathIdProcessor thread, if it gets an inodeId whose path does not 
> exist, the thread enters an infinite loop and can't work normally.
> The reason is that #ctxt.getNextSPSPath() returns an inodeId whose path does 
> not exist. The inodeId is never set back to null, so the thread holds this 
> inodeId forever.
> {code:java}
> public void run() {
>   LOG.info("Starting SPSPathIdProcessor!.");
>   Long startINode = null;
>   while (ctxt.isRunning()) {
> try {
>   if (!ctxt.isInSafeMode()) {
> if (startINode == null) {
>   startINode = ctxt.getNextSPSPath();
> } // else same id will be retried
> if (startINode == null) {
>   // Waiting for SPS path
>   Thread.sleep(3000);
> } else {
>   ctxt.scanAndCollectFiles(startINode);
>   // check if directory was empty and no child added to queue
>   DirPendingWorkInfo dirPendingWorkInfo =
>   pendingWorkForDirectory.get(startINode);
>   if (dirPendingWorkInfo != null
>   && dirPendingWorkInfo.isDirWorkDone()) {
> ctxt.removeSPSHint(startINode);
> pendingWorkForDirectory.remove(startINode);
>   }
> }
> startINode = null; // Current inode successfully scanned.
>   }
> } catch (Throwable t) {
>   String reClass = t.getClass().getName();
>   if (InterruptedException.class.getName().equals(reClass)) {
> LOG.info("SPSPathIdProcessor thread is interrupted. Stopping..");
> break;
>   }
>   LOG.warn("Exception while scanning file inodes to satisfy the policy",
>   t);
>   try {
> Thread.sleep(3000);
>   } catch (InterruptedException e) {
> LOG.info("Interrupted while waiting in SPSPathIdProcessor", t);
> break;
>   }
> }
>   }
> } {code}
>  
>  
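The hazard in the loop above can be reduced to a toy model (hypothetical names and retry bound; the actual fix may differ): if startINode is only cleared on success, an inode whose path always fails is retried forever, while a bounded retry terminates.

```java
public class SpsRetrySketch {
    // Simulates scanAndCollectFiles() on an inode whose path no longer exists.
    static void scan(long inodeId) throws Exception {
        throw new Exception("path for inode " + inodeId + " does not exist");
    }

    // Returns how many scans were attempted before the loop gave up.
    static int runLoop(long inodeId, int maxRetries) {
        Long startINode = inodeId;
        int retries = 0, attempts = 0;
        while (startINode != null) {
            try {
                attempts++;
                scan(startINode);
                startINode = null;        // success path clears the id
            } catch (Exception e) {
                if (++retries >= maxRetries) {
                    startINode = null;    // give up instead of looping forever
                }
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Without the bound, this loop would never terminate for a dead path.
        System.out.println("scans attempted: " + runLoop(42L, 3));
    }
}
```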






[jira] [Resolved] (HDFS-14969) Fix HDFS client unnecessary failover log printing

2022-04-11 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-14969.
-
Resolution: Duplicate

This issue seems to be resolved by HADOOP-17116.

> Fix HDFS client unnecessary failover log printing
> -
>
> Key: HDFS-14969
> URL: https://issues.apache.org/jira/browse/HDFS-14969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.1.3
>Reporter: Xudong Cao
>Assignee: Xudong Cao
>Priority: Minor
>  Labels: multi-sbnn
>
> In a multi-NameNode scenario, suppose there are 3 NNs and the 3rd is the ANN, 
> and a client starts RPC with the 1st NN. It is silent when failing over from 
> the 1st NN to the 2nd NN, but when failing over from the 2nd NN to the 3rd 
> NN it prints some unnecessary logs; in some scenarios these logs are very 
> numerous:
> {code:java}
> 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby. Visit 
> https://s.apache.org/sbnn-error
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459)
>  ...{code}






[jira] [Resolved] (HDFS-16427) Add debug log for BlockManager#chooseExcessRedundancyStriped

2022-04-08 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16427.
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.3
   Resolution: Fixed

Sorry, I forgot to close this jira.

> Add debug log for BlockManager#chooseExcessRedundancyStriped
> 
>
> Key: HDFS-16427
> URL: https://issues.apache.org/jira/browse/HDFS-16427
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> To solve [HDFS-16420|https://issues.apache.org/jira/browse/HDFS-16420], we 
> added some debug logs, which proved necessary. We set the log level to DEBUG 
> so that, if other problems occur, it is convenient to analyze them.






[jira] [Resolved] (HDFS-16457) Make fs.getspaceused.classname reconfigurable

2022-04-07 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16457.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Make fs.getspaceused.classname reconfigurable
> -
>
> Key: HDFS-16457
> URL: https://issues.apache.org/jira/browse/HDFS-16457
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: yanbin.zhang
>Assignee: yanbin.zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Currently, if we want to switch fs.getspaceused.classname, we need to restart 
> the NameNode. It would be convenient if we could switch it at runtime.






[jira] [Resolved] (HDFS-16413) Reconfig dfs usage parameters for datanode

2022-03-30 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16413.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Reconfig dfs usage parameters for datanode
> --
>
> Key: HDFS-16413
> URL: https://issues.apache.org/jira/browse/HDFS-16413
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Reconfig dfs usage parameters for datanode.






[jira] [Resolved] (HDFS-16434) Add opname to read/write lock for remaining operations

2022-03-25 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16434.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Add opname to read/write lock for remaining operations
> --
>
> Key: HDFS-16434
> URL: https://issues.apache.org/jira/browse/HDFS-16434
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> In [HDFS-10872|https://issues.apache.org/jira/browse/HDFS-10872], we added 
> opname to read and write locks. However, there are still many operations 
> that have not been covered. When analyzing operations that hold locks for a 
> long time, we can only find the specific methods through stack traces. I 
> suggest covering these remaining operations to facilitate later performance 
> optimization.






[jira] [Resolved] (HDFS-16501) Print the exception when reporting a bad block

2022-03-23 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16501.
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.3
 Assignee: qinyuren
   Resolution: Fixed

> Print the exception when reporting a bad block
> --
>
> Key: HDFS-16501
> URL: https://issues.apache.org/jira/browse/HDFS-16501
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: image-2022-03-10-19-27-31-622.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> !image-2022-03-10-19-27-31-622.png|width=847,height=27!
> Currently, the volumeScanner finds a bad block and reports it to the namenode 
> without printing the reason why the block is bad. We should print the 
> exception in the log file.






[jira] [Resolved] (HDFS-16397) Reconfig slow disk parameters for datanode

2022-02-24 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16397.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Merged to trunk. I will try to backport it into branch-3.3 later.

> Reconfig slow disk parameters for datanode
> --
>
> Key: HDFS-16397
> URL: https://issues.apache.org/jira/browse/HDFS-16397
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> In large clusters, a rolling restart of datanodes takes a long time. We can 
> make the slow peer and slow disk parameters in the datanode reconfigurable 
> to facilitate cluster operation and maintenance.






[jira] [Resolved] (HDFS-16461) Expose JournalNode storage info in the jmx metrics

2022-02-21 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16461.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Merged to trunk. I didn't backport it into lower branches since it may break 
compatibility. See: HDFS-16027

> Expose JournalNode storage info in the jmx metrics
> --
>
> Key: HDFS-16461
> URL: https://issues.apache.org/jira/browse/HDFS-16461
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We should expose the list of storage info of JournalNode's journals 
> (including layout version, namespace id, cluster id and creation time of the 
> FS state) in the jmx metrics.






[jira] [Resolved] (HDFS-16396) Reconfig slow peer parameters for datanode

2022-02-14 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16396.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Reconfig slow peer parameters for datanode
> --
>
> Key: HDFS-16396
> URL: https://issues.apache.org/jira/browse/HDFS-16396
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> In large clusters, a rolling restart of datanodes takes a long time. We can 
> make the slow peer and slow disk parameters in the datanode reconfigurable 
> to facilitate cluster operation and maintenance.






[jira] [Resolved] (HDFS-16398) Reconfig block report parameters for datanode

2022-01-26 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16398.
-
Fix Version/s: 3.4.0
   3.3.3
   Resolution: Fixed

> Reconfig block report parameters for datanode
> -
>
> Key: HDFS-16398
> URL: https://issues.apache.org/jira/browse/HDFS-16398
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>







[jira] [Resolved] (HDFS-16423) balancer should not get blocks on stale storages

2022-01-19 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16423.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> balancer should not get blocks on stale storages
> 
>
> Key: HDFS-16423
> URL: https://issues.apache.org/jira/browse/HDFS-16423
> Project: Hadoop HDFS
>  Issue Type: Improvement
> Components: balancer & mover
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2022-01-13-17-18-32-409.png
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> We have met a problem as described in HDFS-16420.
> We found that the balancer copied a block multiple times without deleting the 
> source block when the block was placed in a stale storage. This resulted in a 
> block with many copies, and these redundant copies are not deleted until the 
> storage is no longer stale.
>  
> !image-2022-01-13-17-18-32-409.png|width=657,height=275!






[jira] [Resolved] (HDFS-16399) Reconfig cache report parameters for datanode

2022-01-18 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16399.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Reconfig cache report parameters for datanode
> -
>
> Key: HDFS-16399
> URL: https://issues.apache.org/jira/browse/HDFS-16399
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>







[jira] [Resolved] (HDFS-16426) fix nextBlockReportTime when trigger full block report force

2022-01-18 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16426.
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.3
   Resolution: Fixed

> fix nextBlockReportTime when trigger full block report force
> 
>
> Key: HDFS-16426
> URL: https://issues.apache.org/jira/browse/HDFS-16426
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When we trigger a full block report by force from the command line, the next 
> block report time is set like this:
> nextBlockReportTime.getAndAdd(blockReportIntervalMs);
> so nextBlockReportTime becomes more than blockReportIntervalMs away.
> If we trigger a full block report twice, nextBlockReportTime becomes more 
> than 2 * blockReportIntervalMs away. This is obviously not what we want.
> We fix this by setting nextBlockReportTime = now + blockReportIntervalMs 
> after a full block report is triggered from the command line.
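The scheduling arithmetic described above can be sketched with hypothetical method names (this is not the actual DataNode code): adding the interval to the stored next-report time drifts further into the future on each forced report, while anchoring to the current time does not.

```java
public class NextBlockReportTimeSketch {
    // Buggy behavior: nextBlockReportTime.getAndAdd(blockReportIntervalMs)
    // pushes the schedule out relative to the previous next-report time.
    static long buggyNext(long next, long intervalMs) {
        return next + intervalMs;
    }

    // Fixed behavior: anchor the next report to the current time.
    static long fixedNext(long now, long intervalMs) {
        return now + intervalMs;
    }

    public static void main(String[] args) {
        long interval = 10L;
        long now = 100L;
        long next = now + interval;            // normal schedule: 110

        // Two forced full block reports in a row under the buggy scheme:
        next = buggyNext(next, interval);      // 120
        next = buggyNext(next, interval);      // 130
        System.out.println("buggy delay = " + (next - now) / interval + " intervals");
        System.out.println("fixed delay = "
            + (fixedNext(now, interval) - now) / interval + " intervals");
    }
}
```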






[jira] [Resolved] (HDFS-16297) striped block was deleted less than 1 replication

2022-01-14 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16297.
-
Resolution: Duplicate

It seems this issue is the same as HDFS-16420. So I'm closing it.

If the problem still exists with HDFS-16420, please reopen it. Thanks.

> striped block was deleted less than 1 replication
> -
>
> Key: HDFS-16297
> URL: https://issues.apache.org/jira/browse/HDFS-16297
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: block placement, namanode
>Affects Versions: 3.2.1
>Reporter: chan
>Priority: Major
>
> In my cluster, the balancer is enabled. I found an EC file (6-3) with a 
> missing block: four internal blocks were deleted down to less than 1 
> replication. I think it's dangerous.






[jira] [Resolved] (HDFS-16420) Avoid deleting unique data blocks when deleting redundancy striped blocks

2022-01-14 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16420.
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.3
   Resolution: Fixed

> Avoid deleting unique data blocks when deleting redundancy striped blocks
> -
>
> Key: HDFS-16420
> URL: https://issues.apache.org/jira/browse/HDFS-16420
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: liubingxing
>Assignee: Jackson Wang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: image-2022-01-10-17-31-35-910.png, 
> image-2022-01-10-17-32-56-981.png
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We have a similar problem as HDFS-16297 described. 
> In our cluster, we used {color:#de350b}ec(6+3) + balancer with version 
> 3.1.0{color}, and the {color:#de350b}missing block{color} happened. 
> We got the block(blk_-9223372036824119008) info from fsck, only 5 live 
> replications and multiple redundant replications. 
> {code:java}
> blk_-9223372036824119008_220037616 len=133370338 MISSING! Live_repl=5
> blk_-9223372036824119007:DatanodeInfoWithStorage,   
> blk_-9223372036824119002:DatanodeInfoWithStorage,    
> blk_-9223372036824119001:DatanodeInfoWithStorage,  
> blk_-9223372036824119000:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage,  
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage {code}
>    
> We searched the log from all datanode, and found that the internal blocks of 
> blk_-9223372036824119008 were deleted almost at the same time.
>  
> {code:java}
> 08:15:58,550 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119008_220037616 URI 
> file:/data15/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119008
> 08:16:21,214 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119006_220037616 URI 
> file:/data4/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119006
> 08:16:55,737 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119005_220037616 URI 
> file:/data2/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119005
> {code}
>  
> The total number of internal blocks deleted during 08:15-08:17 are as follows
> ||internal block||index||delete num||
> |blk_-9223372036824119008|0|1|
> |blk_-9223372036824119006|2|1|
> |blk_-9223372036824119005|3|1|
> |blk_-9223372036824119004|4|50|
> |blk_-9223372036824119003|5|1|
> |blk_-9223372036824119000|8|1|
>  
> {color:#ff}During 08:15 to 08:17, we restarted 2 datanodes and triggered 
> full block reports immediately.{color}
>  
> There are 2 questions: 
> 1. Why are there so many replicas of this block?
> 2. Why delete the internal block with only one copy?
> The reasons for the first problem may be as follows: 
> 1. We set the full block report period of some datanodes to 168 hours.
> 2. We performed a namenode HA operation.
> 3. After the namenode HA, the state of the storage became 
> {color:#ff}stale{color}, and the state did not change until the next full 
> block report.
> 4. The balancer copied the replica without deleting it from the source node, 
> because the source node had the stale storage, and the request was put into 
> {color:#ff}postponedMisreplicatedBlocks{color}.
> 5. The balancer continued to copy the replica, eventually resulting in 
> multiple copies of it.
> !image-2022-01-10-17-31-35-910.png|width=642,height=269!
> The set of {color:#ff}rescannedMisreplicatedBlocks{color} has many 
> blocks to remove.
> !image-2022-01-10-17-32-56-981.png|width=745,height=124!
> As for the second question, we checked the code of 
> {color:#de350b}processExtraRedundancyBlock{color}, but didn't find any 
> problem.
>  




[jira] [Resolved] (HDFS-16400) Reconfig DataXceiver parameters for datanode

2022-01-13 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16400.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Reconfig DataXceiver parameters for datanode
> 
>
> Key: HDFS-16400
> URL: https://issues.apache.org/jira/browse/HDFS-16400
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> To avoid frequent rolling restarts of the DN, we should make DataXceiver 
> parameters reconfigurable.






[jira] [Resolved] (HDFS-16371) Exclude slow disks when choosing volume

2022-01-05 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16371.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Exclude slow disks when choosing volume
> ---
>
> Key: HDFS-16371
> URL: https://issues.apache.org/jira/browse/HDFS-16371
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently, the datanode can detect slow disks. See HDFS-11461.
> And after HDFS-16311, the slow disk information we collected is more accurate.
> So we can exclude these slow disks according to some rules when choosing a 
> volume. This prevents some slow disks from affecting the throughput of the 
> whole datanode.






[jira] [Resolved] (HDFS-16348) Mark slownode as badnode to recover pipeline

2021-12-29 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16348.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Mark slownode as badnode to recover pipeline
> 
>
> Key: HDFS-16348
> URL: https://issues.apache.org/jira/browse/HDFS-16348
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Janus Chow
>Assignee: Janus Chow
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> In HDFS-16320, the DataNode can retrieve the SLOW status from each NameNode. 
> This ticket is to send this information back to clients who are writing 
> blocks. If a client notices the pipeline is built on a slownode, it can 
> choose to mark the slownode as a badnode to exclude the node or rebuild the 
> pipeline.
> In order to avoid false positives, we added a "threshold" config: only when a 
> client continuously receives slownode replies from the same node will that 
> node be marked as SLOW.
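The threshold logic described above can be sketched as a per-datanode counter of consecutive slownode replies (an illustrative sketch with invented names, not the actual client-side patch):

```java
import java.util.HashMap;
import java.util.Map;

public class SlowNodeTracker {
    private final int threshold;  // hypothetical stand-in for the "threshold" config
    private final Map<String, Integer> consecutive = new HashMap<>();

    public SlowNodeTracker(int threshold) {
        this.threshold = threshold;
    }

    /** Called for each pipeline ack; returns true when the node should be excluded. */
    public boolean reportReply(String datanode, boolean slow) {
        if (!slow) {
            consecutive.remove(datanode);  // any normal reply resets the count
            return false;
        }
        int count = consecutive.merge(datanode, 1, Integer::sum);
        return count >= threshold;
    }

    public static void main(String[] args) {
        SlowNodeTracker tracker = new SlowNodeTracker(3);
        tracker.reportReply("dn1", true);
        tracker.reportReply("dn1", true);
        tracker.reportReply("dn1", false);  // reset: not three in a row
        tracker.reportReply("dn1", true);
        tracker.reportReply("dn1", true);
        boolean markBad = tracker.reportReply("dn1", true);  // third consecutive
        System.out.println(markBad ? "mark dn1 as badnode" : "keep dn1");
    }
}
```

Resetting on any non-slow reply is what makes the check "continuously receives", which is the false-positive guard the description calls out.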






[jira] [Resolved] (HDFS-16385) Fix Datanode retrieve slownode information bug.

2021-12-20 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16385.
-
Resolution: Fixed

Merged the PR. Thanks for your contribution, [~JacksonWang]. I added you to a 
contributor role.

> Fix Datanode retrieve slownode information bug.
> ---
>
> Key: HDFS-16385
> URL: https://issues.apache.org/jira/browse/HDFS-16385
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jackson Wang
>Assignee: Jackson Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In HDFS-16320, the DataNode will retrieve the SLOW status from each NameNode. 
> But the NameNode did not set isSlowNode in HeartbeatResponseProto in 
> DatanodeProtocolServerSideTranslatorPB#sendHeartbeat.






[jira] [Resolved] (HDFS-16377) Should CheckNotNull before access FsDatasetSpi

2021-12-15 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16377.
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.3
   Resolution: Fixed

> Should CheckNotNull before access FsDatasetSpi
> --
>
> Key: HDFS-16377
> URL: https://issues.apache.org/jira/browse/HDFS-16377
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: image-2021-12-10-19-19-22-957.png, 
> image-2021-12-10-19-20-58-022.png
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When starting the DN, we found an NPE in the starting DN's log, as follows:
> !image-2021-12-10-19-19-22-957.png|width=909,height=126!
> The logs of the upstream DN are as follows:
> !image-2021-12-10-19-20-58-022.png|width=905,height=239!
> This is mainly because *FsDatasetSpi* has not been initialized at the time of 
> access.
> I noticed that checkNotNull is already done in two 
> methods ({*}DataNode#getBlockLocalPathInfo{*} and 
> {*}DataNode#getVolumeInfo{*}). So we should add it in other places 
> (interfaces that clients and other DNs can access directly) so that we can 
> add a message when throwing exceptions.
> That way, the client and the upstream DN know that FsDatasetSpi has not been 
> initialized, rather than being unaware of the specific cause of the NPE.
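A minimal sketch of the proposed guard (hypothetical names; the real patch touches the DataNode's own interfaces) is to wrap dataset access in a null check that throws with an explicit message:

```java
import java.util.Objects;

public class DatasetGuard {
    Object data;  // stands in for FsDatasetSpi; null until initialization finishes

    Object getDataset() {
        // Fail fast with an explicit message instead of a bare NPE downstream
        return Objects.requireNonNull(data,
            "Datanode storage (FsDatasetSpi) not yet initialized");
    }

    public static void main(String[] args) {
        DatasetGuard guard = new DatasetGuard();
        try {
            guard.getDataset();
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());  // prints the explicit message
        }
        guard.data = new Object();  // simulate initialization completing
        System.out.println(guard.getDataset() != null ? "dataset ready" : "null");
    }
}
```

The point is not avoiding the exception but making its message actionable for the client or upstream DN.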






[jira] [Resolved] (HDFS-16375) The FBR lease ID should be exposed to the log

2021-12-15 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16375.
-
Fix Version/s: 3.4.0
   3.3.3
   Resolution: Fixed

> The FBR lease ID should be exposed to the log
> -
>
> Key: HDFS-16375
> URL: https://issues.apache.org/jira/browse/HDFS-16375
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Our Hadoop version is 3.1.0. We encountered HDFS-12914 and HDFS-14314 in the 
> production environment.
> When locating the problem, the *fullBrLeaseId* was not exposed in the log, 
> which caused some difficulty. We should expose it in the log.






[jira] [Resolved] (HDFS-16373) Fix MiniDFSCluster restart in case of multiple namenodes

2021-12-14 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16373.
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.3
   Resolution: Fixed

> Fix MiniDFSCluster restart in case of multiple namenodes
> 
>
> Key: HDFS-16373
> URL: https://issues.apache.org/jira/browse/HDFS-16373
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> In case of multiple namenodes, restarting more than one namenode fails. 
> restartNamenode waits for all the namenodes to come up, but if two namenodes 
> are down and we restart one, the other namenode won't be up, so the restart 
> fails.






[jira] [Resolved] (HDFS-16327) Make dfs.namenode.max.slowpeer.collect.nodes reconfigurable

2021-12-13 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16327.
-
Fix Version/s: 3.4.0
   3.3.3
   Resolution: Fixed

> Make dfs.namenode.max.slowpeer.collect.nodes reconfigurable
> ---
>
> Key: HDFS-16327
> URL: https://issues.apache.org/jira/browse/HDFS-16327
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> As the HDFS cluster expands or shrinks, the number of slow nodes to be 
> filtered must be dynamically adjusted. So we should make 
> DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_KEY reconfigurable.
> See HDFS-15879.
>  






[jira] [Resolved] (HDFS-16333) fix balancer bug when transfer an EC block

2021-12-08 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16333.
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.3
   Resolution: Fixed

> fix balancer bug when transfer an EC block
> --
>
> Key: HDFS-16333
> URL: https://issues.apache.org/jira/browse/HDFS-16333
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: image-2021-11-18-17-25-13-089.png, 
> image-2021-11-18-17-25-50-556.png, image-2021-11-18-17-28-03-155.png
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> We set the EC policy to (6+3) and we also had nodes that were 
> decommissioning when we executed the balancer.
> With the balancer running, we found many error logs as follows.
> !image-2021-11-18-17-25-13-089.png|width=858,height=135!
> Node A wants to transfer an EC block to node B, but we found that the block 
> is not on node A. The FSCK command shows the block status as follows.
> !image-2021-11-18-17-25-50-556.png|width=607,height=189!
> In the Dispatcher#getBlockList function
> !image-2021-11-18-17-28-03-155.png!
>  
> Assume that the locations of an EC block in storageGroupMap look like this:
> indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
> node:[a, b, c, d, e, f, g, h, i]
> After the decommission operation, the internal block at indices[1] was 
> decommissioned to another node.
> indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
> node:[a, {color:#FF}j{color}, c, d, e, f, g, h, i]
> The location of indices[1] changed from node {color:#FF}b{color} to node 
> {color:#FF}j{color}.
>  
> When the balancer gets the block locations and checks them against the 
> locations in storageGroupMap, a node that is not found in storageGroupMap 
> will not be added to the block locations.
> In this case, node {color:#FF}j {color}will not be added to the block 
> locations, while the indices are not updated.
> Finally, the block locations may look like this:
> indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
> {color:#FF}block.location:[a, c, d, e, f, g, h, i]{color}
> The locations of the nodes do not match their indices.
>  
> Solution:
> We should update the indices to match the nodes:
> {color:#FF}indices:[0, 2, 3, 4, 5, 6, 7, 8]{color}
> {color:#FF}block.location:[a, c, d, e, f, g, h, i]{color}
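The proposed fix can be illustrated by walking the two arrays together and dropping the index of every location that is missing from storageGroupMap (an invented sketch, not the actual Dispatcher code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class EcIndicesFix {
    /** Returns the kept block indices, aligned one-to-one with keptLocations. */
    static List<Integer> filterIndices(byte[] indices, String[] nodes,
                                       Set<String> knownNodes,
                                       List<String> keptLocations) {
        List<Integer> keptIndices = new ArrayList<>();
        for (int i = 0; i < nodes.length; i++) {
            if (knownNodes.contains(nodes[i])) {  // node present in storageGroupMap
                keptIndices.add((int) indices[i]);
                keptLocations.add(nodes[i]);
            }
            // else: skip both the node and its index, keeping the arrays aligned
        }
        return keptIndices;
    }

    public static void main(String[] args) {
        byte[] indices = {0, 1, 2, 3, 4, 5, 6, 7, 8};
        String[] nodes = {"a", "j", "c", "d", "e", "f", "g", "h", "i"};
        // "j" received the block after decommission and is unknown to storageGroupMap
        Set<String> known = Set.of("a", "c", "d", "e", "f", "g", "h", "i");
        List<String> locations = new ArrayList<>();
        List<Integer> kept = filterIndices(indices, nodes, known, locations);
        System.out.println(kept);       // [0, 2, 3, 4, 5, 6, 7, 8]
        System.out.println(locations);  // [a, c, d, e, f, g, h, i]
    }
}
```

Dropping index 1 together with node j is exactly the alignment the "Solution" section asks for: the bug was filtering one array but not the other.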






[jira] [Resolved] (HDFS-16331) Make dfs.blockreport.intervalMsec reconfigurable

2021-12-02 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16331.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Make dfs.blockreport.intervalMsec reconfigurable
> 
>
> Key: HDFS-16331
> URL: https://issues.apache.org/jira/browse/HDFS-16331
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2021-11-18-09-33-24-236.png, 
> image-2021-11-18-09-35-35-400.png
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> We have a cold data cluster, which stores data with an EC policy. There are 
> 24 fast disks on each node and each disk is 7 TB.
> Recently, many nodes have more than 10 million blocks, and the interval of 
> FBR is 6h by default. Frequent FBRs caused great pressure on the NN.
> !image-2021-11-18-09-35-35-400.png|width=334,height=229!
> !image-2021-11-18-09-33-24-236.png|width=566,height=159!
> We want to increase the interval of FBR, but that requires a rolling restart 
> of the DNs, which is a very heavy operation. In this scenario, it is 
> necessary to make _dfs.blockreport.intervalMsec_ reconfigurable.
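The reconfiguration being requested follows the general pattern below (a simplified, hypothetical sketch of a reconfigurable-property flow, not the Hadoop Reconfigurable implementation; the validation is invented):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReconfigurableConf {
    private final Map<String, String> props = new ConcurrentHashMap<>();

    ReconfigurableConf() {
        props.put("dfs.blockreport.intervalMsec", "21600000");  // 6h default
    }

    long getBlockReportIntervalMs() {
        return Long.parseLong(props.get("dfs.blockreport.intervalMsec"));
    }

    /** Validate the new value, then swap it in place — no process restart. */
    void reconfigure(String key, String newValue) {
        if (!key.equals("dfs.blockreport.intervalMsec")) {
            throw new IllegalArgumentException("not reconfigurable: " + key);
        }
        long v = Long.parseLong(newValue);
        if (v <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        props.put(key, newValue);  // takes effect on the next FBR scheduling pass
    }

    public static void main(String[] args) {
        ReconfigurableConf conf = new ReconfigurableConf();
        System.out.println(conf.getBlockReportIntervalMs());  // 21600000
        conf.reconfigure("dfs.blockreport.intervalMsec", "43200000");  // 12h
        System.out.println(conf.getBlockReportIntervalMs());  // 43200000
    }
}
```

The value here is operational: the interval changes on the running DN, avoiding the heavy rolling restart the description complains about.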






[jira] [Resolved] (HDFS-16344) Improve DirectoryScanner.Stats#toString

2021-11-29 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16344.
-
Fix Version/s: 3.4.0
   3.3.3
   Resolution: Fixed

> Improve DirectoryScanner.Stats#toString
> ---
>
> Key: HDFS-16344
> URL: https://issues.apache.org/jira/browse/HDFS-16344
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
> Attachments: image-2021-11-21-19-35-16-838.png
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Improve DirectoryScanner.Stats#toString.
> !image-2021-11-21-19-35-16-838.png|width=1019,height=71!






[jira] [Resolved] (HDFS-16339) Show the threshold when mover threads quota is exceeded

2021-11-26 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16339.
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.3
   Resolution: Fixed

> Show the threshold when mover threads quota is exceeded
> ---
>
> Key: HDFS-16339
> URL: https://issues.apache.org/jira/browse/HDFS-16339
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: image-2021-11-20-17-23-04-924.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Show the threshold when mover threads quota is exceeded in 
> DataXceiver#replaceBlock and DataXceiver#copyBlock.
> !image-2021-11-20-17-23-04-924.png|width=1233,height=124!






[jira] [Resolved] (HDFS-16335) Fix HDFSCommands.md

2021-11-22 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16335.
-
Fix Version/s: 3.4.0
   3.3.3
   Resolution: Fixed

> Fix HDFSCommands.md
> ---
>
> Key: HDFS-16335
> URL: https://issues.apache.org/jira/browse/HDFS-16335
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Fix HDFSCommands.md.






[jira] [Resolved] (HDFS-16326) Simplify the code for DiskBalancer

2021-11-18 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16326.
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.3
   Resolution: Fixed

> Simplify the code for DiskBalancer
> --
>
> Key: HDFS-16326
> URL: https://issues.apache.org/jira/browse/HDFS-16326
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Simplify the code for DiskBalancer.






[jira] [Resolved] (HDFS-16310) RBF: Add client port to CallerContext for Router

2021-11-17 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16310.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> RBF: Add client port to CallerContext for Router
> 
>
> Key: HDFS-16310
> URL: https://issues.apache.org/jira/browse/HDFS-16310
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> As mentioned in [HDFS-16266|https://issues.apache.org/jira/browse/HDFS-16266], 
> we should add the client port to the CallerContext of the Router.






[jira] [Resolved] (HDFS-16323) DatanodeHttpServer doesn't require handler state map while retrieving filter handlers

2021-11-17 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16323.
-
Fix Version/s: 3.4.0
   3.3.3
   Resolution: Fixed

> DatanodeHttpServer doesn't require handler state map while retrieving filter 
> handlers
> -
>
> Key: HDFS-16323
> URL: https://issues.apache.org/jira/browse/HDFS-16323
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> DatanodeHttpServer#getFilterHandlers uses the handler state map only to query 
> whether the given datanode httpserver filter handler class exists in the map 
> and, if not, initializes the Channel handler by invoking a specific 
> parameterized constructor of the class. However, this handler state map is 
> never used to upsert any data.






[jira] [Resolved] (HDFS-16315) Add metrics related to Transfer and NativeCopy for DataNode

2021-11-15 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16315.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Add metrics related to Transfer and NativeCopy for DataNode
> ---
>
> Key: HDFS-16315
> URL: https://issues.apache.org/jira/browse/HDFS-16315
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2021-11-11-08-26-33-074.png
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Datanodes already have Read, Write, Sync and Flush metrics. We should add 
> NativeCopy and Transfer as well.
> Here is a partial look after the change:
> !image-2021-11-11-08-26-33-074.png|width=205,height=235!






[jira] [Resolved] (HDFS-16311) Metric metadataOperationRate calculation error in DataNodeVolumeMetrics

2021-11-11 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16311.
-
Fix Version/s: 3.4.0
   2.10.2
   3.3.2
   3.2.4
   Resolution: Fixed

> Metric metadataOperationRate calculation error in DataNodeVolumeMetrics
> ---
>
> Key: HDFS-16311
> URL: https://issues.apache.org/jira/browse/HDFS-16311
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4
>
> Attachments: image-2021-11-09-20-22-26-828.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The metric metadataOperationRate is calculated incorrectly in 
> DataNodeVolumeMetrics#addFileIoError, causing MetadataOperationRateAvgTime to 
> be very large in some cases.
> !image-2021-11-09-20-22-26-828.png|width=450,height=205!






[jira] [Resolved] (HDFS-16312) Fix typo for DataNodeVolumeMetrics and ProfilingFileIoEvents

2021-11-10 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16312.
-
Fix Version/s: 3.4.0
   2.10.2
   3.3.2
   3.2.4
   Resolution: Fixed

> Fix typo for DataNodeVolumeMetrics and ProfilingFileIoEvents
> 
>
> Key: HDFS-16312
> URL: https://issues.apache.org/jira/browse/HDFS-16312
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Fix typo for DataNodeVolumeMetrics and ProfilingFileIoEvents.






[jira] [Resolved] (HDFS-16298) Improve error msg for BlockMissingException

2021-11-10 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16298.
-
Fix Version/s: 3.4.0
   2.10.2
   3.3.2
   3.2.4
   Resolution: Fixed

> Improve error msg for BlockMissingException
> ---
>
> Key: HDFS-16298
> URL: https://issues.apache.org/jira/browse/HDFS-16298
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4
>
> Attachments: image-2021-11-04-15-28-05-886.png
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When the client fails to obtain a block, a BlockMissingException is thrown. 
> To help analyze such issues, we can add the relevant location information to 
> the error msg here.
> !image-2021-11-04-15-28-05-886.png|width=624,height=144!






[jira] [Resolved] (HDFS-16301) Improve BenchmarkThroughput#SIZE naming standardization

2021-11-09 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16301.
-
Fix Version/s: 3.4.0
   2.10.2
   3.3.2
   3.2.4
   Resolution: Fixed

> Improve BenchmarkThroughput#SIZE naming standardization
> ---
>
> Key: HDFS-16301
> URL: https://issues.apache.org/jira/browse/HDFS-16301
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: benchmarks, test
>Affects Versions: 2.9.2
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In the BenchmarkThroughput#run() method, there is a local variable: SIZE. 
> This variable is used in a local scope, and it may be more appropriate to 
> change it to a lowercase name.
> public int run(String[] args) throws IOException {
>   ..
>   long SIZE = conf.getLong("dfsthroughput.file.size",
>       10L * 1024 * 1024 * 1024);
>   ..
> }






[jira] [Resolved] (HDFS-16299) Fix bug for TestDataNodeVolumeMetrics#verifyDataNodeVolumeMetrics

2021-11-09 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16299.
-
Fix Version/s: 3.4.0
   3.3.2
   3.2.4
   Resolution: Fixed

> Fix bug for TestDataNodeVolumeMetrics#verifyDataNodeVolumeMetrics
> -
>
> Key: HDFS-16299
> URL: https://issues.apache.org/jira/browse/HDFS-16299
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Fix bug for TestDataNodeVolumeMetrics#verifyDataNodeVolumeMetrics.






[jira] [Resolved] (HDFS-16273) RBF: RouterRpcFairnessPolicyController add availableHandleOnPerNs metrics

2021-11-07 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16273.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> RBF: RouterRpcFairnessPolicyController add availableHandleOnPerNs metrics
> -
>
> Key: HDFS-16273
> URL: https://issues.apache.org/jira/browse/HDFS-16273
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Add the availableHandlerOnPerNs metrics to monitor whether the number of 
> handlers configured for each NS is reasonable when using 
> RouterRpcFairnessPolicyController.






[jira] [Resolved] (HDFS-16294) Remove invalid DataNode#CONFIG_PROPERTY_SIMULATED

2021-11-04 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16294.
-
Fix Version/s: 3.2.4
   3.3.2
   2.10.2
   3.4.0
   Resolution: Fixed

> Remove invalid DataNode#CONFIG_PROPERTY_SIMULATED
> -
>
> Key: HDFS-16294
> URL: https://issues.apache.org/jira/browse/HDFS-16294
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.9.2
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4
>
> Attachments: screenshot.png
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> As early as when HDFS-2907 was resolved, 
> SimulatedFSDataset#CONFIG_PROPERTY_SIMULATED was removed. It was replaced by 
> SimulatedFSDataset#Factory and 
> DFSConfigKeys#DFS_DATANODE_FSDATASET_FACTORY_KEY.
> However, a reference to CONFIG_PROPERTY_SIMULATED is still retained in 
> the DataNode.
>  !screenshot.png! 
> Here are some traces related to HDFS-2907:
> https://issues.apache.org/jira/browse/HDFS-2907
> https://github.com/apache/hadoop/commit/efbc58f30c8e8d9f26c6a82d32d53716fb2b222a#diff-ab77612831fcb9a35e14c294417f0919c7a30c0cef9a4aec6b32d5f2df957020






[jira] [Resolved] (HDFS-16266) Add remote port information to HDFS audit log

2021-11-03 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16266.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Add remote port information to HDFS audit log
> -
>
> Key: HDFS-16266
> URL: https://issues.apache.org/jira/browse/HDFS-16266
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> In our production environment, we occasionally encounter a problem where a 
> user submits an abnormal computation task, causing a sudden flood of 
> requests that drives the queueTime and processingTime of the Namenode very 
> high and creates a large backlog of tasks.
> We usually locate and kill specific Spark, Flink, or MapReduce tasks based on 
> metrics and audit logs. Currently, IP and UGI are recorded in audit logs, but 
> there is no port information, so it is sometimes difficult to locate specific 
> processes. Therefore, I propose that we add the port information to the audit 
> log, so that we can easily track the upstream process.
> Currently, some projects contain port information in audit logs, such as 
> HBase and Alluxio. I think it is also necessary to add port information for 
> HDFS audit logs.






[jira] [Resolved] (HDFS-16279) Print detail datanode info when process first storage report

2021-10-28 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16279.
-
Fix Version/s: 3.2.4
   3.3.2
   3.4.0
   Resolution: Fixed

> Print detail datanode info when process first storage report
> 
>
> Key: HDFS-16279
> URL: https://issues.apache.org/jira/browse/HDFS-16279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: image-2021-10-19-20-37-55-850.png
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Print detailed datanode info when processing the first storage report.
> !image-2021-10-19-20-37-55-850.png|width=547,height=98!






[jira] [Resolved] (HDFS-15018) DataNode doesn't shutdown although the number of failed disks reaches dfs.datanode.failed.volumes.tolerated

2021-10-06 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-15018.
-
Resolution: Duplicate

> DataNode doesn't shutdown although the number of failed disks reaches 
> dfs.datanode.failed.volumes.tolerated
> ---
>
> Key: HDFS-15018
> URL: https://issues.apache.org/jira/browse/HDFS-15018
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.3
> Environment: HDP-2.6.5
>Reporter: Toshihiro Suzuki
>Priority: Major
> Attachments: thread_dumps.txt
>
>
> In our case, we set dfs.datanode.failed.volumes.tolerated=0 but a DataNode 
> didn't shut down when a disk in the DataNode host failed for some reason.
> The following log messages were shown in the DataNode log, which indicates 
> that the DataNode detected the disk failure, but the DataNode didn't shut down:
> {code}
> 2019-09-17T13:15:43.262-0400 WARN 
> org.apache.hadoop.hdfs.server.datanode.DataNode: checkDiskErrorAsync callback 
> got 1 failed volumes: [/data2/hdfs/current]
> 2019-09-17T13:15:43.262-0400 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockScanner: Removing scanner for 
> volume /data2/hdfs (StorageID DS-329dec9d-a476-4334-9570-651a7e4d1f44)
> 2019-09-17T13:15:43.263-0400 INFO 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: 
> VolumeScanner(/data2/hdfs, DS-329dec9d-a476-4334-9570-651a7e4d1f44) exiting.
> {code}
> Looking at the HDFS code, it looks like when the DataNode detects a disk 
> failure, the DataNode waits until the volume reference of the disk is released.
> https://github.com/hortonworks/hadoop/blob/HDP-2.6.5.0-292-tag/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java#L246
> I suspect that the volume reference is not released after the failure 
> detection, but I'm not sure of the reason.
> And we took thread dumps when the issue was happening. It looks like the 
> following thread is waiting for the volume reference of the disk to be 
> released:
> {code}
> "pool-4-thread-1" #174 daemon prio=5 os_prio=0 tid=0x7f9e7c7bf800 
> nid=0x8325 in Object.wait() [0x7f9e629cb000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(FsVolumeList.java:262)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.handleVolumeFailures(FsVolumeList.java:246)
> - locked <0x000670559278> (a java.lang.Object)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.handleVolumeFailures(FsDatasetImpl.java:2178)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.handleVolumeFailures(DataNode.java:3410)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.access$100(DataNode.java:248)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$4.call(DataNode.java:2013)
> at 
> org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker$ResultHandler.invokeCallback(DatasetVolumeChecker.java:394)
> at 
> org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker$ResultHandler.cleanup(DatasetVolumeChecker.java:387)
> at 
> org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker$ResultHandler.onFailure(DatasetVolumeChecker.java:370)
> at com.google.common.util.concurrent.Futures$6.run(Futures.java:977)
> at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:253)
> at 
> org.apache.hadoop.hdfs.server.datanode.checker.AbstractFuture.executeListener(AbstractFuture.java:991)
> at 
> org.apache.hadoop.hdfs.server.datanode.checker.AbstractFuture.complete(AbstractFuture.java:885)
> at 
> org.apache.hadoop.hdfs.server.datanode.checker.AbstractFuture.setException(AbstractFuture.java:739)
> at 
> org.apache.hadoop.hdfs.server.datanode.checker.TimeoutFuture$Fire.run(TimeoutFuture.java:137)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at 
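The wait described in the report above can be reduced to a small reference-count sketch. This is illustrative only, not the actual Hadoop FsVolumeList code; the names (acquire, release, waitVolumeRemoved) merely mirror the description. The hang occurs when some holder never calls release():

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of waiting for a failed volume's references to be
// released before completing disk-failure handling. NOT the actual Hadoop
// implementation; it only models the behavior described in the report.
public class VolumeRefSketch {
    private final AtomicInteger refCount = new AtomicInteger(0);
    private final Object condvar = new Object();

    void acquire() {
        refCount.incrementAndGet();
    }

    void release() {
        if (refCount.decrementAndGet() == 0) {
            synchronized (condvar) {
                condvar.notifyAll();
            }
        }
    }

    /** Returns true if all references were released within timeoutMs. */
    boolean waitVolumeRemoved(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        synchronized (condvar) {
            while (refCount.get() > 0) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    // A leaked reference (never released) keeps us here
                    // until the timeout, matching the reported hang.
                    return false;
                }
                try {
                    condvar.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return true;
    }
}
```

If the suspicion in the report is right, the fix would be to find the code path that holds a volume reference across the failure and ensure it is released.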

[jira] [Resolved] (HDFS-16203) Discover datanodes with unbalanced block pool usage by the standard deviation

2021-09-15 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16203.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Discover datanodes with unbalanced block pool usage by the standard deviation
> -
>
> Key: HDFS-16203
> URL: https://issues.apache.org/jira/browse/HDFS-16203
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2021-09-01-19-16-27-172.png
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> *Discover datanodes with unbalanced volume usage by the standard deviation.*
> *In some scenarios, datanode disk usage can become unbalanced:*
>  1. A damaged disk is repaired and brought back online.
>  2. Disks are added to some Datanodes.
>  3. Some disks are damaged, resulting in slow data writing.
>  4. Custom volume-choosing policies are used.
> With unbalanced disk usage, a sudden increase in datanode write traffic may 
> leave disks with low volume usage busy with I/O, decreasing throughput 
> across datanodes.
> We need to find these nodes promptly to run the disk balancer or take other 
> action. Based on the volume usage of each datanode, we can calculate the 
> standard deviation of the volume usage: the more unbalanced the volumes, the 
> higher the standard deviation.
> *We can display the result on the NameNode web UI and then sort it directly 
> to find the nodes whose volume usage is unbalanced.*
> *{color:#172b4d}This interface is only used to obtain metrics and does not 
> adversely affect namenode performance.{color}*
>  
> {color:#172b4d}!image-2021-09-01-19-16-27-172.png|width=581,height=216!{color}
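The metric described above amounts to the standard deviation of per-volume usage percentages on each datanode. A minimal sketch (illustrative, not the Hadoop implementation):

```java
// Sketch of flagging datanodes with unbalanced volume usage via the standard
// deviation of the per-volume usage percentages, as the issue describes.
// Illustrative only; not the actual NameNodeMXBean code.
public class VolumeUsageStdDev {
    /** Population standard deviation of per-volume usage percentages. */
    static double usageStdDev(double[] volumeUsagePercents) {
        double mean = 0;
        for (double u : volumeUsagePercents) {
            mean += u;
        }
        mean /= volumeUsagePercents.length;
        double variance = 0;
        for (double u : volumeUsagePercents) {
            variance += (u - mean) * (u - mean);
        }
        return Math.sqrt(variance / volumeUsagePercents.length);
    }
}
```

A balanced node (e.g. usages 60, 61, 59) yields a deviation near zero, while a node with a freshly added, nearly empty disk (e.g. 80, 80, 5) yields a much larger one, so sorting by this value surfaces the nodes that need the disk balancer.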



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16076) Avoid using slow DataNodes for reading by sorting locations

2021-06-23 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16076.
-
Fix Version/s: 3.3.2
   3.4.0
   Resolution: Fixed

> Avoid using slow DataNodes for reading by sorting locations
> ---
>
> Key: HDFS-16076
> URL: https://issues.apache.org/jira/browse/HDFS-16076
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> After sorting, the expected location list will be: live -> slow -> stale -> 
> staleAndSlow -> entering_maintenance -> decommissioned. This reduces the 
> probability that slow nodes will be used for reading.
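The ordering above amounts to ranking each replica location by state. A minimal sketch, with an illustrative enum (these are not Hadoop's actual state names) whose declaration order encodes the preference:

```java
import java.util.Arrays;
import java.util.Comparator;

public class LocationSort {
    // Declaration order encodes the read preference described above:
    // live -> slow -> stale -> staleAndSlow -> entering_maintenance -> decommissioned.
    // Illustrative states only; not Hadoop's actual DatanodeInfo model.
    enum State { LIVE, SLOW, STALE, STALE_AND_SLOW, ENTERING_MAINTENANCE, DECOMMISSIONED }

    static void sortByPreference(State[] locations) {
        // The enum ordinal is the rank, so a natural-order sort suffices.
        Arrays.sort(locations, Comparator.comparingInt(State::ordinal));
    }
}
```

Clients read from the first location in the list, so after this sort slow and decommissioned replicas are only used when no better replica exists.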






[jira] [Created] (HDFS-16068) WebHdfsFileSystem has a possible connection leak in connection with HttpFS

2021-06-14 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-16068:
---

 Summary: WebHdfsFileSystem has a possible connection leak in 
connection with HttpFS
 Key: HDFS-16068
 URL: https://issues.apache.org/jira/browse/HDFS-16068
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


When we use WebHdfsFileSystem with HttpFS, some connections remain open for a 
while after the filesystems are closed, until GC runs. After investigating for 
a while, I found a potential connection leak in WebHdfsFileSystem.
{code:java}
// Close both the InputStream and the connection.
@VisibleForTesting
void closeInputStream(RunnerState rs) throws IOException {
  if (in != null) {
IOUtils.close(cachedConnection);
in = null;
  }
  cachedConnection = null;
  runnerState = rs;
}
{code}
In the above code, if {{in}} is null and {{cachedConnection}} is not null, 
{{cachedConnection}} is never closed and the connection remains open. I think 
this is the cause of our problem.
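The shape of the bug and of the implied fix can be sketched with simplified stand-ins (these are not the actual WebHdfsFileSystem fields or methods): close the cached connection unconditionally instead of only when {{in}} is non-null.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

// Simplified stand-in for the runner described above; illustrative only.
public class RunnerSketch {
    InputStream in;
    Closeable cachedConnection;

    // Best-effort close, similar in spirit to IOUtils.close.
    private static void closeQuietly(Closeable c) {
        try {
            if (c != null) {
                c.close();
            }
        } catch (IOException ignored) {
            // swallow, as a quiet close does
        }
    }

    // Shape of the reported bug: when in == null, the cached connection is
    // dropped without being closed and leaks until GC collects it.
    void closeInputStreamLeaky() {
        if (in != null) {
            closeQuietly(cachedConnection);
            in = null;
        }
        cachedConnection = null;
    }

    // Shape of the fix: always close the cached connection if present.
    void closeInputStreamFixed() {
        closeQuietly(cachedConnection);
        in = null;
        cachedConnection = null;
    }
}
```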






[jira] [Resolved] (HDFS-16057) Make sure the order for location in ENTERING_MAINTENANCE state

2021-06-11 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16057.
-
Fix Version/s: 3.3.2
   3.4.0
   Resolution: Fixed

> Make sure the order for location in ENTERING_MAINTENANCE state
> --
>
> Key: HDFS-16057
> URL: https://issues.apache.org/jira/browse/HDFS-16057
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We use a comparator to sort locations in getBlockLocations(), and the 
> expected result is: live -> stale -> entering_maintenance -> decommissioned.
> But networktopology.sortByDistance() will disrupt that order. We should 
> also filter out nodes in state AdminStates.ENTERING_MAINTENANCE before 
> calling networktopology.sortByDistance().
>  
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager#sortLocatedBlock()
> {code:java}
> DatanodeInfoWithStorage[] di = lb.getLocations();
> // Move decommissioned/stale datanodes to the bottom
> Arrays.sort(di, comparator);
> // Sort nodes by network distance only for located blocks
> int lastActiveIndex = di.length - 1;
> while (lastActiveIndex > 0 && isInactive(di[lastActiveIndex])) {
>   --lastActiveIndex;
> }
> int activeLen = lastActiveIndex + 1;
> if(nonDatanodeReader) {
>   networktopology.sortByDistanceUsingNetworkLocation(client,
>   lb.getLocations(), activeLen, createSecondaryNodeSorter());
> } else {
>   networktopology.sortByDistance(client, lb.getLocations(), activeLen,
>   createSecondaryNodeSorter());
> }
> {code}
>  






[jira] [Resolved] (HDFS-16054) Replace Guava Lists usage by Hadoop's own Lists in hadoop-hdfs-project

2021-06-09 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16054.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Replace Guava Lists usage by Hadoop's own Lists in hadoop-hdfs-project
> --
>
> Key: HDFS-16054
> URL: https://issues.apache.org/jira/browse/HDFS-16054
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>







[jira] [Resolved] (HDFS-16048) RBF: Print network topology on the router web

2021-06-08 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16048.
-
Fix Version/s: 3.3.2
   3.4.0
   Resolution: Fixed

> RBF: Print network topology on the router web
> -
>
> Key: HDFS-16048
> URL: https://issues.apache.org/jira/browse/HDFS-16048
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
> Attachments: topology-json.jpg, topology-text.jpg
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> In order to query the network topology information conveniently, we can print 
> it on the router web. It's related to HDFS-15970.






[jira] [Resolved] (HDFS-16050) Some dynamometer tests fail

2021-06-07 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16050.
-
   Fix Version/s: 3.3.2
  3.4.0
Target Version/s: 3.4.0, 3.3.2  (was: 3.4.0)
  Resolution: Fixed

> Some dynamometer tests fail
> ---
>
> Key: HDFS-16050
> URL: https://issues.apache.org/jira/browse/HDFS-16050
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The following tests failed:
> {quote}hadoop.tools.dynamometer.TestDynamometerInfra
>  hadoop.tools.dynamometer.blockgenerator.TestBlockGen
> hadoop.tools.dynamometer.workloadgenerator.TestWorkloadGenerator
> {quote}
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/523/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer.txt]
> {quote}[ERROR] 
> testAuditWorkloadDirectParserWithOutput(org.apache.hadoop.tools.dynamometer.workloadgenerator.TestWorkloadGenerator)
>  Time elapsed: 1.353 s <<< ERROR!
>  java.lang.NoClassDefFoundError: org/mockito/stubbing/Answer
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.isNameNodeUp(MiniDFSCluster.java:2618)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.isClusterUp(MiniDFSCluster.java:2632)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1498)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:977)
>  at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:576)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:518)
> {quote}






[jira] [Resolved] (HDFS-16041) TestErasureCodingCLI fails

2021-05-26 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16041.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> TestErasureCodingCLI fails
> --
>
> Key: HDFS-16041
> URL: https://issues.apache.org/jira/browse/HDFS-16041
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Hui Fei
>Assignee: Hui Fei
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Because of HDFS-16018, TestErasureCodingCLI fails, reported by HDFS-13671






[jira] [Resolved] (HDFS-15991) Add location into datanode info for NameNodeMXBean

2021-04-26 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-15991.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Add location into datanode info for NameNodeMXBean
> --
>
> Key: HDFS-15991
> URL: https://issues.apache.org/jira/browse/HDFS-15991
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Add location into datanode info for NameNodeMXBean.






[jira] [Resolved] (HDFS-15974) RBF: Unable to display the datanode UI of the router

2021-04-22 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-15974.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> RBF: Unable to display the datanode UI of the router
> 
>
> Key: HDFS-15974
> URL: https://issues.apache.org/jira/browse/HDFS-15974
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, ui
>Affects Versions: 3.4.0
>Reporter: zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15358-1.patch, image-2021-04-15-11-36-47-644.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Clicking the Datanodes tab on the Router UI does not respond.






[jira] [Resolved] (HDFS-15989) Split TestBalancer into two classes

2021-04-21 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-15989.
-
Resolution: Fixed

> Split TestBalancer into two classes
> ---
>
> Key: HDFS-15989
> URL: https://issues.apache.org/jira/browse/HDFS-15989
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> TestBalancer has accumulated many tests; it would be good to split it into 
> two classes. Moreover, TestBalancer#testMaxIterationTime is flaky. We should 
> also resolve that with this Jira.






[jira] [Resolved] (HDFS-15970) Print network topology on the web

2021-04-19 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-15970.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Print network topology on the web
> -
>
> Key: HDFS-15970
> URL: https://issues.apache.org/jira/browse/HDFS-15970
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: hdfs-topology-json.jpg, hdfs-topology.jpg, hdfs-web.jpg
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> In order to query the network topology information conveniently, we can print 
> it on the web.






[jira] [Resolved] (HDFS-15975) Use LongAdder instead of AtomicLong

2021-04-17 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-15975.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Use LongAdder instead of AtomicLong
> ---
>
> Key: HDFS-15975
> URL: https://issues.apache.org/jira/browse/HDFS-15975
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> When counting some metrics, we can use LongAdder instead of AtomicLong to 
> improve performance. The long value is not an atomic snapshot in LongAdder, 
> but I think we can tolerate that.
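The trade-off described above, sketched: LongAdder spreads updates across internal cells so concurrent increments don't all contend on a single CAS, at the cost of sum() being only a snapshot while writers are active. A minimal sketch (the class and method names here are illustrative, not Hadoop code):

```java
import java.util.concurrent.atomic.LongAdder;

// Illustrative counter using LongAdder for write-heavy, read-rarely metrics.
public class CounterSketch {
    static long countConcurrently(int threads, int incrementsPerThread) {
        LongAdder counter = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    // Writers may land on different cells, avoiding one hot CAS.
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        // sum() folds the per-cell counts; mid-flight it is only a snapshot,
        // but after all writers have joined it is exact.
        return counter.sum();
    }
}
```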






[jira] [Resolved] (HDFS-15883) Add a metric BlockReportQueueFullCount

2021-04-05 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-15883.
-
Resolution: Won't Fix

Closing this as the PR was closed.

> Add a metric BlockReportQueueFullCount
> --
>
> Key: HDFS-15883
> URL: https://issues.apache.org/jira/browse/HDFS-15883
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Add a metric that reflects the number of times the block report queue is full





