[jira] [Updated] (HDFS-16616) Remove the use of Sets#newHashSet and Sets#newTreeSet
[ https://issues.apache.org/jira/browse/HDFS-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16616: -- Component/s: hdfs-common Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Remove the use of Sets#newHashSet and Sets#newTreeSet > -- > > Key: HDFS-16616 > URL: https://issues.apache.org/jira/browse/HDFS-16616 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs-common >Affects Versions: 3.4.0 >Reporter: Samrat Deb >Assignee: Samrat Deb >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > As part of removing Guava dependencies, HADOOP-17115, HADOOP-17721, > HADOOP-17722 and HADOOP-17720 were fixed. > Currently the code calls util functions to create HashSet and TreeSet > instances across the repo. These calls add little value, as they internally > just invoke new HashSet<>() / new TreeSet<>() from java.util. > This task is to clean up all these redundant set-creation calls. > Before the move to Java 8, sets were created using Guava functions and APIs; > now that Guava has been moved away from, the util code in Hadoop looks like > 1. > public static <E> TreeSet<E> newTreeSet() { return new TreeSet<E>(); } > 2. > public static <E> HashSet<E> newHashSet() { return new HashSet<E>(); } > These methods do nothing beyond adding an extra layer of function call. > Please refer to the task > https://issues.apache.org/jira/browse/HADOOP-17726 > Can anyone review whether this ticket adds value to the code? > Looking forward to input/thoughts. If it does not add value, we can close it > and not move forward with the changes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
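For illustration, a hedged sketch of the cleanup the ticket above proposes; the call sites below are hypothetical, only the wrapper-vs-constructor pattern is taken from the issue:

{code:java}
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class SetsCleanupExample {
  public static void main(String[] args) {
    // Before: redundant wrappers from org.apache.hadoop.util.Sets
    //   Set<String> hosts  = Sets.newHashSet();
    //   Set<String> sorted = Sets.newTreeSet();

    // After: call the java.util constructors directly.
    Set<String> hosts = new HashSet<>();
    Set<String> sorted = new TreeSet<>();
    hosts.add("dn-1");
    sorted.add("dn-1");
    System.out.println(hosts + " " + sorted);
  }
}
{code}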
[jira] [Updated] (HDFS-16522) Set Http and Ipc ports for Datanodes in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16522: -- Component/s: test Hadoop Flags: Reviewed Target Version/s: 3.3.5, 3.4.0 Affects Version/s: 3.3.5 3.4.0 > Set Http and Ipc ports for Datanodes in MiniDFSCluster > -- > > Key: HDFS-16522 > URL: https://issues.apache.org/jira/browse/HDFS-16522 > Project: Hadoop HDFS > Issue Type: Task > Components: test >Affects Versions: 3.4.0, 3.3.5 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > We should provide options to set Http and Ipc ports for Datanodes in > MiniDFSCluster. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16502) Reconfigure Block Invalidate limit
[ https://issues.apache.org/jira/browse/HDFS-16502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16502: -- Component/s: block placement Hadoop Flags: Reviewed Target Version/s: 3.3.5, 3.4.0 Affects Version/s: 3.3.5 3.4.0 > Reconfigure Block Invalidate limit > -- > > Key: HDFS-16502 > URL: https://issues.apache.org/jira/browse/HDFS-16502 > Project: Hadoop HDFS > Issue Type: Task > Components: block placement >Affects Versions: 3.4.0, 3.3.5 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > Time Spent: 2h > Remaining Estimate: 0h > > Based on the cluster load, it would be helpful to consider tuning the block > invalidate limit (dfs.block.invalidate.limit). The only way to do this today > without restarting the Namenode is by reconfiguring the heartbeat interval: > {code:java} > Math.max(heartbeatInt*20, blockInvalidateLimit){code} > This logic is not straightforward, operators are usually not aware of it > (for lack of documentation), and updating the heartbeat interval is not > desirable in all cases. > We should provide the ability to alter the block invalidate limit without > affecting the heartbeat interval on a live cluster, to adjust load at the > Datanode level. > We should also take this opportunity to move the (heartbeatInterval * 20) > computation logic into a common method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
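A minimal sketch of the shared helper that last sentence calls for; the class and method names are assumptions, only the Math.max(heartbeatInterval * 20, configuredLimit) relationship comes from the issue:

{code:java}
public final class BlockInvalidateLimits {
  private BlockInvalidateLimits() {}

  /**
   * Hypothetical common method: one place for the
   * max(20 * heartbeat interval, configured limit) rule, so the startup
   * path and a future reconfiguration path compute it identically.
   */
  public static int calculate(long heartbeatIntervalSeconds, int configuredLimit) {
    return Math.max((int) (20 * heartbeatIntervalSeconds), configuredLimit);
  }

  public static void main(String[] args) {
    // 3s heartbeat -> 60; the configured limit of 1000 wins.
    System.out.println(calculate(3L, 1000)); // prints 1000
  }
}
{code}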
[jira] [Updated] (HDFS-16481) Provide support to set Http and Rpc ports in MiniJournalCluster
[ https://issues.apache.org/jira/browse/HDFS-16481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16481: -- Component/s: test Target Version/s: 3.3.5, 3.4.0 Affects Version/s: 3.3.5 3.4.0 > Provide support to set Http and Rpc ports in MiniJournalCluster > --- > > Key: HDFS-16481 > URL: https://issues.apache.org/jira/browse/HDFS-16481 > Project: Hadoop HDFS > Issue Type: Task > Components: test >Affects Versions: 3.4.0, 3.3.5 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > We should provide support for clients to set Http and Rpc ports of > JournalNodes in MiniJournalCluster. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16054) Replace Guava Lists usage by Hadoop's own Lists in hadoop-hdfs-project
[ https://issues.apache.org/jira/browse/HDFS-16054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16054: -- Component/s: hdfs-common Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Replace Guava Lists usage by Hadoop's own Lists in hadoop-hdfs-project > -- > > Key: HDFS-16054 > URL: https://issues.apache.org/jira/browse/HDFS-16054 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs-common >Affects Versions: 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16435) Remove unneeded TODO comment for ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-16435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16435: -- Component/s: namenode Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Remove unneeded TODO comment for ObserverReadProxyProvider > - > > Key: HDFS-16435 > URL: https://issues.apache.org/jira/browse/HDFS-16435 > Project: Hadoop HDFS > Issue Type: Wish > Components: namenode >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Based on the discussion in > [HDFS-13923|https://issues.apache.org/jira/browse/HDFS-13923], we don't think > a configuration to turn observer reads on/off is needed. > So I suggest removing the `TODO comment`, which is no longer needed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16541) Fix a typo in NameNodeLayoutVersion.
[ https://issues.apache.org/jira/browse/HDFS-16541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16541: -- Component/s: namenode Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix a typo in NameNodeLayoutVersion. > > > Key: HDFS-16541 > URL: https://issues.apache.org/jira/browse/HDFS-16541 > Project: Hadoop HDFS > Issue Type: Wish > Components: namenode >Affects Versions: 3.4.0 >Reporter: ZhiWei Shi >Assignee: ZhiWei Shi >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Fix a typo in NameNodeLayoutVersion. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16587) Allow configuring Handler number for the JournalNodeRpcServer
[ https://issues.apache.org/jira/browse/HDFS-16587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16587: -- Component/s: journal-node Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Allow configuring Handler number for the JournalNodeRpcServer > - > > Key: HDFS-16587 > URL: https://issues.apache.org/jira/browse/HDFS-16587 > Project: Hadoop HDFS > Issue Type: Wish > Components: journal-node >Affects Versions: 3.4.0 >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > We can allow configuring the handler number for the JournalNodeRpcServer. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
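A hedged sketch of how a deployment might use the new knob from HDFS-16587; the property name dfs.journalnode.handler.count is an assumption modeled on the analogous NameNode/DataNode handler-count keys, not confirmed from the patch:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class JnHandlerCountExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Assumed key: raise the JournalNode RPC handler count on a busy
    // cluster instead of living with the small default.
    conf.setInt("dfs.journalnode.handler.count", 20);
    System.out.println(conf.getInt("dfs.journalnode.handler.count", 5));
  }
}
{code}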
[jira] [Updated] (HDFS-16339) Show the threshold when mover threads quota is exceeded
[ https://issues.apache.org/jira/browse/HDFS-16339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16339: -- Component/s: datanode Hadoop Flags: Reviewed Target Version/s: 3.2.4, 3.3.2, 3.4.0 Affects Version/s: 3.2.4 3.3.2 3.4.0 > Show the threshold when mover threads quota is exceeded > --- > > Key: HDFS-16339 > URL: https://issues.apache.org/jira/browse/HDFS-16339 > Project: Hadoop HDFS > Issue Type: Wish > Components: datanode >Affects Versions: 3.4.0, 3.3.2, 3.2.4 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Attachments: image-2021-11-20-17-23-04-924.png > > Time Spent: 1.5h > Remaining Estimate: 0h > > Show the threshold when mover threads quota is exceeded in > DataXceiver#replaceBlock and DataXceiver#copyBlock. > !image-2021-11-20-17-23-04-924.png|width=1233,height=124! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16335) Fix HDFSCommands.md
[ https://issues.apache.org/jira/browse/HDFS-16335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16335: -- Component/s: documentation Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.4.0 Affects Version/s: 3.3.2 3.4.0 > Fix HDFSCommands.md > --- > > Key: HDFS-16335 > URL: https://issues.apache.org/jira/browse/HDFS-16335 > Project: Hadoop HDFS > Issue Type: Wish > Components: documentation >Affects Versions: 3.4.0, 3.3.2 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Fix HDFSCommands.md. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16326) Simplify the code for DiskBalancer
[ https://issues.apache.org/jira/browse/HDFS-16326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16326: -- Component/s: diskbalancer Hadoop Flags: Reviewed Target Version/s: 3.2.4, 3.3.2, 3.4.0 Affects Version/s: 3.2.4 3.3.2 3.4.0 > Simplify the code for DiskBalancer > -- > > Key: HDFS-16326 > URL: https://issues.apache.org/jira/browse/HDFS-16326 > Project: Hadoop HDFS > Issue Type: Wish > Components: diskbalancer >Affects Versions: 3.4.0, 3.3.2, 3.2.4 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 1h > Remaining Estimate: 0h > > Simplify the code for DiskBalancer. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16319) Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount
[ https://issues.apache.org/jira/browse/HDFS-16319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16319: -- Component/s: metrics Target Version/s: 3.3.2, 3.4.0 Affects Version/s: 3.3.2 3.4.0 > Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount > > > Key: HDFS-16319 > URL: https://issues.apache.org/jira/browse/HDFS-16319 > Project: Hadoop HDFS > Issue Type: Wish > Components: metrics >Affects Versions: 3.4.0, 3.3.2 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount. See > [HDFS-15808|https://issues.apache.org/jira/browse/HDFS-15808]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16298) Improve error msg for BlockMissingException
[ https://issues.apache.org/jira/browse/HDFS-16298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16298: -- Component/s: hdfs-client Hadoop Flags: Reviewed Target Version/s: 3.2.4, 3.3.2, 2.10.2, 3.4.0 Affects Version/s: 3.2.4 3.3.2 2.10.2 3.4.0 > Improve error msg for BlockMissingException > --- > > Key: HDFS-16298 > URL: https://issues.apache.org/jira/browse/HDFS-16298 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs-client >Affects Versions: 3.4.0, 2.10.2, 3.3.2, 3.2.4 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4 > > Attachments: image-2021-11-04-15-28-05-886.png > > Time Spent: 2h > Remaining Estimate: 0h > > When the client fails to obtain a block, a BlockMissingException is thrown. > To make such issues easier to analyze, we can add the relevant block > location information to the error message here. > !image-2021-11-04-15-28-05-886.png|width=624,height=144! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16312) Fix typo for DataNodeVolumeMetrics and ProfilingFileIoEvents
[ https://issues.apache.org/jira/browse/HDFS-16312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16312: -- Component/s: datanode metrics Hadoop Flags: Reviewed Target Version/s: 3.2.4, 3.3.2, 2.10.2, 3.4.0 Affects Version/s: 3.2.4 3.3.2 2.10.2 3.4.0 > Fix typo for DataNodeVolumeMetrics and ProfilingFileIoEvents > > > Key: HDFS-16312 > URL: https://issues.apache.org/jira/browse/HDFS-16312 > Project: Hadoop HDFS > Issue Type: Wish > Components: datanode, metrics >Affects Versions: 3.4.0, 2.10.2, 3.3.2, 3.2.4 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4 > > Time Spent: 50m > Remaining Estimate: 0h > > Fix typo for DataNodeVolumeMetrics and ProfilingFileIoEvents. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16280) Fix typo for ShortCircuitReplica#isStale
[ https://issues.apache.org/jira/browse/HDFS-16280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16280: -- Component/s: hdfs-client Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix typo for ShortCircuitReplica#isStale > > > Key: HDFS-16280 > URL: https://issues.apache.org/jira/browse/HDFS-16280 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs-client >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Fix typo for ShortCircuitReplica#isStale. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16281) Fix flaky unit tests failed due to timeout
[ https://issues.apache.org/jira/browse/HDFS-16281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16281: -- Component/s: test Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix flaky unit tests failed due to timeout > -- > > Key: HDFS-16281 > URL: https://issues.apache.org/jira/browse/HDFS-16281 > Project: Hadoop HDFS > Issue Type: Wish > Components: test >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h > Remaining Estimate: 0h > > I found that this unit test > *_TestViewFileSystemOverloadSchemeWithHdfsScheme_* failed several times due > to timeout. Can we change the timeout for some methods from _*3s*_ to *_30s_* > to be consistent with the other methods? > {code:java} > [ERROR] Tests run: 19, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: > 65.39 s <<< FAILURE! - in > org.apache.hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS[ERROR] > Tests run: 19, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 65.39 s <<< > FAILURE! - in > org.apache.hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS[ERROR] > > testNflyRepair(org.apache.hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS) > Time elapsed: 4.132 s <<< > ERROR!org.junit.runners.model.TestTimedOutException: test timed out after > 3000 milliseconds at java.lang.Object.wait(Native Method) at > java.lang.Object.wait(Object.java:502) at > org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:59) at > org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1577) at > org.apache.hadoop.ipc.Client.call(Client.java:1535) at > org.apache.hadoop.ipc.Client.call(Client.java:1432) at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129) > at com.sun.proxy.$Proxy26.setTimes(Unknown Source) at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setTimes(ClientNamenodeProtocolTranslatorPB.java:1059) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) > at com.sun.proxy.$Proxy27.setTimes(Unknown Source) at > org.apache.hadoop.hdfs.DFSClient.setTimes(DFSClient.java:2658) at > org.apache.hadoop.hdfs.DistributedFileSystem$37.doCall(DistributedFileSystem.java:1978) > at > org.apache.hadoop.hdfs.DistributedFileSystem$37.doCall(DistributedFileSystem.java:1975) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.setTimes(DistributedFileSystem.java:1988) > at org.apache.hadoop.fs.FilterFileSystem.setTimes(FilterFileSystem.java:542) > at > 
org.apache.hadoop.fs.viewfs.ChRootedFileSystem.setTimes(ChRootedFileSystem.java:328) > at > org.apache.hadoop.fs.viewfs.NflyFSystem$NflyOutputStream.commit(NflyFSystem.java:439) > at > org.apache.hadoop.fs.viewfs.NflyFSystem$NflyOutputStream.close(NflyFSystem.java:395) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77) > at > org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) at > org.apache.hadoop.fs.viewfs.TestViewFileSystemOverloadSchemeWithHdfsScheme.writeString(TestViewFileSystemOverloadSchemeWithHdfsScheme.java:685) > at > org.apache.hadoop.fs.viewfs.TestViewFileSystemOverloadSchemeWithHdfsScheme.testNflyRepair(TestViewFileSystemOverloadSchemeWithHdfsScheme.java:622) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.
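In JUnit 4 terms the change HDFS-16281 proposes is just a larger timeout attribute; a minimal sketch (the test body is a placeholder, not the real testNflyRepair):

{code:java}
import org.junit.Test;

public class TimeoutExample {
  // Before: @Test(timeout = 3000) — too tight for these RPC-heavy ViewFS
  // tests. After: 30s, consistent with the other methods in the class.
  @Test(timeout = 30000)
  public void testNflyRepair() throws Exception {
    // placeholder; the real test exercises Nfly mount-point repair
  }
}
{code}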
[jira] [Updated] (HDFS-16194) Simplify the code with DatanodeID#getXferAddrWithHostname
[ https://issues.apache.org/jira/browse/HDFS-16194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16194: -- Component/s: datanode metrics namenode Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Simplify the code with DatanodeID#getXferAddrWithHostname > > > Key: HDFS-16194 > URL: https://issues.apache.org/jira/browse/HDFS-16194 > Project: Hadoop HDFS > Issue Type: Wish > Components: datanode, metrics, namenode >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Simplify the code with DatanodeID#getXferAddrWithHostname. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
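A small sketch of the simplification; the DatanodeID accessors shown are real Hadoop APIs, while the surrounding method is illustrative:

{code:java}
import org.apache.hadoop.hdfs.protocol.DatanodeID;

public class XferAddrExample {
  static String describe(DatanodeID dn) {
    // Before: the address was assembled by hand at each call site.
    String manual = dn.getHostName() + ":" + dn.getXferPort();
    // After: one accessor produces the same hostname:port string.
    String simplified = dn.getXferAddrWithHostname();
    assert manual.equals(simplified);
    return simplified;
  }
}
{code}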
[jira] [Updated] (HDFS-16131) Show storage type for failed volumes on namenode web
[ https://issues.apache.org/jira/browse/HDFS-16131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16131: -- Component/s: namenode ui Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Show storage type for failed volumes on namenode web > > > Key: HDFS-16131 > URL: https://issues.apache.org/jira/browse/HDFS-16131 > Project: Hadoop HDFS > Issue Type: Wish > Components: namenode, ui >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: failed-volumes.jpg > > Time Spent: 1h 10m > Remaining Estimate: 0h > > To make it easy to query the storage type for failed volumes, we can display > them on namenode web. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16110) Remove unused method reportChecksumFailure in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-16110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16110: -- Component/s: dfsclient Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Remove unused method reportChecksumFailure in DFSClient > --- > > Key: HDFS-16110 > URL: https://issues.apache.org/jira/browse/HDFS-16110 > Project: Hadoop HDFS > Issue Type: Wish > Components: dfsclient >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Remove unused method reportChecksumFailure and fix some code styles by the > way in DFSClient. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16106) Fix flaky unit test TestDFSShell
[ https://issues.apache.org/jira/browse/HDFS-16106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16106: -- Component/s: test Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix flaky unit test TestDFSShell > > > Key: HDFS-16106 > URL: https://issues.apache.org/jira/browse/HDFS-16106 > Project: Hadoop HDFS > Issue Type: Wish > Components: test >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > This unit test occasionally fails. > The value set for dfs.namenode.accesstime.precision is too low; as a result, > the access time can be updated several times while the method executes, > eventually leading to a failed assertion. > IMO, dfs.namenode.accesstime.precision should be greater than or equal to the > timeout (120s) of TestDFSShell#testCopyCommandsWithPreserveOption(), or > set directly to 0 to disable this feature. > > {code:java} > [ERROR] Tests run: 52, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: > 106.778 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDFSShell[ERROR] Tests > run: 52, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 106.778 s <<< > FAILURE! - in org.apache.hadoop.hdfs.TestDFSShell [ERROR] > testCopyCommandsWithPreserveOption(org.apache.hadoop.hdfs.TestDFSShell) Time > elapsed: 2.353 s <<< FAILURE! java.lang.AssertionError: > expected:<1625095098319> but was:<1625095099374> at > org.junit.Assert.fail(Assert.java:89) at > org.junit.Assert.failNotEquals(Assert.java:835) at > org.junit.Assert.assertEquals(Assert.java:647) at > org.junit.Assert.assertEquals(Assert.java:633) at > org.apache.hadoop.hdfs.TestDFSShell.testCopyCommandsWithPreserveOption(TestDFSShell.java:2282) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:748) > [ERROR] > testCopyCommandsWithPreserveOption(org.apache.hadoop.hdfs.TestDFSShell) Time > elapsed: 2.467 s <<< FAILURE!
java.lang.AssertionError: > expected:<1625095192527> but was:<1625095193950> at > org.junit.Assert.fail(Assert.java:89) at > org.junit.Assert.failNotEquals(Assert.java:835) at > org.junit.Assert.assertEquals(Assert.java:647) at > org.junit.Assert.assertEquals(Assert.java:633) at > org.apache.hadoop.hdfs.TestDFSShell.testCopyCommandsWithPreserveOption(TestDFSShell.java:2323) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:748) > [ERROR] > testCopyCommandsWithPreserveOption(org.apache.hadoop.hdfs.TestDFSShell) Time > elapsed: 2.173 s <<< FAILURE! java.lang.AssertionError: > expected:<1625095196756> but was:<1625095197975> at > org.junit.Assert.fail(Assert.java:89) at > org.junit.Assert.failNotEquals(Assert.java:835) at > org.junit.Assert.assertEquals(Assert.java:647) at > org.junit.Assert.assertEquals(Assert.java:633) at > org.apache.hadoop.hdfs.TestDFSShell.tes
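A hedged sketch of the second remedy the report suggests, disabling access-time updates in the test configuration (whether the committed patch did exactly this is not confirmed here; the config key itself is the real DFSConfigKeys constant):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class AccessTimeExample {
  public static void main(String[] args) {
    Configuration conf = new HdfsConfiguration();
    // 0 disables access-time updates entirely, so reads during the test
    // can no longer bump atime and break the preserve-option assertions.
    conf.setLong(DFSConfigKeys.DFS_NAMENODE_ACCESSTIME_PRECISION_KEY, 0L);
    System.out.println(conf.getLong(
        DFSConfigKeys.DFS_NAMENODE_ACCESSTIME_PRECISION_KEY, 3600000L));
  }
}
{code}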
[jira] [Updated] (HDFS-16089) EC: Add metric EcReconstructionValidateTimeMillis for StripedBlockReconstructor
[ https://issues.apache.org/jira/browse/HDFS-16089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16089: -- Component/s: erasure-coding metrics Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.4.0 Affects Version/s: 3.3.2 3.4.0 > EC: Add metric EcReconstructionValidateTimeMillis for > StripedBlockReconstructor > --- > > Key: HDFS-16089 > URL: https://issues.apache.org/jira/browse/HDFS-16089 > Project: Hadoop HDFS > Issue Type: Wish > Components: erasure-coding, metrics >Affects Versions: 3.4.0, 3.3.2 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Add metric EcReconstructionValidateTimeMillis for StripedBlockReconstructor, > so that we can measure the elapsed time of striped block reconstruction. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16104) Remove unused parameter and fix java doc for DiskBalancerCLI
[ https://issues.apache.org/jira/browse/HDFS-16104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16104: -- Component/s: diskbalancer documentation Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Remove unused parameter and fix java doc for DiskBalancerCLI > > > Key: HDFS-16104 > URL: https://issues.apache.org/jira/browse/HDFS-16104 > Project: Hadoop HDFS > Issue Type: Wish > Components: diskbalancer, documentation >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Remove unused parameter and fix java doc for DiskBalancerCLI. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16079) Improve the block state change log
[ https://issues.apache.org/jira/browse/HDFS-16079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16079: -- Component/s: block placement Target Version/s: 3.3.2, 3.4.0 Affects Version/s: 3.3.2 3.4.0 > Improve the block state change log > -- > > Key: HDFS-16079 > URL: https://issues.apache.org/jira/browse/HDFS-16079 > Project: Hadoop HDFS > Issue Type: Wish > Components: block placement >Affects Versions: 3.4.0, 3.3.2 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h > Remaining Estimate: 0h > > Improve the block state change log. Add readOnlyReplicas and > replicasOnStaleNodes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16078) Remove unused parameters for DatanodeManager.handleLifeline()
[ https://issues.apache.org/jira/browse/HDFS-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16078: -- Component/s: namenode Target Version/s: 3.3.2, 3.2.3, 3.4.0 Affects Version/s: 3.3.2 3.2.3 3.4.0 > Remove unused parameters for DatanodeManager.handleLifeline() > - > > Key: HDFS-16078 > URL: https://issues.apache.org/jira/browse/HDFS-16078 > Project: Hadoop HDFS > Issue Type: Wish > Components: namenode >Affects Versions: 3.4.0, 3.2.3, 3.3.2 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Remove unused parameters (blockPoolId, maxTransfers) for > DatanodeManager.handleLifeline(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15991) Add location into datanode info for NameNodeMXBean
[ https://issues.apache.org/jira/browse/HDFS-15991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15991: -- Component/s: metrics namenode Hadoop Flags: Reviewed Target Version/s: 3.3.1, 3.4.0 Affects Version/s: 3.3.1 3.4.0 > Add location into datanode info for NameNodeMXBean > -- > > Key: HDFS-15991 > URL: https://issues.apache.org/jira/browse/HDFS-15991 > Project: Hadoop HDFS > Issue Type: Wish > Components: metrics, namenode >Affects Versions: 3.3.1, 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Add location into datanode info for NameNodeMXBean. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-16535) SlotReleaser should reuse the domain socket based on socket paths
[ https://issues.apache.org/jira/browse/HDFS-16535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan reassigned HDFS-16535: - Assignee: Quanlong Huang > SlotReleaser should reuse the domain socket based on socket paths > - > > Key: HDFS-16535 > URL: https://issues.apache.org/jira/browse/HDFS-16535 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.3.1, 3.4.0 >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.3 > > Time Spent: 2h > Remaining Estimate: 0h > > HDFS-13639 improves the performance of short-circuit shm slot releasing by > reusing the domain socket that the client previously used to send release > requests to the DataNode. > This works well when only one DataNode is co-located with the client (true in > most production environments). However, if we launch multiple DataNodes on one > machine (usually for testing, e.g. Impala's end-to-end tests), the request can > be sent to the wrong DataNode. See an example in > IMPALA-11234. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
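A hedged sketch of the path-keyed reuse the summary describes; the cache field and helper are hypothetical, only DomainSocket.connect(path) is the real Hadoop API:

{code:java}
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.apache.hadoop.net.unix.DomainSocket;

public class SlotReleaserSocketCache {
  // One cached socket per domain-socket path, so a slot-release request
  // always goes back to the DataNode that owns that path, even with
  // several DataNodes on the same machine.
  private final ConcurrentMap<String, DomainSocket> sockets =
      new ConcurrentHashMap<>();

  DomainSocket socketFor(String path) throws IOException {
    DomainSocket sock = sockets.get(path);
    if (sock == null) {
      sock = DomainSocket.connect(path);
      sockets.put(path, sock);
    }
    return sock;
  }
}
{code}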
[jira] [Updated] (HDFS-15951) Remove unused parameters in NameNodeProxiesClient
[ https://issues.apache.org/jira/browse/HDFS-15951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15951: -- Component/s: hdfs-client Hadoop Flags: Reviewed Target Version/s: 3.3.1, 3.4.0 Affects Version/s: 3.3.1 3.4.0 > Remove unused parameters in NameNodeProxiesClient > - > > Key: HDFS-15951 > URL: https://issues.apache.org/jira/browse/HDFS-15951 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs-client >Affects Versions: 3.3.1, 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Remove unused parameters in org.apache.hadoop.hdfs.NameNodeProxiesClient. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15975) Use LongAdder instead of AtomicLong
[ https://issues.apache.org/jira/browse/HDFS-15975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15975: -- Component/s: metrics Hadoop Flags: Reviewed Target Version/s: 3.3.1, 3.4.0 Affects Version/s: 3.3.1 3.4.0 > Use LongAdder instead of AtomicLong > --- > > Key: HDFS-15975 > URL: https://issues.apache.org/jira/browse/HDFS-15975 > Project: Hadoop HDFS > Issue Type: Wish > Components: metrics >Affects Versions: 3.3.1, 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > When counting some indicators, we can use LongAdder instead of AtomicLong to > improve performance. The long value is not an atomic snapshot in LongAdder, > but I think we can tolerate that. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
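A minimal sketch of the swap HDFS-15975 describes; the counter name is illustrative:

{code:java}
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

public class CounterExample {
  // Before: every increment contends on one CAS'd memory cell.
  static final AtomicLong bytesWrittenOld = new AtomicLong();
  // After: increments are striped across cells; sum() folds them on read.
  static final LongAdder bytesWritten = new LongAdder();

  public static void main(String[] args) {
    bytesWritten.add(4096);
    bytesWritten.increment();
    // sum() is not an atomic snapshot under concurrent updates, which the
    // issue deems an acceptable trade-off for metrics.
    System.out.println(bytesWritten.sum()); // 4097
  }
}
{code}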
[jira] [Updated] (HDFS-15938) Fix java doc in FSEditLog
[ https://issues.apache.org/jira/browse/HDFS-15938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15938: -- Component/s: documentation Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix java doc in FSEditLog > - > > Key: HDFS-15938 > URL: https://issues.apache.org/jira/browse/HDFS-15938 > Project: Hadoop HDFS > Issue Type: Wish > Components: documentation >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Fix java doc in > org.apache.hadoop.hdfs.server.namenode.FSEditLog#logAddCacheDirectiveInfo. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15906) Close FSImage and FSNamesystem after formatting is complete
[ https://issues.apache.org/jira/browse/HDFS-15906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15906: -- Component/s: namenode Hadoop Flags: Reviewed Target Version/s: 3.2.3, 3.3.1, 3.4.0 Affects Version/s: 3.2.3 3.3.1 3.4.0 > Close FSImage and FSNamesystem after formatting is complete > --- > > Key: HDFS-15906 > URL: https://issues.apache.org/jira/browse/HDFS-15906 > Project: Hadoop HDFS > Issue Type: Wish > Components: namenode >Affects Versions: 3.3.1, 3.4.0, 3.2.3 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Close FSImage and FSNamesystem after formatting is complete; see > org.apache.hadoop.hdfs.server.namenode.NameNode#format. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15892) Add metric for editPendingQ in FSEditLogAsync
[ https://issues.apache.org/jira/browse/HDFS-15892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15892: -- Component/s: metrics Hadoop Flags: Reviewed Target Version/s: 3.2.3, 3.3.1, 3.4.0 Affects Version/s: 3.2.3 3.3.1 3.4.0 > Add metric for editPendingQ in FSEditLogAsync > - > > Key: HDFS-15892 > URL: https://issues.apache.org/jira/browse/HDFS-15892 > Project: Hadoop HDFS > Issue Type: Wish > Components: metrics >Affects Versions: 3.3.1, 3.4.0, 3.2.3 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > To monitor editPendingQ in FSEditLogAsync, add a metric > and log a message when the queue is full. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
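A hedged sketch of what such a metric could look like using Hadoop's metrics2 annotations; the class and metric names are assumptions, not the committed patch:

{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableGaugeInt;

@Metrics(name = "FSEditLogAsyncMetrics", context = "dfs")
public class FSEditLogAsyncMetricsSketch {
  @Metric("Current depth of the editPendingQ queue")
  MutableGaugeInt editPendingQueueSize;

  static FSEditLogAsyncMetricsSketch create() {
    return DefaultMetricsSystem.instance()
        .register(new FSEditLogAsyncMetricsSketch());
  }

  // Would be called from the enqueue/dequeue paths (illustrative).
  void onQueueSizeChanged(int size) {
    editPendingQueueSize.set(size);
  }
}
{code}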
[jira] [Updated] (HDFS-15870) Remove unused configuration dfs.namenode.stripe.min
[ https://issues.apache.org/jira/browse/HDFS-15870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15870: -- Component/s: configuration Hadoop Flags: Reviewed Target Version/s: 3.2.3, 3.3.1, 3.4.0 Affects Version/s: 3.2.3 3.3.1 3.4.0 > Remove unused configuration dfs.namenode.stripe.min > --- > > Key: HDFS-15870 > URL: https://issues.apache.org/jira/browse/HDFS-15870 > Project: Hadoop HDFS > Issue Type: Wish > Components: configuration >Affects Versions: 3.3.1, 3.4.0, 3.2.3 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Remove unused configuration dfs.namenode.stripe.min. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15854) Make some parameters configurable for SlowDiskTracker and SlowPeerTracker
[ https://issues.apache.org/jira/browse/HDFS-15854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15854: -- Component/s: block placement Target Version/s: 3.3.5, 3.4.0 Affects Version/s: 3.3.5 3.4.0 > Make some parameters configurable for SlowDiskTracker and SlowPeerTracker > - > > Key: HDFS-15854 > URL: https://issues.apache.org/jira/browse/HDFS-15854 > Project: Hadoop HDFS > Issue Type: Wish > Components: block placement >Affects Versions: 3.4.0, 3.3.5 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Make some parameters configurable for SlowDiskTracker and SlowPeerTracker. > Related to https://issues.apache.org/jira/browse/HDFS-15814. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13274) RBF: Extend RouterRpcClient to use multiple sockets
[ https://issues.apache.org/jira/browse/HDFS-13274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-13274: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > RBF: Extend RouterRpcClient to use multiple sockets > --- > > Key: HDFS-13274 > URL: https://issues.apache.org/jira/browse/HDFS-13274 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > HADOOP-13144 introduces the ability to create multiple connections for the > same user and use different sockets. The RouterRpcClient should use this > approach to get a better throughput. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16598) Fix DataNode FsDatasetImpl lock issue without GS checks.
[ https://issues.apache.org/jira/browse/HDFS-16598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16598: -- Component/s: datanode Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix DataNode FsDatasetImpl lock issue without GS checks. > > > Key: HDFS-16598 > URL: https://issues.apache.org/jira/browse/HDFS-16598 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 3.4.0 >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 5h > Remaining Estimate: 0h > > org.apache.hadoop.hdfs.testPipelineRecoveryOnRestartFailure failed with a > stack trace like: > {code:java} > java.io.IOException: All datanodes > [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] > are bad. Aborting... > at > org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1667) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1601) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1587) > at > org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1371) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:674) > {code} > After tracing the root cause, this bug turned out to have been introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534], because the > client's block generation stamp (GS) may be smaller than the DataNode's when > pipeline recovery fails. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16600) Fix deadlock of fine-grained lock for FsDatasetImpl of DataNode.
[ https://issues.apache.org/jira/browse/HDFS-16600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16600: -- Component/s: datanode Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix deadlock of fine-grained lock for FsDatasetImpl of DataNode. > - > > Key: HDFS-16600 > URL: https://issues.apache.org/jira/browse/HDFS-16600 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 3.4.0 >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > The UT > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction > failed because of a deadlock introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. > Deadlock: > {code:java} > // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 > needs a read lock > try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl, > b.getBlockPoolId())) > // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line > 3526 needs a write lock > try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, > bpid)) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
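For intuition, a self-contained demonstration of the general read-to-write upgrade deadlock the snippet above describes, using a plain ReentrantReadWriteLock rather than the DataNode's lock manager:

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockUpgradeDemo {
  public static void main(String[] args) {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    lock.readLock().lock(); // like createRbw taking the block-pool read lock
    try {
      // like evictBlocks then requesting the write lock on the same pool:
      // a blocking lock() here would never return, because the write lock
      // cannot be granted while this thread still holds the read lock.
      boolean upgraded = lock.writeLock().tryLock();
      System.out.println("upgraded while holding read lock? " + upgraded); // false
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}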
[jira] [Updated] (HDFS-16526) Add metrics for slow DataNode
[ https://issues.apache.org/jira/browse/HDFS-16526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16526: -- Component/s: datanode metrics Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add metrics for slow DataNode > - > > Key: HDFS-16526 > URL: https://issues.apache.org/jira/browse/HDFS-16526 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, metrics >Affects Versions: 3.4.0 >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: Metrics-html.png > > Time Spent: 2.5h > Remaining Estimate: 0h > > Add some more metrics for slow datanode operations - FlushOrSync, > PacketResponder send ACK. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16488) [SPS]: Expose metrics to JMX for external SPS
[ https://issues.apache.org/jira/browse/HDFS-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16488: -- Component/s: metrics sps Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [SPS]: Expose metrics to JMX for external SPS > - > > Key: HDFS-16488 > URL: https://issues.apache.org/jira/browse/HDFS-16488 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: metrics, sps >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2022-02-26-22-15-25-543.png > > Time Spent: 5h > Remaining Estimate: 0h > > Currently, external SPS has no monitoring metrics. We do not know how many > blocks are waiting to be processed, how many blocks are waiting to be > retried, and how many blocks have been migrated. > We can expose these metrics in JMX for easy collection and display by > monitoring systems. > !image-2022-02-26-22-15-25-543.png|width=631,height=170! > For example, in our cluster, we exposed these metrics to JMX, collected by > JMX-Exporter and combined with Prometheus, and finally display by Grafana. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16460) [SPS]: Handle failure retries for moving tasks
[ https://issues.apache.org/jira/browse/HDFS-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16460: -- Component/s: sps Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [SPS]: Handle failure retries for moving tasks > -- > > Key: HDFS-16460 > URL: https://issues.apache.org/jira/browse/HDFS-16460 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: sps >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Handle failure retries for moving tasks. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16484) [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread
[ https://issues.apache.org/jira/browse/HDFS-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16484: -- Component/s: sps Hadoop Flags: Reviewed Target Version/s: 3.3.5, 3.2.4, 3.4.0 Affects Version/s: 3.3.5 3.2.4 3.4.0 > [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread > - > > Key: HDFS-16484 > URL: https://issues.apache.org/jira/browse/HDFS-16484 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: sps >Affects Versions: 3.4.0, 3.2.4, 3.3.5 >Reporter: qinyuren >Assignee: qinyuren >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.5 > > Attachments: image-2022-02-25-14-35-42-255.png > > Time Spent: 4h 10m > Remaining Estimate: 0h > > We ran SPS in our cluster and found this log: the SPSPathIdProcessor thread > enters an infinite loop and prints the same message over and over. > !image-2022-02-25-14-35-42-255.png|width=682,height=195! > If the SPSPathIdProcessor thread gets an inodeId whose path does not exist, > it enters an infinite loop and can no longer make progress. > The reason is that ctxt.getNextSPSPath() returns an inodeId whose path does > not exist; startINode is never reset to null, so the thread holds this > inodeId forever. > {code:java} > public void run() { > LOG.info("Starting SPSPathIdProcessor!."); > Long startINode = null; > while (ctxt.isRunning()) { > try { > if (!ctxt.isInSafeMode()) { > if (startINode == null) { > startINode = ctxt.getNextSPSPath(); > } // else same id will be retried > if (startINode == null) { > // Waiting for SPS path > Thread.sleep(3000); > } else { > ctxt.scanAndCollectFiles(startINode); > // check if directory was empty and no child added to queue > DirPendingWorkInfo dirPendingWorkInfo = > pendingWorkForDirectory.get(startINode); > if (dirPendingWorkInfo != null > && dirPendingWorkInfo.isDirWorkDone()) { > ctxt.removeSPSHint(startINode); > pendingWorkForDirectory.remove(startINode); > } > } > startINode = null; // Current inode successfully scanned. > } > } catch (Throwable t) { > String reClass = t.getClass().getName(); > if (InterruptedException.class.getName().equals(reClass)) { > LOG.info("SPSPathIdProcessor thread is interrupted. Stopping.."); > break; > } > LOG.warn("Exception while scanning file inodes to satisfy the policy", > t); > try { > Thread.sleep(3000); > } catch (InterruptedException e) { > LOG.info("Interrupted while waiting in SPSPathIdProcessor", t); > break; > } > } > } > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15987) Improve oiv tool to parse fsimage file in parallel with delimited format
[ https://issues.apache.org/jira/browse/HDFS-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15987: -- Component/s: tools Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Improve oiv tool to parse fsimage file in parallel with delimited format > > > Key: HDFS-15987 > URL: https://issues.apache.org/jira/browse/HDFS-15987 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.4.0 >Reporter: Hongbing Wang >Assignee: Hongbing Wang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: Improve_oiv_tool_001.pdf > > Time Spent: 6h 40m > Remaining Estimate: 0h > > The purpose of this Jira is to improve the oiv tool to parse fsimage files > with sub-sections (see -HDFS-14617-) in parallel with the Delimited format. > 1. Serial parsing is time-consuming > The time to serially parse a large fsimage with the Delimited format (e.g. > `hdfs oiv -p Delimited -t ...`) breaks down as follows:
{code:java}
1) Loading string table:                 -> Not time consuming.
2) Loading inode references:             -> Not time consuming
3) Loading directories in INode section: -> Slightly time consuming (3%)
4) Loading INode directory section:      -> A bit time consuming (11%)
5) Output:                               -> Very time consuming (86%)
{code}
> Therefore, output is the stage most worth parallelizing. > 2. How to output in parallel > The sub-sections are grouped in order; each thread processes one group and > writes to its own output file, and the per-thread output files are merged at > the end. > 3. The result of a test
{code:java}
input fsimage file info:
3.4G, 12 sub-sections, 55976500 INodes
---------------------------------------------
Threads   TotalTime   OutputTime   MergeTime
1         18m37s      16m18s       –
4         8m7s        4m49s        41s
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16477) [SPS]: Add metric PendingSPSPaths for getting the number of paths to be processed by SPS
[ https://issues.apache.org/jira/browse/HDFS-16477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16477: -- Component/s: sps Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [SPS]: Add metric PendingSPSPaths for getting the number of paths to be > processed by SPS > > > Key: HDFS-16477 > URL: https://issues.apache.org/jira/browse/HDFS-16477 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: sps >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 7h 50m > Remaining Estimate: 0h > > Currently we have no idea how many paths are waiting to be processed when > using the SPS feature. We should add metric PendingSPSPaths for getting the > number of paths to be processed by SPS in NameNode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16499) [SPS]: Should not start indefinitely while another SPS process is running
[ https://issues.apache.org/jira/browse/HDFS-16499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16499: -- Component/s: sps Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [SPS]: Should not start indefinitely while another SPS process is running > - > > Key: HDFS-16499 > URL: https://issues.apache.org/jira/browse/HDFS-16499 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: sps >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Normally, we can only start one SPS process at a time. Currently, when one process is running and another is started, the second one retries indefinitely. I think, in this case, it should exit immediately. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
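As a generic illustration of the fail-fast behavior proposed here (not the mechanism the SPS patch itself uses, which presumably checks against the NameNode whether an external SPS is already registered), a single-instance guard can look like:
{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class SingleInstanceGuard {
  public static void main(String[] args) throws IOException {
    // Non-blocking lock attempt: if another process already holds the
    // lock, exit immediately instead of retrying forever.
    FileChannel channel = FileChannel.open(Path.of("/tmp/sps.lock"),
        StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    FileLock lock = channel.tryLock();
    if (lock == null) {
      System.err.println("Another SPS process is already running; exiting.");
      System.exit(1);
    }
    // ... run the service while holding the lock ...
  }
}
{code}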
[jira] [Updated] (HDFS-13248) RBF: Namenode need to choose block location for the client
[ https://issues.apache.org/jira/browse/HDFS-13248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-13248: -- Component/s: rbf Hadoop Flags: Reviewed Target Version/s: 3.3.5, 2.10.2, 3.4.0 Affects Version/s: 3.3.5 2.10.2 3.4.0 > RBF: Namenode need to choose block location for the client > -- > > Key: HDFS-13248 > URL: https://issues.apache.org/jira/browse/HDFS-13248 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0, 2.10.2, 3.3.5 >Reporter: Wu Weiwei >Assignee: Owen O'Malley >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 2.10.2, 3.3.5 > > Attachments: HDFS-13248.000.patch, HDFS-13248.001.patch, > HDFS-13248.002.patch, HDFS-13248.003.patch, HDFS-13248.004.patch, > HDFS-13248.005.patch, HDFS-Router-Data-Locality.odt, RBF Data Locality > Design.pdf, clientMachine-call-path.jpeg, debug-info-1.jpeg, debug-info-2.jpeg > > Time Spent: 5h 10m > Remaining Estimate: 0h > > When executing a put operation via the router, the NameNode will choose block > locations for the router, not for the real client. This will affect the file's > locality. > I think on both the NameNode and the Router, we should add a new addBlock method, or > add a parameter to the current addBlock method, to pass the real client > information. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16458) [SPS]: Fix bug for unit test of reconfiguring SPS mode
[ https://issues.apache.org/jira/browse/HDFS-16458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16458: -- Component/s: sps test Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [SPS]: Fix bug for unit test of reconfiguring SPS mode > -- > > Key: HDFS-16458 > URL: https://issues.apache.org/jira/browse/HDFS-16458 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: sps, test >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h > Remaining Estimate: 0h > > TestNameNodeReconfigure#verifySPSEnabled compared {*}isSPSRunning{*} with itself in assertEquals. > In addition, after an *internal SPS* has been removed, the *spsService daemon* will not start within StoragePolicySatisfyManager. I think the relevant code can be removed to simplify the code. > IMO, after reconfiguring the SPS mode, we just need to confirm whether the mode is correct and whether spsManager is null. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
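A minimal illustration of the vacuous-assertion bug and the intended check; names here are hypothetical, as the real test operates on NameNode internals.
{code:java}
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNull;

public class VerifySpsStateSketch {
  // Broken form from the report: a value compared with itself can never
  // fail, so the assertion checks nothing:
  //   assertEquals(isSPSRunning, isSPSRunning);

  // Intended form: compare the state observed after reconfiguration with
  // the expected state, and assert spsManager is cleared when disabled.
  static void verifySpsState(String expectedMode, String actualMode,
      Object spsManager) {
    assertEquals(expectedMode, actualMode);
    if ("none".equals(expectedMode)) {
      assertNull("spsManager should be null when SPS is disabled", spsManager);
    }
  }
}
{code}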
[jira] [Updated] (HDFS-16222) Fix ViewDFS with mount points for HDFS only API
[ https://issues.apache.org/jira/browse/HDFS-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16222: -- Component/s: viewfs Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix ViewDFS with mount points for HDFS only API > --- > > Key: HDFS-16222 > URL: https://issues.apache.org/jira/browse/HDFS-16222 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: viewfs >Affects Versions: 3.4.0 >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: test_to_repro.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > Presently, for HDFS-specific APIs (the ones not present in ViewFileSystem), the resolved path seems to come out wrong. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16231) Fix TestDataNodeMetrics#testReceivePacketSlowMetrics
[ https://issues.apache.org/jira/browse/HDFS-16231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16231: -- Component/s: datanode metrics Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix TestDataNodeMetrics#testReceivePacketSlowMetrics > > > Key: HDFS-16231 > URL: https://issues.apache.org/jira/browse/HDFS-16231 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, metrics >Affects Versions: 3.4.0 >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > TestDataNodeMetrics#testReceivePacketSlowMetrics fails with stacktrace: > {code:java} > java.lang.AssertionError: Expected exactly one metric for name > TotalPacketsReceived > Expected :1 > Actual :0 > > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.failNotEquals(Assert.java:835) > at org.junit.Assert.assertEquals(Assert.java:647) > at > org.apache.hadoop.test.MetricsAsserts.checkCaptured(MetricsAsserts.java:278) > at > org.apache.hadoop.test.MetricsAsserts.getLongCounter(MetricsAsserts.java:237) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics.testReceivePacketSlowMetrics(TestDataNodeMetrics.java:200) > {code} > {code:java} > // Error MetricsName in current code,e.g > TotalPacketsReceived,TotalPacketsSlowWriteToMirror,TotalPacketsSlowWriteToDisk,TotalPacketsSlowWriteToOsCache > MetricsRecordBuilder dnMetrics = > getMetrics(datanode.getMetrics().name()); > assertTrue("More than 1 packet received", > getLongCounter("TotalPacketsReceived", dnMetrics) > 1L); > assertTrue("More than 1 slow packet to mirror", > getLongCounter("TotalPacketsSlowWriteToMirror", dnMetrics) > 1L); > assertCounter("TotalPacketsSlowWriteToDisk", 1L, dnMetrics); > assertCounter("TotalPacketsSlowWriteToOsCache", 0L, dnMetrics); > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16192) ViewDistributedFileSystem#rename wrongly using src in the place of dst.
[ https://issues.apache.org/jira/browse/HDFS-16192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16192: -- Component/s: viewfs Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.4.0 Affects Version/s: 3.3.2 3.4.0 > ViewDistributedFileSystem#rename wrongly using src in the place of dst. > --- > > Key: HDFS-16192 > URL: https://issues.apache.org/jira/browse/HDFS-16192 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: viewfs >Affects Versions: 3.4.0, 3.3.2 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > In ViewDistributedFileSystem, we are mistakenly used src path in the place of > dst path when finding mount path info. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15671) TestBalancerRPCDelay#testBalancerRPCDelayQpsDefault fails on Trunk
[ https://issues.apache.org/jira/browse/HDFS-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15671: -- Component/s: balancer test Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > TestBalancerRPCDelay#testBalancerRPCDelayQpsDefault fails on Trunk > -- > > Key: HDFS-15671 > URL: https://issues.apache.org/jira/browse/HDFS-15671 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer, test >Affects Versions: 3.4.0 >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault.log > > Time Spent: 0.5h > Remaining Estimate: 0h > > qbt report shows failures on TestBalancer > {code:bash} > org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault > Failing for the past 1 build (Since Failed#317 ) > Took 45 sec. > Error Message > Timed out waiting for /tmp.txt to reach 20 replicas > Stacktrace > java.util.concurrent.TimeoutException: Timed out waiting for /tmp.txt to > reach 20 replicas > at > org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:829) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.createFile(TestBalancer.java:319) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:865) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerRPCDelay(TestBalancer.java:2193) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault(TestBalancerRPCDelay.java:53) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15973) RBF: Add permission check before doing router federation rename.
[ https://issues.apache.org/jira/browse/HDFS-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15973: -- Component/s: rbf Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > RBF: Add permission check before doing router federation rename. > > > Key: HDFS-15973 > URL: https://issues.apache.org/jira/browse/HDFS-15973 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15973.001.patch, HDFS-15973.002.patch, > HDFS-15973.003.patch, HDFS-15973.004.patch, HDFS-15973.005.patch, > HDFS-15973.006.patch, HDFS-15973.007.patch, HDFS-15973.008.patch, > HDFS-15973.009.patch, HDFS-15973.010.patch > > > The router federation rename lacks a permission check. It is a security > issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
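A hedged sketch of the kind of guard this adds; the checkAccess helper is hypothetical, and the actual patch enforces permissions through the downstream namespaces rather than this shape.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.security.UserGroupInformation;

public abstract class FederationRenamePermissionSketch {
  // Assumed helper: throws AccessControlException if 'user' lacks 'action'
  // on 'path' in the owning namespace.
  protected abstract void checkAccess(UserGroupInformation user, Path path,
      FsAction action) throws IOException;

  // Require write access to both parents before submitting the rename job,
  // mirroring what a plain HDFS rename would enforce.
  public void checkRenamePermission(UserGroupInformation caller, Path src,
      Path dst) throws IOException {
    checkAccess(caller, src.getParent(), FsAction.WRITE);
    checkAccess(caller, dst.getParent(), FsAction.WRITE);
  }
}
{code}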
[jira] [Updated] (HDFS-13975) TestBalancer#testMaxIterationTime fails sporadically
[ https://issues.apache.org/jira/browse/HDFS-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-13975: -- Component/s: balancer test Target Version/s: 3.2.3, 2.10.2, 3.3.1, 3.4.0 > TestBalancer#testMaxIterationTime fails sporadically > > > Key: HDFS-13975 > URL: https://issues.apache.org/jira/browse/HDFS-13975 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer, test >Affects Versions: 3.2.0 >Reporter: Jason Darrell Lowe >Assignee: Toshihiko Uchida >Priority: Major > Labels: flaky-test, pull-request-available > Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3 > > Time Spent: 40m > Remaining Estimate: 0h > > A number of precommit builds have seen this test fail like this: > {noformat} > java.lang.AssertionError: Unexpected iteration runtime: 4021ms > 3.5s > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.testMaxIterationTime(TestBalancer.java:1649) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15848) Snapshot Operations: Add debug logs at the entry point
[ https://issues.apache.org/jira/browse/HDFS-15848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15848: -- Component/s: snapshots Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Snapshot Operations: Add debug logs at the entry point > -- > > Key: HDFS-15848 > URL: https://issues.apache.org/jira/browse/HDFS-15848 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Bhavik Patel >Assignee: Bhavik Patel >Priority: Minor > Fix For: 3.4.0 > > Attachments: HDFS-15848.001.patch, HDFS-15848.002.patch, > HDFS-15848.003.patch, HDFS-15848.004.patch > > > Add debug logs at the entry point for various Snapshot Operations -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
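The usual entry-point pattern for this in Hadoop is parameterized SLF4J debug logging, e.g. (sketch; the exact messages and operations covered by the patch may differ):
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SnapshotOpLoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(SnapshotOpLoggingSketch.class);

  public void createSnapshot(String snapshotRoot, String snapshotName) {
    // Parameterized logging avoids string building when DEBUG is off; the
    // explicit guard is optional with SLF4J but common in HDFS code.
    if (LOG.isDebugEnabled()) {
      LOG.debug("createSnapshot: snapshotRoot={}, snapshotName={}",
          snapshotRoot, snapshotName);
    }
    // ... existing snapshot creation logic ...
  }
}
{code}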
[jira] [Updated] (HDFS-15847) create client protocol: add ecPolicyName & storagePolicy param to debug statement string
[ https://issues.apache.org/jira/browse/HDFS-15847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15847: -- Component/s: erasure-coding namanode Target Version/s: 3.3.1, 3.4.0 Affects Version/s: 3.3.1 3.4.0 > create client protocol: add ecPolicyName & storagePolicy param to debug > statement string > - > > Key: HDFS-15847 > URL: https://issues.apache.org/jira/browse/HDFS-15847 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, namanode >Affects Versions: 3.3.1, 3.4.0 >Reporter: Bhavik Patel >Assignee: Bhavik Patel >Priority: Minor > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15847.0001.patch > > > A create (ClientProtocol) ==> namesystem.startFileInt does not print the > "ecPolicyName & storagePolicy" params. It would be good to have these params > added to the debug statement. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15834) Remove the usage of org.apache.log4j.Level
[ https://issues.apache.org/jira/browse/HDFS-15834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15834: -- Component/s: hdfs-common Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Remove the usage of org.apache.log4j.Level > -- > > Key: HDFS-15834 > URL: https://issues.apache.org/jira/browse/HDFS-15834 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-common >Affects Versions: 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Replace org.apache.log4j.Level with org.slf4j.event.Level in hadoop-hdfs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15820) Ensure snapshot root trash provisioning happens only post safe mode exit
[ https://issues.apache.org/jira/browse/HDFS-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15820: -- Component/s: snapshots Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Ensure snapshot root trash provisioning happens only post safe mode exit > > > Key: HDFS-15820 > URL: https://issues.apache.org/jira/browse/HDFS-15820 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently, on namenode startup, snapshot trash root provisioning starts along > with the trash emptier service, but the namenode might not be out of safe mode > by then. This can fail the snapshot trash dir creation, thereby crashing the > namenode. The idea here is to trigger snapshot trash provisioning only post > safe mode exit. > {code:java} > 2021-02-04 11:23:47,323 ERROR > org.apache.hadoop.hdfs.server.namenode.NameNode: Error encountered requiring > NN shutdown. Shutting down immediately. > org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create > directory /upgrade/.Trash. Name node is in safe mode. > The reported blocks 0 needs additional 1383 blocks to reach the threshold > 0.9990 of total blocks 1385. > The number of live datanodes 0 needs an additional 1 live datanodes to reach > the minimum number 1. > Safe mode will be turned off automatically once the thresholds have been > reached. NamenodeHostName:quasar-brabeg-5.quasar-brabeg.root.hwx.site > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.newSafemodeException(FSNamesystem.java:1542) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1529) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3288) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAndProvisionSnapshotTrashRoots(FSNamesystem.java:8269) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1939) > at > org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:967) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:936) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1673) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1740) > 2021-02-04 11:23:47,334 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot > create directory /upgrade/.Trash. Name node is in safe mode. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
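A minimal sketch of the gating described above, reusing the method names visible in the stack trace (isInSafeMode, checkAndProvisionSnapshotTrashRoots); the actual hook point chosen by the patch may differ.
{code:java}
// Sketch only: provision snapshot trash roots from the startup path only
// once the namenode has left safe mode, instead of unconditionally in
// startActiveServices where it can hit SafeModeException.
void maybeProvisionSnapshotTrashRoots() {
  if (isInSafeMode()) {
    // Defer: re-invoked from the safe-mode-exit path.
    return;
  }
  checkAndProvisionSnapshotTrashRoots();
}
{code}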
[jira] [Updated] (HDFS-15817) Rename snapshots while marking them deleted
[ https://issues.apache.org/jira/browse/HDFS-15817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15817: -- Component/s: snapshots Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Rename snapshots while marking them deleted > > > Key: HDFS-15817 > URL: https://issues.apache.org/jira/browse/HDFS-15817 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > With the ordered snapshot feature turned on, a snapshot will be just marked as > deleted but won't actually be deleted if it's not the oldest one. Since the > snapshot is just marked deleted, creating a new snapshot with the same > name as the one marked deleted will fail. To mitigate such > problems, the idea here is to rename the snapshot being marked as deleted > by appending the deletion timestamp along with the snapshot id to it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
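A small illustration of such a rename scheme; the separator and ordering here are made up, and the patch defines the actual format.
{code:java}
public final class DeletedSnapshotNameSketch {
  // Append the deletion timestamp and snapshot id so the original name
  // becomes immediately reusable for new snapshots.
  static String markDeletedName(String name, long deletionTimeMs, int snapshotId) {
    return name + "-deleted-" + deletionTimeMs + "-" + snapshotId;
  }

  public static void main(String[] args) {
    // e.g. "daily" -> "daily-deleted-1613999999000-42"
    System.out.println(markDeletedName("daily", 1613999999000L, 42));
  }
}
{code}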
[jira] [Updated] (HDFS-15767) RBF: Router federation rename of directory.
[ https://issues.apache.org/jira/browse/HDFS-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15767: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > RBF: Router federation rename of directory. > --- > > Key: HDFS-15767 > URL: https://issues.apache.org/jira/browse/HDFS-15767 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15767.001.patch, HDFS-15767.002.patch, > HDFS-15767.003.patch, HDFS-15767.004.patch, HDFS-15767.005.patch, > HDFS-15767.006.patch, HDFS-15767.007.patch > > > This Jira tries to support renaming a directory across namespaces using the > fedbalance framework. > We can do the router federation rename when: > # Both the src and dst have only one remote location. > # The src and dst remote locations are in different namespaces. > # The src is a directory (fedbalance depends on snapshots). > # The dst doesn't exist. > We can implement router federation rename of files in a new task so the patch > won't be too big to review. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
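A compact sketch of checking the four conditions listed above before choosing the federation rename path; the RemoteLocation shape is a stand-in, and the src type and dst existence would come from the downstream namespace.
{code:java}
import java.util.List;

public class FederationRenameCheckSketch {
  // Minimal stand-in for RBF's RemoteLocation (assumed shape).
  interface RemoteLocation {
    String getNameserviceId();
  }

  // Conditions 1-4 from the issue description, in order.
  static boolean canUseFederationRename(List<RemoteLocation> srcLocs,
      List<RemoteLocation> dstLocs, boolean srcIsDirectory, boolean dstExists) {
    return srcLocs.size() == 1 && dstLocs.size() == 1
        && !srcLocs.get(0).getNameserviceId()
            .equals(dstLocs.get(0).getNameserviceId())
        && srcIsDirectory   // fedbalance relies on snapshots on directories
        && !dstExists;
  }
}
{code}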
[jira] [Updated] (HDFS-15672) TestBalancerWithMultipleNameNodes#testBalancingBlockpoolsWithBlockPoolPolicy fails on trunk
[ https://issues.apache.org/jira/browse/HDFS-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15672: -- Component/s: balancer test Target Version/s: 3.2.3, 3.3.1, 3.4.0 Affects Version/s: 3.2.3 3.3.1 3.4.0 > TestBalancerWithMultipleNameNodes#testBalancingBlockpoolsWithBlockPoolPolicy > fails on trunk > --- > > Key: HDFS-15672 > URL: https://issues.apache.org/jira/browse/HDFS-15672 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer, test >Affects Versions: 3.3.1, 3.4.0, 3.2.3 >Reporter: Ahmed Hussein >Assignee: Masatake Iwasaki >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Time Spent: 1h > Remaining Estimate: 0h > > qbt report shows the following error: > {code:bash} > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancingBlockpoolsWithBlockPoolPolicy > Failing for the past 1 build (Since Failed#317 ) > Took 10 min. > Error Message > test timed out after 60 milliseconds > Stacktrace > org.junit.runners.model.TestTimedOutException: test timed out after 60 > milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.sleep(TestBalancerWithMultipleNameNodes.java:353) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.wait(TestBalancerWithMultipleNameNodes.java:159) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.runBalancer(TestBalancerWithMultipleNameNodes.java:175) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.runTest(TestBalancerWithMultipleNameNodes.java:550) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancingBlockpoolsWithBlockPoolPolicy(TestBalancerWithMultipleNameNodes.java:609) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15762) TestMultipleNNPortQOP#testMultipleNNPortOverwriteDownStream fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15762: -- Component/s: hdfs test Target Version/s: 3.2.3, 3.3.1, 3.4.0 Affects Version/s: 3.2.3 3.3.1 3.4.0 > TestMultipleNNPortQOP#testMultipleNNPortOverwriteDownStream fails > intermittently > > > Key: HDFS-15762 > URL: https://issues.apache.org/jira/browse/HDFS-15762 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, test >Affects Versions: 3.3.1, 3.4.0, 3.2.3 >Reporter: Toshihiko Uchida >Assignee: Toshihiko Uchida >Priority: Minor > Labels: flaky-test, pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Attachments: PR2585#1-TestMultipleNNPortQOP-output.txt > > Time Spent: 3h 40m > Remaining Estimate: 0h > > This unit test failed in https://github.com/apache/hadoop/pull/2585 due to an > AssertionError. > {code} > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.hdfs.TestMultipleNNPortQOP.testMultipleNNPortOverwriteDownStream(TestMultipleNNPortQOP.java:267) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} > The failure occurred at the following assertion. > {code} > doTest(fsPrivacy, PATH1); > for (int i = 0; i < 2; i++) { > DataNode dn = dataNodes.get(i); > SaslDataTransferClient saslClient = dn.getSaslClient(); > String qop = null; > // It may take some time for the qop to populate > // to all DNs, check in a loop. > for (int trial = 0; trial < 10; trial++) { > qop = saslClient.getTargetQOP(); > if (qop != null) { > break; > } > Thread.sleep(100); > } > assertEquals("auth", qop); > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14558) RBF: Isolation/Fairness documentation
[ https://issues.apache.org/jira/browse/HDFS-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-14558: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > RBF: Isolation/Fairness documentation > - > > Key: HDFS-14558 > URL: https://issues.apache.org/jira/browse/HDFS-14558 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: CR Hota >Assignee: Fengnan Li >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-14558.001.patch, HDFS-14558.002.patch, > HDFS-14558.003.patch > > > Documentation is needed to make users aware of this feature HDFS-14090. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15702) Fix intermittent failure of TestDecommission#testAllocAndIBRWhileDecommission
[ https://issues.apache.org/jira/browse/HDFS-15702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15702: -- Component/s: hdfs test Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix intermittent failure of TestDecommission#testAllocAndIBRWhileDecommission > -- > > Key: HDFS-15702 > URL: https://issues.apache.org/jira/browse/HDFS-15702 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, test >Affects Versions: 3.4.0 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > {noformat} > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.hdfs.TestDecommission.testAllocAndIBRWhileDecommission(TestDecommission.java:1025) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15766) RBF: MockResolver.getMountPoints() breaks the semantic of FileSubclusterResolver.
[ https://issues.apache.org/jira/browse/HDFS-15766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15766: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > RBF: MockResolver.getMountPoints() breaks the semantic of > FileSubclusterResolver. > - > > Key: HDFS-15766 > URL: https://issues.apache.org/jira/browse/HDFS-15766 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15766.001.patch, HDFS-15766.002.patch, > HDFS-15766.003.patch > > > MockResolver.getMountPoints() breaks the semantic of > FileSubclusterResolver.getMountPoints(). Currently it returns null when the > path is a mount point and no mount points are under the path. > {quote}Return zero-length list if the path is a mount point but there are no > mount points under the path. > {quote} > > This is required by router federation rename. I found this bug when writing > unit test for the rbf rename. Let's fix it here to avoid mixing up with the > router federation rename. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
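The contract fix described above reduces to distinguishing "mount point with no children" from "path not found"; a hedged sketch with assumed helper methods, not the real MockResolver internals:
{code:java}
import java.util.Collections;
import java.util.List;

public abstract class MountPointsSketch {
  protected abstract boolean isMountPoint(String path);           // assumed helper
  protected abstract List<String> childMountPoints(String path);  // assumed helper

  // FileSubclusterResolver contract: zero-length list when the path is a
  // mount point with no mount points under it; null only when the path is
  // not related to any mount point.
  public List<String> getMountPoints(String path) {
    List<String> children = childMountPoints(path);
    if (!children.isEmpty()) {
      return children;
    }
    return isMountPoint(path) ? Collections.emptyList() : null;
  }
}
{code}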
[jira] [Updated] (HDFS-15748) RBF: Move the router related part from hadoop-federation-balance module to hadoop-hdfs-rbf.
[ https://issues.apache.org/jira/browse/HDFS-15748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15748: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > RBF: Move the router related part from hadoop-federation-balance module to > hadoop-hdfs-rbf. > --- > > Key: HDFS-15748 > URL: https://issues.apache.org/jira/browse/HDFS-15748 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15748.001.patch, HDFS-15748.002.patch, > HDFS-15748.003.patch, HDFS-15748.004.patch > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15648) TestFileChecksum should be parameterized
[ https://issues.apache.org/jira/browse/HDFS-15648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15648: -- Component/s: test Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > TestFileChecksum should be parameterized > > > Key: HDFS-15648 > URL: https://issues.apache.org/jira/browse/HDFS-15648 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0 >Reporter: Ahmed Hussein >Assignee: Masatake Iwasaki >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > {{TestFileChecksumCompositeCrc}} extends {{TestFileChecksum}} overriding 3 > methods that return a constant flag True/False. > The class is useless and it causes confusion with two different jiras, while > the main bug should be in TestFileChecksum. > The {{TestFileChecksum}} should be parameterized -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
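For reference, the standard JUnit 4 shape for folding the CompositeCrc subclass back into the parent test as a parameter (illustrative only, not the actual patch):
{code:java}
import java.util.Arrays;
import java.util.Collection;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class TestFileChecksumSketch {
  @Parameters(name = "compositeCrc={0}")
  public static Collection<Object[]> data() {
    return Arrays.asList(new Object[][] {{false}, {true}});
  }

  private final boolean useCompositeCrc;

  public TestFileChecksumSketch(boolean useCompositeCrc) {
    this.useCompositeCrc = useCompositeCrc;
  }
  // ... tests branch on useCompositeCrc instead of overridden methods ...
}
{code}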
[jira] [Updated] (HDFS-15677) TestRouterRpcMultiDestination#testGetCachedDatanodeReport fails on trunk
[ https://issues.apache.org/jira/browse/HDFS-15677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15677: -- Component/s: rbf test Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > TestRouterRpcMultiDestination#testGetCachedDatanodeReport fails on trunk > > > Key: HDFS-15677 > URL: https://issues.apache.org/jira/browse/HDFS-15677 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf, test >Affects Versions: 3.4.0 >Reporter: Ahmed Hussein >Assignee: Masatake Iwasaki >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > qbt report (Nov 8, 2020, 11:28 AM) shows failures in > testGetCachedDatanodeReport -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15674) TestBPOfferService#testMissBlocksWhenReregister fails on trunk
[ https://issues.apache.org/jira/browse/HDFS-15674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15674: -- Component/s: datanode test Target Version/s: 3.3.6, 3.4.0 Affects Version/s: 3.3.6 3.4.0 > TestBPOfferService#testMissBlocksWhenReregister fails on trunk > -- > > Key: HDFS-15674 > URL: https://issues.apache.org/jira/browse/HDFS-15674 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, test >Affects Versions: 3.4.0, 3.3.6 >Reporter: Ahmed Hussein >Assignee: Masatake Iwasaki >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.6 > > Time Spent: 1h > Remaining Estimate: 0h > > qbt report (Nov 8, 2020, 11:28 AM) shows failures timing out in > testMissBlocksWhenReregister -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15643) EC: Fix checksum computation in case of native encoders
[ https://issues.apache.org/jira/browse/HDFS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15643: -- Component/s: erasure-coding Target Version/s: 3.2.3, 3.3.1, 3.2.2, 3.4.0 Affects Version/s: 3.2.3 3.3.1 3.2.2 3.4.0 > EC: Fix checksum computation in case of native encoders > --- > > Key: HDFS-15643 > URL: https://issues.apache.org/jira/browse/HDFS-15643 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.2.2, 3.3.1, 3.4.0, 3.2.3 >Reporter: Ahmed Hussein >Assignee: Ayush Saxena >Priority: Blocker > Labels: pull-request-available > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15643-01.patch, Test-Fix-01.patch, > TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, > org.apache.hadoop.hdfs.TestFileChecksum-output.txt, > org.apache.hadoop.hdfs.TestFileChecksum.txt > > Time Spent: 4h 40m > Remaining Estimate: 0h > > There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases > {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The > following is a sample of the stack trace in two of them Query7 and Query8. > {code:bash} > org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail > to get block checksum for > LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK], > > DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK], > > DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK], > > DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK], > > DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK], > > DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; > indices=[1, 2, 3, 4, 5, 6, 7, 8]} > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916) > at > org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} > > {code:bash} > Error Message > `/striped/stripedFileChecksum1': Fail to get block checksum for > LocatedStripedBlock{BP-1299291876-172.17.0.3-1
[jira] [Updated] (HDFS-15460) TestFileCreation#testServerDefaultsWithMinimalCaching fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15460: -- Component/s: hdfs test Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > TestFileCreation#testServerDefaultsWithMinimalCaching fails intermittently > -- > > Key: HDFS-15460 > URL: https://issues.apache.org/jira/browse/HDFS-15460 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, test >Affects Versions: 3.4.0 >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available, test > Fix For: 3.4.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > {{TestFileCreation.testServerDefaultsWithMinimalCaching}} fails > intermittently on trunk > {code:bash} > [ERROR] Tests run: 25, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 103.413 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestFileCreation > [ERROR] > testServerDefaultsWithMinimalCaching(org.apache.hadoop.hdfs.TestFileCreation) > Time elapsed: 2.435 s <<< FAILURE! > java.lang.AssertionError: expected:<402653184> but was:<268435456> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.hdfs.TestFileCreation.testServerDefaultsWithMinimalCaching(TestFileCreation.java:279) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: 
hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9776) TestHAAppend#testMultipleAppendsDuringCatchupTailing is flaky
[ https://issues.apache.org/jira/browse/HDFS-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-9776: - Component/s: test Target Version/s: 3.2.3, 3.3.1, 3.2.2, 3.4.0 Affects Version/s: 3.2.3 3.3.1 3.2.2 3.4.0 > TestHAAppend#testMultipleAppendsDuringCatchupTailing is flaky > - > > Key: HDFS-9776 > URL: https://issues.apache.org/jira/browse/HDFS-9776 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.2.2, 3.3.1, 3.4.0, 3.2.3 >Reporter: Vinayakumar B >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3 > > Attachments: TestHAAppend.testMultipleAppendsDuringCatchupTailing.log > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Initial analysis of the recent test failure in > {{TestHAAppend#testMultipleAppendsDuringCatchupTailing}} > [here|https://builds.apache.org/job/PreCommit-HDFS-Build/14420/testReport/org.apache.hadoop.hdfs.server.namenode.ha/TestHAAppend/testMultipleAppendsDuringCatchupTailing/] > > has found that, if the Active NameNode goes down immediately after the truncate > operation, but before the BlockRecovery command is sent to the datanode, > then this block will never be truncated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15640) Add diff threshold to FedBalance
[ https://issues.apache.org/jira/browse/HDFS-15640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15640: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add diff threshold to FedBalance > > > Key: HDFS-15640 > URL: https://issues.apache.org/jira/browse/HDFS-15640 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15640.001.patch, HDFS-15640.002.patch, > HDFS-15640.003.patch, HDFS-15640.004.patch > > > Currently the DistCpProcedure must submit distcp jobs round by round until > there is no diff before it can go to the final distcp stage. This condition is > very strict. During the incremental copy stage, if the diff size is under the > given threshold then we don't need to wait for zero diff; we can start the > final distcp directly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
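The exit rule this describes reduces to a simple comparison (sketch; the config name and wiring in the patch may differ):
{code:java}
public class DiffThresholdSketch {
  // With threshold 0 this degenerates to the old "no diff at all" rule.
  static boolean readyForFinalDistcp(int snapshotDiffSize, int diffThreshold) {
    return snapshotDiffSize <= diffThreshold;
  }

  public static void main(String[] args) {
    System.out.println(readyForFinalDistcp(3, 10));  // true: start final distcp
    System.out.println(readyForFinalDistcp(3, 0));   // false: keep iterating
  }
}
{code}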
[jira] [Updated] (HDFS-15614) Initialize snapshot trash root during NameNode startup if enabled
[ https://issues.apache.org/jira/browse/HDFS-15614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15614: -- Component/s: namanode snapshots Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Initialize snapshot trash root during NameNode startup if enabled > - > > Key: HDFS-15614 > URL: https://issues.apache.org/jira/browse/HDFS-15614 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namanode, snapshots >Affects Versions: 3.4.0 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > This is a follow-up to HDFS-15607. > Goal: > Initialize (create) snapshot trash root for all existing snapshottable > directories if {{dfs.namenode.snapshot.trashroot.enabled}} is set to > {{true}}. So admins won't have to run {{dfsadmin -provisionTrash}} manually > on all those existing snapshottable directories. > The change is expected to land in {{FSNamesystem}}. > Discussion: > 1. Currently in HDFS-15607, the snapshot trash root creation logic is on the > client side. But in order for NN to create it at startup, the logic must > (also) be implemented on the server side as well. -- which is also a > requirement by WebHDFS (HDFS-15612). > 2. Alternatively, we can provide an extra parameter to the > {{-provisionTrash}} command like: {{dfsadmin -provisionTrash -all}} to > initialize/provision trash root on all existing snapshottable dirs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15598) ViewHDFS#canonicalizeUri should not be restricted to DFS only API.
[ https://issues.apache.org/jira/browse/HDFS-15598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15598: -- Component/s: viewfs Target Version/s: 3.4.0 > ViewHDFS#canonicalizeUri should not be restricted to DFS only API. > -- > > Key: HDFS-15598 > URL: https://issues.apache.org/jira/browse/HDFS-15598 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: viewfs >Affects Versions: 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > As part of Hive partitions verification, an insert failed due to canonicalizeUri > being restricted to DFS only. This can be relaxed to delegate to > vfs#canonicalizeUri -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15585) ViewDFS#getDelegationToken should not throw UnsupportedOperationException.
[ https://issues.apache.org/jira/browse/HDFS-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15585: -- Component/s: viewfs Target Version/s: 3.3.1, 3.4.0 > ViewDFS#getDelegationToken should not throw UnsupportedOperationException. > -- > > Key: HDFS-15585 > URL: https://issues.apache.org/jira/browse/HDFS-15585 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: viewfs >Affects Versions: 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > When starting Hive in a secure environment, it throws > UnsupportedOperationException from ViewDFS. > at org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:736) > ~[hive-service-3.1.3000.7.2.3.0-54.jar:3.1.3000.7.2.3.0-54] > at > org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:1077) > ~[hive-service-3.1.3000.7.2.3.0-54.jar:3.1.3000.7.2.3.0-54] > ... 9 more > Caused by: java.lang.UnsupportedOperationException > at > org.apache.hadoop.hdfs.ViewDistributedFileSystem.getDelegationToken(ViewDistributedFileSystem.java:1042) > ~[hadoop-hdfs-client-3.1.1.7.2.3.0-54.jar:?] > at > org.apache.hadoop.security.token.DelegationTokenIssuer.collectDelegationTokens(DelegationTokenIssuer.java:95) > ~[hadoop-common-3.1.1.7.2.3.0-54.jar:?] > at > org.apache.hadoop.security.token.DelegationTokenIssuer.addDelegationTokens(DelegationTokenIssuer.java:76) > ~[hadoop-common-3.1.1.7.2.3.0-54.jar:?] > at > org.apache.tez.common.security.TokenCache.obtainTokensForFileSystemsInternal(TokenCache.java:140) > ~[tez-api-0.9.1.7.2.3.0-54.jar:0.9.1.7.2.3.0-54] > at > org.apache.tez.common.security.TokenCache.obtainTokensForFileSystemsInternal(TokenCache.java:101) > ~[tez-api-0.9.1.7.2.3.0-54.jar:0.9.1.7.2.3.0-54] > at > org.apache.tez.common.security.TokenCache.obtainTokensForFileSystems(TokenCache.java:77) > ~[tez-api-0.9.1.7.2.3.0-54.jar:0.9.1.7.2.3.0-54] > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.createLlapCredentials(TezSessionState.java:443) > ~[hive-exec-3.1.3000.7.2.3.0-54.jar:3.1.3000.7.2.3.0-54] > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternal(TezSessionState.java:354) > ~[hive-exec-3.1.3000.7.2.3.0-54.jar:3.1.3000.7.2.3.0-54] > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:313) > ~[hive-exec-3.1.3000.7.2.3.0-54.jar:3.1.3000.7.2.3.0-54] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15532) listFiles on root/InternalDir will fail if fallback root has file
[ https://issues.apache.org/jira/browse/HDFS-15532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15532: -- Component/s: viewfs Target Version/s: 3.3.1, 3.4.0 > listFiles on root/InternalDir will fail if fallback root has file > - > > Key: HDFS-15532 > URL: https://issues.apache.org/jira/browse/HDFS-15532 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: viewfs >Affects Versions: 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > The listFiles implementation gets the RemoteIterator created in > InternalViewFSDirFs, as the root is an InternalViewFSDir. > If there is a fallback and a file exists at the root level, it would have > been collected when collecting locatedStatuses. > When it is iterating over to that fallback file from the RemoteIterator (which > was returned from InternalViewFSDirFs), the iterator's next will call > getFileBlockLocations if it's a file. > {code:java} > @Override > public LocatedFileStatus next() throws IOException { > System.out.println(this); > if (!hasNext()) { > throw new NoSuchElementException("No more entries in " + f); > } > FileStatus result = stats[i++]; > // for files, use getBlockLocations(FileStatus, int, int) to avoid > // calling getFileStatus(Path) to load the FileStatus again > BlockLocation[] locs = result.isFile() ? > getFileBlockLocations(result, 0, result.getLen()) : > null; > return new LocatedFileStatus(result, locs); > }{code} > > This getFileBlockLocations call will be made on InternalViewFSDirFs, as the > iterator was originally created from that fs. > InternalViewFSDirFs#getFileBlockLocations does not handle fallback cases. > It always expects "/", which means it always assumes a directory. > But with the fallback and the iterator returned from InternalViewFSDirFs, this > creates problems. > Probably we need to handle the fallback case in getFileBlockLocations as well. > (Fallback should be the only reason for a call coming to InternalViewFSDirFs > with something other than "/".) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15558) ViewDistributedFileSystem#recoverLease should call super.recoverLease when there are no mounts configured
[ https://issues.apache.org/jira/browse/HDFS-15558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15558: -- Component/s: viewfs Target Version/s: 3.3.1, 3.4.0 Affects Version/s: 3.3.1 3.4.0 > ViewDistributedFileSystem#recoverLease should call super.recoverLease when > there are no mounts configured > - > > Key: HDFS-15558 > URL: https://issues.apache.org/jira/browse/HDFS-15558 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: viewfs >Affects Versions: 3.3.1, 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
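The title describes a simple fallback pattern; a hedged sketch of it follows (the mount-state check is an assumption, and the mounted branch is elided):

{code:java}
// Sketch: with no mounts configured, ViewDistributedFileSystem should
// behave exactly like DistributedFileSystem for lease recovery.
@Override
public boolean recoverLease(final Path f) throws IOException {
  if (this.vfs == null) {
    // No mount points configured: defer to DistributedFileSystem.
    return super.recoverLease(f);
  }
  // Mounts configured: resolve f against the mount table and call
  // recoverLease on the target fs (elided in this sketch).
  throw new UnsupportedOperationException("mounted case elided in sketch");
}
{code}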
[jira] [Updated] (HDFS-15496) Add UI for deleted snapshots
[ https://issues.apache.org/jira/browse/HDFS-15496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15496: -- Component/s: snapshots Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add UI for deleted snapshots > > > Key: HDFS-15496 > URL: https://issues.apache.org/jira/browse/HDFS-15496 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Mukul Kumar Singh >Assignee: Vivek Ratnavel Subramanian >Priority: Major > Fix For: 3.4.0 > > > Add UI for deleted snapshots > a) Show the list of snapshots per snapshottable directory > b) Add deleted status in the JMX output for the Snapshot along with a snap ID > e) The NN UI should sort the snapshots by snap IDs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15518) Wrong operation name in FsNamesystem for listSnapshots
[ https://issues.apache.org/jira/browse/HDFS-15518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15518: -- Component/s: snapshots Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Wrong operation name in FsNamesystem for listSnapshots > -- > > Key: HDFS-15518 > URL: https://issues.apache.org/jira/browse/HDFS-15518 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Mukul Kumar Singh >Assignee: Aryan Gupta >Priority: Major > Fix For: 3.4.0 > > > The listSnapshots operation uses "listSnapshotDirectory" as the operation-name string in place of > "ListSnapshot". > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L7026 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
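For context, FSNamesystem handlers pass an operation-name string to checkOperation and the audit logger, so the bug is only a wrong constant. A hedged sketch of the corrected shape (surrounding method body abridged; treat the exact surrounding code as an assumption):

{code:java}
// Sketch of the fix shape: use "ListSnapshot" as the operation name
// instead of "listSnapshotDirectory". Everything else is elided.
final String operationName = "ListSnapshot";
checkOperation(OperationCategory.READ);
// ... perform the snapshot listing under the read lock ...
logAuditEvent(true, operationName, snapshotRoot);
{code}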
[jira] [Updated] (HDFS-15374) Add documentation for fedbalance tool
[ https://issues.apache.org/jira/browse/HDFS-15374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15374: -- Component/s: documentation rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add documentation for fedbalance tool > - > > Key: HDFS-15374 > URL: https://issues.apache.org/jira/browse/HDFS-15374 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: documentation, rbf >Affects Versions: 3.4.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.4.0 > > Attachments: BalanceProcedureScheduler.png, > FedBalance_Screenshot1.jpg, FedBalance_Screenshot2.jpg, > FedBalance_Screenshot3.jpg, HDFS-15374.001.patch, HDFS-15374.002.patch, > HDFS-15374.003.patch, HDFS-15374.004.patch, HDFS-15374.005.patch > > > Add documentation for fedbalance tool. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15410) Add separated config file hdfs-fedbalance-default.xml for fedbalance tool
[ https://issues.apache.org/jira/browse/HDFS-15410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15410: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add separated config file hdfs-fedbalance-default.xml for fedbalance tool > - > > Key: HDFS-15410 > URL: https://issues.apache.org/jira/browse/HDFS-15410 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15410.001.patch, HDFS-15410.002.patch, > HDFS-15410.003.patch, HDFS-15410.004.patch, HDFS-15410.005.patch > > > Add a separated config file named hdfs-fedbalance-default.xml for fedbalance > tool configs. It's like the distcp-default.xml for the distcp tool. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
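As a sketch of what such a file could contain (the property names and defaults below are assumptions for illustration, not necessarily the committed values):

{code:xml}
<!-- Hypothetical sketch of hdfs-fedbalance-default.xml. Property names
     and defaults are assumptions, not the committed values. -->
<configuration>
  <property>
    <name>hdfs.fedbalance.procedure.work.thread.num</name>
    <value>10</value>
    <description>Worker threads used to run balance procedures.</description>
  </property>
  <property>
    <name>hdfs.fedbalance.procedure.scheduler.journal.uri</name>
    <value>hdfs://localhost:8020/tmp/procedure</value>
    <description>Journal location used to recover unfinished jobs.</description>
  </property>
</configuration>
{code}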
[jira] [Updated] (HDFS-15346) FedBalance tool implementation
[ https://issues.apache.org/jira/browse/HDFS-15346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15346: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > FedBalance tool implementation > -- > > Key: HDFS-15346 > URL: https://issues.apache.org/jira/browse/HDFS-15346 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15346.001.patch, HDFS-15346.002.patch, > HDFS-15346.003.patch, HDFS-15346.004.patch, HDFS-15346.005.patch, > HDFS-15346.006.patch, HDFS-15346.007.patch, HDFS-15346.008.patch, > HDFS-15346.009.patch, HDFS-15346.010.patch, HDFS-15346.011.patch, > HDFS-15346.012.patch > > > This Jira implements the HDFS FedBalance tool based on the basic framework > in HDFS-15340. The whole process of the HDFS federation balance tool is implemented in > this Jira. See the documentation at HDFS-15374/patch-v05 for a detailed > description of the HDFS fedbalance tool. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15340) RBF: Implement BalanceProcedureScheduler basic framework
[ https://issues.apache.org/jira/browse/HDFS-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15340: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > RBF: Implement BalanceProcedureScheduler basic framework > > > Key: HDFS-15340 > URL: https://issues.apache.org/jira/browse/HDFS-15340 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15340.001.patch, HDFS-15340.002.patch, > HDFS-15340.003.patch, HDFS-15340.004.patch, HDFS-15340.005.patch, > HDFS-15340.006.patch, HDFS-15340.007.patch, HDFS-15340.008.patch > > > This Jira implements the basic framework (Balance Procedure Scheduler) of the > HDFS federation balance tool. > The Balance Procedure Scheduler implements a state machine. It’s responsible > for scheduling a balance job, including submit, run, delay and recover. See > the documentation at HDFS-15374/patch-v05 for a detailed description of the > state machine. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
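To make the state-machine idea concrete, here is a minimal, self-contained illustration of a submit/run/delay/recover scheduler loop (all names are invented for this sketch; the real BalanceProcedureScheduler is journal-backed and multi-threaded):

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of the scheduler's job lifecycle, for illustration only.
enum JobState { SUBMITTED, RUNNING, DELAYED, RECOVERING, DONE }

class TinyScheduler {
  static class BalanceJob {
    JobState state = JobState.SUBMITTED;
    // Returns true when all procedures of the job have finished.
    boolean runOneStep() { return true; }
  }

  private final Queue<BalanceJob> runnable = new ArrayDeque<>();

  void submit(BalanceJob job) {            // submit
    job.state = JobState.SUBMITTED;
    runnable.add(job);
  }

  void recover(BalanceJob journaled) {     // recover: restored from journal
    journaled.state = JobState.RECOVERING;
    runnable.add(journaled);
  }

  void runLoop() {                         // run / delay
    BalanceJob job;
    while ((job = runnable.poll()) != null) {
      job.state = JobState.RUNNING;
      if (job.runOneStep()) {
        job.state = JobState.DONE;
      } else {
        job.state = JobState.DELAYED;      // the real scheduler uses a
        runnable.add(job);                 // delay queue with a wake-up time
      }
    }
  }
}
{code}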
[jira] [Updated] (HDFS-15146) TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15146: -- Component/s: balancer test Target Version/s: 2.10.1, 3.2.2, 3.3.0, 3.4.0 Affects Version/s: 2.10.1 3.2.2 3.3.0 3.4.0 > TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault fails intermittently > - > > Key: HDFS-15146 > URL: https://issues.apache.org/jira/browse/HDFS-15146 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer, test >Affects Versions: 3.3.0, 3.2.2, 2.10.1, 3.4.0 >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Minor > Fix For: 3.3.0, 3.2.2, 2.10.1, 3.4.0 > > Attachments: HDFS-15146-branch-2.10.001.patch, HDFS-15146.001.patch > > > TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault fails intermittently when > the number of blocks does not match the expected count. In > {{testBalancerRPCDelay}}, it seems that some datanodes will not be up by the > time we fetch the block locations. > I see the following stack trace: > {code:bash} > [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 39.969 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay > [ERROR] > testBalancerRPCDelayQpsDefault(org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay) > Time elapsed: 12.035 s <<< FAILURE! > java.lang.AssertionError: Number of getBlocks should be not less than 20 > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerRPCDelay(TestBalancer.java:2197) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault(TestBalancerRPCDelay.java:53) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
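A common way to harden such a test (a sketch of the general technique, not the committed HDFS-15146 patch; variable names like filePath and replicationFactor are placeholders) is to wait for full replication before fetching block locations:

{code:java}
// Sketch: make sure the cluster and the test file are ready before the
// balancer run, so getBlocks sees all datanodes. Not the actual patch.
cluster.waitActive();                              // datanodes registered
DFSTestUtil.waitReplication(fs, filePath, replicationFactor);
// ... only now start the balancer and assert on the getBlocks count ...
{code}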
[jira] [Updated] (HDFS-15898) Test case TestOfflineImageViewer fails
[ https://issues.apache.org/jira/browse/HDFS-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15898: -- Component/s: test Hadoop Flags: Reviewed Target Version/s: 3.3.1, 3.4.0 Affects Version/s: 3.3.1 3.4.0 > Test case TestOfflineImageViewer fails > -- > > Key: HDFS-15898 > URL: https://issues.apache.org/jira/browse/HDFS-15898 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Affects Versions: 3.3.1, 3.4.0 >Reporter: Hui Fei >Assignee: Hui Fei >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > The following 3 cases failed locally: > TestOfflineImageViewer#testWriterOutputEntryBuilderForFile > > {code:java} > org.junit.ComparisonFailure: Expected > :/path/file,5,2000-01-01 00:00,2000-01-01 > 00:00,1024,3,3072,0,0,-rwx-wx-w-+,user_1,group_1Actual > :/path/file,5,2000-01-01 08:00,2000-01-01 > 08:00,1024,3,3072,0,0,-rwx-wx-w-+,user_1,group_1 > at org.junit.Assert.assertEquals(Assert.java:115) at > org.junit.Assert.assertEquals(Assert.java:144) at > org.apache.hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer.testWriterOutputEntryBuilderForFile(TestOfflineImageViewer.java:760){code} > TestOfflineImageViewer#testWriterOutputEntryBuilderForDirectory > {code:java} > org.junit.ComparisonFailure: Expected > :/path/dir,0,2000-01-01 00:00,1970-01-01 > 00:00,0,0,0,700,1000,drwx-wx-w-+,user_1,group_1Actual > :/path/dir,0,2000-01-01 08:00,1970-01-01 > 08:00,0,0,0,700,1000,drwx-wx-w-+,user_1,group_1 at > org.junit.Assert.assertEquals(Assert.java:115) at > org.junit.Assert.assertEquals(Assert.java:144) at > org.apache.hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer.testWriterOutputEntryBuilderForDirectory(TestOfflineImageViewer.java:768){code} > TestOfflineImageViewer#testWriterOutputEntryBuilderForSymlink > {code:java} > org.junit.ComparisonFailure: Expected > :/path/sym,0,2000-01-01 00:00,2000-01-01 > 00:00,0,0,0,0,0,-rwx-wx-w-,user_1,group_1Actual :/path/sym,0,2000-01-01 > 08:00,2000-01-01 08:00,0,0,0,0,0,-rwx-wx-w-,user_1,group_1 > at org.junit.Assert.assertEquals(Assert.java:115) at > org.junit.Assert.assertEquals(Assert.java:144) at > org.apache.hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer.testWriterOutputEntryBuilderForSymlink(TestOfflineImageViewer.java:776){code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
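The expected values show 00:00 while the actual values show 08:00, which suggests the machine's local timezone (UTC+8) leaks into the formatted timestamps. A hedged sketch of the usual remedy, pinning the timezone in test setup (the committed fix may differ):

{code:java}
import java.util.TimeZone;
import org.junit.AfterClass;
import org.junit.BeforeClass;

// Sketch: pin the JVM default timezone so timestamp-formatting assertions
// are deterministic on any machine. May differ from the committed fix.
private static TimeZone savedTz;

@BeforeClass
public static void pinTimeZone() {
  savedTz = TimeZone.getDefault();
  TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
}

@AfterClass
public static void restoreTimeZone() {
  TimeZone.setDefault(savedTz);
}
{code}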
[jira] [Updated] (HDFS-15576) Erasure Coding: Add rs and rs-legacy codec test for addPolicies
[ https://issues.apache.org/jira/browse/HDFS-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15576: -- Component/s: erasure-coding test Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Erasure Coding: Add rs and rs-legacy codec test for addPolicies > --- > > Key: HDFS-15576 > URL: https://issues.apache.org/jira/browse/HDFS-15576 > Project: Hadoop HDFS > Issue Type: Test > Components: erasure-coding, test >Affects Versions: 3.4.0 >Reporter: Hui Fei >Assignee: Hui Fei >Priority: Minor > Fix For: 3.4.0 > > Attachments: HDFS-15576.001.patch, HDFS-15576.002.patch > > > * Add rs and rs-legacy codec test for TestErasureCodingCLI > * Add comments for failed test RS > * Modify UT, change "RS" to "rs", because "RS" is not supported -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15690) Add lz4-java as hadoop-hdfs test dependency
[ https://issues.apache.org/jira/browse/HDFS-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15690: -- Component/s: test Target Version/s: 3.3.1, 3.4.0 Affects Version/s: 3.3.1 3.4.0 > Add lz4-java as hadoop-hdfs test dependency > --- > > Key: HDFS-15690 > URL: https://issues.apache.org/jira/browse/HDFS-15690 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Affects Versions: 3.3.1, 3.4.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > TestFSImage.testNativeCompression fails with "java.lang.NoClassDefFoundError: > net/jpountz/lz4/LZ4Factory": > https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/305/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFSImage/testNativeCompression/ > We need to add lz4-java to hadoop-hdfs test dependency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
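The change would look roughly like this in hadoop-hdfs's pom.xml (a sketch; the version is assumed to be managed by the parent hadoop-project POM so it matches what hadoop-common already uses):

{code:xml}
<!-- Sketch for hadoop-hdfs/pom.xml: test-only dependency on lz4-java.
     Version assumed to come from dependencyManagement in hadoop-project. -->
<dependency>
  <groupId>org.lz4</groupId>
  <artifactId>lz4-java</artifactId>
  <scope>test</scope>
</dependency>
{code}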
[jira] [Updated] (HDFS-15559) Complement initialize member variables in TestHdfsConfigFields#initializeMemberVariables
[ https://issues.apache.org/jira/browse/HDFS-15559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15559: -- Component/s: test Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Complement initialize member variables in > TestHdfsConfigFields#initializeMemberVariables > > > Key: HDFS-15559 > URL: https://issues.apache.org/jira/browse/HDFS-15559 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Affects Versions: 3.4.0 >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Fix For: 3.4.0 > > Attachments: HDFS-15559.001.patch, HDFS-15559.002.patch > > > There are some missing constant interfaces in > TestHdfsConfigFields#initializeMemberVariables > {code:java} > @Override > public void initializeMemberVariables() { > xmlFilename = new String("hdfs-default.xml"); > configurationClasses = new Class[] { HdfsClientConfigKeys.class, > HdfsClientConfigKeys.Failover.class, > HdfsClientConfigKeys.StripedRead.class, DFSConfigKeys.class, > HdfsClientConfigKeys.BlockWrite.class, > HdfsClientConfigKeys.BlockWrite.ReplaceDatanodeOnFailure.class }; > }{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16550) [SBN read] Improper cache-size for journal node may cause cluster crash
[ https://issues.apache.org/jira/browse/HDFS-16550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16550: -- Component/s: journal-node Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [SBN read] Improper cache-size for journal node may cause cluster crash > --- > > Key: HDFS-16550 > URL: https://issues.apache.org/jira/browse/HDFS-16550 > Project: Hadoop HDFS > Issue Type: Bug > Components: journal-node >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2022-04-21-09-54-29-751.png, > image-2022-04-21-09-54-57-111.png, image-2022-04-21-12-32-56-170.png > > Time Spent: 1h > Remaining Estimate: 0h > > When we introduced {*}SBN Read{*}, we encountered a situation while upgrading > the JournalNodes. > Cluster Info: > *Active: nn0* > *Standby: nn1* > 1. Rolling restart journal node. {color:#ff}(related config: > fs.journalnode.edit-cache-size.bytes=1G, -Xms1G, -Xmx=1G){color} > 2. The cluster runs for a while, edits cache usage is increasing and memory > is used up. > 3. {color:#ff}Active namenode(nn0){color} shutdown because of “{_}Timed > out waiting 12ms for a quorum of nodes to respond”{_}. > 4. Transfer nn1 to Active state. > 5. {color:#ff}New Active namenode(nn1){color} also shutdown because of > “{_}Timed out waiting 12ms for a quorum of nodes to respond” too{_}. > 6. {color:#ff}The cluster crashed{color}. > > Related code: > {code:java} > JournaledEditsCache(Configuration conf) { > capacity = conf.getInt(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_KEY, > DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_DEFAULT); > if (capacity > 0.9 * Runtime.getRuntime().maxMemory()) { > Journal.LOG.warn(String.format("Cache capacity is set at %d bytes but " + > "maximum JVM memory is only %d bytes. It is recommended that you " + > "decrease the cache size or increase the heap size.", > capacity, Runtime.getRuntime().maxMemory())); > } > Journal.LOG.info("Enabling the journaled edits cache with a capacity " + > "of bytes: " + capacity); > ReadWriteLock lock = new ReentrantReadWriteLock(true); > readLock = new AutoCloseableLock(lock.readLock()); > writeLock = new AutoCloseableLock(lock.writeLock()); > initialize(INVALID_TXN_ID); > } {code} > Currently, *fs.journalnode.edit-cache-size.bytes* can be set to a larger size > than the memory requested by the process. If > {*}fs.journalnode.edit-cache-size.bytes > 0.9 * > Runtime.getRuntime().maxMemory(){*}, only warn logs are printed during > journalnode startup. This can easily be overlooked by users. However, after the > cluster has been running for some time, it is likely to cause the cluster > to crash. > > NN log: > !image-2022-04-21-09-54-57-111.png|width=1012,height=47! > !image-2022-04-21-12-32-56-170.png|width=809,height=218! > IMO, we should not set the {{cache size}} to a fixed value, but to a ratio > of the maximum memory, 0.2 by default. > This avoids the problem of an oversized cache. In addition, users can > actively adjust the heap size when they need to increase the cache size. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
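A minimal sketch of the ratio-based sizing proposed above (the fraction key name and default are assumptions for illustration, not the committed config):

{code:java}
// Sketch: derive the edits-cache capacity from the actual heap instead of
// a fixed byte count. The fraction key and 0.2 default are assumptions.
static final String EDIT_CACHE_FRACTION_KEY =
    "dfs.journalnode.edit-cache-size.fraction";   // hypothetical key
static final float EDIT_CACHE_FRACTION_DEFAULT = 0.2f;

long computeCapacity(Configuration conf) {
  float fraction = conf.getFloat(EDIT_CACHE_FRACTION_KEY,
      EDIT_CACHE_FRACTION_DEFAULT);
  // The cache can now never be configured larger than the JVM heap, and it
  // grows automatically when the operator raises -Xmx.
  return (long) (fraction * Runtime.getRuntime().maxMemory());
}
{code}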
[jira] [Updated] (HDFS-16547) [SBN read] Namenode in safe mode should not be transferred to observer state
[ https://issues.apache.org/jira/browse/HDFS-16547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16547: -- Component/s: namanode Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [SBN read] Namenode in safe mode should not be transferred to observer state > --- > > Key: HDFS-16547 > URL: https://issues.apache.org/jira/browse/HDFS-16547 > Project: Hadoop HDFS > Issue Type: Bug > Components: namanode >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently, when a Namenode is in safemode (still starting up, or having entered > safemode manually), we can transfer this Namenode to Observer by command. This > Observer node may receive many requests and then throw a SafemodeException, > which causes unnecessary failovers on the client. > So a Namenode in safe mode should not be transferred to observer state. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
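A hedged sketch of the guard (where the check sits and the exception type are assumptions, not the committed patch):

{code:java}
// Hypothetical guard on the transition path: refuse to become Observer
// while in safe mode, since such an Observer would only feed clients
// SafeModeExceptions and trigger needless client failovers.
void checkTransitionToObserver() throws ServiceFailedException {
  if (namesystem.isInSafeMode()) {
    throw new ServiceFailedException(
        "Cannot transition to Observer while in safe mode");
  }
}
{code}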
[jira] [Updated] (HDFS-16593) Correct inaccurate BlocksRemoved metric on DataNode side
[ https://issues.apache.org/jira/browse/HDFS-16593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16593: -- Component/s: datanode metrics Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Correct inaccurate BlocksRemoved metric on DataNode side > > > Key: HDFS-16593 > URL: https://issues.apache.org/jira/browse/HDFS-16593 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, metrics >Affects Versions: 3.4.0 >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > When tracing the root cause of a production issue, I found that the > BlocksRemoved metric on the Datanode side was inaccurate. > {code:java} > case DatanodeProtocol.DNA_INVALIDATE: > // > // Some local block(s) are obsolete and can be > // safely garbage-collected. > // > Block toDelete[] = bcmd.getBlocks(); > try { > // using global fsdataset > dn.getFSDataset().invalidate(bcmd.getBlockPoolId(), toDelete); > } catch(IOException e) { > // Exceptions caught here are not expected to be disk-related. > throw e; > } > dn.metrics.incrBlocksRemoved(toDelete.length); > break; > {code} > This is because even if the invalidate method throws an exception, some blocks may > have been successfully deleted internally. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
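One hedged way to make the metric accurate (a sketch only; the committed patch may take a different route, e.g. having invalidate report the removed count) is to count per-block successes:

{code:java}
// Sketch: increment BlocksRemoved per successfully invalidated block
// instead of assuming the whole batch succeeded. Illustrative only; note
// that the original code rethrew the IOException for the whole batch.
int removed = 0;
for (Block b : toDelete) {
  try {
    dn.getFSDataset().invalidate(bcmd.getBlockPoolId(), new Block[] { b });
    removed++;
  } catch (IOException e) {
    LOG.warn("Failed to invalidate " + b, e);
  }
}
dn.metrics.incrBlocksRemoved(removed);
{code}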
[jira] [Updated] (HDFS-16659) JournalNode should throw NewerTxnIdException if SinceTxId is bigger than HighestWrittenTxId
[ https://issues.apache.org/jira/browse/HDFS-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16659: -- Component/s: journal-node Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > JournalNode should throw NewerTxnIdException if SinceTxId is bigger than > HighestWrittenTxId > --- > > Key: HDFS-16659 > URL: https://issues.apache.org/jira/browse/HDFS-16659 > Project: Hadoop HDFS > Issue Type: Bug > Components: journal-node >Affects Versions: 3.4.0 >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The JournalNode should throw `NewerTxnIdException` if `sinceTxId` is bigger than > `highestWrittenTxId` while handling the `getJournaledEdits` RPC from NNs. > The current logic may mean the in-progress EditlogTailer cannot replay any Edits > from the JournalNodes in some corner cases, resulting in an ObserverNameNode that cannot > handle requests from clients. > Suppose there are 3 journalNodes, JN0 ~ JN2. > * JN0 has some abnormal cases when the Active Namenode is syncing 10 Edits with > first txid 11 > * The NameNode just ignores the abnormal JN0 and continues to sync Edits to JN1 > and JN2 > * JN0 comes back to health > * The NameNode continues to sync 10 Edits with first txid 21. > * At this point, there are no Edits 11 ~ 30 in the cache of JN0 > * The Observer NameNode tries to select an EditLogInputStream through > `getJournaledEdits` with sinceTxId 21 > * JN2 has some abnormal cases that cause a slow response > The expected result is: the response should contain 10 Edits from txId 21 to txId > 30 from JN1 and JN2, because the Active NameNode successfully wrote these Edits > to JN1 and JN2 and failed to write the earlier ones to JN0. > But in the current implementation, the response is [Response(0) from JN0, > Response(10) from JN1], because the abnormal cases in JN2, such > as GC or a bad network, cause a slow response. So `maxAllowedTxns` will be > 0, and the NameNode will not replay any Edits. > As above, the root cause is that the JournalNode should throw NewerTxnIdException > when `sinceTxId` is more than `highestWrittenTxId`. > The buggy code is below: > {code:java} > if (sinceTxId > getHighestWrittenTxId()) { > // Requested edits that don't exist yet; short-circuit the cache here > metrics.rpcEmptyResponses.incr(); > return > GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build(); > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
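A sketch of the fix direction implied by the title (the exception's constructor shown here is an assumption):

{code:java}
// Sketch: surface "requester is ahead of this JN" as an error instead of
// Response(0), so one lagging JN cannot cap maxAllowedTxns at 0.
if (sinceTxId > getHighestWrittenTxId()) {
  throw new NewerTxnIdException(
      "Highest written txId " + getHighestWrittenTxId()
          + " is smaller than requested sinceTxId " + sinceTxId);
}
{code}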
[jira] [Updated] (HDFS-16623) IllegalArgumentException in LifelineSender
[ https://issues.apache.org/jira/browse/HDFS-16623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16623: -- Component/s: datanode Target Version/s: 3.3.5, 3.2.4, 3.4.0 Affects Version/s: 3.3.5 3.2.4 3.4.0 > IllegalArgumentException in LifelineSender > -- > > Key: HDFS-16623 > URL: https://issues.apache.org/jira/browse/HDFS-16623 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.4.0, 3.2.4, 3.3.5 >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.5 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > In our production environment, an IllegalArgumentException occurred in the > LifelineSender at one DataNode that was undergoing GC at the time. > The buggy code is at line 1060 in BPServiceActor.java: the computed sleep > time can be negative. > {code:java} > while (shouldRun()) { > try { > if (lifelineNamenode == null) { > lifelineNamenode = dn.connectToLifelineNN(lifelineNnAddr); > } > sendLifelineIfDue(); > Thread.sleep(scheduler.getLifelineWaitTime()); > } catch (InterruptedException e) { > Thread.currentThread().interrupt(); > } catch (IOException e) { > LOG.warn("IOException in LifelineSender for " + BPServiceActor.this, > e); > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
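A minimal hedged fix is to avoid handing a negative duration to Thread.sleep (the committed patch may instead correct the scheduler's arithmetic):

{code:java}
// Sketch: Thread.sleep throws IllegalArgumentException for negative
// input, so clamp the computed wait. The real fix may differ.
long waitTime = scheduler.getLifelineWaitTime();
if (waitTime > 0) {
  Thread.sleep(waitTime);
}
{code}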
[jira] [Updated] (HDFS-16583) DatanodeAdminDefaultMonitor can get stuck in an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-16583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16583: -- Component/s: datanode Hadoop Flags: Reviewed Target Version/s: 3.3.5, 3.2.4, 3.4.0 Affects Version/s: 3.3.5 3.2.4 3.4.0 > DatanodeAdminDefaultMonitor can get stuck in an infinite loop > - > > Key: HDFS-16583 > URL: https://issues.apache.org/jira/browse/HDFS-16583 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.4.0, 3.2.4, 3.3.5 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.5 > > Time Spent: 2h > Remaining Estimate: 0h > > We encountered a case where the decommission monitor in the namenode got > stuck for about 6 hours. The logs give: > {code} > 2022-05-15 01:09:25,490 INFO > org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager: Stopping > maintenance of dead node 10.185.3.132:50010 > 2022-05-15 01:10:20,918 INFO org.apache.hadoop.http.HttpServer2: Process > Thread Dump: jsp requested > > 2022-05-15 01:19:06,810 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: > PendingReconstructionMonitor timed out blk_4501753665_3428271426 > 2022-05-15 01:19:06,810 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: > PendingReconstructionMonitor timed out blk_4501753659_3428271420 > 2022-05-15 01:19:06,810 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: > PendingReconstructionMonitor timed out blk_4501753662_3428271423 > 2022-05-15 01:19:06,810 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: > PendingReconstructionMonitor timed out blk_4501753663_3428271424 > 2022-05-15 06:00:57,281 INFO > org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager: Stopping > maintenance of dead node 10.185.3.34:50010 > 2022-05-15 06:00:58,105 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock > held for 17492614 ms via > java.lang.Thread.getStackTrace(Thread.java:1559) > org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032) > org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263) > org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220) > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1601) > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:496) > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > java.lang.Thread.run(Thread.java:748) > Number of suppressed write-lock reports: 0 > Longest write-lock held interval: 17492614 > {code} > We only have the one thread dump triggered by the FC: > {code} > Thread 80 (DatanodeAdminMonitor-0): > State: RUNNABLE > Blocked count: 16 > Waited count: 453693 > Stack: > > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.check(DatanodeAdminManager.java:538) > >
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:494) > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > java.lang.Thread.run(Thread.java:748) > {code} > This was the line of code: > {code} > private void check() { > final Iterator<Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>>> > it = new CyclicIteration<>(outOfServiceNodeBlocks, > iterkey).iterator(); > final LinkedList<DatanodeDescriptor> toRemove = new LinkedList<>(); > while (it.hasNext() && !exceededNumBlocksPerCheck() && namesystem > .isRunning()) { > numNodesChecked++; > final Ma
[jira] [Updated] (HDFS-15225) RBF: Add snapshot counts to content summary in router
[ https://issues.apache.org/jira/browse/HDFS-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15225: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > RBF: Add snapshot counts to content summary in router > - > > Key: HDFS-15225 > URL: https://issues.apache.org/jira/browse/HDFS-15225 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Affects Versions: 3.4.0 >Reporter: Quan Li >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16572) Fix typo in readme of hadoop-project-dist
[ https://issues.apache.org/jira/browse/HDFS-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16572: -- Component/s: documentation Hadoop Flags: Reviewed Target Version/s: 3.4.0 > Fix typo in readme of hadoop-project-dist > - > > Key: HDFS-16572 > URL: https://issues.apache.org/jira/browse/HDFS-16572 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 3.4.0 >Reporter: Gautham Banasandra >Assignee: Gautham Banasandra >Priority: Trivial > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Change *not* to *no*. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16552) Fix NPE for TestBlockManager
[ https://issues.apache.org/jira/browse/HDFS-16552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16552: -- Component/s: test Hadoop Flags: Reviewed Target Version/s: 3.3.5, 3.2.4, 3.4.0 Affects Version/s: 3.3.5 3.2.4 3.4.0 > Fix NPE for TestBlockManager > > > Key: HDFS-16552 > URL: https://issues.apache.org/jira/browse/HDFS-16552 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.4.0, 3.2.4, 3.3.5 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.5 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > There is an NPE in BlockManager when running > TestBlockManager#testSkipReconstructionWithManyBusyNodes2, because > NameNodeMetrics is not initialized in this unit test. > > For the related CI link, see > [this|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4209/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]. > {code:java} > [ERROR] Tests run: 34, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 30.088 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager > [ERROR] > testSkipReconstructionWithManyBusyNodes2(org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager) > Time elapsed: 2.783 s <<< ERROR! > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.scheduleReconstruction(BlockManager.java:2171) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.testSkipReconstructionWithManyBusyNodes2(TestBlockManager.java:947) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at >
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
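A hedged sketch of the kind of setup that avoids the NPE: register the NameNode metrics before exercising BlockManager (treat the exact initialization entry point as an assumption, not the committed patch):

{code:java}
// Sketch: initialize NameNodeMetrics in test setup so that
// BlockManager#scheduleReconstruction can update metrics without an NPE.
// Treat the exact initialization entry point as an assumption.
@Before
public void initNameNodeMetrics() {
  Configuration conf = new HdfsConfiguration();
  NameNode.initMetrics(conf, HdfsServerConstants.NamenodeRole.NAMENODE);
}
{code}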
[jira] [Updated] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress
[ https://issues.apache.org/jira/browse/HDFS-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16507: -- Component/s: namanode Target Version/s: 3.3.3, 3.4.0 (was: 3.3.3) > [SBN read] Avoid purging edit log which is in progress > -- > > Key: HDFS-16507 > URL: https://issues.apache.org/jira/browse/HDFS-16507 > Project: Hadoop HDFS > Issue Type: Bug > Components: namanode >Affects Versions: 3.1.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.3 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL > exception. It looks like it is purging edit logs which are in progress. > According to the analysis, I suspect that the in-progress editlog to be > purged (after the SNN checkpoint) is not finalized (see HDFS-14317) before the ANN > rolls edits itself. > The stack: > {code:java} > java.lang.Thread.getStackTrace(Thread.java:1552) > org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032) > > org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185) > > org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623) > > org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388) > > org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620) > > org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512) > org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177) > > org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515) > javax.servlet.http.HttpServlet.service(HttpServlet.java:710) > javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > >
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > org.eclipse.jetty.server.Server.handle(Server.java:539) > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333) > > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceC
[jira] [Updated] (HDFS-16498) Fix NPE for checkBlockReportLease
[ https://issues.apache.org/jira/browse/HDFS-16498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16498: -- Component/s: datanode namanode Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix NPE for checkBlockReportLease > - > > Key: HDFS-16498 > URL: https://issues.apache.org/jira/browse/HDFS-16498 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namanode >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2022-03-09-20-35-22-028.png, screenshot-1.png > > Time Spent: 5.5h > Remaining Estimate: 0h > > During a restart of the Namenode, a Datanode that is not yet registered may > trigger a full block report (FBR), which causes an NPE. > !image-2022-03-09-20-35-22-028.png|width=871,height=158! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
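A hedged sketch of the guard (the exception type and the surrounding code are assumptions about where such a check would sit):

{code:java}
// Sketch: checkBlockReportLease should tolerate a datanode that has not
// (re-)registered yet instead of dereferencing a null descriptor.
DatanodeDescriptor node = datanodeManager.getDatanode(nodeID);
if (node == null) {
  // Reject the FBR cleanly so the datanode re-registers first.
  throw new UnregisteredNodeException(nodeID, null);
}
// ... proceed with the normal lease check (elided) ...
{code}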
[jira] [Updated] (HDFS-16503) Should verify whether the path name is valid in the WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16503: -- Component/s: webhdfs Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Should verify whether the path name is valid in the WebHDFS > --- > > Key: HDFS-16503 > URL: https://issues.apache.org/jira/browse/HDFS-16503 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2022-03-14-09-35-49-860.png > > Time Spent: 2h > Remaining Estimate: 0h > > When creating a file using WebHDFS, there are two main steps: > 1. Obtain the location of the Datanode to write to. > 2. Put the file to this location. > Currently *NameNodeRpcServer* verifies that the pathName is valid, but > *NamenodeWebHdfsMethods* and *RouterWebHdfsMethods* do not. > So if we use an invalid path (such as a duplicated slash), the first step > returns success, but the second step throws an {*}InvalidPathException{*}. > IMO, we should also do the validation in WebHdfs, which is consistent with > the NameNodeRpcServer. > !image-2022-03-14-09-35-49-860.png|width=548,height=164! > The affected WebHDFS operations are: CREATE, APPEND, OPEN, GETFILECHECKSUM. So we > can add DFSUtil.isValidName to redirectURI for *NamenodeWebHdfsMethods* and > *RouterWebHdfsMethods.* -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
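A minimal sketch of the check described above, placed before redirect-URI construction (the exact placement inside NamenodeWebHdfsMethods/RouterWebHdfsMethods is assumed):

{code:java}
// Sketch: validate the path before choosing a datanode and building the
// redirect URI, mirroring what NameNodeRpcServer already does.
if (!DFSUtil.isValidName(path)) {
  throw new InvalidPathException("Invalid path name: " + path);
}
// ... choose datanode, build and return the redirect URI ...
{code}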
[jira] [Updated] (HDFS-16406) DataNode metric ReadsFromLocalClient does not count short-circuit reads
[ https://issues.apache.org/jira/browse/HDFS-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16406: -- Component/s: datanode metrics Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > DataNode metric ReadsFromLocalClient does not count short-circuit reads > --- > > Key: HDFS-16406 > URL: https://issues.apache.org/jira/browse/HDFS-16406 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, metrics >Affects Versions: 3.4.0 >Reporter: secfree >Assignee: secfree >Priority: Minor > Labels: metrics, pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h > Remaining Estimate: 0h > > The following test case failed. > {code} > @Test > public void testNodeLocalMetrics() throws Exception { > Assume.assumeTrue(null == DomainSocket.getLoadingFailureReason()); > Configuration conf = new HdfsConfiguration(); > conf.setBoolean(HdfsClientConfigKeys.Read.ShortCircuit.KEY, true); > TemporarySocketDirectory sockDir = new TemporarySocketDirectory(); > DomainSocket.disableBindPathValidation(); > conf.set(DFSConfigKeys.DFS_DOMAIN_SOCKET_PATH_KEY, > new File(sockDir.getDir(), > "testNodeLocalMetrics._PORT.sock").getAbsolutePath()); > MiniDFSCluster cluster = new > MiniDFSCluster.Builder(conf).numDataNodes(1).build(); > try { > cluster.waitActive(); > FileSystem fs = cluster.getFileSystem(); > Path testFile = new Path("/testNodeLocalMetrics.txt"); > long file_len = 10; > DFSTestUtil.createFile(fs, testFile, file_len, (short)1, 1L); > DFSTestUtil.readFile(fs, testFile); > List<DataNode> datanodes = cluster.getDataNodes(); > assertEquals(datanodes.size(), 1); > DataNode datanode = datanodes.get(0); > MetricsRecordBuilder rb = getMetrics(datanode.getMetrics().name()); > // Write related metrics > assertCounter("WritesFromLocalClient", 1L, rb); > // Read related metrics > assertCounter("ReadsFromLocalClient", 1L, rb); // failed here > } finally { > if (cluster != null) { > cluster.shutdown(); > } > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-16303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16303: -- Component/s: block placement datanode Target Version/s: 3.3.5, 3.2.4, 3.4.0 > Losing over 100 datanodes in state decommissioning results in full blockage > of all datanode decommissioning > --- > > Key: HDFS-16303 > URL: https://issues.apache.org/jira/browse/HDFS-16303 > Project: Hadoop HDFS > Issue Type: Bug > Components: block placement, datanode >Affects Versions: 2.10.1, 3.3.1 >Reporter: Kevin Wikant >Assignee: Kevin Wikant >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.5 > > Time Spent: 17h 50m > Remaining Estimate: 0h > > h2. Impact > HDFS datanode decommissioning does not make any forward progress. For > example, the user adds X datanodes to the "dfs.hosts.exclude" file and all X > of those datanodes remain in state decommissioning forever without making any > forward progress towards being decommissioned. > h2. Root Cause > The HDFS Namenode class "DatanodeAdminManager" is responsible for > decommissioning datanodes. > As per this "hdfs-site" configuration: > {quote}Config = dfs.namenode.decommission.max.concurrent.tracked.nodes > Default Value = 100 > The maximum number of decommission-in-progress datanodes that will be > tracked at one time by the namenode. Tracking a decommission-in-progress > datanode consumes additional NN memory proportional to the number of blocks > on the datanode. Having a conservative limit reduces the potential impact of > decommissioning a large number of nodes at once. A value of 0 means no limit > will be enforced. > {quote} > The Namenode will only actively track up to 100 datanodes for decommissioning > at any given time, so as to avoid Namenode memory pressure. > Looking into the "DatanodeAdminManager" code: > * a datanode is only removed from the "tracked.nodes" set when it > finishes decommissioning > * a datanode is only added to the "tracked.nodes" set if there are fewer > than 100 datanodes being tracked > So in the event that there are more than 100 datanodes being decommissioned > at a given time, some of those datanodes will not be in the "tracked.nodes" > set until 1 or more datanodes in "tracked.nodes" finish > decommissioning. This is generally not a problem because the datanodes in > "tracked.nodes" will eventually finish decommissioning, but there is an edge > case where this logic prevents the namenode from making any forward progress > towards decommissioning. > If all 100 datanodes in "tracked.nodes" are unable to finish > decommissioning, then other datanodes (which may be able to be > decommissioned) will never get added to "tracked.nodes" and therefore will > never get the opportunity to be decommissioned. > This can occur due to the following issue: > {quote}2021-10-21 12:39:24,048 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager > (DatanodeAdminMonitor-0): Node W.X.Y.Z:50010 is dead while in Decommission In > Progress. Cannot be safely decommissioned or be in maintenance since there is > risk of reduced data durability or data loss. Either restart the failed node > or force decommissioning or maintenance by removing, calling refreshNodes, > then re-adding to the excludes or host config files. > {quote} > If a Datanode is lost while decommissioning (for example if the underlying > hardware fails or is lost), then it will remain in state decommissioning > forever.
> If 100 or more Datanodes are lost while decommissioning over the Hadoop > cluster lifetime, then this is enough to completely fill up the > "tracked.nodes" set. With the entire "tracked.nodes" set filled with > datanodes that can never finish decommissioning, any datanodes added after > this point will never be able to be decommissioned because they will never be > added to the "tracked.nodes" set. > In this scenario: > * the "tracked.nodes" set is filled with datanodes which are lost & cannot > be recovered (and can never finish decommissioning so they will never be > removed from the set) > * the actual live datanodes being decommissioned are enqueued waiting to > enter the "tracked.nodes" set (and are stuck waiting indefinitely) > This means that no progress towards decommissioning the live datanodes will > be made unless the user takes the following action: > {quote}Either restart the failed node or force decommissioning or maintenance > by removing, calling refreshNodes, then re-adding to the excludes or host > config files. > {quote} > Ideally, the Namenode should be able to gracefully handle scenarios where th
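To make the starvation mechanism concrete, here is a simplified illustration of the admission logic described above (all names are invented for this sketch, not the exact DatanodeAdminManager code):

{code:java}
// Toy model: nodes leave trackedNodes only when decommissioning finishes,
// so dead-but-decommissioning nodes occupy slots forever and pendingNodes
// starves once trackedNodes is full.
void admitPendingNodes() {
  while (!pendingNodes.isEmpty()
      && (maxConcurrentTrackedNodes == 0
          || trackedNodes.size() < maxConcurrentTrackedNodes)) {
    trackedNodes.add(pendingNodes.poll());
  }
}
{code}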