[jira] [Commented] (HDFS-17372) CommandProcessingThread#queue should use LinkedBlockingDeque to prevent high priority command blocked by low priority command

2024-02-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814682#comment-17814682
 ] 

ASF GitHub Bot commented on HDFS-17372:
---

ZanderXu commented on PR #6530:
URL: https://github.com/apache/hadoop/pull/6530#issuecomment-1929028734

   Thanks @hfutatzhanghb for your report. 
   
   Is `DNA_ACCESSKEYUPDATE` blocked by `BlockCommand`? If so, how about letting 
`CommandProcessingThread` only handle commands that might take a long time?
   
   




> CommandProcessingThread#queue should use LinkedBlockingDeque to prevent high 
> priority command blocked by low priority command
> -
>
> Key: HDFS-17372
> URL: https://issues.apache.org/jira/browse/HDFS-17372
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17362) RBF: RouterObserverReadProxyProvider should use ConfiguredFailoverProxyProvider internally

2024-02-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814683#comment-17814683
 ] 

ASF GitHub Bot commented on HDFS-17362:
---

tasanuma commented on code in PR #6510:
URL: https://github.com/apache/hadoop/pull/6510#discussion_r1479405963


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/RouterObserverReadProxyProvider.java:
##
@@ -84,7 +84,8 @@ public class RouterObserverReadProxyProvider extends 
AbstractNNFailoverProxyP
 
   public RouterObserverReadProxyProvider(Configuration conf, URI uri, Class 
xface,
   HAProxyFactory factory) {
-this(conf, uri, xface, factory, new IPFailoverProxyProvider<>(conf, uri, 
xface, factory));
+this(conf, uri, xface, factory,
+new ConfiguredFailoverProxyProvider<>(conf, uri, xface, factory));

Review Comment:
   I have implemented `RouterObserverReadConfiguredFailoverProxyProvider` and 
updated `TestObserverWithRouter` accordingly.





> RBF: RouterObserverReadProxyProvider should use 
> ConfiguredFailoverProxyProvider internally
> --
>
> Key: HDFS-17362
> URL: https://issues.apache.org/jira/browse/HDFS-17362
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
>
> Currently, RouterObserverReadProxyProvider is using IPFailoverProxyProvider, 
> while ObserverReadProxyProvider is using ConfiguredFailoverProxyProvider.  If 
> we are to align RouterObserverReadProxyProvider with 
> ObserverReadProxyProvider, RouterObserverReadProxyProvider should internally 
> use ConfiguredFailoverProxyProvider.  Moreover, IPFailoverProxyProvider has 
> an issue with resolving HA configurations. (For example, 
> IPFailoverProxyProvider cannot resolve hdfs://router-service.)
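
For readers trying this on the client side, the provider is selected per
nameservice through the standard failover key. A minimal sketch in Java follows;
the configuration keys are standard HDFS client keys, but the nameservice name
"router-service" and the surrounding class are illustrative assumptions, not
part of this patch:

import org.apache.hadoop.conf.Configuration;

public class RouterProviderConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Select the provider for the illustrative nameservice "router-service".
    conf.set("dfs.client.failover.proxy.provider.router-service",
        "org.apache.hadoop.hdfs.server.namenode.ha."
            + "RouterObserverReadProxyProvider");
    // With the change in this PR, the provider resolves the routers from
    // dfs.ha.namenodes.router-service and the matching
    // dfs.namenode.rpc-address.* keys (ConfiguredFailoverProxyProvider
    // behavior) instead of treating hdfs://router-service as a single IP.
  }
}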



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17372) CommandProcessingThread#queue should use LinkedBlockingDeque to prevent high priority command blocked by low priority command

2024-02-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814690#comment-17814690
 ] 

ASF GitHub Bot commented on HDFS-17372:
---

hfutatzhanghb commented on PR #6530:
URL: https://github.com/apache/hadoop/pull/6530#issuecomment-1929053830

   > Thanks @hfutatzhanghb for your report.
   > 
   > Is `DNA_ACCESSKEYUPDATE` blocked by `BlockCommand`? If so, how about 
letting `CommandProcessingThread` only handle commands that might take a long 
time?
   
   @ZanderXu Hi, sir. Yes, DNA_ACCESSKEYUPDATE is blocked by BlockCommand. Good 
idea, but there are still some problems to solve. For example, the heartbeat 
response may contain commands of several different types; we can't know whether 
it contains DNA_ACCESSKEYUPDATE unless we iterate over `resp.getCommands()`, and 
that iteration will introduce extra cost.
   What's your opinion, sir? Hope to receive your reply. Thanks a lot.
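
For reference, a minimal sketch of the deque-based approach named in the issue
title; the `Command` class, the `enqueue` method, and the constant value are
illustrative stand-ins, not Hadoop's actual `CommandProcessingThread` code:

import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

/** Illustrative command type; real DataNode commands carry an action code. */
class Command {
  final int action;
  Command(int action) { this.action = action; }
}

class PrioritizedCommandQueue {
  // Illustrative value standing in for DatanodeProtocol.DNA_ACCESSKEYUPDATE.
  static final int DNA_ACCESSKEYUPDATE = 7;

  private final BlockingDeque<Command> queue = new LinkedBlockingDeque<>();

  /** High-priority commands jump to the head; others keep FIFO order. */
  void enqueue(Command cmd) throws InterruptedException {
    if (cmd.action == DNA_ACCESSKEYUPDATE) {
      queue.putFirst(cmd);
    } else {
      queue.putLast(cmd);
    }
  }

  /** The processing thread always drains from the head. */
  Command take() throws InterruptedException {
    return queue.takeFirst();
  }
}

Since the priority test runs once per command as it is enqueued, it is O(1) per
command and avoids a separate up-front scan over `resp.getCommands()`.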
   




> CommandProcessingThread#queue should use LinkedBlockingDeque to prevent high 
> priority command blocked by low priority command
> -
>
> Key: HDFS-17372
> URL: https://issues.apache.org/jira/browse/HDFS-17372
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17342) Fix DataNode may invalidates normal block causing missing block

2024-02-06 Thread Shuyan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuyan Zhang resolved HDFS-17342.
-
Hadoop Flags: Reviewed
Target Version/s: 3.5.0
  Resolution: Fixed

> Fix DataNode may invalidates normal block causing missing block
> ---
>
> Key: HDFS-17342
> URL: https://issues.apache.org/jira/browse/HDFS-17342
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> When users read a file under append, occasional exceptions may occur, such as 
> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: xxx.
> This can happen if one thread is reading the block while the writer thread is 
> finalizing it simultaneously.
> *Root cause:*
> # The reader thread obtains an RBW replica from the VolumeMap, such as 
> blk_xxx_xxx[RBW], whose data file should be in /XXX/rbw/blk_xxx.
> # Simultaneously, the writer thread finalizes this block, moving it from 
> the RBW directory to the FINALIZE directory: the data file is moved from 
> /XXX/rbw/blk_xxx to /XXX/finalize/blk_xxx.
> # The reader thread attempts to open the data input stream but encounters a 
> FileNotFoundException, because the data file /XXX/rbw/blk_xxx or meta file 
> /XXX/rbw/blk_xxx_xxx no longer exists at this moment.
> # The reader thread treats this block as corrupt, removes the replica 
> from the volume map, and the DataNode reports the deleted block to the 
> NameNode.
> # The NameNode removes this replica for the block.
> # If the current file replication is 1, this file will have a missing block 
> until this DataNode runs the DirectoryScanner again.
> As described above, the FileNotFoundException encountered by the reader thread 
> is expected, because the file has been moved.
> So we need to add a double check to the invalidateMissingBlock logic that 
> verifies whether the data file or meta file exists, to avoid similar cases.
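
A rough sketch of that double check, assuming `dataFile` and `metaFile` have
been re-resolved to the replica's current on-disk location (hypothetical
helper, not the actual patch):

import java.io.File;

public class InvalidateCheckSketch {
  /**
   * Only invalidate when both the data file and the meta file are truly
   * absent. If either still exists (e.g. because the replica was just moved
   * from rbw/ to finalized/), the FileNotFoundException was a benign race
   * and the replica must be kept.
   */
  static boolean shouldInvalidateMissingBlock(File dataFile, File metaFile) {
    if ((dataFile != null && dataFile.exists())
        || (metaFile != null && metaFile.exists())) {
      return false; // still on disk: not a missing block
    }
    return true; // genuinely gone: safe to invalidate and report to the NN
  }
}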



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17342) Fix DataNode may invalidates normal block causing missing block

2024-02-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814706#comment-17814706
 ] 

ASF GitHub Bot commented on HDFS-17342:
---

zhangshuyan0 commented on PR #6464:
URL: https://github.com/apache/hadoop/pull/6464#issuecomment-1929152978

   Committed to trunk. Thanks for your contribution @haiyang1987 @ZanderXu 
@smarthanwang .




> Fix DataNode may invalidates normal block causing missing block
> ---
>
> Key: HDFS-17342
> URL: https://issues.apache.org/jira/browse/HDFS-17342
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> When users read a file under append, occasional exceptions may occur, such as 
> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: xxx.
> This can happen if one thread is reading the block while the writer thread is 
> finalizing it simultaneously.
> *Root cause:*
> # The reader thread obtains an RBW replica from the VolumeMap, such as 
> blk_xxx_xxx[RBW], whose data file should be in /XXX/rbw/blk_xxx.
> # Simultaneously, the writer thread finalizes this block, moving it from 
> the RBW directory to the FINALIZE directory: the data file is moved from 
> /XXX/rbw/blk_xxx to /XXX/finalize/blk_xxx.
> # The reader thread attempts to open the data input stream but encounters a 
> FileNotFoundException, because the data file /XXX/rbw/blk_xxx or meta file 
> /XXX/rbw/blk_xxx_xxx no longer exists at this moment.
> # The reader thread treats this block as corrupt, removes the replica 
> from the volume map, and the DataNode reports the deleted block to the 
> NameNode.
> # The NameNode removes this replica for the block.
> # If the current file replication is 1, this file will have a missing block 
> until this DataNode runs the DirectoryScanner again.
> As described above, the FileNotFoundException encountered by the reader thread 
> is expected, because the file has been moved.
> So we need to add a double check to the invalidateMissingBlock logic that 
> verifies whether the data file or meta file exists, to avoid similar cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17372) CommandProcessingThread#queue should use LinkedBlockingDeque to prevent high priority command blocked by low priority command

2024-02-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814721#comment-17814721
 ] 

ASF GitHub Bot commented on HDFS-17372:
---

hadoop-yetus commented on PR #6530:
URL: https://github.com/apache/hadoop/pull/6530#issuecomment-1929257936

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   7m 24s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  buf  |   0m  0s |  |  buf was not available.  |
   | +0 :ok: |  buf  |   0m  0s |  |  buf was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 52s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 38s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 47s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 41s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 50s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 35s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  cc  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  cc  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 35s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6530/1/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   0m 29s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 33s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 59s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  24m 15s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 199m 38s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6530/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 26s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 297m 17s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6530/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6530 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets cc buflint 
bufcompat |
   | uname | Linux 6c378f02494a 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / bd1fa8447b7462e2defdb31a9201a46f9165d427 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/li

[jira] [Commented] (HDFS-17372) CommandProcessingThread#queue should use LinkedBlockingDeque to prevent high priority command blocked by low priority command

2024-02-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814734#comment-17814734
 ] 

ASF GitHub Bot commented on HDFS-17372:
---

zhangshuyan0 commented on PR #6530:
URL: https://github.com/apache/hadoop/pull/6530#issuecomment-1929311426

   I am a bit confused about this. How often do you update the block key? 
@hfutatzhanghb 




> CommandProcessingThread#queue should use LinkedBlockingDeque to prevent high 
> priority command blocked by low priority command
> -
>
> Key: HDFS-17372
> URL: https://issues.apache.org/jira/browse/HDFS-17372
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17362) RBF: RouterObserverReadProxyProvider should use ConfiguredFailoverProxyProvider internally

2024-02-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814741#comment-17814741
 ] 

ASF GitHub Bot commented on HDFS-17362:
---

hadoop-yetus commented on PR #6510:
URL: https://github.com/apache/hadoop/pull/6510#issuecomment-1929328194

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 33s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 13s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  30m 56s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 25s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   5m 10s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 23s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 39s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 13s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 55s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  32m 45s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  33m  6s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 32s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 16s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   5m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m  4s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   5m  4s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  4s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  0s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 59s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  32m 44s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 27s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  |  13m 53s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6510/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 35s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 173m 56s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.federation.router.TestObserverWithRouter |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6510/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6510 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 28e884691938 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 543645c16730c0faf66d0d65dbab28299815f87a |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-

[jira] [Commented] (HDFS-17372) CommandProcessingThread#queue should use LinkedBlockingDeque to prevent high priority command blocked by low priority command

2024-02-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814744#comment-17814744
 ] 

ASF GitHub Bot commented on HDFS-17372:
---

hfutatzhanghb commented on PR #6530:
URL: https://github.com/apache/hadoop/pull/6530#issuecomment-192931

   We use the default, 10 hours.
   
   
   
   --

> CommandProcessingThread#queue should use LinkedBlockingDeque to prevent high 
> priority command blocked by low priority command
> -
>
> Key: HDFS-17372
> URL: https://issues.apache.org/jira/browse/HDFS-17372
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17372) CommandProcessingThread#queue should use LinkedBlockingDeque to prevent high priority command blocked by low priority command

2024-02-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814753#comment-17814753
 ] 

ASF GitHub Bot commented on HDFS-17372:
---

zhangshuyan0 commented on PR #6530:
URL: https://github.com/apache/hadoop/pull/6530#issuecomment-1929344404

   > We use the default, 10 hours.
   
   Has the DNA_ACCESSKEYUPDATE command not been executed for ten hours?




> CommandProcessingThread#queue should use LinkedBlockingDeque to prevent high 
> priority command blocked by low priority command
> -
>
> Key: HDFS-17372
> URL: https://issues.apache.org/jira/browse/HDFS-17372
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17362) RBF: RouterObserverReadProxyProvider should use ConfiguredFailoverProxyProvider internally

2024-02-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814754#comment-17814754
 ] 

ASF GitHub Bot commented on HDFS-17362:
---

hadoop-yetus commented on PR #6510:
URL: https://github.com/apache/hadoop/pull/6510#issuecomment-1929344958

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 31s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 32s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  30m 43s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   5m 12s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 23s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 39s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 25s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 15s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 53s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  32m 59s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  33m 20s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 31s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 15s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   5m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m  5s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   5m  5s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  5s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  0s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 57s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  32m 32s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 26s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  unit  |  13m 49s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 35s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 173m 55s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6510/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6510 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux a93b561fa761 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 279c68805e8e9da5467d7e981d2cebc5ae53ec79 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6510/5/testReport/ |
   | Max. process+thread count | 2988 (vs. ulimit of 5500) |
   | modul

[jira] [Commented] (HDFS-17372) CommandProcessingThread#queue should use LinkedBlockingDeque to prevent high priority command blocked by low priority command

2024-02-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814756#comment-17814756
 ] 

ASF GitHub Bot commented on HDFS-17372:
---

hfutatzhanghb commented on PR #6530:
URL: https://github.com/apache/hadoop/pull/6530#issuecomment-1929346985

   Yes, it was blocked by a block command.
   
   
   
   --

> CommandProcessingThread#queue should use LinkedBlockingDeque to prevent high 
> priority command blocked by low priority command
> -
>
> Key: HDFS-17372
> URL: https://issues.apache.org/jira/browse/HDFS-17372
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17181) WebHDFS not considering whether a DN is good when called from outside the cluster

2024-02-06 Thread Kanaka Kumar Avvaru (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814851#comment-17814851
 ] 

Kanaka Kumar Avvaru commented on HDFS-17181:


Thanks [~larsfrancke], we also see this issue in our clusters; WebHDFS writes 
fail after decommissioning a DN because the NN chose the removed DN for the 
write.

Can someone review the PR from [~larsfrancke] and merge it to branch 3.3?

cc: [~surendralilhore] [~rakeshr]

> WebHDFS not considering whether a DN is good when called from outside the 
> cluster
> -
>
> Key: HDFS-17181
> URL: https://issues.apache.org/jira/browse/HDFS-17181
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, webhdfs
>Affects Versions: 3.3.6
>Reporter: Lars Francke
>Priority: Major
>  Labels: pull-request-available
> Attachments: Test_fix_for_HDFS-171811.patch
>
>
> When calling WebHDFS to create a file (I'm sure the same problem occurs for 
> other actions e.g. OPEN but I haven't checked all of them yet) it will 
> happily redirect to nodes that are in maintenance.
> The reason is in the 
> [{{chooseDatanode}}|https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java#L307C9-L315]
>  method in {{NamenodeWebHdfsMethods}} where it will only call the 
> {{BlockPlacementPolicy}} (which considers all these edge cases) in case the 
> {{remoteAddr}} (i.e. the address making the request to WebHDFS) is also 
> running a DataNode.
>  
> In all other cases it just refers to 
> [{{NetworkTopology#chooseRandom}}|https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java#L342-L343]
>  which does not consider any of these circumstances (e.g. load, maintenance).
> I don't understand the reason to not just always refer to the placement 
> policy and we're currently testing a patch to do just that.
> I have attached a draft patch for now.
>  
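
A small sketch of the always-filter behavior the reporter proposes: run every
candidate through the same "good target" checks the placement policy applies
(load, decommission, maintenance) before the random pick. `isGoodTarget` is a
hypothetical predicate standing in for those checks, and the class is
illustrative rather than the attached patch:

import java.util.List;
import java.util.Random;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ChooseDatanodeSketch {
  private static final Random RANDOM = new Random();

  /** Pick a random node, but only from candidates that pass the checks. */
  static <T> T chooseRandomGoodNode(List<T> candidates,
      Predicate<T> isGoodTarget) {
    List<T> good = candidates.stream()
        .filter(isGoodTarget)
        .collect(Collectors.toList());
    if (good.isEmpty()) {
      return null; // caller can fail fast instead of redirecting to a bad DN
    }
    return good.get(RANDOM.nextInt(good.size()));
  }
}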



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17362) RBF: RouterObserverReadProxyProvider should use ConfiguredFailoverProxyProvider internally

2024-02-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814863#comment-17814863
 ] 

ASF GitHub Bot commented on HDFS-17362:
---

hadoop-yetus commented on PR #6510:
URL: https://github.com/apache/hadoop/pull/6510#issuecomment-1930035843

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 34s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 25s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  30m 57s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 24s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   5m  7s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 22s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 36s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 13s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 52s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  32m 32s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 30s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   5m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m  2s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   5m  2s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  6s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  0s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 53s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  32m 42s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 27s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  unit  |  13m 49s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 36s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 173m 40s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6510/6/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6510 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 746ecb197a58 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / aca8c4ec079570299ae30f3bdf3e443b8a5ff22e |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6510/6/testReport/ |
   | Max. process+thread count | 3009 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-client 
hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/jo

[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2024-02-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814919#comment-17814919
 ] 

ASF GitHub Bot commented on HDFS-17299:
---

shahrs87 commented on code in PR #6513:
URL: https://github.com/apache/hadoop/pull/6513#discussion_r1479057807


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java:
##
@@ -1618,24 +1621,33 @@ private void setupPipelineForAppendOrRecovery() throws 
IOException {
   LOG.warn(msg);
   lastException.set(new IOException(msg));
   streamerClosed = true;
-  return;
+  return false;
 }
-setupPipelineInternal(nodes, storageTypes, storageIDs);
+return setupPipelineInternal(nodes, storageTypes, storageIDs);
   }
 
-  protected void setupPipelineInternal(DatanodeInfo[] datanodes,
+  protected boolean setupPipelineInternal(DatanodeInfo[] datanodes,
   StorageType[] nodeStorageTypes, String[] nodeStorageIDs)
   throws IOException {
 boolean success = false;
 long newGS = 0L;
+boolean isCreateStage = BlockConstructionStage.PIPELINE_SETUP_CREATE == 
stage;
 while (!success && !streamerClosed && dfsClient.clientRunning) {
   if (!handleRestartingDatanode()) {
-return;
+return false;
+  }
+
+  final boolean isRecovery = errorState.hasInternalError() && 
!isCreateStage;
+
+  // During create stage, if we remove a node (nodes.length - 1)
+  //  min replication should still be satisfied.
+  if (isCreateStage && !(dfsClient.dtpReplaceDatanodeOnFailureReplication 
> 0 &&

Review Comment:
   Reason behind adding this check here:
   We are already doing this check in the catch block of the 
`addDatanode2ExistingPipeline` method 
[here](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1528-L1539).
   But when the `isAppend` flag is set to `false` and we are in the 
`PIPELINE_SETUP_CREATE` phase, we exit early from the 
`addDatanode2ExistingPipeline` method 
[here](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1489-L1492).
   Irrespective of the ReplaceDatanodeOnFailure policy, we will NEVER add a new 
datanode to the pipeline during the PIPELINE_SETUP_CREATE stage, and if removing 
one bad datanode is going to violate the 
`dfs.client.block.write.replace-datanode-on-failure.min-replication` property, 
then we should exit early.
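
Reduced to a condition, the guard under discussion looks roughly like this
sketch (not the exact patch); `minReplication` corresponds to
`dfs.client.block.write.replace-datanode-on-failure.min-replication`, where a
value <= 0 disables the check:

public class CreateStageGuardSketch {
  /**
   * During PIPELINE_SETUP_CREATE no replacement datanode is ever added, so
   * dropping one bad node must still leave at least minReplication nodes
   * in the pipeline; otherwise the pipeline setup should fail early.
   */
  static boolean canDropOneNode(int pipelineSize, int minReplication) {
    return minReplication <= 0 || pipelineSize - 1 >= minReplication;
  }
}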





> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Assignee: Ritesh
>Priority: Critical
>  Labels: pull-request-available
> Attachments: repro.patch
>
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 600000 
> dfs.heartbeat.interval: 3 
> So it will take 2 * 600000 + 10 * 3000 = 1230000 ms (20.5 mins) to detect 
> that the datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
> # DataStreamer is not able to find block locations that satisfy the rack 
> placement policy (one copy in each rack, which essentially means one copy in 
> each AZ).
>  # Since all the datanodes in that AZ are down but still considered alive by 
> the namenode, the client gets different datanodes, but all of them are still 
> in the same AZ. See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=true ugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStre

[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2024-02-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814924#comment-17814924
 ] 

ASF GitHub Bot commented on HDFS-17299:
---

shahrs87 commented on code in PR #6513:
URL: https://github.com/apache/hadoop/pull/6513#discussion_r1480341310


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java:
##
@@ -1618,24 +1621,33 @@ private void setupPipelineForAppendOrRecovery() throws 
IOException {
   LOG.warn(msg);
   lastException.set(new IOException(msg));
   streamerClosed = true;
-  return;
+  return false;
 }
-setupPipelineInternal(nodes, storageTypes, storageIDs);
+return setupPipelineInternal(nodes, storageTypes, storageIDs);
   }
 
-  protected void setupPipelineInternal(DatanodeInfo[] datanodes,
+  protected boolean setupPipelineInternal(DatanodeInfo[] datanodes,
   StorageType[] nodeStorageTypes, String[] nodeStorageIDs)
   throws IOException {
 boolean success = false;
 long newGS = 0L;
+boolean isCreateStage = BlockConstructionStage.PIPELINE_SETUP_CREATE == 
stage;
 while (!success && !streamerClosed && dfsClient.clientRunning) {
   if (!handleRestartingDatanode()) {
-return;
+return false;
+  }
+
+  final boolean isRecovery = errorState.hasInternalError() && 
!isCreateStage;
+
+  // During create stage, if we remove a node (nodes.length - 1)
+  //  min replication should still be satisfied.
+  if (isCreateStage && !(dfsClient.dtpReplaceDatanodeOnFailureReplication 
> 0 &&

Review Comment:
   Consider the case where we are in the PIPELINE_SETUP_CREATE stage but 
isAppend is set to true: then it will not exit early from the 
addDatanode2ExistingPipeline method 
[here](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1489-L1492).
   
   Assuming the replication factor is 3, 
`dfs.client.block.write.replace-datanode-on-failure.min-replication` is set to 
3, there is 1 bad node in the pipeline, and there are valid nodes in the 
cluster, this patch will return false early.
   I think we should move this check after the `handleDatanodeReplacement` 
method. @ritegarg





> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Assignee: Ritesh
>Priority: Critical
>  Labels: pull-request-available
> Attachments: repro.patch
>
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 600000 
> dfs.heartbeat.interval: 3 
> So it will take 2 * 600000 + 10 * 3000 = 1230000 ms (20.5 mins) to detect 
> that the datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
> # DataStreamer is not able to find block locations that satisfy the rack 
> placement policy (one copy in each rack, which essentially means one copy in 
> each AZ).
>  # Since all the datanodes in that AZ are down but still considered alive by 
> the namenode, the client gets different datanodes, but all of them are still 
> in the same AZ. See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=true ugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [T

[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2024-02-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815033#comment-17815033
 ] 

ASF GitHub Bot commented on HDFS-17299:
---

shahrs87 commented on code in PR #6513:
URL: https://github.com/apache/hadoop/pull/6513#discussion_r1479041865


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java:
##
@@ -1607,8 +1607,11 @@ private void transfer(final DatanodeInfo src, final 
DatanodeInfo[] targets,
* it can be written to.
* This happens when a file is appended or data streaming fails
* It keeps on trying until a pipeline is setup
+   *
+   * Returns boolean whether pipeline was setup successfully or not.
+   * This boolean is used upstream on whether to continue creating pipeline or 
throw exception
*/
-  private void setupPipelineForAppendOrRecovery() throws IOException {
+  private boolean setupPipelineForAppendOrRecovery() throws IOException {

Review Comment:
   We are changing the return type of the `setupPipelineForAppendOrRecovery` 
and `setupPipelineInternal` methods.
   IIUC, this is the reason: `handleBadDatanode` can silently fail to handle a 
bad datanode 
[here](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1700-L1706)
 and `setupPipelineInternal` will silently return 
[here](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1637-L1638)
 without bubbling up the exception.
   





> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Assignee: Ritesh
>Priority: Critical
>  Labels: pull-request-available
> Attachments: repro.patch
>
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 600000 
> dfs.heartbeat.interval: 3 
> So it will take 2 * 600000 + 10 * 3000 = 1230000 ms (20.5 mins) to detect 
> that the datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
> # DataStreamer is not able to find block locations that satisfy the rack 
> placement policy (one copy in each rack, which essentially means one copy in 
> each AZ).
>  # Since all the datanodes in that AZ are down but still considered alive by 
> the namenode, the client gets different datanodes, but all of them are still 
> in the same AZ. See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=true ugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.prot

[jira] [Commented] (HDFS-17362) RBF: RouterObserverReadProxyProvider should use ConfiguredFailoverProxyProvider internally

2024-02-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815039#comment-17815039
 ] 

ASF GitHub Bot commented on HDFS-17362:
---

simbadzina commented on code in PR #6510:
URL: https://github.com/apache/hadoop/pull/6510#discussion_r1480692356


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/RouterObserverReadProxyProvider.java:
##
@@ -84,7 +84,8 @@ public class RouterObserverReadProxyProvider extends 
AbstractNNFailoverProxyP
 
   public RouterObserverReadProxyProvider(Configuration conf, URI uri, Class 
xface,
   HAProxyFactory factory) {
-this(conf, uri, xface, factory, new IPFailoverProxyProvider<>(conf, uri, 
xface, factory));
+this(conf, uri, xface, factory,
+new ConfiguredFailoverProxyProvider<>(conf, uri, xface, factory));

Review Comment:
   Cool. That sounds good to me.





> RBF: RouterObserverReadProxyProvider should use 
> ConfiguredFailoverProxyProvider internally
> --
>
> Key: HDFS-17362
> URL: https://issues.apache.org/jira/browse/HDFS-17362
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
>
> Currently, RouterObserverReadProxyProvider is using IPFailoverProxyProvider, 
> while ObserverReadProxyProvider is using ConfiguredFailoverProxyProvider.  If 
> we are to align RouterObserverReadProxyProvider with 
> ObserverReadProxyProvider, RouterObserverReadProxyProvider should internally 
> use ConfiguredFailoverProxyProvider.  Moreover, IPFailoverProxyProvider has 
> an issue with resolving HA configurations. (For example, 
> IPFailoverProxyProvider cannot resolve hdfs://router-service.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17372) CommandProcessingThread#queue should use LinkedBlockingDeque to prevent high priority command blocked by low priority command

2024-02-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815072#comment-17815072
 ] 

ASF GitHub Bot commented on HDFS-17372:
---

hfutatzhanghb commented on PR #6530:
URL: https://github.com/apache/hadoop/pull/6530#issuecomment-1931179669

   @zhangshuyan0 Sir, thanks a lot for your valuable suggestion. I agree with 
you and @ZanderXu. I will try to use the trick you mentioned above to modify 
this PR.




> CommandProcessingThread#queue should use LinkedBlockingDeque to prevent high 
> priority command blocked by low priority command
> -
>
> Key: HDFS-17372
> URL: https://issues.apache.org/jira/browse/HDFS-17372
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org