[jira] [Updated] (HDFS-16622) addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.
[ https://issues.apache.org/jira/browse/HDFS-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-16622:
--
Labels: pull-request-available (was: )

> addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.
> -
>
> Key: HDFS-16622
> URL: https://issues.apache.org/jira/browse/HDFS-16622
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In our production environment, we observed a strange missing block. Based on
> the logs, I suspect there is a bug in
> addRDBI(ReceivedDeletedBlockInfo rdbi, DatanodeStorage storage) (line 250).
> The buggy code is the for loop:
> {code:java}
> synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi,
>     DatanodeStorage storage) {
>   // Make sure another entry for the same block is first removed.
>   // There may only be one such entry.
>   for (PerStorageIBR perStorage : pendingIBRs.values()) {
>     if (perStorage.remove(rdbi.getBlock()) != null) {
>       break;
>     }
>   }
>   getPerStorageIBR(storage).put(rdbi);
> }
> {code}
> The GS of the block removed from the pending ReceivedDeletedBlockInfo entries
> may be greater than the GS of the block in rdbi, and the NN will invalidate
> the replica with the smaller GS when it completes a block.
> So if a block has only one replica, this wrong logic can lead to a missing
> block.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
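A GS-aware variant of the removal can be sketched with a plain map keyed by block ID. This is a hypothetical illustration, not the actual Hadoop fix: PendingIbr stands in for ReceivedDeletedBlockInfo, and the map stands in for the per-storage pending-IBR structures.

```java
import java.util.HashMap;
import java.util.Map;

public class GsAwareRemoval {

  // Illustrative stand-in for ReceivedDeletedBlockInfo.
  public static final class PendingIbr {
    public final long blockId;
    public final long genStamp; // generation stamp (GS)

    public PendingIbr(long blockId, long genStamp) {
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
  }

  // GS-aware variant of addRDBI: only replace a pending entry when the
  // incoming report's GS is at least as new, so a stale report can no
  // longer clobber a pending report that carries a bigger GS.
  public static void addRdbi(Map<Long, PendingIbr> pending, PendingIbr incoming) {
    PendingIbr existing = pending.get(incoming.blockId);
    if (existing == null || existing.genStamp <= incoming.genStamp) {
      pending.put(incoming.blockId, incoming);
    }
  }

  public static void main(String[] args) {
    Map<Long, PendingIbr> pending = new HashMap<>();
    addRdbi(pending, new PendingIbr(1L, 5L));
    addRdbi(pending, new PendingIbr(1L, 3L)); // stale GS, must not win
    if (pending.get(1L).genStamp != 5L) {
      throw new AssertionError("stale GS overwrote newer pending report");
    }
  }
}
```

With the unconditional remove in the original loop, the second (stale) report would have replaced the pending GS-5 entry; the GS comparison above keeps the newer one.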
[jira] [Work logged] (HDFS-16622) addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.
[ https://issues.apache.org/jira/browse/HDFS-16622?focusedWorklogId=778501&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778501 ]

ASF GitHub Bot logged work on HDFS-16622:
-
Author: ASF GitHub Bot
Created on: 06/Jun/22 02:53
Start Date: 06/Jun/22 02:53
Worklog Time Spent: 10m
Work Description: ZanderXu opened a new pull request, #4407:
URL: https://github.com/apache/hadoop/pull/4407

JIRA: [HDFS-16622](https://issues.apache.org/jira/browse/HDFS-16622). addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.

I suspect there is a bug in addRDBI(ReceivedDeletedBlockInfo rdbi, DatanodeStorage storage) (line 250). The buggy code is the for loop:

synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi,
    DatanodeStorage storage) {
  // Make sure another entry for the same block is first removed.
  // There may only be one such entry.
  for (PerStorageIBR perStorage : pendingIBRs.values()) {
    if (perStorage.remove(rdbi.getBlock()) != null) {
      break;
    }
  }
  getPerStorageIBR(storage).put(rdbi);
}

The GS of the block removed from the pending ReceivedDeletedBlockInfo entries may be greater than the GS of the block in rdbi, and the NN will invalidate the replica with the smaller GS when it completes a block.

Issue Time Tracking
---
Worklog Id: (was: 778501)
Remaining Estimate: 0h
Time Spent: 10m

> addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.
> -
>
> Key: HDFS-16622
> URL: https://issues.apache.org/jira/browse/HDFS-16622
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In our production environment, we observed a strange missing block. Based on
> the logs, I suspect there is a bug in
> addRDBI(ReceivedDeletedBlockInfo rdbi, DatanodeStorage storage) (line 250).
> The buggy code is the for loop:
> {code:java}
> synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi,
>     DatanodeStorage storage) {
>   // Make sure another entry for the same block is first removed.
>   // There may only be one such entry.
>   for (PerStorageIBR perStorage : pendingIBRs.values()) {
>     if (perStorage.remove(rdbi.getBlock()) != null) {
>       break;
>     }
>   }
>   getPerStorageIBR(storage).put(rdbi);
> }
> {code}
> The GS of the block removed from the pending ReceivedDeletedBlockInfo entries
> may be greater than the GS of the block in rdbi, and the NN will invalidate
> the replica with the smaller GS when it completes a block.
> So if a block has only one replica, this wrong logic can lead to a missing
> block.
[jira] [Work logged] (HDFS-16621) Replace sd.getCurrentDir() with JNStorage#getCurrentDir()
[ https://issues.apache.org/jira/browse/HDFS-16621?focusedWorklogId=778486&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778486 ]

ASF GitHub Bot logged work on HDFS-16621:
-
Author: ASF GitHub Bot
Created on: 06/Jun/22 01:51
Start Date: 06/Jun/22 01:51
Worklog Time Spent: 10m
Work Description: jianghuazhu commented on PR #4404:
URL: https://github.com/apache/hadoop/pull/4404#issuecomment-1146953721

Here are some failed tests:
TestClientProtocolForPipelineRecovery
TestReplaceDatanodeFailureReplication
TestFileCreation
TestExternalStoragePolicySatisfier
TestBalancerRPCDelay
TestDataNodeUUID
TestRedudantBlocks
TestSeveralNameNodes
TestAddOverReplicatedStripedBlocks

It looks like these failed tests have little to do with the code I submitted. Hi @arp7 @ayushtkn @tomscut, can you help review this PR? Thank you very much.

Issue Time Tracking
---
Worklog Id: (was: 778486)
Time Spent: 0.5h (was: 20m)

> Replace sd.getCurrentDir() with JNStorage#getCurrentDir()
> -
>
> Key: HDFS-16621
> URL: https://issues.apache.org/jira/browse/HDFS-16621
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: journal-node, qjm
> Affects Versions: 3.3.0
> Reporter: JiangHua Zhu
> Assignee: JiangHua Zhu
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> In JNStorage, sd.getCurrentDir() is used in five or six places.
> It can be replaced with JNStorage#getCurrentDir(), which is more concise.
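The HDFS-16621 change is a simple accessor extraction: repeated `sd.getCurrentDir()` calls collapse into one `getCurrentDir()` method. The sketch below models the pattern with stand-in classes; the `StorageDirectory` shape and the `paxos` subdirectory name are assumptions for illustration, not the exact JNStorage internals.

```java
import java.io.File;

public class CurrentDirRefactorSketch {

  // Illustrative stand-in for Storage.StorageDirectory.
  static final class StorageDirectory {
    private final File root;
    StorageDirectory(File root) { this.root = root; }
    File getCurrentDir() { return new File(root, "current"); }
  }

  private final StorageDirectory sd;

  public CurrentDirRefactorSketch(File root) {
    this.sd = new StorageDirectory(root);
  }

  // The single accessor that replaces the repeated sd.getCurrentDir() calls.
  public File getCurrentDir() {
    return sd.getCurrentDir();
  }

  // Call sites now go through getCurrentDir() instead of reaching into sd.
  public File getPaxosDir() {
    return new File(getCurrentDir(), "paxos");
  }

  public static void main(String[] args) {
    CurrentDirRefactorSketch s = new CurrentDirRefactorSketch(new File("/tmp/jn"));
    if (!s.getCurrentDir().getPath().endsWith("current")) {
      throw new AssertionError("accessor should resolve the current dir");
    }
  }
}
```

The benefit is the usual one for accessor extraction: if the way the current directory is resolved ever changes, only one method needs updating.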
[jira] [Work logged] (HDFS-16463) Make dirent cross platform compatible
[ https://issues.apache.org/jira/browse/HDFS-16463?focusedWorklogId=778484&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778484 ]

ASF GitHub Bot logged work on HDFS-16463:
-
Author: ASF GitHub Bot
Created on: 06/Jun/22 01:28
Start Date: 06/Jun/22 01:28
Worklog Time Spent: 10m
Work Description: goiri commented on code in PR #4370:
URL: https://github.com/apache/hadoop/pull/4370#discussion_r889767207

## hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/c-api/dirent.cc:
## @@ -0,0 +1,100 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "x-platform/c-api/dirent.h"
+#include "x-platform/dirent.h"
+
+#if defined(WIN32) && defined(__cplusplus)

Review Comment: There's no cleaner way to do this?

## hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/c-api/dirent.h:
## @@ -0,0 +1,93 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef NATIVE_LIBHDFSPP_LIB_CROSS_PLATFORM_C_API_DIRENT_H
+#define NATIVE_LIBHDFSPP_LIB_CROSS_PLATFORM_C_API_DIRENT_H
+
+/*
+ * We will use XPlatform's dirent on Windows or when the macro
+ * USE_X_PLATFORM_DIRENT is defined.
+ */
+#if defined(WIN32) || defined(USE_X_PLATFORM_DIRENT)
+
+/*
+ * We will use extern "C" only on Windows.
+ */
+#if defined(WIN32) && defined(__cplusplus)
+extern "C" {
+#endif
+
+/**
+ * DIR struct holds the pointer to XPlatform::Dirent instance. Since this will
+ * be used in C, we can't hold the pointer to XPlatform::Dirent. We're working
+ * around this by using a void pointer and casting it to XPlatform::Dirent when
+ * needed in C++.
+ */
+typedef struct DIR {
+  void *x_platform_dirent_ptr;
+} DIR;
+
+/**
+ * dirent struct contains the name of the file/folder while iterating through
+ * the directory's children.
+ */
+struct dirent {
+  char d_name[256];
+};
+
+/**
+ * Opens a directory for iteration. Internally, it instantiates DIR struct for
+ * the given path. closedir must be called on the returned pointer to DIR struct
+ * when done.
+ *
+ * @param dir_path The path to the directory to iterate through.
+ * @return A pointer to the DIR struct.
+ */
+DIR *opendir(const char *dir_path);
+
+/**
+ * For iterating through the children of the directory pointed to by the DIR
+ * struct pointer.
+ *
+ * @param dir The pointer to the DIR struct.
+ * @return A pointer to dirent struct containing the name of the current child
+ *         file/folder.
+ */
+struct dirent *readdir(DIR *dir);
+
+/**
+ * De-allocates the XPlatform::Dirent instance pointed to by the DIR pointer.
+ *
+ * @param dir The pointer to DIR struct to close.
+ * @return 0 if successful.
+ */
+int closedir(DIR *dir);
+
+#if defined(WIN32) && defined(__cplusplus)
+}
+#endif
+
+#else
+/*
+ * For non-Windows environments, we use the dirent.h header itself.
+ */
+#include <dirent.h>

Review Comment: It might be easier to read to have this one first.

## hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/c-api/dirent.cc:
## @@ -0,0 +1,100 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for
[jira] [Work logged] (HDFS-16595) Slow peer metrics - add median, mad and upper latency limits
[ https://issues.apache.org/jira/browse/HDFS-16595?focusedWorklogId=778482&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778482 ]

ASF GitHub Bot logged work on HDFS-16595:
-
Author: ASF GitHub Bot
Created on: 06/Jun/22 01:08
Start Date: 06/Jun/22 01:08
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on PR #4405:
URL: https://github.com/apache/hadoop/pull/4405#issuecomment-1146932233

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 15m 9s | | Docker mode activated. |
_ Prechecks _
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | buf | 0m 0s | | buf was not available. |
| +0 :ok: | buf | 0m 0s | | buf was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 5 new or modified test files. |
_ branch-3.3 Compile Tests _
| +0 :ok: | mvndep | 14m 21s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 29m 12s | | branch-3.3 passed |
| +1 :green_heart: | compile | 4m 30s | | branch-3.3 passed |
| +1 :green_heart: | checkstyle | 1m 22s | | branch-3.3 passed |
| +1 :green_heart: | mvnsite | 2m 58s | | branch-3.3 passed |
| +1 :green_heart: | javadoc | 2m 51s | | branch-3.3 passed |
| +1 :green_heart: | spotbugs | 6m 29s | | branch-3.3 passed |
| +1 :green_heart: | shadedclient | 30m 57s | | branch has no errors when building and testing our client artifacts. |
_ Patch Compile Tests _
| +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 39s | | the patch passed |
| +1 :green_heart: | compile | 4m 25s | | the patch passed |
| +1 :green_heart: | cc | 4m 25s | | the patch passed |
| +1 :green_heart: | javac | 4m 25s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 8s | | the patch passed |
| +1 :green_heart: | mvnsite | 2m 46s | | the patch passed |
| +1 :green_heart: | javadoc | 2m 20s | | the patch passed |
| +1 :green_heart: | spotbugs | 6m 54s | | the patch passed |
| +1 :green_heart: | shadedclient | 31m 41s | | patch has no errors when building and testing our client artifacts. |
_ Other Tests _
| +1 :green_heart: | unit | 2m 35s | | hadoop-hdfs-client in the patch passed. |
| -1 :x: | unit | 229m 42s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4405/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 59s | | The patch does not generate ASF License warnings. |
| | | 391m 39s | | |

| Reason | Tests |
|---:|:--|
| Failed junit tests | hadoop.hdfs.TestClientProtocolForPipelineRecovery |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4405/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4405 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets cc buflint bufcompat |
| uname | Linux 75034c6e3cbd 4.15.0-166-generic #174-Ubuntu SMP Wed Dec 8 19:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.3 / 35f87e3afa1b311d282cbc600ca3fe298093bcc6 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4405/1/testReport/ |
| Max. process+thread count | 2431 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4405/1/console |
| versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

Issue Time Tracking
[jira] [Updated] (HDFS-16619) improve HttpHeaders.Values and HttpHeaders.Names with recommended classes
[ https://issues.apache.org/jira/browse/HDFS-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-16619:
--
Labels: pull-request-available (was: )

> improve HttpHeaders.Values and HttpHeaders.Names with recommended classes
> --
>
> Key: HDFS-16619
> URL: https://issues.apache.org/jira/browse/HDFS-16619
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.4.0
> Reporter: fanshilun
> Assignee: fanshilun
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> HttpHeaders.Values and HttpHeaders.Names are deprecated; use
> HttpHeaderValues and HttpHeaderNames instead.
[jira] [Work logged] (HDFS-16619) improve HttpHeaders.Values and HttpHeaders.Names with recommended classes
[ https://issues.apache.org/jira/browse/HDFS-16619?focusedWorklogId=778481&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778481 ]

ASF GitHub Bot logged work on HDFS-16619:
-
Author: ASF GitHub Bot
Created on: 06/Jun/22 00:59
Start Date: 06/Jun/22 00:59
Worklog Time Spent: 10m
Work Description: slfan1989 opened a new pull request, #4406:
URL: https://github.com/apache/hadoop/pull/4406

JIRA: HDFS-16619. Improve HttpHeaders.Values and HttpHeaders.Names with recommended classes.

HttpHeaders.Values and HttpHeaders.Names are deprecated; use HttpHeaderValues and HttpHeaderNames instead.

Issue Time Tracking
---
Worklog Id: (was: 778481)
Remaining Estimate: 0h
Time Spent: 10m

> improve HttpHeaders.Values and HttpHeaders.Names with recommended classes
> --
>
> Key: HDFS-16619
> URL: https://issues.apache.org/jira/browse/HDFS-16619
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.4.0
> Reporter: fanshilun
> Assignee: fanshilun
> Priority: Major
> Fix For: 3.4.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> HttpHeaders.Values and HttpHeaders.Names are deprecated; use
> HttpHeaderValues and HttpHeaderNames instead.
[jira] [Work logged] (HDFS-16618) sync_file_range error should include more volume and file info
[ https://issues.apache.org/jira/browse/HDFS-16618?focusedWorklogId=778480&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778480 ]

ASF GitHub Bot logged work on HDFS-16618:
-
Author: ASF GitHub Bot
Created on: 06/Jun/22 00:58
Start Date: 06/Jun/22 00:58
Worklog Time Spent: 10m
Work Description: tomscut commented on code in PR #4402:
URL: https://github.com/apache/hadoop/pull/4402#discussion_r889762386

## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetAsyncDiskService.java:
## @@ -216,16 +216,20 @@ synchronized void shutdown() {
     }
   }

-  public void submitSyncFileRangeRequest(FsVolumeImpl volume,
-      final ReplicaOutputStreams streams, final long offset, final long nbytes,
-      final int flags) {
-    execute(volume, new Runnable() {
-      @Override
-      public void run() {
+  public void submitSyncFileRangeRequest(FsVolumeImpl volume, final ReplicaOutputStreams streams,
+      final long offset, final long nbytes, final int flags) {
+    execute(volume, () -> {
+      try {
+        streams.syncFileRangeIfPossible(offset, nbytes, flags);
+      } catch (NativeIOException e) {
        try {
-          streams.syncFileRangeIfPossible(offset, nbytes, flags);
-        } catch (NativeIOException e) {
-          LOG.warn("sync_file_range error", e);
+          LOG.warn("sync_file_range error. Volume: {} , Capacity: {}, Available space: {}, "
+              + "File range offset: {}, length: {}, flags: {}", volume, volume.getCapacity(),
+              volume.getAvailable(), offset, nbytes, flags, e);
+        } catch (IOException ioe) {
+          LOG.warn("sync_file_range error. Volume: {} , Capacity: {}, "

Review Comment: Hi @virajjasani , please remove this extra space.

Review Comment:
```suggestion
          LOG.warn("sync_file_range error. Volume: {}, Capacity: {}, Available space: {}, "
```

Review Comment:
```suggestion
          LOG.warn("sync_file_range error. Volume: {}, Capacity: {}, "
```

Issue Time Tracking
---
Worklog Id: (was: 778480)
Time Spent: 40m (was: 0.5h)

> sync_file_range error should include more volume and file info
> --
>
> Key: HDFS-16618
> URL: https://issues.apache.org/jira/browse/HDFS-16618
> Project: Hadoop HDFS
> Issue Type: Task
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Having seen multiple sync_file_range errors recently with Bad file
> descriptor, it would
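The enriched warning discussed in the review above can be sketched as plain string formatting. This is an illustrative helper, not the Hadoop code: real code passes these values to an SLF4J logger with `{}` placeholders (and the exception as the last argument) rather than building the string with String.format.

```java
public class SyncFileRangeLogSketch {

  // Hypothetical helper: builds the enriched warning text around a
  // sync_file_range failure. Parameter names mirror the patch's log fields.
  public static String buildWarning(String volume, long capacity, long available,
                                    long offset, long nbytes, int flags) {
    return String.format(
        "sync_file_range error. Volume: %s, Capacity: %d, Available space: %d, "
            + "File range offset: %d, length: %d, flags: %d",
        volume, capacity, available, offset, nbytes, flags);
  }

  public static void main(String[] args) {
    String msg = buildWarning("/data/1", 1000L, 250L, 0L, 4096L, 2);
    // Each field the reviewers asked for should appear in the message.
    if (!msg.contains("Volume: /data/1") || !msg.contains("length: 4096")) {
      throw new AssertionError("warning message is missing volume/file info");
    }
    System.out.println(msg);
  }
}
```

Including volume capacity and available space alongside the file offset and length is what lets a "Bad file descriptor" warning be correlated with a full or failing disk after the fact.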
[jira] [Updated] (HDFS-16619) improve HttpHeaders.Values and HttpHeaders.Names with recommended classes
[ https://issues.apache.org/jira/browse/HDFS-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fanshilun updated HDFS-16619:
-
Description: HttpHeaders.Values and HttpHeaders.Names are deprecated; use HttpHeaderValues and HttpHeaderNames instead.

> improve HttpHeaders.Values and HttpHeaders.Names with recommended classes
> --
>
> Key: HDFS-16619
> URL: https://issues.apache.org/jira/browse/HDFS-16619
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.4.0
> Reporter: fanshilun
> Assignee: fanshilun
> Priority: Major
> Fix For: 3.4.0
>
>
> HttpHeaders.Values and HttpHeaders.Names are deprecated; use
> HttpHeaderValues and HttpHeaderNames instead.
[jira] [Work logged] (HDFS-16600) Deadlock on DataNode
[ https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=778479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778479 ]

ASF GitHub Bot logged work on HDFS-16600:
-
Author: ASF GitHub Bot
Created on: 06/Jun/22 00:36
Start Date: 06/Jun/22 00:36
Worklog Time Spent: 10m
Work Description: slfan1989 commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1146920391

Thank you for your contribution, but I still have some concerns about HDFS-16534. For a new feature, multiple PRs should not be used to fix problems separately; that makes the code very difficult to read. I recommend creating a subtask under HDFS-15382 to fix HDFS-16598 and HDFS-16600 together. @ZanderXu @MingXiangLi

Issue Time Tracking
---
Worklog Id: (was: 778479)
Time Spent: 2h (was: 1h 50m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h
> Remaining Estimate: 0h
>
> The UT
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction
> failed because of a deadlock introduced by
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534].
> Deadlock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588
> // needs a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
>     b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line
> // 3526 needs a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl,
>     bpid))
> {code}
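The deadlock above pairs a read lock in createRbw with a write lock in evictBlocks on the same block pool. One way such a pair hangs is lock upgrade: a thread already holding the read lock can never acquire the matching write lock, because the JDK's ReentrantReadWriteLock does not support upgrading (assuming here, for illustration, that the DataNode's lock manager behaves like one; the sketch uses only the JDK class, not Hadoop's).

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockUpgradeSketch {

  // A thread holding the read lock cannot take the write lock: tryLock
  // makes the refusal observable instead of blocking forever, which is
  // what an unconditional writeLock().lock() would do here.
  public static boolean canUpgradeWhileHoldingRead() {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    lock.readLock().lock();
    boolean got = lock.writeLock().tryLock();
    if (got) {
      lock.writeLock().unlock();
    }
    lock.readLock().unlock();
    return got;
  }

  // Releasing the read lock before asking for the write lock succeeds,
  // which is why read-locked paths must not call into write-locked ones.
  public static boolean canAcquireAfterRelease() {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    lock.readLock().lock();
    lock.readLock().unlock();
    boolean got = lock.writeLock().tryLock();
    if (got) {
      lock.writeLock().unlock();
    }
    return got;
  }

  public static void main(String[] args) {
    if (canUpgradeWhileHoldingRead() || !canAcquireAfterRelease()) {
      throw new AssertionError("unexpected read/write lock behavior");
    }
  }
}
```

Downgrading (write to read) is supported by ReentrantReadWriteLock, but upgrading is not; a read-locked code path that reaches a write acquisition on the same lock is a structural deadlock regardless of timing.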
[jira] [Work logged] (HDFS-16621) Replace sd.getCurrentDir() with JNStorage#getCurrentDir()
[ https://issues.apache.org/jira/browse/HDFS-16621?focusedWorklogId=778471&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778471 ]

ASF GitHub Bot logged work on HDFS-16621:
-
Author: ASF GitHub Bot
Created on: 05/Jun/22 23:05
Start Date: 05/Jun/22 23:05
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on PR #4404:
URL: https://github.com/apache/hadoop/pull/4404#issuecomment-1146899630

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 18m 22s | | Docker mode activated. |
_ Prechecks _
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
_ trunk Compile Tests _
| +1 :green_heart: | mvninstall | 40m 2s | | trunk passed |
| +1 :green_heart: | compile | 1m 40s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 1m 31s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 19s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 40s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 18s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 1m 42s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 44s | | trunk passed |
| +1 :green_heart: | shadedclient | 25m 40s | | branch has no errors when building and testing our client artifacts. |
_ Patch Compile Tests _
| +1 :green_heart: | mvninstall | 1m 22s | | the patch passed |
| +1 :green_heart: | compile | 1m 28s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 1m 28s | | the patch passed |
| +1 :green_heart: | compile | 1m 19s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 1m 19s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 2s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 23s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 59s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 1m 30s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 31s | | the patch passed |
| +1 :green_heart: | shadedclient | 25m 9s | | patch has no errors when building and testing our client artifacts. |
_ Other Tests _
| -1 :x: | unit | 388m 59s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4404/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 1m 6s | | The patch does not generate ASF License warnings. |
| | | 522m 24s | | |

| Reason | Tests |
|---:|:--|
| Failed junit tests | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
| | hadoop.hdfs.TestClientProtocolForPipelineRecovery |
| | hadoop.hdfs.TestReplaceDatanodeFailureReplication |
| | hadoop.hdfs.server.datanode.TestDataNodeUUID |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4404/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4404 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux a2d9f3b810de 4.15.0-166-generic #174-Ubuntu SMP Wed Dec 8 19:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 088a2a92b484c219f14584ed2ac44e42306b3f1d |
| Default Java | Private
[jira] [Work logged] (HDFS-16618) sync_file_range error should include more volume and file info
[ https://issues.apache.org/jira/browse/HDFS-16618?focusedWorklogId=778466&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778466 ]

ASF GitHub Bot logged work on HDFS-16618:
-
Author: ASF GitHub Bot
Created on: 05/Jun/22 18:37
Start Date: 05/Jun/22 18:37
Worklog Time Spent: 10m
Work Description: virajjasani commented on PR #4402:
URL: https://github.com/apache/hadoop/pull/4402#issuecomment-1146862949

@tomscut @jojochuang could you please review this PR?

Issue Time Tracking
---
Worklog Id: (was: 778466)
Time Spent: 0.5h (was: 20m)

> sync_file_range error should include more volume and file info
> --
>
> Key: HDFS-16618
> URL: https://issues.apache.org/jira/browse/HDFS-16618
> Project: Hadoop HDFS
> Issue Type: Task
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Having seen multiple sync_file_range errors recently with Bad file
> descriptor, it would be good to include more volume stats as well as file
> offset/length info with the error log to get some more insights.
[jira] [Work logged] (HDFS-16595) Slow peer metrics - add median, mad and upper latency limits
[ https://issues.apache.org/jira/browse/HDFS-16595?focusedWorklogId=778465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778465 ]

ASF GitHub Bot logged work on HDFS-16595:
-
Author: ASF GitHub Bot
Created on: 05/Jun/22 18:36
Start Date: 05/Jun/22 18:36
Worklog Time Spent: 10m
Work Description: virajjasani commented on PR #4405:
URL: https://github.com/apache/hadoop/pull/4405#issuecomment-1146862842

Here is the branch-3.3 backport PR. FYI @tomscut @jojochuang

Issue Time Tracking
---
Worklog Id: (was: 778465)
Time Spent: 3.5h (was: 3h 20m)

> Slow peer metrics - add median, mad and upper latency limits
> 
>
> Key: HDFS-16595
> URL: https://issues.apache.org/jira/browse/HDFS-16595
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> Slow datanode metrics include the slow node and its reporting node details.
> With HDFS-16582, we added the aggregate latency that is perceived by the
> reporting nodes.
> In order to get more insight into how the outlier slow node's latencies
> differ from the rest of the nodes, we should also expose the median, median
> absolute deviation and the calculated upper latency limit details.
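The three quantities HDFS-16595 exposes (median, median absolute deviation, upper latency limit) can be sketched with plain arrays. The upper limit here is computed as median + k·MAD with an assumed multiplier k = 3, which is not necessarily the constant HDFS's outlier detector uses.

```java
import java.util.Arrays;

public class OutlierLimitSketch {

  // Median of the values (computed on a sorted copy).
  public static double median(double[] xs) {
    double[] s = xs.clone();
    Arrays.sort(s);
    int n = s.length;
    return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
  }

  // Median absolute deviation: median of |x - median(xs)|.
  public static double mad(double[] xs) {
    double m = median(xs);
    double[] dev = new double[xs.length];
    for (int i = 0; i < xs.length; i++) {
      dev[i] = Math.abs(xs[i] - m);
    }
    return median(dev);
  }

  // Upper latency limit as median + k * MAD; nodes above it are outliers.
  public static double upperLimit(double[] xs, double k) {
    return median(xs) + k * mad(xs);
  }

  public static void main(String[] args) {
    double[] latencies = {2, 3, 3, 4, 50}; // one slow outlier
    double limit = upperLimit(latencies, 3.0);
    if (!(latencies[4] > limit)) {
      throw new AssertionError("the 50ms node should exceed the upper limit");
    }
  }
}
```

MAD-based limits are preferred over mean/standard deviation for this use because a single very slow node inflates the mean and stddev, hiding itself; the median and MAD are robust to that one outlier.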
[jira] [Work logged] (HDFS-16595) Slow peer metrics - add median, mad and upper latency limits
[ https://issues.apache.org/jira/browse/HDFS-16595?focusedWorklogId=778464=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778464 ] ASF GitHub Bot logged work on HDFS-16595: - Author: ASF GitHub Bot Created on: 05/Jun/22 18:35 Start Date: 05/Jun/22 18:35 Worklog Time Spent: 10m Work Description: virajjasani opened a new pull request, #4405: URL: https://github.com/apache/hadoop/pull/4405 Backport PR from (#4357) Reviewed-by: Tao Li Signed-off-by: Wei-Chiu Chuang Issue Time Tracking --- Worklog Id: (was: 778464) Time Spent: 3h 20m (was: 3h 10m) > Slow peer metrics - add median, mad and upper latency limits > > > Key: HDFS-16595 > URL: https://issues.apache.org/jira/browse/HDFS-16595 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Slow datanode metrics include slow node and it's reporting node details. With > HDFS-16582, we added the aggregate latency that is perceived by the reporting > nodes. > In order to get more insights into how the outlier slownode's latencies > differ from the rest of the nodes, we should also expose median, median > absolute deviation and the calculated upper latency limit details. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16622) addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.
ZanderXu created HDFS-16622: --- Summary: addRDBI in IncrementalBlockReportManager may remove the block with bigger GS. Key: HDFS-16622 URL: https://issues.apache.org/jira/browse/HDFS-16622 Project: Hadoop HDFS Issue Type: Bug Reporter: ZanderXu Assignee: ZanderXu In our production environment, we found a strange missing block. According to the log, I suspect there is a bug in the function addRDBI(ReceivedDeletedBlockInfo rdbi, DatanodeStorage storage) (line 250). The buggy code is the for loop:
{code:java}
synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi,
    DatanodeStorage storage) {
  // Make sure another entry for the same block is first removed.
  // There may only be one such entry.
  for (PerStorageIBR perStorage : pendingIBRs.values()) {
    if (perStorage.remove(rdbi.getBlock()) != null) {
      break;
    }
  }
  getPerStorageIBR(storage).put(rdbi);
}
{code}
The GS of the block removed from the pending IBRs may be greater than the GS of the block in rdbi, and the NN will invalidate the replica with the smaller GS when completing a block. So if there is only one replica of a block, this wrong logic can cause a missing block. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
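One possible guard for the race the report describes is to keep whichever pending entry carries the larger generation stamp. The sketch below is a simplified, hypothetical illustration: Entry and PendingReports are toy stand-ins for Hadoop's ReceivedDeletedBlockInfo and PerStorageIBR, not the committed fix.

```java
import java.util.HashMap;
import java.util.Map;

public class PendingReports {

  static final class Entry {
    final long blockId;
    final long genStamp;
    Entry(long blockId, long genStamp) {
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
  }

  private final Map<Long, Entry> pending = new HashMap<>();

  // Buggy shape: always drops the old entry, even when it carries a
  // larger GS than the incoming one.
  void addUnguarded(Entry e) {
    pending.remove(e.blockId);
    pending.put(e.blockId, e);
  }

  // Guarded shape: an existing entry with a bigger GS wins, so the NN
  // never ends up knowing only about the smaller-GS replica.
  void addGuarded(Entry e) {
    Entry old = pending.get(e.blockId);
    if (old != null && old.genStamp > e.genStamp) {
      return; // keep the entry with the larger GS
    }
    pending.put(e.blockId, e);
  }

  long genStampOf(long blockId) {
    return pending.get(blockId).genStamp;
  }

  public static void main(String[] args) {
    PendingReports r = new PendingReports();
    r.addGuarded(new Entry(1L, 1002L)); // newer GS already pending
    r.addGuarded(new Entry(1L, 1001L)); // stale report must not win
    System.out.println(r.genStampOf(1L)); // 1002
  }
}
```

Under the unguarded shape the second call would have left only GS 1001 pending, which is exactly the "remove the block with bigger GS" failure the issue title names.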
[jira] [Work started] (HDFS-16621) Replace sd.getCurrentDir() with JNStorage#getCurrentDir()
[ https://issues.apache.org/jira/browse/HDFS-16621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-16621 started by JiangHua Zhu. --- > Replace sd.getCurrentDir() with JNStorage#getCurrentDir() > - > > Key: HDFS-16621 > URL: https://issues.apache.org/jira/browse/HDFS-16621 > Project: Hadoop HDFS > Issue Type: Improvement > Components: journal-node, qjm >Affects Versions: 3.3.0 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In JNStorage, sd.getCurrentDir() is used in 5~6 places, > It can be replaced with JNStorage#getCurrentDir(), which will be more concise. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16621) Replace sd.getCurrentDir() with JNStorage#getCurrentDir()
[ https://issues.apache.org/jira/browse/HDFS-16621?focusedWorklogId=778453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778453 ] ASF GitHub Bot logged work on HDFS-16621: - Author: ASF GitHub Bot Created on: 05/Jun/22 14:21 Start Date: 05/Jun/22 14:21 Worklog Time Spent: 10m Work Description: jianghuazhu opened a new pull request, #4404: URL: https://github.com/apache/hadoop/pull/4404 ### Description of PR In JNStorage, sd.getCurrentDir() is used in 5~6 places, It can be replaced with JNStorage#getCurrentDir(), which will be more concise. Details: HDFS-16621 ### How was this patch tested? For testing, there is not much pressure. Issue Time Tracking --- Worklog Id: (was: 778453) Remaining Estimate: 0h Time Spent: 10m > Replace sd.getCurrentDir() with JNStorage#getCurrentDir() > - > > Key: HDFS-16621 > URL: https://issues.apache.org/jira/browse/HDFS-16621 > Project: Hadoop HDFS > Issue Type: Improvement > Components: journal-node, qjm >Affects Versions: 3.3.0 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > In JNStorage, sd.getCurrentDir() is used in 5~6 places, > It can be replaced with JNStorage#getCurrentDir(), which will be more concise. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16621) Replace sd.getCurrentDir() with JNStorage#getCurrentDir()
[ https://issues.apache.org/jira/browse/HDFS-16621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16621: -- Labels: pull-request-available (was: ) > Replace sd.getCurrentDir() with JNStorage#getCurrentDir() > - > > Key: HDFS-16621 > URL: https://issues.apache.org/jira/browse/HDFS-16621 > Project: Hadoop HDFS > Issue Type: Improvement > Components: journal-node, qjm >Affects Versions: 3.3.0 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In JNStorage, sd.getCurrentDir() is used in 5~6 places, > It can be replaced with JNStorage#getCurrentDir(), which will be more concise. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-16621) Replace sd.getCurrentDir() with JNStorage#getCurrentDir()
[ https://issues.apache.org/jira/browse/HDFS-16621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangHua Zhu reassigned HDFS-16621: --- Assignee: JiangHua Zhu > Replace sd.getCurrentDir() with JNStorage#getCurrentDir() > - > > Key: HDFS-16621 > URL: https://issues.apache.org/jira/browse/HDFS-16621 > Project: Hadoop HDFS > Issue Type: Improvement > Components: journal-node, qjm >Affects Versions: 3.3.0 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Minor > > In JNStorage, sd.getCurrentDir() is used in 5~6 places, > It can be replaced with JNStorage#getCurrentDir(), which will be more concise. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16621) Replace sd.getCurrentDir() with JNStorage#getCurrentDir()
JiangHua Zhu created HDFS-16621: --- Summary: Replace sd.getCurrentDir() with JNStorage#getCurrentDir() Key: HDFS-16621 URL: https://issues.apache.org/jira/browse/HDFS-16621 Project: Hadoop HDFS Issue Type: Improvement Components: journal-node, qjm Affects Versions: 3.3.0 Reporter: JiangHua Zhu In JNStorage, sd.getCurrentDir() is used in 5~6 places; it can be replaced with JNStorage#getCurrentDir(), which is more concise. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
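The refactor described above is small enough to show in miniature. The classes below are simplified stand-ins for Hadoop's JNStorage and StorageDirectory (same idea, not the same code): repeated sd.getCurrentDir() calls are funneled through a single accessor on the storage class.

```java
import java.io.File;

public class Storage {

  static final class StorageDirectory {
    private final File root;
    StorageDirectory(File root) { this.root = root; }
    File getCurrentDir() { return new File(root, "current"); }
  }

  private final StorageDirectory sd;

  Storage(File root) { this.sd = new StorageDirectory(root); }

  // The single accessor; call sites use this instead of repeating
  // sd.getCurrentDir() at every one of the 5~6 places.
  File getCurrentDir() { return sd.getCurrentDir(); }

  // Before: new File(sd.getCurrentDir(), name) scattered across methods.
  // After: one helper built on the accessor.
  File getFileInCurrent(String name) {
    return new File(getCurrentDir(), name);
  }

  public static void main(String[] args) {
    Storage s = new Storage(new File("/journal/ns1"));
    System.out.println(s.getFileInCurrent("edits_inprogress"));
  }
}
```

Behavior is unchanged; the gain is that the "current" path convention lives in one place instead of being restated at each call site.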
[jira] [Work logged] (HDFS-16598) All datanodes [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] are bad. Aborting...
[ https://issues.apache.org/jira/browse/HDFS-16598?focusedWorklogId=778450=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778450 ] ASF GitHub Bot logged work on HDFS-16598: - Author: ASF GitHub Bot Created on: 05/Jun/22 13:02 Start Date: 05/Jun/22 13:02 Worklog Time Spent: 10m Work Description: ZanderXu commented on PR #4366: URL: https://github.com/apache/hadoop/pull/4366#issuecomment-1146800739 Thanks @slfan1989 for your comment. At present, I can only confirm that createRBW needs to use getReplicaInfo(String bpid, long blkid), but whether it is reasonable to use getReplicaInfo(String bpid, long blkid) in other places, I need further confirmation. Issue Time Tracking --- Worklog Id: (was: 778450) Time Spent: 1h 50m (was: 1h 40m) > All datanodes > [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] > are bad. Aborting... > -- > > Key: HDFS-16598 > URL: https://issues.apache.org/jira/browse/HDFS-16598 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > org.apache.hadoop.hdfs.testPipelineRecoveryOnRestartFailure failed with the > stack like: > {code:java} > java.io.IOException: All datanodes > [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] > are bad. Aborting... > at > org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1667) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1601) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1587) > at > org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1371) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:674) > {code} > After tracing the root cause, this bug was introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
Because the > block GS of the client may be smaller than that of the DN when pipeline recovery fails. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16600) Deadlock on DataNode
[ https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=778448=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778448 ] ASF GitHub Bot logged work on HDFS-16600: - Author: ASF GitHub Bot Created on: 05/Jun/22 12:58 Start Date: 05/Jun/22 12:58 Worklog Time Spent: 10m Work Description: ZanderXu commented on PR #4367: URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1146800123 @slfan1989 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction sometimes succeeds? I have run it several times locally and it fails every time. > How do you judge the occurrence of DeadLock? The deadlock is triggered when evictLazyPersistBlocks is invoked from createRbw, because createRbw holds the BLOCK_POOL read lock while evictLazyPersistBlocks tries to take the BLOCK_POOL write lock. Issue Time Tracking --- Worklog Id: (was: 778448) Time Spent: 1h 50m (was: 1h 40m) > Deadlock on DataNode > > > Key: HDFS-16600 > URL: https://issues.apache.org/jira/browse/HDFS-16600 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > The UT > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction > failed because of a deadlock, which was introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588
> // need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
>     b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 3526
> // need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl,
>     bpid))
> {code}
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16600) Deadlock on DataNode
[ https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=778447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778447 ] ASF GitHub Bot logged work on HDFS-16600: - Author: ASF GitHub Bot Created on: 05/Jun/22 12:47 Start Date: 05/Jun/22 12:47 Worklog Time Spent: 10m Work Description: slfan1989 commented on PR #4367: URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1146798609 @ZanderXu I found that some of the HDFS (DN) JUnit tests sometimes succeed and sometimes fail, but there is no way to tell whether that is related to the deadlock. How do you judge the occurrence of the deadlock? @MingXiangLi HDFS-16534 is a very big change that will greatly help DN performance, but ZanderXu has already filed two JIRAs against it. Can you help re-examine HDFS-16534? If each problem is fixed by a separate PR, I worry it will bring more problems. Issue Time Tracking --- Worklog Id: (was: 778447) Time Spent: 1h 40m (was: 1.5h) > Deadlock on DataNode > > > Key: HDFS-16600 > URL: https://issues.apache.org/jira/browse/HDFS-16600 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > The UT > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction > failed because of a deadlock, which was introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588
> // need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
>     b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 3526
> // need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl,
>     bpid))
> {code}
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
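The reason the read-then-write pattern above self-deadlocks is that Java's ReentrantReadWriteLock does not allow a thread to upgrade a held read lock to a write lock. The demo below shows the refusal directly; it uses tryLock() instead of lock() so it returns false rather than hanging forever the way createRbw -> evictBlocks would.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockUpgradeDemo {

  static boolean upgradeAttempt() {
    ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    rw.readLock().lock(); // createRbw holds the block-pool read lock
    try {
      // evictBlocks then asks for the block-pool write lock on the same
      // thread; with lock() instead of tryLock() this would block forever.
      return rw.writeLock().tryLock();
    } finally {
      rw.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    System.out.println("write lock acquired while holding read lock: "
        + upgradeAttempt()); // false
  }
}
```

This is why the fix has to restructure the call path (release or avoid the read lock before evicting) rather than simply reorder the acquisitions.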
[jira] [Work logged] (HDFS-16598) All datanodes [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] are bad. Aborting...
[ https://issues.apache.org/jira/browse/HDFS-16598?focusedWorklogId=778446=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778446 ] ASF GitHub Bot logged work on HDFS-16598: - Author: ASF GitHub Bot Created on: 05/Jun/22 12:37 Start Date: 05/Jun/22 12:37 Worklog Time Spent: 10m Work Description: slfan1989 commented on PR #4366: URL: https://github.com/apache/hadoop/pull/4366#issuecomment-1146797019 > @MingXiangLi Thanks for you review. @ZanderXu @MingXiangLi I would like to ask a question, after reading your discussion, is it possible that block GS of client may be smaller than DN appears in all places where getReplicaInfo(String bpid, long blkid) is called? Issue Time Tracking --- Worklog Id: (was: 778446) Time Spent: 1h 40m (was: 1.5h) > All datanodes > [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] > are bad. Aborting... > -- > > Key: HDFS-16598 > URL: https://issues.apache.org/jira/browse/HDFS-16598 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > org.apache.hadoop.hdfs.testPipelineRecoveryOnRestartFailure failed with the > stack like: > {code:java} > java.io.IOException: All datanodes > [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] > are bad. Aborting... > at > org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1667) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1601) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1587) > at > org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1371) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:674) > {code} > After tracing the root cause, this bug was introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
Because the > block GS of the client may be smaller than that of the DN when pipeline recovery fails. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
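The lookup distinction being debated in this thread can be sketched in miniature: during pipeline recovery the client's generation stamp (GS) can lag the DataNode's, so resolving a replica by (bpid, blockId) succeeds where an exact-GS match would not. ReplicaLookup below is a toy stand-in for the DataNode's replica map, illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

public class ReplicaLookup {

  static final class Replica {
    final long blockId;
    final long genStamp;
    Replica(long blockId, long genStamp) {
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
  }

  private final Map<Long, Replica> byId = new HashMap<>();

  void add(Replica r) { byId.put(r.blockId, r); }

  // Exact match: fails when the caller's GS is stale, which is exactly
  // the pipeline-recovery case this issue hit.
  Replica getExact(long blockId, long genStamp) {
    Replica r = byId.get(blockId);
    return (r != null && r.genStamp == genStamp) ? r : null;
  }

  // Id-only match: enough to identify the volume when acquiring the lock;
  // any required GS validation can still happen afterwards.
  Replica getById(long blockId) { return byId.get(blockId); }

  public static void main(String[] args) {
    ReplicaLookup map = new ReplicaLookup();
    map.add(new Replica(7L, 1005L));              // DN already bumped the GS
    System.out.println(map.getExact(7L, 1004L));  // null: client GS is stale
    System.out.println(map.getById(7L).genStamp); // 1005
  }
}
```

This mirrors the reviewer's point above: the id-only lookup is safe for lock acquisition because the GS check still happens in the original method where it matters.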
[jira] [Work logged] (HDFS-16618) sync_file_range error should include more volume and file info
[ https://issues.apache.org/jira/browse/HDFS-16618?focusedWorklogId=778444=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778444 ] ASF GitHub Bot logged work on HDFS-16618: - Author: ASF GitHub Bot Created on: 05/Jun/22 12:00 Start Date: 05/Jun/22 12:00 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on PR #4402: URL: https://github.com/apache/hadoop/pull/4402#issuecomment-1146791706 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 55s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 39m 38s | | trunk passed | | +1 :green_heart: | compile | 1m 42s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | compile | 1m 32s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 19s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 39s | | trunk passed | | +1 :green_heart: | javadoc | 1m 18s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 40s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 51s | | trunk passed | | +1 :green_heart: | shadedclient | 25m 58s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 23s | | the patch passed | | +1 :green_heart: | compile | 1m 30s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javac | 1m 30s | | the patch passed | | +1 :green_heart: | compile | 1m 20s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 1m 20s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 1s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 26s | | the patch passed | | +1 :green_heart: | javadoc | 0m 59s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 27s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 36s | | the patch passed | | +1 :green_heart: | shadedclient | 25m 20s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 346m 13s | | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 57s | | The patch does not generate ASF License warnings. 
| | | | 462m 3s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4402/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4402 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 7ecdb3722981 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 75cf43a071c65210c74cd77b2675840419c2ff41 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4402/1/testReport/ | | Max. process+thread count | 2365 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U:
[jira] [Work logged] (HDFS-16601) Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try
[ https://issues.apache.org/jira/browse/HDFS-16601?focusedWorklogId=778442=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778442 ] ASF GitHub Bot logged work on HDFS-16601: - Author: ASF GitHub Bot Created on: 05/Jun/22 09:58 Start Date: 05/Jun/22 09:58 Worklog Time Spent: 10m Work Description: ZanderXu commented on PR #4369: URL: https://github.com/apache/hadoop/pull/4369#issuecomment-1146776005 @Hexiaoqiao @MingXiangLi can you help me review this patch? thanks~ Issue Time Tracking --- Worklog Id: (was: 778442) Time Spent: 0.5h (was: 20m) > Failed to replace a bad datanode on the existing pipeline due to no more good > datanodes being available to try > -- > > Key: HDFS-16601 > URL: https://issues.apache.org/jira/browse/HDFS-16601 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > In our production environment, we found a bug and stack like: > {code:java} > java.io.IOException: Failed to replace a bad datanode on the existing > pipeline due to no more good datanodes being available to try. (Nodes: > current=[DatanodeInfoWithStorage[127.0.0.1:59687,DS-b803febc-7b22-4144-9b39-7bf521cdaa8d,DISK], > > DatanodeInfoWithStorage[127.0.0.1:59670,DS-0d652bc2-1784-430d-961f-750f80a290f1,DISK]], > > original=[DatanodeInfoWithStorage[127.0.0.1:59670,DS-0d652bc2-1784-430d-961f-750f80a290f1,DISK], > > DatanodeInfoWithStorage[127.0.0.1:59687,DS-b803febc-7b22-4144-9b39-7bf521cdaa8d,DISK]]). > The current failed datanode replacement policy is DEFAULT, and a client may > configure this via > 'dfs.client.block.write.replace-datanode-on-failure.policy' in its > configuration. 
> at > org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1418) > at > org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1478) > at > org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1704) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1605) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1587) > at > org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1371) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:674) > {code} > And the root cause is that DFSClient cannot perceive the exception of > TransferBlock during PipelineRecovery. If failed during TransferBlock, the > DFSClient will retry all datanodes in the cluster and then failed. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
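For reference, the client-side settings named in that error message live in hdfs-site.xml. Enabling best-effort replacement is a common mitigation for "no more good datanodes" aborts on small clusters, not a fix for the transfer-block exception handling described in this report:

```xml
<!-- Client-side datanode-replacement settings referenced in the error
     message above. With best-effort=true the client keeps writing with
     the remaining datanodes instead of aborting when replacement fails. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>DEFAULT</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
  <value>true</value>
</property>
```

Note that best-effort trades durability for availability: a pipeline can shrink below the replication factor until the NameNode re-replicates the block later.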
[jira] [Work logged] (HDFS-16598) All datanodes [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] are bad. Aborting...
[ https://issues.apache.org/jira/browse/HDFS-16598?focusedWorklogId=778441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778441 ] ASF GitHub Bot logged work on HDFS-16598: - Author: ASF GitHub Bot Created on: 05/Jun/22 09:55 Start Date: 05/Jun/22 09:55 Worklog Time Spent: 10m Work Description: ZanderXu commented on PR #4366: URL: https://github.com/apache/hadoop/pull/4366#issuecomment-1146775645 @MingXiangLi @Hexiaoqiao Can you catch the root cause of this bug? Need further explanation? Issue Time Tracking --- Worklog Id: (was: 778441) Time Spent: 1.5h (was: 1h 20m) > All datanodes > [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] > are bad. Aborting... > -- > > Key: HDFS-16598 > URL: https://issues.apache.org/jira/browse/HDFS-16598 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > org.apache.hadoop.hdfs.testPipelineRecoveryOnRestartFailure failed with the > stack like: > {code:java} > java.io.IOException: All datanodes > [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] > are bad. Aborting... > at > org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1667) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1601) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1587) > at > org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1371) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:674) > {code} > After tracing the root cause, this bug was introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. Because the > block GS of client may be smaller than DN when pipeline recovery failed. 
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16600) Deadlock on DataNode
[ https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=778440=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778440 ] ASF GitHub Bot logged work on HDFS-16600: - Author: ASF GitHub Bot Created on: 05/Jun/22 09:51 Start Date: 05/Jun/22 09:51 Worklog Time Spent: 10m Work Description: ZanderXu commented on PR #4367: URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1146775213 Thanks @MingXiangLi for your review. [HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534) means a lot to me and I learned a lot from it. Issue Time Tracking --- Worklog Id: (was: 778440) Time Spent: 1.5h (was: 1h 20m) > Deadlock on DataNode > > > Key: HDFS-16600 > URL: https://issues.apache.org/jira/browse/HDFS-16600 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > The UT > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction > failed, because happened deadlock, which is introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. > DeadLock: > {code:java} > // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 > need a read lock > try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl, > b.getBlockPoolId())) > // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line > 3526 need a write lock > try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, > bpid)) > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16598) All datanodes [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] are bad. Aborting...
[ https://issues.apache.org/jira/browse/HDFS-16598?focusedWorklogId=778439=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778439 ] ASF GitHub Bot logged work on HDFS-16598: - Author: ASF GitHub Bot Created on: 05/Jun/22 09:49 Start Date: 05/Jun/22 09:49 Worklog Time Spent: 10m Work Description: MingXiangLi commented on PR #4366: URL: https://github.com/apache/hadoop/pull/4366#issuecomment-1146774912 @ZanderXu LGTM. When we acquire the volume lock we should invoke getReplicaInfo(..) to get the volume uuid for the replica. So the GS check is not necessary at the lock-acquisition stage; the original method will still check the GS where a GS check is necessary. Issue Time Tracking --- Worklog Id: (was: 778439) Time Spent: 1h 20m (was: 1h 10m) > All datanodes > [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] > are bad. Aborting... > -- > > Key: HDFS-16598 > URL: https://issues.apache.org/jira/browse/HDFS-16598 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > org.apache.hadoop.hdfs.testPipelineRecoveryOnRestartFailure failed with the > stack like: > {code:java} > java.io.IOException: All datanodes > [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] > are bad. Aborting... > at > org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1667) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1601) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1587) > at > org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1371) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:674) > {code} > After tracing the root cause, this bug was introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
Because the > block GS of the client may be smaller than that of the DN when pipeline recovery fails. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16600) Deadlock on DataNode
[ https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=778436=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778436 ] ASF GitHub Bot logged work on HDFS-16600: - Author: ASF GitHub Bot Created on: 05/Jun/22 09:27 Start Date: 05/Jun/22 09:27 Worklog Time Spent: 10m Work Description: MingXiangLi commented on PR #4367: URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1146772397 @ZanderXu LGTM.The core logic in HDFS-16534 is change block pool write lock to read lock and add volume lock for each replica under this block pool.And we didn't change this method in HDFS-16534 because it's not a heavy call. So this commit is makes sense to me. Issue Time Tracking --- Worklog Id: (was: 778436) Time Spent: 1h 20m (was: 1h 10m) > Deadlock on DataNode > > > Key: HDFS-16600 > URL: https://issues.apache.org/jira/browse/HDFS-16600 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > The UT > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction > failed, because happened deadlock, which is introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. > DeadLock: > {code:java} > // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 > need a read lock > try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl, > b.getBlockPoolId())) > // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line > 3526 need a write lock > try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, > bpid)) > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16620) Improve msync performance by a separate RPC server
[ https://issues.apache.org/jira/browse/HDFS-16620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongpan Liu updated HDFS-16620: --- Attachment: HDFS-16620-001.patch Status: Patch Available (was: Open) > Improve msync performance by a separate RPC server > -- > > Key: HDFS-16620 > URL: https://issues.apache.org/jira/browse/HDFS-16620 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yongpan Liu >Priority: Minor > Attachments: HDFS-16620-001.patch > > > ClientProtocol#msync is invoked frequently in clusters where the Observer is > enabled. When NameNode is overwhelmed with load, the average response time of > msync will increase, which can affect Observer Read efficiency. We can split > out a separate RPC server just like HDFS-9311.
[jira] [Updated] (HDFS-16620) Improve msync performance by a separate RPC server
[ https://issues.apache.org/jira/browse/HDFS-16620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongpan Liu updated HDFS-16620: --- Description: ClientProtocol#msync is invoked frequently in clusters where the Observer is enabled. When NameNode is overwhelmed with load, the average response time of msync will increase, which can affect Observer Read efficiency. We can split out a separate RPC server just like HDFS-9311. (was: ClientProtocol#msync is invoked frequently in clusters where the Observer is enabled. When NameNode is heavily loaded, the average response time of msync increases, which can affect Observer Read efficiency. We can split out a separate RPC server just like HDFS-9311.) > Improve msync performance by a separate RPC server > -- > > Key: HDFS-16620 > URL: https://issues.apache.org/jira/browse/HDFS-16620 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yongpan Liu >Priority: Minor > > ClientProtocol#msync is invoked frequently in clusters where the Observer is > enabled. When NameNode is overwhelmed with load, the average response time of > msync will increase, which can affect Observer Read efficiency. We can split > out a separate RPC server just like HDFS-9311.
[jira] [Created] (HDFS-16620) Improve msync performance by a separate RPC server
Yongpan Liu created HDFS-16620: -- Summary: Improve msync performance by a separate RPC server Key: HDFS-16620 URL: https://issues.apache.org/jira/browse/HDFS-16620 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongpan Liu ClientProtocol#msync is invoked frequently in clusters where the Observer is enabled. When NameNode is heavily loaded, the average response time of msync increases, which can affect Observer Read efficiency. We can split out a separate RPC server just like HDFS-9311.
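The rationale can be seen with a toy queueing model: when cheap msync-like calls share one handler queue with heavy calls, they inherit the heavy calls' queueing delay, while a dedicated handler keeps their latency flat. A hedged Java sketch, using single-threaded executors as stand-ins for RPC handler pools (not Hadoop code; the class and timings are illustrative only):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SeparateRpcQueueDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService shared = Executors.newSingleThreadExecutor();
        ExecutorService msyncOnly = Executors.newSingleThreadExecutor();

        // A heavy request occupies the shared handler for 300 ms.
        shared.submit(() -> {
            try { Thread.sleep(300); } catch (InterruptedException ignored) {}
        });

        // A cheap msync-like call on the dedicated handler returns immediately.
        long t0 = System.nanoTime();
        msyncOnly.submit(() -> {}).get();
        long dedicatedMs = (System.nanoTime() - t0) / 1_000_000;

        // The same cheap call on the shared handler waits behind the heavy one.
        long t1 = System.nanoTime();
        shared.submit(() -> {}).get();
        long sharedMs = (System.nanoTime() - t1) / 1_000_000;

        System.out.println("dedicated=" + dedicatedMs + "ms, shared=" + sharedMs + "ms");
        shared.shutdown();
        msyncOnly.shutdown();
    }
}
```

Splitting msync onto its own RPC server (as HDFS-9311 did for service RPC) is the same isolation idea applied at the NameNode's listener/handler level.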
[jira] [Created] (HDFS-16619) improve HttpHeaders.Values And HttpHeaders.Names With recommended Class
fanshilun created HDFS-16619: Summary: improve HttpHeaders.Values And HttpHeaders.Names With recommended Class Key: HDFS-16619 URL: https://issues.apache.org/jira/browse/HDFS-16619 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.4.0 Reporter: fanshilun Assignee: fanshilun Fix For: 3.4.0