[jira] [Updated] (HDFS-16622) addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.
[ https://issues.apache.org/jira/browse/HDFS-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-16622:
--
Labels: pull-request-available (was: )

> addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.
> -
>
> Key: HDFS-16622
> URL: https://issues.apache.org/jira/browse/HDFS-16622
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In our production environment, we observed a strange missing block. Based on
> the logs, I suspect there is a bug in
> addRDBI(ReceivedDeletedBlockInfo rdbi, DatanodeStorage storage) (line 250).
> The buggy code is the for loop:
> {code:java}
> synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi,
>     DatanodeStorage storage) {
>   // Make sure another entry for the same block is first removed.
>   // There may only be one such entry.
>   for (PerStorageIBR perStorage : pendingIBRs.values()) {
>     if (perStorage.remove(rdbi.getBlock()) != null) {
>       break;
>     }
>   }
>   getPerStorageIBR(storage).put(rdbi);
> }
> {code}
> The GS of the block removed from the pending ReceivedDeletedBlockInfo entries
> may be greater than the GS of the block in rdbi, and the NN will invalidate
> the replica with the smaller GS when it completes a block.
> So if a block has only one replica, this wrong logic can lead to a missing
> block.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
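A GS-aware variant of the removal can be sketched with a plain map keyed by block ID. This is a hypothetical illustration, not the actual Hadoop fix: PendingIbr stands in for ReceivedDeletedBlockInfo, and the map stands in for the per-storage pending-IBR structures.

```java
import java.util.HashMap;
import java.util.Map;

public class GsAwareRemoval {

  // Illustrative stand-in for ReceivedDeletedBlockInfo.
  public static final class PendingIbr {
    public final long blockId;
    public final long genStamp; // generation stamp (GS)

    public PendingIbr(long blockId, long genStamp) {
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
  }

  // GS-aware variant of addRDBI: only replace a pending entry when the
  // incoming report's GS is at least as new, so a stale report can no
  // longer clobber a pending report that carries a bigger GS.
  public static void addRdbi(Map<Long, PendingIbr> pending, PendingIbr incoming) {
    PendingIbr existing = pending.get(incoming.blockId);
    if (existing == null || existing.genStamp <= incoming.genStamp) {
      pending.put(incoming.blockId, incoming);
    }
  }

  public static void main(String[] args) {
    Map<Long, PendingIbr> pending = new HashMap<>();
    addRdbi(pending, new PendingIbr(1L, 5L));
    addRdbi(pending, new PendingIbr(1L, 3L)); // stale GS, must not win
    if (pending.get(1L).genStamp != 5L) {
      throw new AssertionError("stale GS overwrote newer pending report");
    }
  }
}
```

With the unconditional remove in the original loop, the second (stale) report would have replaced the pending GS-5 entry; the GS comparison above keeps the newer one.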
[jira] [Work logged] (HDFS-16622) addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.
[ https://issues.apache.org/jira/browse/HDFS-16622?focusedWorklogId=778501&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778501 ]

ASF GitHub Bot logged work on HDFS-16622:
-
Author: ASF GitHub Bot
Created on: 06/Jun/22 02:53
Start Date: 06/Jun/22 02:53
Worklog Time Spent: 10m
Work Description: ZanderXu opened a new pull request, #4407:
URL: https://github.com/apache/hadoop/pull/4407

JIRA: [HDFS-16622](https://issues.apache.org/jira/browse/HDFS-16622). addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.

I suspect there is a bug in addRDBI(ReceivedDeletedBlockInfo rdbi, DatanodeStorage storage) (line 250). The buggy code is the for loop:

synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi,
    DatanodeStorage storage) {
  // Make sure another entry for the same block is first removed.
  // There may only be one such entry.
  for (PerStorageIBR perStorage : pendingIBRs.values()) {
    if (perStorage.remove(rdbi.getBlock()) != null) {
      break;
    }
  }
  getPerStorageIBR(storage).put(rdbi);
}

The GS of the block removed from the pending ReceivedDeletedBlockInfo entries may be greater than the GS of the block in rdbi, and the NN will invalidate the replica with the smaller GS when it completes a block.

Issue Time Tracking
---
Worklog Id: (was: 778501)
Remaining Estimate: 0h
Time Spent: 10m

> addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.
> -
>
> Key: HDFS-16622
> URL: https://issues.apache.org/jira/browse/HDFS-16622
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In our production environment, we observed a strange missing block. Based on
> the logs, I suspect there is a bug in
> addRDBI(ReceivedDeletedBlockInfo rdbi, DatanodeStorage storage) (line 250).
> The buggy code is the for loop:
> {code:java}
> synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi,
>     DatanodeStorage storage) {
>   // Make sure another entry for the same block is first removed.
>   // There may only be one such entry.
>   for (PerStorageIBR perStorage : pendingIBRs.values()) {
>     if (perStorage.remove(rdbi.getBlock()) != null) {
>       break;
>     }
>   }
>   getPerStorageIBR(storage).put(rdbi);
> }
> {code}
> The GS of the block removed from the pending ReceivedDeletedBlockInfo entries
> may be greater than the GS of the block in rdbi, and the NN will invalidate
> the replica with the smaller GS when it completes a block.
> So if a block has only one replica, this wrong logic can lead to a missing
> block.
[jira] [Work logged] (HDFS-16621) Replace sd.getCurrentDir() with JNStorage#getCurrentDir()
[ https://issues.apache.org/jira/browse/HDFS-16621?focusedWorklogId=778486&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778486 ]

ASF GitHub Bot logged work on HDFS-16621:
-
Author: ASF GitHub Bot
Created on: 06/Jun/22 01:51
Start Date: 06/Jun/22 01:51
Worklog Time Spent: 10m
Work Description: jianghuazhu commented on PR #4404:
URL: https://github.com/apache/hadoop/pull/4404#issuecomment-1146953721

Here are some failed tests:
TestClientProtocolForPipelineRecovery
TestReplaceDatanodeFailureReplication
TestFileCreation
TestExternalStoragePolicySatisfier
TestBalancerRPCDelay
TestDataNodeUUID
TestRedudantBlocks
TestSeveralNameNodes
TestAddOverReplicatedStripedBlocks

It looks like these failed tests have little to do with the code I submitted. Hi @arp7 @ayushtkn @tomscut, can you help review this PR? Thank you very much.

Issue Time Tracking
---
Worklog Id: (was: 778486)
Time Spent: 0.5h (was: 20m)

> Replace sd.getCurrentDir() with JNStorage#getCurrentDir()
> -
>
> Key: HDFS-16621
> URL: https://issues.apache.org/jira/browse/HDFS-16621
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: journal-node, qjm
> Affects Versions: 3.3.0
> Reporter: JiangHua Zhu
> Assignee: JiangHua Zhu
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> In JNStorage, sd.getCurrentDir() is used in five or six places.
> It can be replaced with JNStorage#getCurrentDir(), which is more concise.
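The HDFS-16621 change is a simple accessor extraction: repeated `sd.getCurrentDir()` calls collapse into one `getCurrentDir()` method. The sketch below models the pattern with stand-in classes; the `StorageDirectory` shape and the `paxos` subdirectory name are assumptions for illustration, not the exact JNStorage internals.

```java
import java.io.File;

public class CurrentDirRefactorSketch {

  // Illustrative stand-in for Storage.StorageDirectory.
  static final class StorageDirectory {
    private final File root;
    StorageDirectory(File root) { this.root = root; }
    File getCurrentDir() { return new File(root, "current"); }
  }

  private final StorageDirectory sd;

  public CurrentDirRefactorSketch(File root) {
    this.sd = new StorageDirectory(root);
  }

  // The single accessor that replaces the repeated sd.getCurrentDir() calls.
  public File getCurrentDir() {
    return sd.getCurrentDir();
  }

  // Call sites now go through getCurrentDir() instead of reaching into sd.
  public File getPaxosDir() {
    return new File(getCurrentDir(), "paxos");
  }

  public static void main(String[] args) {
    CurrentDirRefactorSketch s = new CurrentDirRefactorSketch(new File("/tmp/jn"));
    if (!s.getCurrentDir().getPath().endsWith("current")) {
      throw new AssertionError("accessor should resolve the current dir");
    }
  }
}
```

The benefit is the usual one for accessor extraction: if the way the current directory is resolved ever changes, only one method needs updating.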
[jira] [Work logged] (HDFS-16463) Make dirent cross platform compatible
[ https://issues.apache.org/jira/browse/HDFS-16463?focusedWorklogId=778484&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778484 ]

ASF GitHub Bot logged work on HDFS-16463:
-
Author: ASF GitHub Bot
Created on: 06/Jun/22 01:28
Start Date: 06/Jun/22 01:28
Worklog Time Spent: 10m
Work Description: goiri commented on code in PR #4370:
URL: https://github.com/apache/hadoop/pull/4370#discussion_r889767207

## hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/c-api/dirent.cc:
## @@ -0,0 +1,100 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "x-platform/c-api/dirent.h"
+#include "x-platform/dirent.h"
+
+#if defined(WIN32) && defined(__cplusplus)

Review Comment: There's no cleaner way to do this?

## hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/c-api/dirent.h:
## @@ -0,0 +1,93 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef NATIVE_LIBHDFSPP_LIB_CROSS_PLATFORM_C_API_DIRENT_H
+#define NATIVE_LIBHDFSPP_LIB_CROSS_PLATFORM_C_API_DIRENT_H
+
+/*
+ * We will use XPlatform's dirent on Windows or when the macro
+ * USE_X_PLATFORM_DIRENT is defined.
+ */
+#if defined(WIN32) || defined(USE_X_PLATFORM_DIRENT)
+
+/*
+ * We will use extern "C" only on Windows.
+ */
+#if defined(WIN32) && defined(__cplusplus)
+extern "C" {
+#endif
+
+/**
+ * DIR struct holds the pointer to XPlatform::Dirent instance. Since this will
+ * be used in C, we can't hold the pointer to XPlatform::Dirent. We're working
+ * around this by using a void pointer and casting it to XPlatform::Dirent when
+ * needed in C++.
+ */
+typedef struct DIR {
+  void *x_platform_dirent_ptr;
+} DIR;
+
+/**
+ * dirent struct contains the name of the file/folder while iterating through
+ * the directory's children.
+ */
+struct dirent {
+  char d_name[256];
+};
+
+/**
+ * Opens a directory for iteration. Internally, it instantiates DIR struct for
+ * the given path. closedir must be called on the returned pointer to DIR struct
+ * when done.
+ *
+ * @param dir_path The path to the directory to iterate through.
+ * @return A pointer to the DIR struct.
+ */
+DIR *opendir(const char *dir_path);
+
+/**
+ * For iterating through the children of the directory pointed to by the DIR
+ * struct pointer.
+ *
+ * @param dir The pointer to the DIR struct.
+ * @return A pointer to dirent struct containing the name of the current child
+ *         file/folder.
+ */
+struct dirent *readdir(DIR *dir);
+
+/**
+ * De-allocates the XPlatform::Dirent instance pointed to by the DIR pointer.
+ *
+ * @param dir The pointer to DIR struct to close.
+ * @return 0 if successful.
+ */
+int closedir(DIR *dir);
+
+#if defined(WIN32) && defined(__cplusplus)
+}
+#endif
+
+#else
+/*
+ * For non-Windows environments, we use the dirent.h header itself.
+ */
+#include <dirent.h>

Review Comment: It might be easier to read to have this one first.

## hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/c-api/dirent.cc:
## @@ -0,0 +1,100 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for
[jira] [Work logged] (HDFS-16595) Slow peer metrics - add median, mad and upper latency limits
[ https://issues.apache.org/jira/browse/HDFS-16595?focusedWorklogId=778482&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778482 ]

ASF GitHub Bot logged work on HDFS-16595:
-
Author: ASF GitHub Bot
Created on: 06/Jun/22 01:08
Start Date: 06/Jun/22 01:08
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on PR #4405:
URL: https://github.com/apache/hadoop/pull/4405#issuecomment-1146932233

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 15m 9s | | Docker mode activated. |
_ Prechecks _
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | buf | 0m 0s | | buf was not available. |
| +0 :ok: | buf | 0m 0s | | buf was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 5 new or modified test files. |
_ branch-3.3 Compile Tests _
| +0 :ok: | mvndep | 14m 21s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 29m 12s | | branch-3.3 passed |
| +1 :green_heart: | compile | 4m 30s | | branch-3.3 passed |
| +1 :green_heart: | checkstyle | 1m 22s | | branch-3.3 passed |
| +1 :green_heart: | mvnsite | 2m 58s | | branch-3.3 passed |
| +1 :green_heart: | javadoc | 2m 51s | | branch-3.3 passed |
| +1 :green_heart: | spotbugs | 6m 29s | | branch-3.3 passed |
| +1 :green_heart: | shadedclient | 30m 57s | | branch has no errors when building and testing our client artifacts. |
_ Patch Compile Tests _
| +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 39s | | the patch passed |
| +1 :green_heart: | compile | 4m 25s | | the patch passed |
| +1 :green_heart: | cc | 4m 25s | | the patch passed |
| +1 :green_heart: | javac | 4m 25s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 8s | | the patch passed |
| +1 :green_heart: | mvnsite | 2m 46s | | the patch passed |
| +1 :green_heart: | javadoc | 2m 20s | | the patch passed |
| +1 :green_heart: | spotbugs | 6m 54s | | the patch passed |
| +1 :green_heart: | shadedclient | 31m 41s | | patch has no errors when building and testing our client artifacts. |
_ Other Tests _
| +1 :green_heart: | unit | 2m 35s | | hadoop-hdfs-client in the patch passed. |
| -1 :x: | unit | 229m 42s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4405/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 59s | | The patch does not generate ASF License warnings. |
| | | 391m 39s | | |

| Reason | Tests |
|---:|:--|
| Failed junit tests | hadoop.hdfs.TestClientProtocolForPipelineRecovery |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4405/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4405 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets cc buflint bufcompat |
| uname | Linux 75034c6e3cbd 4.15.0-166-generic #174-Ubuntu SMP Wed Dec 8 19:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.3 / 35f87e3afa1b311d282cbc600ca3fe298093bcc6 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4405/1/testReport/ |
| Max. process+thread count | 2431 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4405/1/console |
| versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

Issue Time Tracking
[jira] [Updated] (HDFS-16619) improve HttpHeaders.Values and HttpHeaders.Names with recommended classes
[ https://issues.apache.org/jira/browse/HDFS-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-16619:
--
Labels: pull-request-available (was: )

> improve HttpHeaders.Values and HttpHeaders.Names with recommended classes
> --
>
> Key: HDFS-16619
> URL: https://issues.apache.org/jira/browse/HDFS-16619
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.4.0
> Reporter: fanshilun
> Assignee: fanshilun
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> HttpHeaders.Values and HttpHeaders.Names are deprecated; use
> HttpHeaderValues and HttpHeaderNames instead.
[jira] [Work logged] (HDFS-16619) improve HttpHeaders.Values and HttpHeaders.Names with recommended classes
[ https://issues.apache.org/jira/browse/HDFS-16619?focusedWorklogId=778481&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778481 ]

ASF GitHub Bot logged work on HDFS-16619:
-
Author: ASF GitHub Bot
Created on: 06/Jun/22 00:59
Start Date: 06/Jun/22 00:59
Worklog Time Spent: 10m
Work Description: slfan1989 opened a new pull request, #4406:
URL: https://github.com/apache/hadoop/pull/4406

JIRA: HDFS-16619. Improve HttpHeaders.Values and HttpHeaders.Names with recommended classes.

HttpHeaders.Values and HttpHeaders.Names are deprecated; use HttpHeaderValues and HttpHeaderNames instead.

Issue Time Tracking
---
Worklog Id: (was: 778481)
Remaining Estimate: 0h
Time Spent: 10m

> improve HttpHeaders.Values and HttpHeaders.Names with recommended classes
> --
>
> Key: HDFS-16619
> URL: https://issues.apache.org/jira/browse/HDFS-16619
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.4.0
> Reporter: fanshilun
> Assignee: fanshilun
> Priority: Major
> Fix For: 3.4.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> HttpHeaders.Values and HttpHeaders.Names are deprecated; use
> HttpHeaderValues and HttpHeaderNames instead.
[jira] [Work logged] (HDFS-16618) sync_file_range error should include more volume and file info
[ https://issues.apache.org/jira/browse/HDFS-16618?focusedWorklogId=778480&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778480 ]

ASF GitHub Bot logged work on HDFS-16618:
-
Author: ASF GitHub Bot
Created on: 06/Jun/22 00:58
Start Date: 06/Jun/22 00:58
Worklog Time Spent: 10m
Work Description: tomscut commented on code in PR #4402:
URL: https://github.com/apache/hadoop/pull/4402#discussion_r889762386

## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetAsyncDiskService.java:
## @@ -216,16 +216,20 @@ synchronized void shutdown() {
     }
   }

-  public void submitSyncFileRangeRequest(FsVolumeImpl volume,
-      final ReplicaOutputStreams streams, final long offset, final long nbytes,
-      final int flags) {
-    execute(volume, new Runnable() {
-      @Override
-      public void run() {
+  public void submitSyncFileRangeRequest(FsVolumeImpl volume, final ReplicaOutputStreams streams,
+      final long offset, final long nbytes, final int flags) {
+    execute(volume, () -> {
+      try {
+        streams.syncFileRangeIfPossible(offset, nbytes, flags);
+      } catch (NativeIOException e) {
        try {
-          streams.syncFileRangeIfPossible(offset, nbytes, flags);
-        } catch (NativeIOException e) {
-          LOG.warn("sync_file_range error", e);
+          LOG.warn("sync_file_range error. Volume: {} , Capacity: {}, Available space: {}, "
+              + "File range offset: {}, length: {}, flags: {}", volume, volume.getCapacity(),
+              volume.getAvailable(), offset, nbytes, flags, e);
+        } catch (IOException ioe) {
+          LOG.warn("sync_file_range error. Volume: {} , Capacity: {}, "

Review Comment: Hi @virajjasani , please remove this extra space.

Review Comment:
```suggestion
          LOG.warn("sync_file_range error. Volume: {}, Capacity: {}, Available space: {}, "
```

Review Comment:
```suggestion
          LOG.warn("sync_file_range error. Volume: {}, Capacity: {}, "
```

Issue Time Tracking
---
Worklog Id: (was: 778480)
Time Spent: 40m (was: 0.5h)

> sync_file_range error should include more volume and file info
> --
>
> Key: HDFS-16618
> URL: https://issues.apache.org/jira/browse/HDFS-16618
> Project: Hadoop HDFS
> Issue Type: Task
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Having seen multiple sync_file_range errors recently with Bad file
> descriptor, it would
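The enriched warning discussed in the review above can be sketched as plain string formatting. This is an illustrative helper, not the Hadoop code: real code passes these values to an SLF4J logger with `{}` placeholders (and the exception as the last argument) rather than building the string with String.format.

```java
public class SyncFileRangeLogSketch {

  // Hypothetical helper: builds the enriched warning text around a
  // sync_file_range failure. Parameter names mirror the patch's log fields.
  public static String buildWarning(String volume, long capacity, long available,
                                    long offset, long nbytes, int flags) {
    return String.format(
        "sync_file_range error. Volume: %s, Capacity: %d, Available space: %d, "
            + "File range offset: %d, length: %d, flags: %d",
        volume, capacity, available, offset, nbytes, flags);
  }

  public static void main(String[] args) {
    String msg = buildWarning("/data/1", 1000L, 250L, 0L, 4096L, 2);
    // Each field the reviewers asked for should appear in the message.
    if (!msg.contains("Volume: /data/1") || !msg.contains("length: 4096")) {
      throw new AssertionError("warning message is missing volume/file info");
    }
    System.out.println(msg);
  }
}
```

Including volume capacity and available space alongside the file offset and length is what lets a "Bad file descriptor" warning be correlated with a full or failing disk after the fact.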
[jira] [Updated] (HDFS-16619) improve HttpHeaders.Values and HttpHeaders.Names with recommended classes
[ https://issues.apache.org/jira/browse/HDFS-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fanshilun updated HDFS-16619:
-
Description: HttpHeaders.Values and HttpHeaders.Names are deprecated; use HttpHeaderValues and HttpHeaderNames instead.

> improve HttpHeaders.Values and HttpHeaders.Names with recommended classes
> --
>
> Key: HDFS-16619
> URL: https://issues.apache.org/jira/browse/HDFS-16619
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.4.0
> Reporter: fanshilun
> Assignee: fanshilun
> Priority: Major
> Fix For: 3.4.0
>
>
> HttpHeaders.Values and HttpHeaders.Names are deprecated; use
> HttpHeaderValues and HttpHeaderNames instead.
[jira] [Work logged] (HDFS-16600) Deadlock on DataNode
[ https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=778479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778479 ]

ASF GitHub Bot logged work on HDFS-16600:
-
Author: ASF GitHub Bot
Created on: 06/Jun/22 00:36
Start Date: 06/Jun/22 00:36
Worklog Time Spent: 10m
Work Description: slfan1989 commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1146920391

Thank you for your contribution, but I still have some concerns about HDFS-16534. For a new feature, multiple PRs should not be used to fix problems separately; that makes the code very difficult to read. I recommend creating a subtask under HDFS-15382 to fix HDFS-16598 and HDFS-16600 together. @ZanderXu @MingXiangLi

Issue Time Tracking
---
Worklog Id: (was: 778479)
Time Spent: 2h (was: 1h 50m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h
> Remaining Estimate: 0h
>
> The UT
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction
> failed because of a deadlock introduced by
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534].
> Deadlock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588
> // needs a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
>     b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line
> // 3526 needs a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl,
>     bpid))
> {code}
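The deadlock above pairs a read lock in createRbw with a write lock in evictBlocks on the same block pool. One way such a pair hangs is lock upgrade: a thread already holding the read lock can never acquire the matching write lock, because the JDK's ReentrantReadWriteLock does not support upgrading (assuming here, for illustration, that the DataNode's lock manager behaves like one; the sketch uses only the JDK class, not Hadoop's).

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockUpgradeSketch {

  // A thread holding the read lock cannot take the write lock: tryLock
  // makes the refusal observable instead of blocking forever, which is
  // what an unconditional writeLock().lock() would do here.
  public static boolean canUpgradeWhileHoldingRead() {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    lock.readLock().lock();
    boolean got = lock.writeLock().tryLock();
    if (got) {
      lock.writeLock().unlock();
    }
    lock.readLock().unlock();
    return got;
  }

  // Releasing the read lock before asking for the write lock succeeds,
  // which is why read-locked paths must not call into write-locked ones.
  public static boolean canAcquireAfterRelease() {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    lock.readLock().lock();
    lock.readLock().unlock();
    boolean got = lock.writeLock().tryLock();
    if (got) {
      lock.writeLock().unlock();
    }
    return got;
  }

  public static void main(String[] args) {
    if (canUpgradeWhileHoldingRead() || !canAcquireAfterRelease()) {
      throw new AssertionError("unexpected read/write lock behavior");
    }
  }
}
```

Downgrading (write to read) is supported by ReentrantReadWriteLock, but upgrading is not; a read-locked code path that reaches a write acquisition on the same lock is a structural deadlock regardless of timing.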
[jira] [Work logged] (HDFS-16621) Replace sd.getCurrentDir() with JNStorage#getCurrentDir()
[ https://issues.apache.org/jira/browse/HDFS-16621?focusedWorklogId=778471&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778471 ]

ASF GitHub Bot logged work on HDFS-16621:
-
Author: ASF GitHub Bot
Created on: 05/Jun/22 23:05
Start Date: 05/Jun/22 23:05
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on PR #4404:
URL: https://github.com/apache/hadoop/pull/4404#issuecomment-1146899630

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 18m 22s | | Docker mode activated. |
_ Prechecks _
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
_ trunk Compile Tests _
| +1 :green_heart: | mvninstall | 40m 2s | | trunk passed |
| +1 :green_heart: | compile | 1m 40s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 1m 31s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 19s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 40s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 18s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 1m 42s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 44s | | trunk passed |
| +1 :green_heart: | shadedclient | 25m 40s | | branch has no errors when building and testing our client artifacts. |
_ Patch Compile Tests _
| +1 :green_heart: | mvninstall | 1m 22s | | the patch passed |
| +1 :green_heart: | compile | 1m 28s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 1m 28s | | the patch passed |
| +1 :green_heart: | compile | 1m 19s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 1m 19s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 2s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 23s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 59s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 1m 30s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 31s | | the patch passed |
| +1 :green_heart: | shadedclient | 25m 9s | | patch has no errors when building and testing our client artifacts. |
_ Other Tests _
| -1 :x: | unit | 388m 59s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4404/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 1m 6s | | The patch does not generate ASF License warnings. |
| | | 522m 24s | | |

| Reason | Tests |
|---:|:--|
| Failed junit tests | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
| | hadoop.hdfs.TestClientProtocolForPipelineRecovery |
| | hadoop.hdfs.TestReplaceDatanodeFailureReplication |
| | hadoop.hdfs.server.datanode.TestDataNodeUUID |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4404/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4404 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux a2d9f3b810de 4.15.0-166-generic #174-Ubuntu SMP Wed Dec 8 19:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 088a2a92b484c219f14584ed2ac44e42306b3f1d |
| Default Java | Private
[jira] [Work logged] (HDFS-16618) sync_file_range error should include more volume and file info
[ https://issues.apache.org/jira/browse/HDFS-16618?focusedWorklogId=778466&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778466 ]

ASF GitHub Bot logged work on HDFS-16618:
-
Author: ASF GitHub Bot
Created on: 05/Jun/22 18:37
Start Date: 05/Jun/22 18:37
Worklog Time Spent: 10m
Work Description: virajjasani commented on PR #4402:
URL: https://github.com/apache/hadoop/pull/4402#issuecomment-1146862949

@tomscut @jojochuang could you please review this PR?

Issue Time Tracking
---
Worklog Id: (was: 778466)
Time Spent: 0.5h (was: 20m)

> sync_file_range error should include more volume and file info
> --
>
> Key: HDFS-16618
> URL: https://issues.apache.org/jira/browse/HDFS-16618
> Project: Hadoop HDFS
> Issue Type: Task
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Having seen multiple sync_file_range errors recently with Bad file
> descriptor, it would be good to include more volume stats as well as file
> offset/length info with the error log to get some more insights.
[jira] [Work logged] (HDFS-16595) Slow peer metrics - add median, mad and upper latency limits
[ https://issues.apache.org/jira/browse/HDFS-16595?focusedWorklogId=778465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778465 ]

ASF GitHub Bot logged work on HDFS-16595:
-
Author: ASF GitHub Bot
Created on: 05/Jun/22 18:36
Start Date: 05/Jun/22 18:36
Worklog Time Spent: 10m
Work Description: virajjasani commented on PR #4405:
URL: https://github.com/apache/hadoop/pull/4405#issuecomment-1146862842

Here is the branch-3.3 backport PR. FYI @tomscut @jojochuang

Issue Time Tracking
---
Worklog Id: (was: 778465)
Time Spent: 3.5h (was: 3h 20m)

> Slow peer metrics - add median, mad and upper latency limits
> 
>
> Key: HDFS-16595
> URL: https://issues.apache.org/jira/browse/HDFS-16595
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> Slow datanode metrics include the slow node and its reporting node details.
> With HDFS-16582, we added the aggregate latency that is perceived by the
> reporting nodes.
> In order to get more insight into how the outlier slow node's latencies
> differ from the rest of the nodes, we should also expose the median, median
> absolute deviation and the calculated upper latency limit details.
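The three quantities HDFS-16595 exposes (median, median absolute deviation, upper latency limit) can be sketched with plain arrays. The upper limit here is computed as median + k·MAD with an assumed multiplier k = 3, which is not necessarily the constant HDFS's outlier detector uses.

```java
import java.util.Arrays;

public class OutlierLimitSketch {

  // Median of the values (computed on a sorted copy).
  public static double median(double[] xs) {
    double[] s = xs.clone();
    Arrays.sort(s);
    int n = s.length;
    return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
  }

  // Median absolute deviation: median of |x - median(xs)|.
  public static double mad(double[] xs) {
    double m = median(xs);
    double[] dev = new double[xs.length];
    for (int i = 0; i < xs.length; i++) {
      dev[i] = Math.abs(xs[i] - m);
    }
    return median(dev);
  }

  // Upper latency limit as median + k * MAD; nodes above it are outliers.
  public static double upperLimit(double[] xs, double k) {
    return median(xs) + k * mad(xs);
  }

  public static void main(String[] args) {
    double[] latencies = {2, 3, 3, 4, 50}; // one slow outlier
    double limit = upperLimit(latencies, 3.0);
    if (!(latencies[4] > limit)) {
      throw new AssertionError("the 50ms node should exceed the upper limit");
    }
  }
}
```

MAD-based limits are preferred over mean/standard deviation for this use because a single very slow node inflates the mean and stddev, hiding itself; the median and MAD are robust to that one outlier.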
[jira] [Work logged] (HDFS-16595) Slow peer metrics - add median, mad and upper latency limits
[ https://issues.apache.org/jira/browse/HDFS-16595?focusedWorklogId=778464=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778464 ] ASF GitHub Bot logged work on HDFS-16595: - Author: ASF GitHub Bot Created on: 05/Jun/22 18:35 Start Date: 05/Jun/22 18:35 Worklog Time Spent: 10m Work Description: virajjasani opened a new pull request, #4405: URL: https://github.com/apache/hadoop/pull/4405 Backport PR from (#4357) Reviewed-by: Tao Li Signed-off-by: Wei-Chiu Chuang Issue Time Tracking --- Worklog Id: (was: 778464) Time Spent: 3h 20m (was: 3h 10m) > Slow peer metrics - add median, mad and upper latency limits > > > Key: HDFS-16595 > URL: https://issues.apache.org/jira/browse/HDFS-16595 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Slow datanode metrics include slow node and it's reporting node details. With > HDFS-16582, we added the aggregate latency that is perceived by the reporting > nodes. > In order to get more insights into how the outlier slownode's latencies > differ from the rest of the nodes, we should also expose median, median > absolute deviation and the calculated upper latency limit details. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16622) addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.
ZanderXu created HDFS-16622: --- Summary: addRDBI in IncrementalBlockReportManager may remove the block with bigger GS. Key: HDFS-16622 URL: https://issues.apache.org/jira/browse/HDFS-16622 Project: Hadoop HDFS Issue Type: Bug Reporter: ZanderXu Assignee: ZanderXu In our production environment, we found a strange missing block. According to the log, I suspect there is a bug in the function addRDBI(ReceivedDeletedBlockInfo rdbi, DatanodeStorage storage) (line 250). The buggy code is the for loop:
{code:java}
synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi,
    DatanodeStorage storage) {
  // Make sure another entry for the same block is first removed.
  // There may only be one such entry.
  for (PerStorageIBR perStorage : pendingIBRs.values()) {
    if (perStorage.remove(rdbi.getBlock()) != null) {
      break;
    }
  }
  getPerStorageIBR(storage).put(rdbi);
}
{code}
The GS of the block removed from the pending IBRs may be greater than the GS of the block in rdbi, and the NN will invalidate the replica with the smaller GS when completing a block. So if there is only one replica of a block, this wrong logic can cause a missing block. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
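One possible guard for the race the report describes is to keep whichever pending entry carries the larger generation stamp. The sketch below is a simplified, hypothetical illustration: Entry and PendingReports are toy stand-ins for Hadoop's ReceivedDeletedBlockInfo and PerStorageIBR, not the committed fix.

```java
import java.util.HashMap;
import java.util.Map;

public class PendingReports {

  static final class Entry {
    final long blockId;
    final long genStamp;
    Entry(long blockId, long genStamp) {
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
  }

  private final Map<Long, Entry> pending = new HashMap<>();

  // Buggy shape: always drops the old entry, even when it carries a
  // larger GS than the incoming one.
  void addUnguarded(Entry e) {
    pending.remove(e.blockId);
    pending.put(e.blockId, e);
  }

  // Guarded shape: an existing entry with a bigger GS wins, so the NN
  // never ends up knowing only about the smaller-GS replica.
  void addGuarded(Entry e) {
    Entry old = pending.get(e.blockId);
    if (old != null && old.genStamp > e.genStamp) {
      return; // keep the entry with the larger GS
    }
    pending.put(e.blockId, e);
  }

  long genStampOf(long blockId) {
    return pending.get(blockId).genStamp;
  }

  public static void main(String[] args) {
    PendingReports r = new PendingReports();
    r.addGuarded(new Entry(1L, 1002L)); // newer GS already pending
    r.addGuarded(new Entry(1L, 1001L)); // stale report must not win
    System.out.println(r.genStampOf(1L)); // 1002
  }
}
```

Under the unguarded shape the second call would have left only GS 1001 pending, which is exactly the "remove the block with bigger GS" failure the issue title names.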
[jira] [Work started] (HDFS-16621) Replace sd.getCurrentDir() with JNStorage#getCurrentDir()
[ https://issues.apache.org/jira/browse/HDFS-16621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-16621 started by JiangHua Zhu. --- > Replace sd.getCurrentDir() with JNStorage#getCurrentDir() > - > > Key: HDFS-16621 > URL: https://issues.apache.org/jira/browse/HDFS-16621 > Project: Hadoop HDFS > Issue Type: Improvement > Components: journal-node, qjm >Affects Versions: 3.3.0 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In JNStorage, sd.getCurrentDir() is used in 5~6 places, > It can be replaced with JNStorage#getCurrentDir(), which will be more concise. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16621) Replace sd.getCurrentDir() with JNStorage#getCurrentDir()
[ https://issues.apache.org/jira/browse/HDFS-16621?focusedWorklogId=778453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778453 ] ASF GitHub Bot logged work on HDFS-16621: - Author: ASF GitHub Bot Created on: 05/Jun/22 14:21 Start Date: 05/Jun/22 14:21 Worklog Time Spent: 10m Work Description: jianghuazhu opened a new pull request, #4404: URL: https://github.com/apache/hadoop/pull/4404 ### Description of PR In JNStorage, sd.getCurrentDir() is used in 5~6 places, It can be replaced with JNStorage#getCurrentDir(), which will be more concise. Details: HDFS-16621 ### How was this patch tested? For testing, there is not much pressure. Issue Time Tracking --- Worklog Id: (was: 778453) Remaining Estimate: 0h Time Spent: 10m > Replace sd.getCurrentDir() with JNStorage#getCurrentDir() > - > > Key: HDFS-16621 > URL: https://issues.apache.org/jira/browse/HDFS-16621 > Project: Hadoop HDFS > Issue Type: Improvement > Components: journal-node, qjm >Affects Versions: 3.3.0 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > In JNStorage, sd.getCurrentDir() is used in 5~6 places, > It can be replaced with JNStorage#getCurrentDir(), which will be more concise. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16621) Replace sd.getCurrentDir() with JNStorage#getCurrentDir()
[ https://issues.apache.org/jira/browse/HDFS-16621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16621: -- Labels: pull-request-available (was: ) > Replace sd.getCurrentDir() with JNStorage#getCurrentDir() > - > > Key: HDFS-16621 > URL: https://issues.apache.org/jira/browse/HDFS-16621 > Project: Hadoop HDFS > Issue Type: Improvement > Components: journal-node, qjm >Affects Versions: 3.3.0 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In JNStorage, sd.getCurrentDir() is used in 5~6 places, > It can be replaced with JNStorage#getCurrentDir(), which will be more concise. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-16621) Replace sd.getCurrentDir() with JNStorage#getCurrentDir()
[ https://issues.apache.org/jira/browse/HDFS-16621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangHua Zhu reassigned HDFS-16621: --- Assignee: JiangHua Zhu > Replace sd.getCurrentDir() with JNStorage#getCurrentDir() > - > > Key: HDFS-16621 > URL: https://issues.apache.org/jira/browse/HDFS-16621 > Project: Hadoop HDFS > Issue Type: Improvement > Components: journal-node, qjm >Affects Versions: 3.3.0 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Minor > > In JNStorage, sd.getCurrentDir() is used in 5~6 places, > It can be replaced with JNStorage#getCurrentDir(), which will be more concise. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16621) Replace sd.getCurrentDir() with JNStorage#getCurrentDir()
JiangHua Zhu created HDFS-16621: --- Summary: Replace sd.getCurrentDir() with JNStorage#getCurrentDir() Key: HDFS-16621 URL: https://issues.apache.org/jira/browse/HDFS-16621 Project: Hadoop HDFS Issue Type: Improvement Components: journal-node, qjm Affects Versions: 3.3.0 Reporter: JiangHua Zhu In JNStorage, sd.getCurrentDir() is used in 5~6 places; it can be replaced with JNStorage#getCurrentDir(), which is more concise. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
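The refactor described above is small enough to show in miniature. The classes below are simplified stand-ins for Hadoop's JNStorage and StorageDirectory (same idea, not the same code): repeated sd.getCurrentDir() calls are funneled through a single accessor on the storage class.

```java
import java.io.File;

public class Storage {

  static final class StorageDirectory {
    private final File root;
    StorageDirectory(File root) { this.root = root; }
    File getCurrentDir() { return new File(root, "current"); }
  }

  private final StorageDirectory sd;

  Storage(File root) { this.sd = new StorageDirectory(root); }

  // The single accessor; call sites use this instead of repeating
  // sd.getCurrentDir() at every one of the 5~6 places.
  File getCurrentDir() { return sd.getCurrentDir(); }

  // Before: new File(sd.getCurrentDir(), name) scattered across methods.
  // After: one helper built on the accessor.
  File getFileInCurrent(String name) {
    return new File(getCurrentDir(), name);
  }

  public static void main(String[] args) {
    Storage s = new Storage(new File("/journal/ns1"));
    System.out.println(s.getFileInCurrent("edits_inprogress"));
  }
}
```

Behavior is unchanged; the gain is that the "current" path convention lives in one place instead of being restated at each call site.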
[jira] [Work logged] (HDFS-16598) All datanodes [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] are bad. Aborting...
[ https://issues.apache.org/jira/browse/HDFS-16598?focusedWorklogId=778450=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778450 ] ASF GitHub Bot logged work on HDFS-16598: - Author: ASF GitHub Bot Created on: 05/Jun/22 13:02 Start Date: 05/Jun/22 13:02 Worklog Time Spent: 10m Work Description: ZanderXu commented on PR #4366: URL: https://github.com/apache/hadoop/pull/4366#issuecomment-1146800739 Thanks @slfan1989 for your comment. At present, I can only confirm that createRBW needs to use getReplicaInfo(String bpid, long blkid), but whether it is reasonable to use getReplicaInfo(String bpid, long blkid) in other places, I need further confirmation. Issue Time Tracking --- Worklog Id: (was: 778450) Time Spent: 1h 50m (was: 1h 40m) > All datanodes > [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] > are bad. Aborting... > -- > > Key: HDFS-16598 > URL: https://issues.apache.org/jira/browse/HDFS-16598 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > org.apache.hadoop.hdfs.testPipelineRecoveryOnRestartFailure failed with the > stack like: > {code:java} > java.io.IOException: All datanodes > [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] > are bad. Aborting... > at > org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1667) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1601) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1587) > at > org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1371) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:674) > {code} > After tracing the root cause, this bug was introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
Because the > block GS of the client may be smaller than that of the DN when pipeline recovery fails. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16600) Deadlock on DataNode
[ https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=778448=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778448 ] ASF GitHub Bot logged work on HDFS-16600: - Author: ASF GitHub Bot Created on: 05/Jun/22 12:58 Start Date: 05/Jun/22 12:58 Worklog Time Spent: 10m Work Description: ZanderXu commented on PR #4367: URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1146800123 @slfan1989 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction sometimes succeeds? I have run it several times locally and it fails every time. > How do you judge the occurrence of DeadLock? The deadlock is triggered when evictLazyPersistBlocks is invoked from createRbw, because createRbw holds the BLOCK_POOL read lock while evictLazyPersistBlocks tries to take the BLOCK_POOL write lock. Issue Time Tracking --- Worklog Id: (was: 778448) Time Spent: 1h 50m (was: 1h 40m) > Deadlock on DataNode > > > Key: HDFS-16600 > URL: https://issues.apache.org/jira/browse/HDFS-16600 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > The UT > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction > failed because of a deadlock, which was introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588
> // need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
>     b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 3526
> // need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl,
>     bpid))
> {code}
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16600) Deadlock on DataNode
[ https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=778447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778447 ] ASF GitHub Bot logged work on HDFS-16600: - Author: ASF GitHub Bot Created on: 05/Jun/22 12:47 Start Date: 05/Jun/22 12:47 Worklog Time Spent: 10m Work Description: slfan1989 commented on PR #4367: URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1146798609 @ZanderXu I found that some of the HDFS (DN) JUnit tests sometimes succeed and sometimes fail, but there is no way to tell whether that is related to the deadlock. How do you judge the occurrence of the deadlock? @MingXiangLi HDFS-16534 is a very big change that will greatly help DN performance, but ZanderXu has already filed two JIRAs against it. Can you help re-examine HDFS-16534? If each problem is fixed by a separate PR, I worry it will bring more problems. Issue Time Tracking --- Worklog Id: (was: 778447) Time Spent: 1h 40m (was: 1.5h) > Deadlock on DataNode > > > Key: HDFS-16600 > URL: https://issues.apache.org/jira/browse/HDFS-16600 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > The UT > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction > failed because of a deadlock, which was introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588
> // need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
>     b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 3526
> // need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl,
>     bpid))
> {code}
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
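The reason the read-then-write pattern above self-deadlocks is that Java's ReentrantReadWriteLock does not allow a thread to upgrade a held read lock to a write lock. The demo below shows the refusal directly; it uses tryLock() instead of lock() so it returns false rather than hanging forever the way createRbw -> evictBlocks would.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockUpgradeDemo {

  static boolean upgradeAttempt() {
    ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    rw.readLock().lock(); // createRbw holds the block-pool read lock
    try {
      // evictBlocks then asks for the block-pool write lock on the same
      // thread; with lock() instead of tryLock() this would block forever.
      return rw.writeLock().tryLock();
    } finally {
      rw.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    System.out.println("write lock acquired while holding read lock: "
        + upgradeAttempt()); // false
  }
}
```

This is why the fix has to restructure the call path (release or avoid the read lock before evicting) rather than simply reorder the acquisitions.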
[jira] [Work logged] (HDFS-16598) All datanodes [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] are bad. Aborting...
[ https://issues.apache.org/jira/browse/HDFS-16598?focusedWorklogId=778446=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778446 ] ASF GitHub Bot logged work on HDFS-16598: - Author: ASF GitHub Bot Created on: 05/Jun/22 12:37 Start Date: 05/Jun/22 12:37 Worklog Time Spent: 10m Work Description: slfan1989 commented on PR #4366: URL: https://github.com/apache/hadoop/pull/4366#issuecomment-1146797019 > @MingXiangLi Thanks for you review. @ZanderXu @MingXiangLi I would like to ask a question, after reading your discussion, is it possible that block GS of client may be smaller than DN appears in all places where getReplicaInfo(String bpid, long blkid) is called? Issue Time Tracking --- Worklog Id: (was: 778446) Time Spent: 1h 40m (was: 1.5h) > All datanodes > [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] > are bad. Aborting... > -- > > Key: HDFS-16598 > URL: https://issues.apache.org/jira/browse/HDFS-16598 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > org.apache.hadoop.hdfs.testPipelineRecoveryOnRestartFailure failed with the > stack like: > {code:java} > java.io.IOException: All datanodes > [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] > are bad. Aborting... > at > org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1667) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1601) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1587) > at > org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1371) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:674) > {code} > After tracing the root cause, this bug was introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
Because the > block GS of the client may be smaller than that of the DN when pipeline recovery fails. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
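The lookup distinction being debated in this thread can be sketched in miniature: during pipeline recovery the client's generation stamp (GS) can lag the DataNode's, so resolving a replica by (bpid, blockId) succeeds where an exact-GS match would not. ReplicaLookup below is a toy stand-in for the DataNode's replica map, illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

public class ReplicaLookup {

  static final class Replica {
    final long blockId;
    final long genStamp;
    Replica(long blockId, long genStamp) {
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
  }

  private final Map<Long, Replica> byId = new HashMap<>();

  void add(Replica r) { byId.put(r.blockId, r); }

  // Exact match: fails when the caller's GS is stale, which is exactly
  // the pipeline-recovery case this issue hit.
  Replica getExact(long blockId, long genStamp) {
    Replica r = byId.get(blockId);
    return (r != null && r.genStamp == genStamp) ? r : null;
  }

  // Id-only match: enough to identify the volume when acquiring the lock;
  // any required GS validation can still happen afterwards.
  Replica getById(long blockId) { return byId.get(blockId); }

  public static void main(String[] args) {
    ReplicaLookup map = new ReplicaLookup();
    map.add(new Replica(7L, 1005L));              // DN already bumped the GS
    System.out.println(map.getExact(7L, 1004L));  // null: client GS is stale
    System.out.println(map.getById(7L).genStamp); // 1005
  }
}
```

This mirrors the reviewer's point above: the id-only lookup is safe for lock acquisition because the GS check still happens in the original method where it matters.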
[jira] [Work logged] (HDFS-16618) sync_file_range error should include more volume and file info
[ https://issues.apache.org/jira/browse/HDFS-16618?focusedWorklogId=778444=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778444 ] ASF GitHub Bot logged work on HDFS-16618: - Author: ASF GitHub Bot Created on: 05/Jun/22 12:00 Start Date: 05/Jun/22 12:00 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on PR #4402: URL: https://github.com/apache/hadoop/pull/4402#issuecomment-1146791706 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 55s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 39m 38s | | trunk passed | | +1 :green_heart: | compile | 1m 42s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | compile | 1m 32s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 19s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 39s | | trunk passed | | +1 :green_heart: | javadoc | 1m 18s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 40s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 51s | | trunk passed | | +1 :green_heart: | shadedclient | 25m 58s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 23s | | the patch passed | | +1 :green_heart: | compile | 1m 30s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javac | 1m 30s | | the patch passed | | +1 :green_heart: | compile | 1m 20s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 1m 20s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 1s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 26s | | the patch passed | | +1 :green_heart: | javadoc | 0m 59s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 27s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 36s | | the patch passed | | +1 :green_heart: | shadedclient | 25m 20s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 346m 13s | | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 57s | | The patch does not generate ASF License warnings. 
| | | | 462m 3s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4402/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4402 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 7ecdb3722981 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 75cf43a071c65210c74cd77b2675840419c2ff41 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4402/1/testReport/ | | Max. process+thread count | 2365 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U:
[jira] [Work logged] (HDFS-16601) Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try
[ https://issues.apache.org/jira/browse/HDFS-16601?focusedWorklogId=778442=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778442 ] ASF GitHub Bot logged work on HDFS-16601: - Author: ASF GitHub Bot Created on: 05/Jun/22 09:58 Start Date: 05/Jun/22 09:58 Worklog Time Spent: 10m Work Description: ZanderXu commented on PR #4369: URL: https://github.com/apache/hadoop/pull/4369#issuecomment-1146776005 @Hexiaoqiao @MingXiangLi can you help me review this patch? thanks~ Issue Time Tracking --- Worklog Id: (was: 778442) Time Spent: 0.5h (was: 20m) > Failed to replace a bad datanode on the existing pipeline due to no more good > datanodes being available to try > -- > > Key: HDFS-16601 > URL: https://issues.apache.org/jira/browse/HDFS-16601 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > In our production environment, we found a bug and stack like: > {code:java} > java.io.IOException: Failed to replace a bad datanode on the existing > pipeline due to no more good datanodes being available to try. (Nodes: > current=[DatanodeInfoWithStorage[127.0.0.1:59687,DS-b803febc-7b22-4144-9b39-7bf521cdaa8d,DISK], > > DatanodeInfoWithStorage[127.0.0.1:59670,DS-0d652bc2-1784-430d-961f-750f80a290f1,DISK]], > > original=[DatanodeInfoWithStorage[127.0.0.1:59670,DS-0d652bc2-1784-430d-961f-750f80a290f1,DISK], > > DatanodeInfoWithStorage[127.0.0.1:59687,DS-b803febc-7b22-4144-9b39-7bf521cdaa8d,DISK]]). > The current failed datanode replacement policy is DEFAULT, and a client may > configure this via > 'dfs.client.block.write.replace-datanode-on-failure.policy' in its > configuration. 
> at > org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1418) > at > org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1478) > at > org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1704) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1605) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1587) > at > org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1371) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:674) > {code} > And the root cause is that DFSClient cannot perceive the exception of > TransferBlock during PipelineRecovery. If failed during TransferBlock, the > DFSClient will retry all datanodes in the cluster and then failed. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
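For reference, the client-side settings named in that error message live in hdfs-site.xml. Enabling best-effort replacement is a common mitigation for "no more good datanodes" aborts on small clusters, not a fix for the transfer-block exception handling described in this report:

```xml
<!-- Client-side datanode-replacement settings referenced in the error
     message above. With best-effort=true the client keeps writing with
     the remaining datanodes instead of aborting when replacement fails. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>DEFAULT</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
  <value>true</value>
</property>
```

Note that best-effort trades durability for availability: a pipeline can shrink below the replication factor until the NameNode re-replicates the block later.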
[jira] [Work logged] (HDFS-16598) All datanodes [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] are bad. Aborting...
[ https://issues.apache.org/jira/browse/HDFS-16598?focusedWorklogId=778441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778441 ] ASF GitHub Bot logged work on HDFS-16598: - Author: ASF GitHub Bot Created on: 05/Jun/22 09:55 Start Date: 05/Jun/22 09:55 Worklog Time Spent: 10m Work Description: ZanderXu commented on PR #4366: URL: https://github.com/apache/hadoop/pull/4366#issuecomment-1146775645 @MingXiangLi @Hexiaoqiao Can you catch the root cause of this bug? Need further explanation? Issue Time Tracking --- Worklog Id: (was: 778441) Time Spent: 1.5h (was: 1h 20m) > All datanodes > [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] > are bad. Aborting... > -- > > Key: HDFS-16598 > URL: https://issues.apache.org/jira/browse/HDFS-16598 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > org.apache.hadoop.hdfs.testPipelineRecoveryOnRestartFailure failed with the > stack like: > {code:java} > java.io.IOException: All datanodes > [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] > are bad. Aborting... > at > org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1667) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1601) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1587) > at > org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1371) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:674) > {code} > After tracing the root cause, this bug was introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. Because the > block GS of client may be smaller than DN when pipeline recovery failed. 
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16600) Deadlock on DataNode
[ https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=778440=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778440 ] ASF GitHub Bot logged work on HDFS-16600: - Author: ASF GitHub Bot Created on: 05/Jun/22 09:51 Start Date: 05/Jun/22 09:51 Worklog Time Spent: 10m Work Description: ZanderXu commented on PR #4367: URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1146775213 Thanks @MingXiangLi for your review. [HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534) means a lot to me and I learned a lot from it. Issue Time Tracking --- Worklog Id: (was: 778440) Time Spent: 1.5h (was: 1h 20m) > Deadlock on DataNode > > > Key: HDFS-16600 > URL: https://issues.apache.org/jira/browse/HDFS-16600 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > The UT > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction > failed, because happened deadlock, which is introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. > DeadLock: > {code:java} > // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 > need a read lock > try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl, > b.getBlockPoolId())) > // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line > 3526 need a write lock > try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, > bpid)) > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16598) All datanodes [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] are bad. Aborting...
[ https://issues.apache.org/jira/browse/HDFS-16598?focusedWorklogId=778439=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778439 ] ASF GitHub Bot logged work on HDFS-16598: - Author: ASF GitHub Bot Created on: 05/Jun/22 09:49 Start Date: 05/Jun/22 09:49 Worklog Time Spent: 10m Work Description: MingXiangLi commented on PR #4366: URL: https://github.com/apache/hadoop/pull/4366#issuecomment-1146774912 @ZanderXu LGTM. When we acquire the volume lock we should invoke getReplicaInfo(..) to get the volume uuid for the replica. So the GS check is not necessary at the lock-acquisition stage; the original method will still check the GS where a GS check is necessary. Issue Time Tracking --- Worklog Id: (was: 778439) Time Spent: 1h 20m (was: 1h 10m) > All datanodes > [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] > are bad. Aborting... > -- > > Key: HDFS-16598 > URL: https://issues.apache.org/jira/browse/HDFS-16598 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > org.apache.hadoop.hdfs.testPipelineRecoveryOnRestartFailure failed with the > stack like: > {code:java} > java.io.IOException: All datanodes > [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] > are bad. Aborting... > at > org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1667) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1601) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1587) > at > org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1371) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:674) > {code} > After tracing the root cause, this bug was introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
Because the > block GS of the client may be smaller than that of the DN when pipeline recovery fails. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16600) Deadlock on DataNode
[ https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=778436=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778436 ] ASF GitHub Bot logged work on HDFS-16600: - Author: ASF GitHub Bot Created on: 05/Jun/22 09:27 Start Date: 05/Jun/22 09:27 Worklog Time Spent: 10m Work Description: MingXiangLi commented on PR #4367: URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1146772397 @ZanderXu LGTM.The core logic in HDFS-16534 is change block pool write lock to read lock and add volume lock for each replica under this block pool.And we didn't change this method in HDFS-16534 because it's not a heavy call. So this commit is makes sense to me. Issue Time Tracking --- Worklog Id: (was: 778436) Time Spent: 1h 20m (was: 1h 10m) > Deadlock on DataNode > > > Key: HDFS-16600 > URL: https://issues.apache.org/jira/browse/HDFS-16600 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > The UT > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction > failed, because happened deadlock, which is introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. > DeadLock: > {code:java} > // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 > need a read lock > try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl, > b.getBlockPoolId())) > // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line > 3526 need a write lock > try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, > bpid)) > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16620) Improve msync performance by a separate RPC server
[ https://issues.apache.org/jira/browse/HDFS-16620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongpan Liu updated HDFS-16620: --- Attachment: HDFS-16620-001.patch Status: Patch Available (was: Open) > Improve msync performance by a separate RPC server > -- > > Key: HDFS-16620 > URL: https://issues.apache.org/jira/browse/HDFS-16620 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yongpan Liu >Priority: Minor > Attachments: HDFS-16620-001.patch > > > ClientProtocol#msync is invoked frequently in clusters where the Observer is > enabled. When NameNode is overwhelmed with load, the average response time of > msync will increase, which can affect Observer Read efficiency. We can split > out a separate RPC server just like HDFS-9311.
[jira] [Updated] (HDFS-16620) Improve msync performance by a separate RPC server
[ https://issues.apache.org/jira/browse/HDFS-16620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongpan Liu updated HDFS-16620: --- Description: ClientProtocol#msync is invoked frequently in clusters where the Observer is enabled. When NameNode is overwhelmed with load, the average response time of msync will increase, which can affect Observer Read efficiency. We can split out a separate RPC server just like HDFS-9311. (was: ClientProtocol#msync is invoked frequently in clusters where the Observer is enabled. When NameNode is heavily loaded, the average response time of msync increases, which can affect Observer Read efficiency. We can split out a separate RPC server just like HDFS-9311.) > Improve msync performance by a separate RPC server > -- > > Key: HDFS-16620 > URL: https://issues.apache.org/jira/browse/HDFS-16620 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yongpan Liu >Priority: Minor > > ClientProtocol#msync is invoked frequently in clusters where the Observer is > enabled. When NameNode is overwhelmed with load, the average response time of > msync will increase, which can affect Observer Read efficiency. We can split > out a separate RPC server just like HDFS-9311.
[jira] [Created] (HDFS-16620) Improve msync performance by a separate RPC server
Yongpan Liu created HDFS-16620: -- Summary: Improve msync performance by a separate RPC server Key: HDFS-16620 URL: https://issues.apache.org/jira/browse/HDFS-16620 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongpan Liu ClientProtocol#msync is invoked frequently in clusters where the Observer is enabled. When NameNode is heavily loaded, the average response time of msync increases, which can affect Observer Read efficiency. We can split out a separate RPC server just like HDFS-9311.
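The rationale can be seen with a toy queueing model: when cheap msync-like calls share one handler queue with heavy calls, they inherit the heavy calls' queueing delay, while a dedicated handler keeps their latency flat. A hedged Java sketch, using single-threaded executors as stand-ins for RPC handler pools (not Hadoop code; the class and timings are illustrative only):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SeparateRpcQueueDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService shared = Executors.newSingleThreadExecutor();
        ExecutorService msyncOnly = Executors.newSingleThreadExecutor();

        // A heavy request occupies the shared handler for 300 ms.
        shared.submit(() -> {
            try { Thread.sleep(300); } catch (InterruptedException ignored) {}
        });

        // A cheap msync-like call on the dedicated handler returns immediately.
        long t0 = System.nanoTime();
        msyncOnly.submit(() -> {}).get();
        long dedicatedMs = (System.nanoTime() - t0) / 1_000_000;

        // The same cheap call on the shared handler waits behind the heavy one.
        long t1 = System.nanoTime();
        shared.submit(() -> {}).get();
        long sharedMs = (System.nanoTime() - t1) / 1_000_000;

        System.out.println("dedicated=" + dedicatedMs + "ms, shared=" + sharedMs + "ms");
        shared.shutdown();
        msyncOnly.shutdown();
    }
}
```

Splitting msync onto its own RPC server (as HDFS-9311 did for service RPC) is the same isolation idea applied at the NameNode's listener/handler level.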
[jira] [Created] (HDFS-16619) improve HttpHeaders.Values And HttpHeaders.Names With recommended Class
fanshilun created HDFS-16619: Summary: improve HttpHeaders.Values And HttpHeaders.Names With recommended Class Key: HDFS-16619 URL: https://issues.apache.org/jira/browse/HDFS-16619 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.4.0 Reporter: fanshilun Assignee: fanshilun Fix For: 3.4.0