[jira] [Commented] (HDFS-17503) Unreleased volume references because of OOM

2024-04-28 Thread Jian Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841650#comment-17841650
 ] 

Jian Zhang commented on HDFS-17503:
---

Can you describe in detail how oom came about?

> Unreleased volume references because of OOM
> ---
>
> Key: HDFS-17503
> URL: https://issues.apache.org/jira/browse/HDFS-17503
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>
> When BlockSender throws an error because of OOM,the volume reference obtained 
> by the thread is not released,which causes the thread trying to remove the 
> volume to wait and fall into an infinite loop.
> I found HDFS-15963 catched exception and release volume reference. But it did 
> not handle the case of throwing errors. I think "catch (Throwable t)" should 
> be used instead of "catch (IOException ioe)".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17503) Unreleased volume references because of OOM

2024-04-28 Thread Zilong Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841815#comment-17841815
 ] 

Zilong Zhu commented on HDFS-17503:
---

[~Keepromise] It appears to occur when creating the BlockSender object. This is 
an intermittent issue that occurs in our production environment. If I manually 
throw an OOM error while creating the BlockSender object, it can cause volume 
references not to be released.

> Unreleased volume references because of OOM
> ---
>
> Key: HDFS-17503
> URL: https://issues.apache.org/jira/browse/HDFS-17503
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>
> When BlockSender throws an error because of OOM,the volume reference obtained 
> by the thread is not released,which causes the thread trying to remove the 
> volume to wait and fall into an infinite loop.
> I found HDFS-15963 catched exception and release volume reference. But it did 
> not handle the case of throwing errors. I think "catch (Throwable t)" should 
> be used instead of "catch (IOException ioe)".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17503) Unreleased volume references because of OOM

2024-04-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842013#comment-17842013
 ] 

ASF GitHub Bot commented on HDFS-17503:
---

zhuzilong2013 opened a new pull request, #6782:
URL: https://github.com/apache/hadoop/pull/6782

   ### Description of PR
   Refer to HDFS-17503.
   When BlockSender throws an error because of OOM,the volume reference 
obtained by the thread is not released,which causes the thread trying to remove 
the volume to wait and fall into an infinite loop.
   I found [HDFS-15963](https://issues.apache.org/jira/browse/HDFS-15963) 
catched exception and release volume reference. But it did not handle the case 
of throwing errors. I think "catch (Throwable t)" should be used instead of 
"catch (IOException ioe)".
   
   ### How was this patch tested?
   
   
   ### For code changes:
   
   - [ ] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> Unreleased volume references because of OOM
> ---
>
> Key: HDFS-17503
> URL: https://issues.apache.org/jira/browse/HDFS-17503
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>
> When BlockSender throws an error because of OOM,the volume reference obtained 
> by the thread is not released,which causes the thread trying to remove the 
> volume to wait and fall into an infinite loop.
> I found HDFS-15963 catched exception and release volume reference. But it did 
> not handle the case of throwing errors. I think "catch (Throwable t)" should 
> be used instead of "catch (IOException ioe)".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17503) Unreleased volume references because of OOM

2024-05-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844729#comment-17844729
 ] 

ASF GitHub Bot commented on HDFS-17503:
---

hadoop-yetus commented on PR #6782:
URL: https://github.com/apache/hadoop/pull/6782#issuecomment-2101107605

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 00s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  spotbugs  |   0m 01s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  codespell  |   0m 01s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 01s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m 00s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  94m 08s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   6m 41s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   5m 12s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   7m 34s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   6m 25s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 159m 41s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 57s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m 48s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   3m 48s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m 01s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   2m 30s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   4m 26s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   3m 41s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 168m 39s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   6m 47s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 452m 05s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6782 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | MINGW64_NT-10.0-17763 06e84c9dc0f5 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / b25c35089fe854166b4f7a57171fa21abeb99a5f |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6782/1/testReport/
 |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6782/1/console
 |
   | versions | git=2.45.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Unreleased volume references because of OOM
> ---
>
> Key: HDFS-17503
> URL: https://issues.apache.org/jira/browse/HDFS-17503
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>  Labels: pull-request-available
>
> When BlockSender throws an error because of OOM,the volume reference obtained 
> by the thread is not released,which causes the thread trying to remove the 
> volume to wait and fall into an infinite loop.
> I found HDFS-15963 catched exception and release volume reference. But it did 
> not handle the case of throwing errors. I think "catch (Throwable t)" should 
> be used instead of "catch (IOException ioe)".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17503) Unreleased volume references because of OOM

2024-05-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844895#comment-17844895
 ] 

ASF GitHub Bot commented on HDFS-17503:
---

zhuzilong2013 commented on PR #6782:
URL: https://github.com/apache/hadoop/pull/6782#issuecomment-2102119137

   @Hexiaoqiao Hi~ sir. Could you please help me review this PR when you are 
free? Thanks.




> Unreleased volume references because of OOM
> ---
>
> Key: HDFS-17503
> URL: https://issues.apache.org/jira/browse/HDFS-17503
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>  Labels: pull-request-available
>
> When BlockSender throws an error because of OOM,the volume reference obtained 
> by the thread is not released,which causes the thread trying to remove the 
> volume to wait and fall into an infinite loop.
> I found HDFS-15963 catched exception and release volume reference. But it did 
> not handle the case of throwing errors. I think "catch (Throwable t)" should 
> be used instead of "catch (IOException ioe)".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17503) Unreleased volume references because of OOM

2024-05-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845172#comment-17845172
 ] 

ASF GitHub Bot commented on HDFS-17503:
---

ZanderXu merged PR #6782:
URL: https://github.com/apache/hadoop/pull/6782




> Unreleased volume references because of OOM
> ---
>
> Key: HDFS-17503
> URL: https://issues.apache.org/jira/browse/HDFS-17503
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>  Labels: pull-request-available
>
> When BlockSender throws an error because of OOM,the volume reference obtained 
> by the thread is not released,which causes the thread trying to remove the 
> volume to wait and fall into an infinite loop.
> I found HDFS-15963 catched exception and release volume reference. But it did 
> not handle the case of throwing errors. I think "catch (Throwable t)" should 
> be used instead of "catch (IOException ioe)".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17503) Unreleased volume references because of OOM

2024-05-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845173#comment-17845173
 ] 

ASF GitHub Bot commented on HDFS-17503:
---

ZanderXu commented on PR #6782:
URL: https://github.com/apache/hadoop/pull/6782#issuecomment-2103752898

   Merged. Thanks @zhuzilong2013 for your contribution.




> Unreleased volume references because of OOM
> ---
>
> Key: HDFS-17503
> URL: https://issues.apache.org/jira/browse/HDFS-17503
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>  Labels: pull-request-available
>
> When BlockSender throws an error because of OOM,the volume reference obtained 
> by the thread is not released,which causes the thread trying to remove the 
> volume to wait and fall into an infinite loop.
> I found HDFS-15963 catched exception and release volume reference. But it did 
> not handle the case of throwing errors. I think "catch (Throwable t)" should 
> be used instead of "catch (IOException ioe)".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17503) Unreleased volume references because of OOM

2024-05-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845179#comment-17845179
 ] 

ASF GitHub Bot commented on HDFS-17503:
---

zhuzilong2013 commented on PR #6782:
URL: https://github.com/apache/hadoop/pull/6782#issuecomment-2103787150

   Thanks @ZanderXu for your review and merge~




> Unreleased volume references because of OOM
> ---
>
> Key: HDFS-17503
> URL: https://issues.apache.org/jira/browse/HDFS-17503
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> When BlockSender throws an error because of OOM,the volume reference obtained 
> by the thread is not released,which causes the thread trying to remove the 
> volume to wait and fall into an infinite loop.
> I found HDFS-15963 catched exception and release volume reference. But it did 
> not handle the case of throwing errors. I think "catch (Throwable t)" should 
> be used instead of "catch (IOException ioe)".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org