[jira] [Commented] (HBASE-25720) Sync WAL stuck when prepare flush cache will prevent flush cache and cause OOM

2022-10-13 Thread March Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617378#comment-17617378
 ] 

March Wang commented on HBASE-25720:


Hi [~Xiaolin Ha] ,

Sorry for the late response. I will do it. Thank you so much!

> Sync WAL stuck when prepare flush cache will prevent flush cache and cause OOM
> --
>
> Key: HBASE-25720
> URL: https://issues.apache.org/jira/browse/HBASE-25720
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 1.4.13
> Reporter: Xiaolin Ha
> Assignee: Xiaolin Ha
> Priority: Major
> Attachments: prepare-flush-cache-stuck.png
>
>
> We call HRegion#doSyncOfUnflushedWALChanges when preparing to flush the cache.
> But this WAL sync may get stuck and abort the cache flush.
> !prepare-flush-cache-stuck.png|width=519,height=246!
> If we do not become aware of this problem in time, the RS will eventually be
> killed by an OOM.
> I think we should force an abort of the RS when the sync gets stuck during
> preparation, as is done when committing snapshots.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-25720) Sync WAL stuck when prepare flush cache will prevent flush cache and cause OOM

2022-10-08 Thread Xiaolin Ha (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614598#comment-17614598
 ] 

Xiaolin Ha commented on HBASE-25720:


Hi, [~MarchWang] , I saw the problem you described in HBASE-27413.

The idea here is to abort the RS as soon as possible when a WAL sync fails,
even in the stage of preparing to flush the memstore. Currently, only a failure
while committing the memstore flush aborts the RS. The PR was not accepted, but
you can backport it; we are running it smoothly in our production environment.
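
To make the idea above concrete, here is a minimal, self-contained Java sketch. It is not the HBase code and not the contents of the PR: the `Wal` and `Abortable` interfaces, the `prepareFlush` method, and the timeout parameter are all hypothetical names introduced for illustration. It only shows the control flow being proposed, namely that a sync that fails or does not return in time during the prepare stage triggers an abort instead of silently dropping the flush.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class PrepareFlushSketch {

    /** Hypothetical stand-in for a WAL whose sync may hang or fail. */
    interface Wal {
        void sync() throws Exception;
    }

    /** Hypothetical stand-in for the region server's abort hook. */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    /**
     * Runs the WAL sync of the prepare stage with a deadline.
     * Returns true if the sync completed; otherwise aborts and returns false.
     */
    static boolean prepareFlush(Wal wal, Abortable server, long timeoutMs) {
        // Daemon thread, so a sync that never returns cannot keep the JVM alive.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "wal-sync-sketch");
            t.setDaemon(true);
            return t;
        });
        Future<Void> pending = pool.submit(() -> {
            wal.sync();
            return null;
        });
        try {
            pending.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            // Stuck or failed sync: abort now rather than letting memory pile up.
            pending.cancel(true);
            server.abort("WAL sync failed while preparing flush", e);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pending.cancel(true);
            server.abort("Interrupted while waiting for WAL sync", e);
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Abortable server = (why, cause) -> System.out.println("ABORT: " + why);
        // A sync that returns immediately.
        System.out.println(prepareFlush(() -> { }, server, 200));
        // A sync that hangs longer than the deadline.
        System.out.println(prepareFlush(() -> Thread.sleep(5_000), server, 50));
    }
}
```

Running the sync on a separate thread with a deadline is just one way to detect a stuck sync in a sketch like this; the actual detection mechanism in HBase may differ.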

For the WAL sync stuck problems, several issues are helpful. I think they can 
solve most of your problems, especially HBASE-22301, HBASE-26347, and 
HBASE-25905.



[jira] [Commented] (HBASE-25720) Sync WAL stuck when prepare flush cache will prevent flush cache and cause OOM

2022-10-05 Thread jason (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17613267#comment-17613267
 ] 

jason commented on HBASE-25720:
---

I'm confused about the status: the status is 'Resolved' but the resolution is 
'Won't Fix'. I checked the GitHub commit, and it was not accepted. Interesting.

 



[jira] [Commented] (HBASE-25720) Sync WAL stuck when prepare flush cache will prevent flush cache and cause OOM

2022-10-05 Thread March Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17613248#comment-17613248
 ] 

March Wang commented on HBASE-25720:


Hi [~Xiaolin Ha], could you please let me know how to fix this issue? Thanks!



[jira] [Commented] (HBASE-25720) Sync WAL stuck when prepare flush cache will prevent flush cache and cause OOM

2021-07-09 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378168#comment-17378168
 ] 

Michael Stack commented on HBASE-25720:
---

[~Xiaolin Ha] I ask because I'm looking at a related issue around AsyncFSWAL – 
HBASE-26042. Thanks.



[jira] [Commented] (HBASE-25720) Sync WAL stuck when prepare flush cache will prevent flush cache and cause OOM

2021-07-09 Thread Xiaolin Ha (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377934#comment-17377934
 ] 

Xiaolin Ha commented on HBASE-25720:


Hi, [~stack], we always noticed this problem only after the RS had killed 
itself. Sorry, there is no jstack now, and we have no further ideas about why 
the WAL got stuck. But we have written a monitoring script for this problem. 
I'll attach the jstack files once we get them and will dig further. Thanks.



[jira] [Commented] (HBASE-25720) Sync WAL stuck when prepare flush cache will prevent flush cache and cause OOM

2021-07-08 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377532#comment-17377532
 ] 

Michael Stack commented on HBASE-25720:
---

Is there anything in the log before your png that perhaps shows how or why the 
WAL system is stuck? A jstack? Thanks [~Xiaolin Ha]
