[jira] [Commented] (FLINK-20044) Disposal of RocksDB could last forever

2021-12-29 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466708#comment-17466708
 ] 

Yuan Mei commented on FLINK-20044:
--

This task seems to be a duplicate of FLINK-5463.

> Disposal of RocksDB could last forever
> --
>
> Key: FLINK-20044
> URL: https://issues.apache.org/jira/browse/FLINK-20044
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.9.0
>Reporter: Jiayi Liao
>Priority: Minor
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
>
> The task cannot fail itself because it is stuck on the disposal of RocksDB, 
> which also affects the job. I have seen this several times in recent months; 
> most of the errors come from a broken disk. But I think we should also do 
> something to handle it more gracefully from Flink's perspective.
> {code:java}
> "LookUp_Join -> Sink_Unnamed (898/1777)- execution # 4" #411 prio=5 os_prio=0 tid=0x7fc9b0286800 nid=0xff6fc runnable [0x7fc966cfc000]
>    java.lang.Thread.State: RUNNABLE
>         at org.rocksdb.RocksDB.disposeInternal(Native Method)
>         at org.rocksdb.RocksObject.disposeInternal(RocksObject.java:37)
>         at org.rocksdb.AbstractImmutableNativeReference.close(AbstractImmutableNativeReference.java:57)
>         at org.apache.flink.util.IOUtils.closeQuietly(IOUtils.java:263)
>         at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.dispose(RocksDBKeyedStateBackend.java:349)
>         at org.apache.flink.streaming.api.operators.AbstractStreamOperator.dispose(AbstractStreamOperator.java:371)
>         at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:124)
>         at org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:618)
>         at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:517)
>         at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:733)
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:539)
>         at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-20044) Disposal of RocksDB could last forever

2021-11-01 Thread Yu Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17436637#comment-17436637
 ] 

Yu Li commented on FLINK-20044:
---

Thanks for the quick response [~wind_ljy]. Sure, let's keep watching.

> Disposal of RocksDB could last forever
> --
>
> Key: FLINK-20044
> URL: https://issues.apache.org/jira/browse/FLINK-20044
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.9.0
>Reporter: Jiayi Liao
>Priority: Minor
>  Labels: auto-deprioritized-major, stale-minor


[jira] [Commented] (FLINK-20044) Disposal of RocksDB could last forever

2021-11-01 Thread Jiayi Liao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17436634#comment-17436634
 ] 

Jiayi Liao commented on FLINK-20044:


[~liyu] We haven't upgraded our Flink version recently, but after reviewing 
the code on the latest branch I think the problem is still valid. How about we 
keep watching the issue and see if there is any feedback from other users? 

> Disposal of RocksDB could last forever
> --
>
> Key: FLINK-20044
> URL: https://issues.apache.org/jira/browse/FLINK-20044
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.9.0
>Reporter: Jiayi Liao
>Priority: Minor
>  Labels: auto-deprioritized-major, stale-minor


[jira] [Commented] (FLINK-20044) Disposal of RocksDB could last forever

2021-10-31 Thread Yu Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17436616#comment-17436616
 ] 

Yu Li commented on FLINK-20044:
---

[~wind_ljy] are we still observing the same issue in the production 
environment? The ticket is a little stale, but we will keep watching it if 
later releases still have the issue. Thanks.

> Disposal of RocksDB could last forever
> --
>
> Key: FLINK-20044
> URL: https://issues.apache.org/jira/browse/FLINK-20044
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.9.0
>Reporter: Jiayi Liao
>Priority: Minor
>  Labels: auto-deprioritized-major, stale-minor


[jira] [Commented] (FLINK-20044) Disposal of RocksDB could last forever

2021-04-29 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336038#comment-17336038
 ] 

Flink Jira Bot commented on FLINK-20044:


This issue was labeled "stale-major" 7 days ago and has not received any 
updates, so it is being deprioritized. If this ticket is actually Major, please 
raise the priority and ask a committer to assign you the issue or revive the 
public discussion.


> Disposal of RocksDB could last forever
> --
>
> Key: FLINK-20044
> URL: https://issues.apache.org/jira/browse/FLINK-20044
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.9.0
>Reporter: Jiayi Liao
>Priority: Major
>  Labels: stale-major


[jira] [Commented] (FLINK-20044) Disposal of RocksDB could last forever

2021-04-22 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17327556#comment-17327556
 ] 

Flink Jira Bot commented on FLINK-20044:


This major issue is unassigned, and neither it nor any of its Sub-Tasks has 
been updated for 30 days, so it has been labeled "stale-major". If this ticket 
is indeed "major", please either assign yourself or give an update. Afterwards, 
please remove the label. In 7 days the issue will be deprioritized.

> Disposal of RocksDB could last forever
> --
>
> Key: FLINK-20044
> URL: https://issues.apache.org/jira/browse/FLINK-20044
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.9.0
>Reporter: Jiayi Liao
>Priority: Major
>  Labels: stale-major


[jira] [Commented] (FLINK-20044) Disposal of RocksDB could last forever

2020-11-09 Thread Yu Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228504#comment-17228504
 ] 

Yu Li commented on FLINK-20044:
---

I see you also reopened FLINK-5463, and from the descriptions these two JIRAs 
report the same issue. I'm linking the two together, and I believe we should 
mark one as a duplicate of the other once confirmed.

> Disposal of RocksDB could last forever
> --
>
> Key: FLINK-20044
> URL: https://issues.apache.org/jira/browse/FLINK-20044
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.9.0
>Reporter: Jiayi Liao
>Priority: Major


[jira] [Commented] (FLINK-20044) Disposal of RocksDB could last forever

2020-11-08 Thread Jiayi Liao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228315#comment-17228315
 ] 

Jiayi Liao commented on FLINK-20044:


[~sewen] I guess this is not a cancellation situation, because I didn't see 
any cancellation logs on the TaskExecutor. I think the task throws an exception 
and, in the try-finally block in {{StreamTask}}, the thread hangs in 
{{disposeAllOperators}}. We also cannot observe the exception, because it is 
logged in {{Task}}, which runs only after the try-finally block in 
{{StreamTask}} has completed.
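
The masking effect described above can be reproduced without Flink or RocksDB. The following is only a minimal sketch under stated assumptions: the class {{MaskedFailureSketch}} and its methods are hypothetical stand-ins for the {{StreamTask}} and {{Task}} levels, and a {{CountDownLatch}} plays the role of the native disposal call that does not return.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

public class MaskedFailureSketch {

    /** Stand-in for the StreamTask level: processing fails, then cleanup runs in finally. */
    static void invoke(CountDownLatch disposeMayFinish) throws Exception {
        try {
            throw new IllegalStateException("simulated processing failure");
        } finally {
            // Stand-in for disposeAllOperators(): blocks until released,
            // the way a hung native close would block the task thread.
            disposeMayFinish.await();
        }
    }

    /** Stand-in for the Task level: the failure is only recorded once invoke() has returned. */
    static Thread startTask(CountDownLatch disposeMayFinish, List<String> log) {
        Thread t = new Thread(() -> {
            try {
                invoke(disposeMayFinish);
            } catch (Exception e) {
                log.add("task failed: " + e.getMessage());
            }
        }, "task-thread");
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch release = new CountDownLatch(1);
        List<String> log = new CopyOnWriteArrayList<>();

        Thread task = startTask(release, log);
        task.join(200); // the task is stuck in the finally block

        // prints: while disposal hangs, log = []
        System.out.println("while disposal hangs, log = " + log);

        release.countDown(); // let the simulated disposal return
        task.join();

        // prints: after disposal returns, log = [task failed: simulated processing failure]
        System.out.println("after disposal returns, log = " + log);
    }
}
```

As long as the cleanup in the finally block never returns, the original exception stays pending on the task thread and nothing above that frame gets a chance to log it or to fail the task.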

> Disposal of RocksDB could last forever
> --
>
> Key: FLINK-20044
> URL: https://issues.apache.org/jira/browse/FLINK-20044
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.9.0
>Reporter: Jiayi Liao
>Priority: Major


[jira] [Commented] (FLINK-20044) Disposal of RocksDB could last forever

2020-11-08 Thread Stephan Ewen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228297#comment-17228297
 ] 

Stephan Ewen commented on FLINK-20044:
--

[~wind_ljy] The TaskManagers should kill the process after some time if the 
cancellation does not succeed. Is that not happening here?
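
For readers unfamiliar with this safety net, here is a rough sketch of the idea, not Flink's actual implementation. As far as I know, the real behavior is governed by the {{task.cancellation.timeout}} option; the class {{CancellationWatchdogSketch}}, its methods and the {{onTimeout}} callback below are hypothetical names used only for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class CancellationWatchdogSketch {

    /**
     * Interrupts the task thread, waits up to timeoutMillis (must be > 0) for it
     * to exit, and runs onTimeout if it is still alive afterwards.
     * Returns true if the task exited in time.
     */
    static boolean cancelWithTimeout(Thread task, long timeoutMillis, Runnable onTimeout)
            throws InterruptedException {
        task.interrupt();
        task.join(timeoutMillis);
        if (task.isAlive()) {
            // A real process would treat this as fatal and halt the JVM.
            onTimeout.run();
            return false;
        }
        return true;
    }

    /** Hypothetical stand-in for a task stuck in a call that ignores interrupts. */
    static Thread startStuckTask(CountDownLatch release) {
        Thread t = new Thread(() -> {
            boolean interrupted = false;
            while (true) {
                try {
                    release.await();
                    break;
                } catch (InterruptedException e) {
                    interrupted = true; // keep waiting, like a native call would
                }
            }
            if (interrupted) {
                Thread.currentThread().interrupt(); // restore the flag before exiting
            }
        }, "stuck-task");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch release = new CountDownLatch(1);
        AtomicBoolean fatal = new AtomicBoolean(false);

        Thread stuck = startStuckTask(release);
        boolean exited = cancelWithTimeout(stuck, 100, () -> fatal.set(true));

        System.out.println("exited in time = " + exited + ", fatal handler ran = " + fatal.get());
        release.countDown();
    }
}
```

Note that such a watchdog only helps once a cancellation has actually been requested; it does not cover a task that hangs in cleanup on its own after a failure.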

> Disposal of RocksDB could last forever
> --
>
> Key: FLINK-20044
> URL: https://issues.apache.org/jira/browse/FLINK-20044
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.9.0
>Reporter: Jiayi Liao
>Priority: Major