[ 
https://issues.apache.org/jira/browse/FLINK-18815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190276#comment-17190276
 ] 

Kezhu Wang commented on FLINK-18815:
------------------------------------

It seems that the two recent failures are leaks, whereas the previous cases were caused by
duplicate closing. I think these two leaks are caused by
{{SafetyNetCloseableRegistry.close}}, which interrupts the reaper thread. Suppose
that:
 1. A closeable becomes phantom reachable and is enqueued in
{{CloseableReaperThread.referenceQueue}}, but has not been closed yet.
 2. {{SafetyNetCloseableRegistry.close}} calls
{{CloseableReaperThread.interrupt}}, which sets {{CloseableReaperThread.running}}
to false and interrupts the underlying Java thread.
 3. {{CloseableReaperThread}} terminates, either because
{{CloseableReaperThread.running}} is false or because of an {{InterruptedException}}.
 4. The enqueued closeable is never closed, i.e. it leaks.
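To make steps 1-4 concrete, here is a simplified, hypothetical model of the reaper loop. The names {{LeakyReaperDemo}}, {{Reaper}} and {{CloseableRef}} are made up; this is not Flink's actual {{CloseableReaperThread}}. GC is simulated by calling {{Reference.enqueue()}} manually, and the interleaving is forced by calling {{shutdown()}} before {{start()}} so that the outcome is deterministic; in the real race the reaper thread is already running when the registry is closed.

{code:java}
import java.io.Closeable;
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified model of the reaper loop; not Flink's actual CloseableReaperThread.
final class LeakyReaperDemo {

    // Phantom reference remembering which Closeable to close once its wrapper is unreachable.
    static final class CloseableRef extends PhantomReference<Object> {
        final Closeable inner;

        CloseableRef(Object wrapper, Closeable inner, ReferenceQueue<Object> queue) {
            super(wrapper, queue);
            this.inner = inner;
        }
    }

    static final class Reaper extends Thread {
        final ReferenceQueue<Object> queue = new ReferenceQueue<>();
        volatile boolean running = true;

        @Override
        public void run() {
            try {
                // The loop stops as soon as running is false, even if
                // references are still waiting in the queue.
                while (running) {
                    CloseableRef ref = (CloseableRef) queue.remove();
                    try {
                        ref.inner.close();
                    } catch (Exception ignored) {
                        // A real implementation would log this.
                    }
                }
            } catch (InterruptedException e) {
                // Interrupted while blocked in remove(): also exits without draining the queue.
            }
        }

        // Mirrors step 2: clear the flag and interrupt the thread.
        void shutdown() {
            running = false;
            interrupt();
        }
    }

    // Returns whether the enqueued closeable was closed.
    static boolean runScenario() throws InterruptedException {
        AtomicBoolean closed = new AtomicBoolean(false);
        Reaper reaper = new Reaper();
        Object wrapper = new Object();
        CloseableRef ref = new CloseableRef(wrapper, () -> closed.set(true), reaper.queue);

        ref.enqueue();       // Step 1: simulate GC enqueueing the phantom reference.
        reaper.shutdown();   // Step 2: the registry is closed before the reaper runs.
        reaper.start();      // Step 3: the loop sees running == false and exits at once.
        reaper.join();
        return closed.get(); // Step 4: false, the closeable leaked.
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("closed = " + runScenario()); // prints "closed = false"
    }
}
{code}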

I think there are two different approaches to fix this issue:
 * Use at most one {{CloseableReaperThread}} and never close it. This may
leak the thread if Flink is embedded as a guest in another host application.
 * Count the registered phantom references, and terminate the reaper thread only after
all registered phantom references have been dequeued and the {{CloseableReaperThread}}
has been released by its caller.
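A minimal sketch of the counting approach, assuming a single reaper thread that owns the set of registered references (all names here are hypothetical, not Flink's API). The owner calls {{release()}} instead of interrupting the thread, and the thread exits only after it has been released and every registered reference has been dequeued and closed. GC is again simulated with {{Reference.enqueue()}}.

{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the counting approach; names are illustrative, not Flink's API.
final class CountingReaper extends Thread {

    static final class CloseableRef extends PhantomReference<Object> {
        final Closeable inner;
        CloseableRef(Object wrapper, Closeable inner, ReferenceQueue<Object> queue) {
            super(wrapper, queue);
            this.inner = inner;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Object lock = new Object();
    // Registered but not yet reaped; also keeps the references strongly reachable. Guarded by lock.
    private final Set<CloseableRef> pending = new HashSet<>();
    private boolean released; // guarded by lock

    CountingReaper() {
        super("closeable-reaper");
        setDaemon(true);
    }

    CloseableRef register(Object wrapper, Closeable inner) {
        CloseableRef ref = new CloseableRef(wrapper, inner, queue);
        synchronized (lock) {
            if (released) {
                throw new IllegalStateException("reaper already released");
            }
            pending.add(ref);
        }
        return ref;
    }

    // Called when the owner drops the reaper; does NOT interrupt the thread.
    void release() {
        synchronized (lock) {
            released = true;
        }
    }

    @Override
    public void run() {
        try {
            while (true) {
                synchronized (lock) {
                    // Exit only when released AND every registered reference has been reaped.
                    if (released && pending.isEmpty()) {
                        return;
                    }
                }
                // Timed wait so that a release() with nothing pending is noticed.
                Reference<?> ref = queue.remove(100);
                if (ref instanceof CloseableRef) {
                    CloseableRef cref = (CloseableRef) ref;
                    try {
                        cref.inner.close();
                    } catch (IOException e) {
                        // A real implementation would log this.
                    } finally {
                        synchronized (lock) {
                            pending.remove(cref);
                        }
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns true if both closeables were closed and the thread then terminated.
    static boolean demo() throws InterruptedException {
        AtomicInteger closedCount = new AtomicInteger();
        CountingReaper reaper = new CountingReaper();
        reaper.start();
        CloseableRef a = reaper.register(new Object(), () -> closedCount.incrementAndGet());
        CloseableRef b = reaper.register(new Object(), () -> closedCount.incrementAndGet());
        reaper.release(); // owner is gone, but two references are still pending
        a.enqueue();      // simulate GC enqueueing the phantom references
        b.enqueue();
        reaper.join(5_000);
        return closedCount.get() == 2 && !reaper.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
{code}

The timed {{ReferenceQueue.remove(long)}} is just one way to notice a release while the queue is empty; a counter guarded by the same lock would work equally well in place of the set, as long as the references themselves stay strongly reachable elsewhere.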

Since Flink is not necessarily the final, standalone application (it may be embedded
in a host application), I think the counting approach may be more appropriate.

As an analogy, {{java.lang.ref.Cleaner}} has no close-like method. It [tracks
all registered
referents|https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/jdk/internal/ref/PhantomCleanable.java#L65],
and its underlying thread terminates only after
[the cleaner itself|https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/jdk/internal/ref/CleanerImpl.java#L101]
and [all registered
references|https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/jdk/internal/ref/CleanerImpl.java#L133]
have been cleaned.
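For comparison, the public {{java.lang.ref.Cleaner}} API (Java 9+) only offers {{register}}, which returns a {{Cleaner.Cleanable}}; there is nothing to close on the cleaner itself. The sketch below (class names are illustrative) shows the usual pattern: the cleaning action must not capture the tracked object, and {{Cleanable.clean()}} runs the action at most once.

{code:java}
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative use of java.lang.ref.Cleaner (Java 9+); class names are hypothetical.
final class CleanerDemo {
    // Shared cleaner. There is no close(): its thread ends once the Cleaner itself
    // and all of its registered cleanables have been cleaned.
    private static final Cleaner CLEANER = Cleaner.create();

    static final class Resource implements AutoCloseable {
        // The action must not capture `this`, otherwise Resource never becomes phantom reachable.
        private static final class State implements Runnable {
            final AtomicInteger cleanups;
            State(AtomicInteger cleanups) { this.cleanups = cleanups; }
            @Override public void run() { cleanups.incrementAndGet(); }
        }

        private final Cleaner.Cleanable cleanable;

        Resource(AtomicInteger cleanups) {
            this.cleanable = CLEANER.register(this, new State(cleanups));
        }

        @Override
        public void close() {
            cleanable.clean(); // unregisters and runs the action at most once
        }
    }

    static int demo() {
        AtomicInteger cleanups = new AtomicInteger();
        Resource r = new Resource(cleanups);
        r.close();
        r.close(); // second clean() is a no-op
        return cleanups.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 1
    }
}
{code}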

[~kevin.cyj] [~dian.fu] [~trohrmann] Any thoughts ?

> AbstractCloseableRegistryTest.testClose unstable
> ------------------------------------------------
>
>                 Key: FLINK-18815
>                 URL: https://issues.apache.org/jira/browse/FLINK-18815
>             Project: Flink
>          Issue Type: Bug
>          Components: FileSystems, Tests
>    Affects Versions: 1.10.1, 1.12.0, 1.11.1
>            Reporter: Robert Metzger
>            Assignee: Kezhu Wang
>            Priority: Critical
>              Labels: pull-request-available, test-stability
>             Fix For: 1.10.2, 1.12.0, 1.11.2
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5164&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=05b74a19-4ee4-5036-c46f-ada307df6cf0
> {code}
> [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.509 s <<< FAILURE! - in org.apache.flink.core.fs.SafetyNetCloseableRegistryTest
> [ERROR] testClose(org.apache.flink.core.fs.SafetyNetCloseableRegistryTest)  Time elapsed: 1.15 s  <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<-1>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:834)
>       at org.junit.Assert.assertEquals(Assert.java:645)
>       at org.junit.Assert.assertEquals(Assert.java:631)
>       at org.apache.flink.core.fs.AbstractCloseableRegistryTest.testClose(AbstractCloseableRegistryTest.java:93)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
