[jira] [Commented] (DRILL-5156) Bit-Client thread finds closed allocator in TestDrillbitResilience unit test
    [ https://issues.apache.org/jira/browse/DRILL-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771772#comment-15771772 ]

Paul Rogers commented on DRILL-5156:

Continuing to investigate, it appears that the RPC threads hang around after the Drillbit shuts down if the debugger is stopped at the exception breakpoint.

In particular, I ran the full test with the {{IllegalStateException}} breakpoint set. The breakpoint was hit in one test. That test shuts down its Drillbit at the end. Then another test started and created a new Drillbit. It seems that the first Drillbit did not wait for the RPC threads to exit, leaving orphaned threads (those stopped in the debugger).

It seems the Drillbit should refuse to exit until all child threads have completed.

> Bit-Client thread finds closed allocator in TestDrillbitResilience unit test
>
> Key: DRILL-5156
> URL: https://issues.apache.org/jira/browse/DRILL-5156
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Paul Rogers
> Priority: Minor
>
> RPC thread attempts to access a closed allocator during the
> {{TestDrillbitResilience}} unit test.
> Set a Java exception breakpoint for {{IllegalStateException}}. Run the
> {{TestDrillbitResilience}} unit tests.
> You will see quite a few exceptions, including the following in a thread
> called BitClient-1:
> {code}
> RootAllocator(BaseAllocator).assertOpen() line 109
> RootAllocator(BaseAllocator).buffer(int) line 191
> DrillByteBufAllocator.buffer(int) line 49
> DrillByteBufAllocator.ioBuffer(int) line 64
> AdaptiveRecvByteBufAllocator$HandleImpl.allocate(ByteBufAllocator) line 104
> NioSocketChannel$NioSocketChannelUnsafe(...).read() line 117
> ...
> NioEventLoop.run() line 354
> {code}
> The test continues (then fails for some other reason), which is why this is
> marked as minor. Still, it seems odd that the client thread should attempt to
> access a closed allocator.
> At this point, it is not clear how we got into this state. The test itself is
> waiting for a response from the server in the {{tailsAfterMSorterSorting}}
> test.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
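The suggestion above, that a service should not finish shutting down until its child threads have exited, can be sketched in plain Java. This is only an illustration under stated assumptions: {{JoiningService}}, {{startWorker}} and {{liveWorkers}} are hypothetical names, not Drill classes, and the workers here merely sleep rather than run an RPC event loop.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a service that owns worker threads; not Drill code. */
final class JoiningService implements AutoCloseable {
  private final List<Thread> workers = new ArrayList<>();
  private volatile boolean running = true;

  /** Starts a named worker that idles until the service is closed. */
  void startWorker(String name) {
    Thread t = new Thread(() -> {
      while (running) {
        try {
          Thread.sleep(10);
        } catch (InterruptedException e) {
          return; // interrupted by close(): exit promptly
        }
      }
    }, name);
    workers.add(t);
    t.start();
  }

  /** Counts workers that have not yet terminated. */
  int liveWorkers() {
    int live = 0;
    for (Thread t : workers) {
      if (t.isAlive()) {
        live++;
      }
    }
    return live;
  }

  /** Signals every worker to stop, then waits (bounded) for each to exit. */
  @Override
  public void close() {
    running = false;
    for (Thread t : workers) {
      t.interrupt();
    }
    boolean interrupted = false;
    for (Thread t : workers) {
      try {
        t.join(5_000);
      } catch (InterruptedException e) {
        interrupted = true;
      }
    }
    if (interrupted) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
    }
  }

  public static void main(String[] args) {
    JoiningService service = new JoiningService();
    service.startWorker("BitClient-1");
    service.startWorker("BitServer-1");
    service.close();
    System.out.println("live workers after close: " + service.liveWorkers());
  }
}
```

The join is bounded on purpose: a thread suspended in a debugger would never respond, so an unbounded wait would hang the shutdown instead of orphaning the thread.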
[jira] [Commented] (DRILL-5156) Bit-Client thread finds closed allocator in TestDrillbitResilience unit test
    [ https://issues.apache.org/jira/browse/DRILL-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15772269#comment-15772269 ]

Paul Rogers commented on DRILL-5156:

The problem appears to be a bug in {{BootStrapContext}}, which creates two thread pools but does not close them. The two pools are for the "BitClient-n" and "BitServer-n" threads.

During close, the {{BootStrapContext.close()}} method closes the allocator but leaves the threads running. Since they are left running, the BitClient thread attempts to use the (now closed) allocator and triggers the {{IllegalStateException}}.

This behavior is easy to see by setting the breakpoint described above. Leave the thread stopped at that breakpoint. The rest of the Drillbit shuts down around the suspended thread, showing that the Drillbit did not wait for the thread.

The fix is simple:

{code}
public void close() {
  try {
    loop2.shutdownGracefully(0, 0, TimeUnit.SECONDS);
  } catch ( Exception e ) {
    logger.warn("Failure During Bit-Client shutdown.", e);
  }
  try {
    loop.shutdownGracefully(0, 0, TimeUnit.SECONDS);
  } catch ( Exception e ) {
    logger.warn("Failure During Bit-Server shutdown.", e);
  }
  ...
{code}

After this fix, the test case runs fine with no {{IllegalStateExceptions}}.
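The ordering problem described in this comment (allocator closed while pool threads are still running) can be reproduced in miniature with standard-library classes. The sketch below is an assumption-laden illustration, not the Drill code: {{FakeAllocator}} is a hypothetical stand-in for the real allocator, and a {{java.util.concurrent.ExecutorService}} stands in for the Netty event loop groups ({{loop}} and {{loop2}}). Note also that Netty's {{shutdownGracefully}} returns a future rather than blocking, so code that must be sure the threads are gone would presumably need to wait on that future; the sketch waits with {{awaitTermination}} instead.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class OrderedShutdownSketch {

  /** Hypothetical allocator that rejects use after close, like assertOpen() in the trace. */
  static final class FakeAllocator {
    private volatile boolean open = true;

    void buffer() {
      if (!open) {
        throw new IllegalStateException("allocator is closed");
      }
    }

    void close() {
      open = false;
    }
  }

  /**
   * Runs one worker that allocates in a loop, then shuts down in the chosen
   * order. Returns how many IllegalStateExceptions the worker saw.
   */
  static int run(boolean stopWorkersFirst) {
    FakeAllocator allocator = new FakeAllocator();
    AtomicInteger failures = new AtomicInteger();
    ExecutorService pool = Executors.newSingleThreadExecutor();

    pool.execute(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          allocator.buffer();
        } catch (IllegalStateException e) {
          failures.incrementAndGet();
          return;
        }
      }
    });

    if (stopWorkersFirst) {
      pool.shutdownNow();   // interrupt the worker
      awaitQuietly(pool);   // wait until it has really exited
      allocator.close();    // only now release the allocator
    } else {
      allocator.close();    // allocator goes away under a live worker
      pool.shutdown();
      awaitQuietly(pool);
    }
    return failures.get();
  }

  private static void awaitQuietly(ExecutorService pool) {
    try {
      pool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    System.out.println("workers stopped first: " + run(true) + " failure(s)");
    System.out.println("allocator closed first: " + run(false) + " failure(s)");
  }
}
```

Stopping and awaiting the workers before closing the allocator leaves the worker no window in which to touch a closed allocator; closing the allocator first guarantees the worker eventually trips over it.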
[jira] [Commented] (DRILL-5156) Bit-Client thread finds closed allocator in TestDrillbitResilience unit test
    [ https://issues.apache.org/jira/browse/DRILL-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15773608#comment-15773608 ]

Paul Rogers commented on DRILL-5156:

Also seeing a similar problem in {{FragmentContext.close()}} in the unit test {{TestConvertFunctions#testConvertFromConvertToInt}}. This test fails with the Snappy library issue. Then, when tearing down, we get an {{IllegalStateException}} in the {{OperatorContextImpl.close()}} method here:

{code}
if (allocator != null) {
  allocator.close(); // Error here
}
{code}

Likely, again, the thread is not being closed properly before the memory allocator is released.

> Bit-Client thread finds closed allocator in TestDrillbitResilience unit test
>
> Key: DRILL-5156
> URL: https://issues.apache.org/jira/browse/DRILL-5156
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
[jira] [Commented] (DRILL-5156) Bit-Client thread finds closed allocator in TestDrillbitResilience unit test
    [ https://issues.apache.org/jira/browse/DRILL-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782140#comment-15782140 ]

ASF GitHub Bot commented on DRILL-5156:

GitHub user paul-rogers opened a pull request:

    https://github.com/apache/drill/pull/709

    DRILL-5156: BootStrapContext should close threads

    The Bit-Client thread (that's the thread name) finds a closed allocator in the TestDrillbitResilience unit test. This fix (along with DRILL-5157) eliminates two run-time problems seen in this unit test.

    BootStrapContext creates two thread pools, but does not close them. This allows the code running in the threads to attempt to access their allocators after the allocator is closed. This fix ensures that the thread pools are closed to avoid the issue.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-5156

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/709.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #709

commit 73607c33c78dd89373412c8e4c70356fa19f81fd
Author: Paul Rogers
Date:   2016-12-28T01:45:16Z

    DRILL-5156: BootStrapContext should close threads

    Bit-Client thread finds closed allocator in TestDrillbitResilience unit test. This fix (along with DRILL-5157) eliminates two run-time problems seen in this unit test.

    BootStrapContext creates two thread pools, but does not close them. This allows the code running in the threads to attempt to access their allocators after the allocator is closed. This fix ensures that the thread pools are closed to avoid the issue.
[jira] [Commented] (DRILL-5156) Bit-Client thread finds closed allocator in TestDrillbitResilience unit test
    [ https://issues.apache.org/jira/browse/DRILL-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795944#comment-15795944 ]

ASF GitHub Bot commented on DRILL-5156:

GitHub user sudheeshkatkam commented on the issue:

    https://github.com/apache/drill/pull/709

    This change may not be that simple, but I could be wrong. I tried to do something similar as part of [PR 429](https://github.com/apache/drill/pull/429/commits/0394f4ca5aed142bb2ba0b192f3588cfda7b).

    The close happens elsewhere. The "loop" is actually closed as part of [BasicServer#close](https://github.com/apache/drill/blob/master/exec/rpc/src/main/java/org/apache/drill/exec/rpc/BasicServer.java#L218). But it will be closed multiple times (because there may be multiple instances of sub-classes of BasicServer, and all use the same loop), and it looks like "loop2" is not closed anywhere.

    The changes in PR 429 close the loops exactly once, but I do not recollect why that PR was not merged.
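The concern raised here, several server instances sharing one loop and each trying to close it, is commonly handled by making the close idempotent. The sketch below shows one such guard using an atomic flag. It is an assumption for illustration only and is not claimed to be what PR 429 does; {{SharedLoop}} and {{Server}} are hypothetical names, and the counter merely records how many real shutdowns would have been performed.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedLoopSketch {

  /** Hypothetical wrapper around a shared event loop; shuts down at most once. */
  static final class SharedLoop implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final AtomicInteger shutdowns = new AtomicInteger();

    @Override
    public void close() {
      // Only the first caller wins the compare-and-set; later calls are no-ops.
      if (closed.compareAndSet(false, true)) {
        shutdowns.incrementAndGet(); // real code would shut the loop down here
      }
    }

    int shutdownCount() {
      return shutdowns.get();
    }
  }

  /** Hypothetical server that shares the loop with its siblings. */
  static final class Server implements AutoCloseable {
    private final SharedLoop loop;

    Server(SharedLoop loop) {
      this.loop = loop;
    }

    @Override
    public void close() {
      loop.close();
    }
  }

  public static void main(String[] args) {
    SharedLoop loop = new SharedLoop();
    Server control = new Server(loop);
    Server data = new Server(loop);
    control.close();
    data.close();
    System.out.println("shutdowns performed: " + loop.shutdownCount());
  }
}
```

An alternative design is to give the loop a single owner (for example, the context that created it) and have the servers never close it at all; the guard above is simply the smaller change when many callers already invoke close.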
[jira] [Commented] (DRILL-5156) Bit-Client thread finds closed allocator in TestDrillbitResilience unit test
    [ https://issues.apache.org/jira/browse/DRILL-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498342#comment-16498342 ]

ASF GitHub Bot commented on DRILL-5156:

ilooner commented on issue #709: DRILL-5156: BootStrapContext should close threads
URL: https://github.com/apache/drill/pull/709#issuecomment-393959637

    @paul-rogers is this fix still valid? Or can we close this?

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-5156) Bit-Client thread finds closed allocator in TestDrillbitResilience unit test
    [ https://issues.apache.org/jira/browse/DRILL-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498388#comment-16498388 ]

ASF GitHub Bot commented on DRILL-5156:

paul-rogers closed pull request #709: DRILL-5156: BootStrapContext should close threads
URL: https://github.com/apache/drill/pull/709

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/server/BootStrapContext.java b/exec/java-exec/src/main/java/org/apache/drill/exec/server/BootStrapContext.java
index c498185046..dc0a392ba1 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/server/BootStrapContext.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/server/BootStrapContext.java
@@ -123,6 +123,16 @@ public ScanResult getClasspathScan() {
 
   @Override
   public void close() {
+    try {
+      loop2.shutdownGracefully(0, 0, TimeUnit.SECONDS);
+    } catch ( Exception e ) {
+      logger.warn("Failure During Bit-Client shutdown.", e);
+    }
+    try {
+      loop.shutdownGracefully(0, 0, TimeUnit.SECONDS);
+    } catch ( Exception e ) {
+      logger.warn("Failure During Bit-Server shutdown.", e);
+    }
     try {
       DrillMetrics.resetMetrics();
     } catch (Error | Exception e) {
[jira] [Commented] (DRILL-5156) Bit-Client thread finds closed allocator in TestDrillbitResilience unit test
    [ https://issues.apache.org/jira/browse/DRILL-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498387#comment-16498387 ]

ASF GitHub Bot commented on DRILL-5156:

paul-rogers commented on issue #709: DRILL-5156: BootStrapContext should close threads
URL: https://github.com/apache/drill/pull/709#issuecomment-393971351

    We can close this. The original motivation was an observed resource leak. There was debate, and there have been other related fixes since. If the leak still exists, folks will find it and we can devise a fix based on the current code.