[ https://issues.apache.org/jira/browse/DRILL-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066939#comment-16066939 ]
ASF GitHub Bot commented on DRILL-5420:
---------------------------------------

GitHub user kkhatua opened a pull request:

    https://github.com/apache/drill/pull/862

    DRILL-5420: ParquetAsyncPgReader goes into infinite loop during cleanup

    The PageQueue is now cleaned up using poll() instead of take(); take() was constantly being interrupted, which caused CPU churn. During a columnReader shutdown, a flag is set to block any new page-reading tasks from being submitted before the queues are finally cleared and the memory occupied by the pages is released.
    More details are in this JIRA comment: https://issues.apache.org/jira/browse/DRILL-5420?focusedCommentId=16066933&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16066933

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kkhatua/drill DRILL-5420

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/862.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #862

----
commit 2f233d4b1318e29211856877937ef9988c34ffaf
Author: Kunal Khatua <kkha...@maprtech.com>
Date:   2017-06-28T06:35:34Z

    DRILL-5420: ParquetAsyncPgReader goes into infinite loop during cleanup

    PageQueue is cleaned up using poll() instead of take(), which constantly gets interrupted and causes CPU churn. During a columnReader shutdown, a flag is set so as to block any new page reading tasks from being submitted.

----


> all cores at 100% of all servers
> --------------------------------
>
>                 Key: DRILL-5420
>                 URL: https://issues.apache.org/jira/browse/DRILL-5420
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>         Environment: linux, cluster with 5 servers over hdfs/parquet
>            Reporter: Hugo Bellomusto
>            Assignee: Kunal Khatua
>         Attachments: 2709a36d-804a-261a-64e5-afa271e782f8.json
>
>
> We have a Drill cluster with five servers over HDFS/Parquet.
> Each machine has 8 cores, and all cores are at 100% utilization.
> Each thread is looping in the while loop at line 314 of AsyncPageReader.java,
> inside the clear() method:
> https://github.com/apache/drill/blob/1.10.0/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/AsyncPageReader.java#L314
>
> jstack -l 19255|grep -A 50 $(printf "%x" 29250)
> "271d6262-ff19-ad24-af36-777bfe6c6375:frag:1:4" daemon prio=10 tid=0x00007f5b2adec800 nid=0x7242 runnable [0x00007f5aa33e8000]
>    java.lang.Thread.State: RUNNABLE
>         at java.lang.Throwable.fillInStackTrace(Native Method)
>         at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
>         - locked <0x00000007374bfcb0> (a java.lang.InterruptedException)
>         at java.lang.Throwable.<init>(Throwable.java:250)
>         at java.lang.Exception.<init>(Exception.java:54)
>         at java.lang.InterruptedException.<init>(InterruptedException.java:57)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>         at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:439)
>         at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.clear(AsyncPageReader.java:317)
>         at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.clear(ColumnReader.java:140)
>         at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.close(ParquetRecordReader.java:632)
>         at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:183)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>         at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>         at org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext(LimitRecordBatch.java:115)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>         at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104)
>         at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92)
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94)
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:232)
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:226)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
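The jstack trace shows LinkedBlockingQueue.take() throwing InterruptedException from ReentrantLock.lockInterruptibly() inside AsyncPageReader.clear(). Per the JDK documentation, lockInterruptibly() throws immediately when the calling thread's interrupt flag is already set, even if the queue has elements. The following self-contained Java model contrasts that failure with the fix described in the pull request. It is a sketch under stated assumptions, not Drill's code: PageQueueCleanup, BufferedPage, trySubmit, drainWithTake and drainWithPoll are hypothetical names, and the failing loop assumes the 1.10.0 cleanup caught the InterruptedException and re-asserted the interrupt flag before retrying.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model of the page-queue cleanup; none of these names are Drill's API.
final class PageQueueCleanup {
    final LinkedBlockingQueue<BufferedPage> pageQueue = new LinkedBlockingQueue<>();
    private boolean shuttingDown; // guarded by "this"

    // Page-reading tasks enqueue through here; refused once shutdown has begun.
    synchronized boolean trySubmit(BufferedPage page) {
        if (shuttingDown) {
            return false;
        }
        pageQueue.offer(page);
        return true;
    }

    // ASSUMED shape of the failing loop: take() plus re-asserting the interrupt flag.
    // With the flag set, take() throws inside lockInterruptibly() before it touches
    // the queue, so nothing is ever removed. Capped at maxAttempts so the demo ends.
    int drainWithTake(int maxAttempts) {
        int released = 0;
        for (int i = 0; i < maxAttempts && !pageQueue.isEmpty(); i++) {
            try {
                pageQueue.take().release();
                released++;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // guarantees the next take() throws too
            }
        }
        return released;
    }

    // Pattern described in the PR: block new submissions with a flag, then drain
    // with poll(), which declares no InterruptedException and returns null when empty.
    int drainWithPoll() {
        synchronized (this) {
            shuttingDown = true; // no trySubmit() can enqueue after this point
        }
        int released = 0;
        BufferedPage page;
        while ((page = pageQueue.poll()) != null) {
            page.release();
            released++;
        }
        return released;
    }

    public static void main(String[] args) {
        PageQueueCleanup cleanup = new PageQueueCleanup();
        cleanup.trySubmit(new BufferedPage());
        cleanup.trySubmit(new BufferedPage());
        Thread.currentThread().interrupt(); // simulate the fragment being cancelled

        int viaTake = cleanup.drainWithTake(1_000);
        int viaPoll = cleanup.drainWithPoll();
        boolean accepted = cleanup.trySubmit(new BufferedPage());
        Thread.interrupted(); // clear the flag again before exiting

        // prints "take=0, poll=2, acceptedAfterClear=false"
        System.out.println("take=" + viaTake + ", poll=" + viaPoll
                + ", acceptedAfterClear=" + accepted);
    }
}

// Hypothetical stand-in for a buffered Parquet page; a real page would free memory.
final class BufferedPage {
    private boolean released;

    void release() { released = true; }
    boolean isReleased() { return released; }
}
```

In this model the interrupted take() loop makes no progress at all, while the poll() drain releases both pages and the flag rejects later submissions. Setting the flag under the same lock that trySubmit() holds ensures no page can be enqueued once the drain has started; whether Drill coordinates its page-reading tasks exactly this way is not stated in the PR.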