[ https://issues.apache.org/jira/browse/DRILL-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269463#comment-14269463 ]

Jacques Nadeau commented on DRILL-1948:
---------------------------------------

[~dragoncurve], nice catch and thanks for the patch. [~parthc], can you see if we can review this and get it incorporated for 0.8?

Adam, I believe the current fork is here: https://github.com/jacques-n/incubator-parquet-mr/tree/parquet-1.5.0-r5. We're trying to get our changes reincorporated upstream so we can get off the fork, so hopefully that will happen soon. You can follow DRILL-1410 to see our progress on that.

> Reading large parquet files via HDFS fails
> ------------------------------------------
>
>                 Key: DRILL-1948
>                 URL: https://issues.apache.org/jira/browse/DRILL-1948
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 0.7.0
>         Environment: Hadoop 2.4.0 on Amazon EMR
>            Reporter: Adam Gilmore
>            Assignee: Parth Chandra
>            Priority: Critical
>             Fix For: 0.8.0
>
>         Attachments: DRILL-1948.1.patch.txt
>
>
> There appears to be an issue with reading medium to large Parquet files via
> HDFS. We have created a basic Parquet file with a schema like so:
>
>   sellprice DOUBLE
>
> When filled with 10,000 double values, the following query in Drill works
> fine:
>
>   select sum(sellprice) from hdfs.`/saleparquet`;
>
> When filled with 50,000 double values, the following error occurs:
>
>   Query failed: Query stopped.[ 9aece851-48bc-4664-831e-d35bbfbcd1d5 on
>   ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]
>   java.lang.RuntimeException: java.sql.SQLException: Failure while executing
>   query.
>
> The full stack trace is:
>
> 2015-01-07 05:48:57,809 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] ERROR
>   o.a.drill.exec.ops.FragmentContext - Fragment Context received failure.
> java.lang.ArrayIndexOutOfBoundsException: null
> 2015-01-07 05:48:57,809 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] ERROR
>   o.a.d.e.p.i.ScreenCreator$ScreenRoot - Error
>   88fe95c3-b088-4674-8b65-967a7f4c3cdf: Query stopped.
> java.lang.ArrayIndexOutOfBoundsException: null
> 2015-01-07 05:48:57,809 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] ERROR
>   o.a.d.e.w.f.AbstractStatusReporter - Error
>   cd4123e4-7b9d-451d-90f0-3cc1ecf461e4: Failure while running fragment.
> java.lang.ArrayIndexOutOfBoundsException: null
> 2015-01-07 05:48:57,813 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] ERROR
>   o.a.drill.exec.work.foreman.Foreman - Error
>   5db2c65b-cd10-4970-ba2b-f29b51fda923: Query failed: Failure while running
>   fragment.[ cd4123e4-7b9d-451d-90f0-3cc1ecf461e4 on
>   ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]
>   [ cd4123e4-7b9d-451d-90f0-3cc1ecf461e4 on
>   ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]
> org.apache.drill.exec.rpc.RemoteRpcException: Failure while running
>   fragment.[ cd4123e4-7b9d-451d-90f0-3cc1ecf461e4 on
>   ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]
>   [ cd4123e4-7b9d-451d-90f0-3cc1ecf461e4 on
>   ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]
>         at org.apache.drill.exec.work.foreman.QueryManager.statusUpdate(QueryManager.java:93) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at org.apache.drill.exec.work.foreman.QueryManager$RootStatusReporter.statusChange(QueryManager.java:151) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:113) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:109) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.internalFail(FragmentExecutor.java:166) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:116) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> 2015-01-07 05:48:57,814 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] WARN
>   o.a.d.e.p.impl.SendingAccountor - Failure while waiting for send complete.
> java.lang.InterruptedException: null
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1301) ~[na:1.7.0_71]
>         at java.util.concurrent.Semaphore.acquire(Semaphore.java:472) ~[na:1.7.0_71]
>         at org.apache.drill.exec.physical.impl.SendingAccountor.waitForSendComplete(SendingAccountor.java:44) ~[drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.stop(ScreenCreator.java:186) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:144) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:117) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
>
> If I fill it with even more values (e.g. 100,000 or 1,000,000), I get a variety
> of other errors, such as:
>
>   "Query failed: Query stopped., don't know what type: 14"
>
> coming from the Parquet engine.
>
> I am able to consistently replicate this in my environment with a basic
> Parquet file. I can attach that file if necessary.
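[Editor's note: the symptom pattern quoted above (small files read fine, larger reads over HDFS throw ArrayIndexOutOfBoundsException) is characteristic of short-read bugs: unlike a local stream, an HDFS stream frequently returns fewer bytes than requested from a single read(byte[], int, int) call, and reader code that assumes the buffer was filled then indexes past the valid region. The sketch below is a hypothetical illustration of that failure class and the defensive read-fully loop, not the actual DRILL-1948 patch.]

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFully {
    // Keep reading until the buffer is full or the stream ends;
    // a single read() may legally return fewer bytes than requested.
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) break;          // end of stream reached early
            off += n;
        }
        return off;                    // bytes actually read
    }

    public static void main(String[] args) throws IOException {
        // Simulate an HDFS-like stream that hands back data in small chunks.
        byte[] data = new byte[50000];
        InputStream chunky = new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 1024)); // short reads
            }
        };
        byte[] buf = new byte[50000];
        // A naive single read() here would return 1024, leaving buf mostly
        // unfilled; readFully loops until the full 50000 bytes arrive.
        System.out.println(readFully(chunky, buf));
    }
}
```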
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)