[ https://issues.apache.org/jira/browse/DRILL-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325446#comment-14325446 ]

Parth Chandra commented on DRILL-1948:
--------------------------------------

Resolved in commit 3076978

> Reading large parquet files via HDFS fails
> ------------------------------------------
>
>                 Key: DRILL-1948
>                 URL: https://issues.apache.org/jira/browse/DRILL-1948
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 0.7.0
>         Environment: Hadoop 2.4.0 on Amazon EMR
>            Reporter: Adam Gilmore
>            Assignee: Parth Chandra
>            Priority: Critical
>             Fix For: 0.8.0
>
>         Attachments: DRILL-1948.1.patch.txt, DRILL-1948.2.patch.txt
>
>
> There appears to be an issue with reading medium to large Parquet files via 
> HDFS.  We have created a basic Parquet file with a schema like so:
> sellprice DOUBLE
> When the file is filled with 10,000 double values, the following query in 
> Drill works fine:
> select sum(sellprice) from hdfs.`/saleparquet`;
> When it is filled with 50,000 double values, the following error occurs:
> Query failed: Query stopped.[ 9aece851-48bc-4664-831e-d35bbfbcd1d5 on 
> ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
> query.
> The full stack trace is:
> 2015-01-07 05:48:57,809 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] ERROR 
> o.a.drill.exec.ops.FragmentContext - Fragment Context received failure.
> java.lang.ArrayIndexOutOfBoundsException: null
> 2015-01-07 05:48:57,809 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] ERROR 
> o.a.d.e.p.i.ScreenCreator$ScreenRoot - Error 
> 88fe95c3-b088-4674-8b65-967a7f4c3cdf: Query stopped.
> java.lang.ArrayIndexOutOfBoundsException: null
> 2015-01-07 05:48:57,809 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] ERROR 
> o.a.d.e.w.f.AbstractStatusReporter - Error 
> cd4123e4-7b9d-451d-90f0-3cc1ecf461e4: Failure while running fragment.
> java.lang.ArrayIndexOutOfBoundsException: null
> 2015-01-07 05:48:57,813 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] ERROR 
> o.a.drill.exec.work.foreman.Foreman - Error 
> 5db2c65b-cd10-4970-ba2b-f29b51fda923: Query failed: Failure while running 
> fragment.[ cd4123e4-7b9d-451d-90f0-3cc1ecf461e4 on 
> ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]
> [ cd4123e4-7b9d-451d-90f0-3cc1ecf461e4 on 
> ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]
> org.apache.drill.exec.rpc.RemoteRpcException: Failure while running 
> fragment.[ cd4123e4-7b9d-451d-90f0-3cc1ecf461e4 on 
> ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]
> [ cd4123e4-7b9d-451d-90f0-3cc1ecf461e4 on 
> ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]
>         at 
> org.apache.drill.exec.work.foreman.QueryManager.statusUpdate(QueryManager.java:93)
>  [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at 
> org.apache.drill.exec.work.foreman.QueryManager$RootStatusReporter.statusChange(QueryManager.java:151)
>  [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at 
> org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:113)
>  [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at 
> org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:109)
>  [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.internalFail(FragmentExecutor.java:166)
>  [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:116)
>  [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at 
> org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254)
>  [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_71]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> 2015-01-07 05:48:57,814 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] WARN  
> o.a.d.e.p.impl.SendingAccountor - Failure while waiting for send complete.
> java.lang.InterruptedException: null
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1301)
>  ~[na:1.7.0_71]
>         at java.util.concurrent.Semaphore.acquire(Semaphore.java:472) 
> ~[na:1.7.0_71]
>         at 
> org.apache.drill.exec.physical.impl.SendingAccountor.waitForSendComplete(SendingAccountor.java:44)
>  ~[drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.stop(ScreenCreator.java:186)
>  [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:144)
>  [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:117)
>  [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at 
> org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254)
>  [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_71]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> If I fill it with even more values (e.g. 100,000 or 1,000,000), I get a 
> variety of other errors, such as:
> "Query failed: Query stopped., don't know what type: 14"
> coming from the Parquet engine.
> I am able to consistently replicate this in my environment with a basic 
> Parquet file.  I can attach that file if necessary.
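
For reference, a Parquet file with the single DOUBLE column described in the
report can be generated along the following lines. This is only a minimal
sketch using the parquet-avro writer API; the reporter does not say how the
original file was produced, and the output path, record name, and row count
below are illustrative assumptions rather than details from the issue.

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaBuilder;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetWriter;
    import org.apache.parquet.hadoop.ParquetWriter;

    import java.util.Random;

    public class WriteSaleParquet {
      public static void main(String[] args) throws Exception {
        // Single-column schema matching the report: sellprice DOUBLE
        Schema schema = SchemaBuilder.record("sale").fields()
            .requiredDouble("sellprice")
            .endRecord();

        Random random = new Random();

        // Output path and row count are illustrative; the report reads
        // from /saleparquet on HDFS and sees the failure at 50,000 rows.
        try (ParquetWriter<GenericRecord> writer =
                 AvroParquetWriter.<GenericRecord>builder(
                         new Path("hdfs:///saleparquet/sale.parquet"))
                     .withSchema(schema)
                     .build()) {
          for (int i = 0; i < 50_000; i++) {
            GenericRecord record = new GenericData.Record(schema);
            record.put("sellprice", random.nextDouble() * 100.0);
            writer.write(record);
          }
        }
      }
    }

The builder-style AvroParquetWriter API assumed here comes from a newer
parquet-mr release than the one bundled with Drill 0.7.0; any other tool that
writes a single-column DOUBLE Parquet file at the row counts quoted above
should serve equally well for reproducing the reported query failure.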



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
