[ https://issues.apache.org/jira/browse/DRILL-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahul Challapalli updated DRILL-1963:
-------------------------------------
    Attachment: morecolumns2.parquet

I could not attach the JSON file here because it is 19.23 MB.

Below is a rough description of the shape of the JSON/Parquet file:
{code}
{
    "depth":1,
    "noOfScalarTypes":5000,
    "repeatedScalarTypesInfo": [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
    "noOfComplexTypes":1,
    "repeatedComplexTypesInfo": []
}
{
    "depth":2,
    "noOfScalarTypes":1000,
    "repeatedScalarTypesInfo": [10, 5, 7, 4, 7, 25, 5, 10, 73, 5, 15, 20],
    "noOfComplexTypes":0,
    "repeatedComplexTypesInfo": [1]
}
{
    "depth":3,
    "noOfScalarTypes":100,
    "repeatedScalarTypesInfo": [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
    "noOfComplexTypes":0,
    "repeatedComplexTypesInfo": []
}
{code}

> select * on a parquet file created from json with large no of columns fail
> --------------------------------------------------------------------------
>
>                 Key: DRILL-1963
>                 URL: https://issues.apache.org/jira/browse/DRILL-1963
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet, Storage - Writer
>            Reporter: Rahul Challapalli
>            Assignee: Parth Chandra
>         Attachments: morecolumns2.parquet
>
>
> git.commit.id.abbrev=b491cdb
> The data file (Parquet) was created by Drill from a JSON file. The failing
> query below works fine on the JSON file. Attached both the JSON and the
> Drill-generated Parquet files.
> {code}
> 0: jdbc:drill:schema=dfs.wide-columns> select * from morecolumns2;
> -- Drill prints all the columns; they are omitted here to save space
> java.lang.IndexOutOfBoundsException: index: 0, length: 8 (expected: range(0, 0))
>       at io.netty.buffer.DrillBuf.checkIndexD(DrillBuf.java:156)
>       at io.netty.buffer.DrillBuf.chk(DrillBuf.java:178)
>       at io.netty.buffer.DrillBuf.getLong(DrillBuf.java:420)
>       at org.apache.drill.exec.vector.BigIntVector$Accessor.get(BigIntVector.java:297)
>       at org.apache.drill.exec.vector.BigIntVector$Accessor.getObject(BigIntVector.java:302)
>       at org.apache.drill.exec.vector.RepeatedBigIntVector$Accessor.getObject(RepeatedBigIntVector.java:332)
>       at org.apache.drill.exec.vector.RepeatedBigIntVector$Accessor.getObject(RepeatedBigIntVector.java:308)
>       at org.apache.drill.exec.vector.accessor.GenericAccessor.getObject(GenericAccessor.java:38)
>       at org.apache.drill.jdbc.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:136)
>       at net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
>       at sqlline.SqlLine$Rows$Row.<init>(SqlLine.java:2388)
>       at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2504)
>       at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
>       at sqlline.SqlLine.print(SqlLine.java:1809)
>       at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
>       at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
>       at sqlline.SqlLine.dispatch(SqlLine.java:889)
>       at sqlline.SqlLine.begin(SqlLine.java:763)
>       at sqlline.SqlLine.start(SqlLine.java:498)
>       at sqlline.SqlLine.main(SqlLine.java:460)
> {code}
> Error from the logs:
> {code}
> 2015-01-08 22:06:37,780 [2b5100a6-c361-8633-5ede-5ae4968b35d0:frag:0:0] WARN  o.a.d.e.p.impl.SendingAccountor - Failure while waiting for send complete.
> java.lang.InterruptedException: null
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:996) ~[na:1.7.0_71]
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303) ~[na:1.7.0_71]
>       at java.util.concurrent.Semaphore.acquire(Semaphore.java:472) ~[na:1.7.0_71]
>       at org.apache.drill.exec.physical.impl.SendingAccountor.waitForSendComplete(SendingAccountor.java:44) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>       at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.stop(ScreenCreator.java:186) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>       at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:144) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>       at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:119) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>       at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
>       at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> 2015-01-08 22:06:37,780 [2b5100a6-c361-8633-5ede-5ae4968b35d0:frag:0:0] INFO  o.a.drill.exec.work.foreman.Foreman - Dropping request to move to COMPLETED state as query is already at CANCELED state (which is terminal).
> {code}
> The query below also results in a similar exception:
> {code}
> select Obj0_level1 from morecolumns2;
> {code}
> However, the queries below succeed:
> {code}
> select t.Obj0_level1.STRING960_level2 from morecolumns2 t;
> select t.Obj0_level1.Array_Object0_level2[0] from morecolumns2 t;
> select t.Obj0_level1.Array_Object0_level2[0].TINYINT22_level3 from morecolumns2 t;
> {code}
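
A note on reading the exception quoted above: the message "index: 0, length: 8 (expected: range(0, 0))" is the shape of a bounds-check failure for an 8-byte read (a BIGINT is 8 bytes) at offset 0 on a buffer whose capacity is 0. The following is only a minimal sketch of such a check using java.nio.ByteBuffer. The class and method names (BoundsCheckSketch, checkIndex, getLong) are hypothetical, and this is not Drill's actual DrillBuf code.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not taken from the Drill code base.
public class BoundsCheckSketch {

    // Rejects a read of `length` bytes at `index` when it does not fit
    // inside a buffer of the given capacity.
    public static void checkIndex(int index, int length, int capacity) {
        if (index < 0 || length < 0 || index > capacity - length) {
            throw new IndexOutOfBoundsException(String.format(
                "index: %d, length: %d (expected: range(0, %d))",
                index, length, capacity));
        }
    }

    // Reads one 8-byte long after validating the range.
    public static long getLong(ByteBuffer buf, int index) {
        checkIndex(index, Long.BYTES, buf.capacity());
        return buf.getLong(index);
    }

    public static void main(String[] args) {
        // A zero-capacity buffer stands in for a value vector with no data.
        ByteBuffer empty = ByteBuffer.allocate(0);
        try {
            getLong(empty, 0);
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }

        // The same read succeeds once 8 bytes are actually present.
        ByteBuffer one = ByteBuffer.allocate(Long.BYTES);
        one.putLong(0, 42L);
        System.out.println(getLong(one, 0));
    }
}
```

Under this reading, the trace suggests that the data buffer behind the repeated BIGINT column was empty when the accessor tried to read the first value. That is an interpretation of the message, not a confirmed root cause.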



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
