[ 
https://issues.apache.org/jira/browse/DRILL-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107186#comment-15107186
 ] 

Hanifi Gunes commented on DRILL-4284:
-------------------------------------

I appreciate the very detailed description above. Any chance you could attach a 
minimal set of files to reproduce this issue?
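
In case it helps, here is a minimal sketch of how such a set of files could be 
generated. It assumes only the shape given in the description below: a 
top-level "count" plus an "events" array whose "features" field is sometimes 
an array of varying size and sometimes an empty object. The file names, the 
output directory, and the helper names are hypothetical, and files built this 
way are not confirmed to trigger the error.

```python
import json
import os
import random


def make_event(rng, empty_object):
    """Build one event; "features" is either {} or a list of varying size."""
    if empty_object:
        return {"features": {}}
    size = rng.randint(1, 4)
    return {
        "features": [
            {"count": rng.randint(1, 30), "name": "feature%d" % (i + 1)}
            for i in range(size)
        ]
    }


def make_file(rng, n_events):
    """Build one document: a top-level "count" plus an "events" array."""
    # Every third event (index 1, 4, ...) gets the empty-object form.
    events = [make_event(rng, empty_object=(i % 3 == 1)) for i in range(n_events)]
    return {"count": len(events), "events": events}


def write_files(out_dir, n_files=3, n_events=6, seed=0):
    """Write n_files JSON documents into out_dir and return their paths."""
    rng = random.Random(seed)
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i in range(n_files):
        path = os.path.join(out_dir, "file%d.json" % (i + 1))
        with open(path, "w") as fh:
            json.dump(make_file(rng, n_events), fh, indent=2)
        paths.append(path)
    return paths
```

Calling write_files("repro_src") would produce a small directory that can be 
queried as a whole, which matches the case where the failure was reported.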

> Complex Data Causing Index Out of Bounds with UNION Type 
> ---------------------------------------------------------
>
>                 Key: DRILL-4284
>                 URL: https://issues.apache.org/jira/browse/DRILL-4284
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.4.0
>            Reporter: John Omernik
>
> Working with complex JSON data and the UNION type has shown an Index out of 
> Bounds error when trying to read data.  Here are the posts from the Drill 
> User Group:
> After getting some pointers on the new experimental UNION type with JSON, I 
> started getting a different error related to index out of bounds. I thought 
> I'd post here to determine what it could be and, if it is a bug, I can then 
> open a JIRA. 
> So first, I did:
> ALTER SESSION SET `exec.errors.verbose` = true;  -- So I could get full errors 
> ALTER SESSION SET `exec.enable_union_type` = true; -- So I could use the experimental UNION type 
> Now, my first query, select * from `/data/prod/src/`, gave me the errors 
> below.  The files change and, ironically, if I select directly from any 
> specific file (even the ones in the error), the query often works fine.  
> It's going through a directory of files that causes the error. Sometimes I 
> can do multiple files, but often I come to one file that seems to break it.  
> The file that breaks things doesn't look different from the others, yet I 
> can select directly from that file and it works... weird.  Let me know if I 
> can do anything to help troubleshoot more. 
> Data Notes (see example below): 
> - The ... represents LOTS of other fields, some simple, some complex/nested. 
> This data is NOT pretty. 
> - The files are goofy in that each file has one top-level field of "count" 
> and then a huge array of events.
> - The field that is ALWAYS named in the error (as far as I've seen) is the "features" field.
> - This field will sometimes be an array and sometimes be an empty object, {}. 
> - The size of the array for the features field (when not an empty object) 
> does change from event to event.  (My hunch is that the issue is there.)
> - This occurs even if I don't reference the features field, say when I am 
> trying to flatten a different field at the same level as features. 
> Error:
> Error: DATA_READ ERROR: index: 0, length: 4 (expected: range(0, 0))
>  
> File  /data/prod/src/file1.json
> Record  1
> Line  193
> Column  34
> Field  feature
> Fragment 0:0
>  
> [Error Id: 25a2c963-86db-40e9-b5cc-2674887de2fe on node7:31010]
>  
>   (java.lang.IndexOutOfBoundsException) index: 0, length: 4 (expected: range(0, 0))
>     io.netty.buffer.DrillBuf.checkIndexD():175
>     io.netty.buffer.DrillBuf.chk():197
>     io.netty.buffer.DrillBuf.getInt():477
>     org.apache.drill.exec.vector.UInt4Vector$Accessor.get():356
>     org.apache.drill.exec.vector.complex.ListVector$Mutator.startNewValue():305
>     org.apache.drill.exec.vector.complex.impl.UnionListWriter.startList():563
>     org.apache.drill.exec.vector.complex.impl.AbstractPromotableFieldWriter.startList():126
>     org.apache.drill.exec.vector.complex.impl.PromotableWriter.startList():42
>     org.apache.drill.exec.vector.complex.fn.JsonReader.writeData():461
>     org.apache.drill.exec.vector.complex.fn.JsonReader.writeData():305
>     org.apache.drill.exec.vector.complex.fn.JsonReader.writeData():470
>     org.apache.drill.exec.vector.complex.fn.JsonReader.writeData():305
>     org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataSwitch():240
>     org.apache.drill.exec.vector.complex.fn.JsonReader.writeToVector():178
>     org.apache.drill.exec.vector.complex.fn.JsonReader.write():144
>     org.apache.drill.exec.store.easy.json.JSONRecordReader.next():191
>     org.apache.drill.exec.physical.impl.ScanBatch.next():191
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>     org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>     java.lang.Thread.run():745 (state=,code=0)
> Example Data:
> {
>   "count": 241,
>   "events": [
>     {
>       ...
>       ...
>       ...
>       "features": [
>         {
>           "count": 3,
>           "name": "feature1"
>         },
>         {
>           "count": 30,
>           "name": "feature2"
>         },
>         {
>           "count": 2,
>           "name": "feature3"
>         },
>         {
>           "count": 3,
>           "name": "feature4"
>         }
>       ],
>       ...
>       ...
>     },
>     {
>       ...
>       ...
>       ...
>       "features": {},
>       ...
>     },
>     {
>       ...
>       ...
>       ...
>       "features": [
>         {
>           "count": 3,
>           "name": "feature1"
>         },
>         {
>           "count": 30,
>           "name": "feature2"
>         },
>         {
>           "count": 2,
>           "name": "feature3"
>         }
>       ],
>       ...
>       ...
>     }
>   ]
> }
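
To narrow down which file trips the reader, a rough sketch like the following 
could tally the shape that "features" takes in each file (an array of a given 
length, an empty object, or missing). It assumes each file matches the example 
above and parses as a single JSON document; the function names are made up 
for illustration and are not part of Drill.

```python
import json
from collections import Counter


def features_shapes(doc):
    """Count how "features" appears across the events of one parsed file."""
    shapes = Counter()
    for event in doc.get("events", []):
        value = event.get("features")
        if isinstance(value, list):
            shapes["array[%d]" % len(value)] += 1
        elif isinstance(value, dict):
            shapes["object" if value else "empty object"] += 1
        elif "features" not in event:
            shapes["missing"] += 1
        else:
            # Any other JSON type (string, number, null, ...)
            shapes[type(value).__name__] += 1
    return shapes


def scan(paths):
    """Return {path: Counter of shapes} for each JSON file in paths."""
    result = {}
    for path in paths:
        with open(path) as fh:
            result[path] = features_shapes(json.load(fh))
    return result
```

Comparing the tallies of a file named in the error against files that read 
cleanly may show whether the empty-object events or the changing array sizes 
matter.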



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
