[ https://issues.apache.org/jira/browse/DRILL-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107186#comment-15107186 ]
Hanifi Gunes commented on DRILL-4284:
-------------------------------------

I appreciate the very detailed description above. Any chance you could attach a minimal set of files to reproduce this issue?

> Complex Data Causing Index Out of Bounds with UNION Type
> ---------------------------------------------------------
>
>                 Key: DRILL-4284
>                 URL: https://issues.apache.org/jira/browse/DRILL-4284
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.4.0
>            Reporter: John Omernik
>
> Working with complex JSON data and the UNION type has shown an Index Out of Bounds error when trying to read data. Here are the posts from the Drill User Group:
>
> After getting some pointers on the new experimental UNION type with JSON, I started getting a different error related to index out of bounds. I thought I'd post here to determine what it could be, and if it's a bug, I can then open a JIRA.
>
> So first, I did:
>
> ALTER SESSION SET `exec.errors.verbose` = true; -- So I could get full errors
> ALTER SESSION SET `exec.enable_union_type` = true; -- So I could use the experimental UNION type
>
> Now, my first query, select * from `/data/prod/src/`, gave me the errors below. The files change, and ironically, if I select directly from any specific file (even the ones in the error), the query often works fine. It's going through a directory of files that causes the error. Sometimes I can do multiple files, but often I come to one file that seems to break it. The file that breaks things doesn't look different from the others, but at the same time, I can select directly from that file and it works... weird. Let me know if I can do anything to help troubleshoot more.
>
> Data Notes (see example below):
> - The ... represents LOTS of other fields, some simple, some complex/nested. This data is NOT pretty.
> - The files are goofy in that each file has one top-level field of "count" and then a huge array of events.
> - The field that is ALWAYS in the error (as far as I've seen) is the "features" field.
> - This field will sometimes be an array and sometimes be an empty object, {}.
> - The size of the array for the features field (when not an empty object) changes from event to event. (My hunch is there's an issue there.)
> - This occurs even if I don't reference the features field, say when I am trying to flatten a different field at the same level as features.
>
> Error:
>
> Error: DATA_READ ERROR: index: 0, length: 4 (expected: range(0, 0))
>
> File /data/prod/src/file1.json
> Record 1
> Line 193
> Column 34
> Field feature
> Fragment 0:0
>
> [Error Id: 25a2c963-86db-40e9-b5cc-2674887de2fe on node7:31010]
>
> (java.lang.IndexOutOfBoundsException) index: 0, length: 4 (expected: range(0, 0))
> io.netty.buffer.DrillBuf.checkIndexD():175
> io.netty.buffer.DrillBuf.chk():197
> io.netty.buffer.DrillBuf.getInt():477
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get():356
> org.apache.drill.exec.vector.complex.ListVector$Mutator.startNewValue():305
> org.apache.drill.exec.vector.complex.impl.UnionListWriter.startList():563
> org.apache.drill.exec.vector.complex.impl.AbstractPromotableFieldWriter.startList():126
> org.apache.drill.exec.vector.complex.impl.PromotableWriter.startList():42
> org.apache.drill.exec.vector.complex.fn.JsonReader.writeData():461
> org.apache.drill.exec.vector.complex.fn.JsonReader.writeData():305
> org.apache.drill.exec.vector.complex.fn.JsonReader.writeData():470
> org.apache.drill.exec.vector.complex.fn.JsonReader.writeData():305
> org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataSwitch():240
> org.apache.drill.exec.vector.complex.fn.JsonReader.writeToVector():178
> org.apache.drill.exec.vector.complex.fn.JsonReader.write():144
> org.apache.drill.exec.store.easy.json.JSONRecordReader.next():191
> org.apache.drill.exec.physical.impl.ScanBatch.next():191
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)
>
> Example Data:
>
> {
>   "count": 241,
>   "events": [
>     {
>       ...
>       ...
>       ...
>       "features": [
>         { "count": 3, "name": "feature1" },
>         { "count": 30, "name": "feature2" },
>         { "count": 2, "name": "feature3" },
>         { "count": 3, "name": "feature4" }
>       ],
>       ...
>       ...
>     },
>     {
>       ...
>       ...
>       ...
>       "features": {},
>       ...
>     },
>     {
>       ...
>       ...
>       ...
>       "features": [
>         { "count": 3, "name": "feature1" },
>         { "count": 30, "name": "feature2" },
>         { "count": 2, "name": "feature3" }
>       ],
>       ...
>       ...
>     }
>   ]
> }

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
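Since a minimal set of reproduction files was requested, here is a hedged sketch of a script that generates small JSON files matching the structure described in the report: a top-level "count", an "events" array, and a "features" field that is a non-empty array in one file and an empty object in another. File names and field values here are illustrative assumptions, not taken from the actual dataset.

```python
# Hypothetical minimal-repro generator for DRILL-4284 (illustrative only).
# Writes two JSON files whose "features" field switches between an array
# and an empty object -- the type change suspected of triggering the
# UNION-type IndexOutOfBoundsException.
import json
import os
import tempfile


def make_event(features):
    # Each event carries the type-switching "features" field.
    return {"id": 1, "features": features}


def write_repro_files(directory):
    files = {
        # "features" as a non-empty array; the array length varies per event.
        "file1.json": {
            "count": 2,
            "events": [
                make_event([{"count": 3, "name": "feature1"},
                            {"count": 30, "name": "feature2"}]),
                make_event([{"count": 2, "name": "feature3"}]),
            ],
        },
        # "features" as an empty object.
        "file2.json": {
            "count": 1,
            "events": [make_event({})],
        },
    }
    for name, payload in files.items():
        with open(os.path.join(directory, name), "w") as f:
            json.dump(payload, f, indent=2)
    return sorted(files)


if __name__ == "__main__":
    target = tempfile.mkdtemp()
    print(write_repro_files(target), "written to", target)
```

Pointing a directory query such as select * from the generated directory at these files should exercise the same array-vs-object promotion path shown in the stack trace, if the hunch about the changing "features" type is correct.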