[ https://issues.apache.org/jira/browse/DRILL-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546865#comment-14546865 ]
Jacques Nadeau commented on DRILL-3118:
---------------------------------------

You can set drill.exec.storage.file.partition.column.label as a SESSION option, and that should override it for just your session. Does that work for this use case, or are you having problems with that as well?

> "java.lang.IndexOutOfBoundsException" if the source data has a "dir0" column
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-3118
>                 URL: https://issues.apache.org/jira/browse/DRILL-3118
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.0.0
>            Reporter: Hao Zhu
>            Assignee: Chris Westin
>
> Tested on 1.0 with commit id:
> {code}
> select commit_id from sys.version;
> +-------------------------------------------+
> |                 commit_id                 |
> +-------------------------------------------+
> | d8b19759657698581cc0d01d7038797952888123  |
> +-------------------------------------------+
> 1 row selected (0.097 seconds)
> {code}
> When the source data has column names like "dir0", "dir1", ..., the query may fail
> with "java.lang.IndexOutOfBoundsException".
> For example:
> {code}
> select `dir999` from
> dfs.root.`user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet`;
> Error: SYSTEM ERROR: java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 0))
> Fragment 0:0
> [Error Id: d289b3d7-1172-4ed7-b679-7af80d9aca7c on h1.poc.com:31010]
> (org.apache.drill.common.exceptions.DrillRuntimeException) Error in parquet record reader.
> Message:
> Hadoop path: /user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet
> Total records read: 0
> Mock records read: 0
> Records to read: 32768
> Row group index: 0
> Records in row group: 1
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema {
>   optional int32 id;
>   optional binary dir999;
> }
> , metadata: {}}, blocks: [BlockMetaData{1, 98 [ColumnMetaData{SNAPPY [id] INT32 [PLAIN, RLE, PLAIN_DICTIONARY], 23}, ColumnMetaData{SNAPPY [dir999] BINARY [PLAIN, RLE, PLAIN_DICTIONARY], 103}]}]}
> org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise():339
> org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():441
> org.apache.drill.exec.physical.impl.ScanBatch.next():175
> org.apache.drill.exec.physical.impl.BaseRootExec.next():83
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
> org.apache.drill.exec.physical.impl.BaseRootExec.next():73
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1469
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():253
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
> Caused By (java.lang.IndexOutOfBoundsException) index: 0, length: 4 (expected: range(0, 0))
> io.netty.buffer.DrillBuf.checkIndexD():189
> io.netty.buffer.DrillBuf.chk():211
> io.netty.buffer.DrillBuf.getInt():491
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get():321
> org.apache.drill.exec.vector.VarBinaryVector$Mutator.setSafe():481
> org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.fillEmpties():408
> org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.setValueCount():513
> org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readFields():78
> org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():425
> org.apache.drill.exec.physical.impl.ScanBatch.next():175
> org.apache.drill.exec.physical.impl.BaseRootExec.next():83
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
> org.apache.drill.exec.physical.impl.BaseRootExec.next():73
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1469
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():253
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
> (state=,code=0)
> {code}
> My thought:
> We need to fix this by one or more of the following:
> 1. Either prompt with a readable message saying "dirN" is a reserved column
> name, and ask the user to change drill.exec.storage.file.partition.column.label
> to something else;
> 2. And/or, if the source data has dirN columns, they should override our
> reserved "dirN" labels;
> 3. Document "drill.exec.storage.file.partition.column.label" in
> http://drill.apache.org/docs/querying-directories/
> 4. drill.exec.storage.file.partition.column.label is a system-level
> configuration; if we use it as a workaround, it will impact the whole system.
> Can we make it a session-level option?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
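For reference, the session-scoped workaround mentioned in the comment above can be sketched in sqlline as follows. This is only an illustration: the replacement label "part" is an arbitrary choice, and the file path is the one from the report.

{code}
-- Relabel the generated directory-partition columns for this session only,
-- so that source columns named dir0, dir1, ... no longer collide:
ALTER SESSION SET `drill.exec.storage.file.partition.column.label` = 'part';

-- Directory partitions now surface as part0, part1, ..., and `dir999`
-- resolves to the column actually stored in the Parquet file:
select `dir999` from
dfs.root.`user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet`;
{code}

Because the option is changed with ALTER SESSION rather than ALTER SYSTEM, the system-wide default is left untouched, which addresses the concern in point 4.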