[ https://issues.apache.org/jira/browse/DRILL-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Victoria Markman updated DRILL-3871:
------------------------------------
    Reviewer: Victoria Markman

> Off by one error while reading binary fields with one terminal null in parquet
> ------------------------------------------------------------------------------
>
> Key: DRILL-3871
> URL: https://issues.apache.org/jira/browse/DRILL-3871
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Data Types
> Affects Versions: 1.2.0
> Reporter: Victoria Markman
> Assignee: Deneche A. Hakim
> Priority: Critical
> Labels: int96
> Fix For: 1.3.0
>
> Attachments: tables.tar
>
>
> Both tables in the join were created by Impala, with column c_timestamp
> being Parquet int96.
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . > max(t1.c_timestamp),
> . . . . . . . . . . . . > min(t1.c_timestamp),
> . . . . . . . . . . . . > count(t1.c_timestamp)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > imp_t1 t1
> . . . . . . . . . . . . > inner join
> . . . . . . . . . . . . > imp_t2 t2
> . . . . . . . . . . . . > on (t1.c_timestamp = t2.c_timestamp)
> . . . . . . . . . . . . > ;
> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: TProtocolException: Required field 'uncompressed_page_size' was not found in serialized data!
> Struct: PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
> at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
> at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
> at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> at sqlline.SqlLine.print(SqlLine.java:1583)
> at sqlline.Commands.execute(Commands.java:852)
> at sqlline.Commands.sql(Commands.java:751)
> at sqlline.SqlLine.dispatch(SqlLine.java:738)
> at sqlline.SqlLine.begin(SqlLine.java:612)
> at sqlline.SqlLine.start(SqlLine.java:366)
> at sqlline.SqlLine.main(SqlLine.java:259)
> {code}
> drillbit.log
> {code}
> 2015-09-30 21:15:45,710 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO o.a.d.exec.store.parquet.Metadata - Took 0 ms to get file statuses
> 2015-09-30 21:15:45,712 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 1 using 1 threads. Time: 1ms total, 1.645381ms avg, 1ms max.
> 2015-09-30 21:15:45,712 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 1 using 1 threads. Earliest start: 1.332000 μs, Latest start: 1.332000 μs, Average start: 1.332000 μs .
> 2015-09-30 21:15:45,830 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested AWAITING_ALLOCATION --> RUNNING
> 2015-09-30 21:15:45,830 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO o.a.d.e.w.f.FragmentStatusReporter - 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State to report: RUNNING
> 2015-09-30 21:15:45,925 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested RUNNING --> FAILED
> 2015-09-30 21:15:45,930 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested FAILED --> FINISHED
> 2015-09-30 21:15:45,931 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: TProtocolException: Required field 'uncompressed_page_size' was not found in serialized data!
> Struct: PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: TProtocolException: Required field 'uncompressed_page_size' was not found in serialized data!
> Struct: PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
> at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:323) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:178) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet record reader.
> Message:
> Hadoop path: /drill/testdata/subqueries/imp_t2/bf4261140dac8d45-814d66b86bf960b8_853027779_data.0.parq
> Total records read: 10
> Mock records read: 0
> Records to read: 1
> Row group index: 0
> Records in row group: 10
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema {
> optional binary c_varchar (UTF8);
> optional int32 c_integer;
> optional int64 c_bigint;
> optional float c_float;
> optional double c_double;
> optional binary c_date (UTF8);
> optional binary c_time (UTF8);
> optional int96 c_timestamp;
> optional boolean c_boolean;
> optional double d9;
> optional double d18;
> optional double d28;
> optional double d38;
> }
> , metadata: {}}, blocks: [BlockMetaData{10, 1507 [ColumnMetaData{SNAPPY [c_varchar] BINARY [PLAIN, PLAIN_DICTIONARY, RLE], 173}, ColumnMetaData{SNAPPY [c_integer] INT32 [PLAIN, PLAIN_DICTIONARY, RLE], 299}, ColumnMetaData{SNAPPY [c_bigint] INT64 [PLAIN, PLAIN_DICTIONARY, RLE], 453}, ColumnMetaData{SNAPPY [c_float] FLOAT [PLAIN, PLAIN_DICTIONARY, RLE], 581}, ColumnMetaData{SNAPPY [c_double] DOUBLE [PLAIN, PLAIN_DICTIONARY, RLE], 747}, ColumnMetaData{SNAPPY [c_date] BINARY [PLAIN, PLAIN_DICTIONARY, RLE], 900}, ColumnMetaData{SNAPPY [c_time] BINARY [PLAIN, PLAIN_DICTIONARY, RLE], 1045}, ColumnMetaData{SNAPPY [c_timestamp] INT96 [PLAIN, PLAIN_DICTIONARY, RLE], 1213}, ColumnMetaData{SNAPPY [c_boolean] BOOLEAN [PLAIN, PLAIN_DICTIONARY, RLE], 1293}, ColumnMetaData{SNAPPY [d9] DOUBLE [PLAIN, PLAIN_DICTIONARY, RLE], 1448}, ColumnMetaData{SNAPPY [d18] DOUBLE [PLAIN, PLAIN_DICTIONARY, RLE], 1609}, ColumnMetaData{SNAPPY [d28] DOUBLE [PLAIN, PLAIN_DICTIONARY, RLE], 1771}, ColumnMetaData{SNAPPY [d38] DOUBLE [PLAIN, PLAIN_DICTIONARY, RLE], 1933}]}]}
> at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise(ParquetRecordReader.java:346) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:448) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:183) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase(HashJoinBatch.java:403) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:218) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext(StreamingAggBatch.java:136) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:80) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:258) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:252) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_71]
> at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_71]
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:252) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> ... 4 common frames omitted
> Caused by: java.io.IOException: can not read class parquet.format.PageHeader: Required field 'uncompressed_page_size' was not found in serialized data!
> Struct: PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
> at parquet.format.Util.read(Util.java:50) ~[parquet-format-2.1.1-drill-r1.jar:na]
> at parquet.format.Util.readPageHeader(Util.java:26) ~[parquet-format-2.1.1-drill-r1.jar:na]
> at org.apache.drill.exec.store.parquet.ColumnDataReader.readPageHeader(ColumnDataReader.java:46) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.store.parquet.columnreaders.PageReader.next(PageReader.java:191) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.store.parquet.columnreaders.NullableColumnReader.processPages(NullableColumnReader.java:76) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFields(ParquetRecordReader.java:387) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:430) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> ... 43 common frames omitted
> Caused by: parquet.org.apache.thrift.protocol.TProtocolException: Required field 'uncompressed_page_size' was not found in serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
> at parquet.format.PageHeader.read(PageHeader.java:905) ~[parquet-format-2.1.1-drill-r1.jar:na]
> at parquet.format.Util.read(Util.java:47) ~[parquet-format-2.1.1-drill-r1.jar:na]
> ... 49 common frames omitted
> 2015-09-30 21:15:45,951 [BitServer-4] WARN o.a.drill.exec.work.foreman.Foreman - Dropping request to move to COMPLETED state as query is already at FAILED state (which is terminal).
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)