[ https://issues.apache.org/jira/browse/DRILL-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175566#comment-15175566 ]
Miroslav Holubec edited comment on DRILL-4464 at 3/3/16 9:01 AM:
-----------------------------------------------------------------

Output from MR-tools meta. The TS column is causing an issue:
{noformat}
$ java -jar c:\devel\parquet-mr\parquet-tools\target\parquet-tools-1.8.1.jar meta tmp.gz.parquet
file: file:/tmp/tmp.gz.parquet
creator: parquet-mr version 1.8.1 (build 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)

file schema: nat
--------------------------------------------------------------------------------
ts: REQUIRED INT64 R:0 D:0
dr: REQUIRED INT32 R:0 D:0
ui: OPTIONAL BINARY O:UTF8 R:0 D:1
up: OPTIONAL INT32 R:0 D:1
ri: OPTIONAL BINARY O:UTF8 R:0 D:1
rp: OPTIONAL INT32 R:0 D:1
di: OPTIONAL BINARY O:UTF8 R:0 D:1
dp: OPTIONAL INT32 R:0 D:1
pr: REQUIRED INT32 R:0 D:0
ob: OPTIONAL INT64 R:0 D:1
ib: OPTIONAL INT64 R:0 D:1

row group 1: RC:2418197 TS:30601003 OFFSET:4
--------------------------------------------------------------------------------
ts: INT64 GZIP DO:0 FPO:4 SZ:2630987/19172128/7.29 VC:2418197 ENC:BIT_PACKED,PLAIN,PLAIN_DICTIONARY
dr: INT32 GZIP DO:0 FPO:2630991 SZ:333876/1197646/3.59 VC:2418197 ENC:BIT_PACKED,PLAIN_DICTIONARY
ui: BINARY GZIP DO:0 FPO:2964867 SZ:2088/1565/0.75 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
up: INT32 GZIP DO:0 FPO:2966955 SZ:4514663/4652474/1.03 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
ri: BINARY GZIP DO:0 FPO:7481618 SZ:2088/1565/0.75 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
rp: INT32 GZIP DO:0 FPO:7483706 SZ:4511485/4652474/1.03 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
di: BINARY GZIP DO:0 FPO:11995191 SZ:56/36/0.64 VC:2418197 ENC:BIT_PACKED,PLAIN,RLE
dp: INT32 GZIP DO:0 FPO:11995247 SZ:56/36/0.64 VC:2418197 ENC:BIT_PACKED,PLAIN,RLE
pr: INT32 GZIP DO:0 FPO:11995303 SZ:627/407/0.65 VC:2418197 ENC:BIT_PACKED,PLAIN_DICTIONARY
ob: INT64 GZIP DO:0 FPO:11995930 SZ:3597/3998/1.11 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
ib: INT64 GZIP DO:0 FPO:11999527 SZ:292939/918674/3.14 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
{noformat}

> Apache Drill cannot read parquet generated outside Drill: Reading past RLE/BitPacking stream
> --------------------------------------------------------------------------------------------
>
> Key: DRILL-4464
> URL: https://issues.apache.org/jira/browse/DRILL-4464
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.4.0, 1.5.0
> Reporter: Miroslav Holubec
> Attachments: tmp.gz.parquet
>
>
> When I generate a file using MapReduce and parquet 1.8.1 (or 1.8.1-drill-r0)
> that contains a REQUIRED INT64 field, I'm not able to read this column in
> Drill, but I'm able to read the full content using parquet-tools cat/dump.
> This doesn't happen every time; it depends on the input data (so parquet
> probably chooses a different encoding for the given column?).
> Error reported by Drill:
> {noformat}
> 2016-03-02 03:01:16,354 [29296305-abe2-f4bd-ded0-27bb53f631f0:frag:3:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalArgumentException: Reading past RLE/BitPacking stream.
> Fragment 3:0
> [Error Id: e2d02152-1b67-4c9f-9cb1-bd2b9ff302d8 on drssc9a4:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalArgumentException: Reading past RLE/BitPacking stream.
> Fragment 3:0
> [Error Id: e2d02152-1b67-4c9f-9cb1-bd2b9ff302d8 on drssc9a4:31010]
>     at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) ~[drill-common-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321) [drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184) [drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290) [drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.4.0.jar:1.4.0]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet record reader.
> Message:
> Hadoop path: /tmp/tmp.gz.parquet
> Total records read: 131070
> Mock records read: 0
> Records to read: 21845
> Row group index: 0
> Records in row group: 2418197
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message nat {
>   required int64 ts;
>   required int32 dr;
>   optional binary ui (UTF8);
>   optional int32 up;
>   optional binary ri (UTF8);
>   optional int32 rp;
>   optional binary di (UTF8);
>   optional int32 dp;
>   required int32 pr;
>   optional int64 ob;
>   optional int64 ib;
> }
> , metadata: {}}, blocks: [BlockMetaData{2418197, 30601003 [
>   ColumnMetaData{GZIP [ts] INT64 [PLAIN_DICTIONARY, BIT_PACKED, PLAIN], 4},
>   ColumnMetaData{GZIP [dr] INT32 [PLAIN_DICTIONARY, BIT_PACKED], 2630991},
>   ColumnMetaData{GZIP [ui] BINARY [PLAIN_DICTIONARY, RLE, BIT_PACKED], 2964867},
>   ColumnMetaData{GZIP [up] INT32 [PLAIN_DICTIONARY, RLE, BIT_PACKED], 2966955},
>   ColumnMetaData{GZIP [ri] BINARY [PLAIN_DICTIONARY, RLE, BIT_PACKED], 7481618},
>   ColumnMetaData{GZIP [rp] INT32 [PLAIN_DICTIONARY, RLE, BIT_PACKED], 7483706},
>   ColumnMetaData{GZIP [di] BINARY [RLE, BIT_PACKED, PLAIN], 11995191},
>   ColumnMetaData{GZIP [dp] INT32 [RLE, BIT_PACKED, PLAIN], 11995247},
>   ColumnMetaData{GZIP [pr] INT32 [PLAIN_DICTIONARY, BIT_PACKED], 11995303},
>   ColumnMetaData{GZIP [ob] INT64 [PLAIN_DICTIONARY, RLE, BIT_PACKED], 11995930},
>   ColumnMetaData{GZIP [ib] INT64 [PLAIN_DICTIONARY, RLE, BIT_PACKED], 11999527}]}]}
>     at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise(ParquetRecordReader.java:345) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:447) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:191) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:250) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0_40]
>     at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_40]
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) ~[hadoop-common-2.7.1.jar:na]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250) [drill-java-exec-1.4.0.jar:1.4.0]
>     ... 4 common frames omitted
> Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
>     at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55) ~[parquet-common-1.8.1-drill-r0.jar:1.8.1-drill-r0]
>     at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:84) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
>     at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:66) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
>     at org.apache.parquet.column.values.dictionary.DictionaryValuesReader.readLong(DictionaryValuesReader.java:122) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
>     at org.apache.drill.exec.store.parquet.columnreaders.ParquetFixedWidthDictionaryReaders$DictionaryBigIntReader.readField(ParquetFixedWidthDictionaryReaders.java:182) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.readValues(ColumnReader.java:120) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.processPageData(ColumnReader.java:169) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.determineSize(ColumnReader.java:146) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.processPages(ColumnReader.java:107) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFields(ParquetRecordReader.java:386) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:429) ~[drill-java-exec-1.4.0.jar:1.4.0]
>     ... 19 common frames omitted
> {noformat}
> When I change the fields in the schema to optional and regenerate the file,
> Drill starts working. The same is true when I generate the file using CTAS
> (which has all columns optional as well).

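As a rough illustration of the workaround described in the issue, the sketch below takes the schema text printed in the error message and rewrites every `required` field as `optional`. It uses only the Java standard library. The class name `RelaxSchema` and the method `makeAllFieldsOptional` are made up for this example and are not part of Drill or parquet-mr, and the resulting text would still have to be handed to whatever job writes the file.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Drill or parquet-mr.
final class RelaxSchema {

    // Schema text copied from the "Parquet Metadata" section of the error message.
    static final String SCHEMA = String.join("\n",
        "message nat {",
        "  required int64 ts;",
        "  required int32 dr;",
        "  optional binary ui (UTF8);",
        "  optional int32 up;",
        "  optional binary ri (UTF8);",
        "  optional int32 rp;",
        "  optional binary di (UTF8);",
        "  optional int32 dp;",
        "  required int32 pr;",
        "  optional int64 ob;",
        "  optional int64 ib;",
        "}");

    // Matches the "required" keyword when it is the first token on a line.
    private static final Pattern REQUIRED_FIELD =
        Pattern.compile("^(\\s*)required\\b", Pattern.MULTILINE);

    // Returns the schema text with every leading "required" replaced by "optional",
    // keeping the original indentation.
    static String makeAllFieldsOptional(String schema) {
        return REQUIRED_FIELD.matcher(schema).replaceAll("$1optional");
    }

    public static void main(String[] args) {
        System.out.println(makeAllFieldsOptional(SCHEMA));
    }
}
```

This only changes the schema declaration; whether relaxing the fields is acceptable depends on the data model, and it works around the read error rather than explaining it.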
-- This message was sent by Atlassian JIRA (v6.3.4#6332)