[ https://issues.apache.org/jira/browse/DRILL-4770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450191#comment-15450191 ]
Padma Penumarthy commented on DRILL-4770:
-----------------------------------------

The problem occurs because this parquet file uses data page type v2, which the Drill fast reader does not support.

I also tried the parquet library reader (from parquet-mr) by setting the option

set store.parquet.use_new_reader=true;

This does not work either, because the version of the library that we use (1.8.1-drill-r0) lacks int64 delta encoding support. Support for this was added to parquet-mr recently:
https://issues.apache.org/jira/browse/PARQUET-225

So, we have 2 choices:
1. Add support for data page v2 to the Drill fast reader.
2. Wait for a new parquet-mr release so we can pick up the fix for int64 delta encoding. In this case, we will not support this in the fast reader.

> ParquetRecordReader throws NPE querying a single int64 column file
> ------------------------------------------------------------------
>
>                  Key: DRILL-4770
>                  URL: https://issues.apache.org/jira/browse/DRILL-4770
>              Project: Apache Drill
>           Issue Type: Bug
>           Components: Storage - Parquet
>     Affects Versions: 1.8.0
>             Reporter: Chun Chang
>             Assignee: Padma Penumarthy
>              Fix For: 1.8.0
>
>          Attachments: int64_10_bs10k_ps1k_uncompressed.parquet
>
>
> I have a parquet file with a single int64 column.
> {noformat}
> [root@perfnode166 parquet-mr]# java -jar parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar dump /mapr/drill50.perf.lab/drill/testdata/parquet_storage/int64_10_bs10k_ps1k_uncompressed.parquet
> row group 0
> --------------------------------------------------------------------------------
> int64_field_required: INT64 UNCOMPRESSED DO:0 FPO:4 SZ:55/55/1.00 VC:10 [more]...
> int64_field_required TV=10 RL=0 DL=0
>
> ----------------------------------------------------------------------------
> page 0: DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 0, max: [more]... VC:10
> INT64 int64_field_required
> --------------------------------------------------------------------------------
> *** row group 1 of 1, values 1 to 10 ***
> value 1: R:0 D:0 V:0
> value 2: R:0 D:0 V:1
> value 3: R:0 D:0 V:2
> value 4: R:0 D:0 V:3
> value 5: R:0 D:0 V:4
> value 6: R:0 D:0 V:5
> value 7: R:0 D:0 V:6
> value 8: R:0 D:0 V:7
> value 9: R:0 D:0 V:8
> value 10: R:0 D:0 V:9
> {noformat}
> Drill version:
> {noformat}
> 0: jdbc:drill:schema=dfs.drillTestDir> select * from sys.version;
> +-----------------+-------------------------------------------+-----------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+----------------------------+
> | version         | commit_id                                 | commit_message                                                                                                  | commit_time                | build_email         | build_time                 |
> +-----------------+-------------------------------------------+-----------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+----------------------------+
> | 1.8.0-SNAPSHOT  | 05c42eae79ce3e309028b3824f9449b98e329f29  | DRILL-4707: Fix memory leak or incorrect query result in case two column names are case-insensitive identical.  | 29.06.2016 @ 08:15:13 PDT  | inram...@gmail.com  | 07.07.2016 @ 10:50:40 PDT  |
> +-----------------+-------------------------------------------+-----------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+----------------------------+
> 1 row selected (0.44 seconds)
> {noformat}
> drill throws NPE:
> {noformat}
> 2016-07-08 11:08:55,156 [288013c7-f122-f6be-936e-c18ebe9b92ef:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id 288013c7-f122-f6be-936e-c18ebe9b92ef: select * from dfs.`drill/testdata/parquet_storage/int64_10_bs10k_ps1k_uncompressed.parquet`
> 2016-07-08 11:08:55,292 [288013c7-f122-f6be-936e-c18ebe9b92ef:foreman] INFO o.a.d.exec.store.parquet.Metadata - Took 0 ms to get file statuses
> 2016-07-08 11:08:55,295 [288013c7-f122-f6be-936e-c18ebe9b92ef:foreman] INFO o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 1 using 1 threads. Time: 2ms total, 2.423069ms avg, 2ms max.
> 2016-07-08 11:08:55,295 [288013c7-f122-f6be-936e-c18ebe9b92ef:foreman] INFO o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 1 using 1 threads. Earliest start: 1.347000 μs, Latest start: 1.347000 μs, Average start: 1.347000 μs .
> 2016-07-08 11:08:55,295 [288013c7-f122-f6be-936e-c18ebe9b92ef:foreman] INFO o.a.d.exec.store.parquet.Metadata - Took 2 ms to read file metadata
> 2016-07-08 11:08:55,377 [288013c7-f122-f6be-936e-c18ebe9b92ef:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 288013c7-f122-f6be-936e-c18ebe9b92ef:0:0: State change requested AWAITING_ALLOCATION --> RUNNING
> 2016-07-08 11:08:55,377 [288013c7-f122-f6be-936e-c18ebe9b92ef:frag:0:0] INFO o.a.d.e.w.f.FragmentStatusReporter - 288013c7-f122-f6be-936e-c18ebe9b92ef:0:0: State to report: RUNNING
> 2016-07-08 11:08:55,386 [288013c7-f122-f6be-936e-c18ebe9b92ef:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 288013c7-f122-f6be-936e-c18ebe9b92ef:0:0: State change requested RUNNING --> FAILED
> 2016-07-08 11:08:55,386 [288013c7-f122-f6be-936e-c18ebe9b92ef:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 288013c7-f122-f6be-936e-c18ebe9b92ef:0:0: State change requested FAILED --> FINISHED
> 2016-07-08 11:08:55,387 [288013c7-f122-f6be-936e-c18ebe9b92ef:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NullPointerException
> Fragment 0:0
> [Error Id: 21fcc35b-6151-46b6-a750-0ce6f2141a7d on 10.10.30.167:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: NullPointerException
> Fragment 0:0
> [Error Id: 21fcc35b-6151-46b6-a750-0ce6f2141a7d on 10.10.30.167:31010]
>     at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318) [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185) [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287) [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
>     at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet record reader.
> Message:
> Hadoop path: /drill/testdata/parquet_storage/int64_10_bs10k_ps1k_uncompressed.parquet
> Total records read: 0
> Mock records read: 0
> Records to read: 10
> Row group index: 0
> Records in row group: 10
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message test {
>   required int64 int64_field_required;
> }
> , metadata: {writer.model.name=example}}, blocks: [BlockMetaData{10, 55 [ColumnMetaData{UNCOMPRESSED [int64_field_required] INT64 [DELTA_BINARY_PACKED], 4}]}]}
>     at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise(ParquetRecordReader.java:352) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:454) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:178) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:257) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:251) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_45]
>     at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_45]
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) ~[hadoop-common-2.7.0-mapr-1602.jar:na]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:251) [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     ... 4 common frames omitted
> Caused by: java.lang.NullPointerException: null
>     at org.apache.drill.exec.store.parquet.columnreaders.PageReader.next(PageReader.java:241) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.readPage(ColumnReader.java:198) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.determineSize(ColumnReader.java:141) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.processPages(ColumnReader.java:107) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFields(ParquetRecordReader.java:393) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:436) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>     ... 19 common frames omitted
> 2016-07-08 11:08:55,412 [CONTROL-rpc-event-queue] WARN o.a.drill.exec.work.foreman.Foreman - Dropping request to move to COMPLETED state as query is already at FAILED state (which is terminal).
> 2016-07-08 11:08:55,413 [CONTROL-rpc-event-queue] WARN o.a.d.e.w.b.ControlMessageHandler - Dropping request to cancel fragment. 288013c7-f122-f6be-936e-c18ebe9b92ef:0:0 does not exist.
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
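
As background on the DELTA_BINARY_PACKED encoding named in the page dump and column metadata above, here is a minimal Python sketch of the delta idea only. It is not the Parquet wire format (there are no blocks, miniblocks, varints, or bit-packing here) and it is not how Drill or parquet-mr implement it. The function names delta_encode, delta_decode and bit_width are hypothetical, chosen for illustration.

```python
# Illustrative sketch of delta encoding for integers. Assumption: this
# mirrors only the core idea behind DELTA_BINARY_PACKED (store a first
# value, a minimum delta, and small non-negative adjusted deltas); the
# real format additionally splits deltas into blocks and bit-packs them.

def delta_encode(values):
    """Return (first_value, min_delta, adjusted_deltas) for a list of ints."""
    if not values:
        raise ValueError("need at least one value")
    deltas = [b - a for a, b in zip(values, values[1:])]
    min_delta = min(deltas) if deltas else 0
    # Subtracting the minimum makes every stored delta non-negative.
    adjusted = [d - min_delta for d in deltas]
    return values[0], min_delta, adjusted


def delta_decode(first_value, min_delta, adjusted):
    """Rebuild the original list from the output of delta_encode."""
    out = [first_value]
    for d in adjusted:
        out.append(out[-1] + d + min_delta)
    return out


def bit_width(adjusted):
    """Bits needed per adjusted delta if they were bit-packed."""
    return max(adjusted, default=0).bit_length()


# The ten values from the attached file: 0, 1, ..., 9.
first, min_delta, adjusted = delta_encode(list(range(10)))
restored = delta_decode(first, min_delta, adjusted)
```

For the values 0 through 9 in the attached file every delta is 1, so after subtracting the minimum delta each adjusted delta is 0 and needs no bits at all. A reader that lacks this decoding step for int64 cannot reconstruct the column, which is consistent with the failure described in the comment.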