[ 
https://issues.apache.org/jira/browse/DRILL-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17353739#comment-17353739
 ] 

ASF GitHub Bot commented on DRILL-7934:
---------------------------------------

vvysotskyi commented on a change in pull request #2238:
URL: https://github.com/apache/drill/pull/2238#discussion_r641929654



##########
File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScanStatistics.java
##########
@@ -115,7 +118,7 @@ public void collect(Collection<T> metadataList) {
           previousCount.setValue(Statistic.NO_COLUMN_STATS);
         }
         ColumnMetadata columnMetadata = 
SchemaPathUtils.getColumnMetadata(schemaPath, metadata.getSchema());
-        TypeProtos.MajorType majorType = columnMetadata != null ? 
columnMetadata.majorType() : null;
+        TypeProtos.MajorType majorType = columnMetadata != null ? 
columnMetadata.majorType() : NULL;

Review comment:
       Specifying the `NULL` type here may cause issues later when this type is 
used to obtain the values comparator... 
   But do we actually support partitioning on list columns? I.e., if `majorType` 
is null, maybe we should set `partitionColumn` to false instead of calling the 
`checkForPartitionColumn()` method?

##########
File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetRecordReaderTest.java
##########
@@ -745,4 +747,41 @@ public void testLimitMultipleRowGroupsBeyondRowCount() 
throws Exception {
     assertTrue(String.format("Number of records in output is wrong: 
expected=%d, actual=%s", 300, recordsInOutput), 300 == recordsInOutput);
   }
 
+  @Test
+  public void testTypeNull() throws Exception {

Review comment:
       Please note that this class has the `@Ignore` annotation, so its tests 
won't run. Please add the test to another class.

##########
File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetRecordReaderTest.java
##########
@@ -745,4 +747,41 @@ public void testLimitMultipleRowGroupsBeyondRowCount() 
throws Exception {
     assertTrue(String.format("Number of records in output is wrong: 
expected=%d, actual=%s", 300, recordsInOutput), 300 == recordsInOutput);
   }
 
+  @Test
+  public void testTypeNull() throws Exception {
+    /* the `features` schema is:
+    optional group features {
+      required int32 type (INTEGER(8,true));
+      optional int32 size;
+      optional group indices (LIST) {
+        repeated group list {
+          required int32 element;
+        }
+      }
+      optional group values (LIST) {
+        repeated group list {
+          required double element;
+        }
+      }
+    }
+    based on 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/util/SchemaPathUtils.java,
+    the list schema is skipped, so in ParquetGroupScanStatistics Drill cannot 
get ColumnMetadata by the schemaPath
+    */
+    List<QueryDataBatch> results = testSqlWithResults("SELECT * FROM 
cp.`parquet/test_type_null.parquet`");

Review comment:
       Please use `testBuilder()` for running and verifying query results.
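For context, Drill's test base classes expose a fluent `testBuilder()` for declaring a query together with its expected results. The following toy stand-in (not the real Drill API; every class and the verification logic here are invented for illustration) sketches the declarative pattern the reviewer is asking for:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy mimic of the fluent testBuilder() pattern: declare the query, the
// expected columns, and the expected rows; go() then verifies the actual
// rows. In Drill itself the builder runs the query against the cluster.
public class TestBuilderSketch {

    static class Builder {
        String sql;
        String[] columns;
        final List<Object[]> rows = new ArrayList<>();

        Builder sqlQuery(String sql) { this.sql = sql; return this; }
        Builder unOrdered() { return this; }
        Builder baselineColumns(String... cols) { this.columns = cols; return this; }
        Builder baselineValues(Object... vals) { rows.add(vals); return this; }

        // Compare expected rows against the supplied actual rows.
        boolean go(List<Object[]> actual) {
            if (actual.size() != rows.size()) {
                return false;
            }
            for (int i = 0; i < rows.size(); i++) {
                if (!Arrays.equals(rows.get(i), actual.get(i))) {
                    return false;
                }
            }
            return true;
        }
    }

    static Builder testBuilder() { return new Builder(); }

    public static void main(String[] args) {
        // Pretend the query returned one row: label = 1.0
        List<Object[]> actual = new ArrayList<>();
        actual.add(new Object[]{1.0});

        boolean ok = testBuilder()
            .sqlQuery("SELECT label FROM cp.`parquet/test_type_null.parquet`")
            .unOrdered()
            .baselineColumns("label")
            .baselineValues(1.0)
            .go(actual);
        System.out.println(ok);
    }
}
```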




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> NullPointerException error when reading parquet files
> -----------------------------------------------------
>
>                 Key: DRILL-7934
>                 URL: https://issues.apache.org/jira/browse/DRILL-7934
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.18.0
>         Environment: Drill 1.18 
> Ambari 2.7.4
> Spark 3.0.2
>            Reporter: cdmikechen
>            Priority: Critical
>             Fix For: 1.19.0
>
>         Attachments: 
> part-00000-e849bed7-5cc2-480c-96d8-3fe5f9b4294a-c000.snappy.parquet, 
> part-00001-e849bed7-5cc2-480c-96d8-3fe5f9b4294a-c000.snappy.parquet
>
>
> I created a dataset using Spark ML. When I use Drill 1.18 to query this 
> dataset folder, it reports this error:
> {code:java}
> [Error Id: 92d3f331-ffca-46b5-a64c-87453b88a108 on xxx.xxx.xxx:31010]
>         at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
>         at 
> org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:788)
>         at 
> org.apache.drill.exec.work.foreman.QueryStateProcessor.checkCommonStates(QueryStateProcessor.java:322)
>         at 
> org.apache.drill.exec.work.foreman.QueryStateProcessor.planning(QueryStateProcessor.java:216)
>         at 
> org.apache.drill.exec.work.foreman.QueryStateProcessor.moveToState(QueryStateProcessor.java:76)
>         at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:300)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
> exception during fragment initialization: Error while applying rule 
> DrillPushProjectIntoScanRule:enumerable, args 
> [rel#478:LogicalProject.NONE.ANY([]).[](input=RelSubset#477,label=$1), 
> rel#452:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[hdfs_dataset.default,
>  /home/spark/dataset/default/test2/*.parquet])]
>         at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:301)
>         ... 3 common frames omitted
> Caused by: java.lang.RuntimeException: Error while applying rule 
> DrillPushProjectIntoScanRule:enumerable, args 
> [rel#478:LogicalProject.NONE.ANY([]).[](input=RelSubset#477,label=$1), 
> rel#452:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[hdfs_dataset.default,
>  /home/spark/dataset/default/test2/*.parquet])]
>         at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:235)
>         at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:633)
>         at 
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:327)
>         at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:405)
>         at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:351)
>         at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel(DefaultSqlHandler.java:245)
>         at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:308)
>         at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:173)
>         at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:283)
>         at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:163)
>         at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:140)
>         at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:93)
>         at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:593)
>         at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
>         ... 3 common frames omitted
> Caused by: java.lang.NullPointerException: null
>         at 
> org.apache.drill.exec.store.parquet.ParquetGroupScanStatistics.checkForPartitionColumn(ParquetGroupScanStatistics.java:186)
>         at 
> org.apache.drill.exec.store.parquet.ParquetGroupScanStatistics.collect(ParquetGroupScanStatistics.java:119)
>         at 
> org.apache.drill.exec.store.parquet.ParquetGroupScanStatistics.<init>(ParquetGroupScanStatistics.java:59)
>         at 
> org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.getParquetGroupScanStatistics(BaseParquetMetadataProvider.java:293)
>         at 
> org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.getTableMetadata(BaseParquetMetadataProvider.java:249)
>         at 
> org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.initializeMetadata(BaseParquetMetadataProvider.java:203)
>         at 
> org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.init(BaseParquetMetadataProvider.java:170)
>         at 
> org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl.<init>(ParquetTableMetadataProviderImpl.java:95)
>         at 
> org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl.<init>(ParquetTableMetadataProviderImpl.java:48)
>         at 
> org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl$Builder.build(ParquetTableMetadataProviderImpl.java:415)
>         at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:150)
>         at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:120)
>         at 
> org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:202)
>         at 
> org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:79)
>         at 
> org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:226)
>         at 
> org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:209)
>         at 
> org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:119)
>         at 
> org.apache.drill.exec.planner.logical.DrillPushProjectIntoScanRule.canPushProjectIntoScan(DrillPushProjectIntoScanRule.java:190)
>         at 
> org.apache.drill.exec.planner.logical.DrillPushProjectIntoScanRule.onMatch(DrillPushProjectIntoScanRule.java:107)
>         at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:208)
>         ... 16 common frames omitted
> {code}
> It is the same as issue https://issues.apache.org/jira/browse/DRILL-7769.
>  I added some logging and found this:
> {code:java}
> TRACE o.a.d.e.s.p.ParquetGroupScanStatistics - check schema path 
> `features`.`values`.`list`.`element` with major type null
>  current partitionColTypeMap = {`features`.`indices`.`list`.`element`=null, 
> `features`.`type`=minor_type: TINYINT
> mode: REQUIRED
> , `features`.`size`=minor_type: INT
> mode: OPTIONAL
> }
> 2021-05-25 15:39:21,066 [1f535658-f840-9f0e-1a7b-21080514bb7b:foreman] TRACE 
> o.a.d.e.s.p.ParquetGroupScanStatistics - check schema path `label` with major 
> type minor_type: FLOAT8
> mode: REQUIRED
>  current partitionColTypeMap = {`features`.`indices`.`list`.`element`=null, 
> `features`.`type`=minor_type: TINYINT
> mode: REQUIRED
> , `features`.`size`=minor_type: INT
> mode: OPTIONAL
> }
> 2021-05-25 15:39:21,066 [1f535658-f840-9f0e-1a7b-21080514bb7b:foreman] TRACE 
> o.a.d.e.s.p.ParquetGroupScanStatistics - check schema path `features`.`size` 
> with major type minor_type: INT
> mode: OPTIONAL
>  current partitionColTypeMap = {`features`.`indices`.`list`.`element`=null, 
> `features`.`type`=minor_type: TINYINT
> mode: REQUIRED
> , `features`.`size`=minor_type: INT
> mode: OPTIONAL
> }
> {code}
> So in some cases the major type is null; when Drill executes this code, it 
> throws a NullPointerException:
> {code:java}
> TypeProtos.MajorType majorType = columnMetadata != null ? 
> columnMetadata.majorType() : null; # 121
> !partitionColTypeMap.get(schemaPath).equals(type) # 189
> {code}
> We need to change null to *org.apache.drill.common.types.Types.NULL* to 
> avoid the NullPointerException.
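A minimal, self-contained stand-in (plain Java with String types in place of Drill's MajorType; not the real Drill classes) that reproduces the failing pattern and shows why substituting a non-null sentinel such as `Types.NULL`, or using a null-safe comparison, avoids the crash:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Mimics the failing path: partitionColTypeMap maps a list column to a
// null type, so partitionColTypeMap.get(schemaPath).equals(type) throws
// NullPointerException because equals() is invoked on a null receiver.
public class NullTypeCheck {

    // Null-safe replacement for map.get(key).equals(type); the actual
    // patch instead stores Types.NULL so equals() always has a receiver.
    static boolean sameType(String stored, String candidate) {
        return Objects.equals(stored, candidate);
    }

    public static void main(String[] args) {
        Map<String, String> partitionColTypeMap = new HashMap<>();
        // list column whose ColumnMetadata could not be resolved
        String key = "`features`.`indices`.`list`.`element`";
        partitionColTypeMap.put(key, null);

        boolean threw = false;
        try {
            partitionColTypeMap.get(key).equals("FLOAT8"); // original code path
        } catch (NullPointerException e) {
            threw = true;
        }
        System.out.println("direct equals throws NPE: " + threw);
        System.out.println("null-safe compare: "
            + sameType(partitionColTypeMap.get(key), "FLOAT8"));
    }
}
```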



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
