cdmikechen created DRILL-7934:
---------------------------------

             Summary: NullPointerException error when reading parquet files
                 Key: DRILL-7934
                 URL: https://issues.apache.org/jira/browse/DRILL-7934
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.18.0
         Environment: Drill 1.18 
Ambari 2.7.4
Spark 3.0.2
            Reporter: cdmikechen
             Fix For: 1.19.0


I created a dataset using Spark ML. When I query this dataset folder with 
Drill 1.18, it reports the following error:
{code}
Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Error while applying rule DrillScanRule, args [rel#29:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[hdfs_dataset.default, /home/spark/dataset/default/test2/*.parquet])]
        at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:301)
        ... 3 common frames omitted
Caused by: java.lang.RuntimeException: Error while applying rule DrillScanRule, args [rel#29:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[hdfs_dataset.default, /home/spark/dataset/default/test2/*.parquet])]
        at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:235)
        at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:633)
        at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:327)
        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:405)
        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:351)
        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel(DefaultSqlHandler.java:245)
        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:308)
        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:173)
        at org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:283)
        at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:163)
        at org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:140)
        at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:93)
        at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:593)
        at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
        ... 3 common frames omitted
Caused by: java.lang.NullPointerException: null
        at org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.lambda$getPartitionsMetadata$7(BaseParquetMetadataProvider.java:354)
        at java.util.Map.forEach(Map.java:630)
        at org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.getPartitionsMetadata(BaseParquetMetadataProvider.java:342)
        at org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.initializeMetadata(BaseParquetMetadataProvider.java:206)
        at org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.init(BaseParquetMetadataProvider.java:170)
        at org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl.<init>(ParquetTableMetadataProviderImpl.java:95)
        at org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl.<init>(ParquetTableMetadataProviderImpl.java:48)
        at org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl$Builder.build(ParquetTableMetadataProviderImpl.java:415)
        at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:150)
        at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:120)
        at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:202)
        at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:79)
        at org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:226)
        at org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:209)
        at org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:119)
        at org.apache.drill.exec.planner.common.DrillScanRelBase.<init>(DrillScanRelBase.java:51)
        at org.apache.drill.exec.planner.logical.DrillScanRel.<init>(DrillScanRel.java:76)
        at org.apache.drill.exec.planner.logical.DrillScanRel.<init>(DrillScanRel.java:65)
        at org.apache.drill.exec.planner.logical.DrillScanRel.<init>(DrillScanRel.java:58)
        at org.apache.drill.exec.planner.logical.DrillScanRule.onMatch(DrillScanRule.java:38)
        at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:208)
        ... 16 common frames omitted
{code}

This looks similar to issue https://issues.apache.org/jira/browse/DRILL-7769.
I added some logging and found this:
{code}
TRACE o.a.d.e.s.p.ParquetGroupScanStatistics - check schema path 
`features`.`values`.`list`.`element` with major type null
 current partitionColTypeMap = {`features`.`indices`.`list`.`element`=null, 
`features`.`type`=minor_type: TINYINT
mode: REQUIRED
, `features`.`size`=minor_type: INT
mode: OPTIONAL
}
2021-05-25 15:39:21,066 [1f535658-f840-9f0e-1a7b-21080514bb7b:foreman] TRACE 
o.a.d.e.s.p.ParquetGroupScanStatistics - check schema path `label` with major 
type minor_type: FLOAT8
mode: REQUIRED

 current partitionColTypeMap = {`features`.`indices`.`list`.`element`=null, 
`features`.`type`=minor_type: TINYINT
mode: REQUIRED
, `features`.`size`=minor_type: INT
mode: OPTIONAL
}
2021-05-25 15:39:21,066 [1f535658-f840-9f0e-1a7b-21080514bb7b:foreman] TRACE 
o.a.d.e.s.p.ParquetGroupScanStatistics - check schema path `features`.`size` 
with major type minor_type: INT
mode: OPTIONAL

 current partitionColTypeMap = {`features`.`indices`.`list`.`element`=null, 
`features`.`type`=minor_type: TINYINT
mode: REQUIRED
, `features`.`size`=minor_type: INT
mode: OPTIONAL
}
{code}

So in some cases the major type is null, and when Drill runs the following 
code it throws a NullPointerException:
{code:java}
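// line 121: majorType is null when columnMetadata is null
TypeProtos.MajorType majorType = columnMetadata != null ? columnMetadata.majorType() : null;
// line 189: throws NullPointerException when the map holds null for schemaPath
!partitionColTypeMap.get(schemaPath).equals(type)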
{code}

We need to replace null with *org.apache.drill.common.types.Types.NULL* to 
avoid the NullPointerException.
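
The idea can be illustrated with a minimal, self-contained sketch. This is not the actual Drill patch: plain strings stand in for TypeProtos.MajorType, and the names PartitionTypeCheck, NULL_TYPE, orNullType and typeDiffers are hypothetical, chosen only for illustration. It assumes the fix amounts to storing a non-null sentinel (playing the role of Types.NULL) wherever a missing type would otherwise be stored as null.

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionTypeCheck {
    // Hypothetical stand-in for org.apache.drill.common.types.Types.NULL.
    static final String NULL_TYPE = "NULL";

    // Maps a missing type to the sentinel so that null is never stored or compared.
    static String orNullType(String type) {
        return type != null ? type : NULL_TYPE;
    }

    // Same shape as the failing comparison: dereferences the stored value directly.
    static boolean typeDiffers(Map<String, String> types, String path, String type) {
        return !types.get(path).equals(type);
    }

    public static void main(String[] args) {
        Map<String, String> types = new HashMap<>();
        String path = "features.indices.list.element";

        // Storing null for a column without a type reproduces the failure.
        types.put(path, null);
        try {
            typeDiffers(types, path, "INT");
        } catch (NullPointerException e) {
            System.out.println("null entry: NullPointerException");
        }

        // Storing the sentinel instead keeps the same comparison safe.
        types.put(path, orNullType(null));
        System.out.println(typeDiffers(types, path, orNullType(null)));
        System.out.println(typeDiffers(types, path, "INT"));
    }
}
```

With the sentinel in place the comparison itself does not need to change, which is why replacing null at the point where the type is resolved is enough in this sketch.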



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
