[ https://issues.apache.org/jira/browse/DRILL-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898968#comment-16898968 ]
Arina Ielchiieva commented on DRILL-5075:
-----------------------------------------

Since Drill 1.8, the Parquet version has been upgraded, so such warnings are currently not observed. Please reopen if seen again.

> Tests complain about Parquet metadata parse errors in Drill-created files
> -------------------------------------------------------------------------
>
>                 Key: DRILL-5075
>                 URL: https://issues.apache.org/jira/browse/DRILL-5075
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> The test {{TestParquetWriter.testAllScalarTypes}} seems to create a Parquet
> file, then read it using the "new" Parquet reader. However, the test throws
> the following assertion (though the test still succeeds.)
> Note that the exception does _not_ occur if we run the single test function
> by itself. It only occurs when run as part of the entire test class,
> suggesting an interaction between tests.
> When run stand-alone, another behavior occurs. When the test is complete, and
> the Drillbit shuts down, only then does Parquet log a bunch of
> "ColumnChunkPageWriteStore: written" messages followed by:
> {code}
> WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by is null or empty! See PARQUET-251 and PARQUET-297
> {code}
> Are we leaving a file open that is getting flushed only on shut-down?
> Full error when the test runs in the entire suite:
> {code}
> PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr
> org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr using format: (.+) version ((.*) )?\(build ?(.*)\)
>     at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
>     at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:66)
>     at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:264)
>     at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:568)
>     at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:545)
>     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:455)
>     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:412)
>     at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:381)
>     at org.apache.drill.exec.store.parquet.Metadata.access$0(Metadata.java:379)
>     at org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner(Metadata.java:316)
>     at org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner(Metadata.java:1)
>     at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:56)
>     at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:122)
>     at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:278)
>     at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:257)
>     at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:242)
>     at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:118)
>     at org.apache.drill.exec.store.parquet.ParquetGroupScan.init(ParquetGroupScan.java:733)
>     at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:230)
>     at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:190)
>     at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:169)
>     at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:1)
>     at org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:145)
>     at org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan(AbstractStoragePlugin.java:103)
>     at org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:85)
>     at org.apache.drill.exec.planner.logical.DrillPushProjIntoScan.onMatch(DrillPushProjIntoScan.java:65)
>     at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
>     at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808)
>     at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303)
>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:404)
>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343)
>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240)
>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290)
>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:168)
>     at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:123)
>     at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:97)
>     at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1008)
>     at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:264)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
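
For anyone who hits this warning again: the parse failure in the quoted trace can be reproduced outside Drill by applying the format string from the exception message to the created_by value it reports. The sketch below is only an illustration. The class name, the method name, and the second sample string (including its version number and build id) are made up, and it assumes the parser requires the whole string to match the pattern.

```java
import java.util.regex.Pattern;

public class CreatedByFormatCheck {
    // Format string copied from the VersionParseException message in the trace.
    // Everything else in this class is hypothetical and not Parquet or Drill code.
    static final Pattern FORMAT =
        Pattern.compile("(.+) version ((.*) )?\\(build ?(.*)\\)");

    // Assumption: the whole created_by string must match the pattern.
    static boolean matchesFormat(String createdBy) {
        return FORMAT.matcher(createdBy).matches();
    }

    public static void main(String[] args) {
        // Value reported in the warning: no " version " and no "(build ...)" part.
        System.out.println(matchesFormat("parquet-mr"));

        // Made-up value in the shape the pattern expects.
        System.out.println(matchesFormat("parquet-mr version 1.8.1 (build abc123)"));
    }
}
```

The bare string "parquet-mr" has neither a " version " segment nor a "(build ...)" suffix, so it cannot satisfy the pattern, which is consistent with the exception text. A file whose footer carries the full form should not trigger this particular parse error.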