[ https://issues.apache.org/jira/browse/DRILL-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahul Challapalli closed DRILL-4070.
------------------------------------

> Files written with versions of Drill before v1.3 record metadata that is indistinguishable from bad metadata from other Parquet creators
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4070
>                 URL: https://issues.apache.org/jira/browse/DRILL-4070
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.3.0
>            Reporter: Rahul Challapalli
>            Assignee: Parth Chandra
>            Priority: Blocker
>             Fix For: 1.3.0
>
>         Attachments: cache.txt, fewtypes_varcharpartition.tar.tgz
>
>
> Drill uses the parquet-mr library to write Parquet files. The metadata
> signature that Drill produced in 1.2 and earlier versions of Drill is
> indistinguishable from older footers written by other tools (such as Pig and
> Hive). There was a known bug when those tools wrote metadata that caused the
> statistics to be incorrect. To correct this, the parquet-mr library adopted a
> behavior of ignoring statistics from the old form of the Parquet footer.
> With 1.3, Drill upgraded to the latest version of parquet-mr and has now
> started ignoring these statistics as well. This ensures correct results but
> produces performance regressions (compared to Drill v1 and v2) when querying
> against partitioned Parquet files generated in Drill 1.1 and 1.2.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
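
To illustrate the behavior described in the issue, here is a minimal sketch of how a reader might decide whether to trust column statistics based on the footer's writer signature. It is not Drill's or parquet-mr's actual code: the function name, the regular expression, the version threshold, and the `drill.version` metadata key are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for a footer "created_by" string such as
# "parquet-mr version 1.8.1 (build abc123)".
CREATED_BY = re.compile(r"^parquet-mr version (\d+)\.(\d+)\.(\d+)")

# Assumed first writer release whose statistics can be trusted.
FIRST_TRUSTED = (1, 8, 0)


def should_ignore_statistics(created_by, extra_metadata=None):
    """Return True if the footer's statistics should not be used."""
    extra_metadata = extra_metadata or {}

    # A writer that identifies itself explicitly (assumed key) is trusted.
    if "drill.version" in extra_metadata:
        return False

    match = CREATED_BY.match(created_by or "")
    if match is None:
        # No parseable version: the footer cannot be told apart from one
        # written by an older, buggy writer, so play it safe.
        return True

    version = tuple(int(part) for part in match.groups())
    return version < FIRST_TRUSTED
```

Under these assumptions, a footer with no version in its signature is treated like one from a known-bad writer. That is the situation the issue describes for files written by Drill 1.2 and earlier: the results stay correct, but the reader can no longer use the statistics on those files, which is presumably where the reported performance regression comes from.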