[ 
https://issues.apache.org/jira/browse/DRILL-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli closed DRILL-4070.
------------------------------------

> Files written with versions of Drill before v1.3 record metadata that is 
> indistinguishable from bad metadata from other Parquet creators
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4070
>                 URL: https://issues.apache.org/jira/browse/DRILL-4070
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.3.0
>            Reporter: Rahul Challapalli
>            Assignee: Parth Chandra
>            Priority: Blocker
>             Fix For: 1.3.0
>
>         Attachments: cache.txt, fewtypes_varcharpartition.tar.tgz
>
>
> Drill uses the parquet-mr library to write Parquet files. The metadata 
> signature that Drill 1.2 and earlier versions produced is 
> indistinguishable from older footers written by other tools (such as Pig and 
> Hive). Those tools had a known bug that caused the statistics they wrote to 
> be incorrect. To correct this, the parquet-mr library adopted a 
> behavior of ignoring statistics from the old form of the Parquet footer. 
> With 1.3, Drill upgraded to the latest version of parquet-mr and has now 
> started ignoring these statistics as well. This ensures correct results but 
> produces performance regressions (compared to Drill 1.1 and 1.2) when 
> querying partitioned Parquet files generated in Drill 1.1 and 1.2. 
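
The behavior described above can be illustrated with a minimal sketch. It is
not the parquet-mr implementation: the function name, the regular expression,
and the version cutoff are all assumptions made for illustration. It only
shows why a reader that keys off the footer's writer signature cannot tell an
old Drill file apart from an old Pig or Hive file.

```python
import re

# Assumed cutoff for illustration only; the real threshold is defined
# inside parquet-mr and is not stated in this issue.
ASSUMED_FIXED_VERSION = (1, 8, 0)

# Matches a writer signature of the form "<app> version <major>.<minor>.<patch>".
_CREATED_BY = re.compile(
    r"^(?P<app>\S+) version (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
)


def should_ignore_statistics(created_by):
    """Return True when footer statistics should not be trusted (hypothetical helper)."""
    if not created_by:
        return True  # no writer signature: be conservative
    match = _CREATED_BY.match(created_by)
    if match is None:
        return True  # unparseable signature: be conservative
    if match.group("app") != "parquet-mr":
        return False  # assumption: other writers are trusted in this sketch
    version = tuple(int(match.group(k)) for k in ("major", "minor", "patch"))
    # Old Drill files and old Pig/Hive files carry the same kind of signature,
    # so both fall on the same side of this check.
    return version < ASSUMED_FIXED_VERSION
```

Under these assumptions, any file whose footer carries an old parquet-mr
signature loses its statistics, regardless of which tool actually wrote it,
which is the source of the regression on partitioned data.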



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
