[ https://issues.apache.org/jira/browse/DRILL-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha updated DRILL-6852:
------------------------------
    Labels: ready-to-commit  (was: )

> Adapt current Parquet Metadata cache implementation to use Drill Metastore API
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-6852
>                 URL: https://issues.apache.org/jira/browse/DRILL-6852
>             Project: Apache Drill
>          Issue Type: Sub-task
>            Reporter: Volodymyr Vysotskyi
>            Assignee: Volodymyr Vysotskyi
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.16.0
>
>
> According to the design document for DRILL-6552, the existing metadata cache
> API should be adapted to use the generalized Metastore API, and the Parquet
> metadata cache will be presented as an implementation of the Metastore API.
> The aim of this Jira is to refactor the Parquet metadata cache implementation
> and adapt it to use the Drill Metastore API.
> Execution plan:
>  - Refactor AbstractParquetGroupScan and its implementations to use Metastore
> metadata classes. Store Drill data types in metadata files for Parquet tables.
>  - Store the least restrictive column data type across all files instead of
> the current behavior of taking the data type from the first file.
>  - Rework the logic in AbstractParquetGroupScan to allow filtering at different
> metadata layers: partition, file, row group, etc. Do the same for limit
> pushdown.
>  - Implement logic to convert existing Parquet metadata to Metastore metadata
> to preserve backward compatibility.
>  - Fetch metadata only when it is needed (for filtering, limit pushdown,
> count(*), etc.).
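
The second plan item (least restrictive type instead of the first file's type) can be illustrated with a minimal Java sketch. The `ColType` enum, its linear widening order, and the class and method names are hypothetical simplifications for illustration only; they are not Drill APIs, and Drill's real type-resolution rules (nullability, decimals, and so on) are more involved.

```java
import java.util.List;

public class LeastRestrictiveTypeSketch {

    // Hypothetical, simplified type ladder: declaration order is treated as
    // widening order, so each constant is considered wider than those before it.
    enum ColType { INT, BIGINT, FLOAT8, VARCHAR }

    // Returns the widest type seen across all files for one column,
    // instead of simply taking the type found in the first file.
    static ColType leastRestrictive(List<ColType> perFileTypes) {
        if (perFileTypes.isEmpty()) {
            throw new IllegalArgumentException("no column types supplied");
        }
        ColType result = perFileTypes.get(0);
        for (ColType t : perFileTypes) {
            if (t.ordinal() > result.ordinal()) {
                result = t;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Types of the same column as recorded in three different Parquet files.
        List<ColType> types = List.of(ColType.INT, ColType.BIGINT, ColType.INT);
        System.out.println("first file's type:    " + types.get(0));
        System.out.println("least restrictive:    " + leastRestrictive(types));
    }
}
```

Running `main` reports `INT` as the first file's type but `BIGINT` as the least restrictive type, showing how relying on the first file would under-describe a column that is wider in some other file.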



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
