[ https://issues.apache.org/jira/browse/DRILL-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha updated DRILL-6852:
------------------------------
    Labels: ready-to-commit  (was: )

> Adapt current Parquet Metadata cache implementation to use Drill Metastore API
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-6852
>                 URL: https://issues.apache.org/jira/browse/DRILL-6852
>             Project: Apache Drill
>          Issue Type: Sub-task
>            Reporter: Volodymyr Vysotskyi
>            Assignee: Volodymyr Vysotskyi
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.16.0
>
>
> According to the design document for DRILL-6552, the existing metadata cache
> API should be adapted to use the generalized metastore API, and the Parquet
> metadata cache will be presented as an implementation of the metastore API.
> The aim of this Jira is to refactor the Parquet Metadata cache implementation
> and adapt it to use the Drill Metastore API.
> Execution plan:
>  - Refactor AbstractParquetGroupScan and its implementations to use metastore
> metadata classes. Store Drill data types in metadata files for Parquet tables.
>  - Store the least restrictive type for each column instead of the column data
> type taken from the first file, as is done currently.
>  - Rework the logic in AbstractParquetGroupScan to allow filtering at different
> metadata layers: partition, file, row group, etc. Do the same for limit
> pushdown.
>  - Implement logic to convert existing Parquet metadata to metastore metadata
> to preserve backward compatibility.
>  - Implement fetching metadata only when it is needed (for filtering, limit,
> count(*), etc.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
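
For readers unfamiliar with the "least restrictive type" item in the execution plan, here is a minimal sketch of the idea in Java. It is not Drill's implementation: the `ColumnType` enum, its widening order, and the `leastRestrictive` helper are all hypothetical names chosen for illustration. The point is only that the type recorded for a column is the widest one seen across all files, rather than whatever the first file happens to contain.

```java
import java.util.Arrays;
import java.util.List;

public class LeastRestrictiveTypeSketch {

    // Hypothetical widening order, narrowest first.
    // This is an assumption for illustration, not Drill's actual type precedence.
    enum ColumnType { INT, BIGINT, FLOAT8, VARCHAR }

    // Returns the widest type seen across the per-file types of a single column.
    static ColumnType leastRestrictive(List<ColumnType> perFileTypes) {
        if (perFileTypes.isEmpty()) {
            throw new IllegalArgumentException("at least one file type is required");
        }
        ColumnType result = perFileTypes.get(0);
        for (ColumnType type : perFileTypes) {
            if (type.ordinal() > result.ordinal()) {
                result = type;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The first file stores the column as INT, a later file as BIGINT.
        List<ColumnType> types =
            Arrays.asList(ColumnType.INT, ColumnType.BIGINT, ColumnType.INT);
        System.out.println(leastRestrictive(types)); // prints BIGINT
    }
}
```

Taking the first file's type would report INT here, which is too narrow for the values in the second file; the widest type covers every file.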