[
https://issues.apache.org/jira/browse/HIVE-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thiruvel Thirumoolan updated HIVE-7604:
---------------------------------------
Attachment: Design_HIVE_7604.1.txt
Thanks [~ashutoshc]. Uploading a revised document with additional information
on return values. Let me know if it's unclear.
> Add Metastore API to fetch one or more partition names
> ------------------------------------------------------
>
> Key: HIVE-7604
> URL: https://issues.apache.org/jira/browse/HIVE-7604
> Project: Hive
> Issue Type: New Feature
> Components: Metastore
> Reporter: Thiruvel Thirumoolan
> Assignee: Thiruvel Thirumoolan
> Fix For: 0.14.0
>
> Attachments: Design_HIVE_7604.1.txt, Design_HIVE_7604.txt
>
>
> We need a new API in Metastore to address the following use cases. Both use
> cases arise from having tables with hundreds of thousands or in some cases
> millions of partitions.
> 1. It should be quick and easy to obtain the distinct values of a partition
> key, e.g. all dates for which partitions are available. Tools and frameworks
> can use this programmatically to find gaps in partitions before reprocessing
> them. Currently one has to run Hive queries (JDBC or CLI) to obtain this
> information, which is unfriendly and heavyweight. For tables with a large
> number of partitions, the queries take a long time to run and also require
> a large heap.
> 2. Users typically want to know the list of partitions available and run
> queries that involve only partition keys (select distinct partkey1 from
> table), or want the latest date partition of a dimension table to join
> against a fact table (select * from fact_table join
> select max(dt) from dimension_table). Such metadata-only queries can be
> pushed to the metastore and need not be run even locally in Hive. If the
> queries can be converted into database queries, the clients can be
> lightweight and need not fetch all partition names. The results can be
> obtained much faster and with fewer resources.
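
For illustration only, here is a minimal Java sketch of the client-side work
that both use cases currently imply: a caller fetches every partition name
and then computes the distinct values, or the maximum value, of one partition
key itself. The class and method names (PartitionNameSketch, distinctValues,
maxValue) are hypothetical and are not part of the proposed API or of the
attached design document. The sketch assumes partition names of the form
key1=value1/key2=value2 and ignores escaping of special characters.

```java
import java.util.List;
import java.util.Optional;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not a Hive class: shows the filtering a client has to
// do today after pulling all partition names over the wire.
final class PartitionNameSketch {

    // Collect the distinct values of one partition key from names such as
    // "dt=2014-08-01/region=us" (assumed format, escaping ignored).
    static SortedSet<String> distinctValues(List<String> partitionNames, String key) {
        SortedSet<String> values = new TreeSet<>();
        String prefix = key + "=";
        for (String name : partitionNames) {
            for (String part : name.split("/")) {
                if (part.startsWith(prefix)) {
                    values.add(part.substring(prefix.length()));
                }
            }
        }
        return values;
    }

    // Largest value of the key in string order, which matches date order
    // for ISO-style dates like 2014-08-03; empty if the key never occurs.
    static Optional<String> maxValue(List<String> partitionNames, String key) {
        SortedSet<String> values = distinctValues(partitionNames, key);
        return values.isEmpty() ? Optional.empty() : Optional.of(values.last());
    }

    public static void main(String[] args) {
        List<String> names = List.of(
            "dt=2014-08-01/region=us",
            "dt=2014-08-01/region=eu",
            "dt=2014-08-03/region=us");
        System.out.println(distinctValues(names, "dt"));          // [2014-08-01, 2014-08-03]
        System.out.println(maxValue(names, "dt").orElse("none")); // 2014-08-03
    }
}
```

With millions of partitions, the full list of names has to be transferred and
held in client memory before this filtering can start, which is the cost that
pushing the distinct/max computation into the metastore is meant to avoid.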