[ https://issues.apache.org/jira/browse/HIVE-19715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553396#comment-16553396 ]
Vihang Karajgaonkar commented on HIVE-19715:
--------------------------------------------

Attached the first version of the design proposal for the new API.

TLDR: The API reuses the existing {{PartitionSpec}} objects and makes some of the fields in PartitionSpec optional. It also supports the following:

1. Projection list: a list of strings, each a dot-separated field name. For example, clients that are interested only in partition locations can request {{sd.location}}, and the result will include only the locations instead of the full partition objects.
2. FilterSpec: provides different ways to filter the partitions of a given table. It currently supports {{BY_NAMES}}, {{BY_VALUES}} or {{BY_EXPR}}, although it's not clear whether there is value in providing {{BY_VALUES}} filters.
3. Pagination: the API response contains a pagination token which clients can use to send subsequent requests and retrieve partitions in configurable batches. The pagination token itself is a {{byte[]}} which the client doesn't need to interpret. Internally, the server can put values in the token such as the last {{PART_ID}} sent previously, the table modification stamp, etc.

Any thoughts or suggestions?

cc: [~alangates] [~thejas] [~tlipcon] [~akolb]

> Consolidated and flexible API for fetching partition metadata from HMS
> ----------------------------------------------------------------------
>
>                 Key: HIVE-19715
>                 URL: https://issues.apache.org/jira/browse/HIVE-19715
>             Project: Hive
>          Issue Type: New Feature
>          Components: Standalone Metastore
>            Reporter: Todd Lipcon
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>         Attachments: HIVE-19715-design-doc.pdf
>
>
> Currently, the HMS thrift API exposes 17 different APIs for fetching partition-related information. There is somewhat of a combinatorial explosion going on, where each API has variants with and without "auth" info, by pspecs vs names, by filters, by exprs, etc.
> Having all of these separate APIs long term is a maintenance burden and also more confusing for consumers.
> Additionally, even with all of these APIs, there is a lack of granularity in fetching only the information needed for a particular use case. For example, in some use cases it may be beneficial to only fetch the partition locations without wasting effort fetching statistics, etc.
> This JIRA proposes that we add a new "one API to rule them all" for fetching partition info. The request and response would be encapsulated in structs. Some desirable properties:
> - the request should be able to specify which pieces of information are required (eg location, properties, etc)
> - in the case of partition parameters, the request should be able to do either whitelisting or blacklisting (eg to exclude large incremental column stats HLL dumped in there by Impala)
> - the request should optionally specify auth info (to encompass the "with_auth" variants)
> - the request should be able to designate the set of partitions to access through one of several different methods (eg "all", list<name>, expr, part_vals, etc)
> - the struct should be easily evolvable so that new pieces of info can be added
> - the response should be designed in such a way as to avoid transferring redundant information for common cases (eg simple "dictionary coding" of strings like parameter names, etc)
> - the API should support some form of pagination for tables with large partition counts

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
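
For readers who want a concrete picture of the projection list and the opaque pagination token described in the comment, here is a rough, self-contained Python sketch. It is only an illustration under stated assumptions, not the proposed HMS API: the in-memory {{PARTITIONS}} list, the function names ({{project}}, {{get_partitions}}, {{fetch_all}}) and the JSON-in-base64 token layout are all hypothetical, and the real design would express requests and responses as Thrift structs.

```python
import base64
import json

# Hypothetical stand-in for partition rows, ordered by part_id.
# This is NOT the HMS schema; it only mimics a nested "sd.location" field.
PARTITIONS = [
    {"part_id": 1, "sd": {"location": "/warehouse/t/ds=1", "numBuckets": 0},
     "parameters": {"numRows": "10"}},
    {"part_id": 2, "sd": {"location": "/warehouse/t/ds=2", "numBuckets": 0},
     "parameters": {"numRows": "20"}},
    {"part_id": 3, "sd": {"location": "/warehouse/t/ds=3", "numBuckets": 0},
     "parameters": {"numRows": "30"}},
]


def project(obj, fields):
    """Return a copy of obj holding only the dot-separated paths in fields."""
    out = {}
    for path in fields:
        keys = path.split(".")
        value = obj
        for key in keys:              # walk down to the requested leaf
            value = value[key]
        dest = out
        for key in keys[:-1]:         # rebuild the nesting in the result
            dest = dest.setdefault(key, {})
        dest[keys[-1]] = value
    return out


def encode_token(last_part_id):
    """Server side: pack paging state into bytes the client never parses."""
    return base64.b64encode(json.dumps({"last_part_id": last_part_id}).encode())


def decode_token(token):
    """Server side: recover the last part_id sent in the previous batch."""
    return json.loads(base64.b64decode(token))["last_part_id"]


def get_partitions(fields, batch_size, token=None):
    """Return (projected batch, next token); the token is None when done."""
    last_id = decode_token(token) if token is not None else 0
    rows = [p for p in PARTITIONS if p["part_id"] > last_id][:batch_size]
    batch = [project(p, fields) for p in rows]
    has_more = bool(rows) and rows[-1]["part_id"] < PARTITIONS[-1]["part_id"]
    return batch, (encode_token(rows[-1]["part_id"]) if has_more else None)


def fetch_all(fields, batch_size):
    """Client side: keep passing the opaque token back until it runs out."""
    results, token = [], None
    while True:
        batch, token = get_partitions(fields, batch_size, token)
        results.extend(batch)
        if token is None:
            return results
```

Calling {{fetch_all(["sd.location"], 2)}} in this sketch makes two round trips and returns only the location of each partition. Keeping the token opaque leaves the server free to add more state to it later (for example the table modification stamp mentioned in the comment) without changing clients.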