Elliot West created HIVE-11229:
----------------------------------
Summary: Mutation API: Coordinator communication with meta store should be optional
Key: HIVE-11229
URL: https://issues.apache.org/jira/browse/HIVE-11229
Project: Hive
Issue Type: Bug
Components: HCatalog
Affects Versions: 2.0.0
Reporter: Elliot West
Assignee: Elliot West
[~ekoifman] raised a theoretical issue with the streaming mutation API
(HIVE-10165) where worker nodes operating in a distributed cluster might
overwhelm a meta store while trying to obtain partition locks. Although this
does not happen in practice (see HIVE-11228), the API does communicate with the
meta store in this manner to obtain partition paths and create new partitions.
Therefore the issue described does in fact exist in the current implementation,
albeit in a different code path. I’d like to make such communication optional
like so:
* When the user chooses not to create partitions on demand, no meta store
connection will be created in the {{MutationCoordinators}}. Additionally,
partition paths will be resolved using
{{org.apache.hadoop.hive.metastore.Warehouse.getPartitionPath(Path,
LinkedHashMap<String, String>)}} which should be suitable so long as standard
Hive partition layouts are followed.
* If the user does choose to create partitions on demand then the system will
operate as it does currently, using the meta store both to issue
{{add_partition}} events and to look up partition metadata.
* The documentation will be updated to describe these behaviours and outline
alternative approaches to collecting affected partition names and creating
partitions in a less intensive manner.
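To illustrate the first point, here is a minimal sketch of resolving a partition path from the table location and the partition spec alone, with no meta store call. It assumes the standard Hive {{key=value}} directory layout. The class {{PartitionPathSketch}} and the method {{partitionPath}} are hypothetical stand-ins for illustration, not the {{Warehouse}} implementation; in particular, the sketch does not escape special characters in partition values, which Hive's own path building is expected to handle.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: mimics the standard Hive partition layout
// (<table location>/<key1>=<value1>/<key2>=<value2>/...) without contacting
// the meta store. Not the actual Warehouse.getPartitionPath implementation.
class PartitionPathSketch {

    // Builds the partition directory from the table location and an ordered
    // partition spec. Assumption: keys and values contain no characters that
    // would need escaping in a path.
    static String partitionPath(String tableLocation,
                                LinkedHashMap<String, String> partitionSpec) {
        // Drop a single trailing slash so the result has no "//" segment.
        String base = tableLocation.endsWith("/")
                ? tableLocation.substring(0, tableLocation.length() - 1)
                : tableLocation;
        StringBuilder path = new StringBuilder(base);
        // LinkedHashMap preserves insertion order, which must match the
        // table's partition column order.
        for (Map.Entry<String, String> entry : partitionSpec.entrySet()) {
            path.append('/')
                .append(entry.getKey())
                .append('=')
                .append(entry.getValue());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> spec = new LinkedHashMap<>();
        spec.put("continent", "Europe");
        spec.put("country", "UK");
        // prints /warehouse/db.db/tbl/continent=Europe/country=UK
        System.out.println(partitionPath("/warehouse/db.db/tbl", spec));
    }
}
```

The ordered map matters here: the path is only correct if the entries are inserted in the table's partition column order, which is presumably why the {{Warehouse}} method takes a {{LinkedHashMap}}.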
Side note for follow up: The parameter names {{tblName}} and {{dbName}} seem to
be the wrong way around on the method
{{org.apache.hadoop.hive.metastore.IMetaStoreClient.getPartition(String,
String, List<String>)}}.