[jira] [Commented] (HIVE-2612) support hive table/partitions coexistes in more than one clusters

Carl Steinbach (Commented) (JIRA) Thu, 19 Jan 2012 16:21:05 -0800

    [ https://issues.apache.org/jira/browse/HIVE-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189526#comment-13189526 ]


Carl Steinbach commented on HIVE-2612:
--------------------------------------

bq. A table T1's primary cluster is C1 meaning :1) C1 contains all data that is 
available in all other clusters.

Does this mean that if T1's primary cluster is C1, then all of the partitions 
in T1 must also have their primary cluster set to C1? If that's the case, 
then primary cluster should probably be a table-level property, and the list of 
replica clusters can be a table/partition-level property.
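
A minimal sketch of that split, assuming hypothetical Python classes named Table 
and Partition (these are illustrative only and are not Hive metastore types): 
the primary cluster lives on the table alone, while the replica list can be set 
on the table and optionally overridden per partition.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Partition:
    name: str
    # None means "no partition-level override; fall back to the table's list".
    replica_clusters: Optional[List[str]] = None


@dataclass
class Table:
    name: str
    primary_cluster: str  # table-level only: partitions cannot disagree with it
    replica_clusters: List[str] = field(default_factory=list)
    partitions: Dict[str, Partition] = field(default_factory=dict)

    def primary_cluster_of(self, partition_name: str) -> str:
        # Every partition shares the table's primary cluster by construction.
        if partition_name not in self.partitions:
            raise KeyError(partition_name)
        return self.primary_cluster


t1 = Table("T1", primary_cluster="C1", replica_clusters=["C2", "C3"])
t1.partitions["ds=2012-01-19"] = Partition("ds=2012-01-19", replica_clusters=["C2"])
```

Because Partition has no primary-cluster field at all, the "all partitions share 
the table's primary" invariant cannot be violated in this model.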

bq. 2) write is only allowed in this cluster for table C1. but need to allow 
exceptions here

What are the exceptions?

bq. 4) all data changes to T1 happened in the primary cluster should be 
replicated to other clusters if there are any secondary clusters. but there 
should be a conf to disable it as there are some exception situations.

What are the exceptions?

How will dynamic partitions work? Where will new partitions get the list of 
replica clusters from? Will they inherit it from the table definition?
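
If the answer turns out to be inheritance, the lookup could be as small as the 
sketch below. The function name effective_replicas and the "None means inherit" 
convention are assumptions made for illustration; nothing in the proposal 
specifies them.

```python
from typing import List, Optional


def effective_replicas(
    table_replicas: List[str],
    partition_replicas: Optional[List[str]],
) -> List[str]:
    """Return the replica clusters that apply to one partition.

    A dynamically created partition has no list of its own (None), so it
    inherits the table's list. An explicit list, even an empty one, wins.
    """
    if partition_replicas is None:
        return list(table_replicas)
    return list(partition_replicas)
```

Distinguishing None from an empty list matters here: an empty list would mean 
"deliberately not replicated", which is different from "nothing was specified".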

Hive now supports insert-append into a partition (HIVE-306). Suppose that the 
metadata for a particular partition indicates that it is replicated to clusters 
C2 and C3. If I insert new data into the partition in the primary cluster C1, 
then the metadata is now invalid. How is this going to be handled?
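
To make the problem concrete, here is a small sketch that tracks a version 
counter per partition. The class PartitionReplicaState and its methods are 
hypothetical and are not part of the proposal; they only illustrate how an 
append in C1 leaves the "replicated to C2 and C3" metadata out of date.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PartitionReplicaState:
    primary: str
    version: int = 0  # bumped on every write in the primary cluster
    # replica cluster -> version of the data it last copied
    synced_version: Dict[str, int] = field(default_factory=dict)

    def append_in_primary(self) -> None:
        self.version += 1

    def mark_synced(self, cluster: str) -> None:
        self.synced_version[cluster] = self.version

    def stale_replicas(self) -> List[str]:
        # Replicas whose copy is older than the primary's current data.
        return sorted(c for c, v in self.synced_version.items() if v < self.version)


state = PartitionReplicaState(primary="C1")
state.mark_synced("C2")
state.mark_synced("C3")
state.append_in_primary()              # insert-append in C1
stale_after_append = state.stale_replicas()
state.mark_synced("C2")                # C2 catches up, C3 still lags
```

A plain list of replica clusters cannot express the state after the append; 
some notion of "how current is each copy" would have to be recorded as well.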

bq. 2) add new interfaces which do exactly the same set of functionalities as 
old ones but using a different name (use _on_cluster suffix maybe?) and have a 
cluster parameter

This is going to introduce new codepaths that need to be tested separately, and 
also double the amount of work people need to do every time a new metastore API 
call is created. I don't think this is a good approach.

bq. 3) overwrite database name for the purpose of cluster name. And allow a 
table co-exist in multiple databases. But that require to promote table to top 
level citizen, and degrade database. For example, "show tables" used to scan 
all tables in current db, but now need to scan all tables in all databases.

I don't think this is an option, since it breaks backwards compatibility and 
effectively changes the whole notion of what a db/schema is. A lot of people in 
the community already depend on the existing database semantics.

bq. 1) add a cluster parameter to existing thrift interfaces

This sounds like the best option to me. I think Thrift supports API evolution 
via default values for missing parameters, but setting a default value in this 
case may be a little tricky.
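
As a language-level analogy only (this is Python, not Thrift IDL, and it makes 
no claim about how Thrift itself handles missing arguments), an optional 
trailing parameter lets old callers keep working unchanged. The function 
get_table_location, the constant DEFAULT_CLUSTER, and the returned path layout 
are all invented for this sketch.

```python
from typing import Optional

# Hypothetical fallback. Deciding what this should be (the table's primary
# cluster? the cluster the client is connected to?) is presumably the
# tricky part.
DEFAULT_CLUSTER = "C1"


def get_table_location(db: str, table: str, cluster: Optional[str] = None) -> str:
    """Resolve a made-up location string for a table on a given cluster."""
    resolved = cluster if cluster is not None else DEFAULT_CLUSTER
    return f"{resolved}:/warehouse/{db}.db/{table}"


old_style = get_table_location("default", "t1")                 # pre-change caller
new_style = get_table_location("default", "t1", cluster="C2")   # cluster-aware caller
```

Compared with adding a parallel set of _on_cluster calls, this keeps a single 
code path per operation, at the cost of having to pick a sensible fallback.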

Also, rather than modifying the Thrift interface, could you leverage the work 
that's being done in HIVE-2720?

                
> support hive table/partitions coexistes in more than one clusters
> -----------------------------------------------------------------
>
>                 Key: HIVE-2612
>                 URL: https://issues.apache.org/jira/browse/HIVE-2612
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>
> 1) add cluster object into hive metastore
> 2) each partition/table has a creation cluster and a list of living clusters, 
> and also data location in each cluster

