[jira] [Issue Comment Edited] (HIVE-2612) support hive table/partitions coexistes in more than one clusters

Namit Jain (Issue Comment Edited) (JIRA) Mon, 30 Jan 2012 13:41:37 -0800

    [ https://issues.apache.org/jira/browse/HIVE-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196447#comment-13196447 ]


Namit Jain edited comment on HIVE-2612 at 1/30/12 9:40 PM:
-----------------------------------------------------------

bq. Write is only allowed in this cluster for table C1, but we need to allow
exceptions here. What are the exceptions?

Currently, there should be no exceptions. Eventually, if we provide something
in Hive to do a cross-cluster write, that would be an exception. There may be a
Hive command like: Replicate T@P from cluster1 to cluster2.
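A minimal sketch of the write rule above, assuming a simplified in-memory
model: `Table`, `write_partition`, `replicate` and `CrossClusterWriteError` are
hypothetical names for illustration, not Hive metastore APIs. Writes are
accepted only in the table's primary cluster, and the hypothetical Replicate
command is the single sanctioned cross-cluster path.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


class CrossClusterWriteError(Exception):
    """Raised when a write is attempted outside the table's primary cluster."""


@dataclass
class Table:
    name: str
    primary_cluster: str
    # Partition spec -> set of clusters currently holding a copy of its data.
    partitions: Dict[str, Set[str]] = field(default_factory=dict)


def write_partition(table: Table, partition: str, current_cluster: str) -> None:
    """Create or overwrite a partition; only the primary cluster may do this."""
    if current_cluster != table.primary_cluster:
        raise CrossClusterWriteError(
            f"{table.name}@{partition}: writes are only allowed in "
            f"{table.primary_cluster}, not in {current_cluster}"
        )
    # Assumption: a fresh write invalidates older copies, so the partition
    # lives only in the primary cluster until it is replicated again.
    table.partitions[partition] = {table.primary_cluster}


def replicate(table: Table, partition: str, src: str, dst: str) -> None:
    """Hypothetical 'Replicate T@P from src to dst' command.

    Only metadata is updated here; copying the underlying data files is
    out of scope for this sketch.
    """
    holders = table.partitions.get(partition)
    if holders is None or src not in holders:
        raise KeyError(f"{table.name}@{partition} is not present in {src}")
    holders.add(dst)


if __name__ == "__main__":
    t = Table("T", primary_cluster="cluster1")
    write_partition(t, "ds=2012-01-30", "cluster1")
    replicate(t, "ds=2012-01-30", "cluster1", "cluster2")
    print(sorted(t.partitions["ds=2012-01-30"]))  # ['cluster1', 'cluster2']
```

Whether an overwrite in the primary cluster should drop or refresh existing
secondary copies is a design choice the comment leaves open; the sketch simply
drops them.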

bq. All data changes to T1 that happen in the primary cluster should be
replicated to other clusters if there are any secondary clusters, but there
should be a conf to disable it as there are some exceptional situations.

This question should not be relevant now. A much simpler way to visualize this
is: for every table, there is a primary cluster and a list of secondary
clusters. All the partitions belong to the primary cluster, and may also belong
to one or more secondary clusters. Every Hive session has a current cluster,
and reads happen from the current cluster. An error is thrown if a partition is
present in the primary cluster but missing from the current cluster. I will
write a new wiki page and attach it; it might be simpler to understand that
way.
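The read model in the previous paragraph can be sketched as a lookup. This is
an illustration under stated assumptions: the classes, the error type and the
example paths are hypothetical, and keeping one data location per cluster
follows the issue description ("data location in each cluster"). A read
resolves against the session's current cluster and fails if the partition
exists only elsewhere.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class PartitionNotInClusterError(Exception):
    """The partition exists in the primary cluster but not in the session's cluster."""


@dataclass
class Table:
    name: str
    primary_cluster: str
    # Clusters that may hold copies; not consulted by the read check below.
    secondary_clusters: List[str]
    # Partition spec -> {cluster name -> data location in that cluster}.
    partitions: Dict[str, Dict[str, str]] = field(default_factory=dict)


@dataclass
class Session:
    current_cluster: str


def resolve_read_location(session: Session, table: Table, partition: str) -> str:
    """Return where to read `partition` from in the session's current cluster."""
    copies = table.partitions.get(partition, {})
    if table.primary_cluster not in copies:
        # Every partition belongs to the primary cluster, so no copy there
        # means the partition does not exist at all.
        raise KeyError(f"{table.name}@{partition} does not exist")
    if session.current_cluster not in copies:
        # Present in the primary cluster but missing locally: fail rather
        # than silently reading from another cluster.
        raise PartitionNotInClusterError(
            f"{table.name}@{partition} is in {table.primary_cluster} "
            f"but missing from {session.current_cluster}"
        )
    return copies[session.current_cluster]


if __name__ == "__main__":
    t = Table("T", "cluster1", ["cluster2"],
              {"ds=1": {"cluster1": "/warehouse/t/ds=1"}})
    print(resolve_read_location(Session("cluster1"), t, "ds=1"))
```

A session whose current cluster is cluster2 would get
`PartitionNotInClusterError` for `ds=1` in this example until a copy is
registered there.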

Dynamic partitions should not require anything different.


bq. Overwrite database name for the purpose of cluster name, and allow a table
to co-exist in multiple databases. But that requires promoting tables to
top-level citizens and demoting databases. For example, "show tables" used to
scan all tables in the current db, but would now need to scan all tables in all
databases. I don't think this is an option since it breaks backwards
compatibility and effectively changes the whole notion of what a db/schema is.
A lot of people in the community already depend on this feature.

Agreed.


bq. Add a cluster parameter to existing Thrift interfaces. This sounds like the
best option to me. I think Thrift supports API evolution via default values for
missing parameters, but setting a default value in this case may be a little
tricky.

Agreed.
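One way to read the concern about default values: a default written into the
interface definition is a fixed constant, while the right cluster depends on
the caller's session. The sketch below makes the new parameter optional and
resolves a missing value per session at call time. `MetastoreClient` and
`get_partition_location` are hypothetical and do not correspond to Hive's
actual Thrift client.

```python
from typing import Dict, Optional, Tuple


class MetastoreClient:
    """Toy stand-in for a metastore client; not Hive's real Thrift client."""

    def __init__(self, session_cluster: str,
                 locations: Dict[Tuple[str, str, str], str]):
        self.session_cluster = session_cluster
        # (table, partition, cluster) -> data location in that cluster.
        self.locations = locations

    def get_partition_location(self, table: str, partition: str,
                               cluster: Optional[str] = None) -> str:
        # Callers written before the parameter existed omit `cluster`.
        # A fixed default would be wrong for sessions in other clusters,
        # so None is resolved against the caller's session instead.
        effective = cluster if cluster is not None else self.session_cluster
        return self.locations[(table, partition, effective)]
```

Old callers keep working unchanged, and new callers can still name a cluster
explicitly.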

bq. Also, instead of modifying the Thrift interface, is it possible that you
could instead leverage the work that's being done in HIVE-2720?

Will look into it.
                
> support hive table/partitions coexistes in more than one clusters
> -----------------------------------------------------------------
>
>                 Key: HIVE-2612
>                 URL: https://issues.apache.org/jira/browse/HIVE-2612
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore
>            Reporter: He Yongqiang
>            Assignee: Namit Jain
>
> 1) add cluster object into hive metastore
> 2) each partition/table has a creation cluster and a list of living clusters, 
> and also data location in each cluster

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
