[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2013-07-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706597#comment-13706597
 ] 

Edward Capriolo commented on HIVE-2989:
---

Did we ditch this idea? Should we close up shop?

 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.10.patch.txt, HIVE-2989.1.patch.txt, 
 HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, 
 HIVE-2989.5.patch.txt, HIVE-2989.6.patch.txt, HIVE-2989.9.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h

 This will add Table Links to Hive. This will be an alternate mechanism for a 
 user to access tables and data in a database that is different from the one 
 he is associated with. This feature can be used to provide access control (if 
 access to databasename.tablename in queries and use database X is turned 
 off in conjunction).
 If db X wants to access one or more partitions from table T in db Y, the user 
 will issue:
 CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N')
 New partitions added to T will automatically be added to the link as well and 
 become available to X. However, if the link is specified to be static, that 
 will not be the case. The X user will then have to explicitly import each 
 partition of T that he needs. The command above will not actually make any 
 existing partitions of T available to X. Instead, we provide the following 
 command to add an existing partition to a link:
 ALTER LINK T@Y ADD PARTITION (ds='2012-04-27')
 The user will need to execute the above for each existing partition that 
 needs to be imported. For future partitions, Hive will take care of this. An 
 imported partition can be dropped from a link using a similar command. We 
 just specify DROP instead of ADD. For querying the linked table, the X 
 user will refer to it as T@Y. Link Tables will only have read access and not 
 be writable. The entire Table Link along with all its imported partitions can 
 be dropped as follows:
 DROP LINK TO T@Y
 The above commands are purely MetaStore operations. The implementation will 
 rely on replicating the entire partition metadata when a partition is added 
 to a link. For every link that is created, we will add a new row to table 
 TBLS. The TBL_TYPE column will have a new kind of value LINK_TABLE (or 
 STATIC_LINK_TABLE if the link has been specified as static). A new column 
 LINK_TBL_ID will be added which will contain the id of the imported table. It 
 will be NULL for all other table types including the regular managed tables. 
 When a partition is added to a link, the new row in the table PARTITIONS will 
 point to the LINK_TABLE in the same database and not the master table in the 
 other database. We will replicate all the metadata for this partition from 
 the master database. The advantage of this approach is that fewer changes 
 will be needed in query processing and DDL for LINK_TABLEs. Also, commands 
 like SHOW TABLES and SHOW PARTITIONS will work as expected for 
 LINK_TABLEs too. Of course, even though the metadata is not shared, the 
 underlying data on disk is still shared. Hive still needs to know that when 
 dropping a partition which belongs to a LINK_TABLE, it should not drop the 
 underlying data from HDFS. Views and external tables cannot be imported from 
 one database to another.
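
 The metastore behaviour described above can be sketched roughly as follows. 
 This is only a toy in-memory model in Python, not the code from the attached 
 patches: TBLS, PARTITIONS, TBL_TYPE, LINK_TBL_ID, LINK_TABLE and 
 STATIC_LINK_TABLE come from the description, while the class names, method 
 names, the MANAGED_TABLE label used here and the location paths are 
 assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class TableRow:                        # one row of TBLS
    tbl_id: int
    db: str
    name: str
    tbl_type: str                      # MANAGED_TABLE, LINK_TABLE or STATIC_LINK_TABLE
    link_tbl_id: Optional[int] = None  # NULL (None) unless the row is a link


@dataclass
class PartitionRow:                    # one row of PARTITIONS
    tbl_id: int                        # owning row in TBLS
    spec: str                          # e.g. "ds=2012-04-27"
    location: str                      # hypothetical path of the data on disk


class ToyMetastore:
    def __init__(self) -> None:
        self.tbls: Dict[int, TableRow] = {}
        self.partitions: List[PartitionRow] = []
        self.files: Set[str] = set()   # stands in for HDFS

    def _new_row(self, db, name, tbl_type, link_tbl_id=None) -> int:
        tbl_id = len(self.tbls) + 1
        self.tbls[tbl_id] = TableRow(tbl_id, db, name, tbl_type, link_tbl_id)
        return tbl_id

    def create_table(self, db: str, name: str) -> int:
        return self._new_row(db, name, "MANAGED_TABLE")

    def create_link(self, db: str, target_id: int, static: bool = False) -> int:
        # CREATE [STATIC] LINK TO T@Y: adds a TBLS row, imports no partitions.
        target = self.tbls[target_id]
        kind = "STATIC_LINK_TABLE" if static else "LINK_TABLE"
        return self._new_row(db, f"{target.name}@{target.db}", kind, target_id)

    def add_partition(self, tbl_id: int, spec: str) -> None:
        # New partition on the master table; non-static links receive a copy
        # of the partition metadata pointing at the same location.
        t = self.tbls[tbl_id]
        location = f"/{t.db}/{t.name}/{spec}"
        self.files.add(location)
        self.partitions.append(PartitionRow(tbl_id, spec, location))
        for row in list(self.tbls.values()):
            if row.link_tbl_id == tbl_id and row.tbl_type == "LINK_TABLE":
                self.partitions.append(PartitionRow(row.tbl_id, spec, location))

    def alter_link_add_partition(self, link_id: int, spec: str) -> None:
        # ALTER LINK T@Y ADD PARTITION (...): replicate existing metadata.
        link = self.tbls[link_id]
        src = next(p for p in self.partitions
                   if p.tbl_id == link.link_tbl_id and p.spec == spec)
        self.partitions.append(PartitionRow(link_id, spec, src.location))

    def drop_partition(self, tbl_id: int, spec: str) -> None:
        row = next(p for p in self.partitions
                   if p.tbl_id == tbl_id and p.spec == spec)
        self.partitions.remove(row)
        # Only the master table owns the data; links never delete files.
        if self.tbls[tbl_id].tbl_type == "MANAGED_TABLE":
            self.files.discard(row.location)

    def show_partitions(self, tbl_id: int) -> List[str]:
        return sorted(p.spec for p in self.partitions if p.tbl_id == tbl_id)
```

 Because each link partition is its own PARTITIONS row owned by the link's 
 TBLS row, a listing such as show_partitions needs no link-specific logic, 
 which is the advantage the description claims for replicating metadata. The 
 sketch does not model what should happen to a link when the master table or 
 one of its partitions is dropped.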
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2013-07-11 Thread Bhushan Mandhani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706598#comment-13706598
 ] 

Bhushan Mandhani commented on HIVE-2989:


Hi, Bhushan Mandhani is no longer at Facebook so this email address is no 
longer being monitored. If you need assistance, please contact another person 
who is currently at the company.



[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-12 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293841#comment-13293841
 ] 

Namit Jain commented on HIVE-2989:
--

@Carl, We haven't seen any comments from you. All the follow-up jiras have been 
filed.
Let me know if you have any comments on this.


[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-12 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293931#comment-13293931
 ] 

Carl Steinbach commented on HIVE-2989:
--

I'm working on some comments. Will post them later today after the Hive meetup 
is over. Thanks.


[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-11 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293200#comment-13293200
 ] 

Carl Steinbach commented on HIVE-2989:
--

@Bhushan: I think HIVE-3114 (Split Thrift interface for TableLink creation) 
should be done in this patch instead of splitting it out into a followup 
ticket. Here's what I said in HIVE-3114:

bq. I'm concerned that the Metastore Thrift interface is one of Hive's de facto 
public APIs, and any new functionality that appears in a release will need to 
be supported going forward. Why not just fix this in HIVE-2989 and eliminate 
the possibility that we're going to get stuck with an interface that we already 
know is broken?



[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-11 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293201#comment-13293201
 ] 

Carl Steinbach commented on HIVE-2989:
--

@Bhushan: I'll look over the rest of the patch later tonight. Thanks.


[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-06 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290792#comment-13290792
 ] 

Carl Steinbach commented on HIVE-2989:
--

bq. The semantics of rename table and drop table can be similar. Both can fail 
or drop/rename the link depending on cascade etc. This can be done in a 
follow-up.

I'm fine with doing this in a followup as long as the semantics and expected 
behavior are described in the design doc. Right now it doesn't look like 
they're defined anywhere.


[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-05 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289815#comment-13289815
 ] 

Carl Steinbach commented on HIVE-2989:
--

@Bhushan: I added more code review comments here 
https://reviews.facebook.net/D3405


 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, 
 HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, 
 HIVE-2989.6.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h

 This will add Table Links to Hive. This will be an alternate mechanism for a 
 user to access tables and data in a database that is different from the one 
 he is associated with. This feature can be used to provide access control (if 
 access to databasename.tablename in queries and use database X is turned 
 off in conjunction).
 If db X wants to access one or more partitions from table T in db Y, the user 
 will issue:
 CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N')
 New partitions added to T will automatically be added to the link as well and 
 become available to X. However, if the link is specified to be static, that 
 will not be the case. The X user will then have to explicitly import each 
 partition of T that he needs. The command above will not actually make any 
 existing partitions of T available to X. Instead, we provide the following 
 command to add an existing partition to a link:
 ALTER LINK T@Y ADD PARTITION (ds='2012-04-27')
 The user will need to execute the above for each existing partition that 
 needs to be imported. For future partitions, Hive will take care of this. An 
 imported partition can be dropped from a link using a similar command. We 
 just specify DROP instead of ADD. For querying the linked table, the X 
 user will refer to it as T@Y. Link Tables will only have read access and not 
 be writable. The entire Table Link alongwith all its imported partitions can 
 be dropped as follows:
 DROP LINK TO T@Y
 The above commands are purely MetaStore operations. The implementation will 
 rely on replicating the entire partition metadata when a partition is added 
 to a link.  For every link that is created, we will add a new row to table 
 TBLS. The TBL_TYPE column will have a new kind of value LINK_TABLE (or 
 STATIC_LINK_TABLE if the link has been specified as static). A new column 
 LINK_TBL_ID will be added which will contain the id of the imported table. It 
 will be NULL for all other table types including the regular managed tables. 
 When a partition is added to a link, the new row in the table PARTITIONS will 
 point to the LINK_TABLE in the same database  and not the master table in the 
 other database. We will replicate all the metadata for this partition from 
 the master database. The advantage of this approach is that fewer changes 
 will be needed in query processing and DDL for LINK_TABLEs. Also, commands 
 like SHOW TABLES and SHOW PARTITIONS will work as expected for 
 LINK_TABLEs too. Of course, even though the metadata is not shared, the 
 underlying data on disk is still shared. Hive still needs to know that when 
 dropping a partition which belongs to a LINK_TABLE, it should not drop the 
 underlying data from HDFS. Views and external tables cannot be imported from 
 one database to another.
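
The metastore behavior described above can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not the code in the attached patches: the class ToyMetastore, its method names, and the deleted_paths list (standing in for HDFS deletes) are hypothetical. Only the TBLS and PARTITIONS tables, the TBL_TYPE values, and the LINK_TBL_ID column come from the description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

LINK_TYPES = {"LINK_TABLE", "STATIC_LINK_TABLE"}


@dataclass
class TableRow:
    """One row of TBLS, reduced to the columns the description mentions."""
    tbl_id: int
    db: str
    name: str
    tbl_type: str
    link_tbl_id: Optional[int] = None  # NULL for every non-link table type


@dataclass
class PartitionRow:
    """One row of PARTITIONS: owning table, partition spec, data location."""
    tbl_id: int
    spec: str
    location: str


class ToyMetastore:
    """Hypothetical stand-in for the metastore; not a Hive API."""

    def __init__(self) -> None:
        self.tbls: Dict[int, TableRow] = {}
        self.partitions: List[PartitionRow] = []
        self.deleted_paths: List[str] = []  # stands in for HDFS deletes

    def _new_row(self, db, name, tbl_type, link_tbl_id=None) -> int:
        tbl_id = len(self.tbls) + 1
        self.tbls[tbl_id] = TableRow(tbl_id, db, name, tbl_type, link_tbl_id)
        return tbl_id

    def create_table(self, db: str, name: str) -> int:
        return self._new_row(db, name, "MANAGED_TABLE")

    def create_link(self, db: str, target_id: int, static: bool = False) -> int:
        # CREATE [STATIC] LINK TO T@Y: a new TBLS row, no partitions imported.
        target = self.tbls[target_id]
        if target.tbl_type != "MANAGED_TABLE":
            raise ValueError("only managed tables can be linked in this sketch")
        tbl_type = "STATIC_LINK_TABLE" if static else "LINK_TABLE"
        return self._new_row(db, f"{target.name}@{target.db}", tbl_type, target_id)

    def add_partition(self, tbl_id: int, spec: str, location: str) -> None:
        # New partitions of the target are replicated to non-static links.
        self.partitions.append(PartitionRow(tbl_id, spec, location))
        for row in list(self.tbls.values()):
            if row.link_tbl_id == tbl_id and row.tbl_type == "LINK_TABLE":
                self.partitions.append(PartitionRow(row.tbl_id, spec, location))

    def alter_link_add_partition(self, link_id: int, spec: str) -> None:
        # ALTER LINK T@Y ADD PARTITION (...): copy the target's metadata.
        link = self.tbls[link_id]
        source = next(p for p in self.partitions
                      if p.tbl_id == link.link_tbl_id and p.spec == spec)
        self.partitions.append(PartitionRow(link_id, source.spec, source.location))

    def drop_partition(self, tbl_id: int, spec: str) -> None:
        part = next(p for p in self.partitions
                    if p.tbl_id == tbl_id and p.spec == spec)
        self.partitions.remove(part)
        # Data on disk is shared, so only a non-link table may delete it.
        if self.tbls[tbl_id].tbl_type not in LINK_TYPES:
            self.deleted_paths.append(part.location)

    def show_partitions(self, tbl_id: int) -> List[str]:
        return [p.spec for p in self.partitions if p.tbl_id == tbl_id]
```

Because each link owns its own PARTITIONS rows, show_partitions needs no link-specific logic, which is the advantage the description claims for replicating metadata.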
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-05 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289887#comment-13289887
 ] 

Carl Steinbach commented on HIVE-2989:
--

@Bhushan/Namit: Any thoughts on this?
{quote}
I think it would be a good idea to require the user to name the table link 
(using a valid SQL identifier) instead of tying the name of the link to the 
table/db it points to. Table links (like views) can provide a useful level of 
indirection, but we lose that if the name of the link has to map directly to the 
target table. For example, suppose table t1 exists in the default db, and the 
table link for t1 is created in database db1. Suppose at some later point t1 is 
moved to a different db, or that the name of t1 is changed. With the current 
implementation we would then also have to change the name of the link, which 
would require us to also change any applications or scripts that refer to this 
link. With the other approach we would only need to ALTER the details of the 
link but would be able to leave the name of the link unchanged.
{quote}

I'll add to this that using the tab_name@db_name syntax also requires us to 
weaken the grammar and add lots of special-case logic to the methods that 
validate table names.
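
To make the special-casing concern concrete, here is a minimal sketch of a name validator that accepts tab_name@db_name only when links are allowed. It is not the MetaStoreUtils code from the patch: the function names and the allow_links flag are hypothetical, and the plain-identifier rule (word characters only) is an assumption about what ordinary table names look like.

```python
import re
from typing import Optional, Tuple

# Assumed rule for ordinary identifiers: letters, digits and underscores.
IDENTIFIER = re.compile(r"\w+", re.ASCII)


def is_valid_identifier(name: str) -> bool:
    return IDENTIFIER.fullmatch(name) is not None


def parse_link_name(name: str) -> Optional[Tuple[str, str]]:
    """Split 'table@db' into (table, db), or return None if it is not a link name."""
    table, sep, db = name.partition("@")
    if sep and is_valid_identifier(table) and is_valid_identifier(db):
        return table, db
    return None


def is_valid_table_name(name: str, allow_links: bool) -> bool:
    """Every caller now has to decide whether link names are acceptable."""
    if is_valid_identifier(name):
        return True
    return allow_links and parse_link_name(name) is not None
```

The extra allow_links decision at each call site is the kind of special-case logic the comment above refers to.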

 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, 
 HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, 
 HIVE-2989.6.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h


[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-05 Thread Sambavi Muthukrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289890#comment-13289890
 ] 

Sambavi Muthukrishnan commented on HIVE-2989:
-

Is there a command to move a table to another db? I don't see any command that 
lets you do that directly. Given this, if you are making a copy of the table, 
shouldn't the links ideally be set up all over again? Application code will 
need to change if that happens, but how likely is this sort of rename?

While I agree that using names without the @ syntax is nicer, I think having 
the @ syntax guarantees a nice naming convention to know what the table is 
pointing to instead of an opaque name from the user's perspective.







[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-05 Thread Bhushan Mandhani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289891#comment-13289891
 ] 

Bhushan Mandhani commented on HIVE-2989:


Carl, we discussed this among us and strongly believe the user should be able 
to refer to the target table in his queries without having to think about "What 
do I call this table?" Briefly, we were even considering the conventional X.Y 
access, but that had other issues and we wanted to disable that syntax 
completely in our system. I have made the changes you wanted for name 
validation in MetaStoreUtils and there is minimal special-case logic there. You 
can look at it when I update the diff.





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-05 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289894#comment-13289894
 ] 

Carl Steinbach commented on HIVE-2989:
--

bq. we discussed this among us and strongly believe the user should be able to 
refer to the target table in his queries without having to think about "What do 
I call this table?"

If coming up with names for table links is a huge burden then it should be easy 
to provide a default name (e.g. 'tabname_at_dbname').

bq. Is there a command to move a table to another db? I don't see any command 
that lets you do that directly. Given this, if you are making a copy of the 
table, shouldn't the links ideally be set up all over again? Application code 
will need to change if that happens, but how likely is this sort of rename?

No, there's no command that lets you do that today, but changing the name of a 
table is supported 
(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RenameTable),
 and the same issues apply there. What's the expected behavior if I change the 
name of a target table?






[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-05 Thread Bhushan Mandhani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289897#comment-13289897
 ] 

Bhushan Mandhani commented on HIVE-2989:


One criterion was that the user should not be able to create a Managed Table 
with a name that looks like the name of a Table Link. If user A creates such a 
table with name X_at_Y, and user B comes along and tries to create a link to 
X@Y, the command will fail. Worse, user B could see that the table X_at_Y exists 
and query it, assuming it is a link to the table X he is trying to query. Also, 
the X@Y syntax is already being used by Oracle for its Database Links, so we 
decided to use the same syntax. When you change the name of a target table, the 
names of the Links should be updated to reflect that. Sambavi will do that in 
the Alter Table patch.
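
The collision argument can be checked with a few lines. This is a sketch under assumptions: the helper names are hypothetical, the 'tabname_at_dbname' default is the one suggested earlier in the thread, and ordinary table names are assumed to consist of word characters only.

```python
import re

# Assumed rule for ordinary (managed) table names.
IDENTIFIER = re.compile(r"\w+", re.ASCII)


def default_link_name(table: str, db: str) -> str:
    """The 'tabname_at_dbname' default proposed as an alternative."""
    return f"{table}_at_{db}"


def at_link_name(table: str, db: str) -> str:
    """The T@Y naming used by the current patch."""
    return f"{table}@{db}"


def can_be_managed_table_name(name: str) -> bool:
    return IDENTIFIER.fullmatch(name) is not None


# A managed table may legally be called X_at_Y, so it can shadow the link.
print(can_be_managed_table_name(default_link_name("X", "Y")))
# '@' is not an identifier character, so no managed table can be called X@Y.
print(can_be_managed_table_name(at_link_name("X", "Y")))
# The '_at_' form is also ambiguous: two different targets share one name.
print(default_link_name("a", "b_at_c") == default_link_name("a_at_b", "c"))
```

The flip side, raised earlier in the thread, is that reserving '@' for links is exactly what forces the grammar and the name validators to treat link names specially.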





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-05 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289955#comment-13289955
 ] 

Namit Jain commented on HIVE-2989:
--

The semantics of rename table and drop table can be similar.
Both can fail or drop/rename the link depending on cascade etc.
This can be done in a follow-up.
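
One way to read "fail or drop/rename the link depending on cascade" is sketched below. Nothing here is from the patch or from Hive: ToyCatalog, LinkedTableError and the cascade parameter are hypothetical names used only to illustrate the two behaviors.

```python
from typing import Dict, List, Set, Tuple

Target = Tuple[str, str]    # (table, db)
LinkKey = Tuple[str, str]   # (owning db, link name)


class LinkedTableError(Exception):
    """Raised when a table with links is dropped or renamed without cascade."""


class ToyCatalog:
    def __init__(self) -> None:
        self.tables: Set[Target] = set()
        self.links: Dict[LinkKey, Target] = {}

    def create_table(self, table: str, db: str) -> None:
        self.tables.add((table, db))

    def create_link(self, owning_db: str, table: str, db: str) -> None:
        if (table, db) not in self.tables:
            raise KeyError(f"no such table: {table}@{db}")
        self.links[(owning_db, f"{table}@{db}")] = (table, db)

    def links_to(self, table: str, db: str) -> List[LinkKey]:
        """List every link that points at the given target table."""
        return [key for key, target in self.links.items() if target == (table, db)]

    def drop_table(self, table: str, db: str, cascade: bool = False) -> None:
        dependents = self.links_to(table, db)
        if dependents and not cascade:
            raise LinkedTableError(f"{table}@{db} still has links: {dependents}")
        for key in dependents:          # cascade: drop the links as well
            del self.links[key]
        self.tables.remove((table, db))

    def rename_table(self, table: str, db: str, new_name: str,
                     cascade: bool = False) -> None:
        dependents = self.links_to(table, db)
        if dependents and not cascade:
            raise LinkedTableError(f"{table}@{db} still has links: {dependents}")
        self.tables.remove((table, db))
        self.tables.add((new_name, db))
        for owning_db, old_name in dependents:   # cascade: rename the links too
            del self.links[(owning_db, old_name)]
            self.links[(owning_db, f"{new_name}@{db}")] = (new_name, db)
```

With T@Y naming, a cascading rename necessarily changes the link name that applications refer to, which is the concern raised about tying the link name to its target.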





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-04 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289002#comment-13289002
 ] 

Carl Steinbach commented on HIVE-2989:
--

Is this behavior expected with the current version of the patch?

{noformat}
hive> show tables;
show tables;
OK
target@default
Time taken: 0.076 seconds
hive> SELECT * FROM target@default;
SELECT * FROM target@default;
FAILED: ParseException line 1:20 mismatched input '@' expecting EOF near 
'target'

hive> select target;
select target;
FAILED: ParseException line 1:7 mismatched input 'EOF' expecting FROM near 
'target' in from clause

hive> SELECT * FROM `target@default`;
SELECT * FROM `target@default`;
FAILED: ParseException line 1:21 mismatched character '@' expecting '`'
line 1:30 required (...)+ loop did not match anything at character 'EOF'
{noformat}






[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-04 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289040#comment-13289040
 ] 

Carl Steinbach commented on HIVE-2989:
--

# I think it would be a good idea to require the user to name the table link 
(using a valid SQL identifier) instead of tying the name of the link to the 
table/db it points to. Table links (like views) can provide a useful level of 
indirection, but we lose that if the name of the link has to map directly to the 
target table. For example, suppose table t1 exists in the default db, and the 
table link for t1 is created in database db1. Suppose at some later point t1 is 
moved to a different db, or that the name of t1 is changed. With the current 
implementation we would then also have to change the name of the link, which 
would require us to also change any applications or scripts that refer to this 
link. With the other approach we would only need to ALTER the details of the 
link but would be able to leave the name of the link unchanged.
# Given a table t, is there a way to print out the list of table links that 
point to t?
# What is the expected behavior if I create a link to target@default and then 
drop target?
## There are a bunch of variations on this same question, e.g. what happens if 
I change the name of the target table, or if I add columns to the target table, 
etc. There should be test coverage for each one of these cases.
# What is the expected behavior if a user creates a link to target@default, and 
then target is subsequently dropped and then created again by another user? 
Should the link continue to work? If so, doesn't this create a potential 
security vulnerability?
# It looks like there's currently no difference in the way DYNAMIC and STATIC 
table links behave. If that's true then I think we should remove this from the 
grammar and metadata in order to avoid confusion.
# DESCRIBE FORMATTED needs to be updated to print the linkTarget and linkTables 
fields.
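
To make points 1 and 2 concrete, here is a minimal sketch of a link registry keyed by a user-chosen link name rather than by the target's name. It is only an illustration under stated assumptions: the dictionary layout and the function names (create_link, alter_link_target, resolve, links_pointing_to) are hypothetical and do not come from the patch or from Hive.

```python
# Hypothetical sketch: none of these names come from the HIVE-2989 patch.
links = {}  # (link_db, link_name) -> (target_db, target_table)


def create_link(link_db, link_name, target_db, target_table):
    """Register a link under a user-chosen name."""
    links[(link_db, link_name)] = (target_db, target_table)


def alter_link_target(link_db, link_name, target_db, target_table):
    """Repoint an existing link; the name that callers use does not change."""
    if (link_db, link_name) not in links:
        raise KeyError(f"no such link: {link_db}.{link_name}")
    links[(link_db, link_name)] = (target_db, target_table)


def resolve(link_db, link_name):
    """Return the (db, table) the link currently points to."""
    return links[(link_db, link_name)]


def links_pointing_to(target_db, target_table):
    """List every link whose target is the given table (point 2 above)."""
    return sorted(k for k, v in links.items() if v == (target_db, target_table))


create_link("db1", "t1_link", "default", "t1")
# t1 is later moved to database "archive" and renamed "t1_old".
alter_link_target("db1", "t1_link", "archive", "t1_old")
```

Scripts that refer to db1.t1_link keep working after the move, because only the stored target changed. The reverse lookup is a plain scan here; a real metastore would presumably index it.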



[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-04 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289049#comment-13289049
 ] 

Carl Steinbach commented on HIVE-2989:
--

This looks like a bug:

{noformat}
hive> CREATE TABLE target (x INT);

hive> DESCRIBE FORMATTED target;
Location:   file:/user/hive/warehouse/target

hive> use tmpdb;

hive> CREATE TABLELINK TO target@default;

hive> DESCRIBE FORMATTED target@default;
Location:   file:/user/hive/warehouse/target

hive> use default;

hive> ALTER TABLE target SET LOCATION 'file:/BOGUS_PATH';

hive> DESCRIBE FORMATTED target;
Location:   file:/BOGUS_PATH

hive> use tmpdb;

hive> DESCRIBE FORMATTED target@default;
Location:   file:/user/hive/warehouse/target
{noformat}
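
The transcript above can be modelled with a toy in-memory metastore in which a link keeps its own copy of the target's location. This is a sketch only: ToyMetastore, its methods, and the propagate flag are hypothetical and are not Hive metastore APIs. The flag shows one possible remedy (pushing ALTER TABLE changes to links), not the behavior of the patch.

```python
# Toy model of the transcript above. ToyMetastore and its methods are
# hypothetical illustrations, not Hive metastore APIs.
from dataclasses import dataclass, field


@dataclass
class ToyMetastore:
    # Maps (db, table) -> location for regular tables.
    tables: dict = field(default_factory=dict)
    # Maps (link_db, target_table, target_db) -> location copied at link creation.
    links: dict = field(default_factory=dict)

    def create_table(self, db, name, location):
        self.tables[(db, name)] = location

    def create_link(self, link_db, target_name, target_db):
        # The link stores its own copy of the target's metadata.
        self.links[(link_db, target_name, target_db)] = self.tables[(target_db, target_name)]

    def alter_location(self, db, name, location, propagate=False):
        self.tables[(db, name)] = location
        if propagate:
            # Push the change to every link that points at this table.
            for key in self.links:
                if key[1:] == (name, db):
                    self.links[key] = location


ms = ToyMetastore()
ms.create_table("default", "target", "file:/user/hive/warehouse/target")
ms.create_link("tmpdb", "target", "default")

# Without propagation the link keeps the old location, as in the transcript.
ms.alter_location("default", "target", "file:/BOGUS_PATH")
stale = ms.links[("tmpdb", "target", "default")]

# With propagation the link follows the target.
ms.alter_location("default", "target", "file:/NEW_PATH", propagate=True)
fresh = ms.links[("tmpdb", "target", "default")]
```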






[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-04 Thread Bhushan Mandhani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289056#comment-13289056
 ] 

Bhushan Mandhani commented on HIVE-2989:


It is not a bug. It is currently being implemented by Sambavi. She is handling 
Alter Link and Alter Table commands. This particular patch contains Create, 
Desc and Drop. Sambavi is working on another patch that will do the two Alter 
commands. I will do querying in a follow-up patch of my own. There is a lot of 
functionality here that needs to be built out. We are planning to do this in 
multiple patches. We would like to get this one in first because everything 
else will build on top of this. We have thought about all these questions you 
are bringing up and created the corresponding tasks. I'll update the design doc 
with this info. Sambavi and I will create the corresponding Jiras as well.





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-04 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289065#comment-13289065
 ] 

Carl Steinbach commented on HIVE-2989:
--

Following up on (3) and (4) from above, I think the TableIdentifier structure 
that this patch adds is incomplete. Right now TableIdentifier is defined as

{noformat}
struct TableIdentifier {
  1: string dbName,
  2: string tableName   
}
{noformat}

The problem with this definition is that it only uniquely identifies a table if 
time is held constant. If you allow time to vary then it can refer to many 
different tables over the lifespan of the warehouse. One way to resolve this 
problem is to modify the definition as follows:

{noformat}
struct TableIdentifier {
  1: string dbName,
  2: string tableName,
  3: string owner,
  4: i32 createTime
}
{noformat}
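
The difference between the two definitions can be sketched as follows. The Python classes only mirror the two Thrift structs above; the names ShortId and FullId, the users alice and mallory, and the timestamps are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShortId:
    """Mirrors the two-field TableIdentifier."""
    db_name: str
    table_name: str


@dataclass(frozen=True)
class FullId:
    """Mirrors the proposed four-field TableIdentifier."""
    db_name: str
    table_name: str
    owner: str
    create_time: int


# Identifier recorded by a link while alice owns default.target (created at t=100).
link_short = ShortId("default", "target")
link_full = FullId("default", "target", "alice", 100)

# The table is dropped, then recreated under the same name by mallory at t=200.
recreated_short = ShortId("default", "target")
recreated_full = FullId("default", "target", "mallory", 200)

# The short identifier cannot tell the two tables apart; the full one can.
short_still_matches = link_short == recreated_short
full_still_matches = link_full == recreated_full
```

With the two-field form the old link would silently resolve to the new table; with owner and createTime included, the mismatch is detectable.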






[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-04 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289069#comment-13289069
 ] 

Carl Steinbach commented on HIVE-2989:
--

bq. It is not a bug. It is currently being implemented by Sambavi. She is 
handling Alter Link and Alter Table commands.

How is this not a bug? What is the expected behavior?





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-04 Thread Bhushan Mandhani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289072#comment-13289072
 ] 

Bhushan Mandhani commented on HIVE-2989:


When a target table is dropped, all links pointing to it will automatically get 
dropped as well. That task is on my plate. I was planning to do that after 
querying. When someone comes along and creates a new target table, there is no 
link pointing to it. So there is no security vulnerability. I don't think we 
need to change struct TableIdentifier.





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-04 Thread Bhushan Mandhani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289074#comment-13289074
 ] 

Bhushan Mandhani commented on HIVE-2989:


Alter Table will be modified so that metadata changes made to the target table 
will propagate to the link table if appropriate. That is the expected behavior.





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-04 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289083#comment-13289083
 ] 

Carl Steinbach commented on HIVE-2989:
--

bq. This particular patch contains Create, Desc and Drop. Sambavi is working on 
another patch that will do the two Alter commands.

The subject of this ticket is "Adding Table Links to Hive". If your goal with 
this patch is only to add a small subset of the overall Table Link 
functionality then please update the ticket's subject and description to 
accurately reflect that. Otherwise we will run into a lot of problems since 
users will see the ticket's description in the release notes and assume that 
the feature is complete and ready for use (this has happened before, e.g. with 
indexes and authorization).

bq. When someone comes along and creates a new target table, there is no link 
pointing to it. So there is no security vulnerability.

But if we commit this patch on its own then there is a security vulnerability, 
right?

bq. When a target table is dropped, all links pointing to it will automatically 
get dropped as well.

That seems kind of dangerous. Perhaps users should be forced to use a command 
like DROP TABLE x CASCADE LINKS in cases like this?
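
The difference between the two behaviours discussed above can be sketched in a few lines of Python. This is only an illustration: `drop_table`, `LinksExistError`, the `cascade_links` flag and the name-keyed registry are hypothetical, and DROP TABLE x CASCADE LINKS is a suggestion made in this comment, not existing Hive syntax.

```python
from typing import Dict, List, Set


class LinksExistError(Exception):
    """Raised when a table still has links and no cascade was requested."""


def drop_table(links_by_target: Dict[str, Set[str]], table: str,
               cascade_links: bool = False) -> List[str]:
    """Drop `table` and return the names of the links dropped with it.

    links_by_target is a hypothetical registry mapping a target table name
    to the names of the links that point at it.
    """
    links = links_by_target.get(table, set())
    if links and not cascade_links:
        # Explicit behaviour: refuse instead of silently removing the links.
        raise LinksExistError(f"{table} has {len(links)} link(s); use CASCADE LINKS")
    links_by_target.pop(table, None)
    return sorted(links)
```

With the default `cascade_links=False`, dropping a table that still has links fails loudly; the implicit behaviour quoted above corresponds to always passing `cascade_links=True`.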


 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, 
 HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, 
 HIVE-2989.6.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h

 This will add Table Links to Hive. This will be an alternate mechanism for a 
 user to access tables and data in a database that is different from the one 
 he is associated with. This feature can be used to provide access control (if 
 access to databasename.tablename in queries and use database X is turned 
 off in conjunction).
 If db X wants to access one or more partitions from table T in db Y, the user 
 will issue:
 CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N')
 New partitions added to T will automatically be added to the link as well and 
 become available to X. However, if the link is specified to be static, that 
 will not be the case. The X user will then have to explicitly import each 
 partition of T that he needs. The command above will not actually make any 
 existing partitions of T available to X. Instead, we provide the following 
 command to add an existing partition to a link:
 ALTER LINK T@Y ADD PARTITION (ds='2012-04-27')
 The user will need to execute the above for each existing partition that 
 needs to be imported. For future partitions, Hive will take care of this. An 
 imported partition can be dropped from a link using a similar command. We 
 just specify DROP instead of ADD. For querying the linked table, the X 
 user will refer to it as T@Y. Link Tables will only have read access and not 
 be writable. The entire Table Link along with all its imported partitions can 
 be dropped as follows:
 DROP LINK TO T@Y
 The above commands are purely MetaStore operations. The implementation will 
 rely on replicating the entire partition metadata when a partition is added 
 to a link.  For every link that is created, we will add a new row to table 
 TBLS. The TBL_TYPE column will have a new kind of value LINK_TABLE (or 
 STATIC_LINK_TABLE if the link has been specified as static). A new column 
 LINK_TBL_ID will be added which will contain the id of the imported table. It 
 will be NULL for all other table types including the regular managed tables. 
 When a partition is added to a link, the new row in the table PARTITIONS will 
 point to the LINK_TABLE in the same database and not the master table in the 
 other database. We will replicate all the metadata for this partition from 
 the master database. The advantage of this approach is that fewer changes 
 will be needed in query processing and DDL for LINK_TABLEs. Also, commands 
 like SHOW TABLES and SHOW PARTITIONS will work as expected for 
 LINK_TABLEs too. Of course, even though the metadata is not shared, the 
 underlying data on disk is still shared. Hive still needs to know that when 
 dropping a partition which belongs to a LINK_TABLE, it should not drop the 
 underlying data from HDFS. Views and external tables cannot be imported from 
 one database to another.
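
The metadata model described above can be sketched as a small in-memory stand-in for the TBLS and PARTITIONS tables. This is a minimal illustration under assumptions, not Hive's metastore schema or API: the `Metastore` class, its method names, the "MANAGED_TABLE" type string and the `/warehouse/...` path layout are all hypothetical; only TBL_TYPE, LINK_TBL_ID, LINK_TABLE and STATIC_LINK_TABLE come from the description.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class TableRow:
    """One row of TBLS, reduced to the columns the description mentions."""
    tbl_id: int
    db: str
    name: str
    tbl_type: str                      # "MANAGED_TABLE", "LINK_TABLE" or "STATIC_LINK_TABLE"
    link_tbl_id: Optional[int] = None  # id of the imported table; None (NULL) otherwise


@dataclass
class PartitionRow:
    """One row of PARTITIONS; tbl_id points at the owning TBLS row."""
    tbl_id: int
    spec: str
    location: str  # data path; a link's copy reuses the master's path


class Metastore:
    def __init__(self) -> None:
        self.tbls: Dict[int, TableRow] = {}
        self.partitions: Dict[Tuple[int, str], PartitionRow] = {}
        self.paths: Set[str] = set()  # stands in for directories on HDFS
        self._next_id = 1

    def _new_table(self, db, name, tbl_type, link_tbl_id=None) -> TableRow:
        row = TableRow(self._next_id, db, name, tbl_type, link_tbl_id)
        self.tbls[row.tbl_id] = row
        self._next_id += 1
        return row

    def create_table(self, db: str, name: str) -> TableRow:
        return self._new_table(db, name, "MANAGED_TABLE")

    def create_link(self, db: str, target_id: int, static: bool = False) -> TableRow:
        # CREATE [STATIC] LINK TO T@Y: a new TBLS row, no partitions imported yet.
        target = self.tbls[target_id]
        tbl_type = "STATIC_LINK_TABLE" if static else "LINK_TABLE"
        return self._new_table(db, f"{target.name}@{target.db}", tbl_type, target_id)

    def add_partition(self, tbl_id: int, spec: str) -> None:
        # New partition on the master table; non-static links pick it up.
        table = self.tbls[tbl_id]
        location = f"/warehouse/{table.db}.db/{table.name}/{spec}"
        self.partitions[(tbl_id, spec)] = PartitionRow(tbl_id, spec, location)
        self.paths.add(location)
        for link in self.tbls.values():
            if link.tbl_type == "LINK_TABLE" and link.link_tbl_id == tbl_id:
                self.alter_link_add_partition(link.tbl_id, spec)

    def alter_link_add_partition(self, link_id: int, spec: str) -> None:
        # ALTER LINK T@Y ADD PARTITION (...): replicate the master's metadata
        # into a row owned by the link; the data location stays shared.
        master = self.partitions[(self.tbls[link_id].link_tbl_id, spec)]
        self.partitions[(link_id, spec)] = PartitionRow(link_id, spec, master.location)

    def drop_partition(self, tbl_id: int, spec: str) -> None:
        row = self.partitions.pop((tbl_id, spec))
        if self.tbls[tbl_id].link_tbl_id is None:
            self.paths.discard(row.location)  # only the master owns the data
```

In this sketch, dropping a partition from a link removes only the link's PARTITIONS row and leaves the shared path in place, which is the behaviour the description asks Hive to guarantee for LINK_TABLEs.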
  


[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-04 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289085#comment-13289085
 ] 

Namit Jain commented on HIVE-2989:
--

@Carl, most of the development in Hive so far has been done in an iterative 
manner.
There are many examples of features that have been checked in across multiple 
patches: indexes and views, to name a few.

This patch does not break anything existing, but it is not ready for final 
consumption. There will be a couple of follow-ups 
that will make this feature usable by everyone. 

Are you proposing a new policy that only complete features should be allowed to 
be checked in, as a single patch?
That would slow the community down significantly. Development would happen on 
multiple side branches,
and it would be very difficult to merge them back into trunk.

I don't think that has been the case for most of the development that has 
happened in the past.





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-04 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289117#comment-13289117
 ] 

Edward Capriolo commented on HIVE-2989:
---

@Namit, What I think he is suggesting is that we should break out the undone 
tasks into separate linked tickets. That way, no one assumes that 
the entire feature is complete when this ticket is done.

This is mostly a semantic debate, but I understand his position. We have done a 
better job than usual producing a wiki page with a design spec for table links. 

What tends to happen with Hive features is that the 'iterative' style produces a 
final product not exactly aligned with our initial spec. Since we deviate 
from the spec, no one knows the status or when the feature is done. Then people 
move on in life and there is no one to answer a question on the feature. 

I feel comfortable that the FB crew will produce an awesome feature, but Carl 
is justified to suggest that if we do not have at least the core tasks broken out 
into 3 or 4 jiras it might be too much "In FB we trust". 






[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-04 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289118#comment-13289118
 ] 

Edward Capriolo commented on HIVE-2989:
---

Last sentence did not make much sense. Let's break this out into multiple 
issues.





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-04 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289176#comment-13289176
 ] 

Namit Jain commented on HIVE-2989:
--

@Carl/@Edward, I agree completely. Let us file the entire list of follow-up 
jiras.
That will also help us parallelize the effort.

@Bhushan, can you please file the follow-up jiras? We can add them to the wiki as well.





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-03 Thread Bhushan Mandhani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13288340#comment-13288340
 ] 

Bhushan Mandhani commented on HIVE-2989:


@Carl It is in https://reviews.facebook.net/D3405. Thanks.





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-01 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287616#comment-13287616
 ] 

Namit Jain commented on HIVE-2989:
--

+1

Addressed all the comments on the wiki, and the review comments have also been 
addressed.

 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.8.1
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, 
 HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-01 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287643#comment-13287643
 ] 

Carl Steinbach commented on HIVE-2989:
--

-1. This patch was committed two minutes after it was marked Patch Available, 
which is unfair to the other committers. Also, there is still an ongoing 
discussion regarding the design proposal.

Please back this patch out.


 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, 
 HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h

 This will add Table Links to Hive. This will be an alternate mechanism for a 
 user to access tables and data in a database that is different from the one 
 he is associated with. This feature can be used to provide access control (if 
 access to databasename.tablename in queries and use database X is turned 
 off in conjunction).
 If db X wants to access one or more partitions from table T in db Y, the user 
 will issue:
 CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N')
 New partitions added to T will automatically be added to the link as well and 
 become available to X. However, if the link is specified to be static, that 
 will not be the case. The X user will then have to explicitly import each 
 partition of T that he needs. The command above will not actually make any 
 existing partitions of T available to X. Instead, we provide the following 
 command to add an existing partition to a link:
 ALTER LINK T@Y ADD PARTITION (ds='2012-04-27')
 The user will need to execute the above for each existing partition that 
 needs to be imported. For future partitions, Hive will take care of this. An 
 imported partition can be dropped from a link using a similar command. We 
 just specify DROP instead of ADD. For querying the linked table, the X 
 user will refer to it as T@Y. Link Tables will only have read access and not 
 be writable. The entire Table Link alongwith all its imported partitions can 
 be dropped as follows:
 DROP LINK TO T@Y
 The above commands are purely MetaStore operations. The implementation will 
 rely on replicating the entire partition metadata when a partition is added 
 to a link.  For every link that is created, we will add a new row to table 
 TBLS. The TBL_TYPE column will have a new kind of value LINK_TABLE (or 
 STATIC_LINK_TABLE if the link has been specified as static). A new column 
 LINK_TBL_ID will be added which will contain the id of the imported table. It 
 will be NULL for all other table types including the regular managed tables. 
 When a partition is added to a link, the new row in the table PARTITIONS will 
 point to the LINK_TABLE in the same database  and not the master table in the 
 other database. We will replicate all the metadata for this partition from 
 the master database. The advantage of this approach is that fewer changes 
 will be needed in query processing and DDL for LINK_TABLEs. Also, commands 
 like SHOW TABLES and SHOW PARTITIONS will work as expected for 
 LINK_TABLEs too. Of course, even though the metadata is not shared, the 
 underlying data on disk is still shared. Hive still needs to know that when 
 dropping a partition which belongs to a LINK_TABLE, it should not drop the 
 underlying data from HDFS. Views and external tables cannot be imported from 
 one database to another.
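
The metastore behaviour described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the code in the attached patches: the ToyMetastore class, its method names, and the /warehouse/... paths are hypothetical. Only the TBL_TYPE values (LINK_TABLE, STATIC_LINK_TABLE) and the LINK_TBL_ID column come from the description. The description does not say what happens to a link when the master table drops a partition, so the sketch leaves that case out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class TableRow:
    """Toy stand-in for one row of TBLS plus its PARTITIONS rows."""
    tbl_id: int
    db: str
    name: str
    tbl_type: str                       # MANAGED_TABLE, LINK_TABLE or STATIC_LINK_TABLE
    link_tbl_id: Optional[int] = None   # id of the imported table; None for non-links
    partitions: Dict[str, str] = field(default_factory=dict)  # spec -> data location


class ToyMetastore:
    """Hypothetical in-memory model; not Hive's real metastore API."""

    def __init__(self) -> None:
        self.tables: Dict[int, TableRow] = {}
        self.files: Set[str] = set()    # stands in for partition data on HDFS

    def _add_row(self, db, name, tbl_type, link_tbl_id=None) -> TableRow:
        row = TableRow(len(self.tables) + 1, db, name, tbl_type, link_tbl_id)
        self.tables[row.tbl_id] = row
        return row

    def create_table(self, db: str, name: str) -> TableRow:
        return self._add_row(db, name, "MANAGED_TABLE")

    def create_link(self, db: str, target: TableRow, static: bool = False) -> TableRow:
        # Views and external tables cannot be imported; a new link has no partitions.
        if target.tbl_type != "MANAGED_TABLE":
            raise ValueError("only managed tables can be linked in this sketch")
        kind = "STATIC_LINK_TABLE" if static else "LINK_TABLE"
        return self._add_row(db, f"{target.name}@{target.db}", kind, target.tbl_id)

    def add_partition(self, table: TableRow, spec: str) -> None:
        # New partition on the master table: dynamic links pick it up automatically.
        location = f"/warehouse/{table.db}/{table.name}/{spec}"  # assumed layout
        table.partitions[spec] = location
        self.files.add(location)
        for row in self.tables.values():
            if row.tbl_type == "LINK_TABLE" and row.link_tbl_id == table.tbl_id:
                row.partitions[spec] = location  # metadata replicated, data shared

    def alter_link_add_partition(self, link: TableRow, spec: str) -> None:
        # Explicit import of an existing partition (ALTER LINK ... ADD PARTITION).
        master = self.tables[link.link_tbl_id]
        link.partitions[spec] = master.partitions[spec]

    def drop_partition(self, table: TableRow, spec: str) -> None:
        location = table.partitions.pop(spec)
        if table.link_tbl_id is None:    # only the owning table deletes data
            self.files.discard(location)


ms = ToyMetastore()
t = ms.create_table("Y", "T")
ms.add_partition(t, "ds=2012-04-26")            # exists before any link
dynamic = ms.create_link("X", t)
static = ms.create_link("Z", t, static=True)
ms.add_partition(t, "ds=2012-04-27")            # added after the links
ms.alter_link_add_partition(static, "ds=2012-04-26")
ms.drop_partition(dynamic, "ds=2012-04-27")     # drops link metadata only
```

In this sketch the dynamic link receives ds=2012-04-27 automatically, the static link sees only the partition it imported explicitly, and dropping the partition from the link leaves the shared data in place.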
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-01 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287646#comment-13287646
 ] 

Carl Steinbach commented on HIVE-2989:
--

@Namit: I filed HIVE-3079 and assigned the ticket to you. Please revert this 
patch.

 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, 
 HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-01 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287650#comment-13287650
 ] 

Namit Jain commented on HIVE-2989:
--

@Carl, the patch was available for a long time. 
Bhushan forgot to submit the patch for it. 
We have addressed all your concerns in the wiki, and have very actively 
responded to all the comments.
We will revert the patch and mark it Patch Available for now.

We need it soon, so please try to review ASAP. 

 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, 
 HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-01 Thread Bhushan Mandhani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287662#comment-13287662
 ] 

Bhushan Mandhani commented on HIVE-2989:


Please review promptly.

 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, 
 HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, 
 HIVE-2989.6.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-01 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287668#comment-13287668
 ] 

Carl Steinbach commented on HIVE-2989:
--

@Namit: Please +1 HIVE-3079. I will handle committing it. Thanks.

 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, 
 HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, 
 HIVE-2989.6.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-01 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287682#comment-13287682
 ] 

Edward Capriolo commented on HIVE-2989:
---

Also, this brings to light a rather unfair issue: we have no system for 
reviewing patch_available work. Some patches sit patch_available and interviewed 
for months. 

 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, 
 HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, 
 HIVE-2989.6.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-01 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287683#comment-13287683
 ] 

Edward Capriolo commented on HIVE-2989:
---

*and unreviewed for months.

 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, 
 HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, 
 HIVE-2989.6.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-06-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287845#comment-13287845
 ] 

Hudson commented on HIVE-2989:
--

Integrated in Hive-trunk-h0.21 #1462 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1462/])
HIVE-2989 Adding Table Links to Hive
(Bhushan Mandhani via namit) (Revision 1345318)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1345318
Files : 
* /hive/trunk/metastore/if/hive_metastore.thrift
* /hive/trunk/metastore/scripts/upgrade/mysql/010-HIVE-2989.mysql.sql
* /hive/trunk/metastore/scripts/upgrade/mysql/hive-schema-0.10.0.mysql.sql
* /hive/trunk/metastore/scripts/upgrade/oracle/hive-schema-0.10.0.oracle.sql
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/EnvironmentContext.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Index.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Schema.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/TableIdentifier.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
* /hive/trunk/metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php
* /hive/trunk/metastore/src/gen/thrift/gen-php/hive_metastore/hive_metastore_types.php
* /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py
* /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ttypes.py
* /hive/trunk/metastore/src/gen/thrift/gen-rb/hive_metastore_types.rb
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/TableType.java
* /hive/trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MTable.java
* /hive/trunk/metastore/src/model/package.jdo
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableLinkDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/DDLWork.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/DropTableDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java
* /hive/trunk/ql/src/test/queries/clientnegative/create_table_failure5.q
* /hive/trunk/ql/src/test/queries/clientnegative/create_tablelink_failure1.q
* /hive/trunk/ql/src/test/queries/clientnegative/create_tablelink_failure2.q
* /hive/trunk/ql/src/test/queries/clientpositive/create_tablelink.q
* /hive/trunk/ql/src/test/results/clientnegative/create_table_failure5.q.out
* /hive/trunk/ql/src/test/results/clientnegative/create_tablelink_failure1.q.out
* /hive/trunk/ql/src/test/results/clientnegative/create_tablelink_failure2.q.out
* /hive/trunk/ql/src/test/results/clientnegative/drop_table_failure2.q.out
* /hive/trunk/ql/src/test/results/clientnegative/drop_view_failure1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/create_tablelink.q.out
* /hive/trunk/ql/src/test/results/clientpositive/create_view.q.out
* /hive/trunk/ql/src/test/results/clientpositive/create_view_partitioned.q.out
* /hive/trunk/ql/src/test/results/clientpositive/insert2_overwrite_partitions.q.out


 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, 
 HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, 
 HIVE-2989.6.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h

 

[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-05-31 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286751#comment-13286751
 ] 

Namit Jain commented on HIVE-2989:
--

Bhushan, can you refresh your patch?
I am getting some merge conflicts.

 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.8.1
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, 
 HIVE-2989.3.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h

 This will add Table Links to Hive. This will be an alternate mechanism for a 
 user to access tables and data in a database that is different from the one 
 he is associated with. This feature can be used to provide access control (if 
 access to databasename.tablename in queries and use database X is turned 
 off in conjunction).
 If db X wants to access one or more partitions from table T in db Y, the user 
 will issue:
 CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N')
 New partitions added to T will automatically be added to the link as well and 
 become available to X. However, if the link is specified to be static, that 
 will not be the case. The X user will then have to explicitly import each 
 partition of T that he needs. The command above will not actually make any 
 existing partitions of T available to X. Instead, we provide the following 
 command to add an existing partition to a link:
 ALTER LINK T@Y ADD PARTITION (ds='2012-04-27')
 The user will need to execute the above for each existing partition that 
 needs to be imported. For future partitions, Hive will take care of this. An 
 imported partition can be dropped from a link using a similar command. We 
 just specify DROP instead of ADD. For querying the linked table, the X 
 user will refer to it as T@Y. Link Tables will only have read access and not 
 be writable. The entire Table Link along with all its imported partitions can 
 be dropped as follows:
 DROP LINK TO T@Y
 The above commands are purely MetaStore operations. The implementation will 
 rely on replicating the entire partition metadata when a partition is added 
 to a link.  For every link that is created, we will add a new row to table 
 TBLS. The TBL_TYPE column will have a new kind of value LINK_TABLE (or 
 STATIC_LINK_TABLE if the link has been specified as static). A new column 
 LINK_TBL_ID will be added which will contain the id of the imported table. It 
 will be NULL for all other table types including the regular managed tables. 
 When a partition is added to a link, the new row in the table PARTITIONS will 
 point to the LINK_TABLE in the same database and not the master table in the 
 other database. We will replicate all the metadata for this partition from 
 the master database. The advantage of this approach is that fewer changes 
 will be needed in query processing and DDL for LINK_TABLEs. Also, commands 
 like SHOW TABLES and SHOW PARTITIONS will work as expected for 
 LINK_TABLEs too. Of course, even though the metadata is not shared, the 
 underlying data on disk is still shared. Hive still needs to know that when 
 dropping a partition which belongs to a LINK_TABLE, it should not drop the 
 underlying data from HDFS. Views and external tables cannot be imported from 
 one database to another.
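
The bookkeeping described above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions, not the patch's implementation: the class and method names (TableRow, ToyMetastore, alter_link_add_partition, and so on) and the /warehouse paths are hypothetical, and only the TBL_TYPE and LINK_TBL_ID columns come from the description. It covers explicit imports only, and shows that a link partition replicates metadata while sharing the data, so dropping it leaves the files in place.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class TableRow:
    """Hypothetical stand-in for one row of TBLS plus its PARTITIONS rows."""
    tbl_id: int
    db: str
    name: str
    tbl_type: str                      # MANAGED_TABLE, LINK_TABLE or STATIC_LINK_TABLE
    link_tbl_id: Optional[int] = None  # id of the imported table; None for non-links
    partitions: Dict[str, str] = field(default_factory=dict)  # spec -> data location


class ToyMetastore:
    """Illustrative model only; not Hive's metastore API."""

    def __init__(self) -> None:
        self.tbls: Dict[int, TableRow] = {}
        self.files: Set[str] = set()   # stands in for directories on HDFS
        self._next_id = 1

    def _insert(self, db: str, name: str, tbl_type: str,
                link_tbl_id: Optional[int] = None) -> TableRow:
        row = TableRow(self._next_id, db, name, tbl_type, link_tbl_id)
        self.tbls[row.tbl_id] = row
        self._next_id += 1
        return row

    def create_table(self, db: str, name: str) -> TableRow:
        return self._insert(db, name, "MANAGED_TABLE")

    def add_partition(self, table: TableRow, spec: str) -> None:
        location = f"/warehouse/{table.db}/{table.name}/{spec}"
        table.partitions[spec] = location
        self.files.add(location)

    def create_link(self, db: str, target: TableRow, static: bool = False) -> TableRow:
        # Creating the link adds a TBLS row but imports no existing partitions.
        tbl_type = "STATIC_LINK_TABLE" if static else "LINK_TABLE"
        return self._insert(db, f"{target.name}@{target.db}", tbl_type, target.tbl_id)

    def alter_link_add_partition(self, link: TableRow, spec: str) -> None:
        # Replicate the partition metadata; the location still points at the
        # master table's data, so nothing is copied on disk.
        target = self.tbls[link.link_tbl_id]
        link.partitions[spec] = target.partitions[spec]

    def drop_partition(self, table: TableRow, spec: str) -> None:
        location = table.partitions.pop(spec)
        if table.link_tbl_id is None:
            # Only the owning table may remove the underlying data.
            self.files.discard(location)
```

Keeping a separate partition map per link row is what lets listing commands treat a link like any other table in this model, at the cost of the special case in drop_partition.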
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-05-31 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286834#comment-13286834
 ] 

Namit Jain commented on HIVE-2989:
--


1. Can you create an arc diff? It is working now.
2. Instead of changing
   metastore/scripts/upgrade/mysql/hive-schema-0.9.0.mysql.sql,
   you should create a new file:
   metastore/scripts/upgrade/mysql/hive-schema-0.10.0.mysql.sql.
   Links are not part of 0.9.
3. Same for metastore/scripts/upgrade/oracle/hive-schema-0.9.0.oracle.sql.
4. You can revert ql/src/java/org/apache/hadoop/hive/ql/Driver.java.
5. DDLTask.java: line 3040 etc.
   These errors should be caught at compile time, in DDLSemanticAnalyzer.
   Same for the drop link error.
6. 3612: Can you add more detailed comments here?
   The code/semantics for create table like and create table link should be
   the same.
   All the SD/SERDE properties are copied.
   None of the table properties are copied.
   You are doing this anyway, so write a more detailed comment and make a
   common function for create table link and create table like.
7. DDLSemanticAnalyzer.java: 708 - the new parameter expectLink is never used.
   Why did you change the signature of analyzeDropTable?
   I think you wanted to pass it to DropTableDesc, but that is not done right
   now.
8. 867: add the outputs also; they are incomplete right now.
9. Can you add a new negative test, where you try to create a table with the
   name A@B?
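
A rough sketch of the shared helper asked for in point 6, in Python rather than the Java of the actual patch. Every name here (StorageDescriptor, TableDef, clone_table_shape, create_table_like, create_table_link) is hypothetical and does not correspond to code in DDLTask.java. Storing LINKPROPERTIES as table properties on the link is also an assumption made only for this illustration.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StorageDescriptor:
    """Hypothetical stand-in for the SD/SERDE information of a table."""
    columns: List[str]
    input_format: str
    output_format: str
    serde_lib: str
    serde_params: Dict[str, str] = field(default_factory=dict)


@dataclass
class TableDef:
    name: str
    sd: StorageDescriptor
    table_properties: Dict[str, str] = field(default_factory=dict)


def clone_table_shape(source: TableDef, new_name: str) -> TableDef:
    """Common path: copy all SD/SERDE properties, copy no table properties."""
    return TableDef(name=new_name, sd=copy.deepcopy(source.sd), table_properties={})


def create_table_like(source: TableDef, new_name: str) -> TableDef:
    return clone_table_shape(source, new_name)


def create_table_link(source: TableDef, source_db: str,
                      link_properties: Dict[str, str]) -> TableDef:
    link = clone_table_shape(source, f"{source.name}@{source_db}")
    # Assumption: the link keeps only its own LINKPROPERTIES, never the
    # source table's properties.
    link.table_properties.update(link_properties)
    return link
```

Routing both statements through one function keeps the "what gets copied" rule in a single place, which seems to be the point of the review comment.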



 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.8.1
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, 
 HIVE-2989.3.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h


[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-05-31 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286933#comment-13286933
 ] 

Carl Steinbach commented on HIVE-2989:
--

@Namit, Sambavi: I added more comments/questions to the design doc.

 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.8.1
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, 
 HIVE-2989.3.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-05-31 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287099#comment-13287099
 ] 

Carl Steinbach commented on HIVE-2989:
--

@Sambavi: I responded to your comments on the wiki. Please take a look. Thanks.

 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.8.1
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, 
 HIVE-2989.3.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-05-28 Thread Bhushan Mandhani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284600#comment-13284600
 ] 

Bhushan Mandhani commented on HIVE-2989:


Carl, I've included the MetaStore upgrade scripts now. Regarding your first 
comment, let's discuss that on the design wiki page to keep the discussion in 
one place.

 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.8.1
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Fix For: 0.9.0

 Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-05-17 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13277629#comment-13277629
 ] 

Carl Steinbach commented on HIVE-2989:
--

I'm not sure what you mean by "database that is different from the one he is 
associated with". Can you please clarify? Also, what's the motivation for this 
feature? What problem are you trying to solve?

bq. This feature can be used to provide access control (if access to 
databasename.tablename in queries and use database X is turned off in 
conjunction).

It sounds like you're saying this can be used to circumvent the authorization 
system. Is that the goal?

 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Fix For: 0.10.0

   Original Estimate: 672h
  Remaining Estimate: 672h





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-04-30 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264973#comment-13264973
 ] 

Namit Jain commented on HIVE-2989:
--

In case of views, the burden is on the user to add partitions on the view.
In general, you cannot assume a one-to-one mapping between the table partition
and the view partition.
In case of links (not static), all future partitions of the table will
automatically lead to adding a partition for the link.

Moreover, the corresponding link partition gets dropped when the appropriate 
table partition gets dropped.

I agree, all these functionalities can be added into the view, but that 
approach might be more error prone,
and we are overloading an existing concept with a different one.
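
The link behaviour described in this comment can be illustrated with a small Python model. It is a sketch under assumptions, not Hive code: the Table and Link classes and the import_partition method are hypothetical names. Whether a drop on the table also removes the partition from a static link is not stated in the thread; the sketch assumes it does, so that no link partition is left pointing at dropped data.

```python
from typing import List, Set


class Table:
    """Hypothetical master table that knows which links import it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.partitions: Set[str] = set()
        self.links: List["Link"] = []

    def add_partition(self, spec: str) -> None:
        self.partitions.add(spec)
        for link in self.links:
            if not link.static:
                # Non-static links pick up every future partition.
                link.partitions.add(spec)

    def drop_partition(self, spec: str) -> None:
        self.partitions.discard(spec)
        for link in self.links:
            # Assumption: applies to static links as well.
            link.partitions.discard(spec)


class Link:
    """Hypothetical link to a table in another database."""

    def __init__(self, target: Table, static: bool = False) -> None:
        self.target = target
        self.static = static
        self.partitions: Set[str] = set()  # existing partitions are not imported
        target.links.append(self)

    def import_partition(self, spec: str) -> None:
        # Models the explicit ALTER LINK ... ADD PARTITION step.
        if spec not in self.target.partitions:
            raise KeyError(spec)
        self.partitions.add(spec)
```

With a view, by contrast, the add_partition and drop_partition hooks would not exist, and the user would have to keep the partition sets in step by hand.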


 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Fix For: 0.10.0

   Original Estimate: 672h
  Remaining Estimate: 672h





[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2012-04-29 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264581#comment-13264581
 ] 

Edward Capriolo commented on HIVE-2989:
---

It seems like much of this could be done with a VIEW.

 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Fix For: 0.10.0

   Original Estimate: 672h
  Remaining Estimate: 672h
