[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706597#comment-13706597 ]

Edward Capriolo commented on HIVE-2989:
---------------------------------------

Did we ditch this idea? Should we close up shop?

Adding Table Links to Hive
--------------------------

Key: HIVE-2989
URL: https://issues.apache.org/jira/browse/HIVE-2989
Project: Hive
Issue Type: Improvement
Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Attachments: HIVE-2989.10.patch.txt, HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, HIVE-2989.6.patch.txt, HIVE-2989.9.patch.txt
Original Estimate: 672h
Remaining Estimate: 672h

This will add Table Links to Hive. This will be an alternate mechanism for a user to access tables and data in a database that is different from the one he is associated with. This feature can be used to provide access control (if access to databasename.tablename in queries and "use database X" is turned off in conjunction).

If db X wants to access one or more partitions from table T in db Y, the user will issue:

    CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N')

New partitions added to T will automatically be added to the link as well and become available to X. However, if the link is specified to be static, that will not be the case. The X user will then have to explicitly import each partition of T that he needs. The command above will not actually make any existing partitions of T available to X. Instead, we provide the following command to add an existing partition to a link:

    ALTER LINK T@Y ADD PARTITION (ds='2012-04-27')

The user will need to execute the above for each existing partition that needs to be imported. For future partitions, Hive will take care of this. An imported partition can be dropped from a link using a similar command. We just specify DROP instead of ADD.

For querying the linked table, the X user will refer to it as T@Y. Link Tables will only have read access and not be writable. The entire Table Link along with all its imported partitions can be dropped as follows:

    DROP LINK TO T@Y

The above commands are purely MetaStore operations. The implementation will rely on replicating the entire partition metadata when a partition is added to a link. For every link that is created, we will add a new row to table TBLS. The TBL_TYPE column will have a new kind of value LINK_TABLE (or STATIC_LINK_TABLE if the link has been specified as static). A new column LINK_TBL_ID will be added which will contain the id of the imported table. It will be NULL for all other table types including the regular managed tables.

When a partition is added to a link, the new row in the table PARTITIONS will point to the LINK_TABLE in the same database and not the master table in the other database. We will replicate all the metadata for this partition from the master database. The advantage of this approach is that fewer changes will be needed in query processing and DDL for LINK_TABLEs. Also, commands like SHOW TABLES and SHOW PARTITIONS will work as expected for LINK_TABLEs too.

Of course, even though the metadata is not shared, the underlying data on disk is still shared. Hive still needs to know that when dropping a partition which belongs to a LINK_TABLE, it should not drop the underlying data from HDFS.

Views and external tables cannot be imported from one database to another.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
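For readers following the thread, the metastore behaviour proposed in the issue description can be modelled with a small in-memory sketch. This is not Hive code and not taken from any of the attached patches: the class and method names (`TableRow`, `ToyMetastore`, `alter_link_add_partition`, and so on) and the `/warehouse/...` paths are hypothetical, and only the columns the description mentions (TBL_TYPE, LINK_TBL_ID) are represented. The behaviour of dropping a partition from the master table while links still reference it is not specified in the description, so the sketch simply assumes the data is removed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class TableRow:
    """One TBLS row, reduced to the columns the proposal mentions (hypothetical model)."""
    tbl_id: int
    db: str
    name: str
    tbl_type: str                      # MANAGED_TABLE, LINK_TABLE or STATIC_LINK_TABLE
    link_tbl_id: Optional[int] = None  # NULL (None) for every non-link table
    # partition spec -> data location; each table keeps its own copy of the metadata
    partitions: Dict[str, str] = field(default_factory=dict)


class ToyMetastore:
    """In-memory stand-in for the metastore plus the data directories on disk."""

    def __init__(self) -> None:
        self.tbls: Dict[int, TableRow] = {}
        self.hdfs: Set[str] = set()
        self._next_id = 1

    def _add(self, db: str, name: str, tbl_type: str,
             link_tbl_id: Optional[int] = None) -> TableRow:
        row = TableRow(self._next_id, db, name, tbl_type, link_tbl_id)
        self.tbls[row.tbl_id] = row
        self._next_id += 1
        return row

    def create_table(self, db: str, name: str) -> TableRow:
        return self._add(db, name, "MANAGED_TABLE")

    def create_link(self, db: str, target: TableRow, static: bool = False) -> TableRow:
        # CREATE [STATIC] LINK TO T@Y: existing partitions are NOT imported.
        tbl_type = "STATIC_LINK_TABLE" if static else "LINK_TABLE"
        return self._add(db, f"{target.name}@{target.db}", tbl_type, target.tbl_id)

    def add_partition(self, table: TableRow, spec: str) -> None:
        location = f"/warehouse/{table.db}/{table.name}/{spec}"
        table.partitions[spec] = location
        self.hdfs.add(location)
        # New partitions are replicated to non-static links only.
        for row in self.tbls.values():
            if row.link_tbl_id == table.tbl_id and row.tbl_type == "LINK_TABLE":
                row.partitions[spec] = location

    def alter_link_add_partition(self, link: TableRow, spec: str) -> None:
        # ALTER LINK T@Y ADD PARTITION (...): copy the metadata from the master table.
        master = self.tbls[link.link_tbl_id]
        link.partitions[spec] = master.partitions[spec]

    def drop_partition(self, table: TableRow, spec: str) -> None:
        location = table.partitions.pop(spec)
        # Dropping from a link must leave the shared data on disk untouched.
        # Assumption: dropping from the master table removes the data.
        if table.link_tbl_id is None:
            self.hdfs.discard(location)
```

In this model a link created after a partition already exists starts out empty, a non-static link picks up partitions added to T afterwards, and a static link only ever contains what was imported explicitly.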
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706598#comment-13706598 ]

Bhushan Mandhani commented on HIVE-2989:

Hi, Bhushan Mandhani is no longer at Facebook so this email address is no longer being monitored. If you need assistance, please contact another person who is currently at the company.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293841#comment-13293841 ]

Namit Jain commented on HIVE-2989:
----------------------------------

@Carl, We haven't seen any comments from you. All the follow-up jiras have been filed. Let me know if you have any comments on this.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293931#comment-13293931 ]

Carl Steinbach commented on HIVE-2989:
--------------------------------------

I'm working on some comments. Will post them later today after the Hive meetup is over. Thanks.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293200#comment-13293200 ]

Carl Steinbach commented on HIVE-2989:
--------------------------------------

@Bhushan: I think HIVE-3114 (Split Thrift interface for TableLink creation) should be done in this patch instead of splitting it out into a followup ticket. Here's what I said in HIVE-3114:

bq. I'm concerned that the Metastore Thrift interface is one of Hive's de facto public APIs, and any new functionality that appears in a release will need to be supported going forward. Why not just fix this in HIVE-2989 and eliminate the possibility that we're going to get stuck with an interface that we already know is broken?
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293201#comment-13293201 ]

Carl Steinbach commented on HIVE-2989:
--------------------------------------

@Bhushan: I'll look over the rest of patch later tonight. Thanks.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290792#comment-13290792 ]

Carl Steinbach commented on HIVE-2989:
--------------------------------------

bq. The semantics of rename table and drop table can be similar. Both can fail or drop/rename the link depending on cascade etc. This can be done in a follow-up.

I'm fine with doing this in a followup as long as the semantics and expected behavior are described in the design doc. Right now it doesn't look like they're defined anywhere.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289815#comment-13289815 ]

Carl Steinbach commented on HIVE-2989:
--------------------------------------

@Bhushan: I added more code review comments here https://reviews.facebook.net/D3405
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289887#comment-13289887 ]

Carl Steinbach commented on HIVE-2989:

@Bhushan/Namit: Any thoughts on this?

{quote}
I think it would be a good idea to require the user to name the table link (using a valid SQL identifier) instead of tying the name of the link to the table/db it points to. Table links (like views) can provide a useful level of indirection, but we lose that if the name of the link has to map directly to the target table. For example, suppose table t1 exists in the default db, and the table link for t1 is created in database db1. Suppose at some later point t1 is moved to a different db, or that the name of t1 is changed. With the current implementation we would then also have to change the name of the link, which would require us to also change any applications or scripts that refer to this link. With the other approach we would only need to ALTER the details of the link but would be able to leave the name of the link unchanged.
{quote}

I'll add to this that using the tab_name@db_name syntax also requires us to weaken the grammar and add lots of special-case logic to the methods that validate table names.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289890#comment-13289890 ]

Sambavi Muthukrishnan commented on HIVE-2989:

Is there a command to move a table to another db? I don't see any command that lets you do that directly. Given this, if you are making a copy of the table, shouldn't the links ideally be set up all over again? Application code will need to change if that happens, but how likely is this sort of rename?

While I agree that using names without the @ syntax is nicer, I think having the @ syntax guarantees a nice naming convention that shows what the table is pointing to, instead of a name that is opaque from the user's perspective.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289891#comment-13289891 ]

Bhushan Mandhani commented on HIVE-2989:

Carl, we discussed this among ourselves and strongly believe the user should be able to refer to the target table in his queries without having to think about "What do I call this table?" Briefly, we were even considering the conventional X.Y access, but that had other issues and we wanted to disable that syntax completely in our system. I have made the changes you wanted for name validation in MetaStoreUtils and there is minimal special-case logic there. You can look at it when I update the diff.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289894#comment-13289894 ]

Carl Steinbach commented on HIVE-2989:

bq. we discussed this among us and strongly believe the user should be able to refer to the target table in his queries without having to think about What do I call this table?

If coming up with names for table links is a huge burden then it should be easy to provide a default name (e.g. 'tabname_at_dbname').

bq. Is there a command to move a table to another db? I don't see any command that lets you do that directly. Given this, if you are making a copy of the table, shouldn't the links ideally be set up all over again? Application code will need to change if that happens, but how likely is this sort of rename?

No, there's no command that lets you do that today, but changing the name of a table is supported (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RenameTable), and the same issues apply there. What's the expected behavior if I change the name of a target table?
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289897#comment-13289897 ]

Bhushan Mandhani commented on HIVE-2989:

One criterion was that the user should not be able to create a Managed Table with a name that looks like the name of a Table Link. If user A creates such a table with the name X_at_Y, and user B comes along and tries to create a link to X@Y, the command will fail. Worse, user B could see that the table X_at_Y exists and query it, assuming it is a link to the table X he is trying to query. Also, the X@Y syntax is already being used by Oracle for its Database Links, so we decided to use the same syntax.

When you change the name of a target table, the names of the Links should be updated to reflect that. Sambavi will do that in the Alter Table patch.
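The collision argument above can be made concrete with a short sketch. It assumes, purely for illustration, that ordinary table names are limited to letters, digits, and underscores; the `is_plain_table_name`, `default_link_name`, and `at_link_name` helpers are hypothetical and are not part of Hive.

```python
import re

# Assumption for illustration: ordinary table names may contain only
# letters, digits, and underscores.
PLAIN_NAME = re.compile(r"\w+")


def is_plain_table_name(name: str) -> bool:
    """True if the name could be used for an ordinary managed table."""
    return PLAIN_NAME.fullmatch(name) is not None


def default_link_name(table: str, db: str) -> str:
    """The 'tabname_at_dbname' default suggested earlier in the thread."""
    return f"{table}_at_{db}"


def at_link_name(table: str, db: str) -> str:
    """The T@Y form used by the proposal."""
    return f"{table}@{db}"


existing_tables = {"X_at_Y"}  # a managed table created by user A

# The default name is a legal plain name, so it can already be taken.
print(default_link_name("X", "Y") in existing_tables)  # True
# A name containing '@' is never a plain table name under this assumption.
print(is_plain_table_name(at_link_name("X", "Y")))     # False
```

Under that assumption, the `@` form reserves a namespace that managed tables cannot occupy, which is the property the comment is after; the cost, raised elsewhere in the thread, is that the grammar and name validation must treat `@` specially.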
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289955#comment-13289955 ]

Namit Jain commented on HIVE-2989:

The semantics of rename table and drop table can be similar. Both can fail or drop/rename the link depending on cascade etc. This can be done in a follow-up.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289002#comment-13289002 ]

Carl Steinbach commented on HIVE-2989:

Is this behavior expected with the current version of the patch?

{noformat}
hive> show tables;
show tables;
OK
target@default
Time taken: 0.076 seconds
hive> SELECT * FROM target@default;
SELECT * FROM target@default;
FAILED: ParseException line 1:20 mismatched input '@' expecting EOF near 'target'
hive> select target;
select target;
FAILED: ParseException line 1:7 mismatched input '<EOF>' expecting FROM near 'target' in from clause
hive> SELECT * FROM `target@default`;
SELECT * FROM `target@default`;
FAILED: ParseException line 1:21 mismatched character '@' expecting '`'
line 1:30 required (...)+ loop did not match anything at character '<EOF>'
{noformat}
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289040#comment-13289040 ]

Carl Steinbach commented on HIVE-2989:

# I think it would be a good idea to require the user to name the table link (using a valid SQL identifier) instead of tying the name of the link to the table/db it points to. Table links (like views) can provide a useful level of indirection, but we lose that if the name of the link has to map directly to the target table. For example, suppose table t1 exists in the default db, and the table link for t1 is created in database db1. Suppose at some later point t1 is moved to a different db, or that the name of t1 is changed. With the current implementation we would then also have to change the name of the link, which would require us to also change any applications or scripts that refer to this link. With the other approach we would only need to ALTER the details of the link but would be able to leave the name of the link unchanged.
# Given a table t, is there a way to print out the list of table links that point to t?
# What is the expected behavior if I create a link to target@default and then drop target?
## There are a bunch of variations on this same question, e.g. what happens if I change the name of the target table, or if I add columns to the target table, etc. There should be test coverage for each one of these cases.
# What is the expected behavior if a user creates a link to target@default, and then target is subsequently dropped and then created again by another user? Should the link continue to work? If so, doesn't this create a potential security vulnerability?
# It looks like there's currently no difference in the way DYNAMIC and STATIC table links behave. If that's true then I think we should remove this from the grammar and metadata in order to avoid confusion.
# DESCRIBE FORMATTED needs to be updated to print the linkTarget and linkTables fields.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289049#comment-13289049 ] Carl Steinbach commented on HIVE-2989: --
This looks like a bug:
{noformat}
hive> CREATE TABLE target (x INT);
hive> DESCRIBE FORMATTED target;
Location: file:/user/hive/warehouse/target
hive> use tmpdb;
hive> CREATE TABLELINK TO target@default;
hive> DESCRIBE FORMATTED target@default;
Location: file:/user/hive/warehouse/target
hive> use default;
hive> ALTER TABLE target SET LOCATION 'file:/BOGUS_PATH';
hive> DESCRIBE FORMATTED target;
Location: file:/BOGUS_PATH
hive> use tmpdb;
hive> DESCRIBE FORMATTED target@default;
Location: file:/user/hive/warehouse/target
{noformat}

Adding Table Links to Hive
--
Key: HIVE-2989
URL: https://issues.apache.org/jira/browse/HIVE-2989
Project: Hive
Issue Type: Improvement
Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, HIVE-2989.6.patch.txt
Original Estimate: 672h
Remaining Estimate: 672h

This will add Table Links to Hive. This will be an alternate mechanism for a user to access tables and data in a database that is different from the one he is associated with. This feature can be used to provide access control (if access to databasename.tablename in queries and use database X is turned off in conjunction).

If db X wants to access one or more partitions from table T in db Y, the user will issue:
    CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N')
New partitions added to T will automatically be added to the link as well and become available to X. However, if the link is specified to be static, that will not be the case. The X user will then have to explicitly import each partition of T that he needs.

The command above will not actually make any existing partitions of T available to X. Instead, we provide the following command to add an existing partition to a link:
    ALTER LINK T@Y ADD PARTITION (ds='2012-04-27')
The user will need to execute the above for each existing partition that needs to be imported. For future partitions, Hive will take care of this. An imported partition can be dropped from a link using a similar command. We just specify DROP instead of ADD.

For querying the linked table, the X user will refer to it as T@Y. Link Tables will only have read access and not be writable. The entire Table Link along with all its imported partitions can be dropped as follows:
    DROP LINK TO T@Y
The above commands are purely MetaStore operations.

The implementation will rely on replicating the entire partition metadata when a partition is added to a link. For every link that is created, we will add a new row to table TBLS. The TBL_TYPE column will have a new kind of value LINK_TABLE (or STATIC_LINK_TABLE if the link has been specified as static). A new column LINK_TBL_ID will be added which will contain the id of the imported table. It will be NULL for all other table types including the regular managed tables. When a partition is added to a link, the new row in the table PARTITIONS will point to the LINK_TABLE in the same database and not the master table in the other database. We will replicate all the metadata for this partition from the master database.

The advantage of this approach is that fewer changes will be needed in query processing and DDL for LINK_TABLEs. Also, commands like SHOW TABLES and SHOW PARTITIONS will work as expected for LINK_TABLEs too. Of course, even though the metadata is not shared, the underlying data on disk is still shared. Hive still needs to know that when dropping a partition which belongs to a LINK_TABLE, it should not drop the underlying data from HDFS. Views and external tables cannot be imported from one database to another.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
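The metastore layout in the issue description (a TBLS row per link, a new TBL_TYPE value, and a LINK_TBL_ID column pointing at the imported table) can be sketched roughly as follows. This is only an illustration in Python with an in-memory SQLite database, not the real metastore schema: the DBS table, the reduced column set, the stored link name, and the helper links_to are all assumptions made for the example. It also shows one possible way to answer the question of which links point to a given table.

```python
import sqlite3

# Simplified, hypothetical subset of the metastore tables. The real TBLS table
# has more columns; only LINK_TBL_ID and the LINK_TABLE type come from the
# issue description.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT UNIQUE);
CREATE TABLE TBLS (
    TBL_ID      INTEGER PRIMARY KEY,
    DB_ID       INTEGER NOT NULL REFERENCES DBS(DB_ID),
    TBL_NAME    TEXT NOT NULL,
    TBL_TYPE    TEXT NOT NULL,
    LINK_TBL_ID INTEGER REFERENCES TBLS(TBL_ID)  -- NULL for non-link tables
);
""")

# Database Y owns table T; database X holds a link to it.
conn.executemany("INSERT INTO DBS VALUES (?, ?)", [(1, "Y"), (2, "X")])
conn.execute("INSERT INTO TBLS VALUES (10, 1, 'T', 'MANAGED_TABLE', NULL)")
# Assumption: the link row is stored under the target's name in the linking db.
conn.execute("INSERT INTO TBLS VALUES (20, 2, 'T', 'LINK_TABLE', 10)")


def links_to(conn, db_name, tbl_name):
    """Return (linking db, link name, link type) for every link to db_name.tbl_name."""
    return conn.execute(
        """
        SELECT ld.NAME, l.TBL_NAME, l.TBL_TYPE
        FROM TBLS l
        JOIN TBLS t  ON l.LINK_TBL_ID = t.TBL_ID
        JOIN DBS  td ON t.DB_ID = td.DB_ID
        JOIN DBS  ld ON l.DB_ID = ld.DB_ID
        WHERE td.NAME = ? AND t.TBL_NAME = ?
        ORDER BY ld.NAME, l.TBL_NAME
        """,
        (db_name, tbl_name),
    ).fetchall()
```

Because the link is keyed on the target's TBL_ID rather than its name, a lookup like links_to(conn, "Y", "T") is a plain self-join on TBLS.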
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289056#comment-13289056 ] Bhushan Mandhani commented on HIVE-2989: It is not a bug. It is currently being implemented by Sambavi. She is handling Alter Link and Alter Table commands. This particular patch contains Create, Desc and Drop. Sambavi is working on another patch that will do the two Alter commands. I will do querying in a follow-up patch of my own. There is a lot of functionality here that needs to be built out. We are planning to do this in multiple patches. We would like to get this one in first because everything else will build on top of this. We have thought about all these questions you are bringing up and created the corresponding tasks. I'll update the design doc with this info. I and Sambavi will create the corresponding Jiras as well.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289065#comment-13289065 ] Carl Steinbach commented on HIVE-2989: --
Following up on (3) and (4) from above, I think the TableIdentifier structure that this patch adds is incomplete. Right now TableIdentifier is defined as
{noformat}
struct TableIdentifier {
  1: string dbName,
  2: string tableName
}
{noformat}
The problem with this definition is that it only uniquely identifies a table if time is held constant. If you allow time to vary then it can refer to many different tables over the lifespan of the warehouse. One way to resolve this problem is to modify the definition as follows:
{noformat}
struct TableIdentifier {
  1: string dbName,
  2: string tableName,
  3: string owner,
  4: i32 createTime
}
{noformat}
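The point about TableIdentifier only being unique "if time is held constant" can be illustrated with a small Python sketch. Everything here is hypothetical and for illustration only: ShortId and FullId mirror the two struct definitions quoted in the comment, and Catalog is a toy stand-in for the metastore, not Hive code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShortId:
    """Mirrors the two-field TableIdentifier from the patch."""
    db_name: str
    table_name: str


@dataclass(frozen=True)
class FullId:
    """Mirrors the proposed four-field TableIdentifier."""
    db_name: str
    table_name: str
    owner: str
    create_time: int


class Catalog:
    """Toy metastore: maps (db, table) to (owner, create_time)."""

    def __init__(self):
        self._tables = {}

    def create(self, db, name, owner, create_time):
        self._tables[(db, name)] = (owner, create_time)

    def drop(self, db, name):
        del self._tables[(db, name)]

    def resolves_short(self, ident):
        # Matches any table that currently has this db and name.
        return (ident.db_name, ident.table_name) in self._tables

    def resolves_full(self, ident):
        # Matches only the table created by this owner at this time.
        current = self._tables.get((ident.db_name, ident.table_name))
        return current == (ident.owner, ident.create_time)


catalog = Catalog()
catalog.create("default", "target", owner="alice", create_time=1000)
short = ShortId("default", "target")
full = FullId("default", "target", "alice", 1000)

# The target is dropped and later recreated by a different user.
catalog.drop("default", "target")
catalog.create("default", "target", owner="mallory", create_time=2000)
```

After the drop and recreate, the two-field identifier still resolves (to a different table than the one originally linked), while the four-field identifier no longer matches.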
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289069#comment-13289069 ] Carl Steinbach commented on HIVE-2989: --
bq. It is not a bug. It is currently being implemented by Sambavi. She is handling Alter Link and Alter Table commands.
How is this not a bug? What is the expected behavior?
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289072#comment-13289072 ] Bhushan Mandhani commented on HIVE-2989: When a target table is dropped, all links pointing to it will automatically get dropped as well. That task is on my plate. I was planning to do that after querying. When someone comes along and creates a new target table, there is no link pointing to it. So there is no security vulnerability. I don't think we need to change struct TableIdentifier.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289074#comment-13289074 ] Bhushan Mandhani commented on HIVE-2989: Alter Table will be modified so that metadata changes made to the target table will propagate to the link table if appropriate. That is the expected behavior.
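The stale Location reported earlier in the thread follows from the link holding its own copy of the target's metadata, and the propagation described in this comment is what would keep the copy in sync. A rough Python sketch of both behaviours, under the assumption that a link simply stores a copy of the target's metadata; ToyMetastore and its methods are hypothetical names for the example, not Hive APIs:

```python
import copy


class ToyMetastore:
    """Toy model in which a link keeps its own copy of the target's metadata."""

    def __init__(self):
        self.tables = {}  # (db, name) -> metadata dict
        self.links = {}   # (link_db, target_name, target_db) -> copied metadata

    def create_table(self, db, name, location):
        self.tables[(db, name)] = {"location": location}

    def create_link(self, link_db, target_name, target_db):
        # Metadata is replicated at link creation time, not shared.
        target_meta = self.tables[(target_db, target_name)]
        self.links[(link_db, target_name, target_db)] = copy.deepcopy(target_meta)

    def set_location(self, db, name, location, propagate):
        self.tables[(db, name)]["location"] = location
        if not propagate:
            return  # behaviour of the transcript: link copies are left stale
        for (_, t_name, t_db), meta in self.links.items():
            if (t_db, t_name) == (db, name):
                meta["location"] = location


ms = ToyMetastore()
ms.create_table("default", "target", "file:/user/hive/warehouse/target")
ms.create_link("tmpdb", "target", "default")
link_key = ("tmpdb", "target", "default")

# Without propagation the link still reports the old location.
ms.set_location("default", "target", "file:/BOGUS_PATH", propagate=False)
stale_location = ms.links[link_key]["location"]

# With propagation the link follows the target.
ms.set_location("default", "target", "file:/NEW_PATH", propagate=True)
synced_location = ms.links[link_key]["location"]
```

Which metadata changes count as "appropriate" to propagate is left open in the thread; the sketch propagates only the location.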
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289083#comment-13289083 ] Carl Steinbach commented on HIVE-2989: --
bq. This particular patch contains Create, Desc and Drop. Sambavi is working on another patch that will do the two Alter commands.
The subject of this ticket is Adding Table Links to Hive. If your goal with this patch is only to add a small subset of the overall Table Link functionality then please update the ticket's subject and description to accurately reflect that. Otherwise we will run into a lot of problems since users will see the ticket's description in the release notes and assume that the feature is complete and ready for use (this has happened before, e.g. with indexes and authorization).
bq. When someone comes along and creates a new target table, there is no link pointing to it. So there is no security vulnerability.
But if we commit this patch on its own then there is a security vulnerability, right?
bq. When a target table is dropped, all links pointing to it will automatically get dropped as well.
That seems kind of dangerous. Perhaps users should be forced to use a command like DROP TABLE x CASCADE LINKS in cases like this?
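The two drop behaviours discussed in this exchange (links silently dropped with the target, versus requiring something like DROP TABLE x CASCADE LINKS) can be compared with a small Python sketch. It models the second, stricter option; ToyCatalog, LinksExistError, and the cascade_links flag are hypothetical names for illustration, and the CASCADE LINKS syntax itself is only a suggestion made in the thread, not an implemented Hive command.

```python
class LinksExistError(Exception):
    """Raised when a table with dependent links is dropped without cascade."""


class ToyCatalog:
    def __init__(self):
        self.tables = set()  # {(db, name)}
        self.links = {}      # target (db, name) -> set of linking db names

    def create_table(self, db, name):
        self.tables.add((db, name))

    def create_link(self, link_db, target_db, target_name):
        target = (target_db, target_name)
        if target not in self.tables:
            raise KeyError(f"no such table: {target_name}@{target_db}")
        self.links.setdefault(target, set()).add(link_db)

    def links_to(self, db, name):
        return sorted(self.links.get((db, name), set()))

    def drop_table(self, db, name, cascade_links=False):
        target = (db, name)
        dependents = self.links.get(target, set())
        if dependents and not cascade_links:
            # Refuse rather than silently removing other databases' links.
            raise LinksExistError(
                f"{name}@{db} has links from: {', '.join(sorted(dependents))}"
            )
        self.links.pop(target, None)
        self.tables.remove(target)


catalog = ToyCatalog()
catalog.create_table("default", "target")
catalog.create_link("tmpdb", "default", "target")

try:
    catalog.drop_table("default", "target")
    refused = False
except LinksExistError:
    refused = True

catalog.drop_table("default", "target", cascade_links=True)
```

Either way, no link survives the drop, so a table later recreated under the same name starts with no links pointing at it.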
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289085#comment-13289085 ] Namit Jain commented on HIVE-2989: -- @Carl, most of the development in Hive so far has been done in an iterative manner. There are many examples of features that have been checked in over multiple patches: indexes and views, to name a few. This patch is not breaking anything existing, but is not ready for final consumption. There will be a couple of follow-ups which will make this patch useful for everyone's consumption. Are you proposing a new policy that only complete features should be allowed to be checked in, as a single patch? That will slow the community down significantly. There will be multiple side branches on which development will happen, and it will be very difficult to get them back into trunk. I don't think that has been the case for most of the development that has happened in the past.
Adding Table Links to Hive -- Key: HIVE-2989 URL: https://issues.apache.org/jira/browse/HIVE-2989 Project: Hive Issue Type: Improvement Components: Metastore, Query Processor, Security Affects Versions: 0.10.0 Reporter: Bhushan Mandhani Assignee: Bhushan Mandhani Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, HIVE-2989.6.patch.txt Original Estimate: 672h Remaining Estimate: 672h

This will add Table Links to Hive. This will be an alternate mechanism for a user to access tables and data in a database that is different from the one he is associated with. This feature can be used to provide access control (if access to databasename.tablename in queries and use database X is turned off in conjunction).

If db X wants to access one or more partitions from table T in db Y, the user will issue:

    CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N')

New partitions added to T will automatically be added to the link as well and become available to X. However, if the link is specified to be static, that will not be the case. The X user will then have to explicitly import each partition of T that he needs.

The command above will not actually make any existing partitions of T available to X. Instead, we provide the following command to add an existing partition to a link:

    ALTER LINK T@Y ADD PARTITION (ds='2012-04-27')

The user will need to execute the above for each existing partition that needs to be imported. For future partitions, Hive will take care of this. An imported partition can be dropped from a link using a similar command. We just specify DROP instead of ADD.

For querying the linked table, the X user will refer to it as T@Y. Link Tables will only have read access and not be writable. The entire Table Link along with all its imported partitions can be dropped as follows:

    DROP LINK TO T@Y

The above commands are purely MetaStore operations. The implementation will rely on replicating the entire partition metadata when a partition is added to a link. For every link that is created, we will add a new row to table TBLS. The TBL_TYPE column will have a new kind of value LINK_TABLE (or STATIC_LINK_TABLE if the link has been specified as static). A new column LINK_TBL_ID will be added which will contain the id of the imported table. It will be NULL for all other table types including the regular managed tables.

When a partition is added to a link, the new row in the table PARTITIONS will point to the LINK_TABLE in the same database and not the master table in the other database. We will replicate all the metadata for this partition from the master database. The advantage of this approach is that fewer changes will be needed in query processing and DDL for LINK_TABLEs. Also, commands like SHOW TABLES and SHOW PARTITIONS will work as expected for LINK_TABLEs too.

Of course, even though the metadata is not shared, the underlying data on disk is still shared. Hive still needs to know that when dropping a partition which belongs to a LINK_TABLE, it should not drop the underlying data from HDFS. Views and external tables cannot be imported from one database to another.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
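The metadata behaviour described in the issue text can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not code from any HIVE-2989 patch: the class and method names (Table, ToyMetastore, link_add_partition and so on) are hypothetical, a Python set stands in for directories on HDFS, and only the TBL_TYPE values and the LINK_TBL_ID column come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Table:
    """Toy stand-in for one row of TBLS plus its rows in PARTITIONS."""
    tbl_id: int
    db: str
    name: str
    tbl_type: str                       # MANAGED_TABLE, LINK_TABLE or STATIC_LINK_TABLE
    link_tbl_id: Optional[int] = None   # id of the imported table; None unless a link
    partitions: Dict[str, dict] = field(default_factory=dict)

    @property
    def is_link(self) -> bool:
        return self.link_tbl_id is not None


class ToyMetastore:
    """Hypothetical model of the proposal; not the real Hive metastore API."""

    def __init__(self) -> None:
        self.tables: Dict[int, Table] = {}
        self.data_paths: Set[str] = set()   # stands in for directories on HDFS

    def _new_table(self, db, name, tbl_type, link_tbl_id=None) -> Table:
        table = Table(len(self.tables) + 1, db, name, tbl_type, link_tbl_id)
        self.tables[table.tbl_id] = table
        return table

    def create_table(self, db: str, name: str) -> Table:
        return self._new_table(db, name, "MANAGED_TABLE")

    def create_link(self, db: str, target: Table, static: bool = False) -> Table:
        # Creating a link adds a TBLS row but imports no existing partitions.
        tbl_type = "STATIC_LINK_TABLE" if static else "LINK_TABLE"
        return self._new_table(db, target.name, tbl_type, target.tbl_id)

    def add_partition(self, table: Table, spec: str, location: str) -> None:
        """Add a partition to a master table; non-static links pick it up."""
        table.partitions[spec] = {"location": location}
        self.data_paths.add(location)
        for other in self.tables.values():
            if other.link_tbl_id == table.tbl_id and other.tbl_type == "LINK_TABLE":
                other.partitions[spec] = dict(table.partitions[spec])

    def link_add_partition(self, link: Table, spec: str) -> None:
        """Import one existing partition by copying its metadata from the master."""
        master = self.tables[link.link_tbl_id]
        link.partitions[spec] = dict(master.partitions[spec])

    def drop_partition(self, table: Table, spec: str) -> None:
        """Drop partition metadata; only a non-link table also drops the data."""
        meta = table.partitions.pop(spec)
        if not table.is_link:
            self.data_paths.discard(meta["location"])


if __name__ == "__main__":
    ms = ToyMetastore()
    t = ms.create_table("Y", "T")
    ms.add_partition(t, "ds=2012-04-27", "/warehouse/Y/T/ds=2012-04-27")
    link = ms.create_link("X", t)
    ms.link_add_partition(link, "ds=2012-04-27")
    ms.drop_partition(link, "ds=2012-04-27")
    print(sorted(ms.data_paths))
```

In this model, dropping a partition from a link removes only the replicated metadata and leaves the shared data path in place, which is the behaviour the description asks Hive to guarantee. What should happen to a link's copy when the master drops the same partition is not covered by the description and is not modelled here.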
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289117#comment-13289117 ] Edward Capriolo commented on HIVE-2989: --- @Namit, What I think he is suggesting is that we should break out the undone tasks into separate linked tickets. In this way, someone does not assume that the entire feature is complete when this ticket is done. This is mostly a semantic debate but I understand his position. We have done a better job than usual producing a wiki page with a design spec for table links. What tends to happen with Hive and features is that the 'iterative' style produces a final product not exactly aligned with our initial spec. Since we deviate from the spec, no one knows the status or when the feature is done. Then people move on in life and there is no one to answer a question on the feature. I feel comfortable that the FB crew will produce an awesome feature, but Carl is justified to suggest that if we not have at least the core tasks broken out into 3 or 4 jiras it might be too much In FB we trust.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289118#comment-13289118 ] Edward Capriolo commented on HIVE-2989: --- Last sentence did not make much sense. Let's break this out into multiple issues.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289176#comment-13289176 ] Namit Jain commented on HIVE-2989: -- @Carl/@Edward, I agree completely. Let us file the entire list of follow-up jiras. That also helps us parallelize the efforts much better. @Bhushan, can you please file follow-up jiras? We can add them to the wiki also.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13288340#comment-13288340 ] Bhushan Mandhani commented on HIVE-2989: @Carl It is in https://reviews.facebook.net/D3405. Thanks.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287616#comment-13287616 ] Namit Jain commented on HIVE-2989: -- +1 Addressed all the comments on the wiki, and the review comments have also been addressed. Adding Table Links to Hive -- Key: HIVE-2989 URL: https://issues.apache.org/jira/browse/HIVE-2989 Project: Hive Issue Type: Improvement Components: Metastore, Query Processor, Security Affects Versions: 0.8.1 Reporter: Bhushan Mandhani Assignee: Bhushan Mandhani Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt Original Estimate: 672h Remaining Estimate: 672h
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287643#comment-13287643 ] Carl Steinbach commented on HIVE-2989: -- -1. This patch was committed two minutes after it was marked patch available which is unfair to the other committers. Also, there is still an ongoing discussion regarding the design proposal. Please back this patch out. Adding Table Links to Hive -- Key: HIVE-2989 URL: https://issues.apache.org/jira/browse/HIVE-2989 Project: Hive Issue Type: Improvement Components: Metastore, Query Processor, Security Affects Versions: 0.10.0 Reporter: Bhushan Mandhani Assignee: Bhushan Mandhani Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt Original Estimate: 672h Remaining Estimate: 672h
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287646#comment-13287646 ] Carl Steinbach commented on HIVE-2989: -- @Namit: I filed HIVE-3079 and assigned the ticket to you. Please revert this patch.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287650#comment-13287650 ]

Namit Jain commented on HIVE-2989:

@Carl, the patch was available for a long time. Bhushan forgot to submit the patch for it. We have addressed all your concerns in the wiki, and have very actively responded to all the comments. We will revert the patch and mark it patch available for now. We need it soon, so please try to review asap.

Adding Table Links to Hive

Key: HIVE-2989
URL: https://issues.apache.org/jira/browse/HIVE-2989
Project: Hive
Issue Type: Improvement
Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt
Original Estimate: 672h
Remaining Estimate: 672h

This will add Table Links to Hive. This will be an alternate mechanism for a user to access tables and data in a database that is different from the one he is associated with. This feature can be used to provide access control (if access to databasename.tablename in queries and "use database X" is turned off in conjunction).

If db X wants to access one or more partitions from table T in db Y, the user will issue:

    CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N')

New partitions added to T will automatically be added to the link as well and become available to X. However, if the link is specified to be static, that will not be the case. The X user will then have to explicitly import each partition of T that he needs.

The command above will not actually make any existing partitions of T available to X. Instead, we provide the following command to add an existing partition to a link:

    ALTER LINK T@Y ADD PARTITION (ds='2012-04-27')

The user will need to execute the above for each existing partition that needs to be imported. For future partitions, Hive will take care of this. An imported partition can be dropped from a link using a similar command. We just specify DROP instead of ADD.

For querying the linked table, the X user will refer to it as T@Y. Link Tables will only have read access and not be writable. The entire Table Link along with all its imported partitions can be dropped as follows:

    DROP LINK TO T@Y

The above commands are purely MetaStore operations. The implementation will rely on replicating the entire partition metadata when a partition is added to a link.

For every link that is created, we will add a new row to table TBLS. The TBL_TYPE column will have a new kind of value LINK_TABLE (or STATIC_LINK_TABLE if the link has been specified as static). A new column LINK_TBL_ID will be added which will contain the id of the imported table. It will be NULL for all other table types including the regular managed tables.

When a partition is added to a link, the new row in the table PARTITIONS will point to the LINK_TABLE in the same database and not the master table in the other database. We will replicate all the metadata for this partition from the master database. The advantage of this approach is that fewer changes will be needed in query processing and DDL for LINK_TABLEs. Also, commands like SHOW TABLES and SHOW PARTITIONS will work as expected for LINK_TABLEs too.

Of course, even though the metadata is not shared, the underlying data on disk is still shared. Hive still needs to know that when dropping a partition which belongs to a LINK_TABLE, it should not drop the underlying data from HDFS.

Views and external tables cannot be imported from one database to another.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
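
The metastore bookkeeping described in the issue text can be illustrated with a small sketch. It is not the patch's implementation: it uses an in-memory SQLite database as a stand-in for the metastore, the two tables are heavily simplified (the DB_NAME and LOCATION columns are assumptions made for brevity), storing the link's name as T@Y is an assumption, and the helper names open_toy_metastore, create_link, add_partition_to_link and drop_partition are hypothetical.

```python
import sqlite3

# Simplified stand-ins for the metastore tables; the real schema differs.
SCHEMA = """
CREATE TABLE TBLS (
    TBL_ID      INTEGER PRIMARY KEY,
    DB_NAME     TEXT NOT NULL,
    TBL_NAME    TEXT NOT NULL,
    TBL_TYPE    TEXT NOT NULL,
    LINK_TBL_ID INTEGER            -- NULL for every non-link table
);
CREATE TABLE PARTITIONS (
    PART_ID   INTEGER PRIMARY KEY,
    TBL_ID    INTEGER NOT NULL,
    PART_NAME TEXT NOT NULL,
    LOCATION  TEXT NOT NULL
);
"""

LINK_TYPES = ("LINK_TABLE", "STATIC_LINK_TABLE")


def open_toy_metastore():
    """Return an in-memory database holding the toy schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def create_link(db, link_db, table, source_db, static=False):
    """Add the TBLS row for a link; only managed tables may be linked."""
    row = db.execute(
        "SELECT TBL_ID FROM TBLS WHERE DB_NAME = ? AND TBL_NAME = ? "
        "AND TBL_TYPE = 'MANAGED_TABLE'",
        (source_db, table),
    ).fetchone()
    if row is None:
        raise ValueError("source must be an existing managed table")
    tbl_type = "STATIC_LINK_TABLE" if static else "LINK_TABLE"
    cur = db.execute(
        "INSERT INTO TBLS (DB_NAME, TBL_NAME, TBL_TYPE, LINK_TBL_ID) "
        "VALUES (?, ?, ?, ?)",
        (link_db, f"{table}@{source_db}", tbl_type, row[0]),
    )
    return cur.lastrowid


def add_partition_to_link(db, link_id, part_name):
    """Replicate one partition row so that it points at the link table."""
    src = db.execute(
        "SELECT p.LOCATION FROM PARTITIONS p "
        "JOIN TBLS l ON p.TBL_ID = l.LINK_TBL_ID "
        "WHERE l.TBL_ID = ? AND p.PART_NAME = ?",
        (link_id, part_name),
    ).fetchone()
    if src is None:
        raise ValueError("partition does not exist in the source table")
    db.execute(
        "INSERT INTO PARTITIONS (TBL_ID, PART_NAME, LOCATION) VALUES (?, ?, ?)",
        (link_id, part_name, src[0]),
    )


def drop_partition(db, tbl_id, part_name):
    """Delete partition metadata; return the data paths safe to delete."""
    tbl_type = db.execute(
        "SELECT TBL_TYPE FROM TBLS WHERE TBL_ID = ?", (tbl_id,)
    ).fetchone()[0]
    loc = db.execute(
        "SELECT LOCATION FROM PARTITIONS WHERE TBL_ID = ? AND PART_NAME = ?",
        (tbl_id, part_name),
    ).fetchone()
    if loc is None:
        raise ValueError("no such partition")
    db.execute(
        "DELETE FROM PARTITIONS WHERE TBL_ID = ? AND PART_NAME = ?",
        (tbl_id, part_name),
    )
    # Link partitions share data with the source table, so nothing is deleted.
    return [] if tbl_type in LINK_TYPES else [loc[0]]
```

In this sketch the link's partition row carries its own copy of the location, which is why listing partitions needs no special case for links, while drop_partition must check TBL_TYPE before touching any data.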
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287662#comment-13287662 ]

Bhushan Mandhani commented on HIVE-2989:

Please review promptly.

Adding Table Links to Hive

Key: HIVE-2989
URL: https://issues.apache.org/jira/browse/HIVE-2989
Project: Hive
Issue Type: Improvement
Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, HIVE-2989.6.patch.txt
Original Estimate: 672h
Remaining Estimate: 672h
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287668#comment-13287668 ]

Carl Steinbach commented on HIVE-2989:

@Namit: Please +1 HIVE-3079. I will handle committing it. Thanks.

Adding Table Links to Hive

Key: HIVE-2989
URL: https://issues.apache.org/jira/browse/HIVE-2989
Project: Hive
Issue Type: Improvement
Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, HIVE-2989.6.patch.txt
Original Estimate: 672h
Remaining Estimate: 672h
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287682#comment-13287682 ]

Edward Capriolo commented on HIVE-2989:

Also, this brings to light a rather unfair issue: we have no system for reviewing stuff that is patch_available. Some stuff sits patch_available and interviewed for months.

Adding Table Links to Hive

Key: HIVE-2989
URL: https://issues.apache.org/jira/browse/HIVE-2989
Project: Hive
Issue Type: Improvement
Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, HIVE-2989.6.patch.txt
Original Estimate: 672h
Remaining Estimate: 672h
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287683#comment-13287683 ]

Edward Capriolo commented on HIVE-2989:

*and unreviewed for months.

Adding Table Links to Hive

Key: HIVE-2989
URL: https://issues.apache.org/jira/browse/HIVE-2989
Project: Hive
Issue Type: Improvement
Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, HIVE-2989.6.patch.txt
Original Estimate: 672h
Remaining Estimate: 672h
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287845#comment-13287845 ]

Hudson commented on HIVE-2989:

Integrated in Hive-trunk-h0.21 #1462 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1462/])
HIVE-2989 Adding Table Links to Hive (Bhushan Mandhani via namit) (Revision 1345318)

Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1345318
Files :
* /hive/trunk/metastore/if/hive_metastore.thrift
* /hive/trunk/metastore/scripts/upgrade/mysql/010-HIVE-2989.mysql.sql
* /hive/trunk/metastore/scripts/upgrade/mysql/hive-schema-0.10.0.mysql.sql
* /hive/trunk/metastore/scripts/upgrade/oracle/hive-schema-0.10.0.oracle.sql
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/EnvironmentContext.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Index.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Schema.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/TableIdentifier.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
* /hive/trunk/metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php
* /hive/trunk/metastore/src/gen/thrift/gen-php/hive_metastore/hive_metastore_types.php
* /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py
* /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ttypes.py
* /hive/trunk/metastore/src/gen/thrift/gen-rb/hive_metastore_types.rb
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/TableType.java
* /hive/trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MTable.java
* /hive/trunk/metastore/src/model/package.jdo
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableLinkDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/DDLWork.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/DropTableDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java
* /hive/trunk/ql/src/test/queries/clientnegative/create_table_failure5.q
* /hive/trunk/ql/src/test/queries/clientnegative/create_tablelink_failure1.q
* /hive/trunk/ql/src/test/queries/clientnegative/create_tablelink_failure2.q
* /hive/trunk/ql/src/test/queries/clientpositive/create_tablelink.q
* /hive/trunk/ql/src/test/results/clientnegative/create_table_failure5.q.out
* /hive/trunk/ql/src/test/results/clientnegative/create_tablelink_failure1.q.out
* /hive/trunk/ql/src/test/results/clientnegative/create_tablelink_failure2.q.out
* /hive/trunk/ql/src/test/results/clientnegative/drop_table_failure2.q.out
* /hive/trunk/ql/src/test/results/clientnegative/drop_view_failure1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/create_tablelink.q.out
* /hive/trunk/ql/src/test/results/clientpositive/create_view.q.out
* /hive/trunk/ql/src/test/results/clientpositive/create_view_partitioned.q.out
* /hive/trunk/ql/src/test/results/clientpositive/insert2_overwrite_partitions.q.out

Adding Table Links to Hive

Key: HIVE-2989
URL: https://issues.apache.org/jira/browse/HIVE-2989
Project: Hive
Issue Type: Improvement
Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, HIVE-2989.6.patch.txt
Original Estimate: 672h
Remaining Estimate: 672h
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286751#comment-13286751 ]

Namit Jain commented on HIVE-2989:

Bhushan, can you refresh your patch? I am getting some merge conflicts.

Adding Table Links to Hive

Key: HIVE-2989
URL: https://issues.apache.org/jira/browse/HIVE-2989
Project: Hive
Issue Type: Improvement
Components: Metastore, Query Processor, Security
Affects Versions: 0.8.1
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt
Original Estimate: 672h
Remaining Estimate: 672h
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286834#comment-13286834 ]

Namit Jain commented on HIVE-2989:

1. Can you create an arc diff? It is working now.
2. Instead of changing metastore/scripts/upgrade/mysql/hive-schema-0.9.0.mysql.sql, you should create a new file: metastore/scripts/upgrade/mysql/hive-schema-0.10.0.mysql.sql. Links are not part of 0.9.
3. Same for metastore/scripts/upgrade/oracle/hive-schema-0.9.0.oracle.sql.
4. You can revert ql/src/java/org/apache/hadoop/hive/ql/Driver.java.
5. DDLTask.java: line 3040 etc. These errors should be caught at compile time, in DDLSemanticAnalyzer. Same for the drop link error.
6. 3612: Can you add more detailed comments here? The code/semantics for create table like and create table link should be the same. All the SD/SERDE properties are copied. None of the table properties are copied. You are doing this anyway; write a more detailed comment, and make a common function for create table link and create table like.
7. DDLSemanticAnalyzer.java: 708 - the new parameter expectLink is never used. Why did you change the signature of analyzeDropTable? I think you wanted to pass it to DropTableDesc, but that is not done right now.
8. 867: add the outputs also; they are incomplete right now.
9. Can you add a new negative test, where you try to create a table with the name A@B?

Adding Table Links to Hive

Key: HIVE-2989
URL: https://issues.apache.org/jira/browse/HIVE-2989
Project: Hive
Issue Type: Improvement
Components: Metastore, Query Processor, Security
Affects Versions: 0.8.1
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt
Original Estimate: 672h
Remaining Estimate: 672h
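
The last review point, a negative test that creates a table named A@B, rests on keeping the @ character out of ordinary table names so that T@Y can only denote a link. A minimal sketch of that check follows. It is an illustration in Python, not Hive's code: the function names validate_new_table_name and parse_link_ref are hypothetical, and restricting names to letters, digits and underscores is an assumption of this sketch.

```python
import re

# Assumed rule for plain identifiers: letters, digits and underscores only.
_NAME = re.compile(r"[A-Za-z0-9_]+")


def validate_new_table_name(name):
    """Reject names an ordinary CREATE TABLE should not accept, such as A@B."""
    if not _NAME.fullmatch(name):
        raise ValueError(f"invalid table name: {name!r}")
    return name


def parse_link_ref(ref):
    """Split a link reference 'T@Y' into (table, source database)."""
    table, sep, db = ref.partition("@")
    if not sep or not _NAME.fullmatch(table) or not _NAME.fullmatch(db):
        raise ValueError(f"invalid link reference: {ref!r}")
    return table, db
```

Under these assumptions, validate_new_table_name("A@B") raises ValueError, which is the behaviour the requested negative test would pin down, while parse_link_ref("T@Y") yields the table and database names separately.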
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286933#comment-13286933 ]

Carl Steinbach commented on HIVE-2989:

@Namit, Sambavi: I added more comments/questions to the design doc.

Adding Table Links to Hive
Key: HIVE-2989
URL: https://issues.apache.org/jira/browse/HIVE-2989
Project: Hive
Issue Type: Improvement
Components: Metastore, Query Processor, Security
Affects Versions: 0.8.1
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt
Original Estimate: 672h
Remaining Estimate: 672h

This will add Table Links to Hive. This will be an alternate mechanism for a user to access tables and data in a database that is different from the one he is associated with. This feature can be used to provide access control (if access to databasename.tablename in queries and "use database X" is turned off in conjunction).

If db X wants to access one or more partitions from table T in db Y, the user will issue:

    CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N')

New partitions added to T will automatically be added to the link as well and become available to X. However, if the link is specified to be static, that will not be the case. The X user will then have to explicitly import each partition of T that he needs.

The command above will not actually make any existing partitions of T available to X. Instead, we provide the following command to add an existing partition to a link:

    ALTER LINK T@Y ADD PARTITION (ds='2012-04-27')

The user will need to execute the above for each existing partition that needs to be imported. For future partitions, Hive will take care of this. An imported partition can be dropped from a link using a similar command. We just specify DROP instead of ADD.

For querying the linked table, the X user will refer to it as T@Y. Link Tables will only have read access and not be writable. The entire Table Link along with all its imported partitions can be dropped as follows:

    DROP LINK TO T@Y

The above commands are purely MetaStore operations. The implementation will rely on replicating the entire partition metadata when a partition is added to a link.

For every link that is created, we will add a new row to table TBLS. The TBL_TYPE column will have a new kind of value LINK_TABLE (or STATIC_LINK_TABLE if the link has been specified as static). A new column LINK_TBL_ID will be added which will contain the id of the imported table. It will be NULL for all other table types including the regular managed tables.

When a partition is added to a link, the new row in the table PARTITIONS will point to the LINK_TABLE in the same database and not the master table in the other database. We will replicate all the metadata for this partition from the master database. The advantage of this approach is that fewer changes will be needed in query processing and DDL for LINK_TABLEs. Also, commands like SHOW TABLES and SHOW PARTITIONS will work as expected for LINK_TABLEs too.

Of course, even though the metadata is not shared, the underlying data on disk is still shared. Hive still needs to know that when dropping a partition which belongs to a LINK_TABLE, it should not drop the underlying data from HDFS.

Views and external tables cannot be imported from one database to another.
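The metastore bookkeeping described in the issue text can be illustrated with a small sketch. This is not taken from any of the attached patches: it models TBLS and PARTITIONS as two heavily simplified SQLite tables, and the DB_NAME and LOCATION columns, the function names, and the choice to store the link's TBL_NAME as "T@Y" are all assumptions made for illustration. Only TBL_TYPE, LINK_TBL_ID, LINK_TABLE and STATIC_LINK_TABLE come from the description.

```python
import sqlite3

# Simplified, hypothetical stand-in for the metastore tables named in the
# description; the real TBLS and PARTITIONS tables have many more columns.
SCHEMA = """
CREATE TABLE TBLS (
    TBL_ID      INTEGER PRIMARY KEY,
    DB_NAME     TEXT NOT NULL,
    TBL_NAME    TEXT NOT NULL,
    TBL_TYPE    TEXT NOT NULL,
    LINK_TBL_ID INTEGER              -- NULL unless the row is a link
);
CREATE TABLE PARTITIONS (
    PART_ID   INTEGER PRIMARY KEY,
    TBL_ID    INTEGER NOT NULL,
    PART_NAME TEXT NOT NULL,
    LOCATION  TEXT NOT NULL
);
"""


def new_metastore():
    """Return an in-memory database holding the simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def create_link(conn, link_db, table, source_db, static=False):
    """Model CREATE [STATIC] LINK TO table@source_db issued from link_db."""
    row = conn.execute(
        "SELECT TBL_ID FROM TBLS WHERE DB_NAME = ? AND TBL_NAME = ? "
        "AND TBL_TYPE = 'MANAGED_TABLE'",
        (source_db, table),
    ).fetchone()
    if row is None:
        # Views and external tables cannot be imported.
        raise ValueError(f"{table}@{source_db} is not a managed table")
    tbl_type = "STATIC_LINK_TABLE" if static else "LINK_TABLE"
    cur = conn.execute(
        "INSERT INTO TBLS (DB_NAME, TBL_NAME, TBL_TYPE, LINK_TBL_ID) "
        "VALUES (?, ?, ?, ?)",
        (link_db, f"{table}@{source_db}", tbl_type, row[0]),
    )
    return cur.lastrowid


def add_link_partition(conn, link_id, part_name):
    """Model ALTER LINK ... ADD PARTITION: copy the partition's metadata row
    so that it points at the link's own TBLS row, sharing the data location."""
    conn.execute(
        "INSERT INTO PARTITIONS (TBL_ID, PART_NAME, LOCATION) "
        "SELECT l.TBL_ID, p.PART_NAME, p.LOCATION "
        "FROM TBLS l JOIN PARTITIONS p ON p.TBL_ID = l.LINK_TBL_ID "
        "WHERE l.TBL_ID = ? AND p.PART_NAME = ?",
        (link_id, part_name),
    )


def add_source_partition(conn, tbl_id, part_name, location):
    """Add a partition to the source table and propagate it to every
    non-static link; static links are left untouched."""
    conn.execute(
        "INSERT INTO PARTITIONS (TBL_ID, PART_NAME, LOCATION) VALUES (?, ?, ?)",
        (tbl_id, part_name, location),
    )
    links = conn.execute(
        "SELECT TBL_ID FROM TBLS "
        "WHERE LINK_TBL_ID = ? AND TBL_TYPE = 'LINK_TABLE'",
        (tbl_id,),
    ).fetchall()
    for (link_id,) in links:
        add_link_partition(conn, link_id, part_name)


def drop_link_partition(conn, link_id, part_name):
    """Model ALTER LINK ... DROP PARTITION: remove only the link's metadata
    row; the source table's row and the data on disk are not touched."""
    conn.execute(
        "DELETE FROM PARTITIONS WHERE TBL_ID = ? AND PART_NAME = ?",
        (link_id, part_name),
    )


def partitions(conn, tbl_id):
    """List (PART_NAME, LOCATION) pairs registered for one TBLS row."""
    return conn.execute(
        "SELECT PART_NAME, LOCATION FROM PARTITIONS "
        "WHERE TBL_ID = ? ORDER BY PART_NAME",
        (tbl_id,),
    ).fetchall()
```

Because each link owns its own PARTITIONS rows, a lookup such as partitions(conn, link_id) needs no special handling for links, which is the advantage the description claims for SHOW PARTITIONS and query processing. The cost is that the copies have to be kept in step with the source table.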
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287099#comment-13287099 ]

Carl Steinbach commented on HIVE-2989:

@Sambavi: I responded to your comments on the wiki. Please take a look. Thanks.

Adding Table Links to Hive
Key: HIVE-2989
URL: https://issues.apache.org/jira/browse/HIVE-2989
Project: Hive
Issue Type: Improvement
Components: Metastore, Query Processor, Security
Affects Versions: 0.8.1
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt
Original Estimate: 672h
Remaining Estimate: 672h
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284600#comment-13284600 ]

Bhushan Mandhani commented on HIVE-2989:

Carl, I've included the MetaStore upgrade scripts now. Regarding your first comment, let's discuss that on the design wiki page to keep the discussion in one place.

Adding Table Links to Hive
Key: HIVE-2989
URL: https://issues.apache.org/jira/browse/HIVE-2989
Project: Hive
Issue Type: Improvement
Components: Metastore, Query Processor, Security
Affects Versions: 0.8.1
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Fix For: 0.9.0
Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt
Original Estimate: 672h
Remaining Estimate: 672h
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277629#comment-13277629 ]

Carl Steinbach commented on HIVE-2989:

I'm not sure what you mean by "a database that is different from the one he is associated with". Can you please clarify? Also, what's the motivation for this feature? What problem are you trying to solve?

> This feature can be used to provide access control (if access to databasename.tablename in queries and use database X is turned off in conjunction).

It sounds like you're saying this can be used to circumvent the authorization system. Is that the goal?

Adding Table Links to Hive
Key: HIVE-2989
URL: https://issues.apache.org/jira/browse/HIVE-2989
Project: Hive
Issue Type: Improvement
Components: Metastore, Query Processor, Security
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Fix For: 0.10.0
Original Estimate: 672h
Remaining Estimate: 672h
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264973#comment-13264973 ]

Namit Jain commented on HIVE-2989:

In the case of views, the burden is on the user to add partitions to the view. In general, you cannot assume a one-to-one mapping between table partitions and view partitions. In the case of links (not static), every future partition of the table will automatically lead to a partition being added to the link. Moreover, the corresponding link partition gets dropped when the underlying table partition gets dropped. I agree that all of this functionality could be added to views, but that approach might be more error-prone, and we would be overloading an existing concept with a different one.

Adding Table Links to Hive
Key: HIVE-2989
URL: https://issues.apache.org/jira/browse/HIVE-2989
Project: Hive
Issue Type: Improvement
Components: Metastore, Query Processor, Security
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Fix For: 0.10.0
Original Estimate: 672h
Remaining Estimate: 672h
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264581#comment-13264581 ]

Edward Capriolo commented on HIVE-2989:

It seems like much of this could be done with a VIEW.

Adding Table Links to Hive
Key: HIVE-2989
URL: https://issues.apache.org/jira/browse/HIVE-2989
Project: Hive
Issue Type: Improvement
Components: Metastore, Query Processor, Security
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Fix For: 0.10.0
Original Estimate: 672h
Remaining Estimate: 672h