[jira] [Commented] (HIVE-20198) Constant time table drops/renames

2019-01-17 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745561#comment-16745561
 ] 

Vihang Karajgaonkar commented on HIVE-20198:


Thanks [~ekoifman] very interesting to know that.

> Constant time table drops/renames
> -
>
> Key: HIVE-20198
> URL: https://issues.apache.org/jira/browse/HIVE-20198
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Alexander Kolbasov
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> Currently table drops and table renames have O(P) performance (where P is the 
> number of partitions). When a managed table is deleted, the implementation 
> deletes table metadata and then deletes all partitions in HDFS. HDFS 
> operations are optimized and only do a sequential deletes for partitions 
> outside of table prefix. This operation is O(P)where Pis the number of 
> partitions. 
> Table rename goes through the list of partitions and modifies table name (and 
> potentially db name) in each partition. It also modifies each partition 
> location to match the new db/table name and renames directories (which is a 
> non-atomic and slow operation on S3). This is O(P) operation where P is the 
> number of partitions.
> Basic idea is to do the following:
> # Assign unique ID to each table
> # Create directory name based on unique ID rather then the name
> # Table rename then becomes metadata-only operation - there is no need to 
> change any location information.
> # Table drop can become an asynchronous operation where the table is marked 
> as "deleted". Subsequent public metadata APIs should skip such tables. A 
> background cleaner thread may then go and clean up directories.
> Since the table location is unique for each table, new tables will not reuse 
> existing locations. This change isn't compatible with the current behavior 
> where there is an assumption that table location is based on table name. We 
> can get around this by providing "opt-in" mechanism - special table property 
> that tells that the table can have such new behavior, so the improvement will 
> initially work for new tables created with this feature enabled. We may later 
> provide some tool to convert existing tables to the new scheme.
> One complication is there in case where impersonation is enabled - the FS 
> operations should be performed using client UGI rather then server's, so the 
> cleaner thread should be able to use client UGIs.
> Initially we can punt on this and do standard table drops when impersonation 
> is enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20198) Constant time table drops/renames

2019-01-17 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745538#comment-16745538
 ] 

Eugene Koifman commented on HIVE-20198:
---

FYI, {{TBLS.TBL_ID}} is exposed via Thrift since HIVE-20556.

> Constant time table drops/renames
> -
>
> Key: HIVE-20198
> URL: https://issues.apache.org/jira/browse/HIVE-20198
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Alexander Kolbasov
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> Currently table drops and table renames have O(P) performance (where P is the 
> number of partitions). When a managed table is deleted, the implementation 
> deletes table metadata and then deletes all partitions in HDFS. HDFS 
> operations are optimized and only do a sequential deletes for partitions 
> outside of table prefix. This operation is O(P)where Pis the number of 
> partitions. 
> Table rename goes through the list of partitions and modifies table name (and 
> potentially db name) in each partition. It also modifies each partition 
> location to match the new db/table name and renames directories (which is a 
> non-atomic and slow operation on S3). This is O(P) operation where P is the 
> number of partitions.
> Basic idea is to do the following:
> # Assign unique ID to each table
> # Create directory name based on unique ID rather then the name
> # Table rename then becomes metadata-only operation - there is no need to 
> change any location information.
> # Table drop can become an asynchronous operation where the table is marked 
> as "deleted". Subsequent public metadata APIs should skip such tables. A 
> background cleaner thread may then go and clean up directories.
> Since the table location is unique for each table, new tables will not reuse 
> existing locations. This change isn't compatible with the current behavior 
> where there is an assumption that table location is based on table name. We 
> can get around this by providing "opt-in" mechanism - special table property 
> that tells that the table can have such new behavior, so the improvement will 
> initially work for new tables created with this feature enabled. We may later 
> provide some tool to convert existing tables to the new scheme.
> One complication is there in case where impersonation is enabled - the FS 
> operations should be performed using client UGI rather then server's, so the 
> cleaner thread should be able to use client UGIs.
> Initially we can punt on this and do standard table drops when impersonation 
> is enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20198) Constant time table drops/renames

2019-01-15 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743514#comment-16743514
 ] 

Vihang Karajgaonkar commented on HIVE-20198:


This somehow slipped through the cracks. I will assign it myself and see if I 
can take a first stab at it.

> Constant time table drops/renames
> -
>
> Key: HIVE-20198
> URL: https://issues.apache.org/jira/browse/HIVE-20198
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Alexander Kolbasov
>Priority: Major
>
> Currently table drops and table renames have O(P) performance (where P is the 
> number of partitions). When a managed table is deleted, the implementation 
> deletes table metadata and then deletes all partitions in HDFS. HDFS 
> operations are optimized and only do a sequential deletes for partitions 
> outside of table prefix. This operation is O(P)where Pis the number of 
> partitions. 
> Table rename goes through the list of partitions and modifies table name (and 
> potentially db name) in each partition. It also modifies each partition 
> location to match the new db/table name and renames directories (which is a 
> non-atomic and slow operation on S3). This is O(P) operation where P is the 
> number of partitions.
> Basic idea is to do the following:
> # Assign unique ID to each table
> # Create directory name based on unique ID rather then the name
> # Table rename then becomes metadata-only operation - there is no need to 
> change any location information.
> # Table drop can become an asynchronous operation where the table is marked 
> as "deleted". Subsequent public metadata APIs should skip such tables. A 
> background cleaner thread may then go and clean up directories.
> Since the table location is unique for each table, new tables will not reuse 
> existing locations. This change isn't compatible with the current behavior 
> where there is an assumption that table location is based on table name. We 
> can get around this by providing "opt-in" mechanism - special table property 
> that tells that the table can have such new behavior, so the improvement will 
> initially work for new tables created with this feature enabled. We may later 
> provide some tool to convert existing tables to the new scheme.
> One complication is there in case where impersonation is enabled - the FS 
> operations should be performed using client UGI rather then server's, so the 
> cleaner thread should be able to use client UGIs.
> Initially we can punt on this and do standard table drops when impersonation 
> is enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20198) Constant time table drops/renames

2018-07-19 Thread Peter Vary (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549010#comment-16549010
 ] 

Peter Vary commented on HIVE-20198:
---

+1 to the general idea.

Some thoughts (all of them just general ideas, and stating here only to start 
the discussion)
 * I would vote for a separate new column signaling that the table is already 
soft deleted.
 ** I think that is the most straight-forward, method handling this type of 
situations. Helps to have a clean, easily readable code.
 ** We need several queries where we have to filter out the already deleted 
tables. The backend DB can do this for us better if there is a separate column.
 ** We would like to revise the indexes on the TBLS table, so the above queries 
can have faster results
 * Not sure about the TBLS.TBL_ID - If we want to follow up on [~alangates]' 
federated MetaStore idea, where separate catalogs can be served by different 
MetaStore instances we might end up with TBL_ID collisions. I would prefer to 
keep open this direction if we want to move there. Also when 
replicating/recovering the tables we might end up with different TBL_IDs.
 * Constant time table drops, and renames should work for external tables too, 
since we "by definition" do not manage the filesystem for them (only create 
directories when necessary)
 * I would love to see asynchronous metadata drop for managed tables too. We 
can mark the metadata to be dropped (and later removed asynchronously), and 
remove the directories synchronously.
 ** This would work in case when impersonation is enabled
 ** This would work for every existing managed tables too
 ** During my previous tests I have seen that the lion share of the time spent 
on dropping tables is the metadata changes. Removing files from HDFS is 
relatively cheap. In case of S3 or ADLS the user can still chose to "opt-in" 
for the new table handling system.
 * How to signal that this table is "rename" optimized:
 ** Currently we have TableType - Managed / External
 ** Currently we have properties - ACID, etc
 ** I think we should decide which is a way to go, and at least with the new 
types do accordingly
 * Renaming tables
 ** Currently we change only this for partitions:
{code:java}
String newPath = oldUri.getPath().replace(oldTblLocPath, newTblLocPath);
Path newPartLocPath = new Path(oldUri.getScheme(), oldUri.getAuthority(), 
newPath);
{code}

 ** It might be worth to consider some DirectSQL change for this - might be 
challenging to find a method that works on all supported databases, but we 
might get very fast results - not O(1), but fast anyway :)

Thanks for reading through this long post!

Peter

 

> Constant time table drops/renames
> -
>
> Key: HIVE-20198
> URL: https://issues.apache.org/jira/browse/HIVE-20198
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Alexander Kolbasov
>Priority: Major
>
> Currently table drops and table renames have O(P) performance (where P is the 
> number of partitions). When a managed table is deleted, the implementation 
> deletes table metadata and then deletes all partitions in HDFS. HDFS 
> operations are optimized and only do a sequential deletes for partitions 
> outside of table prefix. This operation is O(P)where Pis the number of 
> partitions. 
> Table rename goes through the list of partitions and modifies table name (and 
> potentially db name) in each partition. It also modifies each partition 
> location to match the new db/table name and renames directories (which is a 
> non-atomic and slow operation on S3). This is O(P) operation where P is the 
> number of partitions.
> Basic idea is to do the following:
> # Assign unique ID to each table
> # Create directory name based on unique ID rather then the name
> # Table rename then becomes metadata-only operation - there is no need to 
> change any location information.
> # Table drop can become an asynchronous operation where the table is marked 
> as "deleted". Subsequent public metadata APIs should skip such tables. A 
> background cleaner thread may then go and clean up directories.
> Since the table location is unique for each table, new tables will not reuse 
> existing locations. This change isn't compatible with the current behavior 
> where there is an assumption that table location is based on table name. We 
> can get around this by providing "opt-in" mechanism - special table property 
> that tells that the table can have such new behavior, so the improvement will 
> initially work for new tables created with this feature enabled. We may later 
> provide some tool to convert existing tables to the new scheme.
> One complication is there in case where impersonation is enabled - the FS 
> operations should be performed using

[jira] [Commented] (HIVE-20198) Constant time table drops/renames

2018-07-18 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548724#comment-16548724
 ] 

Vihang Karajgaonkar commented on HIVE-20198:


{{TBLS.TBL_ID}} is internally managed by datanucleus and not exposed to the 
Thrift Table object. Its value is dependent on the backing database according 
to datanucleus documentation (although I have almost always seen it as a 
monotonically increasing number). Its value is guaranteed to be unique so I 
think we can potentially use it. But in my opinion there is still value to 
expose such a id which is controlled by metastore at the thrift level. For 
instance such ids can be used to identify versions of the objects to provide 
optimistic concurrency control model instead of the lock based concurrency 
model we have currently. A table object which is altered can have a different 
uuid but not different currently the TBL_ID does not change.

> Constant time table drops/renames
> -
>
> Key: HIVE-20198
> URL: https://issues.apache.org/jira/browse/HIVE-20198
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Alexander Kolbasov
>Priority: Major
>
> Currently table drops and table renames have O(P) performance (where P is the 
> number of partitions). When a managed table is deleted, the implementation 
> deletes table metadata and then deletes all partitions in HDFS. HDFS 
> operations are optimized and only do a sequential deletes for partitions 
> outside of table prefix. This operation is O(P)where Pis the number of 
> partitions. 
> Table rename goes through the list of partitions and modifies table name (and 
> potentially db name) in each partition. It also modifies each partition 
> location to match the new db/table name and renames directories (which is a 
> non-atomic and slow operation on S3). This is O(P) operation where P is the 
> number of partitions.
> Basic idea is to do the following:
> # Assign unique ID to each table
> # Create directory name based on unique ID rather then the name
> # Table rename then becomes metadata-only operation - there is no need to 
> change any location information.
> # Table drop can become an asynchronous operation where the table is marked 
> as "deleted". Subsequent public metadata APIs should skip such tables. A 
> background cleaner thread may then go and clean up directories.
> Since the table location is unique for each table, new tables will not reuse 
> existing locations. This change isn't compatible with the current behavior 
> where there is an assumption that table location is based on table name. We 
> can get around this by providing "opt-in" mechanism - special table property 
> that tells that the table can have such new behavior, so the improvement will 
> initially work for new tables created with this feature enabled. We may later 
> provide some tool to convert existing tables to the new scheme.
> One complication is there in case where impersonation is enabled - the FS 
> operations should be performed using client UGI rather then server's, so the 
> cleaner thread should be able to use client UGIs.
> Initially we can punt on this and do standard table drops when impersonation 
> is enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20198) Constant time table drops/renames

2018-07-18 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548628#comment-16548628
 ] 

Eugene Koifman commented on HIVE-20198:
---

could TBLS.TBL_ID be used as this ID?

Not strictly related, but it would be nice if Table object contained this 
TBL_ID as well.

> Constant time table drops/renames
> -
>
> Key: HIVE-20198
> URL: https://issues.apache.org/jira/browse/HIVE-20198
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Alexander Kolbasov
>Priority: Major
>
> Currently table drops and table renames have O(P) performance (where P is the 
> number of partitions). When a managed table is deleted, the implementation 
> deletes table metadata and then deletes all partitions in HDFS. HDFS 
> operations are optimized and only do a sequential deletes for partitions 
> outside of table prefix. This operation is O(P)where Pis the number of 
> partitions. 
> Table rename goes through the list of partitions and modifies table name (and 
> potentially db name) in each partition. It also modifies each partition 
> location to match the new db/table name and renames directories (which is a 
> non-atomic and slow operation on S3). This is O(P) operation where P is the 
> number of partitions.
> Basic idea is to do the following:
> # Assign unique ID to each table
> # Create directory name based on unique ID rather then the name
> # Table rename then becomes metadata-only operation - there is no need to 
> change any location information.
> # Table drop can become an asynchronous operation where the table is marked 
> as "deleted". Subsequent public metadata APIs should skip such tables. A 
> background cleaner thread may then go and clean up directories.
> Since the table location is unique for each table, new tables will not reuse 
> existing locations. This change isn't compatible with the current behavior 
> where there is an assumption that table location is based on table name. We 
> can get around this by providing "opt-in" mechanism - special table property 
> that tells that the table can have such new behavior, so the improvement will 
> initially work for new tables created with this feature enabled. We may later 
> provide some tool to convert existing tables to the new scheme.
> One complication is there in case where impersonation is enabled - the FS 
> operations should be performed using client UGI rather then server's, so the 
> cleaner thread should be able to use client UGIs.
> Initially we can punt on this and do standard table drops when impersonation 
> is enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20198) Constant time table drops/renames

2018-07-18 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548105#comment-16548105
 ] 

Vihang Karajgaonkar commented on HIVE-20198:


+1 to the idea. In case of managed table when impersonation is turned off the 
owner of the hdfs location for the tables and its partitions is hive. I think 
it makes sense to let Hive manage the table location for a managed_table. 
Having a UUID based table location can provide lot of performance advantages in 
terms of async drops and renames.

In case of external tables, the hdfs data is anyways not deleted currently. So 
this approach can be used in external tables as well. A drop command on an 
external table should do a quick metadata operation which marks the table as 
deleted (may be by prefixing the table name with a special string + its uuid).

Surprisingly, in case of table renames on external tables, I see that HMS does 
not update the table in the partitions for that table (which seems wrong to me).

> Constant time table drops/renames
> -
>
> Key: HIVE-20198
> URL: https://issues.apache.org/jira/browse/HIVE-20198
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Alexander Kolbasov
>Priority: Major
>
> Currently table drops and table renames have O(P) performance (where P is the 
> number of partitions). When a managed table is deleted, the implementation 
> deletes table metadata and then deletes all partitions in HDFS. HDFS 
> operations are optimized and only do a sequential deletes for partitions 
> outside of table prefix. This operation is O(P)where Pis the number of 
> partitions. 
> Table rename goes through the list of partitions and modifies table name (and 
> potentially db name) in each partition. It also modifies each partition 
> location to match the new db/table name and renames directories (which is a 
> non-atomic and slow operation on S3). This is O(P) operation where P is the 
> number of partitions.
> Basic idea is to do the following:
> # Assign unique ID to each table
> # Create directory name based on unique ID rather then the name
> # Table rename then becomes metadata-only operation - there is no need to 
> change any location information.
> # Table drop can become an asynchronous operation where the table is marked 
> as "deleted". Subsequent public metadata APIs should skip such tables. A 
> background cleaner thread may then go and clean up directories.
> Since the table location is unique for each table, new tables will not reuse 
> existing locations. This change isn't compatible with the current behavior 
> where there is an assumption that table location is based on table name. We 
> can get around this by providing "opt-in" mechanism - special table property 
> that tells that the table can have such new behavior, so the improvement will 
> initially work for new tables created with this feature enabled. We may later 
> provide some tool to convert existing tables to the new scheme.
> One complication is there in case where impersonation is enabled - the FS 
> operations should be performed using client UGI rather then server's, so the 
> cleaner thread should be able to use client UGIs.
> Initially we can punt on this and do standard table drops when impersonation 
> is enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)