[
https://issues.apache.org/jira/browse/SENTRY-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539090#comment-16539090
]
Na Li commented on SENTRY-2304:
-------------------------------
If a customer has 1 million tables spread across a thousand databases, the
performance difference will be quite large:
* Option 1: query the tables in a database per task. This requires a thousand
queries.
* Option 2: query all tables in all databases. This requires a single query.
The performance difference for a single query with or without a "WHERE" clause
may be small, but the difference in time spent on a thousand queries versus a
single query is significant.
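
To make the comparison concrete, here is a minimal sketch of the two options
against an in-memory SQLite database. The `tbls` table, its columns, and the
helper names are hypothetical stand-ins, not the real HMS schema or any Sentry
API; the sketch only counts round trips and checks that both options return
the same rows.

```python
import sqlite3


def build_metastore(num_dbs, tables_per_db):
    """Create a toy table catalog (hypothetical schema, not the HMS one)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbls (db_name TEXT, tbl_name TEXT, location TEXT)")
    conn.execute("CREATE INDEX idx_tbls_db ON tbls (db_name)")
    rows = [
        (f"db{d}", f"t{t}", f"/warehouse/db{d}/t{t}")
        for d in range(num_dbs)
        for t in range(tables_per_db)
    ]
    conn.executemany("INSERT INTO tbls VALUES (?, ?, ?)", rows)
    return conn


def fetch_per_database(conn, db_names):
    """Option 1: one query per database. Returns (rows, query_count)."""
    rows, queries = [], 0
    for db in db_names:
        cursor = conn.execute(
            "SELECT db_name, tbl_name, location FROM tbls WHERE db_name = ?",
            (db,),
        )
        rows.extend(cursor.fetchall())
        queries += 1
    return rows, queries


def fetch_all(conn):
    """Option 2: one query for every table. Returns (rows, query_count)."""
    cursor = conn.execute("SELECT db_name, tbl_name, location FROM tbls")
    return cursor.fetchall(), 1


# Scaled-down data set: 1000 databases with 10 tables each.
conn = build_metastore(num_dbs=1000, tables_per_db=10)
db_names = [f"db{d}" for d in range(1000)]

rows_option1, queries_option1 = fetch_per_database(conn, db_names)
rows_option2, queries_option2 = fetch_all(conn)
```

Both options return the same rows; only the number of queries differs. Against
a real metastore each query also pays network and connection overhead, which
this local sketch does not model.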
> Optimize the time taken for HMS snapshot creation.
> --------------------------------------------------
>
> Key: SENTRY-2304
> URL: https://issues.apache.org/jira/browse/SENTRY-2304
> Project: Sentry
> Issue Type: Sub-task
> Components: Sentry
> Affects Versions: 2.1.0
> Reporter: kalyan kumar kalvagadda
> Priority: Major
>
> First get all the database names and their locations.
> # Create Table Task for each database by providing the db name
> # DB Task will get the names and locations for all the tables in that
> database.
> # Create Partition task for each table in that database by providing the
> database and table names.
> ## Table task will get the locations of all the partitions in that table.
> This approach needs new APIs implemented in HMS to get the names and
> locations of the databases/tables/partitions.
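
For reference, the task fan-out described in the issue can be sketched as
follows. This is only an illustration under stated assumptions: `METASTORE`,
`get_databases`, `get_tables`, and `get_partitions` are hypothetical in-memory
stand-ins for the new HMS APIs the issue asks for, and the snapshot layout
(object name mapped to a set of paths) is assumed, not taken from Sentry.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the metastore contents.
METASTORE = {
    "sales": {
        "location": "/wh/sales",
        "tables": {
            "orders": {
                "location": "/wh/sales/orders",
                "partitions": ["/wh/sales/orders/dt=1", "/wh/sales/orders/dt=2"],
            },
            "items": {"location": "/wh/sales/items", "partitions": []},
        },
    },
    "hr": {
        "location": "/wh/hr",
        "tables": {
            "staff": {"location": "/wh/hr/staff", "partitions": ["/wh/hr/staff/y=2018"]},
        },
    },
}


def get_databases():
    """Hypothetical API: database name -> location."""
    return {db: meta["location"] for db, meta in METASTORE.items()}


def get_tables(db):
    """Hypothetical API: table name -> location for one database."""
    return {t: meta["location"] for t, meta in METASTORE[db]["tables"].items()}


def get_partitions(db, table):
    """Hypothetical API: partition locations for one table."""
    return list(METASTORE[db]["tables"][table]["partitions"])


def build_snapshot(max_workers=4):
    """Build {object name: set of paths} using one task per db, then per table."""
    databases = get_databases()
    snapshot = {db: {location} for db, location in databases.items()}
    db_names = list(databases)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per database: fetch its table names and locations.
        table_keys = []
        for db, tables in zip(db_names, pool.map(get_tables, db_names)):
            for table, location in tables.items():
                snapshot[f"{db}.{table}"] = {location}
                table_keys.append((db, table))
        # One task per table: fetch its partition locations.
        partition_lists = pool.map(lambda key: get_partitions(*key), table_keys)
        for (db, table), partitions in zip(table_keys, partition_lists):
            snapshot[f"{db}.{table}"].update(partitions)
    return snapshot


snapshot = build_snapshot()
```

The sketch runs the phases one after another (databases, then tables, then
partitions) rather than submitting tasks from inside other tasks, which avoids
worker threads blocking on work queued behind them in the same pool.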