[ 
https://issues.apache.org/jira/browse/HIVE-25580?focusedWorklogId=658569&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658569
 ]

ASF GitHub Bot logged work on HIVE-25580:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 30/Sep/21 19:19
            Start Date: 30/Sep/21 19:19
    Worklog Time Spent: 10m 
      Work Description: belugabehr commented on pull request #2692:
URL: https://github.com/apache/hive/pull/2692#issuecomment-931597115


   LGTM (pending tests)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 658569)
    Time Spent: 20m  (was: 10m)

> Increase the performance of getTableColumnStatistics and 
> getPartitionColumnStatistics
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-25580
>                 URL: https://issues.apache.org/jira/browse/HIVE-25580
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the PART_COL_STATS table contains a large number of rows, the 
> getTableColumnStatistics and getPartitionColumnStatistics response times 
> increase.
> The root cause is a full table scan triggered by the JDBC query below:
> {code:java}
> 2021-09-27 13:22:44,218 DEBUG DataNucleus.Datastore.Native: 
> [pool-6-thread-199]: SELECT DISTINCT "A0"."ENGINE" FROM "PART_COL_STATS" "A0"
> 2021-09-27 13:22:50,569 DEBUG DataNucleus.Datastore.Retrieve: 
> [pool-6-thread-199]: Execution Time = 6351 ms {code}
> The time is spent 
> [here|https://github.com/apache/hive/blob/ed1882ef569f8d00317597c269cfae35ace5a5fa/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L9965]:
> {code:java}
>       query = pm.newQuery(MPartitionColumnStatistics.class);
>       query.setResult("DISTINCT engine");
>       Collection names = (Collection) query.execute();
> {code}
> We might get better performance if we limit the query range based on the 
> catalog/database/table (cat/db/table).
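
A minimal sketch of the idea in the last sentence, not the actual patch: it models
PART_COL_STATS rows in memory and contrasts the unscoped "DISTINCT engine" lookup
with one restricted to a single cat/db/table. The class ScopedEngineLookup, the
StatsRow type and its field names are hypothetical and exist only for
illustration; in ObjectStore the restriction would presumably be expressed as a
filter on the JDO query instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: an in-memory stand-in for the PART_COL_STATS table.
final class ScopedEngineLookup {

    // Hypothetical row type; the field names are assumptions, not Hive's model class.
    static final class StatsRow {
        final String catName;
        final String dbName;
        final String tableName;
        final String engine;

        StatsRow(String catName, String dbName, String tableName, String engine) {
            this.catName = catName;
            this.dbName = dbName;
            this.tableName = tableName;
            this.engine = engine;
        }
    }

    // Mirrors the query from the issue: every row is visited to collect engines.
    static Set<String> distinctEngines(List<StatsRow> rows) {
        Set<String> engines = new TreeSet<>();
        for (StatsRow row : rows) {
            engines.add(row.engine);
        }
        return engines;
    }

    // Proposed direction: only rows of the requested cat/db/table contribute.
    static Set<String> distinctEngines(List<StatsRow> rows,
                                       String cat, String db, String table) {
        Set<String> engines = new TreeSet<>();
        for (StatsRow row : rows) {
            if (row.catName.equals(cat)
                    && row.dbName.equals(db)
                    && row.tableName.equals(table)) {
                engines.add(row.engine);
            }
        }
        return engines;
    }

    public static void main(String[] args) {
        List<StatsRow> rows = Arrays.asList(
                new StatsRow("hive", "sales", "orders", "hive"),
                new StatsRow("hive", "sales", "orders", "hive"),
                new StatsRow("hive", "sales", "customers", "impala"));

        System.out.println(distinctEngines(rows));
        System.out.println(distinctEngines(rows, "hive", "sales", "orders"));
    }
}
```

In a real database the benefit would come from the predicate letting the query
touch only the matching rows, for example through an index on those columns,
assuming such an index exists.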



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
