[ https://issues.apache.org/jira/browse/HIVE-19416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496799#comment-16496799 ]
Steve Yeom commented on HIVE-19416:
-----------------------------------

The current single version stats has:

1. Definitions and Categories
- Valid transactional stats: i.e., a conjunction of the three:
  ~ a committed transaction created the stats
  ~ COLUMN_STATS_ACCURATE (CSA) state is true
  ~ Isolation-level (snapshot) compliant
- Two kinds of stats: table and column
- COLUMN_STATS_ACCURATE (CSA) states for a table/partition: true or false; one for the table, one per column
- Categories of clients:
  ~ Stats reader:
    ^ StatsOptimizer for aggregation query: transactional stats reader
    ^ The rest that uses stats for cost computation inputs: non-transactional stats reader
  ~ Stats updater: transactional stats updater

2. Transactional Stats Operations

2.1 Stats Update
Update the single version stats, both table and column, and save a table snapshot to UPD_TXNS.
- A client requests an update with stats and a table snapshot [1].
- Creates a TBLS/PARTITIONS row, adding a row to UPD_TXNS with the table write snapshot.
- Updates "table stats" by updating TABLE_PARAMS/PARTITION_PARAMS
- Updates "column stats" by updating TAB_COL_STATS/PART_COL_STATS
- commit/abort
  ~ abortTxn() deletes the UPD_TXNS row for the transaction.
Note: now the stats reader determines the state of the transactional stats' updater transaction by checking TXNS for open state, and checking existence of a row in UPD_TXNS for committed/aborted.

2.2 Stats Read
StatsOptimizer determines the validity of the MetaStore transactional stats before using them for an aggregation query.

2.2.1 Table stats
The reader gets a TBLS/PARTITIONS row that includes table stats, then checks the validity of the table stats.
- A client comes in with its request that includes the client's table snapshot.
- Reads a row from TBLS/PARTITIONS.
- Check if the CSA for table stats is true. If not, return after setting CSA.
- Check if stats' update transaction is committed: check if a row exists in UPD_TXNS for the TXN_ID from TBLS/PARTITIONS.
  If not, invalid.
- Compare the current stats' table snapshot with the client's table snapshot.
- If the table snapshots are equal in commits, table stats are valid.

2.2.2 Column stats
The reader gets a row from TAB_COL_STATS/PART_COL_STATS. The same steps as table stats.

3. Current/Possible invariants

3.1 Current
- Metastore TBLS/PARTITIONS keeps CSA updated for committed stats for both table and columns.

3.2 Possible
- Metastore keeps one committed stats for both table and columns.

Notes:
[1]: transaction id and a valid writeId list for the table.

> Create single version transactional table metastore statistics for
> aggregation queries
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-19416
>                 URL: https://issues.apache.org/jira/browse/HIVE-19416
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Steve Yeom
>            Assignee: Steve Yeom
>            Priority: Major
>
> The system should use only statistics for aggregation queries like count on
> transactional tables.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
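
For readers following the table-stats read check in section 2.2.1, here is a minimal Python sketch of the three tests in the order the comment lists them. It is illustrative only and is not Hive code: `StatsRow`, `table_stats_valid`, and the in-memory sets standing in for TXNS and UPD_TXNS are hypothetical names. It also assumes a table snapshot can be reduced to a set of committed write IDs, so that "equal in commits" becomes plain set equality; the real snapshot (transaction id plus valid writeId list, per note [1]) is richer than that.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class StatsRow:
    """Hypothetical stand-in for a TBLS/PARTITIONS row carrying table stats."""
    csa: bool                            # CSA flag for the table stats
    txn_id: int                          # transaction that last updated the stats
    committed_write_ids: FrozenSet[int]  # simplified snapshot saved with the stats


def table_stats_valid(
    row: StatsRow,
    client_write_ids: FrozenSet[int],
    open_txns: Set[int],
    upd_txns: Set[int],
) -> bool:
    """Return True only if all three validity conditions hold.

    open_txns models TXNS rows in the open state; upd_txns models the
    transaction ids that still have a row in UPD_TXNS (an abort deletes it).
    """
    # Step 1: the CSA flag for the table stats must be true.
    if not row.csa:
        return False
    # Step 2: the updater transaction must be committed, i.e. no longer
    # open and still present in UPD_TXNS.
    if row.txn_id in open_txns or row.txn_id not in upd_txns:
        return False
    # Step 3: the stats' snapshot must match the reader's snapshot
    # (assumption: equality of committed write-ID sets).
    return row.committed_write_ids == client_write_ids


if __name__ == "__main__":
    row = StatsRow(csa=True, txn_id=7, committed_write_ids=frozenset({1, 2, 3}))
    print(table_stats_valid(row, frozenset({1, 2, 3}), open_txns=set(), upd_txns={7}))
```

Under the same assumptions, the column-stats check in section 2.2.2 would reuse this function with a row read from TAB_COL_STATS/PART_COL_STATS.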