[ https://issues.apache.org/jira/browse/HIVE-19416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin resolved HIVE-19416. ------------------------------------- Resolution: Fixed Fix Version/s: 4.0.0 > Create single version transactional table metastore statistics for > aggregation queries > -------------------------------------------------------------------------------------- > > Key: HIVE-19416 > URL: https://issues.apache.org/jira/browse/HIVE-19416 > Project: Hive > Issue Type: Bug > Components: Transactions > Reporter: Steve Yeom > Assignee: Steve Yeom > Priority: Major > Fix For: 4.0.0 > > > This adds accurate stats support to Hive transactional (insert-only and full > ACID) tables, so that some queries from these tables can be answered from > stats and also so that the stats could be used for more optimization. This > support can be enabled via a config flag, and is on by default. > This is achieved via the following changes, that basically start us on the > path of treating ACID stats the same way we treat ACID data: > In addition to existing JSON blob, we store a write ID of the latest stats > writer with each table and partition. Any writer updating the stats or > altering the stats state for a txn table has to record his write ID - if the > write ID and some other context is not provided by the caller of alter, the > alter table/partition operation fails. It's the responsibility of the writer > to not commit its transaction if the operation fails. > In future, we'd like to move the stats state into actual stats tables, but > for now it's (logically) colocated with the existing json parameter. > In addition to its write ID, most callers (with the exception of the ones > that cannot have races, e.g. create table) provide their own txn state (write > ID list) for the table. The existing stats' write ID is verified against this > state. If the write ID is not visible to the updater, we still update the > stats, but set the stats state to invalid; that basically means that two > parallel operations that cannot see each others' data output are updating the > stats. > This way, txn stats stay valid as the result of a sequence of non-conflicting > updates that can all see each other and account for each others' data for the > stats. Any parallel updates invalidate the stats. > This is necessary because unlike data, stats are a single version summary of > the table. To be able to support parallel operations with valid stats, each > stats update would need to write a separate record, and would also need to > write mergeable records that only reflect its own changes, instead of the > final view of the table stats (two of those are hard or impossible to merge). > This approach resulted in a few changes to alter/etc APIs in metastore; it > also requires that many alter operations, as well as analyze table, allocate > a write ID (because they affect stats-that-are-treated-like-data, and so are > essentially a write operation). > The reader, in turn, verifies that the stats are valid and written by a write > ID that is itself valid given the reader's transactional state (i.e. not > aborted, nor in progress). This is done on metastore side; if the stats are > invalid for the reader, we transparently update the stats state returned to > the caller to mark the stats as inaccurate. > We've considered (and actually implemented) an alternative approach of > recording the full txn state of the stats writer to be compared with the > state of the stats reader (to see if they are compatible and avoid the extra > write IDs and strict write-time checks), however it results in problems with > partitioned tables, where not all writes affect all partitions, and so the > stats state of all the untouched partitions becomes invalid once a subset of > partitions is updated (because we cannot tell whether the write ID, a table > level operation, didn't touch the partition, or did touch it but didn't > record the stats). Additionally, storing full txn state for every partition > and table can be expensive, especially in extreme cases where the watermark > doesn't advance for a while for some reason. -- This message was sent by Atlassian JIRA (v7.6.3#76005)