[ 
https://issues.apache.org/jira/browse/HIVE-20109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553696#comment-16553696
 ] 

Sergey Shelukhin commented on HIVE-20109:
-----------------------------------------

I think the plan is to make this a breaking change for 4.0, which should be OK 
for a major version: there will no longer be JSON, or any stats storage at all, 
as part of table parameters.
There will be an upgrade option to transfer the JSON object into the new 
fields; given that the consequence of not running the upgrade script is a 
one-time loss of accurate stats, this should be acceptable.
I'm looking at the code to see how easy it is to normalize table stats storage 
into a separate table, so that the TBLS and PARTITIONS tables are not even 
affected by stats changes (which is good for CachedStore).
Regardless, as is already the case with column stats, the basic stats state 
will be updated via an API separate from alter table, to make it more explicit.
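
As a rough illustration of what the upgrade step could look like, here is a 
minimal Python sketch that pulls the JSON object out of the table parameters 
and turns it into explicit fields. The function name split_stats_state and the 
returned tuple are hypothetical, and the assumed JSON layout (a "BASIC_STATS" 
flag plus a "COLUMN_STATS" map of column name to flag) is an assumption about 
the current format, not a description of the actual upgrade script.

```python
import json


def split_stats_state(table_params):
    """Split COLUMN_STATS_ACCURATE out of a table-parameter map.

    Returns (basic_stats_accurate, accurate_columns, remaining_params).
    Hypothetical helper: the real upgrade script and target schema may differ.
    """
    remaining = dict(table_params)
    raw = remaining.pop("COLUMN_STATS_ACCURATE", None)
    if raw is None:
        # No state recorded: treat all stats as not accurate.
        return False, set(), remaining
    try:
        state = json.loads(raw)
    except ValueError:
        # Unparseable value: fall back to "not accurate" rather than failing.
        return False, set(), remaining
    if not isinstance(state, dict):
        return False, set(), remaining

    # Assumed layout: flags are stored as the strings "true"/"false".
    basic = str(state.get("BASIC_STATS", "")).lower() == "true"
    columns = state.get("COLUMN_STATS") or {}
    if not isinstance(columns, dict):
        columns = {}
    accurate = {name for name, flag in columns.items()
                if str(flag).lower() == "true"}
    return basic, accurate, remaining


params = {
    "COLUMN_STATS_ACCURATE":
        '{"BASIC_STATS":"true","COLUMN_STATS":{"a":"true","b":"false"}}',
    "comment": "example table",
}
basic, accurate_columns, rest = split_stats_state(params)
print(basic, sorted(accurate_columns), rest)
```

Falling back to "not accurate" on a missing or unparseable value mirrors the 
point above: the worst case is a one-time loss of accurate stats.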


> get rid of COLUMN_STATS_ACCURATE
> --------------------------------
>
>                 Key: HIVE-20109
>                 URL: https://issues.apache.org/jira/browse/HIVE-20109
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>
> I don't know why anyone would come up with an idea of storing a set of 
> booleans in a database using JSON. This has caused various problems in the 
> past (text field limitations, perf issues when parsing a giant string; also 
> bugs because the way it is set is brittle).
> However, now that we are implementing transactional stats, it becomes 
> especially problematic and error-prone because the code in Hive sets 
> COLUMN_STATS_ACCURATE (C_S_A) in random places with reckless abandon, whereas 
> we want to change the state of the stats in well-defined places where txn 
> semantics can be verified.
> Currently in HIVE-19416, we are handling random things that touch it (from 
> metastore itself to output committers, various stats tasks, commands like 
> truncate, etc.) via a pile of hacks, but the best solution would be to remove 
> it completely and replace it with a DB table/columns in the stats tables that 
> would need to be set explicitly, not via the generic alter_table.



