> On Jan. 24, 2017, 5:58 a.m., pengcheng xiong wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java, line 3650
> > <https://reviews.apache.org/r/55731/diff/3/?file=1613301#file1613301line3650>
> >
> >     Thanks Chaoyu for the patch and comments. I still think we do not need 
> > the function hasStatsInParameters. We only care about row_count and 
> > raw_data_size, especially row_count. We use row_count to answer some query 
> > directly from metastore. The other parameters, totalSize etc do not matter. 
> > The other part of the patch looks good to me. Thanks.
> 
> Chaoyu Tang wrote:
>     Thanks, PengCheng. In current code, command "alter table .. set 
> tblproperty" can also set the stats including row_count, raw_data_size, 
> without setting the flag STATS_GENERATED to USER  like it does for "alter 
> table .. update statistics" in DDLSemanticAnalyzer. So in this case, we need 
> a way in DDLTask alterTableOrSinglePartition to determine whether this 
> property change is related to the stats. I used hasStatsInParameters, though 
> it also include the change of numFiles & totalSize into consideration.
>     I quite do not understand, if we only care about row_count/raw_data_size, 
> why do we recalcualte the fastStats (numFiles/totalSize) in HMS in a lot of 
> cases?
>     Also I want to confirm that currently COLUMN_STATS_ACCURATE and 
> BASIC_STATS only reflect the stats accuracy of row_count and raw_data_size? 
> If so, is there any use of numFiles/totalSize?

Thanks Chaoyu. You raised very good questions. (1) Although "alter table .. set 
tblproperty" can also set the stats including row_count, in current code, it 
will remove basic stats = true anyway. Thus it is safe. Now, 
"DO_NOT_UPDATE_STATS" is introduced in the patch. I think we can move (not 
remove) the function that you introduced to rewrite L1364 to L1392 in 
DDLSemanticAnalyzer. That is, if a user is going to change row_count, 
raw_data_size, no matter where it comes from, set tblproperty or update 
statistics, we will set it to be coming from the USER. Later, we can check it 
in alterTableOrSinglePartition. What do you think about this idea? (2) 
totalSize etc are only useful when row_count/raw_data_size are not available. 
For more details, you can refer to L156-L173. And, we only use row_count in 
StatsOptimizer. That is why "COLUMN_STATS_ACCURATE and BASIC_STATS only reflect 
the stats accuracy of row_count and raw_data_size" (actually, it should be only 
row_count because we only use 
 row_count in StatsOptimizer). Thanks!


- pengcheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55731/#review162771
-----------------------------------------------------------


On Jan. 24, 2017, 5:01 a.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55731/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2017, 5:01 a.m.)
> 
> 
> Review request for hive and pengcheng xiong.
> 
> 
> Bugs: HIVE-15653
>     https://issues.apache.org/jira/browse/HIVE-15653
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> For most of alter table operations like table rename, add columns, change 
> column type etc (besides the set table properties), the table stats status 
> should not change. But for some other operations like update statistics, 
> change location, the basic stats status should change.
> 
> 
> Diffs
> -----
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 
> 4aea152 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java a1fb874 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> 0f472e7 
>   ql/src/test/queries/clientpositive/alter_table_stats_status.q PRE-CREATION 
>   ql/src/test/results/clientpositive/alter_table_stats_status.q.out 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/55731/diff/
> 
> 
> Testing
> -------
> 
> 1. Manual tests
> 2. new unit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>

Reply via email to