> On Jan. 24, 2017, 5:58 a.m., pengcheng xiong wrote: > > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java, line 3650 > > <https://reviews.apache.org/r/55731/diff/3/?file=1613301#file1613301line3650> > > > > Thanks Chaoyu for the patch and comments. I still think we do not need > > the function hasStatsInParameters. We only care about row_count and > > raw_data_size, especially row_count. We use row_count to answer some query > > directly from metastore. The other parameters, totalSize etc do not matter. > > The other part of the patch looks good to me. Thanks. > > Chaoyu Tang wrote: > Thanks, PengCheng. In current code, command "alter table .. set > tblproperty" can also set the stats including row_count, raw_data_size, > without setting the flag STATS_GENERATED to USER like it does for "alter > table .. update statistics" in DDLSemanticAnalyzer. So in this case, we need > a way in DDLTask alterTableOrSinglePartition to determine whether this > property change is related to the stats. I used hasStatsInParameters, though > it also include the change of numFiles & totalSize into consideration. > I quite do not understand, if we only care about row_count/raw_data_size, > why do we recalcualte the fastStats (numFiles/totalSize) in HMS in a lot of > cases? > Also I want to confirm that currently COLUMN_STATS_ACCURATE and > BASIC_STATS only reflect the stats accuracy of row_count and raw_data_size? > If so, is there any use of numFiles/totalSize?
Thanks Chaoyu. You raised very good questions. (1) Although "alter table .. set tblproperty" can also set the stats including row_count, in current code, it will remove basic stats = true anyway. Thus it is safe. Now, "DO_NOT_UPDATE_STATS" is introduced in the patch. I think we can move (not remove) the function that you introduced to rewrite L1364 to L1392 in DDLSemanticAnalyzer. That is, if a user is going to change row_count, raw_data_size, no matter where it comes from, set tblproperty or update statistics, we will set it to be coming from the USER. Later, we can check it in alterTableOrSinglePartition. What do you think about this idea? (2) totalSize etc are only useful when row_count/raw_data_size are not available. For more details, you can refer to L156-L173. And, we only use row_count in StatsOptimizer. That is why "COLUMN_STATS_ACCURATE and BASIC_STATS only reflect the stats accuracy of row_count and raw_data_size" (actually, it should be only row_count because we only use row_count in StatsOptimizer). Thanks! - pengcheng ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55731/#review162771 ----------------------------------------------------------- On Jan. 24, 2017, 5:01 a.m., Chaoyu Tang wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/55731/ > ----------------------------------------------------------- > > (Updated Jan. 24, 2017, 5:01 a.m.) > > > Review request for hive and pengcheng xiong. > > > Bugs: HIVE-15653 > https://issues.apache.org/jira/browse/HIVE-15653 > > > Repository: hive-git > > > Description > ------- > > For most of alter table operations like table rename, add columns, change > column type etc (besides the set table properties), the table stats status > should not change. But for some other operations like update statistics, > change location, the basic stats status should change. > > > Diffs > ----- > > metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java > 4aea152 > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java a1fb874 > ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java > 0f472e7 > ql/src/test/queries/clientpositive/alter_table_stats_status.q PRE-CREATION > ql/src/test/results/clientpositive/alter_table_stats_status.q.out > PRE-CREATION > > Diff: https://reviews.apache.org/r/55731/diff/ > > > Testing > ------- > > 1. Manual tests > 2. new unit tests > > > Thanks, > > Chaoyu Tang > >
