[ 
https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812396#comment-15812396
 ] 

Chaoyu Tang commented on HIVE-15530:
------------------------------------

[~Yibing] The patch looks good. However, I have a small question about this:
{code}
  static boolean columnsIncluded(List<FieldSchema> oldCols, List<FieldSchema> newCols) {
    if (oldCols.size() > newCols.size()) {
      return false;
    } else if (oldCols.size() == newCols.size()) {
      return areSameColumns(oldCols, newCols);
    } else {
      return areSameColumns(oldCols, newCols.subList(0, oldCols.size()));
    }
  }
{code}
When an ALTER TABLE only changes a column's name and/or position, 
oldCols.size() equals newCols.size(), but areSameColumns(oldCols, newCols) 
may return false. In that case, should we still update the column 
statistics?
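
To make the question concrete, here is a minimal, self-contained sketch of the three cases. It is not taken from the patch: Col is a hypothetical stand-in for FieldSchema that models only name and type, and areSameColumns is assumed to compare columns position by position.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class ColumnStatsSketch {
  // Hypothetical stand-in for FieldSchema: only name and type are modeled.
  static final class Col {
    final String name;
    final String type;

    Col(String name, String type) {
      this.name = name;
      this.type = type;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Col)) {
        return false;
      }
      Col other = (Col) o;
      return name.equals(other.name) && type.equals(other.type);
    }

    @Override
    public int hashCode() {
      return Objects.hash(name, type);
    }
  }

  // Assumption: the real helper compares columns position by position.
  static boolean areSameColumns(List<Col> oldCols, List<Col> newCols) {
    return oldCols.equals(newCols);
  }

  // Same shape as the method quoted above, with Col in place of FieldSchema.
  static boolean columnsIncluded(List<Col> oldCols, List<Col> newCols) {
    if (oldCols.size() > newCols.size()) {
      return false;
    } else if (oldCols.size() == newCols.size()) {
      return areSameColumns(oldCols, newCols);
    } else {
      return areSameColumns(oldCols, newCols.subList(0, oldCols.size()));
    }
  }

  public static void main(String[] args) {
    List<Col> oldCols = Arrays.asList(new Col("id", "int"), new Col("name", "string"));

    // A column appended at the end: the old columns are a prefix of the new ones.
    List<Col> added = Arrays.asList(
        new Col("id", "int"), new Col("name", "string"), new Col("age", "int"));
    // A column renamed: same size, different name in position 1.
    List<Col> renamed = Arrays.asList(new Col("id", "int"), new Col("full_name", "string"));
    // Columns reordered: same size, same columns, different positions.
    List<Col> reordered = Arrays.asList(new Col("name", "string"), new Col("id", "int"));

    System.out.println("added:     " + columnsIncluded(oldCols, added));     // true
    System.out.println("renamed:   " + columnsIncluded(oldCols, renamed));   // false
    System.out.println("reordered: " + columnsIncluded(oldCols, reordered)); // false
  }
}
```

Under these assumptions, the rename and reorder cases both make columnsIncluded return false, which is the situation the question is about.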

> Optimize the column stats update logic in table alteration
> ----------------------------------------------------------
>
>                 Key: HIVE-15530
>                 URL: https://issues.apache.org/jira/browse/HIVE-15530
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Yibing Shi
>            Assignee: Yibing Shi
>         Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch, 
> HIVE-15530.3.patch, HIVE-15530.4.patch
>
>
> Currently, when a table is altered, HMS tries to update column statistics 
> for the table if any of the conditions below is true:
> # database name is changed
> # table name is changed
> # old columns and new columns are not the same
> As a result, when a column is added to a table, Hive also tries to update 
> column statistics, which is not necessary. We can relax the last condition by 
> checking whether any existing column has changed. If none has, we don't 
> have to update stats info.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
