[ 
https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989269#comment-15989269
 ] 

Chaoyu Tang commented on HIVE-16147:
------------------------------------

[~pxiong] Thanks for looking into this. Yes, I made some changes to fix the 
test failures and also optimized the code a little. I have uploaded the second 
patch to RB for review.

> Rename a partitioned table should not drop its partition columns stats
> ----------------------------------------------------------------------
>
>                 Key: HIVE-16147
>                 URL: https://issues.apache.org/jira/browse/HIVE-16147
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Chaoyu Tang
>            Assignee: Chaoyu Tang
>         Attachments: HIVE-16147.1.patch, HIVE-16147.patch, HIVE-16147.patch
>
>
> When a partitioned table (e.g. sample_pt) is renamed (e.g. to 
> sample_pt_rename), describing its partition shows that the partition column 
> stats are still accurate, but in fact they have all been dropped.
> It can be reproduced as follows:
> 1. analyze table sample_pt compute statistics for columns;
> 2. describe formatted default.sample_pt partition (dummy = 3):  COLUMN_STATS 
> for all columns are true
> {code}
> ...
> # Detailed Partition Information               
> Partition Value:      [3]                      
> Database:             default                  
> Table:                sample_pt                
> CreateTime:           Fri Jan 20 15:42:30 EST 2017     
> LastAccessTime:       UNKNOWN                  
> Location:             file:/user/hive/warehouse/apache/sample_pt/dummy=3
> Partition Parameters:          
>       COLUMN_STATS_ACCURATE   
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>       last_modified_by        ctang               
>       last_modified_time      1485217063          
>       numFiles                1                   
>       numRows                 100                 
>       rawDataSize             5143                
>       totalSize               5243                
>       transient_lastDdlTime   1488842358    
> ... 
> {code}
> 3. describe formatted default.sample_pt partition (dummy = 3) salary: column 
> stats exist
> {code}
> # col_name  data_type  min  max     num_nulls  distinct_count  avg_col_len  max_col_len  num_trues  num_falses  comment
>
> salary      int        1    151370  0          94                                                               from deserializer
> {code}
> 4. alter table sample_pt rename to sample_pt_rename;
> 5. describe formatted default.sample_pt_rename partition (dummy = 3): 
> the renamed table's partition (dummy = 3) still shows COLUMN_STATS as true 
> for all columns.
> {code}
> # Detailed Partition Information               
> Partition Value:      [3]                      
> Database:             default                  
> Table:                sample_pt_rename         
> CreateTime:           Fri Jan 20 15:42:30 EST 2017     
> LastAccessTime:       UNKNOWN                  
> Location:             file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3
> Partition Parameters:          
>       COLUMN_STATS_ACCURATE   
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>       last_modified_by        ctang               
>       last_modified_time      1485217063          
>       numFiles                1                   
>       numRows                 100                 
>       rawDataSize             5143                
>       totalSize               5243                
>       transient_lastDdlTime   1488842358  
> {code}
> 6. describe formatted default.sample_pt_rename partition (dummy = 3) salary: 
> the column stats have been dropped.
> {code}
> # col_name  data_type  comment
>
> salary      int        from deserializer
> Time taken: 0.131 seconds, Fetched: 3 row(s)
> {code}
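
The mismatch described above can be checked mechanically. The following is a minimal Python sketch, not part of Hive: it parses the COLUMN_STATS_ACCURATE value as printed by describe formatted and reports the columns that are flagged accurate but have no stored column stats. The helper names (claimed_column_stats, stale_claims) are hypothetical, and the set of columns that actually have stats is assumed to be collected separately, for example from the per-column describe output.

```python
import json


def claimed_column_stats(param_value):
    """Return the column names that COLUMN_STATS_ACCURATE flags as "true"."""
    # describe formatted prints the JSON with backslash-escaped quotes,
    # so undo that escaping before parsing.
    parsed = json.loads(param_value.replace('\\"', '"'))
    column_flags = parsed.get("COLUMN_STATS", {})
    return {name for name, flag in column_flags.items() if flag == "true"}


def stale_claims(param_value, columns_with_stats):
    """Columns flagged as accurate for which no column stats were found."""
    return claimed_column_stats(param_value) - set(columns_with_stats)


# Parameter value copied from the partition output in the description.
param = (
    r'{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",'
    r'\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}'
)

# After the rename no column has stats, so every claim is stale.
print(sorted(stale_claims(param, set())))
# prints ['code', 'description', 'salary', 'total_emp']
```

Before the rename, passing all four column names as columns_with_stats yields an empty set, which is the consistent state.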



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
