[jira] [Updated] (HIVE-16572) Rename a partition should not drop its column stats

2017-05-09 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-16572:
---
       Resolution: Fixed
    Fix Version/s: 2.4.0, 3.0.0
           Status: Resolved  (was: Patch Available)

Committed to 3.0.0 and 2.4.0. Thanks [~ychena] for the review.

> Rename a partition should not drop its column stats
> ---
>
> Key: HIVE-16572
> URL: https://issues.apache.org/jira/browse/HIVE-16572
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-16572.1.patch, HIVE-16572.patch
>
>
> The column stats for the table sample_pt partition (dummy=1) are as follows:
> {code}
> hive> describe formatted sample_pt partition (dummy=1) code;
> OK
> # col_name  data_type  min  max  num_nulls  distinct_count  avg_col_len  max_col_len  num_trues  num_falses  comment
>
> code        string               0          303             6.985        7                                   from deserializer
> Time taken: 0.259 seconds, Fetched: 3 row(s)
> {code}
> But when this partition is renamed, for example with
> alter table sample_pt partition (dummy=1) rename to partition (dummy=11);
> the COLUMN_STATS entries in the partition description are still true, but the
> column stats have actually all been deleted.
> {code}
> hive> describe formatted sample_pt partition (dummy=11);
> OK
> # col_name              data_type   comment
>
> code                    string
> description             string
> salary                  int
> total_emp               int
>
> # Partition Information
> # col_name              data_type   comment
>
> dummy                   int
>
> # Detailed Partition Information
> Partition Value:        [11]
> Database:               default
> Table:                  sample_pt
> CreateTime:             Thu Mar 30 23:03:59 EDT 2017
> LastAccessTime:         UNKNOWN
> Location:               file:/user/hive/warehouse/apache/sample_pt/dummy=11
>
> Partition Parameters:
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   numFiles                1
>   numRows                 200
>   rawDataSize             10228
>   totalSize               10428
>   transient_lastDdlTime   1490929439
>
> # Storage Information
> SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:            org.apache.hadoop.mapred.TextInputFormat
> OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed:             No
> Num Buckets:            -1
> Bucket Columns:         []
> Sort Columns:           []
> Storage Desc Params:
>   serialization.format    1
> Time taken: 6.783 seconds, Fetched: 37 row(s)
> ===
> hive> describe formatted sample_pt partition (dummy=11) code;
> OK
> # col_name              data_type   comment
>
> code                    string      from deserializer
> Time taken: 9.429 seconds, Fetched: 3 row(s)
> {code}
> The column stats should not be dropped when a partition is renamed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16572) Rename a partition should not drop its column stats

2017-05-04 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-16572:
---
Attachment: HIVE-16572.1.patch

Fixed the failure in test rename_external_partition_location.q, and added more 
tests for renaming a partition in an external table.
The other two test failures are not related to this patch; I was not able to 
reproduce them on my local machine.
[~pxiong], could you help review the patch? Thanks.



[jira] [Updated] (HIVE-16572) Rename a partition should not drop its column stats

2017-05-03 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-16572:
---
Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-16572) Rename a partition should not drop its column stats

2017-05-03 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-16572:
---
Attachment: HIVE-16572.patch

The patch does the following:
1. Keeps the partition column stats when a partition is renamed.
2. Refactors the partition renaming logic. We now move the partition directory 
before committing the HMS transaction, since that makes it easier to revert the 
data move if the rename fails.
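
To illustrate the ordering in item 2, here is a minimal sketch in Python. It is
not the patch and not Hive metastore code: FakeMetastore, commit_rename, and
rename_partition are hypothetical names, and an in-memory dict stands in for the
HMS. It only shows the idea that the directory is moved first, the metadata
change is committed second, and a failed commit is undone by moving the
directory back, with the column stats carried over to the new partition name.

```python
import shutil
from pathlib import Path


class FakeMetastore:
    """Hypothetical in-memory stand-in for the HMS (not a Hive API)."""

    def __init__(self, partitions, fail_on_commit=False):
        # Maps partition name -> {"location": Path, "column_stats": dict}.
        self.partitions = dict(partitions)
        self.fail_on_commit = fail_on_commit

    def commit_rename(self, old_name, new_name, new_location):
        if self.fail_on_commit:
            raise RuntimeError("simulated metastore commit failure")
        entry = self.partitions.pop(old_name)
        # The column stats travel with the partition instead of being dropped.
        self.partitions[new_name] = {
            "location": new_location,
            "column_stats": entry["column_stats"],
        }


def rename_partition(metastore, old_name, new_name):
    """Move the partition directory, then commit the metadata change."""
    old_location = Path(metastore.partitions[old_name]["location"])
    new_location = old_location.with_name(new_name)

    # Step 1: move the data while the metadata is still untouched.
    shutil.move(str(old_location), str(new_location))
    try:
        # Step 2: commit the metadata change.
        metastore.commit_rename(old_name, new_name, new_location)
    except Exception:
        # Nothing was committed, so reverting is a single move back.
        shutil.move(str(new_location), str(old_location))
        raise
```

With this ordering, a commit failure leaves both the directory and the metadata
in their original state. The reverse ordering would need a second metadata
transaction to undo an already committed rename if the data move failed.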
