[jira] [Updated] (HIVE-16572) Rename a partition should not drop its column stats
[ https://issues.apache.org/jira/browse/HIVE-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chaoyu Tang updated HIVE-16572:
---
    Resolution: Fixed
    Fix Version/s: 2.4.0
                   3.0.0
    Status: Resolved (was: Patch Available)

Committed to 3.0.0 and 2.4.0. Thanks [~ychena] for review.

> Rename a partition should not drop its column stats
> ---
>
> Key: HIVE-16572
> URL: https://issues.apache.org/jira/browse/HIVE-16572
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-16572.1.patch, HIVE-16572.patch
>
>
> The column stats for the table sample_pt partition (dummy=1) are as follows:
> {code}
> hive> describe formatted sample_pt partition (dummy=1) code;
> OK
> # col_name            data_type           min
> max                 num_nulls           distinct_count
> avg_col_len         max_col_len         num_trues
> num_falses          comment
>
>
> code                  string
> 0                   303                 6.985
> 7
> from deserializer
> Time taken: 0.259 seconds, Fetched: 3 row(s)
> {code}
> But when this partition is renamed, say
> alter table sample_pt partition (dummy=1) rename to partition (dummy=11);
> the COLUMN_STATS flags in the partition description are still true, but the
> column stats are actually all deleted.
> {code}
> hive> describe formatted sample_pt partition (dummy=11);
> OK
> # col_name            data_type           comment
>
> code                  string
> description           string
> salary                int
> total_emp             int
>
> # Partition Information
> # col_name            data_type           comment
>
> dummy                 int
>
> # Detailed Partition Information
> Partition Value: [11]
> Database: default
> Table: sample_pt
> CreateTime: Thu Mar 30 23:03:59 EDT 2017
> LastAccessTime: UNKNOWN
> Location: file:/user/hive/warehouse/apache/sample_pt/dummy=11
>
> Partition Parameters:
> COLUMN_STATS_ACCURATE
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
> numFiles 1
> numRows 200
> rawDataSize 10228
> totalSize 10428
> transient_lastDdlTime 1490929439
>
> # Storage Information
> SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
> InputFormat: org.apache.hadoop.mapred.TextInputFormat
> OutputFormat:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed: No
> Num Buckets: -1
> Bucket Columns: []
> Sort Columns: []
> Storage Desc Params:
> serialization.format 1
> Time taken: 6.783 seconds, Fetched: 37 row(s)
> ===
> hive> describe formatted sample_pt partition (dummy=11) code;
> OK
> # col_name            data_type           comment
>
>
>
> code                  string              from deserializer
>
> Time taken: 9.429 seconds, Fetched: 3 row(s)
> {code}
> The column stats should not be dropped when a partition is renamed.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (HIVE-16572) Rename a partition should not drop its column stats
[ https://issues.apache.org/jira/browse/HIVE-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chaoyu Tang updated HIVE-16572:
---
    Attachment: HIVE-16572.1.patch

Fixed the failure in test rename_external_partition_location.q and added more tests for renaming a partition in an external table. The other two test failures are not related to this patch; I was not able to reproduce them on my local machine. [~pxiong], could you help review the patch? Thanks.

> Rename a partition should not drop its column stats
> ---
>
> Key: HIVE-16572
> URL: https://issues.apache.org/jira/browse/HIVE-16572
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
> Attachments: HIVE-16572.1.patch, HIVE-16572.patch
[jira] [Updated] (HIVE-16572) Rename a partition should not drop its column stats
[ https://issues.apache.org/jira/browse/HIVE-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chaoyu Tang updated HIVE-16572:
---
    Status: Patch Available (was: Open)

> Rename a partition should not drop its column stats
> ---
>
> Key: HIVE-16572
> URL: https://issues.apache.org/jira/browse/HIVE-16572
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
> Attachments: HIVE-16572.patch
[jira] [Updated] (HIVE-16572) Rename a partition should not drop its column stats
[ https://issues.apache.org/jira/browse/HIVE-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chaoyu Tang updated HIVE-16572:
---
    Attachment: HIVE-16572.patch

The patch does the following:
1. Keeps the partition column stats when a partition is renamed.
2. Refactors the partition renaming logic. We move the partition directory before committing the HMS transaction, since that makes it easier to revert the data move if the rename fails.

> Rename a partition should not drop its column stats
> ---
>
> Key: HIVE-16572
> URL: https://issues.apache.org/jira/browse/HIVE-16572
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
> Attachments: HIVE-16572.patch
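
The ordering described in the patch comment (move the partition directory first, commit the metastore change last, revert the move if the commit fails) can be illustrated with a small toy model. This is only a sketch under stated assumptions and is not taken from the attached patch: `ToyMetastore`, `renamePartition`, `commitRename`, and `failNextCommit` are hypothetical names, the "metastore" is a pair of in-memory maps, the stats value is a placeholder string, and the data move is a plain `java.nio.file.Files.move` on a local directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Small demo driver for the toy model below. */
class RenameSketch {
    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("sample_pt");
        Path dir = Files.createDirectory(table.resolve("dummy=1"));

        ToyMetastore ms = new ToyMetastore();
        // Placeholder stats entry for column "code"; the value is illustrative only.
        ms.addPartition("dummy=1", dir, Map.of("code", "distinct_count=303"));

        ms.renamePartition("dummy=1", "dummy=11");
        System.out.println(ms.columnStats.get("dummy=11"));
        System.out.println(Files.isDirectory(table.resolve("dummy=11")));
    }
}

/** Toy in-memory stand-in for the metastore (hypothetical; not Hive's API). */
class ToyMetastore {
    /** Partition name -> data directory. */
    final Map<String, Path> partitionDirs = new HashMap<>();
    /** Partition name -> (column name -> stats summary). */
    final Map<String, Map<String, String>> columnStats = new HashMap<>();
    /** Test hook: makes the next metadata commit fail. */
    boolean failNextCommit = false;

    void addPartition(String name, Path dir, Map<String, String> stats) {
        partitionDirs.put(name, dir);
        columnStats.put(name, new HashMap<>(stats));
    }

    void renamePartition(String oldName, String newName) throws IOException {
        Path oldDir = partitionDirs.get(oldName);
        if (oldDir == null || partitionDirs.containsKey(newName)) {
            throw new IllegalArgumentException("cannot rename " + oldName + " to " + newName);
        }
        Path newDir = oldDir.resolveSibling(newName);

        // Step 1: move the data before touching any metadata.
        Files.move(oldDir, newDir);
        try {
            // Step 2: commit the metadata change last.
            commitRename(oldName, newName, newDir);
        } catch (RuntimeException e) {
            // Step 3: the commit failed, so undo the directory move.
            Files.move(newDir, oldDir);
            throw e;
        }
    }

    private void commitRename(String oldName, String newName, Path newDir) {
        if (failNextCommit) {
            failNextCommit = false;
            throw new IllegalStateException("simulated commit failure");
        }
        partitionDirs.remove(oldName);
        partitionDirs.put(newName, newDir);
        // Re-key the column stats under the new name instead of dropping them.
        Map<String, String> stats = columnStats.remove(oldName);
        if (stats != null) {
            columnStats.put(newName, stats);
        }
    }
}
```

In this toy model a failed commit leaves both the directory and the metadata under the old partition name, which is the property the comment appeals to: undoing a single directory move is simpler than undoing a committed metadata change.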