[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
[ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15990334#comment-15990334 ]

Pengcheng Xiong commented on HIVE-16147:

+1

> Rename a partitioned table should not drop its partition columns stats
> ----------------------------------------------------------------------
>
>                 Key: HIVE-16147
>                 URL: https://issues.apache.org/jira/browse/HIVE-16147
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Chaoyu Tang
>            Assignee: Chaoyu Tang
>         Attachments: HIVE-16147.1.patch, HIVE-16147.patch, HIVE-16147.patch
>
>
> When a partitioned table (e.g. sample_pt) is renamed (e.g. to sample_pt_rename), describing its partition shows that the partition column stats are still accurate, but in fact they have all been dropped.
> It can be reproduced as follows:
> 1. analyze table sample_pt compute statistics for columns;
> 2. describe formatted default.sample_pt partition (dummy = 3): COLUMN_STATS for all columns are true
> {code}
> ...
> # Detailed Partition Information
> Partition Value:        [3]
> Database:               default
> Table:                  sample_pt
> CreateTime:             Fri Jan 20 15:42:30 EST 2017
> LastAccessTime:         UNKNOWN
> Location:               file:/user/hive/warehouse/apache/sample_pt/dummy=3
> Partition Parameters:
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   last_modified_by        ctang
>   last_modified_time      1485217063
>   numFiles                1
>   numRows                 100
>   rawDataSize             5143
>   totalSize               5243
>   transient_lastDdlTime   1488842358
> ...
> {code}
> 3. describe formatted default.sample_pt partition (dummy = 3) salary: column stats exist
> {code}
> # col_name  data_type  min  max     num_nulls  distinct_count  avg_col_len  max_col_len  num_trues  num_falses  comment
>
> salary      int        1    151370  0          94                                                                from deserializer
> {code}
> 4. alter table sample_pt rename to sample_pt_rename;
> 5. describe formatted default.sample_pt_rename partition (dummy = 3): describing the renamed table's partition (dummy = 3) shows that COLUMN_STATS for all columns are still true.
> {code}
> # Detailed Partition Information
> Partition Value:        [3]
> Database:               default
> Table:                  sample_pt_rename
> CreateTime:             Fri Jan 20 15:42:30 EST 2017
> LastAccessTime:         UNKNOWN
> Location:               file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3
> Partition Parameters:
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   last_modified_by        ctang
>   last_modified_time      1485217063
>   numFiles                1
>   numRows                 100
>   rawDataSize             5143
>   totalSize               5243
>   transient_lastDdlTime   1488842358
> {code}
> 6. describe formatted default.sample_pt_rename partition (dummy = 3) salary: the column stats have been dropped.
> {code}
> # col_name  data_type  comment
>
> salary      int        from deserializer
>
> Time taken: 0.131 seconds, Fetched: 3 row(s)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
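The inconsistency reported above (COLUMN_STATS_ACCURATE still claims "true" for columns whose stats are gone) can be checked mechanically. The following is a minimal sketch, not part of the patch: it assumes the parameter value has been copied out of the `describe formatted` output as a string, and that the caller supplies the list of columns for which `describe formatted ... <column>` still shows stats. The helper names `claimed_accurate_columns` and `stale_columns` are hypothetical.

```python
import json


def claimed_accurate_columns(param_value):
    """Return the set of columns that COLUMN_STATS_ACCURATE marks as "true"."""
    # The describe output shows the JSON with backslash-escaped quotes; undo that first.
    parsed = json.loads(param_value.replace('\\"', '"'))
    column_flags = parsed.get("COLUMN_STATS", {})
    return {name for name, flag in column_flags.items() if flag == "true"}


def stale_columns(param_value, columns_with_stats):
    """Columns flagged accurate although no column stats are actually present."""
    return claimed_accurate_columns(param_value) - set(columns_with_stats)


# Parameter value as shown for sample_pt_rename partition (dummy = 3) in the report.
param = (
    r'{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",'
    r'\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}'
)

# After the rename, the report shows no column stats left for the partition,
# so every column that is still flagged "true" is stale.
print(sorted(stale_columns(param, [])))
```

With an empty list of columns that still have stats, every flagged column is reported as stale, which matches the state described in steps 5 and 6 of the report.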
[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
[ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989269#comment-15989269 ]

Chaoyu Tang commented on HIVE-16147:

[~pxiong] Thanks for looking into this. Yeah, I made some changes to fix the test failures and also optimized the code a little. I have uploaded the 2nd patch to RB requesting for the review.
[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
[ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989211#comment-15989211 ]

Pengcheng Xiong commented on HIVE-16147:

[~ctang.ma], may i ask what did u change from the 1st patch? thanks.
[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
[ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989168#comment-15989168 ]

Chaoyu Tang commented on HIVE-16147:

The only one test failure is not related to this patch. [~pxiong] could you review the patch? Thanks
[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
[ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15988389#comment-15988389 ]

Hive QA commented on HIVE-16147:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865390/HIVE-16147.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10635 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4912/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4912/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4912/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865390 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
[ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987783#comment-15987783 ]

Chaoyu Tang commented on HIVE-16147:

The test failures are not related to the patch. [~pxiong], could you help to review it again? Thanks
[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
[ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987696#comment-15987696 ]

Hive QA commented on HIVE-16147:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865390/HIVE-16147.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10640 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143)
org.apache.hadoop.hive.ql.parse.TestParseNegativeDriver.testCliDriver[wrong_distinct2] (batchId=233)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4894/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4894/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4894/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865390 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
[ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15985430#comment-15985430 ]

Hive QA commented on HIVE-16147:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865159/HIVE-16147.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10636 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_stats_status] (batchId=51)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver (batchId=236)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testAlterViewParititon (batchId=200)
org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testAlterViewParititon (batchId=203)
org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testAlterViewParititon (batchId=199)
org.apache.hadoop.hive.metastore.TestSetUGIOnOnlyClient.testAlterViewParititon (batchId=197)
org.apache.hadoop.hive.metastore.TestSetUGIOnOnlyServer.testAlterViewParititon (batchId=208)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4886/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4886/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4886/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865159 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
[ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982331#comment-15982331 ]

Pengcheng Xiong commented on HIVE-16147:

LGTM. +1 pending tests.
[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
[ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982021#comment-15982021 ]

Chaoyu Tang commented on HIVE-16147:

Patch has been uploaded to RB. [~pxiong], could you help to review it. Thanks.