[jira] [Commented] (HIVE-16572) Rename a partition should not drop its column stats

2017-05-08 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000866#comment-16000866
 ] 

Yongzhi Chen commented on HIVE-16572:
-

LGTM  +1

> Rename a partition should not drop its column stats
> ---
>
> Key: HIVE-16572
> URL: https://issues.apache.org/jira/browse/HIVE-16572
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-16572.1.patch, HIVE-16572.patch
>
>
> The column stats for the table sample_pt partition (dummy=1) is as following:
> {code}
> hive> describe formatted sample_pt partition (dummy=1) code;
> OK
> # col_namedata_type   min 
> max num_nulls   distinct_count  
> avg_col_len max_col_len num_trues   
> num_falses  comment 
>   
>  
> code  string  
> 0   303 6.985 
>   7   
> from deserializer   
> Time taken: 0.259 seconds, Fetched: 3 row(s)
> {code}
> But when this partition is renamed, say
> alter table sample_pt partition (dummy=1) rename to partition (dummy=11);
> The COLUMN_STATS in partition description are true, but column stats are 
> actually all deleted.
> {code}
> hive> describe formatted sample_pt partition (dummy=11);
> OK
> # col_namedata_type   comment 
>
> code  string  
> description   string  
> salaryint 
> total_emp int 
>
> # Partition Information
> # col_namedata_type   comment 
>
> dummy int 
>
> # Detailed Partition Information   
> Partition Value:  [11] 
> Database: default  
> Table:sample_pt
> CreateTime:   Thu Mar 30 23:03:59 EDT 2017 
> LastAccessTime:   UNKNOWN  
> Location: file:/user/hive/warehouse/apache/sample_pt/dummy=11 
>  
> Partition Parameters:  
>   COLUMN_STATS_ACCURATE   
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   numFiles1   
>   numRows 200 
>   rawDataSize 10228   
>   totalSize   10428   
>   transient_lastDdlTime   1490929439  
>
> # Storage Information  
> SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
>  
> InputFormat:  org.apache.hadoop.mapred.TextInputFormat 
> OutputFormat: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
> Compressed:   No   
> Num Buckets:  -1   
> Bucket Columns:   []   
> Sort Columns: []   
> Storage Desc Params:   
>   serialization.format1   
> Time taken: 6.783 seconds, Fetched: 37 row(s)
> ===
> hive> describe formatted sample_pt partition (dummy=11) code;
> OK
> # col_namedata_type   comment 
>  
>   
>  
> code  string  from deserializer   
>  
> Time taken: 9.429 seconds, Fetched: 3 row(s)
> {code}
> The column stats should not be drop when a partition is renamed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16572) Rename a partition should not drop its column stats

2017-05-06 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999650#comment-15999650
 ] 

Chaoyu Tang commented on HIVE-16572:


The test failure is not related to the patch.

> Rename a partition should not drop its column stats
> ---
>
> Key: HIVE-16572
> URL: https://issues.apache.org/jira/browse/HIVE-16572
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-16572.1.patch, HIVE-16572.patch
>
>
> The column stats for the table sample_pt partition (dummy=1) is as following:
> {code}
> hive> describe formatted sample_pt partition (dummy=1) code;
> OK
> # col_namedata_type   min 
> max num_nulls   distinct_count  
> avg_col_len max_col_len num_trues   
> num_falses  comment 
>   
>  
> code  string  
> 0   303 6.985 
>   7   
> from deserializer   
> Time taken: 0.259 seconds, Fetched: 3 row(s)
> {code}
> But when this partition is renamed, say
> alter table sample_pt partition (dummy=1) rename to partition (dummy=11);
> The COLUMN_STATS in partition description are true, but column stats are 
> actually all deleted.
> {code}
> hive> describe formatted sample_pt partition (dummy=11);
> OK
> # col_namedata_type   comment 
>
> code  string  
> description   string  
> salaryint 
> total_emp int 
>
> # Partition Information
> # col_namedata_type   comment 
>
> dummy int 
>
> # Detailed Partition Information   
> Partition Value:  [11] 
> Database: default  
> Table:sample_pt
> CreateTime:   Thu Mar 30 23:03:59 EDT 2017 
> LastAccessTime:   UNKNOWN  
> Location: file:/user/hive/warehouse/apache/sample_pt/dummy=11 
>  
> Partition Parameters:  
>   COLUMN_STATS_ACCURATE   
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   numFiles1   
>   numRows 200 
>   rawDataSize 10228   
>   totalSize   10428   
>   transient_lastDdlTime   1490929439  
>
> # Storage Information  
> SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
>  
> InputFormat:  org.apache.hadoop.mapred.TextInputFormat 
> OutputFormat: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
> Compressed:   No   
> Num Buckets:  -1   
> Bucket Columns:   []   
> Sort Columns: []   
> Storage Desc Params:   
>   serialization.format1   
> Time taken: 6.783 seconds, Fetched: 37 row(s)
> ===
> hive> describe formatted sample_pt partition (dummy=11) code;
> OK
> # col_namedata_type   comment 
>  
>   
>  
> code  string  from deserializer   
>  
> Time taken: 9.429 seconds, Fetched: 3 row(s)
> {code}
> The column stats should not be drop when a partition is renamed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16572) Rename a partition should not drop its column stats

2017-05-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999496#comment-15999496
 ] 

Hive QA commented on HIVE-16572:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12866411/HIVE-16572.1.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10652 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5079/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5079/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5079/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12866411 - PreCommit-HIVE-Build

> Rename a partition should not drop its column stats
> ---
>
> Key: HIVE-16572
> URL: https://issues.apache.org/jira/browse/HIVE-16572
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-16572.1.patch, HIVE-16572.patch
>
>
> The column stats for the table sample_pt partition (dummy=1) is as following:
> {code}
> hive> describe formatted sample_pt partition (dummy=1) code;
> OK
> # col_namedata_type   min 
> max num_nulls   distinct_count  
> avg_col_len max_col_len num_trues   
> num_falses  comment 
>   
>  
> code  string  
> 0   303 6.985 
>   7   
> from deserializer   
> Time taken: 0.259 seconds, Fetched: 3 row(s)
> {code}
> But when this partition is renamed, say
> alter table sample_pt partition (dummy=1) rename to partition (dummy=11);
> The COLUMN_STATS in partition description are true, but column stats are 
> actually all deleted.
> {code}
> hive> describe formatted sample_pt partition (dummy=11);
> OK
> # col_namedata_type   comment 
>
> code  string  
> description   string  
> salaryint 
> total_emp int 
>
> # Partition Information
> # col_namedata_type   comment 
>
> dummy int 
>
> # Detailed Partition Information   
> Partition Value:  [11] 
> Database: default  
> Table:sample_pt
> CreateTime:   Thu Mar 30 23:03:59 EDT 2017 
> LastAccessTime:   UNKNOWN  
> Location: file:/user/hive/warehouse/apache/sample_pt/dummy=11 
>  
> Partition Parameters:  
>   COLUMN_STATS_ACCURATE   
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   numFiles1   
>   numRows 200 
>   rawDataSize 10228   
>   totalSize   10428   
>   transient_lastDdlTime   1490929439  
>
> # Storage Information  
> SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
>  
> InputFormat:  org.apache.hadoop.mapred.TextInputFormat 
> OutputFormat: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
> Compressed:   No   
> Num Buckets:  -1   
> Bucket Columns:   []   
> Sort Columns: []   
> Storage Desc Params:   
>   serialization.format1   
> Time taken: 6.783 seconds, Fetched: 37 row(s)
> ===
> hive> describe 

[jira] [Commented] (HIVE-16572) Rename a partition should not drop its column stats

2017-05-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995953#comment-15995953
 ] 

Hive QA commented on HIVE-16572:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12866271/HIVE-16572.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10648 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[rename_external_partition_location]
 (batchId=46)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[external_table_ppd] 
(batchId=90)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5031/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5031/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5031/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12866271 - PreCommit-HIVE-Build

> Rename a partition should not drop its column stats
> ---
>
> Key: HIVE-16572
> URL: https://issues.apache.org/jira/browse/HIVE-16572
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-16572.patch
>
>
> The column stats for the table sample_pt partition (dummy=1) is as following:
> {code}
> hive> describe formatted sample_pt partition (dummy=1) code;
> OK
> # col_namedata_type   min 
> max num_nulls   distinct_count  
> avg_col_len max_col_len num_trues   
> num_falses  comment 
>   
>  
> code  string  
> 0   303 6.985 
>   7   
> from deserializer   
> Time taken: 0.259 seconds, Fetched: 3 row(s)
> {code}
> But when this partition is renamed, say
> alter table sample_pt partition (dummy=1) rename to partition (dummy=11);
> The COLUMN_STATS in partition description are true, but column stats are 
> actually all deleted.
> {code}
> hive> describe formatted sample_pt partition (dummy=11);
> OK
> # col_namedata_type   comment 
>
> code  string  
> description   string  
> salaryint 
> total_emp int 
>
> # Partition Information
> # col_namedata_type   comment 
>
> dummy int 
>
> # Detailed Partition Information   
> Partition Value:  [11] 
> Database: default  
> Table:sample_pt
> CreateTime:   Thu Mar 30 23:03:59 EDT 2017 
> LastAccessTime:   UNKNOWN  
> Location: file:/user/hive/warehouse/apache/sample_pt/dummy=11 
>  
> Partition Parameters:  
>   COLUMN_STATS_ACCURATE   
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   numFiles1   
>   numRows 200 
>   rawDataSize 10228   
>   totalSize   10428   
>   transient_lastDdlTime   1490929439  
>
> # Storage Information  
> SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
>  
> InputFormat:  org.apache.hadoop.mapred.TextInputFormat 
> OutputFormat: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
> Compressed:   No   
> Num Buckets:  -1   
> Bucket Columns:   []   
> Sort Columns: []