[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094321#comment-14094321 ]

pengcheng xiong commented on HIVE-7654:
---------------------------------------

Done.


Thanks!

Best
Pengcheng Xiong
pxi...@hortonworks.com






> A method to extrapolate columnStats for partitions of a table
> -------------------------------------------------------------
>
>                 Key: HIVE-7654
>                 URL: https://issues.apache.org/jira/browse/HIVE-7654
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: pengcheng xiong
>            Assignee: pengcheng xiong
>            Priority: Minor
>         Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch
>
>
> In a PARTITIONED table there may be many partitions. For example:
>
> create table if not exists loc_orc (
>   state string,
>   locid int,
>   zip bigint
> ) partitioned by(year string) stored as orc;
>
> Assume there are 4 partitions: partition(year='2000'),
> partition(year='2001'), partition(year='2002') and partition(year='2003').
> The following command computes statistics for columns state and locid
> of partition(year='2001'):
>
> analyze table loc_orc partition(year='2001') compute statistics for columns state,locid;
>
> We need to know the “aggregated” column statistics for the whole table loc_orc.
> However, we may not have column statistics for some partitions, e.g.,
> partition(year='2002'), and we may not have statistics for some columns
> of a partition, e.g., column zip (bigint) in partition(year='2001').
> We propose a method to extrapolate the missing column statistics for the
> partitions.
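The description leaves the extrapolation method itself to the attached document, so the following is only an illustration under assumptions of my own, not the method in the attachment or in Hive. It is a minimal Python sketch that (a) fills in a missing partition's statistics for one integer column by fitting a least-squares linear trend over partition position (year order), and (b) merges per-partition statistics into table-level statistics: min of lows, max of highs, sum of null counts, and max of distinct-value counts (a lower bound). The names `LongColStats`, `linear_fit`, `extrapolate` and `aggregate`, and all numbers, are hypothetical; the fields only loosely mirror Hive's `LongColumnStatsData`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LongColStats:
    # Hypothetical per-partition stats for one int/bigint column; fields loosely
    # mirror Hive's LongColumnStatsData (lowValue, highValue, numNulls, numDVs).
    low: float
    high: float
    num_nulls: float
    num_dvs: float


FIELDS = ("low", "high", "num_nulls", "num_dvs")


def linear_fit(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares line y = a + b*x; falls back to a constant when x has no spread."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return mean_y, 0.0
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return mean_y - slope * mean_x, slope


def extrapolate(partitions: List[str], known: Dict[str, LongColStats]) -> Dict[str, LongColStats]:
    """Estimate stats for partitions missing from `known` from a linear trend over partition position."""
    if not known:
        raise ValueError("need stats for at least one partition")
    pos = {p: i for i, p in enumerate(partitions)}
    names = list(known)
    xs = [float(pos[p]) for p in names]
    fits = {f: linear_fit(xs, [getattr(known[p], f) for p in names]) for f in FIELDS}
    result = dict(known)
    for p in partitions:
        if p in result:
            continue
        est = {f: a + b * pos[p] for f, (a, b) in fits.items()}
        low, high = sorted((est["low"], est["high"]))  # keep low <= high
        result[p] = LongColStats(
            low=low,
            high=high,
            num_nulls=max(0.0, est["num_nulls"]),  # counts cannot be negative
            num_dvs=max(0.0, est["num_dvs"]),
        )
    return result


def aggregate(stats: Dict[str, LongColStats]) -> LongColStats:
    """Merge per-partition stats into table-level stats."""
    s = list(stats.values())
    return LongColStats(
        low=min(x.low for x in s),
        high=max(x.high for x in s),
        num_nulls=sum(x.num_nulls for x in s),
        # Max over partitions is only a lower bound on the table-wide NDV,
        # because different partitions may share values.
        num_dvs=max(x.num_dvs for x in s),
    )


# Usage with made-up numbers: stats for locid exist for 2000, 2001 and 2003, not 2002.
parts = ["2000", "2001", "2002", "2003"]
known = {
    "2000": LongColStats(low=1, high=100, num_nulls=0, num_dvs=100),
    "2001": LongColStats(low=1, high=200, num_nulls=2, num_dvs=200),
    "2003": LongColStats(low=1, high=400, num_nulls=6, num_dvs=400),
}
table_stats = aggregate(extrapolate(parts, known))
print(table_stats)
```

With these made-up numbers the trend fills in year='2002' with a high of about 300 and about 4 nulls, so the table-level estimate is low 1, high 400, about 12 nulls and at least 400 distinct values. A linear trend is just one possible choice; taking the max NDV under-counts when partitions hold disjoint values, while summing NDVs would over-count when they overlap.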



--
This message was sent by Atlassian JIRA
(v6.2#6252)
