[ https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1361:
-----------------------------

    Attachment: HIVE-1361.2.patch
                HIVE-1361.2_java_only.patch

Uploading a new patch for review, against the latest trunk, in two versions: a
full version and a Java-only version (which also includes the XML build files).

The major changes from the last patch include: 
  1) Use PreparedStatement for JDBC update/insert/select.
  2) In HBase, use HTable.delete(ArrayList<Delete>) to speed up deletes, and
flushCommits() to batch updates.
  3) Refactor StatsTask to put stats into PartitionStatistics and 
TableStatistics so that it is easier to add new stats later. 
  4) Move WriteEntity creation from StatsTask to compile-time.
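To illustrate why item 3 makes new stats easier to add, here is a minimal Java sketch. It assumes stats are kept in a map keyed by stat name; the classes PartitionStats and TableStats and the stat names (numRows, totalSize, numFiles, numPartitions) are hypothetical and are not the PartitionStatistics/TableStatistics classes in the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, NOT the patch's PartitionStatistics/TableStatistics:
// stats live in a map keyed by name, so adding a new stat means adding a key
// rather than a new field, getter and setter.
class PartitionStats {
    // Stat name (e.g. "numRows") -> value; insertion order is preserved.
    private final Map<String, Long> stats = new LinkedHashMap<>();

    void set(String name, long value) {
        stats.put(name, value);
    }

    long get(String name) {
        // A stat that was never set reads as 0.
        return stats.getOrDefault(name, 0L);
    }

    Map<String, Long> asMap() {
        return stats;
    }
}

class TableStats extends PartitionStats {
    // Rolls partition stats up to the table by summing each stat and adding
    // the table-only "numPartitions". Summing is only correct for additive
    // stats (rows, bytes, files); max/min/average would need their own merge.
    static TableStats aggregate(Iterable<? extends PartitionStats> partitions) {
        TableStats table = new TableStats();
        long numPartitions = 0;
        for (PartitionStats p : partitions) {
            numPartitions++;
            for (Map.Entry<String, Long> e : p.asMap().entrySet()) {
                table.set(e.getKey(), table.get(e.getKey()) + e.getValue());
            }
        }
        table.set("numPartitions", numPartitions);
        return table;
    }
}
```

With this layout, a new additive stat only needs a new key at the place it is collected; the roll-up code does not change.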

I'm re-running the tests now that the patch is refreshed against the latest trunk.

> table/partition level statistics
> --------------------------------
>
>                 Key: HIVE-1361
>                 URL: https://issues.apache.org/jira/browse/HIVE-1361
>             Project: Hadoop Hive
>          Issue Type: Sub-task
>          Components: Query Processor
>            Reporter: Ning Zhang
>            Assignee: Ahmed M Aly
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
> HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> As a first step, we gather table-level stats for non-partitioned tables and
> partition-level stats for partitioned tables. Future work could extend
> table-level stats to partitioned tables as well.
> There are 3 major milestones in this subtask: 
>  1) extend the INSERT statement to gather table/partition-level stats
> on the fly.
>  2) extend the metastore API to support storing and retrieving stats for a
> particular table/partition.
>  3) add an ANALYZE TABLE [PARTITION] statement to HiveQL to gather stats for
> existing tables/partitions.
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats, in addition to the partition-level stats:
>   - number of partitions
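The proposed partition-level stats could be derived in a single pass over the files written for a partition. A minimal Java sketch, assuming each file is described only by the sizes (in bytes) of its rows and that a file's size is the sum of those row sizes (a simplification: real files also carry headers and may be compressed). ProposedStats is a hypothetical name; java.util.LongSummaryStatistics supplies the count, sum, min, max and average.

```java
import java.util.List;
import java.util.LongSummaryStatistics;

// Hypothetical sketch of the proposed partition-level stats. Each file is
// modeled only by the sizes (bytes) of its rows, and a file's size is assumed
// to be the sum of its row sizes (real files add headers, compression, ...).
final class ProposedStats {
    final long numRows;
    final long totalSize;
    final long numFiles;
    // count/sum/min/max/average over all rows, and over all files.
    // Note: with no input, getMin()/getMax() return Long.MAX_VALUE/MIN_VALUE.
    final LongSummaryStatistics rowSizes = new LongSummaryStatistics();
    final LongSummaryStatistics fileSizes = new LongSummaryStatistics();

    ProposedStats(List<long[]> files) {
        for (long[] rows : files) {
            long fileSize = 0;
            for (long rowSize : rows) {
                rowSizes.accept(rowSize);
                fileSize += rowSize;
            }
            fileSizes.accept(fileSize);
        }
        numRows = rowSizes.getCount();
        totalSize = fileSizes.getSum();
        numFiles = fileSizes.getCount();
    }
}
```

For two files with row sizes {100, 200} and {60}, this gives 3 rows, 360 bytes, 2 files, row sizes between 60 and 200 (average 120), and file sizes between 60 and 300 (average 180).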

-- 
This message is automatically generated by JIRA.