-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6878/#review12137
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java
<https://reviews.apache.org/r/6878/#comment25825>

    I will remove this change to MapRedTask.java. Sorry about that.


- Shreepadma Venugopalan


On Oct. 3, 2012, 3:10 a.m., Shreepadma Venugopalan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6878/
> -----------------------------------------------------------
> 
> (Updated Oct. 3, 2012, 3:10 a.m.)
> 
> 
> Review request for hive and Carl Steinbach.
> 
> 
> Description
> -------
> 
> This patch implements version 1 of the column statistics project in Hive. It
> adds support for computing and persisting statistical summaries of column
> values in Hive tables and partitions. To support column statistics in Hive,
> this patch does the following:
> 
> * Adds a new compute_stats UDAF that computes scalar statistics for all
> primitive Hive data types. Version 1 of the project supports the following
> scalar statistics on primitive types: an estimate of the number of distinct
> values, the number of null values, the number of trues/falses for
> boolean-typed columns, the maximum and average length for string- and
> binary-typed columns, and the maximum and minimum values for long- and
> double-typed columns. Note that version 1 of the column stats project
> supports column statistics at both the table and the partition level.
> 
> * Adds Metastore schema tables to persist the newly added statistics at both
> the table and the partition level.
> * Adds Metastore Thrift APIs to persist, retrieve and delete column statistics
> at both the table and the partition level.
> For details of the schema and Thrift API changes, please refer to the
> following wiki page:
> https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive
> 
> * Extends the ANALYZE TABLE COMPUTE STATISTICS statement to trigger the
> computation and persistence of statistics for one or more columns. Note that
> statistics for multiple columns are computed in a single scan of the table
> data. For the syntax changes, please refer to the following wiki page:
> https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive
> 
> The patch does not yet include the metastore upgrade scripts for
> MySQL/Derby/Postgres/Oracle. I'm waiting for the review to finalize the
> metastore schema changes before adding the upgrade scripts.
> 
> In a follow-on patch, as part of version 2 of the column statistics project,
> we will add support for computing, persisting and retrieving histograms on
> long- and double-typed column values.
> 
> The generated Thrift files have been left out of this review request to make
> it easier to read. The patch attached to the JIRA issue includes them.
> 
> 
> This addresses bug HIVE-1362.
>     https://issues.apache.org/jira/browse/HIVE-1362
> 
> 
> Diffs
> -----
> 
>   data/files/UserVisits.dat PRE-CREATION 
>   data/files/binary.txt PRE-CREATION 
>   data/files/bool.txt PRE-CREATION 
>   data/files/double.txt PRE-CREATION 
>   data/files/employee.dat PRE-CREATION 
>   data/files/employee2.dat PRE-CREATION 
>   data/files/int.txt PRE-CREATION 
>   ivy/libraries.properties 7ac6778 
>   metastore/if/hive_metastore.thrift d4fad72 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 8fec13d
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 17b986c
>   metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 3883b5b
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java eff44b1
>   metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java bf5ae3a
>   metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 77d1caa
>   metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java PRE-CREATION
>   metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java PRE-CREATION
>   metastore/src/model/package.jdo 38ce6d5
>   metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 528a100
>   metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 925938d
>   ql/build.xml 5de3f78
>   ql/if/queryplan.thrift 05fbf58
>   ql/ivy.xml aa3b8ce
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 425900d
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java 4c8831f
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 4446952
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 79b87f1
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 7440889
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java 0b55ac4
>   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 344dc69
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java f7257cd
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ExplainSemanticAnalyzer.java e75a075
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ExportSemanticAnalyzer.java 61bc7fd
>   ql/src/java/org/apache/hadoop/hive/ql/parse/FunctionSemanticAnalyzer.java 6024dd4
>   ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 356779a
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java 09ef969
>   ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 22fa20f
>   ql/src/java/org/apache/hadoop/hive/ql/parse/QB.java a0ccbe6
>   ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java b38c002
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 5ce31f1
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java ad1a14c
>   ql/src/java/org/apache/hadoop/hive/ql/parse/StatsSemanticAnalyzer.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsDesc.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsWork.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java cb54753
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DoubleNumDistinctValueEstimator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/LongNumDistinctValueEstimator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/StringNumDistinctValueEstimator.java PRE-CREATION
>   ql/src/test/queries/clientpositive/columnstats_partlvl.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/columnstats_tbllvl.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/compute_stats_binary.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/compute_stats_boolean.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/compute_stats_double.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/compute_stats_long.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/compute_stats_string.q PRE-CREATION 
>   ql/src/test/results/clientpositive/columnstats_partlvl.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/columnstats_tbllvl.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/compute_stats_binary.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/compute_stats_boolean.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/compute_stats_double.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/compute_stats_long.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/compute_stats_string.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/show_functions.q.out 02f6a94 
>   ql/src/test/results/clientpositive/udaf_histogram.q.out PRE-CREATION 
>   serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java 5430814
> 
> Diff: https://reviews.apache.org/r/6878/diff/
> 
> 
> Testing
> -------
> 
> All existing Hive tests pass. Additionally, this patch adds the following
> tests:
> 
> * Tests in TestHiveMetaStore.java for the Metastore schema and Thrift API
> changes;
> * Tests that exercise the compute_stats UDAF for all primitive types;
> * End-to-end tests at both the table and the partition level that compute
> stats on multiple columns. Note that these tests use the extended syntax of
> the ANALYZE command.
> 
> 
> Thanks,
> 
> Shreepadma Venugopalan
> 
>
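The review request lists several scalar statistics and says that statistics for multiple columns come out of a single scan of the table data. A small Java sketch can make both points concrete. It is not the patch's code. The class ColumnStatsSketch, the ColumnAccumulator interface and the LongStats/StringStats/BooleanStats accumulators are hypothetical names chosen for illustration. The sketch also runs over an in-memory list of rows rather than inside a Hive UDAF. It leaves out the distinct-value estimate, which the patch handles in its separate *NumDistinctValueEstimator classes. It also leaves out double and binary columns, which would follow the same pattern, and the merging of partial aggregates that a distributed UDAF needs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: class and method names are illustrative, not the API of
// the patch's GenericUDAFComputeStats. NDV estimation and merging of partial
// results from multiple map tasks are deliberately left out.
public class ColumnStatsSketch {

  // One accumulator per column; add() consumes one value, which may be null.
  interface ColumnAccumulator {
    void add(Object value);
  }

  // Null count plus min/max for a long-typed column. If every value is null,
  // min and max keep their sentinel values.
  static final class LongStats implements ColumnAccumulator {
    long numNulls;
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;

    @Override
    public void add(Object value) {
      if (value == null) { numNulls++; return; }
      long v = (Long) value;
      min = Math.min(min, v);
      max = Math.max(max, v);
    }
  }

  // Null count plus max and average length for a string-typed column.
  static final class StringStats implements ColumnAccumulator {
    long numNulls, numNonNulls, totalLength, maxLength;

    @Override
    public void add(Object value) {
      if (value == null) { numNulls++; return; }
      long len = ((String) value).length();
      numNonNulls++;
      totalLength += len;
      maxLength = Math.max(maxLength, len);
    }

    double avgLength() {
      return numNonNulls == 0 ? 0.0 : (double) totalLength / numNonNulls;
    }
  }

  // Counts of trues, falses and nulls for a boolean-typed column.
  static final class BooleanStats implements ColumnAccumulator {
    long numTrues, numFalses, numNulls;

    @Override
    public void add(Object value) {
      if (value == null) numNulls++;
      else if ((Boolean) value) numTrues++;
      else numFalses++;
    }
  }

  // A single pass over the rows: each row updates the accumulator of every
  // requested column, so statistics for k columns cost one scan, not k scans.
  static void scan(List<Object[]> rows, ColumnAccumulator[] accumulators) {
    for (Object[] row : rows) {
      for (int c = 0; c < accumulators.length; c++) {
        accumulators[c].add(row[c]);
      }
    }
  }

  public static void main(String[] args) {
    List<Object[]> rows = new ArrayList<>();
    rows.add(new Object[] {3L, "abc", true});
    rows.add(new Object[] {null, "hello", false});
    rows.add(new Object[] {-7L, null, true});

    LongStats ids = new LongStats();
    StringStats names = new StringStats();
    BooleanStats flags = new BooleanStats();
    scan(rows, new ColumnAccumulator[] {ids, names, flags});

    // prints "ids: nulls=1 min=-7 max=3"
    System.out.println("ids: nulls=" + ids.numNulls + " min=" + ids.min + " max=" + ids.max);
    // prints "names: nulls=1 maxLen=5 avgLen=4.0"
    System.out.println("names: nulls=" + names.numNulls + " maxLen=" + names.maxLength
        + " avgLen=" + names.avgLength());
    // prints "flags: trues=2 falses=1 nulls=0"
    System.out.println("flags: trues=" + flags.numTrues + " falses=" + flags.numFalses
        + " nulls=" + flags.numNulls);
  }
}
```

In this sketch each column keeps one small accumulator, so adding columns raises the per-row work but the data is still read only once. A real UDAF would additionally have to serialize these partial states and merge them across tasks.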