-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6878/#review12133
-----------------------------------------------------------

ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
<https://reviews.apache.org/r/6878/#comment25822>

    I'll replace the LHS with generic Java types.


- Shreepadma Venugopalan


On Oct. 3, 2012, 3:10 a.m., Shreepadma Venugopalan wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6878/
> -----------------------------------------------------------
>
> (Updated Oct. 3, 2012, 3:10 a.m.)
>
>
> Review request for hive and Carl Steinbach.
>
>
> Description
> -------
>
> This patch implements version 1 of the column statistics project in Hive. It
> adds support for computing and persisting statistical summaries of column
> values in Hive tables and partitions. To support column statistics in Hive,
> this patch does the following:
>
> * Adds a new compute stats UDAF to compute scalar statistics for all
> primitive Hive data types. In version 1 of the project, we support the
> following scalar statistics on primitive types: an estimate of the number of
> distinct values, the number of null values, the number of trues/falses for
> boolean-typed columns, the max and avg length for string- and binary-typed
> columns, and the max and min value for long- and double-typed columns. Note
> that version 1 of the column stats project supports column statistics at
> both the table and the partition level.
>
> * Adds Metastore schema tables to persist the newly added statistics at both
> the table and the partition level.
> * Adds a Metastore Thrift API to persist, retrieve and delete column
> statistics at both the table and the partition level.
> Please refer to the following wiki link for the details of the schema and the
> Thrift API changes:
> https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive
>
> * Extends the analyze table compute statistics statement to trigger
> statistics computation and persistence for one or more columns.
> Please note that statistics for multiple columns are computed in a single
> scan of the table data. Please refer to the following wiki link for the
> syntax changes:
> https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive
>
> One thing missing from the patch at this point is the metastore upgrade
> scripts for MySQL/Derby/Postgres/Oracle. I'm waiting for the review to
> finalize the metastore schema changes before I go ahead and add the upgrade
> scripts.
>
> In a follow-on patch, as part of version 2 of the column statistics project,
> we will add support for computing, persisting and retrieving histograms on
> long- and double-typed column values.
>
> Generated Thrift files have been removed for viewing pleasure. The JIRA page
> has the patch with the generated Thrift files.
>
>
> This addresses bug HIVE-1362.
>     https://issues.apache.org/jira/browse/HIVE-1362
>
>
> Diffs
> -----
>
>   data/files/UserVisits.dat PRE-CREATION
>   data/files/binary.txt PRE-CREATION
>   data/files/bool.txt PRE-CREATION
>   data/files/double.txt PRE-CREATION
>   data/files/employee.dat PRE-CREATION
>   data/files/employee2.dat PRE-CREATION
>   data/files/int.txt PRE-CREATION
>   ivy/libraries.properties 7ac6778
>   metastore/if/hive_metastore.thrift d4fad72
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 8fec13d
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 17b986c
>   metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 3883b5b
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java eff44b1
>   metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java bf5ae3a
>   metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 77d1caa
>   metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java PRE-CREATION
>   metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java PRE-CREATION
>   metastore/src/model/package.jdo 38ce6d5
>   metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 528a100
>   metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 925938d
>   ql/build.xml 5de3f78
>   ql/if/queryplan.thrift 05fbf58
>   ql/ivy.xml aa3b8ce
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 425900d
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java 4c8831f
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 4446952
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 79b87f1
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 7440889
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java 0b55ac4
>   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 344dc69
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java f7257cd
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ExplainSemanticAnalyzer.java e75a075
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ExportSemanticAnalyzer.java 61bc7fd
>   ql/src/java/org/apache/hadoop/hive/ql/parse/FunctionSemanticAnalyzer.java 6024dd4
>   ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 356779a
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java 09ef969
>   ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 22fa20f
>   ql/src/java/org/apache/hadoop/hive/ql/parse/QB.java a0ccbe6
>   ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java b38c002
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 5ce31f1
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java ad1a14c
>   ql/src/java/org/apache/hadoop/hive/ql/parse/StatsSemanticAnalyzer.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsDesc.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsWork.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java cb54753
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DoubleNumDistinctValueEstimator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/LongNumDistinctValueEstimator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/StringNumDistinctValueEstimator.java PRE-CREATION
>   ql/src/test/queries/clientpositive/columnstats_partlvl.q PRE-CREATION
>   ql/src/test/queries/clientpositive/columnstats_tbllvl.q PRE-CREATION
>   ql/src/test/queries/clientpositive/compute_stats_binary.q PRE-CREATION
>   ql/src/test/queries/clientpositive/compute_stats_boolean.q PRE-CREATION
>   ql/src/test/queries/clientpositive/compute_stats_double.q PRE-CREATION
>   ql/src/test/queries/clientpositive/compute_stats_long.q PRE-CREATION
>   ql/src/test/queries/clientpositive/compute_stats_string.q PRE-CREATION
>   ql/src/test/results/clientpositive/columnstats_partlvl.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/columnstats_tbllvl.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/compute_stats_binary.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/compute_stats_boolean.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/compute_stats_double.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/compute_stats_long.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/compute_stats_string.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/show_functions.q.out 02f6a94
>   ql/src/test/results/clientpositive/udaf_histogram.q.out PRE-CREATION
>   serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java 5430814
>
> Diff: https://reviews.apache.org/r/6878/diff/
>
>
> Testing
> -------
>
> All existing Hive tests pass. Additionally, this patch adds the following
> unit tests:
>
> * Tests in TestHiveMetaStore.java for the Metastore schema and Thrift API
> changes,
> * Tests that exercise the compute_stats UDAF for all primitive types,
> * End-to-end tests at both the table and the partition level for computing
> stats on multiple columns. Note that these tests use the extended syntax of
> the analyze command.
>
>
> Thanks,
>
> Shreepadma Venugopalan
>
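
As a rough illustration of the scalar statistics listed in the description, here is a minimal Java sketch of per-column accumulators that are all updated during one pass over the rows. It is not the GenericUDAFComputeStats implementation: the class and field names are hypothetical, and the exact HashSet used for the distinct-value count is a stand-in for the patch's NumDistinctValueEstimator classes, which produce an estimate rather than an exact count.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of per-column scalar statistics; not the code in
// GenericUDAFComputeStats. The exact HashSet stands in for an NDV estimator.
public class ColumnStatsSketch {

    /** Long-typed column: null count, min, max, distinct count. */
    static final class LongColumnStats {
        long numNulls = 0;
        long min = Long.MAX_VALUE; // stays at this sentinel if every value is null
        long max = Long.MIN_VALUE; // stays at this sentinel if every value is null
        final Set<Long> distinct = new HashSet<>();

        void update(Long value) {
            if (value == null) {
                numNulls++;
                return;
            }
            min = Math.min(min, value);
            max = Math.max(max, value);
            distinct.add(value);
        }

        long numDistinct() { return distinct.size(); }
    }

    /** String-typed column: null count, max and average length, distinct count. */
    static final class StringColumnStats {
        long numNulls = 0;
        long numNonNull = 0;
        long totalLength = 0;
        long maxLength = 0;
        final Set<String> distinct = new HashSet<>();

        void update(String value) {
            if (value == null) {
                numNulls++;
                return;
            }
            numNonNull++;
            totalLength += value.length();
            maxLength = Math.max(maxLength, value.length());
            distinct.add(value);
        }

        // Average length over non-null values only.
        double avgLength() {
            return numNonNull == 0 ? 0.0 : (double) totalLength / numNonNull;
        }

        long numDistinct() { return distinct.size(); }
    }

    /** Boolean-typed column: counts of trues, falses and nulls. */
    static final class BooleanColumnStats {
        long numTrues = 0;
        long numFalses = 0;
        long numNulls = 0;

        void update(Boolean value) {
            if (value == null) {
                numNulls++;
            } else if (value) {
                numTrues++;
            } else {
                numFalses++;
            }
        }
    }

    public static void main(String[] args) {
        // Each row holds (long, string, boolean) column values.
        Object[][] rows = {
            {3L, "ab", true}, {7L, null, false}, {null, "abcd", true}, {3L, "ab", null}
        };
        LongColumnStats longStats = new LongColumnStats();
        StringColumnStats stringStats = new StringColumnStats();
        BooleanColumnStats boolStats = new BooleanColumnStats();

        // Single scan: every column's accumulator is updated from the same row.
        for (Object[] row : rows) {
            longStats.update((Long) row[0]);
            stringStats.update((String) row[1]);
            boolStats.update((Boolean) row[2]);
        }

        System.out.println("long: nulls=" + longStats.numNulls + " min=" + longStats.min
            + " max=" + longStats.max + " ndv=" + longStats.numDistinct());
        System.out.println("string: nulls=" + stringStats.numNulls + " maxLen="
            + stringStats.maxLength + " avgLen=" + stringStats.avgLength()
            + " ndv=" + stringStats.numDistinct());
        System.out.println("boolean: trues=" + boolStats.numTrues + " falses="
            + boolStats.numFalses + " nulls=" + boolStats.numNulls);
    }
}
```

Because each accumulator only needs the current row, statistics for several columns can be gathered in one scan. Replacing the HashSet with a fixed-size sketch would bound memory per column at the cost of an approximate distinct count.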