> On July 2, 2015, 6:36 a.m., chengxiang li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java, line 59
> > <https://reviews.apache.org/r/34666/diff/1/?file=971706#file971706line59>
> >
> >     The statistics should be quite inaccurate after filter and group, as 
> > they are computed from estimates at compile time. I think verifying a 
> > threshold against inaccurate data is unacceptable, as it means the 
> > threshold may not work at all.
> >     We could check this threshold in SparkPartitionPruningSinkOperator at 
> > runtime.

Switching to runtime would be a very different approach: here we want to check 
the threshold at compile time and avoid generating the pruning task if possible.
How inaccurate would the stats be? I'm fine with it if the estimate is always 
more conservative.
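
To make the trade-off concrete, here is a minimal sketch of a compile-time 
gate of the kind discussed above. It is not the actual logic in 
SparkRemoveDynamicPruningBySize; the class name, the method name, the 100 MB 
threshold, and the convention that a negative estimate means "unknown" are all 
assumptions made for illustration.

```java
/** Hypothetical compile-time gate; not the code under review. */
final class PruningSizeGate {
    enum Decision { KEEP_PRUNING, REMOVE_PRUNING }

    /**
     * Decides from the estimated output size of the pruning branch whether
     * the branch should stay in the plan.
     */
    static Decision decide(long estimatedBytes, long maxBytes) {
        // Assumption: a negative estimate means the stats are unknown,
        // and unknown is treated the same as "too large".
        if (estimatedBytes < 0 || estimatedBytes > maxBytes) {
            return Decision.REMOVE_PRUNING;
        }
        return Decision.KEEP_PRUNING;
    }

    public static void main(String[] args) {
        long maxBytes = 100L * 1024 * 1024; // assumed threshold of 100 MB
        System.out.println(decide(10L * 1024 * 1024, maxBytes));
        System.out.println(decide(500L * 1024 * 1024, maxBytes));
    }
}
```

With a gate like this, an estimate that is too high only removes a pruning 
branch that would have been fine, so the query still returns the same result 
and merely scans more partitions. An estimate that is too low is the case the 
review comment worries about, because the branch is kept even though it may 
emit more data than the threshold allows.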


> On July 2, 2015, 6:36 a.m., chengxiang li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java, line 396
> > <https://reviews.apache.org/r/34666/diff/1/?file=971711#file971711line396>
> >
> >     Why do we need a List for table/columnname/partkey here? Do we support 
> > multiple PartitionPruningSinkOperators inside a single operator tree?

This is because a target work with a partitioned table could have multiple 
partition columns, which could come from multiple tables and/or partkeys.
You can check the test output files for some examples.


> On July 2, 2015, 6:36 a.m., chengxiang li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java, line 61
> > <https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line61>
> >
> >     When the appended data exceeds its capacity, DataOutputBuffer expands 
> > its byte array by creating a new array of 2x the size and copying the old 
> > one into it. An estimated initial byte array size should avoid most of 
> > these array copies.

Yes, this would be an improvement. Xuefu and I talked about adding an extra 
parameter to control the generated file size. We plan to do that as a follow-up 
task.
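
As a rough illustration of why an initial size estimate helps, the sketch 
below counts how often a doubling buffer has to reallocate and copy. 
DoublingBuffer is a toy class written for this example and is only a stand-in 
for the growth behavior described in the comment; it is not Hadoop's 
DataOutputBuffer, and the record count and record size are made-up numbers.

```java
import java.util.Arrays;

/** Toy growable byte buffer that doubles its backing array when it is full. */
final class DoublingBuffer {
    private byte[] data;
    private int length = 0;
    private int reallocations = 0;

    DoublingBuffer(int initialCapacity) {
        data = new byte[Math.max(1, initialCapacity)];
    }

    void write(byte[] src) {
        int needed = length + src.length;
        if (needed > data.length) {
            // Grow to at least twice the current capacity and copy the old bytes.
            int newCapacity = Math.max(data.length * 2, needed);
            data = Arrays.copyOf(data, newCapacity);
            reallocations++;
        }
        System.arraycopy(src, 0, data, length, src.length);
        length += src.length;
    }

    int reallocations() {
        return reallocations;
    }

    /** Appends `records` chunks of `recordSize` bytes and returns the number of array copies. */
    static int reallocationsFor(int initialCapacity, int records, int recordSize) {
        DoublingBuffer buf = new DoublingBuffer(initialCapacity);
        byte[] record = new byte[recordSize];
        for (int i = 0; i < records; i++) {
            buf.write(record);
        }
        return buf.reallocations();
    }

    public static void main(String[] args) {
        // Small default capacity versus a capacity sized from an estimate.
        System.out.println(reallocationsFor(32, 1000, 16));
        System.out.println(reallocationsFor(16000, 1000, 16));
    }
}
```

Writing 1000 records of 16 bytes into a buffer that starts at 32 bytes forces 
a series of grow-and-copy steps, while a buffer pre-sized to the estimated 
16000 bytes never reallocates. An estimate that is somewhat too small still 
removes most of the copies, since only the last few doublings remain.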


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90197
-----------------------------------------------------------


On May 26, 2015, 4:28 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 4:28 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice 
> optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a
>   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 4cc54e8
>   ql/if/queryplan.thrift c8dfa35
>   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5
>   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806
>   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java e18f935
>   ql/src/gen/thrift/gen-php/Types.php 7121ed4
>   ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106
>   ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 8e56263
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9
>   serde/src/gen/thrift/gen-cpp/complex_types.h 3f4c760
>   serde/src/gen/thrift/gen-cpp/complex_types.cpp 411e1b0
>   serde/src/gen/thrift/gen-cpp/megastruct_types.cpp 2d46b7f
>   serde/src/gen/thrift/gen-cpp/testthrift_types.h 6c84b9f
>   serde/src/gen/thrift/gen-cpp/testthrift_types.cpp 7949f23
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/ThriftTestObj.java dda3c5f
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/Complex.java ff0c1f2
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java fba49e4
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/PropValueUnion.java a50a508
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/SetIntString.java 334d225
>   service/src/gen/thrift/gen-cpp/TCLIService.h 030475b
>   service/src/gen/thrift/gen-cpp/TCLIService.cpp 209ce63
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 7bceabd
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 86eeea3
>   service/src/gen/thrift/gen-cpp/ThriftHive.h b84362b
>   service/src/gen/thrift/gen-cpp/ThriftHive.cpp 865db69
>   service/src/gen/thrift/gen-cpp/hive_service_types.h bc0e652
>   service/src/gen/thrift/gen-cpp/hive_service_types.cpp 255fb00
>   service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java 1c44789
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryColumn.java 6b1b054
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBoolColumn.java efd571c
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TByteColumn.java 169bfde
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TDoubleColumn.java 4fc5454
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTablesReq.java c973fcc
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI16Column.java c836630
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI32Column.java 6c6c5f3
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI64Column.java cc383ed
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRow.java a44cfb0
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java d16c8a4
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatus.java 24a746e
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStringColumn.java 3dae460
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTableSchema.java ff5e54d
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeDesc.java 251f86a
>   service/src/gen/thrift/gen-py/hive_service/ThriftHive.py 33912f9
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both 
> are cloned from Tez's tests.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>
