> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > itests/src/test/resources/testconfiguration.properties, line 894
> > <https://reviews.apache.org/r/34666/diff/1/?file=971683#file971683line894>
> >
> >     Are there more test cases that can be turned on?

will turn on vectorized_dynamic_partition_pruning.q


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java,
> >  line 74
> > <https://reviews.apache.org/r/34666/diff/1/?file=971705#file971705line74>
> >
> >     Is there anything specific to Spark? If not, we should probably reuse 
> > rather than copying.

The only difference is the type of pruning sink added - we use 
SparkPartitionPruningSinkOp while Tez uses AppMasterEventOp.
OK, I'll reuse the existing class.


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java, line 
> > 589
> > <https://reviews.apache.org/r/34666/diff/1/?file=971711#file971711line589>
> >
> >     It seems that an operator might be visited multiple times.

Yea, but I guess it doesn't matter here. We just use this to find the enclosing 
work for a op, and we just need to find at least one root op.
Duplicate doesn't matter here I think.


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java,
> >  line 107
> > <https://reviews.apache.org/r/34666/diff/1/?file=971714#file971714line107>
> >
> >     For the cloned tree, don't we need to remove the branches that's not 
> > connected to the pruning sink operator, i.e., RS->Join?

This is done before we clone the branch:

```
List<Operator<?>> savedChildOps = filterOp.getChildOperators();
filterOp.setChildOperators(Utilities.makeList(selOp));
```


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java,
> >  line 111
> > <https://reviews.apache.org/r/34666/diff/1/?file=971714#file971714line111>
> >
> >     This is not cloned as part of cloneOperatorTree()?

no - because it is a transient field.


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java,
> >  line 92
> > <https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line92>
> >
> >     Can we still get conflicts in the file name?

It shouldn't - I think work ID and Random#nextInt() should both be unique, 
right?


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java,
> >  line 68
> > <https://reviews.apache.org/r/34666/diff/1/?file=971701#file971701line68>
> >
> >     I think we should delegate the processing to the parent when processing 
> > one row from the batch. Refer to VectorReduceSinkOperator for an example.

Not much we can do here, since here the processing is more complicated. I 
changed part of the code to call the super.process().


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java,
> >  line 98
> > <https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line98>
> >
> >     Nit: Potential leak of BufferedOutputStream.

Can you explain a little under which situation this would happen? and what is 
the better way to do this? Thanks.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89826
-----------------------------------------------------------


On May 26, 2015, 4:28 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 4:28 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice 
> optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
>   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 
> 4cc54e8 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
>   
> ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
>  e18f935 
>   ql/src/gen/thrift/gen-php/Types.php 7121ed4 
>   ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 
>   ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 
> 21398d8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
> e6c845c 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
> 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java
>  8e56263 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
> 5f731d7 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
> 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
> e27ce0d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java
>  f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
> 19aae70 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 
> 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out 
> e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out 
> e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   
> ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out
>  PRE-CREATION 
>   
> ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out 
> bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 
> 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out 
> ef98ae9 
>   serde/src/gen/thrift/gen-cpp/complex_types.h 3f4c760 
>   serde/src/gen/thrift/gen-cpp/complex_types.cpp 411e1b0 
>   serde/src/gen/thrift/gen-cpp/megastruct_types.cpp 2d46b7f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.h 6c84b9f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.cpp 7949f23 
>   
> serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/ThriftTestObj.java
>  dda3c5f 
>   
> serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/Complex.java
>  ff0c1f2 
>   
> serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java
>  fba49e4 
>   
> serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/PropValueUnion.java
>  a50a508 
>   
> serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/SetIntString.java
>  334d225 
>   service/src/gen/thrift/gen-cpp/TCLIService.h 030475b 
>   service/src/gen/thrift/gen-cpp/TCLIService.cpp 209ce63 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 7bceabd 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 86eeea3 
>   service/src/gen/thrift/gen-cpp/ThriftHive.h b84362b 
>   service/src/gen/thrift/gen-cpp/ThriftHive.cpp 865db69 
>   service/src/gen/thrift/gen-cpp/hive_service_types.h bc0e652 
>   service/src/gen/thrift/gen-cpp/hive_service_types.cpp 255fb00 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java
>  1c44789 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryColumn.java
>  6b1b054 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBoolColumn.java
>  efd571c 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TByteColumn.java
>  169bfde 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TDoubleColumn.java
>  4fc5454 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTablesReq.java
>  c973fcc 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI16Column.java
>  c836630 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI32Column.java
>  6c6c5f3 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI64Column.java
>  cc383ed 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRow.java
>  a44cfb0 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java
>  d16c8a4 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatus.java
>  24a746e 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStringColumn.java
>  3dae460 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTableSchema.java
>  ff5e54d 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeDesc.java
>  251f86a 
>   service/src/gen/thrift/gen-py/hive_service/ThriftHive.py 33912f9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both 
> are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Reply via email to