-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27719/
-----------------------------------------------------------

Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.


Bugs: Hive-8756
    https://issues.apache.org/jira/browse/Hive-8756


Repository: hive-git


Description
-------

Spark stats do not collect numRows and rawDataSize because the 
FileSinkOperator in the ReduceWork does not have the stats config set. In 
GenSparkUtils.removeUnionOperators, the operator tree is cloned, and a new 
FileSinkOperator is generated and set on the reduce work. However, during 
processFileSink, GenMapRedUtils.addStatsTask sets the collectStats flag on the 
original FileSinkOperator, not on the new FileSinkOperator that the 
ReduceWork actually uses.
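
To make the ordering problem concrete, here is a minimal, self-contained 
sketch. It does not use Hive classes: Sink, Work, and cloneIntoWork are 
hypothetical stand-ins for FileSinkOperator, ReduceWork, and the cloning done 
in removeUnionOperators. fixedOrder shows only one possible remedy (setting 
the flag on the clone) and may differ from what the patch does.

```java
import java.util.ArrayList;
import java.util.List;

public class CloneStatsSketch {

    // Hypothetical stand-in for FileSinkOperator and its collectStats flag.
    static final class Sink {
        final String name;
        boolean collectStats;

        Sink(String name) {
            this.name = name;
        }

        // Copies the current state; later changes to the source are not seen.
        Sink copy() {
            Sink c = new Sink(name);
            c.collectStats = collectStats;
            return c;
        }
    }

    // Hypothetical stand-in for ReduceWork: it holds the sinks that really run.
    static final class Work {
        final List<Sink> sinks = new ArrayList<>();
    }

    // Mimics the cloning step: the work ends up holding a copy, not the original.
    static Sink cloneIntoWork(Sink original, Work work) {
        Sink clone = original.copy();
        work.sinks.clear();
        work.sinks.add(clone);
        return clone;
    }

    // Problem ordering: the flag is set on the original after it was cloned.
    static boolean buggyOrder() {
        Sink original = new Sink("FS");
        Work work = new Work();
        cloneIntoWork(original, work);
        original.collectStats = true; // never reaches the clone in the work
        return work.sinks.get(0).collectStats;
    }

    // One possible remedy (an assumption): set the flag on the clone instead.
    static boolean fixedOrder() {
        Sink original = new Sink("FS");
        Work work = new Work();
        Sink clone = cloneIntoWork(original, work);
        clone.collectStats = true;
        return work.sinks.get(0).collectStats;
    }

    public static void main(String[] args) {
        System.out.println("buggy order collects stats: " + buggyOrder());
        System.out.println("fixed order collects stats: " + fixedOrder());
    }
}
```

In this sketch, buggyOrder() returns false because the sink held by the work 
is a copy taken before the flag was set, while fixedOrder() returns true.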


Diffs
-----

  itests/src/test/resources/testconfiguration.properties 79a0132 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 8290568 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e8e18a7 
  ql/src/test/results/clientpositive/spark/stats1.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/27719/diff/


Testing
-------


Thanks,

Na Yang
