Re: Review Request 27687: HIVE-8649 Increase level of parallelism in reduce phase [Spark Branch]

Jimmy Xiang Thu, 06 Nov 2014 12:02:26 -0800


> On Nov. 6, 2014, 7:01 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java, line 86
> > <https://reviews.apache.org/r/27687/diff/1/?file=751768#file751768line86>
> >
> >     Can we document what is in the tuple, especially what each element means?


Sure. I will add a doc comment.


> On Nov. 6, 2014, 7:01 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java, line 75
> > <https://reviews.apache.org/r/27687/diff/1/?file=751768#file751768line75>
> >
> >     I don't feel we need to cache this, as this can change during a user 
> > session.

Yes, it will change during a user session. I was thinking of updating the 
cached value whenever things change, based on event callbacks.

This info may be needed many times if there are many reducers. Caching it 
should save us repeated trips to the Spark master (assuming 
getExecutorMemoryStatus checks with the master).
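
A minimal sketch of the cache-and-invalidate idea described above, not the 
code in the patch. CachedClusterInfo, its loader, and invalidate() are 
hypothetical names introduced only for illustration; the loader stands in for 
whatever call fetches the cluster info from the Spark master, and invalidate() 
is assumed to be wired to a cluster-status callback if Spark provides one.

```java
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of SparkClient or the patch.
final class CachedClusterInfo<T> {
    // Assumed to wrap the expensive lookup (for example, a round trip to the master).
    private final Supplier<T> loader;
    // null means "not loaded yet" or "invalidated by a callback".
    private T value;

    CachedClusterInfo(Supplier<T> loader) {
        this.loader = loader;
    }

    // Returns the cached value, calling the loader only on first use
    // or after invalidate() has been called.
    synchronized T get() {
        if (value == null) {
            value = loader.get();
        }
        return value;
    }

    // Meant to be called from an event callback when the cluster changes,
    // so the next get() reloads fresh info.
    synchronized void invalidate() {
        value = null;
    }
}
```

With this shape, many reducers asking for the same info cost one lookup, and 
a cluster change costs one extra lookup on the next request. Both methods are 
synchronized here only because the sketch assumes callbacks may arrive on a 
different thread than the caller of get().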


> On Nov. 6, 2014, 7:01 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java, line 89
> > <https://reviews.apache.org/r/27687/diff/1/?file=751768#file751768line89>
> >
> >     I'm not sure why this needs to be synchronized. Will this method be 
> > called by concurrent threads? It doesn't seem to be the case.

Are you saying it won't be called by multiple threads? Can each JVM run only 
one query at a time in every deployment mode? If so, why is 
SparkClient.getInstance synchronized?


- Jimmy


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27687/#review60210
-----------------------------------------------------------


On Nov. 6, 2014, 5:25 p.m., Jimmy Xiang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27687/
> -----------------------------------------------------------
> 
> (Updated Nov. 6, 2014, 5:25 p.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8649
>     https://issues.apache.org/jira/browse/HIVE-8649
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> First patch for HIVE-8649: increase the number of reducers for Spark based 
> on information about the Spark cluster.
> We need to add a SparkListener to handle cluster status changes, if Spark 
> supports such events.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java 5766787 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java 2dbb5a3 
> 
> Diff: https://reviews.apache.org/r/27687/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>
