-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30055/
-----------------------------------------------------------

(Updated Jan. 20, 2015, 11:34 p.m.)


Review request for hive and chengxiang li.


Changes
-------

Address review comments from Lefty and Brock.  Also refer to 'Hive' as the 
client in the descriptions, for clarity.

Took another look at Chengxiang's suggestion to use --conf to pass the values 
down to the remote Spark driver.  My original attempt must have had a bug; 
after fixing it, I ran a few basic tests and the approach seems to work.


Bugs: HIVE-9337
    https://issues.apache.org/jira/browse/HIVE-9337


Repository: hive-git


Description
-------

This change allows the Remote Spark Driver's properties to be set dynamically 
via Hive configuration (i.e., set commands).

Went through the Remote Spark Driver's properties and added them to HiveConf, 
rewriting the descriptions so that they are clearer in a global context 
alongside other Hive properties.  Also fixed a bug in one description, which 
stated that the default max message size is 10MB when it is actually 50MB.  
One open question: I did not move 'hive.spark.log.dir', as I could not find 
where it is read and do not know whether it is still used somewhere.
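For example, once a Remote Spark Driver property is registered in HiveConf, a 
user can tune it per-session from the Hive CLI (shown here with the existing 
client connect timeout; any of the newly registered properties would work the 
same way):

```sql
-- Adjust a remote-driver RPC setting for this session only; the value is
-- forwarded to the remote Spark driver when the Spark session starts.
set hive.spark.client.connect.timeout=30000ms;
```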

These properties are passed from the client (Hive) to the RemoteSparkDriver 
via the properties file.  One note is that the property keys have to be 
prefixed with 'spark.', as SparkConf only accepts keys with that prefix.  I 
spent a long time trying to pass them via '--conf' but found that it won't 
work (see SparkSubmitArguments.scala).  It may be possible to pass each one as 
a separate argument (like --hive.spark.XXX=YYY), but I think the properties 
file is more scalable.
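As an illustrative sketch (class and method names hypothetical, not the actual 
HiveSparkClientFactory code), the prefixing step looks roughly like this:

```java
import java.util.HashMap;
import java.util.Map;

public class SparkPropPrefixer {

    // SparkConf silently ignores any key that does not start with "spark.",
    // so Hive-side properties must be renamed before being written to the
    // properties file handed to spark-submit.
    static Map<String, String> prefixForSpark(Map<String, String> hiveProps) {
        Map<String, String> sparkProps = new HashMap<>();
        for (Map.Entry<String, String> e : hiveProps.entrySet()) {
            String key = e.getKey();
            if (!key.startsWith("spark.")) {
                key = "spark." + key;
            }
            sparkProps.put(key, e.getValue());
        }
        return sparkProps;
    }
}
```

On the driver side the prefix can be stripped again before the keys are looked 
up against the original Hive property names.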

On the Remote Spark Driver side, I kept the defensive logic that provides a 
default value when the conf object doesn't contain a property, which may 
happen if the property is unset.  For this, I had to instantiate a HiveConf in 
that process to obtain the default value, as some of the timeout properties 
need a HiveConf instance for their calculation.
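A minimal sketch of that defensive fallback (names hypothetical, not the 
actual RpcConfiguration code; the 50MB default mirrors the corrected value 
mentioned above):

```java
import java.util.Properties;

public class RpcDefaults {

    // Hypothetical compiled-in default, matching the corrected 50MB
    // max message size from the description fix.
    static final int DEFAULT_MAX_MESSAGE_SIZE = 50 * 1024 * 1024;

    // If the property never made it into the conf (e.g. it was unset on
    // the Hive side), fall back to the default rather than failing.
    static int getMaxMessageSize(Properties conf) {
        String value = conf.getProperty("hive.spark.client.rpc.max.size");
        return value != null ? Integer.parseInt(value)
                             : DEFAULT_MAX_MESSAGE_SIZE;
    }
}
```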


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9a830d2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 334c191 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 044f189 
  spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java dab92f6 
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java 5e3777a 
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 851e937 
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java ac71ae9 
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java 5a826ba 
  spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java def4907 
  spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java a2dd3e6 

Diff: https://reviews.apache.org/r/30055/diff/


Testing
-------


Thanks,

Szehon Ho
