-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30055/
-----------------------------------------------------------
(Updated Jan. 20, 2015, 11:34 p.m.)
Review request for hive and chengxiang li.
Changes
-------
Address review comments from Lefty and Brock. Also, in the descriptions, refer to
'Hive' as the client to make clear which side is meant.
Looked a little more at Chengxiang's suggestion to use --conf to pass the values
down to the remote Spark driver. I must have had a bug in my original attempt;
after fixing it, I ran a few basic tests and it seemed to work.
Bugs: HIVE-9337
https://issues.apache.org/jira/browse/HIVE-9337
Repository: hive-git
Description
-------
This change allows the Remote Spark Driver's properties to be set dynamically
via Hive configuration (i.e., 'set' commands).
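For illustration, a session-level "set hive.spark.client.connect.timeout=30000ms"
amounts to the same kind of override sketched below from the Java side. This is a
minimal sketch only; the property name is one of the RPC settings being added here
and the exact names/defaults are per HiveConf in this diff:

    import org.apache.hadoop.hive.conf.HiveConf;

    public class DynamicSparkClientConfExample {
      public static void main(String[] args) {
        // HiveConf starts from hive-site.xml defaults; a session-level 'set'
        // command results in the same kind of per-session override as below.
        HiveConf conf = new HiveConf();
        conf.set("hive.spark.client.connect.timeout", "30000ms");
        System.out.println(conf.get("hive.spark.client.connect.timeout"));
      }
    }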
Went through the Remote Spark Driver's properties and added them to HiveConf,
fixing the descriptions so that they are clearer in a global context alongside
other Hive properties. Also fixed a bug in one description that stated the
default max message size is 10MB; it should read 50MB. One open question: I did
not move 'hive.spark.log.dir', as I could not find where it is read and do not
know whether it is still used somewhere.
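For reference, an illustrative enum mirroring the general shape of the
HiveConf.ConfVars entries being added; the real names, defaults, and descriptions
live in common/src/java/org/apache/hadoop/hive/conf/HiveConf.java in this diff and
may differ:

    // Illustrative only; not the actual HiveConf.ConfVars entries.
    public enum SparkClientConfVarsSketch {
      SPARK_RPC_MAX_MESSAGE_SIZE("hive.spark.client.rpc.max.size", 50 * 1024 * 1024,
          "Maximum message size in bytes for communication between the Hive client "
              + "and the remote Spark driver. Default is 50MB."),
      SPARK_RPC_CLIENT_CONNECT_TIMEOUT("hive.spark.client.connect.timeout", "1000ms",
          "Timeout for the remote Spark driver to connect back to the Hive client.");

      public final String varname;
      public final Object defaultVal;
      public final String description;

      SparkClientConfVarsSketch(String varname, Object defaultVal, String description) {
        this.varname = varname;
        this.defaultVal = defaultVal;
        this.description = description;
      }
    }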
These properties are passed between the client (Hive) and the RemoteSparkDriver
via the properties file. One note is that the property keys have to be prefixed
with 'spark.', as SparkConf only accepts keys with that prefix. I tried for a
long time to pass them via '--conf' but found that it won't work (see
SparkSubmitArguments.scala). It may be possible to pass each of them as a
separate argument (like --hive.spark.XXX=YYY), but I think it's more scalable to
do it via the properties file.
On the Remote Spark Driver side, I kept the defensive logic that provides a
default value in case the conf object doesn't contain the property, which may
happen if a property is unset. For this, I had to instantiate a HiveConf in that
process to get the default value, since some of the timeout properties need a
HiveConf instance to calculate their values.
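A minimal sketch of that defensive lookup, assuming the ConfVars constant name and
the getTimeVar accessor used here; the actual code in RpcConfiguration covers more
properties, and getConnectTimeoutMs is a hypothetical helper for illustration:

    import java.util.Map;
    import java.util.concurrent.TimeUnit;
    import org.apache.hadoop.hive.conf.HiveConf;

    public class RpcConfDefaultsSketch {
      // Use the value shipped over from the Hive client when present, otherwise
      // fall back to the default resolved from a locally instantiated HiveConf.
      // Timeout defaults carry time units, hence the HiveConf/getTimeVar round trip.
      public static long getConnectTimeoutMs(Map<String, String> config) {
        HiveConf defaults = new HiveConf();
        String key = HiveConf.ConfVars.SPARK_RPC_CLIENT_CONNECT_TIMEOUT.varname;
        String value = config.get(key);
        if (value != null) {
          defaults.set(key, value);
        }
        return defaults.getTimeVar(HiveConf.ConfVars.SPARK_RPC_CLIENT_CONNECT_TIMEOUT,
            TimeUnit.MILLISECONDS);
      }
    }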
Diffs (updated)
-----
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9a830d2
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 334c191
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 044f189
spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java dab92f6
spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java 5e3777a
spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 851e937
spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java ac71ae9
spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java 5a826ba
spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java def4907
spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java a2dd3e6
Diff: https://reviews.apache.org/r/30055/diff/
Testing
-------
Thanks,
Szehon Ho