Re: Review Request 30055: HIVE-9337 : Move more hive.spark.* configurations to HiveConf

2015-01-20 Thread Lefty Leverenz

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30055/#review68853
---

Ship it!


Ship It!

- Lefty Leverenz


On Jan. 20, 2015, 11:34 p.m., Szehon Ho wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30055/
> ---
> 
> (Updated Jan. 20, 2015, 11:34 p.m.)
> 
> 
> Review request for hive and chengxiang li.
> 
> 
> Bugs: HIVE-9337
> https://issues.apache.org/jira/browse/HIVE-9337
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This change allows the Remote Spark Driver's properties to be set dynamically 
> via Hive configuration (ie, set commands).
> 
> Went through the Remote Spark Driver's properties and added them to HiveConf, 
> fixing the descriptions so that they're more clear in a global context with 
> other Hive properties.  Also fixed a bug in description that stated default 
> value of max message size is 10MB, should read 50MB.  One open question is 
> that I did not move 'hive.spark.log.dir' as I could not find where it was 
> read, and did not know if its still being used somewhere?
> 
> The passing of these properties between client (Hive) and RemoteSparkDriver 
> is done via the properties file.  One note is that these properties have to 
> be appended with 'spark', as SparkConf only accepts those.  I tried a long 
> time to pass them via 'conf' but found that it won't work (see 
> SparkSubmitArguments.scala).  It may be possible to pass them each as another 
> argument (like --hive.spark.XXX=YYY), but I think its more scalable to do it 
> via properties file.
> 
> On the Remote Spark Driver side, I kept the defensive logic to provide a 
> default value in case the conf object doesn't contain the property.  This may 
> occur if a prop is unset. For this, I had to instantiate a HiveConf on that 
> process to get the default value, as some of the timeout props need a 
> hiveConf instance to do calculation on.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9a830d2 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 
> 334c191 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 
> 044f189 
>   spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 
> dab92f6 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java
>  5e3777a 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
> 851e937 
>   spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java 
> ac71ae9 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java
>  5a826ba 
>   
> spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 
> def4907 
>   spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java 
> a2dd3e6 
> 
> Diff: https://reviews.apache.org/r/30055/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>



Re: Review Request 30055: HIVE-9337 : Move more hive.spark.* configurations to HiveConf

2015-01-20 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30055/#review68850
---

Ship it!


Ship It!

- chengxiang li


On 一月 20, 2015, 11:34 p.m., Szehon Ho wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30055/
> ---
> 
> (Updated 一月 20, 2015, 11:34 p.m.)
> 
> 
> Review request for hive and chengxiang li.
> 
> 
> Bugs: HIVE-9337
> https://issues.apache.org/jira/browse/HIVE-9337
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This change allows the Remote Spark Driver's properties to be set dynamically 
> via Hive configuration (ie, set commands).
> 
> Went through the Remote Spark Driver's properties and added them to HiveConf, 
> fixing the descriptions so that they're more clear in a global context with 
> other Hive properties.  Also fixed a bug in description that stated default 
> value of max message size is 10MB, should read 50MB.  One open question is 
> that I did not move 'hive.spark.log.dir' as I could not find where it was 
> read, and did not know if its still being used somewhere?
> 
> The passing of these properties between client (Hive) and RemoteSparkDriver 
> is done via the properties file.  One note is that these properties have to 
> be appended with 'spark', as SparkConf only accepts those.  I tried a long 
> time to pass them via 'conf' but found that it won't work (see 
> SparkSubmitArguments.scala).  It may be possible to pass them each as another 
> argument (like --hive.spark.XXX=YYY), but I think its more scalable to do it 
> via properties file.
> 
> On the Remote Spark Driver side, I kept the defensive logic to provide a 
> default value in case the conf object doesn't contain the property.  This may 
> occur if a prop is unset. For this, I had to instantiate a HiveConf on that 
> process to get the default value, as some of the timeout props need a 
> hiveConf instance to do calculation on.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9a830d2 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 
> 334c191 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 
> 044f189 
>   spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 
> dab92f6 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java
>  5e3777a 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
> 851e937 
>   spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java 
> ac71ae9 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java
>  5a826ba 
>   
> spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 
> def4907 
>   spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java 
> a2dd3e6 
> 
> Diff: https://reviews.apache.org/r/30055/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>



Re: Review Request 30055: HIVE-9337 : Move more hive.spark.* configurations to HiveConf

2015-01-20 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30055/
---

(Updated Jan. 20, 2015, 11:34 p.m.)


Review request for hive and chengxiang li.


Changes
---

Address review comments from Lefty and Brock.  Also in the descriptions, put 
'Hive' as the client in order to clarify it.

Looked at Chengxiang's suggestions a little more to use --conf to pass the 
values down to remote Spark driver, I guess I must have had a bug in my 
original attempt, and after fixing those ran a few basic tests and it seemed to 
work.


Bugs: HIVE-9337
https://issues.apache.org/jira/browse/HIVE-9337


Repository: hive-git


Description
---

This change allows the Remote Spark Driver's properties to be set dynamically 
via Hive configuration (ie, set commands).

Went through the Remote Spark Driver's properties and added them to HiveConf, 
fixing the descriptions so that they're more clear in a global context with 
other Hive properties.  Also fixed a bug in description that stated default 
value of max message size is 10MB, should read 50MB.  One open question is that 
I did not move 'hive.spark.log.dir' as I could not find where it was read, and 
did not know if its still being used somewhere?

The passing of these properties between client (Hive) and RemoteSparkDriver is 
done via the properties file.  One note is that these properties have to be 
appended with 'spark', as SparkConf only accepts those.  I tried a long time to 
pass them via 'conf' but found that it won't work (see 
SparkSubmitArguments.scala).  It may be possible to pass them each as another 
argument (like --hive.spark.XXX=YYY), but I think its more scalable to do it 
via properties file.

On the Remote Spark Driver side, I kept the defensive logic to provide a 
default value in case the conf object doesn't contain the property.  This may 
occur if a prop is unset. For this, I had to instantiate a HiveConf on that 
process to get the default value, as some of the timeout props need a hiveConf 
instance to do calculation on.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9a830d2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 
334c191 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 
044f189 
  spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 
dab92f6 
  
spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java 
5e3777a 
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
851e937 
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java ac71ae9 
  
spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java
 5a826ba 
  spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 
def4907 
  spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java 
a2dd3e6 

Diff: https://reviews.apache.org/r/30055/diff/


Testing
---


Thanks,

Szehon Ho



Re: Review Request 30055: HIVE-9337 : Move more hive.spark.* configurations to HiveConf

2015-01-19 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30055/#review68686
---


Szehon, do you try to transfer RSC configuration through RemoteDriver --conf 
option? The command generated in SparkClientImpl should looks like: 
SparkSubmit --properties-file /tmp/spark-submit.1267525585014474423.properties 
--class org.apache.hive.spark.client.RemoteDriver 
/usr/lib/hive-0.15.0/lib/hive-exec-0.15.0-SNAPSHOT.jar --remote-host node14-4 
--remote-port 38136 --conf hive.spark=... --conf hive.spark...=...
It's quite strage that SparkSubmit handle it's child main class's arguments.

- chengxiang li


On Jan. 19, 2015, 11:16 p.m., Szehon Ho wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30055/
> ---
> 
> (Updated Jan. 19, 2015, 11:16 p.m.)
> 
> 
> Review request for hive and chengxiang li.
> 
> 
> Bugs: HIVE-9337
> https://issues.apache.org/jira/browse/HIVE-9337
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This change allows the Remote Spark Driver's properties to be set dynamically 
> via Hive configuration (ie, set commands).
> 
> Went through the Remote Spark Driver's properties and added them to HiveConf, 
> fixing the descriptions so that they're more clear in a global context with 
> other Hive properties.  Also fixed a bug in description that stated default 
> value of max message size is 10MB, should read 50MB.  One open question is 
> that I did not move 'hive.spark.log.dir' as I could not find where it was 
> read, and did not know if its still being used somewhere?
> 
> The passing of these properties between client (Hive) and RemoteSparkDriver 
> is done via the properties file.  One note is that these properties have to 
> be appended with 'spark', as SparkConf only accepts those.  I tried a long 
> time to pass them via 'conf' but found that it won't work (see 
> SparkSubmitArguments.scala).  It may be possible to pass them each as another 
> argument (like --hive.spark.XXX=YYY), but I think its more scalable to do it 
> via properties file.
> 
> On the Remote Spark Driver side, I kept the defensive logic to provide a 
> default value in case the conf object doesn't contain the property.  This may 
> occur if a prop is unset. For this, I had to instantiate a HiveConf on that 
> process to get the default value, as some of the timeout props need a 
> hiveConf instance to do calculation on.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 068c962 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 
> 334c191 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
> 865e03e 
>   spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java 
> ac71ae9 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java
>  5a826ba 
>   spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java 
> a2dd3e6 
> 
> Diff: https://reviews.apache.org/r/30055/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>



Re: Review Request 30055: HIVE-9337 : Move more hive.spark.* configurations to HiveConf

2015-01-19 Thread Brock Noland

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30055/#review68678
---



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java


I think we'll be doing this lookup potentially hundreds of times. Let's 
make HIVE_SPARK_RSC_CONFIGS an ImmutableSet to improve lookup speed?


- Brock Noland


On Jan. 19, 2015, 11:16 p.m., Szehon Ho wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30055/
> ---
> 
> (Updated Jan. 19, 2015, 11:16 p.m.)
> 
> 
> Review request for hive and chengxiang li.
> 
> 
> Bugs: HIVE-9337
> https://issues.apache.org/jira/browse/HIVE-9337
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This change allows the Remote Spark Driver's properties to be set dynamically 
> via Hive configuration (ie, set commands).
> 
> Went through the Remote Spark Driver's properties and added them to HiveConf, 
> fixing the descriptions so that they're more clear in a global context with 
> other Hive properties.  Also fixed a bug in description that stated default 
> value of max message size is 10MB, should read 50MB.  One open question is 
> that I did not move 'hive.spark.log.dir' as I could not find where it was 
> read, and did not know if its still being used somewhere?
> 
> The passing of these properties between client (Hive) and RemoteSparkDriver 
> is done via the properties file.  One note is that these properties have to 
> be appended with 'spark', as SparkConf only accepts those.  I tried a long 
> time to pass them via 'conf' but found that it won't work (see 
> SparkSubmitArguments.scala).  It may be possible to pass them each as another 
> argument (like --hive.spark.XXX=YYY), but I think its more scalable to do it 
> via properties file.
> 
> On the Remote Spark Driver side, I kept the defensive logic to provide a 
> default value in case the conf object doesn't contain the property.  This may 
> occur if a prop is unset. For this, I had to instantiate a HiveConf on that 
> process to get the default value, as some of the timeout props need a 
> hiveConf instance to do calculation on.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 068c962 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 
> 334c191 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
> 865e03e 
>   spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java 
> ac71ae9 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java
>  5a826ba 
>   spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java 
> a2dd3e6 
> 
> Diff: https://reviews.apache.org/r/30055/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>



Re: Review Request 30055: HIVE-9337 : Move more hive.spark.* configurations to HiveConf

2015-01-19 Thread Lefty Leverenz

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30055/#review68674
---



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java


init-cap Spark in "remote spark driver" (here and all other parameter 
descriptions)



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java


init-cap Spark



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java


init-cap Spark



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java


(1) init-cap Spark
(2) second line of description should align with first, no extra indentation



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java


init-cap Spark



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java


init-cap Spark



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java


(1) init-cap Spark
(2) final period after WARN}


- Lefty Leverenz


On Jan. 19, 2015, 11:16 p.m., Szehon Ho wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30055/
> ---
> 
> (Updated Jan. 19, 2015, 11:16 p.m.)
> 
> 
> Review request for hive and chengxiang li.
> 
> 
> Bugs: HIVE-9337
> https://issues.apache.org/jira/browse/HIVE-9337
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This change allows the Remote Spark Driver's properties to be set dynamically 
> via Hive configuration (ie, set commands).
> 
> Went through the Remote Spark Driver's properties and added them to HiveConf, 
> fixing the descriptions so that they're more clear in a global context with 
> other Hive properties.  Also fixed a bug in description that stated default 
> value of max message size is 10MB, should read 50MB.  One open question is 
> that I did not move 'hive.spark.log.dir' as I could not find where it was 
> read, and did not know if its still being used somewhere?
> 
> The passing of these properties between client (Hive) and RemoteSparkDriver 
> is done via the properties file.  One note is that these properties have to 
> be appended with 'spark', as SparkConf only accepts those.  I tried a long 
> time to pass them via 'conf' but found that it won't work (see 
> SparkSubmitArguments.scala).  It may be possible to pass them each as another 
> argument (like --hive.spark.XXX=YYY), but I think its more scalable to do it 
> via properties file.
> 
> On the Remote Spark Driver side, I kept the defensive logic to provide a 
> default value in case the conf object doesn't contain the property.  This may 
> occur if a prop is unset. For this, I had to instantiate a HiveConf on that 
> process to get the default value, as some of the timeout props need a 
> hiveConf instance to do calculation on.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 068c962 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 
> 334c191 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
> 865e03e 
>   spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java 
> ac71ae9 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java
>  5a826ba 
>   spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java 
> a2dd3e6 
> 
> Diff: https://reviews.apache.org/r/30055/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>



Review Request 30055: HIVE-9337 : Move more hive.spark.* configurations to HiveConf

2015-01-19 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30055/
---

Review request for hive and chengxiang li.


Bugs: HIVE-9337
https://issues.apache.org/jira/browse/HIVE-9337


Repository: hive-git


Description
---

This change allows the Remote Spark Driver's properties to be set dynamically 
via Hive configuration (ie, set commands).

Went through the Remote Spark Driver's properties and added them to HiveConf, 
fixing the descriptions so that they're more clear in a global context with 
other Hive properties.  Also fixed a bug in description that stated default 
value of max message size is 10MB, should read 50MB.  One open question is that 
I did not move 'hive.spark.log.dir' as I could not find where it was read, and 
did not know if its still being used somewhere?

The passing of these properties between client (Hive) and RemoteSparkDriver is 
done via the properties file.  One note is that these properties have to be 
appended with 'spark', as SparkConf only accepts those.  I tried a long time to 
pass them via 'conf' but found that it won't work (see 
SparkSubmitArguments.scala).  It may be possible to pass them each as another 
argument (like --hive.spark.XXX=YYY), but I think its more scalable to do it 
via properties file.

On the Remote Spark Driver side, I kept the defensive logic to provide a 
default value in case the conf object doesn't contain the property.  This may 
occur if a prop is unset. For this, I had to instantiate a HiveConf on that 
process to get the default value, as some of the timeout props need a hiveConf 
instance to do calculation on.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 068c962 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 
334c191 
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
865e03e 
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java ac71ae9 
  
spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java
 5a826ba 
  spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java 
a2dd3e6 

Diff: https://reviews.apache.org/r/30055/diff/


Testing
---


Thanks,

Szehon Ho