[jira] [Created] (HIVE-19888) Misleading "METASTORE_FILTER_HOOK will be ignored" warning from SessionState
Marcelo Vanzin created HIVE-19888:
Summary: Misleading "METASTORE_FILTER_HOOK will be ignored" warning from SessionState
Key: HIVE-19888
URL: https://issues.apache.org/jira/browse/HIVE-19888
Project: Hive
Issue Type: Bug
Components: HiveServer2
Affects Versions: 3.0.0
Reporter: Marcelo Vanzin

When I run things on my test cluster I see things like this in my logs:

{noformat}
18/03/14 13:35:20 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
18/03/14 13:35:21 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
{noformat}

That's because the code in SessionState.java is wrong:

{code}
String metastoreHook = sessionConf.get(ConfVars.METASTORE_FILTER_HOOK.name());
if (!ConfVars.METASTORE_FILTER_HOOK.getDefaultValue().equals(metastoreHook) &&
    !AuthorizationMetaStoreFilterHook.class.getName().equals(metastoreHook)) {
  LOG.warn(ConfVars.METASTORE_FILTER_HOOK.name() + " will be ignored, since hive.security.authorization.manager" +
      " is set to instance of HiveAuthorizerFactory.");
}
{code}

It's using {{.name()}}, which is the enum constant's name, not the actual config key.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
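The distinction behind the bug can be sketched in a few lines. `ConfVarsDemo` below is an illustrative stand-in, not Hive's actual `ConfVars` enum; the point is that `name()` returns the enum constant's identifier, while the real config key lives in a field:

```java
import java.util.HashMap;
import java.util.Map;

public class NameVsVarname {
    /** Stand-in for Hive's ConfVars: the enum constant's name() differs
     *  from the config key users actually set (held in a field). */
    enum ConfVarsDemo {
        METASTORE_FILTER_HOOK("hive.metastore.filter.hook");

        final String varname; // the real config key

        ConfVarsDemo(String varname) {
            this.varname = varname;
        }
    }

    public static void main(String[] args) {
        Map<String, String> sessionConf = new HashMap<>();
        // Users set the config key, never the enum constant name:
        sessionConf.put("hive.metastore.filter.hook", "com.example.MyHook");

        // Buggy lookup: uses the enum name ("METASTORE_FILTER_HOOK"), which
        // is never a key in the conf, so this is always null and the
        // "will be ignored" warning fires regardless of what is set.
        String buggy = sessionConf.get(ConfVarsDemo.METASTORE_FILTER_HOOK.name());

        // Fixed lookup: use the actual key.
        String fixed = sessionConf.get(ConfVarsDemo.METASTORE_FILTER_HOOK.varname);

        System.out.println("buggy = " + buggy);  // null
        System.out.println("fixed = " + fixed);  // com.example.MyHook
    }
}
```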
Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33422/#review81520
---

Ship it!

- Marcelo Vanzin

On April 23, 2015, 6:54 p.m., Chao Sun wrote:
> (Updated April 23, 2015, 6:54 p.m.)
>
> Review request for hive and Marcelo Vanzin.
>
> Bugs: HIVE-10434
>     https://issues.apache.org/jira/browse/HIVE-10434
>
> Repository: hive-git
>
> Description
> ---
> This patch cancels the connection from HS2 to the remote process once the latter has failed and exited with an error code, to avoid a potentially long timeout. It adds a new public method, cancelClient, to the RpcServer class - not sure whether there's an easier way to do this.
>
> Diffs
> ---
>   spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 71e432d
>   spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 32d4c46
>
> Diff: https://reviews.apache.org/r/33422/diff/
>
> Testing
> ---
> Tested on my own cluster, and it worked.
>
> Thanks,
> Chao Sun
Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33422/#review81328
---

Ship it!

Just a minor thing left to fix.

spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
https://reviews.apache.org/r/33422/#comment131664

To avoid races, I'd do:

    final ClientInfo cinfo = pendingClients.remove(clientId);
    if (cinfo == null) { /* nothing to do */ }

- Marcelo Vanzin

On April 22, 2015, 1:25 a.m., Chao Sun wrote:
> This patch cancels the connection from HS2 to the remote process once the latter has failed and exited with an error code, to avoid a potentially long timeout. It adds a new public method, cancelClient, to the RpcServer class - not sure whether there's an easier way to do this.
>
> Diff: https://reviews.apache.org/r/33422/diff/
Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
On April 23, 2015, 6:22 p.m., Xuefu Zhang wrote:
> spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java, line 176
> https://reviews.apache.org/r/33422/diff/2/?file=939013#file939013line176
>
> I'm wondering if cinfo can be null here. After the contains() check above, things might have changed, so cinfo is not guaranteed to be non-null.

Yeah, that was my suggestion above. Don't use `containsKey`; instead just remove and check for null.

- Marcelo

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33422/#review81361
---

On April 23, 2015, 6:11 p.m., Chao Sun wrote:
> This patch cancels the connection from HS2 to the remote process once the latter has failed and exited with an error code, to avoid a potentially long timeout. It adds a new public method, cancelClient, to the RpcServer class - not sure whether there's an easier way to do this.
>
> Diff: https://reviews.apache.org/r/33422/diff/
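The remove-then-null-check pattern being recommended here can be sketched as follows. `ClientInfo`, `register`, and the boolean return are simplified stand-ins for the RpcServer internals, and the error-message parameter reflects the earlier review feedback about keeping `cancelClient` generic:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CancelClientSketch {
    static final class ClientInfo {
        final String id;
        ClientInfo(String id) { this.id = id; }
    }

    private final ConcurrentMap<String, ClientInfo> pendingClients =
        new ConcurrentHashMap<>();

    void register(String clientId) {
        pendingClients.put(clientId, new ClientInfo(clientId));
    }

    // Racy shape: between containsKey() and remove(), another thread may
    // have removed the entry, so the result can still be null despite the
    // check:
    //
    //   if (pendingClients.containsKey(clientId)) {
    //       ClientInfo cinfo = pendingClients.remove(clientId); // may be null!
    //       ...
    //   }

    /** Race-free: remove() is atomic; null simply means another thread
     *  already removed the client, so there is nothing left to do. */
    boolean cancelClient(String clientId, String msg) {
        final ClientInfo cinfo = pendingClients.remove(clientId);
        if (cinfo == null) {
            return false; // nothing to do
        }
        // here: fail the client's pending promise with msg, close channels, etc.
        return true;
    }
}
```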
Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/#review81103 --- spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java https://reviews.apache.org/r/33422/#comment131349 This will throw an exception if the child process exits with a non-zero status after the RSC connects back to HS2. I don't think you want that. spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java https://reviews.apache.org/r/33422/#comment131351 While the only current call site reflects the error message, this method seems more generic than that. Maybe pass the error message as a parameter to the method? - Marcelo Vanzin On April 22, 2015, 12:30 a.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/ --- (Updated April 22, 2015, 12:30 a.m.) Review request for hive and Marcelo Vanzin. Bugs: HIVE-10434 https://issues.apache.org/jira/browse/HIVE-10434 Repository: hive-git Description --- This patch cancels the connection from HS2 to remote process once the latter has failed and exited with error code, to avoid potential long timeout. It add a new public method cancelClient to the RpcServer class - not sure whether there's an easier way to do this.. Diffs - spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 71e432d spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 32d4c46 Diff: https://reviews.apache.org/r/33422/diff/ Testing --- Tested on my own cluster, and it worked. Thanks, Chao Sun
[jira] [Created] (HIVE-10143) HS2 fails to clean up Spark client state on timeout
Marcelo Vanzin created HIVE-10143:
Summary: HS2 fails to clean up Spark client state on timeout
Key: HIVE-10143
URL: https://issues.apache.org/jira/browse/HIVE-10143
Project: Hive
Issue Type: Bug
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin

When a new client is registered with the Spark client and fails to connect back in time, the code will time out the future and HS2 will give up on that client. But the RSC backend does not clean up all the state, and the client is still allowed to connect back. That can leave the client alive indefinitely, holding on to cluster resources, since HS2 doesn't know it's alive but the connection still exists.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 32631: [HIVE-10143] Properly clean up client state when client times out.
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32631/
---

Review request for hive, Szehon Ho and Xuefu Zhang.

Repository: hive-git

Description
---
Clean up needs to occur whenever the client future fails, not just when it's explicitly cancelled.

Diffs
---
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java b923acf78c8459cf49d47268233b328957a1ae6e
  spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java 8207514342bed544e1a01fc41c892825f330cf3c

Diff: https://reviews.apache.org/r/32631/diff/

Testing
---

Thanks,
Marcelo Vanzin
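The idea in the description, cleanup on any failure of the client future rather than only on explicit cancellation, can be sketched with java.util.concurrent. The real code uses the spark-client RPC promises, so the names here are illustrative:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class ClientFutureCleanup {
    /** Attach cleanup to every failure path: timeout, error, or cancel().
     *  Cleaning up only inside an explicit cancel handler misses the
     *  timeout path, which is exactly the gap described in HIVE-10143. */
    static <T> CompletableFuture<T> withCleanup(CompletableFuture<T> clientFuture,
                                                Runnable cleanup) {
        clientFuture.whenComplete((value, error) -> {
            if (error != null) {
                // runs for TimeoutException and CancellationException alike
                cleanup.run();
            }
        });
        return clientFuture;
    }

    public static void main(String[] args) {
        AtomicBoolean cleaned = new AtomicBoolean(false);
        CompletableFuture<String> f = withCleanup(new CompletableFuture<>(),
            () -> cleaned.set(true));
        // Simulate the client never connecting back in time:
        f.completeExceptionally(new TimeoutException("client did not connect back"));
        System.out.println("cleaned = " + cleaned.get()); // true
    }
}
```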
[jira] [Commented] (HIVE-9410) ClassNotFoundException occurs during hive query case execution with UDF defined [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305668#comment-14305668 ]

Marcelo Vanzin commented on HIVE-9410:

Ah, I see, didn't notice that. Thanks for clarifying! Depending on the exact semantics expected by Hive, I'd make one suggestion though: instead of building a growing hierarchy of class loaders every time you call addJar() and run a job (then call addJar(), then run another job, then...), I'd have a simple subclass of URLClassLoader that exposes the protected addURL() method. That way you don't need to keep this jar list, and you don't need to create this chain of class loaders - you just modify the existing context class loader by adding the new URL.

ClassNotFoundException occurs during hive query case execution with UDF defined [Spark Branch]
--
Key: HIVE-9410
URL: https://issues.apache.org/jira/browse/HIVE-9410
Project: Hive
Issue Type: Sub-task
Components: Spark
Environment: CentOS 6.5 JDK1.7
Reporter: Xin Hao
Assignee: Chengxiang Li
Fix For: spark-branch, 1.1.0
Attachments: HIVE-9410.1-spark.patch, HIVE-9410.2-spark.patch, HIVE-9410.3-spark.patch, HIVE-9410.4-spark.patch, HIVE-9410.4-spark.patch

We have a Hive query case with a UDF defined (i.e. BigBench cases Q10, Q18, etc.). It passes in default Hive (on MR) mode, but fails in Hive on Spark mode (both Standalone and Yarn-Client). Although we use 'add jar .jar;' to add the UDF jar explicitly, the issue still exists. BTW, if we put the UDF jar into the $HIVE_HOME/lib dir, the case passes.
Detail error message is as below (NOTE: de.bankmark.bigbench.queries.q10.SentimentUDF is the UDF contained in the jar bigbenchqueriesmr.jar, and we have added a command like 'add jar /location/to/bigbenchqueriesmr.jar;' into the .sql explicitly):

{code}
INFO [pool-1-thread-1]: client.RemoteDriver (RemoteDriver.java:call(316)) - Failed to run job 8dd120cb-1a4d-4d1c-ba31-61eac648c27d
org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: de.bankmark.bigbench.queries.q10.SentimentUDF
Serialization trace:
genericUDTF (org.apache.hadoop.hive.ql.plan.UDTFDesc)
conf (org.apache.hadoop.hive.ql.exec.UDTFOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
right (org.apache.commons.lang3.tuple.ImmutablePair)
edgeProperties (org.apache.hadoop.hive.ql.plan.SparkWork)
    at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
    at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
    ...
Caused by: java.lang.ClassNotFoundException: de.bankmark.bigbench.queries.q10.SentimentUDF
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method
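The loader Marcelo suggests above is only a few lines. The class name is illustrative; the point is widening the protected `addURL()` so one loader can grow in place:

```java
import java.net.URL;
import java.net.URLClassLoader;

/** One loader that grows in place: each addJar() becomes an addURL() call
 *  on the existing context class loader, instead of wrapping it in yet
 *  another URLClassLoader and keeping a separate jar list. */
public class MutableURLClassLoader extends URLClassLoader {

    public MutableURLClassLoader(ClassLoader parent) {
        super(new URL[0], parent);
    }

    @Override
    public void addURL(URL url) { // widen protected -> public
        super.addURL(url);
    }
}
```

Install it once as the context class loader; subsequent addJar() calls mutate it in place, so every thread already holding a reference to it sees the new jars.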
[jira] [Commented] (HIVE-9410) ClassNotFoundException occurs during hive query case execution with UDF defined [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14303659#comment-14303659 ]

Marcelo Vanzin commented on HIVE-9410:

Hi [~chengxiang li], I think this patch has a serious bug. It adds the new jar to the driver by modifying the current thread's context class loader. But the driver runs a thread pool, so if a later request is serviced by a different thread, it will not see that jar. To properly fix this you'd have to somehow change all the threads managed by the pool, or directly modify the context class loader (without replacing it).

ClassNotFoundException occurs during hive query case execution with UDF defined [Spark Branch]
--
Key: HIVE-9410
URL: https://issues.apache.org/jira/browse/HIVE-9410
Project: Hive
Issue Type: Sub-task
Components: Spark
Environment: CentOS 6.5 JDK1.7
Reporter: Xin Hao
Assignee: Chengxiang Li
Fix For: spark-branch, 1.1.0
Attachments: HIVE-9410.1-spark.patch, HIVE-9410.2-spark.patch, HIVE-9410.3-spark.patch, HIVE-9410.4-spark.patch, HIVE-9410.4-spark.patch

We have a Hive query case with a UDF defined (i.e. BigBench cases Q10, Q18, etc.). It passes in default Hive (on MR) mode, but fails in Hive on Spark mode (both Standalone and Yarn-Client). Although we use 'add jar .jar;' to add the UDF jar explicitly, the issue still exists. BTW, if we put the UDF jar into the $HIVE_HOME/lib dir, the case passes.
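The scoping problem Marcelo describes, that a context class loader replacement is visible only to the thread that set it, can be demonstrated standalone (this is a minimal demo, not the driver code):

```java
public class ContextLoaderScope {
    public static void main(String[] args) throws InterruptedException {
        // An empty delegating loader standing in for the "loader + new jar"
        // replacement the patch installs.
        ClassLoader replacement =
            new ClassLoader(ClassLoader.getSystemClassLoader()) { };

        // Thread 1 replaces its own context class loader.
        Thread t1 = new Thread(
            () -> Thread.currentThread().setContextClassLoader(replacement));
        t1.start();
        t1.join();

        // Thread 2 (think: another thread in the driver's pool) still sees
        // its original loader - the setting is per-thread state.
        final ClassLoader[] seen = new ClassLoader[1];
        Thread t2 = new Thread(
            () -> seen[0] = Thread.currentThread().getContextClassLoader());
        t2.start();
        t2.join();

        System.out.println("other thread sees replacement: "
            + (seen[0] == replacement)); // false
    }
}
```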
[jira] [Commented] (HIVE-9487) Make Remote Spark Context secure [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297200#comment-14297200 ] Marcelo Vanzin commented on HIVE-9487: -- Hmm, weird. I definitely did not touch those. Maybe some merge issue, I'll take a look. Make Remote Spark Context secure [Spark Branch] --- Key: HIVE-9487 URL: https://issues.apache.org/jira/browse/HIVE-9487 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-9487.1-spark.patch The RSC currently uses an ad-hoc, insecure authentication mechanism. We should instead use a proper auth mechanism and add encryption to the mix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9487) Make Remote Spark Context secure [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297457#comment-14297457 ] Marcelo Vanzin commented on HIVE-9487: -- I failed git branch management 101. New patch should be correct. Make Remote Spark Context secure [Spark Branch] --- Key: HIVE-9487 URL: https://issues.apache.org/jira/browse/HIVE-9487 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-9487.1-spark.patch, HIVE-9487.2-spark.patch The RSC currently uses an ad-hoc, insecure authentication mechanism. We should instead use a proper auth mechanism and add encryption to the mix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9487) Make Remote Spark Context secure [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9487: - Attachment: HIVE-9487.2-spark.patch Make Remote Spark Context secure [Spark Branch] --- Key: HIVE-9487 URL: https://issues.apache.org/jira/browse/HIVE-9487 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-9487.1-spark.patch, HIVE-9487.2-spark.patch The RSC currently uses an ad-hoc, insecure authentication mechanism. We should instead use a proper auth mechanism and add encryption to the mix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9487) Make Remote Spark Context secure [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9487: - Status: Patch Available (was: Open) Make Remote Spark Context secure [Spark Branch] --- Key: HIVE-9487 URL: https://issues.apache.org/jira/browse/HIVE-9487 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-9487.1-spark.patch The RSC currently uses an ad-hoc, insecure authentication mechanism. We should instead use a proper auth mechanism and add encryption to the mix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 30385: Use SASL to establish the remote context connection.
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30385/
---

Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang.

Bugs: HIVE-9487
    https://issues.apache.org/jira/browse/HIVE-9487

Repository: hive-git

Description
---
Instead of the insecure, ad-hoc auth mechanism currently used, perform a SASL negotiation to establish trust. This requires the secret to be distributed through some secure channel (just like before).

Using SASL with DIGEST-MD5 (or GSSAPI, which hasn't been tested and probably wouldn't work well here) also allows us to add encryption without the need for SSL (yay?).

Only DIGEST-MD5 has been really tested. Supporting other mechanisms will probably mean adding new callback handlers in the client and server portions, but shouldn't be hard if desired.

Diffs
---
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java d4d98d7c0c28cdb1d19c700e20537ef405be2e01
  spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java ce2f9b6b132dc47f899798e47d18a1f6b0dd707f
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java 3a7149341bac086e5efe931595143d3bebbdb5db
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 5f9be658a855cc15c576f1a98376fcd85475e3b7
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/KryoMessageCodec.java 0c29c9441fb3e9daf690510a2c9b5716671e2571
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/README.md 2c858a121aaeca6af20f5e332de207694348a030
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java fffe24b3cbe6a5d7387e751adbc65f5b140c9089
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java eff640f7b24348043dbce734510698d9294579c6
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 5e18a3c0b5ea4f1b9c83f78faa3408e2dd479c2c
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/SaslHandler.java PRE-CREATION
  spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestKryoMessageCodec.java af534375a3ed86a3a9ad57c2f21a9a8bf6113714
  spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java ec7842398d3c4112f83f00e8cd3e5d4f9fdf8ca9

Diff: https://reviews.apache.org/r/30385/diff/

Testing
---
Unit tests.

Thanks,
Marcelo Vanzin
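The kind of negotiation described can be sketched with the JDK's javax.security.sasl API. This is a loopback demo of a server-first DIGEST-MD5 handshake over a shared secret, not Hive's actual SaslHandler; the protocol and server names ("rsc", "localhost") and the user/secret values are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.sasl.AuthorizeCallback;
import javax.security.sasl.RealmCallback;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslServer;

public class SaslLoopback {
    static final String USER = "rsc-client";
    static final char[] SECRET = "shared-secret".toCharArray();

    // One handler serves both ends here: it supplies the shared secret
    // and authorizes the authenticated user.
    static final CallbackHandler HANDLER = callbacks -> {
        for (Callback cb : callbacks) {
            if (cb instanceof NameCallback) {
                ((NameCallback) cb).setName(USER);
            } else if (cb instanceof PasswordCallback) {
                ((PasswordCallback) cb).setPassword(SECRET);
            } else if (cb instanceof RealmCallback) {
                RealmCallback rc = (RealmCallback) cb;
                rc.setText(rc.getDefaultText());
            } else if (cb instanceof AuthorizeCallback) {
                ((AuthorizeCallback) cb).setAuthorized(true);
            }
        }
    };

    /** Run the full server-first DIGEST-MD5 exchange in-process; true
     *  when both sides finish the handshake. */
    public static boolean negotiate() throws Exception {
        Map<String, String> props = new HashMap<>();
        SaslServer server = Sasl.createSaslServer(
            "DIGEST-MD5", "rsc", "localhost", props, HANDLER);
        SaslClient client = Sasl.createSaslClient(
            new String[] {"DIGEST-MD5"}, null, "rsc", "localhost", props, HANDLER);

        byte[] token = server.evaluateResponse(new byte[0]); // initial challenge
        while (!client.isComplete() || !server.isComplete()) {
            token = client.evaluateChallenge(token);
            if (token != null && !server.isComplete()) {
                token = server.evaluateResponse(token);
            }
        }
        return client.isComplete() && server.isComplete();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("negotiated: " + negotiate());
    }
}
```

In the real patch the tokens would travel over the Netty channel instead of a local loop, and a negotiated QOP of auth-conf is what provides encryption without SSL.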
Re: Review Request 30385: Use SASL to establish the remote context connection.
On Jan. 29, 2015, 12:36 a.m., Xuefu Zhang wrote:
> spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java, line 20
> https://reviews.apache.org/r/30385/diff/1/?file=839319#file839319line20
>
> Nit: if you need to submit another patch, let's not auto-reorg the imports.

I changed this because someone broke it... now it's in line with the usual order you see in the rest of the Hive code.

- Marcelo

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30385/#review70119
---

On Jan. 28, 2015, 11:22 p.m., Marcelo Vanzin wrote:
> (Updated Jan. 28, 2015, 11:22 p.m.)
>
> Instead of the insecure, ad-hoc auth mechanism currently used, perform a SASL negotiation to establish trust. This requires the secret to be distributed through some secure channel (just like before). Using SASL with DIGEST-MD5 (or GSSAPI, which hasn't been tested and probably wouldn't work well here) also allows us to add encryption without the need for SSL (yay?). Only DIGEST-MD5 has been really tested. Supporting other mechanisms will probably mean adding new callback handlers in the client and server portions, but shouldn't be hard if desired.
>
> Diff: https://reviews.apache.org/r/30385/diff/
>
> Testing: Unit tests.
[jira] [Updated] (HIVE-9487) Make Remote Spark Context secure [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9487: - Attachment: HIVE-9487.1-spark.patch Make Remote Spark Context secure [Spark Branch] --- Key: HIVE-9487 URL: https://issues.apache.org/jira/browse/HIVE-9487 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-9487.1-spark.patch The RSC currently uses an ad-hoc, insecure authentication mechanism. We should instead use a proper auth mechanism and add encryption to the mix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9493) Failed job may not throw exceptions [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295544#comment-14295544 ] Marcelo Vanzin commented on HIVE-9493: -- Looks good, thanks for fixing it. Failed job may not throw exceptions [Spark Branch] -- Key: HIVE-9493 URL: https://issues.apache.org/jira/browse/HIVE-9493 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-9493.1-spark.patch Currently remote driver assumes exception will be thrown when job fails to run. This may not hold since job is submitted asynchronously. And we have to check the futures before we decide the job is successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9487) Make Remote Spark Context secure [Spark Branch]
Marcelo Vanzin created HIVE-9487:
Summary: Make Remote Spark Context secure [Spark Branch]
Key: HIVE-9487
URL: https://issues.apache.org/jira/browse/HIVE-9487
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin

The RSC currently uses an ad-hoc, insecure authentication mechanism. We should instead use a proper auth mechanism and add encryption to the mix.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29954: HIVE-9179. Add listener API to JobHandle.
On Jan. 17, 2015, 12:19 a.m., Xuefu Zhang wrote:
> spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java, line 179
> https://reviews.apache.org/r/29954/diff/1-2/?file=823286#file823286line179
>
> Sorry I didn't get it, but why? Clarity, not perf, is my concern. Here we are notifying listeners of a new Spark job ID, which is done in the for loop, which is synchronized. This means no listener may be added or removed while the loop runs. On the other hand, sparkJobIds.add(sparkJobId) seems irrelevant to any changes to listeners, unless I missed anything. I don't understand why either of the two cases might happen as you suggested.

Marcelo Vanzin wrote:

Threads: T1 updating the job handle, T2 adding a listener.

Case 1:
    Statement 1 (S1): sparkJobIds.add(sparkJobId);
    Statement 2 (S2): synchronized (listeners) { /* call onSparkJobStarted(newSparkJobId) on every listener */ }

    Timeline:
    T1: executes S1
    T2: calls addListener(); the new listener is notified of the sparkJobId added above
    T1: executes S2. The new listener is notified again of the new Spark job ID.

Case 2: invert S1 and S2.
    T2: calls addListener()
    T1: executes S2. The listener is called with the current state of the handle and the new Spark job ID. The listener checks `handle.getSparkJobIDs().contains(newSparkJobId)`; the check fails.

Those seem pretty easy to understand to me. The current code avoids both of them.

Xuefu Zhang wrote:

I see. So the shared state of the job handle consists of the state, the listeners, and sparkJobIds, which needs to be protected. Thus, I'd suggest we change synchronized(listeners) to synchronized(this) or declare the method as synchronized. No essential difference, but better clarity.

The synchronization is *only* needed because of the listeners. It's there so that when you add a listener, you never miss an event - if the listeners didn't exist, you wouldn't need any synchronization anywhere in this class. So it makes better sense to synchronize on the listeners.

- Marcelo

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29954/#review68513
---

On Jan. 16, 2015, 11:24 p.m., Marcelo Vanzin wrote:
> (Updated Jan. 16, 2015, 11:24 p.m.)
>
> Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang.
>
> Bugs: HIVE-9179
>     https://issues.apache.org/jira/browse/HIVE-9179
>
> Repository: hive-git
>
> Description
> ---
> HIVE-9179. Add listener API to JobHandle.
>
> Diffs
> ---
>   spark-client/pom.xml 77016df61a0bcbd94058bcbd2825c6c210a70e14
>   spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca
>   spark-client/src/main/java/org/apache/hive/spark/client/JobHandle.java e760ce35d92bedf4d301b08ec57d1c2dc37a39f0
>   spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java 1b8feedb0b23aa7897dc6ac37ea5c0209e71d573
>   spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67
>   spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java a30d8cbbaae9d25b1cffdc286b546f549e439545
>   spark-client/src/test/java/org/apache/hive/spark/client/TestJobHandle.java PRE-CREATION
>   spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 795d62c776cec5e9da2a24b7d40bc749a03186ab
>
> Diff: https://reviews.apache.org/r/29954/diff/
>
> Testing
> ---
>
> Thanks,
> Marcelo Vanzin
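The invariant being defended in the thread above, that a listener never misses and never double-sees a Spark job ID, falls out of doing both the record and the notification under the same listeners lock. A simplified sketch (not the actual JobHandleImpl):

```java
import java.util.ArrayList;
import java.util.List;

public class JobHandleSketch {
    public interface Listener {
        void onSparkJobStarted(int sparkJobId);
    }

    private final List<Listener> listeners = new ArrayList<>();
    private final List<Integer> sparkJobIds = new ArrayList<>();

    /** New listeners are replayed the IDs recorded so far, under the same
     *  lock that guards notification, so no event is missed. */
    public void addListener(Listener l) {
        synchronized (listeners) {
            listeners.add(l);
            for (int id : sparkJobIds) {
                l.onSparkJobStarted(id);
            }
        }
    }

    /** Record and notify atomically. If the add happened before taking the
     *  lock, a listener registered in between would see the ID twice (once
     *  via replay, once here); with the statements inverted, it could be
     *  notified before the ID is visible via the handle's state. */
    public void onSparkJobStarted(int sparkJobId) {
        synchronized (listeners) {
            sparkJobIds.add(sparkJobId);
            for (Listener l : listeners) {
                l.onSparkJobStarted(sparkJobId);
            }
        }
    }
}
```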
[jira] [Commented] (HIVE-9370) Enable Hive on Spark for BigBench and run Query 8, the test failed [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14280595#comment-14280595 ] Marcelo Vanzin commented on HIVE-9370: -- For the curious: SPARK-1021. HIVE-9179 should allow the client to be smarter about when to time out things. Enable Hive on Spark for BigBench and run Query 8, the test failed [Spark Branch] - Key: HIVE-9370 URL: https://issues.apache.org/jira/browse/HIVE-9370 Project: Hive Issue Type: Sub-task Components: Spark Reporter: yuyun.chen enable hive on spark and run BigBench Query 8 then got the following exception: 2015-01-14 11:43:46,057 INFO [main]: impl.RemoteSparkJobStatus (RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted after 30s. Aborting it. 2015-01-14 11:43:46,061 INFO [main]: impl.RemoteSparkJobStatus (RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted after 30s. Aborting it. 2015-01-14 11:43:46,061 ERROR [main]: status.SparkJobMonitor (SessionState.java:printError(839)) - Status: Failed 2015-01-14 11:43:46,062 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=SparkRunJob start=1421206996052 end=1421207026062 duration=30010 from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) - 15/01/14 11:43:46 INFO RemoteDriver: Failed to run job 0a9a7782-0e0b-4561-8468-959a6d8df0a3 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) - java.lang.InterruptedException 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at java.lang.Object.wait(Native Method) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at java.lang.Object.wait(Object.java:503) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:514) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.SparkContext.runJob(SparkContext.scala:1282) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.SparkContext.runJob(SparkContext.scala:1300) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.SparkContext.runJob(SparkContext.scala:1314) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.rdd.RDD.collect(RDD.scala:780) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:262) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.RangePartitioner.init(Partitioner.scala:124) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:63) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:894) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:864) 2015-01-14 
11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.hadoop.hive.ql.exec.spark.SortByShuffler.shuffle(SortByShuffler.java:48) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.hadoop.hive.ql.exec.spark.ShuffleTran.transform(ShuffleTran.java:45) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436
Re: Review Request 29954: HIVE-9179. Add listener API to JobHandle.
On Jan. 16, 2015, 7:14 p.m., Xuefu Zhang wrote: One additional question for my understanding: Originally Hive has to poll to get the job ID after submitting a spark job, in RemoteSparkJobStatus.getSparkJobInfo(). With this patch, do we still need to do this? Yeah, that's still needed. I thought about adding an `onSparkJobStarted` callback or something. If there's interest in that, I can add it; it should be easy. On Jan. 16, 2015, 7:14 p.m., Xuefu Zhang wrote: spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java, line 442 https://reviews.apache.org/r/29954/diff/1/?file=823288#file823288line442 This method, together with other existing handle() methods, is invoked using reflection, which makes the code hard to understand. I'm wondering if this can be improved. The alternative is having cascading `if..else if..else` blocks with a bunch of `instanceof` checks, as was done in the akka-based code before. I think that's much uglier and harder to read. - Marcelo --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/#review68430 --- On Jan. 16, 2015, 1:05 a.m., Marcelo Vanzin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/ --- (Updated Jan. 16, 2015, 1:05 a.m.) Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9179 https://issues.apache.org/jira/browse/HIVE-9179 Repository: hive-git Description --- HIVE-9179. Add listener API to JobHandle. 
Diffs - spark-client/pom.xml 77016df61a0bcbd94058bcbd2825c6c210a70e14 spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca spark-client/src/main/java/org/apache/hive/spark/client/JobHandle.java e760ce35d92bedf4d301b08ec57d1c2dc37a39f0 spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java 1b8feedb0b23aa7897dc6ac37ea5c0209e71d573 spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java a30d8cbbaae9d25b1cffdc286b546f549e439545 spark-client/src/test/java/org/apache/hive/spark/client/TestJobHandle.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 795d62c776cec5e9da2a24b7d40bc749a03186ab Diff: https://reviews.apache.org/r/29954/diff/ Testing --- Thanks, Marcelo Vanzin
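The reflection-based dispatch asked about above can be illustrated with a stripped-down sketch. This is hypothetical code, not the actual RemoteDriver/SparkClientImpl implementation: each message type gets its own handle(...) overload, and the dispatcher selects the overload from the message's runtime class instead of walking a chain of instanceof checks.

```java
import java.lang.reflect.Method;

// Hypothetical sketch of per-message-type dispatch via reflection.
class Dispatcher {
    static class JobStarted {
        final String id;
        JobStarted(String id) { this.id = id; }
    }
    static class JobResult {
        final boolean ok;
        JobResult(boolean ok) { this.ok = ok; }
    }

    String last;  // records the most recently handled message, for illustration

    public void handle(JobStarted msg) { last = "started:" + msg.id; }
    public void handle(JobResult msg)  { last = "result:" + msg.ok; }

    // Looks up the handle() overload whose parameter type matches the
    // message's runtime class, then invokes it.
    public void dispatch(Object msg) throws Exception {
        Method m = getClass().getDeclaredMethod("handle", msg.getClass());
        m.invoke(this, msg);
    }
}
```

Adding a new message type then means adding one overload, with no change to the dispatch code; the trade-off is that the link between message and handler is invisible to the compiler, which is the readability concern raised in the review.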
[jira] [Updated] (HIVE-9179) Add listeners on JobHandle so job status change can be notified to the client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9179: - Attachment: HIVE-9179.2-spark.patch Add listeners on JobHandle so job status change can be notified to the client [Spark Branch] Key: HIVE-9179 URL: https://issues.apache.org/jira/browse/HIVE-9179 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Attachments: HIVE-9179.1-spark.patch, HIVE-9179.2-spark.patch Based on discussion in HIVE-8972, it seems nice to add listeners on a job handle such that state changes of a submitted job can be notified instead of the current approach of client polling for such changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29954: HIVE-9179. Add listener API to JobHandle.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/ --- (Updated Jan. 16, 2015, 9:22 p.m.) Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9179 https://issues.apache.org/jira/browse/HIVE-9179 Repository: hive-git Description --- HIVE-9179. Add listener API to JobHandle. Diffs (updated) - spark-client/pom.xml 77016df61a0bcbd94058bcbd2825c6c210a70e14 spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca spark-client/src/main/java/org/apache/hive/spark/client/JobHandle.java e760ce35d92bedf4d301b08ec57d1c2dc37a39f0 spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java 1b8feedb0b23aa7897dc6ac37ea5c0209e71d573 spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java a30d8cbbaae9d25b1cffdc286b546f549e439545 spark-client/src/test/java/org/apache/hive/spark/client/TestJobHandle.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 795d62c776cec5e9da2a24b7d40bc749a03186ab Diff: https://reviews.apache.org/r/29954/diff/ Testing --- Thanks, Marcelo Vanzin
Re: Review Request 29954: HIVE-9179. Add listener API to JobHandle.
On Jan. 16, 2015, 10:35 p.m., Xuefu Zhang wrote: spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java, line 179 https://reviews.apache.org/r/29954/diff/1-2/?file=823286#file823286line179 Here sparkJobIds.add() is in the synchronized block. However, we have code accessing the same variable (sparkJobIds) such as in the RemoteSparkJobStatus class. Does that also need protection? No, we don't. The job id list itself is thread-safe. The synchronization happens here so that we notify all listeners of everything. We don't want a listener being registered concurrently with a new spark job arriving to miss that event. (That reminds me that I probably should switch the order of events around if a listener is added after the handle is in a final state. Stay tuned.) - Marcelo --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/#review68492 --- On Jan. 16, 2015, 9:22 p.m., Marcelo Vanzin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/ --- (Updated Jan. 16, 2015, 9:22 p.m.) Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9179 https://issues.apache.org/jira/browse/HIVE-9179 Repository: hive-git Description --- HIVE-9179. Add listener API to JobHandle. 
Diffs - spark-client/pom.xml 77016df61a0bcbd94058bcbd2825c6c210a70e14 spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca spark-client/src/main/java/org/apache/hive/spark/client/JobHandle.java e760ce35d92bedf4d301b08ec57d1c2dc37a39f0 spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java 1b8feedb0b23aa7897dc6ac37ea5c0209e71d573 spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java a30d8cbbaae9d25b1cffdc286b546f549e439545 spark-client/src/test/java/org/apache/hive/spark/client/TestJobHandle.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 795d62c776cec5e9da2a24b7d40bc749a03186ab Diff: https://reviews.apache.org/r/29954/diff/ Testing --- Thanks, Marcelo Vanzin
Re: Review Request 29954: HIVE-9179. Add listener API to JobHandle.
On Jan. 16, 2015, 10:35 p.m., Xuefu Zhang wrote: spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java, line 179 https://reviews.apache.org/r/29954/diff/1-2/?file=823286#file823286line179 Here sparkJobIds.add() is in the synchronized block. However, we have code accessing the same variable (sparkJobIds) such as in the RemoteSparkJobStatus class. Does that also need protection? Marcelo Vanzin wrote: No, we don't. The job id list itself is thread-safe. The synchronization happens here so that we notify all listeners of everything. We don't want a listener being registered concurrently with a new spark job arriving to miss that event. (That reminds me that I probably should switch the order of events around if a listener is added after the handle is in a final state. Stay tuned.) Xuefu Zhang wrote: In that case, can we move sparkJobIds.add() outside the sync block? I don't think that works well. That can cause two different conditions depending on what "outside" means: - if you do it before the synchronized block, the listener may be notified twice of the same Spark job - if you do it after the synchronized block, the listener will be called with a Spark job that is not yet listed in `handle.getSparkJobIds()`. Since I don't believe this will cause any performance issue at all, I'd rather keep the behavior consistent. - Marcelo --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/#review68492 --- On Jan. 16, 2015, 9:22 p.m., Marcelo Vanzin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/ --- (Updated Jan. 16, 2015, 9:22 p.m.) Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9179 https://issues.apache.org/jira/browse/HIVE-9179 Repository: hive-git Description --- HIVE-9179. Add listener API to JobHandle. 
Diffs - spark-client/pom.xml 77016df61a0bcbd94058bcbd2825c6c210a70e14 spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca spark-client/src/main/java/org/apache/hive/spark/client/JobHandle.java e760ce35d92bedf4d301b08ec57d1c2dc37a39f0 spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java 1b8feedb0b23aa7897dc6ac37ea5c0209e71d573 spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java a30d8cbbaae9d25b1cffdc286b546f549e439545 spark-client/src/test/java/org/apache/hive/spark/client/TestJobHandle.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 795d62c776cec5e9da2a24b7d40bc749a03186ab Diff: https://reviews.apache.org/r/29954/diff/ Testing --- Thanks, Marcelo Vanzin
Re: Review Request 29954: HIVE-9179. Add listener API to JobHandle.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/ --- (Updated Jan. 16, 2015, 11:24 p.m.) Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9179 https://issues.apache.org/jira/browse/HIVE-9179 Repository: hive-git Description --- HIVE-9179. Add listener API to JobHandle. Diffs (updated) - spark-client/pom.xml 77016df61a0bcbd94058bcbd2825c6c210a70e14 spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca spark-client/src/main/java/org/apache/hive/spark/client/JobHandle.java e760ce35d92bedf4d301b08ec57d1c2dc37a39f0 spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java 1b8feedb0b23aa7897dc6ac37ea5c0209e71d573 spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java a30d8cbbaae9d25b1cffdc286b546f549e439545 spark-client/src/test/java/org/apache/hive/spark/client/TestJobHandle.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 795d62c776cec5e9da2a24b7d40bc749a03186ab Diff: https://reviews.apache.org/r/29954/diff/ Testing --- Thanks, Marcelo Vanzin
[jira] [Updated] (HIVE-9179) Add listeners on JobHandle so job status change can be notified to the client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9179: - Attachment: HIVE-9179.3-spark.patch Add listeners on JobHandle so job status change can be notified to the client [Spark Branch] Key: HIVE-9179 URL: https://issues.apache.org/jira/browse/HIVE-9179 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Attachments: HIVE-9179.1-spark.patch, HIVE-9179.2-spark.patch, HIVE-9179.3-spark.patch Based on discussion in HIVE-8972, it seems nice to add listeners on a job handle such that state changes of a submitted job can be notified instead of the current approach of client polling for such changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29954: HIVE-9179. Add listener API to JobHandle.
On Jan. 17, 2015, 12:19 a.m., Xuefu Zhang wrote: spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java, line 179 https://reviews.apache.org/r/29954/diff/1-2/?file=823286#file823286line179 Sorry I didn't get it, but why? Clarity but not perf is my concern. Here we are notifying listeners with a new Spark job ID, which is done in the for loop, which is synchronized. This means no listener may be added or removed from the listeners. On the other hand, sparkJobIds.add(sparkJobId) seems irrelevant to any changes to listeners, unless I missed anything. I don't understand why either of the two cases might happen as you suggested. Threads: T1 updating the job handle, T2 adding a listener Case 1: Statement 1 (S1): sparkJobIds.add(sparkJobId); Statement 2 (S2): synchronized (listeners) { /* call onSparkJobStarted(newSparkJobId) on every listener */ } Timeline: T1: executes S1 T2: calls addListener(), new listener is notified of the sparkJobId added above T1: executes S2. New listener is notified again of new spark job ID. Case 2: Invert S1 and S2. T2: calls addListener() T1: executes S1. Listener is called with the current state of the handle and new Spark job ID. Listener checks `handle.getSparkJobIDs().contains(newSparkJobId)`, check fails. Those seem pretty easy to understand to me. The current code avoids both of them. - Marcelo --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/#review68513 --- On Jan. 16, 2015, 11:24 p.m., Marcelo Vanzin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/ --- (Updated Jan. 16, 2015, 11:24 p.m.) Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9179 https://issues.apache.org/jira/browse/HIVE-9179 Repository: hive-git Description --- HIVE-9179. Add listener API to JobHandle. 
Diffs - spark-client/pom.xml 77016df61a0bcbd94058bcbd2825c6c210a70e14 spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca spark-client/src/main/java/org/apache/hive/spark/client/JobHandle.java e760ce35d92bedf4d301b08ec57d1c2dc37a39f0 spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java 1b8feedb0b23aa7897dc6ac37ea5c0209e71d573 spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java a30d8cbbaae9d25b1cffdc286b546f549e439545 spark-client/src/test/java/org/apache/hive/spark/client/TestJobHandle.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 795d62c776cec5e9da2a24b7d40bc749a03186ab Diff: https://reviews.apache.org/r/29954/diff/ Testing --- Thanks, Marcelo Vanzin
Review Request 29954: HIVE-9179. Add listener API to JobHandle.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/ --- Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9179 https://issues.apache.org/jira/browse/HIVE-9179 Repository: hive-git Description --- HIVE-9179. Add listener API to JobHandle. Diffs - spark-client/pom.xml 77016df61a0bcbd94058bcbd2825c6c210a70e14 spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca spark-client/src/main/java/org/apache/hive/spark/client/JobHandle.java e760ce35d92bedf4d301b08ec57d1c2dc37a39f0 spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java 1b8feedb0b23aa7897dc6ac37ea5c0209e71d573 spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java a30d8cbbaae9d25b1cffdc286b546f549e439545 spark-client/src/test/java/org/apache/hive/spark/client/TestJobHandle.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 795d62c776cec5e9da2a24b7d40bc749a03186ab Diff: https://reviews.apache.org/r/29954/diff/ Testing --- Thanks, Marcelo Vanzin
[jira] [Updated] (HIVE-9179) Add listeners on JobHandle so job status change can be notified to the client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9179: - Status: Patch Available (was: Open) Add listeners on JobHandle so job status change can be notified to the client [Spark Branch] Key: HIVE-9179 URL: https://issues.apache.org/jira/browse/HIVE-9179 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Attachments: HIVE-9179.1-spark.patch Based on discussion in HIVE-8972, it seems nice to add listeners on a job handle such that state changes of a submitted job can be notified instead of the current approach of client polling for such changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9179) Add listeners on JobHandle so job status change can be notified to the client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9179: - Attachment: HIVE-9179.1-spark.patch Add listeners on JobHandle so job status change can be notified to the client [Spark Branch] Key: HIVE-9179 URL: https://issues.apache.org/jira/browse/HIVE-9179 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Attachments: HIVE-9179.1-spark.patch Based on discussion in HIVE-8972, it seems nice to add listeners on a job handle such that state changes of a submitted job can be notified instead of the current approach of client polling for such changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9178: - Attachment: HIVE-9178.2-spark.patch Create a separate API for remote Spark Context RPC other than job submission [Spark Branch] --- Key: HIVE-9178 URL: https://issues.apache.org/jira/browse/HIVE-9178 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch, HIVE-9178.2-spark.patch, HIVE-9178.2-spark.patch Based on discussions in HIVE-8972, it seems to make sense to create a separate API for RPCs, such as addJar and getExecutorCounter. These jobs are different from a query submission in that they don't need to be queued in the backend and can be executed right away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29832: HIVE-9178. Add a synchronous RPC API to the remote Spark context.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29832/ --- (Updated Jan. 14, 2015, 8:45 p.m.) Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9178 https://issues.apache.org/jira/browse/HIVE-9178 Repository: hive-git Description (updated) --- Fix return value of synchronous RPCs. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 5c3ca018bb177ef9fd9fb24b054a9db29274b31e spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 spark-client/src/main/java/org/apache/hive/spark/client/SparkClient.java 5e767ef5eb47e493a332607204f4c522028d7d0e spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java f8b2202a465bb8abe3d2c34e49ade6387482738c spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 795d62c776cec5e9da2a24b7d40bc749a03186ab Diff: https://reviews.apache.org/r/29832/diff/ Testing --- Thanks, Marcelo Vanzin
Re: Review Request 29832: HIVE-9178. Add a synchronous RPC API to the remote Spark context.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29832/ --- (Updated Jan. 14, 2015, 8:47 p.m.) Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9178 https://issues.apache.org/jira/browse/HIVE-9178 Repository: hive-git Description (updated) --- Add a synchronous RPC API to the remote Spark context. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 5c3ca018bb177ef9fd9fb24b054a9db29274b31e spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 spark-client/src/main/java/org/apache/hive/spark/client/SparkClient.java 5e767ef5eb47e493a332607204f4c522028d7d0e spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java f8b2202a465bb8abe3d2c34e49ade6387482738c spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 795d62c776cec5e9da2a24b7d40bc749a03186ab Diff: https://reviews.apache.org/r/29832/diff/ Testing --- Thanks, Marcelo Vanzin
[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276514#comment-14276514 ] Marcelo Vanzin commented on HIVE-9178: -- [~chengxiang li] ah, good catch. This method: {code} private void handle(ChannelHandlerContext ctx, SyncJobRequest msg) throws Exception { {code} should actually return the result of the RPC instead of void. I'll update the patch tomorrow and add a unit test (d'oh). Create a separate API for remote Spark Context RPC other than job submission [Spark Branch] --- Key: HIVE-9178 URL: https://issues.apache.org/jira/browse/HIVE-9178 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch, HIVE-9178.2-spark.patch Based on discussions in HIVE-8972, it seems to make sense to create a separate API for RPCs, such as addJar and getExecutorCounter. These jobs are different from a query submission in that they don't need to be queued in the backend and can be executed right away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
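The bug described in this comment can be shown in a tiny, hypothetical sketch (names invented; this is not the real spark-client code): a handler for a synchronous RPC has to return its result so the dispatch layer can ship the value back to the caller, rather than computing it and returning void.

```java
// Hypothetical illustration of a sync-RPC handler returning a result.
class SyncRpcSketch {
    static class SyncJobRequest {
        final int input;
        SyncJobRequest(int input) { this.input = input; }
    }

    // Buggy shape, as in the comment above: the result is computed but
    // dropped, so the waiting client can never receive it.
    public void handleAndDrop(SyncJobRequest msg) { compute(msg); }

    // Fixed shape: the handler returns the RPC's result so it can be
    // serialized and sent back over the channel.
    public int handle(SyncJobRequest msg) { return compute(msg); }

    private int compute(SyncJobRequest msg) { return msg.input * 2; }
}
```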
Re: Review Request 29832: HIVE-9178. Add a synchronous RPC API to the remote Spark context.
On Jan. 13, 2015, 6:47 a.m., chengxiang li wrote: spark-client/src/main/java/org/apache/hive/spark/client/SparkClient.java, line 55 https://reviews.apache.org/r/29832/diff/1/?file=818434#file818434line55 At the API level, it's still an asynchronous RPC API, given the use case of this API described in the javadoc; do you think it would be cleaner to supply a synchronous API like `<T> T run(Job<T> job)`? No. With a client-side synchronous API, it's awkward to specify things like timeouts - you either need explicit parameters which are not really part of the RPC, or extra configuration. Here, you just say `client.run().get(someTimeout)` if you want the call to be synchronous on the client side. - Marcelo --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29832/#review67813 --- On Jan. 13, 2015, 12:31 a.m., Marcelo Vanzin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29832/ --- (Updated Jan. 13, 2015, 12:31 a.m.) Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9178 https://issues.apache.org/jira/browse/HIVE-9178 Repository: hive-git Description --- HIVE-9178. Add a synchronous RPC API to the remote Spark context. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 5c3ca018bb177ef9fd9fb24b054a9db29274b31e spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 spark-client/src/main/java/org/apache/hive/spark/client/SparkClient.java 5e767ef5eb47e493a332607204f4c522028d7d0e spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java f8b2202a465bb8abe3d2c34e49ade6387482738c Diff: https://reviews.apache.org/r/29832/diff/ Testing --- Thanks, Marcelo Vanzin
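The design point argued above - keep the API asynchronous and let the caller block on a Future with its own timeout - can be sketched as follows. This is a hypothetical illustration (SketchClient and its methods are invented, not the real SparkClient API):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical client whose API stays asynchronous: the caller receives a
// Future and decides at the call site whether (and how long) to block.
class SketchClient {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    public <T> Future<T> run(Callable<T> job) {
        return executor.submit(job);
    }

    public void stop() {
        executor.shutdown();
    }
}
```

A caller wanting synchronous behavior writes something like `client.run(job).get(30, TimeUnit.SECONDS)`: the timeout lives at the call site instead of being an extra parameter or configuration knob in the RPC API itself, which is the point made in the reply.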
[jira] [Commented] (HIVE-9360) TestSparkClient throws Timeoutexception
[ https://issues.apache.org/jira/browse/HIVE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275783#comment-14275783 ] Marcelo Vanzin commented on HIVE-9360: -- Yeah, I dislike timeouts in tests but in this case it's kinda hard to avoid them. Feel free to increase them if that makes things better. TestSparkClient throws Timeoutexception --- Key: HIVE-9360 URL: https://issues.apache.org/jira/browse/HIVE-9360 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.15.0 Reporter: Szehon Ho Attachments: HIVE-9360.patch TestSparkClient has been throwing TimeoutException in some test runs. The exception looks like: {noformat} java.util.concurrent.TimeoutException: null at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35) at org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130) at org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224) at org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126) {noformat} but for each of the tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275735#comment-14275735 ] Marcelo Vanzin commented on HIVE-9178: -- Should I worry about those test failures? I ran a subset of qtests locally and they passed. Create a separate API for remote Spark Context RPC other than job submission [Spark Branch] --- Key: HIVE-9178 URL: https://issues.apache.org/jira/browse/HIVE-9178 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Attachments: HIVE-9178.1-spark.patch Based on discussions in HIVE-8972, it seems to make sense to create a separate API for RPCs, such as addJar and getExecutorCounter. These jobs are different from a query submission in that they don't need to be queued in the backend and can be executed right away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276297#comment-14276297 ] Marcelo Vanzin commented on HIVE-9178: -- I'll take a closer look at the code later, but I wonder if this is just a side effect of the test machine being slower for some reason (e.g. HIVE-9360). The new code shouldn't really slow down anything... Create a separate API for remote Spark Context RPC other than job submission [Spark Branch] --- Key: HIVE-9178 URL: https://issues.apache.org/jira/browse/HIVE-9178 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch Based on discussions in HIVE-8972, it seems to make sense to create a separate API for RPCs, such as addJar and getExecutorCounter. These jobs are different from a query submission in that they don't need to be queued in the backend and can be executed right away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9364) Document non-Public API's which Spark SQL uses
[ https://issues.apache.org/jira/browse/HIVE-9364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276100#comment-14276100 ] Marcelo Vanzin commented on HIVE-9364: -- Here are three APIs that changed between 0.13 and trunk which are used by SparkSQL: https://github.com/apache/hive/commit/cf7028289dbe39cdbda208286aa4b53086958a8e
{code}
// old
public boolean dropIndex(String baseTableName, String index_name,
    boolean deleteData) throws HiveException {
// new
public boolean dropIndex(String baseTableName, String index_name,
    boolean throwException, boolean deleteData) throws HiveException {
{code}
https://github.com/apache/hive/commit/29b88a117ca3900a83eaab44c74b27164493fc56
{code}
// old
public Path getExternalTmpPath(URI extURI) {
// new
public Path getExternalTmpPath(Path path) {
{code}
https://github.com/apache/hive/commit/9243cbbfe2f22b7ab0453d8766c6d5ae333be368
{code}
// old
public ArrayList<LinkedHashMap<String, String>> loadDynamicPartitions(Path loadPath,
    String tableName, Map<String, String> partSpec, boolean replace, int numDP,
    boolean holdDDLTime, boolean listBucketingEnabled, boolean isAcid) throws HiveException {
// new
public Map<Map<String, String>, Partition> loadDynamicPartitions(Path loadPath,
    String tableName, Map<String, String> partSpec, boolean replace, int numDP,
    boolean holdDDLTime, boolean listBucketingEnabled, boolean isAcid) throws HiveException {
{code}
{code}
// old
public void loadPartition(Path loadPath, String tableName, Map<String, String> partSpec,
    boolean replace, boolean holdDDLTime, boolean inheritTableSpecs,
    boolean isSkewedStoreAsSubdir, boolean isSrcLocal, boolean isAcid) throws HiveException {
// new
public Partition loadPartition(Path loadPath, Table tbl, Map<String, String> partSpec,
    boolean replace, boolean holdDDLTime, boolean inheritTableSpecs,
    boolean isSkewedStoreAsSubdir, boolean isSrcLocal, boolean isAcid) throws HiveException {
{code}
(For loadDynamicPartitions, there's an extra change,
https://github.com/apache/hive/commit/2568066d, which added the isAcid argument.) Document non-Public API's which Spark SQL uses -- Key: HIVE-9364 URL: https://issues.apache.org/jira/browse/HIVE-9364 Project: Hive Issue Type: Sub-task Components: API Reporter: Brock Noland Fix For: 1.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned HIVE-9178: Assignee: Marcelo Vanzin Create a separate API for remote Spark Context RPC other than job submission [Spark Branch] --- Key: HIVE-9178 URL: https://issues.apache.org/jira/browse/HIVE-9178 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Based on discussions in HIVE-8972, it seems to make sense to create a separate API for RPCs, such as addJar and getExecutorCounter. These jobs are different from a query submission in that they don't need to be queued in the backend and can be executed right away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9179) Add listeners on JobHandle so job status change can be notified to the client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned HIVE-9179: Assignee: Marcelo Vanzin Add listeners on JobHandle so job status change can be notified to the client [Spark Branch] Key: HIVE-9179 URL: https://issues.apache.org/jira/browse/HIVE-9179 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Based on discussion in HIVE-8972, it seems nice to add listeners on a job handle such that state changes of a submitted job can be notified, instead of the current approach of the client polling for such changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 29832: HIVE-9178. Add a synchronous RPC API to the remote Spark context.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29832/ --- Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9178 https://issues.apache.org/jira/browse/HIVE-9178 Repository: hive-git Description --- HIVE-9178. Add a synchronous RPC API to the remote Spark context. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 5c3ca018bb177ef9fd9fb24b054a9db29274b31e spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 spark-client/src/main/java/org/apache/hive/spark/client/SparkClient.java 5e767ef5eb47e493a332607204f4c522028d7d0e spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java f8b2202a465bb8abe3d2c34e49ade6387482738c Diff: https://reviews.apache.org/r/29832/diff/ Testing --- Thanks, Marcelo Vanzin
[jira] [Updated] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9178: - Status: Patch Available (was: Open) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch] --- Key: HIVE-9178 URL: https://issues.apache.org/jira/browse/HIVE-9178 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Attachments: HIVE-9178.1-spark.patch Based on discussions in HIVE-8972, it seems to make sense to create a separate API for RPCs, such as addJar and getExecutorCounter. These jobs are different from a query submission in that they don't need to be queued in the backend and can be executed right away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9178: - Attachment: HIVE-9178.1-spark.patch Create a separate API for remote Spark Context RPC other than job submission [Spark Branch] --- Key: HIVE-9178 URL: https://issues.apache.org/jira/browse/HIVE-9178 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Attachments: HIVE-9178.1-spark.patch Based on discussions in HIVE-8972, it seems to make sense to create a separate API for RPCs, such as addJar and getExecutorCounter. These jobs are different from a query submission in that they don't need to be queued in the backend and can be executed right away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29145: HIVE-9094 TimeoutException when trying get executor count from RSC [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/#review65348 --- Ship it! +1 to Xuefu's comments. The config name also looks very generic, since it's only applied to a couple of jobs submitted to the client. But I don't have a good suggestion here. - Marcelo Vanzin On Dec. 17, 2014, 6:28 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/ --- (Updated Dec. 17, 2014, 6:28 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-9094 https://issues.apache.org/jira/browse/HIVE-9094 Repository: hive-git Description --- RemoteHiveSparkClient::getExecutorCount timeout after 5s as Spark cluster has not launched yet 1. set the timeout value configurable. 2. set default timeout value 60s. 3. enable timeout for get spark job info and get spark stage info. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 22f052a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 5d6a02c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java e1946d5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 6217de4 Diff: https://reviews.apache.org/r/29145/diff/ Testing --- Thanks, chengxiang li
[jira] [Commented] (HIVE-8972) Implement more fine-grained remote client-level events [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250353#comment-14250353 ] Marcelo Vanzin commented on HIVE-8972: -- The patch looks ok to me. I thought about creating a separate API for these kinds of RPCs - these wouldn't be queued in the backend but executed right away. My only concern is that this could be abused (e.g. a caller using these calls to run a Spark job before the queued ones), but perhaps that's an app-level concern and the client shouldn't care if someone uses it that way. The netty framework we're using now could also make some things easier, like adding listeners to JobHandle and reporting job state changes to the client side when they happen (instead of the current poll-like approach). We could also add client-level listeners so that interesting events are reported (e.g. spark context up and things like that). If there's interest in these things we could create a new task and I'll try to find some time to work on it. Implement more fine-grained remote client-level events [Spark Branch] - Key: HIVE-8972 URL: https://issues.apache.org/jira/browse/HIVE-8972 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8972.1-spark.patch, HIVE-8972.2-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.4-spark.patch, HIVE-8972.5-spark.patch Follow up task of HIVE-8956. Fine-grained events are useful for better job monitoring and failure handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
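The listener idea floated in that comment can be sketched in a few lines. This is a hedged illustration only - {{Listener}} and {{Handle}} are invented names, not the eventual spark-client API - contrasting push-style notification with the poll-like approach the comment describes:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ListenerSketch {
    // Hypothetical callback interface: invoked whenever the job's state changes.
    interface Listener { void onStateChanged(String newState); }

    static class Handle {
        private final List<Listener> listeners = new CopyOnWriteArrayList<>();
        private volatile String state = "QUEUED";

        // Replays the current state on registration so a late listener
        // doesn't miss transitions that already happened.
        void addListener(Listener l) {
            l.onStateChanged(state);
            listeners.add(l);
        }

        // Would be invoked by the RPC dispatcher when the remote driver
        // reports a change; clients no longer need to poll getState().
        void setState(String newState) {
            state = newState;
            for (Listener l : listeners) l.onStateChanged(newState);
        }

        String getState() { return state; }
    }
}
```

The replay-on-register step matters: without it, a client that attaches a listener after the job already started would never learn the current state.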
[jira] [Commented] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248560#comment-14248560 ] Marcelo Vanzin commented on HIVE-9094: -- 60s sounds reasonable. This initial timeout will always be hard to figure out, since launching the app will depend a lot on the cluster being used and a bunch of other things... :-/ Perhaps we could add some kind of context status event that the client side can listen to, and the driver side can periodically send the update... but that would probably still need some kind of timeout. Anyway, raising the timeout sounds fine for now. TimeoutException when trying get executor count from RSC [Spark Branch] --- Key: HIVE-9094 URL: https://issues.apache.org/jira/browse/HIVE-9094 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li In http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport, join25.q failed because: {code} 2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at 
org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837) at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234) at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at 
junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4
[jira] [Commented] (HIVE-9017) Clean up temp files of RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244944#comment-14244944 ] Marcelo Vanzin commented on HIVE-9017: -- These files are created by Spark when downloading resources for the app (e.g. application jars). In standalone mode, by default, these files will end up in /tmp (java.io.tmpdir). The problem is that the app doesn't clean up these files; in fact, it can't, because they are supposed to be shared in case multiple executors run on the same host - so one executor cannot unilaterally decide to delete them. (That's not entirely true; I guess it could, but then it would cause other executors to re-download the file when needed, so more overhead.) This is not a problem in Yarn mode, since the temp dir is under a Yarn-managed directory that is deleted when the app shuts down. So, while I think of a clean way to fix this in Spark, the following can be done on the Hive side: - create an app-specific temp directory before launching the Spark app - set {{spark.local.dir}} to that location - delete the directory when the client shuts down Clean up temp files of RSC [Spark Branch] - Key: HIVE-9017 URL: https://issues.apache.org/jira/browse/HIVE-9017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Currently RSC will leave a lot of temp files in {{/tmp}}, including {{*_lock}}, {{*_cache}}, {{spark-submit.*.properties}}, etc. We should clean up these files or it will exhaust disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
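The three-step workaround proposed in that comment can be sketched roughly as follows. These are illustrative names only - {{createAppLocalDir}} and {{deleteRecursively}} are not actual Hive methods, and in the real client {{spark.local.dir}} would be passed on the spark-submit command line rather than set as a JVM system property:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class LocalDirSketch {
    // Steps 1 and 2: create an app-specific scratch directory and point
    // spark.local.dir at it before launching the Spark app.
    static Path createAppLocalDir() throws IOException {
        Path dir = Files.createTempDirectory("hive-spark-client-");
        // Stand-in for passing --conf spark.local.dir=<dir> to spark-submit.
        System.setProperty("spark.local.dir", dir.toString());
        return dir;
    }

    // Step 3: remove the directory when the client shuts down, e.g. from a
    // shutdown hook. Children are deleted before parents (reverse walk order).
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        }
    }
}
```

Because every file Spark downloads for the app lands under {{spark.local.dir}}, deleting the one directory at shutdown cleans up the {{*_lock}}, {{*_cache}}, and properties files in a single step.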
[jira] [Commented] (HIVE-9017) Clean up temp files of RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244948#comment-14244948 ] Marcelo Vanzin commented on HIVE-9017: -- P.S.: that solution will probably not work very well in real standalone mode, since {{spark.local.dir}} would have to be created / deleted on every node in the cluster, and the client probably doesn't have the means to do that. Clean up temp files of RSC [Spark Branch] - Key: HIVE-9017 URL: https://issues.apache.org/jira/browse/HIVE-9017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Currently RSC will leave a lot of temp files in {{/tmp}}, including {{*_lock}}, {{*_cache}}, {{spark-submit.*.properties}}, etc. We should clean up these files or it will exhaust disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9017) Clean up temp files of RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244951#comment-14244951 ] Marcelo Vanzin commented on HIVE-9017: -- Correct. All files written by spark will end up under that directory (right now they all end up in /tmp since it's not set). Clean up temp files of RSC [Spark Branch] - Key: HIVE-9017 URL: https://issues.apache.org/jira/browse/HIVE-9017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Currently RSC will leave a lot of temp files in {{/tmp}}, including {{*_lock}}, {{*_cache}}, {{spark-submit.*.properties}}, etc. We should clean up these files or it will exhaust disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9017) Clean up temp files of RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244990#comment-14244990 ] Marcelo Vanzin commented on HIVE-9017: -- You can't do that, because these files are meant to be referenced by multiple processes. So if one of them just deletes the files, you kinda break the protocol. Clean up temp files of RSC [Spark Branch] - Key: HIVE-9017 URL: https://issues.apache.org/jira/browse/HIVE-9017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Currently RSC will leave a lot of temp files in {{/tmp}}, including {{*_lock}}, {{*_cache}}, {{spark-submit.*.properties}}, etc. We should clean up these files or it will exhaust disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9017) Clean up temp files of RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244992#comment-14244992 ] Marcelo Vanzin commented on HIVE-9017: -- See https://github.com/apache/spark/commit/7aacb7bf for more details of original Spark change. Clean up temp files of RSC [Spark Branch] - Key: HIVE-9017 URL: https://issues.apache.org/jira/browse/HIVE-9017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Currently RSC will leave a lot of temp files in {{/tmp}}, including {{*_lock}}, {{*_cache}}, {{spark-submit.*.properties}}, etc. We should clean up these files or it will exhaust disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9017) Clean up temp files of RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14245029#comment-14245029 ] Marcelo Vanzin commented on HIVE-9017: -- I filed https://issues.apache.org/jira/browse/SPARK-4834 to fix this in Spark. bq. To clarify, when Spark launches multiple executors in one host for one application, these executors share the same JVM, right? No, each executor is its own process. Clean up temp files of RSC [Spark Branch] - Key: HIVE-9017 URL: https://issues.apache.org/jira/browse/HIVE-9017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Currently RSC will leave a lot of temp files in {{/tmp}}, including {{*_lock}}, {{*_cache}}, {{spark-submit.*.properties}}, etc. We should clean up these files or it will exhaust disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9017) Clean up temp files of RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14245051#comment-14245051 ] Marcelo Vanzin commented on HIVE-9017: -- In Spark-speak, an executor is the JVM that executes tasks. There's no name by which the individual threads an executor has are referred to; I guess you could say task runner, but it's rare to see someone even talk about those. As for whether you can run more than one executor per host, the answer is yes, but it's a little more complicated than that. In Yarn mode, it's definitely possible, but then Yarn doesn't suffer from this issue. In standalone mode, it's unusual. You can achieve that in two ways: - run with a local-cluster master, which HoS uses for testing. But people shouldn't use that in production. - run multiple Worker daemons on the same host; I don't know if that's possible, but right now Spark standalone has a 1:1 relationship between Worker daemons and executors. But, long story short, you can't delete these files when the executor goes down. That could break Yarn mode, and even in standalone mode that is kinda sketchy (let's say the executor dies and is restarted; having these files around could avoid having to re-download a large jar from the driver node). Clean up temp files of RSC [Spark Branch] - Key: HIVE-9017 URL: https://issues.apache.org/jira/browse/HIVE-9017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Currently RSC will leave a lot of temp files in {{/tmp}}, including {{*_lock}}, {{*_cache}}, {{spark-submit.*.properties}}, etc. We should clean up these files or it will exhaust disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9085) Spark Client RPC should have larger default max message size [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243516#comment-14243516 ] Marcelo Vanzin commented on HIVE-9085: -- LGTM (as discussed by e-mail). Spark Client RPC should have larger default max message size [Spark Branch] --- Key: HIVE-9085 URL: https://issues.apache.org/jira/browse/HIVE-9085 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9085-spark.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9085) Spark Client RPC should have larger default max message size [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243535#comment-14243535 ] Marcelo Vanzin commented on HIVE-9085: -- When an exception is thrown in the write path, it's not safe to use the RPC channel anymore. Partial data may have been written to the socket and may cause both endpoints to get out of sync. Right now the approach the code has taken is to close the socket on any error. If, in the long term, we'd prefer a more resilient approach, more modifications will have to be made. Spark Client RPC should have larger default max message size [Spark Branch] --- Key: HIVE-9085 URL: https://issues.apache.org/jira/browse/HIVE-9085 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9085-spark.1.patch, HIVE-9085-spark.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
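To see why a partial write poisons the connection, consider a toy length-prefixed framing (plain byte buffers here, not the actual Kryo/Netty codec that spark-client uses): once only part of a frame reaches the socket, the reader interprets leftover bytes as the next frame's length header, and both endpoints are out of sync.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FrameSketch {
    // Toy length-prefixed framing: a 4-byte big-endian length, then the payload.
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // What a reader does at a frame boundary: interpret the next 4 bytes as a
    // length. After a partial write, those bytes are garbage.
    static int nextFrameLength(byte[] stream, int offset) {
        return ByteBuffer.wrap(stream, offset, 4).getInt();
    }

    public static void main(String[] args) {
        byte[] ok = frame("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(nextFrameLength(ok, 0)); // 5: reader is in sync

        // Simulate a write failing after flushing only 2 bytes of the length
        // prefix, followed by a naive retry that sends a fresh "ok" frame.
        byte[] partial = Arrays.copyOf(ok, 2);
        byte[] retry = frame("ok".getBytes(StandardCharsets.UTF_8));
        byte[] wire = ByteBuffer.allocate(partial.length + retry.length)
                .put(partial).put(retry).array();
        // The reader now decodes a bogus length (0, not the expected 2), which
        // is why the code simply closes the socket on any write-path error.
        System.out.println(nextFrameLength(wire, 0));
    }
}
```

Resynchronizing after such corruption would require something like per-frame checksums or a handshake protocol, which is the "more resilient approach" the comment defers.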
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Attachment: HIVE-9036.5-spark.patch Replace akka for remote spark client RPC [Spark Branch] --- Key: HIVE-9036 URL: https://issues.apache.org/jira/browse/HIVE-9036 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-9036.1-spark.patch, HIVE-9036.2-spark.patch, HIVE-9036.3-spark.patch, HIVE-9036.4-spark.patch, HIVE-9036.5-spark.patch, rsc-problem-1.tar.gz We've had weird issues with akka, especially when something goes wrong and it becomes a little hard to debug. Let's replace it with a simple(r) RPC system built on top of netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28779/ --- (Updated Dec. 9, 2014, 6:49 p.m.) Review request for hive, Brock Noland, chengxiang li, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-9036 https://issues.apache.org/jira/browse/HIVE-9036 Repository: hive-git Description --- This patch replaces akka with a simple netty-based RPC layer. It doesn't add any features on top of the existing spark-client API, which is unchanged (except for the need to add empty constructors in some places). With the new backend we can think about adding some nice features such as future listeners (which were awkward with akka because of Scala), but those are left for a different time. The full change set, with more detailed descriptions, can be seen here: https://github.com/vanzin/hive/commits/spark-client-netty Diffs (updated) - pom.xml 630b10ce35032e4b2dee50ef3dfe5feb58223b78 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java PRE-CREATION spark-client/pom.xml PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/ClientUtils.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/Protocol.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/InputMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/Metrics.java PRE-CREATION 
spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleReadMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleWriteMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/KryoMessageCodec.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/README.md PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcException.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounterGroup.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounters.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestKryoMessageCodec.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java PRE-CREATION Diff: https://reviews.apache.org/r/28779/diff/ Testing --- spark-client unit tests, plus some qtests. Thanks, Marcelo Vanzin
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
On Dec. 9, 2014, 7:05 p.m., Xuefu Zhang wrote: pom.xml, line 152 https://reviews.apache.org/r/28779/diff/7/?file=786238#file786238line152 Is there a reason that we cannot keep 3.7.0? Upgrading a dep version usually gives some headaches. This version is not used anywhere in the Hive build. In fact, there is no version 3.7.0.Final of io.netty (that's for the old org.jboss.netty package). - Marcelo --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28779/#review64411 --- On Dec. 9, 2014, 6:49 p.m., Marcelo Vanzin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28779/ --- (Updated Dec. 9, 2014, 6:49 p.m.) Review request for hive, Brock Noland, chengxiang li, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-9036 https://issues.apache.org/jira/browse/HIVE-9036 Repository: hive-git Description --- This patch replaces akka with a simple netty-based RPC layer. It doesn't add any features on top of the existing spark-client API, which is unchanged (except for the need to add empty constructors in some places). With the new backend we can think about adding some nice features such as future listeners (which were awkward with akka because of Scala), but those are left for a different time. 
The full change set, with more detailed descriptions, can be seen here: https://github.com/vanzin/hive/commits/spark-client-netty Diffs - pom.xml 630b10ce35032e4b2dee50ef3dfe5feb58223b78 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java PRE-CREATION spark-client/pom.xml PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/ClientUtils.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/Protocol.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/InputMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/Metrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleReadMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleWriteMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/KryoMessageCodec.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/README.md PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcException.java PRE-CREATION 
spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounterGroup.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounters.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestKryoMessageCodec.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java PRE-CREATION Diff: https://reviews.apache.org/r/28779/diff/ Testing --- spark-client unit tests, plus some qtests. Thanks, Marcelo Vanzin
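[Editor's note: the patch description above mentions "future listeners" as a possible follow-up feature. A minimal sketch of what such an API could look like, built on `CompletableFuture` — the `JobHandle` interface here is hypothetical and is not the actual spark-client API:]

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class ListenerSketch {
    // Hypothetical job handle exposing a completion callback, in the
    // spirit of the "future listeners" idea from the description.
    interface JobHandle<T> {
        void addListener(Consumer<T> onComplete);
    }

    public static void main(String[] args) {
        CompletableFuture<String> result = new CompletableFuture<>();

        // Adapt the future to the listener interface: each registered
        // listener fires when the job's result becomes available.
        JobHandle<String> handle = onComplete -> result.thenAccept(onComplete);
        handle.addListener(r -> System.out.println("job finished: " + r));

        // Simulate the remote driver completing the job.
        result.complete("42");
    }
}
```

With plain Java futures (rather than Scala's), listeners like this become straightforward to express, which is the awkwardness with akka the description alludes to.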
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Attachment: HIVE-9036.6-spark.patch Replace akka for remote spark client RPC [Spark Branch] --- Key: HIVE-9036 URL: https://issues.apache.org/jira/browse/HIVE-9036 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-9036.1-spark.patch, HIVE-9036.2-spark.patch, HIVE-9036.3-spark.patch, HIVE-9036.4-spark.patch, HIVE-9036.5-spark.patch, HIVE-9036.6-spark.patch, rsc-problem-1.tar.gz We've had weird issues with akka, especially when something goes wrong and it becomes a little hard to debug. Let's replace it with a simple(r) RPC system built on top of netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28779/ --- (Updated Dec. 9, 2014, 9:17 p.m.) Review request for hive, Brock Noland, chengxiang li, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-9036 https://issues.apache.org/jira/browse/HIVE-9036 Repository: hive-git Description --- This patch replaces akka with a simple netty-based RPC layer. It doesn't add any features on top of the existing spark-client API, which is unchanged (except for the need to add empty constructors in some places). With the new backend we can think about adding some nice features such as future listeners (which were awkward with akka because of Scala), but those are left for a different time. The full change set, with more detailed descriptions, can be seen here: https://github.com/vanzin/hive/commits/spark-client-netty Diffs (updated) - pom.xml 630b10ce35032e4b2dee50ef3dfe5feb58223b78 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java PRE-CREATION spark-client/pom.xml PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/ClientUtils.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/Protocol.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/InputMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/Metrics.java PRE-CREATION 
spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleReadMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleWriteMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/KryoMessageCodec.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/README.md PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcException.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounterGroup.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounters.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestKryoMessageCodec.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java PRE-CREATION Diff: https://reviews.apache.org/r/28779/diff/ Testing --- spark-client unit tests, plus some qtests. Thanks, Marcelo Vanzin
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
/org/apache/hive/spark/client/rpc/RpcServer.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounterGroup.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounters.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestKryoMessageCodec.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java PRE-CREATION Diff: https://reviews.apache.org/r/28779/diff/ Testing --- spark-client unit tests, plus some qtests. Thanks, Marcelo Vanzin
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Attachment: HIVE-9036.2-spark.patch Replace akka for remote spark client RPC [Spark Branch] --- Key: HIVE-9036 URL: https://issues.apache.org/jira/browse/HIVE-9036 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-9036.1-spark.patch, HIVE-9036.2-spark.patch We've had weird issues with akka, especially when something goes wrong and it becomes a little hard to debug. Let's replace it with a simple(r) RPC system built on top of netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28779/ --- (Updated Dec. 8, 2014, 7:40 p.m.) Review request for hive, Brock Noland, chengxiang li, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-9036 https://issues.apache.org/jira/browse/HIVE-9036 Repository: hive-git Description (updated) --- This patch replaces akka with a simple netty-based RPC layer. It doesn't add any features on top of the existing spark-client API, which is unchanged (except for the need to add empty constructors in some places). With the new backend we can think about adding some nice features such as future listeners (which were awkward with akka because of Scala), but those are left for a different time. The full change set, with more detailed descriptions, can be seen here: https://github.com/vanzin/hive/commits/spark-client-netty Diffs - pom.xml 630b10ce35032e4b2dee50ef3dfe5feb58223b78 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java PRE-CREATION spark-client/pom.xml PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/ClientUtils.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/Protocol.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/InputMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/Metrics.java PRE-CREATION 
spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleReadMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleWriteMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/KryoMessageCodec.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/README.md PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcException.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounterGroup.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounters.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestKryoMessageCodec.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java PRE-CREATION Diff: https://reviews.apache.org/r/28779/diff/ Testing --- spark-client unit tests, plus some qtests. Thanks, Marcelo Vanzin
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Attachment: (was: HIVE-9036.2-spark.patch) Replace akka for remote spark client RPC [Spark Branch] --- Key: HIVE-9036 URL: https://issues.apache.org/jira/browse/HIVE-9036 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-9036.1-spark.patch We've had weird issues with akka, especially when something goes wrong and it becomes a little hard to debug. Let's replace it with a simple(r) RPC system built on top of netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Attachment: HIVE-9036.2-spark.patch Replace akka for remote spark client RPC [Spark Branch] --- Key: HIVE-9036 URL: https://issues.apache.org/jira/browse/HIVE-9036 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-9036.1-spark.patch, HIVE-9036.2-spark.patch We've had weird issues with akka, especially when something goes wrong and it becomes a little hard to debug. Let's replace it with a simple(r) RPC system built on top of netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28779/ --- (Updated Dec. 8, 2014, 7:47 p.m.) Review request for hive, Brock Noland, chengxiang li, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-9036 https://issues.apache.org/jira/browse/HIVE-9036 Repository: hive-git Description --- This patch replaces akka with a simple netty-based RPC layer. It doesn't add any features on top of the existing spark-client API, which is unchanged (except for the need to add empty constructors in some places). With the new backend we can think about adding some nice features such as future listeners (which were awkward with akka because of Scala), but those are left for a different time. The full change set, with more detailed descriptions, can be seen here: https://github.com/vanzin/hive/commits/spark-client-netty Diffs (updated) - pom.xml 630b10ce35032e4b2dee50ef3dfe5feb58223b78 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java PRE-CREATION spark-client/pom.xml PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/ClientUtils.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/Protocol.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/InputMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/Metrics.java PRE-CREATION 
spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleReadMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleWriteMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/KryoMessageCodec.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/README.md PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcException.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounterGroup.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounters.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestKryoMessageCodec.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java PRE-CREATION Diff: https://reviews.apache.org/r/28779/diff/ Testing --- spark-client unit tests, plus some qtests. Thanks, Marcelo Vanzin
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Status: Open (was: Patch Available) Replace akka for remote spark client RPC [Spark Branch] --- Key: HIVE-9036 URL: https://issues.apache.org/jira/browse/HIVE-9036 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-9036.1-spark.patch, HIVE-9036.2-spark.patch We've had weird issues with akka, especially when something goes wrong and it becomes a little hard to debug. Let's replace it with a simple(r) RPC system built on top of netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Status: Patch Available (was: Open) Replace akka for remote spark client RPC [Spark Branch] --- Key: HIVE-9036 URL: https://issues.apache.org/jira/browse/HIVE-9036 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-9036.1-spark.patch, HIVE-9036.2-spark.patch We've had weird issues with akka, especially when something goes wrong and it becomes a little hard to debug. Let's replace it with a simple(r) RPC system built on top of netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
On Dec. 8, 2014, 9:03 p.m., Brock Noland wrote: Hey Marcelo, When I send an HTTP request to the port where RSC is listening, the message below is printed. Thus it's doing a good job in that it's checking the max message size, which is awesome, but I feel we need to: 1) Add a small header so that when junk data is sent to this port we can log a better exception than the one below. As I mentioned, we've had massive problems with this in Flume, which also uses netty for communication. 2) Ensure the incoming size is not negative. 2014-12-08 20:56:41,070 WARN [RPC-Handler-7]: rpc.RpcDispatcher (RpcDispatcher.java:exceptionCaught(154)) - [HelloDispatcher] Caught exception in channel pipeline. io.netty.handler.codec.DecoderException: java.lang.IllegalArgumentException: Message exceeds maximum allowed size (10485760 bytes). at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:280) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:149) at io.netty.handler.codec.ByteToMessageCodec.channelRead(ByteToMessageCodec.java:108) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalArgumentException: Message
exceeds maximum allowed size (10485760 bytes). at org.apache.hive.spark.client.rpc.KryoMessageCodec.checkSize(KryoMessageCodec.java:117) at org.apache.hive.spark.client.rpc.KryoMessageCodec.decode(KryoMessageCodec.java:77) at io.netty.handler.codec.ByteToMessageCodec$1.decode(ByteToMessageCodec.java:42) at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:249) ... 12 more I can add the check for negative sizes, but I still don't understand why you want a header. It doesn't serve any practical purposes. The protocol itself has a handshake that needs to be successful for the connection to be established; adding a header will add nothing to the process, just complexity. - Marcelo --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28779/#review64279 --- On Dec. 8, 2014, 7:47 p.m., Marcelo Vanzin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28779/ --- (Updated Dec. 8, 2014, 7:47 p.m.) Review request for hive, Brock Noland, chengxiang li, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-9036 https://issues.apache.org/jira/browse/HIVE-9036 Repository: hive-git Description --- This patch replaces akka with a simple netty-based RPC layer. It doesn't add any features on top of the existing spark-client API, which is unchanged (except for the need to add empty constructors in some places). With the new backend we can think about adding some nice features such as future listeners (which were awkward with akka because of Scala), but those are left for a different time. 
The full change set, with more detailed descriptions, can be seen here: https://github.com/vanzin/hive/commits/spark-client-netty Diffs - pom.xml 630b10ce35032e4b2dee50ef3dfe5feb58223b78 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java PRE-CREATION spark-client/pom.xml PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/ClientUtils.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/Protocol.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java
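[Editor's note: the size-check discussion above can be illustrated with a standalone sketch. This is not the actual `KryoMessageCodec.checkSize` code; the class and constant names are illustrative, and only the 10485760-byte limit comes from the log in the thread:]

```java
import java.nio.ByteBuffer;

public class FrameSizeCheck {
    static final int MAX_MESSAGE_SIZE = 10 * 1024 * 1024; // 10485760 bytes, as in the log above

    // Validates a length prefix decoded from an untrusted peer. Both
    // checks matter: an oversized value is likely junk data, and a
    // negative value must be rejected before any buffer handling.
    static void checkSize(int size) {
        if (size < 0) {
            throw new IllegalArgumentException("Invalid message size: " + size);
        }
        if (size > MAX_MESSAGE_SIZE) {
            throw new IllegalArgumentException(
                "Message exceeds maximum allowed size (" + MAX_MESSAGE_SIZE + " bytes).");
        }
    }

    public static void main(String[] args) {
        // An HTTP request hitting the RPC port: "GET " read as a
        // big-endian int is ~1.2 billion, far above the limit.
        int bogus = ByteBuffer.wrap("GET ".getBytes()).getInt();
        try {
            checkSize(bogus);
            System.out.println("accepted");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This is why Brock's HTTP-request experiment produced the "Message exceeds maximum allowed size" exception: arbitrary bytes interpreted as a length prefix almost always fail the bounds check.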
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28779/ --- (Updated Dec. 8, 2014, 9:11 p.m.) Review request for hive, Brock Noland, chengxiang li, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-9036 https://issues.apache.org/jira/browse/HIVE-9036 Repository: hive-git Description --- This patch replaces akka with a simple netty-based RPC layer. It doesn't add any features on top of the existing spark-client API, which is unchanged (except for the need to add empty constructors in some places). With the new backend we can think about adding some nice features such as future listeners (which were awkward with akka because of Scala), but those are left for a different time. The full change set, with more detailed descriptions, can be seen here: https://github.com/vanzin/hive/commits/spark-client-netty Diffs (updated) - pom.xml 630b10ce35032e4b2dee50ef3dfe5feb58223b78 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java PRE-CREATION spark-client/pom.xml PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/ClientUtils.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/Protocol.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/InputMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/Metrics.java PRE-CREATION 
spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleReadMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleWriteMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/KryoMessageCodec.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/README.md PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcException.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounterGroup.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounters.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestKryoMessageCodec.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java PRE-CREATION Diff: https://reviews.apache.org/r/28779/diff/ Testing --- spark-client unit tests, plus some qtests. Thanks, Marcelo Vanzin
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Attachment: HIVE-9036.3-spark.patch Replace akka for remote spark client RPC [Spark Branch] --- Key: HIVE-9036 URL: https://issues.apache.org/jira/browse/HIVE-9036 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-9036.1-spark.patch, HIVE-9036.2-spark.patch, HIVE-9036.3-spark.patch We've had weird issues with akka, especially when something goes wrong and it becomes a little hard to debug. Let's replace it with a simple(r) RPC system built on top of netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
On Dec. 8, 2014, 9:03 p.m., Brock Noland wrote: Hey Marcelo, When I send an HTTP request to the port where RSC is listening, the message below is printed. Thus it's doing a good job in that it's checking the max message size, which is awesome, but I feel we need to: 1) Add a small header so that when junk data is sent to this port we can log a better exception than the one below. As I mentioned, we've had massive problems with this in Flume, which also uses netty for communication. 2) Ensure the incoming size is not negative. 2014-12-08 20:56:41,070 WARN [RPC-Handler-7]: rpc.RpcDispatcher (RpcDispatcher.java:exceptionCaught(154)) - [HelloDispatcher] Caught exception in channel pipeline. io.netty.handler.codec.DecoderException: java.lang.IllegalArgumentException: Message exceeds maximum allowed size (10485760 bytes). at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:280) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:149) at io.netty.handler.codec.ByteToMessageCodec.channelRead(ByteToMessageCodec.java:108) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalArgumentException: Message
exceeds maximum allowed size (10485760 bytes). at org.apache.hive.spark.client.rpc.KryoMessageCodec.checkSize(KryoMessageCodec.java:117) at org.apache.hive.spark.client.rpc.KryoMessageCodec.decode(KryoMessageCodec.java:77) at io.netty.handler.codec.ByteToMessageCodec$1.decode(ByteToMessageCodec.java:42) at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:249) ... 12 more Marcelo Vanzin wrote: I can add the check for negative sizes, but I still don't understand why you want a header. It doesn't serve any practical purposes. The protocol itself has a handshake that needs to be successful for the connection to be established; adding a header will add nothing to the process, just complexity. Brock Noland wrote: The only thing I would add is that it's easy for engineers who work on this to look at the exception and know that it's not related, but it's not easy for operations folks. When they turn on debug logging and see these exceptions they will get taken off the trail of the real problem they are trying to debug. Ops folks should not turn on debug logging unless they're told to; otherwise they'll potentially see a lot of these kinds of things. If they do turn on debug logging by themselves, then they shouldn't be surprised to see things they may not fully understand. There's a reason why it's called debug, and not just print the log messages specific to the problem I'm having. - Marcelo --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28779/#review64279 --- On Dec. 8, 2014, 9:11 p.m., Marcelo Vanzin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28779/ --- (Updated Dec. 8, 2014, 9:11 p.m.) Review request for hive, Brock Noland, chengxiang li, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-9036 https://issues.apache.org/jira/browse/HIVE-9036 Repository: hive-git Description --- This patch replaces akka with a simple netty-based RPC layer. 
It doesn't add any features on top of the existing spark-client API, which is unchanged (except for the need to add empty constructors in some places). With the new backend we can think about adding some nice features such as future listeners (which were awkward with akka because of Scala), but those are left for a different time. The full change set, with more detailed descriptions, can be seen here: https://github.com/vanzin/hive/commits/spark-client-netty
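[Editor's note: Marcelo's argument rests on the protocol's handshake rejecting junk connections. A generic challenge-response sketch of that pattern — this is purely illustrative and not the actual RSC handshake code:]

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;
import java.util.Arrays;

public class HandshakeSketch {
    // The client proves knowledge of a shared secret by returning an
    // HMAC of the server's random challenge.
    static byte[] respond(byte[] secret, byte[] challenge) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        return mac.doFinal(challenge);
    }

    public static void main(String[] args) throws Exception {
        byte[] secret = "client-id-secret".getBytes(); // illustrative secret
        byte[] challenge = new byte[16];
        new SecureRandom().nextBytes(challenge);

        byte[] response = respond(secret, challenge);

        // Server recomputes the expected response and compares. A peer
        // sending junk data (or an HTTP request) can never pass this,
        // which is why an extra header adds nothing but complexity.
        boolean ok = Arrays.equals(response, respond(secret, challenge));
        System.out.println(ok ? "handshake ok" : "rejected");
    }
}
```

Under this pattern, the handshake itself acts as the connection validator, so malformed traffic is dropped at connection setup regardless of any framing header.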
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28779/ --- (Updated Dec. 8, 2014, 9:52 p.m.) Review request for hive, Brock Noland, chengxiang li, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-9036 https://issues.apache.org/jira/browse/HIVE-9036 Repository: hive-git

Description
---
This patch replaces akka with a simple netty-based RPC layer. It doesn't add any features on top of the existing spark-client API, which is unchanged (except for the need to add empty constructors in some places). With the new backend we can think about adding some nice features such as future listeners (which were awkward with akka because of Scala), but those are left for a different time. The full change set, with more detailed descriptions, can be seen here: https://github.com/vanzin/hive/commits/spark-client-netty

Diffs (updated)
---
pom.xml 630b10ce35032e4b2dee50ef3dfe5feb58223b78
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java PRE-CREATION
spark-client/pom.xml PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/client/ClientUtils.java PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/client/Protocol.java PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/client/metrics/InputMetrics.java PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/client/metrics/Metrics.java PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleReadMetrics.java PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleWriteMetrics.java PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/client/rpc/KryoMessageCodec.java PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/client/rpc/README.md PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcException.java PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounterGroup.java PRE-CREATION
spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounters.java PRE-CREATION
spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java PRE-CREATION
spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestKryoMessageCodec.java PRE-CREATION
spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java PRE-CREATION

Diff: https://reviews.apache.org/r/28779/diff/

Testing
---
spark-client unit tests, plus some qtests.

Thanks,
Marcelo Vanzin
[jira] [Commented] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238670#comment-14238670 ] Marcelo Vanzin commented on HIVE-9036: -- I have a live job in that state, should be better for debugging. Replace akka for remote spark client RPC [Spark Branch] --- Key: HIVE-9036 URL: https://issues.apache.org/jira/browse/HIVE-9036 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-9036.1-spark.patch, HIVE-9036.2-spark.patch, HIVE-9036.3-spark.patch, rsc-problem-1.tar.gz We've had weird issues with akka, especially when something goes wrong and it becomes a little hard to debug. Let's replace it with a simple(r) RPC system built on top of netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Attachment: HIVE-9036.4-spark.patch Replace akka for remote spark client RPC [Spark Branch] --- Key: HIVE-9036 URL: https://issues.apache.org/jira/browse/HIVE-9036 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-9036.1-spark.patch, HIVE-9036.2-spark.patch, HIVE-9036.3-spark.patch, HIVE-9036.4-spark.patch, rsc-problem-1.tar.gz We've had weird issues with akka, especially when something goes wrong and it becomes a little hard to debug. Let's replace it with a simple(r) RPC system built on top of netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8574) Enhance metrics gathering in Spark Client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved HIVE-8574. -- Resolution: Not a Problem I'll close this as not a problem for now. If we decide the overhead is too much, we can revisit it. As for the ugly API, currently I couldn't think of a way to avoid it. Spark's API is just not very friendly in this area. Enhance metrics gathering in Spark Client [Spark Branch] Key: HIVE-8574 URL: https://issues.apache.org/jira/browse/HIVE-8574 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin The current implementation of metrics gathering in the Spark client is a little hacky. First, it's awkward to use (and the implementation is also pretty ugly). Second, it will just collect metrics indefinitely, so in the long term it turns into a huge memory leak. We need a simplified interface and some mechanism for disposing of old metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
Marcelo Vanzin created HIVE-9036: Summary: Replace akka for remote spark client RPC [Spark Branch] Key: HIVE-9036 URL: https://issues.apache.org/jira/browse/HIVE-9036 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin We've had weird issues with akka, especially when something goes wrong and it becomes a little hard to debug. Let's replace it with a simple(r) RPC system built on top of netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Attachment: HIVE-9036.1-spark.patch Replace akka for remote spark client RPC [Spark Branch] --- Key: HIVE-9036 URL: https://issues.apache.org/jira/browse/HIVE-9036 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-9036.1-spark.patch We've had weird issues with akka, especially when something goes wrong and it becomes a little hard to debug. Let's replace it with a simple(r) RPC system built on top of netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Status: Patch Available (was: Open) Patch is rather large but shouldn't be too complicated; and there are unit tests! (Plus I've run some of the qtests.) Replace akka for remote spark client RPC [Spark Branch] --- Key: HIVE-9036 URL: https://issues.apache.org/jira/browse/HIVE-9036 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-9036.1-spark.patch We've had weird issues with akka, especially when something goes wrong and it becomes a little hard to debug. Let's replace it with a simple(r) RPC system built on top of netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236447#comment-14236447 ] Marcelo Vanzin commented on HIVE-9036: -- I'll look at why the patch isn't applying later... probably need to rebase my branch. Replace akka for remote spark client RPC [Spark Branch] --- Key: HIVE-9036 URL: https://issues.apache.org/jira/browse/HIVE-9036 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-9036.1-spark.patch We've had weird issues with akka, especially when something goes wrong and it becomes a little hard to debug. Let's replace it with a simple(r) RPC system built on top of netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8991) Fix custom_input_output_format [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231866#comment-14231866 ] Marcelo Vanzin commented on HIVE-8991: -- I didn't mean to stop you guys from checking in this patch. I just said that while this may fix the test, it's an indication of something that we need to understand better (i.e. how to properly add jars to the Spark job's classpath without causing conflicts). Fix custom_input_output_format [Spark Branch] - Key: HIVE-8991 URL: https://issues.apache.org/jira/browse/HIVE-8991 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8991.1-spark.patch After HIVE-8836, {{custom_input_output_format}} fails because of missing hive-it-util in remote driver's class path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8991) Fix custom_input_output_format [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230227#comment-14230227 ] Marcelo Vanzin commented on HIVE-8991: -- Hi [~lirui], the patch looks good if it unblocks the unit tests. I have to think a bit about whether it would work in a real deployment scenario, since IIRC hive-exec shades a lot of dependencies and it might cause problems with Spark. But the main one (Guava) should be solved in Spark, so hopefully there won't be other cases like that. Fix custom_input_output_format [Spark Branch] - Key: HIVE-8991 URL: https://issues.apache.org/jira/browse/HIVE-8991 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8991.1-spark.patch After HIVE-8836, {{custom_input_output_format}} fails because of missing hive-it-util in remote driver's class path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8995) Find thread leak in RSC Tests
[ https://issues.apache.org/jira/browse/HIVE-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230254#comment-14230254 ] Marcelo Vanzin commented on HIVE-8995: -- The three threads are from akka; I wonder if the test code is failing to properly shut down clients or the library itself (i.e. call {{SparkClientFactory.stop()}}). Find thread leak in RSC Tests - Key: HIVE-8995 URL: https://issues.apache.org/jira/browse/HIVE-8995 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland I was regenerating output as part of the merge: {noformat} mvn test -Dtest=TestSparkCliDriver -Phadoop-2 -Dtest.output.overwrite=true -Dqfile=annotate_stats_join.q,auto_join0.q,auto_join1.q,auto_join10.q,auto_join11.q,auto_join12.q,auto_join13.q,auto_join14.q,auto_join15.q,auto_join16.q,auto_join17.q,auto_join18.q,auto_join18_multi_distinct.q,auto_join19.q,auto_join2.q,auto_join20.q,auto_join21.q,auto_join22.q,auto_join23.q,auto_join24.q,auto_join26.q,auto_join27.q,auto_join28.q,auto_join29.q,auto_join3.q,auto_join30.q,auto_join31.q,auto_join32.q,auto_join9.q,auto_join_reordering_values.q auto_join_without_localtask.q,auto_smb_mapjoin_14.q,auto_sortmerge_join_1.q,auto_sortmerge_join_10.q,auto_sortmerge_join_11.q,auto_sortmerge_join_12.q,auto_sortmerge_join_14.q,auto_sortmerge_join_15.q,auto_sortmerge_join_2.q,auto_sortmerge_join_3.q,auto_sortmerge_join_4.q,auto_sortmerge_join_5.q,auto_sortmerge_join_6.q,auto_sortmerge_join_7.q,auto_sortmerge_join_8.q,auto_sortmerge_join_9.q,bucket_map_join_1.q,bucket_map_join_2.q,bucket_map_join_tez1.q,bucket_map_join_tez2.q,bucketmapjoin1.q,bucketmapjoin10.q,bucketmapjoin11.q,bucketmapjoin12.q,bucketmapjoin13.q,bucketmapjoin2.q,bucketmapjoin3.q,bucketmapjoin4.q,bucketmapjoin5.q,bucketmapjoin7.q 
bucketmapjoin8.q,bucketmapjoin9.q,bucketmapjoin_negative.q,bucketmapjoin_negative2.q,bucketmapjoin_negative3.q,column_access_stats.q,cross_join.q,ctas.q,custom_input_output_format.q,groupby4.q,groupby7_noskew_multi_single_reducer.q,groupby_complex_types.q,groupby_complex_types_multi_single_reducer.q,groupby_multi_single_reducer2.q,groupby_multi_single_reducer3.q,groupby_position.q,groupby_sort_1_23.q,groupby_sort_skew_1_23.q,having.q,index_auto_self_join.q,infer_bucket_sort_convert_join.q,innerjoin.q,input12.q,join0.q,join1.q,join11.q,join12.q,join13.q,join14.q,join15.q join17.q,join18.q,join18_multi_distinct.q,join19.q,join2.q,join20.q,join21.q,join22.q,join23.q,join25.q,join26.q,join27.q,join28.q,join29.q,join3.q,join30.q,join31.q,join32.q,join32_lessSize.q,join33.q,join35.q,join36.q,join37.q,join38.q,join39.q,join40.q,join41.q,join9.q,join_alt_syntax.q,join_cond_pushdown_1.q join_cond_pushdown_2.q,join_cond_pushdown_3.q,join_cond_pushdown_4.q,join_cond_pushdown_unqual1.q,join_cond_pushdown_unqual2.q,join_cond_pushdown_unqual3.q,join_cond_pushdown_unqual4.q,join_filters_overlap.q,join_hive_626.q,join_map_ppr.q,join_merge_multi_expressions.q,join_merging.q,join_nullsafe.q,join_rc.q,join_reorder.q,join_reorder2.q,join_reorder3.q,join_reorder4.q,join_star.q,join_thrift.q,join_vc.q,join_view.q,limit_pushdown.q,load_dyn_part13.q,load_dyn_part14.q,louter_join_ppr.q,mapjoin1.q,mapjoin_decimal.q,mapjoin_distinct.q,mapjoin_filter_on_outerjoin.q mapjoin_hook.q,mapjoin_mapjoin.q,mapjoin_memcheck.q,mapjoin_subquery.q,mapjoin_subquery2.q,mapjoin_test_outer.q,mergejoins.q,mergejoins_mixed.q,multi_insert.q,multi_insert_gby.q,multi_insert_gby2.q,multi_insert_gby3.q,multi_insert_lateral_view.q,multi_insert_mixed.q,multi_insert_move_tasks_share_dependencies.q,multi_join_union.q,optimize_nullscan.q,outer_join_ppr.q,parallel.q,parallel_join0.q,parallel_join1.q,parquet_join.q,pcr.q,ppd_gby_join.q,ppd_join.q,ppd_join2.q,ppd_join3.q,ppd_join4.q,ppd_join5.q,ppd_join_filter.q 
ppd_multi_insert.q,ppd_outer_join1.q,ppd_outer_join2.q,ppd_outer_join3.q,ppd_outer_join4.q,ppd_outer_join5.q,ppd_transform.q,reduce_deduplicate_exclude_join.q,router_join_ppr.q,sample10.q,sample8.q,script_pipe.q,semijoin.q,skewjoin.q,skewjoin_noskew.q,skewjoin_union_remove_1.q,skewjoin_union_remove_2.q,skewjoinopt1.q,skewjoinopt10.q,skewjoinopt11.q,skewjoinopt12.q,skewjoinopt13.q,skewjoinopt14.q,skewjoinopt15.q,skewjoinopt16.q,skewjoinopt17.q,skewjoinopt18.q,skewjoinopt19.q,skewjoinopt2.q,skewjoinopt20.q skewjoinopt3.q,skewjoinopt4.q,skewjoinopt5.q,skewjoinopt6.q,skewjoinopt7.q,skewjoinopt8.q,skewjoinopt9.q,smb_mapjoin9.q,smb_mapjoin_1.q,smb_mapjoin_10.q,smb_mapjoin_13.q,smb_mapjoin_14.q,smb_mapjoin_15.q,smb_mapjoin_16.q,smb_mapjoin_17.q,smb_mapjoin_2.q,smb_mapjoin_25.q,smb_mapjoin_3.q,smb_mapjoin_4.q,smb_mapjoin_5.q,smb_mapjoin_6.q,smb_mapjoin_7.q,sort_merge_join_desc_1.q,sort_merge_join_desc_2.q,sort_merge_join_desc_3.q,sort_merge_join_desc_4.q,sort_merge_join_desc_5.q,sort_merge_join_desc_6.q
[jira] [Commented] (HIVE-8995) Find thread leak in RSC Tests
[ https://issues.apache.org/jira/browse/HIVE-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230273#comment-14230273 ] Marcelo Vanzin commented on HIVE-8995: -- You don't need to call that method for every session. The pattern here is: * Call {{SparkClientFactory.initialize()}} once * Create / use as many clients as you want * When app shuts down, call {{SparkClientFactory.stop()}} So this should work nicely for HS2 (call initialize during bring up, call stop during shut down). I see {{RemoteHiveSparkClient}} calls initialize; that seems wrong, if my understanding of that class is correct (that it will be instantiated once for each session). Another option is to make {{initialize}} idempotent; right now it will just leak the old akka actor system, which is bad. This should be a trivial change (just add a check for {{initialized}}). Find thread leak in RSC Tests - Key: HIVE-8995 URL: https://issues.apache.org/jira/browse/HIVE-8995 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland I was regenerating output as part of the merge: {noformat} mvn test -Dtest=TestSparkCliDriver -Phadoop-2 -Dtest.output.overwrite=true -Dqfile=annotate_stats_join.q,auto_join0.q,auto_join1.q,auto_join10.q,auto_join11.q,auto_join12.q,auto_join13.q,auto_join14.q,auto_join15.q,auto_join16.q,auto_join17.q,auto_join18.q,auto_join18_multi_distinct.q,auto_join19.q,auto_join2.q,auto_join20.q,auto_join21.q,auto_join22.q,auto_join23.q,auto_join24.q,auto_join26.q,auto_join27.q,auto_join28.q,auto_join29.q,auto_join3.q,auto_join30.q,auto_join31.q,auto_join32.q,auto_join9.q,auto_join_reordering_values.q 
auto_join_without_localtask.q,auto_smb_mapjoin_14.q,auto_sortmerge_join_1.q,auto_sortmerge_join_10.q,auto_sortmerge_join_11.q,auto_sortmerge_join_12.q,auto_sortmerge_join_14.q,auto_sortmerge_join_15.q,auto_sortmerge_join_2.q,auto_sortmerge_join_3.q,auto_sortmerge_join_4.q,auto_sortmerge_join_5.q,auto_sortmerge_join_6.q,auto_sortmerge_join_7.q,auto_sortmerge_join_8.q,auto_sortmerge_join_9.q,bucket_map_join_1.q,bucket_map_join_2.q,bucket_map_join_tez1.q,bucket_map_join_tez2.q,bucketmapjoin1.q,bucketmapjoin10.q,bucketmapjoin11.q,bucketmapjoin12.q,bucketmapjoin13.q,bucketmapjoin2.q,bucketmapjoin3.q,bucketmapjoin4.q,bucketmapjoin5.q,bucketmapjoin7.q bucketmapjoin8.q,bucketmapjoin9.q,bucketmapjoin_negative.q,bucketmapjoin_negative2.q,bucketmapjoin_negative3.q,column_access_stats.q,cross_join.q,ctas.q,custom_input_output_format.q,groupby4.q,groupby7_noskew_multi_single_reducer.q,groupby_complex_types.q,groupby_complex_types_multi_single_reducer.q,groupby_multi_single_reducer2.q,groupby_multi_single_reducer3.q,groupby_position.q,groupby_sort_1_23.q,groupby_sort_skew_1_23.q,having.q,index_auto_self_join.q,infer_bucket_sort_convert_join.q,innerjoin.q,input12.q,join0.q,join1.q,join11.q,join12.q,join13.q,join14.q,join15.q join17.q,join18.q,join18_multi_distinct.q,join19.q,join2.q,join20.q,join21.q,join22.q,join23.q,join25.q,join26.q,join27.q,join28.q,join29.q,join3.q,join30.q,join31.q,join32.q,join32_lessSize.q,join33.q,join35.q,join36.q,join37.q,join38.q,join39.q,join40.q,join41.q,join9.q,join_alt_syntax.q,join_cond_pushdown_1.q 
join_cond_pushdown_2.q,join_cond_pushdown_3.q,join_cond_pushdown_4.q,join_cond_pushdown_unqual1.q,join_cond_pushdown_unqual2.q,join_cond_pushdown_unqual3.q,join_cond_pushdown_unqual4.q,join_filters_overlap.q,join_hive_626.q,join_map_ppr.q,join_merge_multi_expressions.q,join_merging.q,join_nullsafe.q,join_rc.q,join_reorder.q,join_reorder2.q,join_reorder3.q,join_reorder4.q,join_star.q,join_thrift.q,join_vc.q,join_view.q,limit_pushdown.q,load_dyn_part13.q,load_dyn_part14.q,louter_join_ppr.q,mapjoin1.q,mapjoin_decimal.q,mapjoin_distinct.q,mapjoin_filter_on_outerjoin.q mapjoin_hook.q,mapjoin_mapjoin.q,mapjoin_memcheck.q,mapjoin_subquery.q,mapjoin_subquery2.q,mapjoin_test_outer.q,mergejoins.q,mergejoins_mixed.q,multi_insert.q,multi_insert_gby.q,multi_insert_gby2.q,multi_insert_gby3.q,multi_insert_lateral_view.q,multi_insert_mixed.q,multi_insert_move_tasks_share_dependencies.q,multi_join_union.q,optimize_nullscan.q,outer_join_ppr.q,parallel.q,parallel_join0.q,parallel_join1.q,parquet_join.q,pcr.q,ppd_gby_join.q,ppd_join.q,ppd_join2.q,ppd_join3.q,ppd_join4.q,ppd_join5.q,ppd_join_filter.q ppd_multi_insert.q,ppd_outer_join1.q,ppd_outer_join2.q,ppd_outer_join3.q,ppd_outer_join4.q,ppd_outer_join5.q,ppd_transform.q,reduce_deduplicate_exclude_join.q,router_join_ppr.q,sample10.q,sample8.q,script_pipe.q,semijoin.q,skewjoin.q,skewjoin_noskew.q,skewjoin_union_remove_1.q,skewjoin_union_remove_2.q,skewjoinopt1.q,skewjoinopt10.q,skewjoinopt11.q,skewjoinopt12.q,skewjoinopt13.q,skewjoinopt14.q,skewjoinopt15.q,skewjoinopt16.q,skewjoinopt17.q,skewjoinopt18.q,skewjoinopt19.q,skewjoinopt2.q
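The lifecycle pattern described in the comment above (initialize once, create many clients, stop on shutdown) and the suggested idempotency fix ("just add a check for {{initialized}}") can be sketched as follows. This is a hypothetical illustration, not the actual Hive code; the class mirrors the real {{SparkClientFactory}} name, but the body and the {{startCount}} field exist only to make the sketch self-contained and testable.

```java
// Hypothetical sketch of an idempotent SparkClientFactory.initialize(),
// as suggested in the comment above. Not the actual Hive implementation.
class SparkClientFactorySketch {
    static boolean initialized = false;
    static int startCount = 0; // stands in for "times the actor system was started"

    public static synchronized void initialize() {
        if (initialized) {
            return; // idempotent: a second call is a no-op, so nothing leaks
        }
        startCount++; // stands in for "start the RPC/actor system"
        initialized = true;
    }

    public static synchronized void stop() {
        if (!initialized) {
            return;
        }
        // stands in for "shut down the RPC/actor system"
        initialized = false;
    }
}
```

With this shape, a per-session caller like {{RemoteHiveSparkClient}} calling initialize repeatedly becomes harmless, which is the cheaper of the two options the comment proposes.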
[jira] [Commented] (HIVE-8957) Remote spark context needs to clean up itself in case of connection timeout [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230478#comment-14230478 ] Marcelo Vanzin commented on HIVE-8957: -- If you don't mind the bug remaining unattended for several days, sure. I have my hands full with all sorts of other things at the moment. Remote spark context needs to clean up itself in case of connection timeout [Spark Branch] -- Key: HIVE-8957 URL: https://issues.apache.org/jira/browse/HIVE-8957 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-8957.1-spark.patch In the current SparkClient implementation (class SparkClientImpl), the constructor does some initialization and in the end waits for the remote driver to connect. In case of timeout, it just throws an exception without cleaning up after itself. The cleanup is necessary to release system resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8956) Hive hangs while some error/exception happens beyond job execution [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226504#comment-14226504 ] Marcelo Vanzin commented on HIVE-8956: -- I haven't looked at akka in that much detail to see if there is some API to catch those. You can enable akka logging (set {{spark.akka.logLifecycleEvents}} to true) and that will print these errors to the logs. Spark tries to serialize data before sending it to akka, to try to catch serialization issues, but that adds overhead, and it also doesn't help in the deserialization path... Hive hangs while some error/exception happens beyond job execution [Spark Branch] - Key: HIVE-8956 URL: https://issues.apache.org/jira/browse/HIVE-8956 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Rui Li Labels: Spark-M3 Fix For: spark-branch Attachments: HIVE-8956.1-spark.patch The remote Spark client communicates with the remote Spark context asynchronously; if an error/exception is thrown during job execution in the remote Spark context, it is wrapped and sent back to the remote Spark client, but if an error/exception is thrown outside job execution, such as a job failing to serialize, the remote Spark client never learns what is going on in the remote Spark context, and it hangs. Setting a timeout on the remote Spark client side may not be a great idea, as we are not sure how long the query will run on the Spark cluster. We need to find a way to check whether a job has failed (over its whole life cycle) in the remote Spark context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
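For reference, the lifecycle-event logging mentioned above is set through a regular Spark configuration key. Assuming a standard Spark setup (this exact deployment isn't described in the thread), it can be enabled in spark-defaults.conf:

```
# spark-defaults.conf fragment — enables the akka lifecycle logging
# named in the comment above (key applies to akka-based Spark versions)
spark.akka.logLifecycleEvents  true
```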
[jira] [Commented] (HIVE-8957) Remote spark context needs to clean up itself in case of connection timeout [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226515#comment-14226515 ] Marcelo Vanzin commented on HIVE-8957: -- I think a fix here will be a little more complicated than that. Let me look at the code and think about it. Remote spark context needs to clean up itself in case of connection timeout [Spark Branch] -- Key: HIVE-8957 URL: https://issues.apache.org/jira/browse/HIVE-8957 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-8957.1-spark.patch In the current SparkClient implementation (class SparkClientImpl), the constructor does some initialization and in the end waits for the remote driver to connect. In case of timeout, it just throws an exception without cleaning up after itself. The cleanup is necessary to release system resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
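The shape of the cleanup being asked for in HIVE-8957 can be sketched as below. This is illustrative only: the method names ({{acquireResources}}, {{waitForDriverToConnect}}, {{releaseResources}}) are hypothetical, not the real {{SparkClientImpl}} API. The point is that whatever the constructor acquired is released before the timeout exception propagates.

```java
import java.util.concurrent.TimeoutException;

// Illustrative sketch, not Hive code: a constructor that waits for a remote
// driver to connect, and releases its resources if the wait times out.
class TimeoutCleanupSketch {
    static boolean cleanedUp = false;

    TimeoutCleanupSketch(long timeoutMs) throws TimeoutException {
        acquireResources();
        try {
            waitForDriverToConnect(timeoutMs);
        } catch (TimeoutException e) {
            releaseResources(); // clean up before propagating the failure
            throw e;
        }
    }

    private void acquireResources() {
        // stands in for e.g. opening a server socket, spawning the driver process
    }

    private void waitForDriverToConnect(long timeoutMs) throws TimeoutException {
        // Simulates a driver that never connects within the timeout.
        throw new TimeoutException("remote driver did not connect in " + timeoutMs + " ms");
    }

    private void releaseResources() {
        cleanedUp = true; // stands in for closing sockets, killing the child, etc.
    }
}
```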
[jira] [Commented] (HIVE-8574) Enhance metrics gathering in Spark Client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226517#comment-14226517 ] Marcelo Vanzin commented on HIVE-8574: -- Actually, after a quick look at the code again, this might not be a problem. Metrics are kept per-job handle. Job handles are managed by the code submitting jobs - leave them for garbage collection and metrics go away. So unless we're worried about a single job creating so many tasks that it will run the driver out of memory with all the metrics data, this shouldn't really be an issue. Enhance metrics gathering in Spark Client [Spark Branch] Key: HIVE-8574 URL: https://issues.apache.org/jira/browse/HIVE-8574 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin The current implementation of metrics gathering in the Spark client is a little hacky. First, it's awkward to use (and the implementation is also pretty ugly). Second, it will just collect metrics indefinitely, so in the long term it turns into a huge memory leak. We need a simplified interface and some mechanism for disposing of old metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8574) Enhance metrics gathering in Spark Client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226668#comment-14226668 ] Marcelo Vanzin commented on HIVE-8574: -- Rounding up, each task metrics data structure will take around 256 bytes. So ~25MB? Enhance metrics gathering in Spark Client [Spark Branch] Key: HIVE-8574 URL: https://issues.apache.org/jira/browse/HIVE-8574 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin The current implementation of metrics gathering in the Spark client is a little hacky. First, it's awkward to use (and the implementation is also pretty ugly). Second, it will just collect metrics indefinitely, so in the long term it turns into a huge memory leak. We need a simplified interface and some mechanism for disposing of old metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
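As a back-of-the-envelope check of the "~25MB" figure above: the task count is not stated in this thread, so 100,000 tasks is our assumption, chosen because it makes the quoted numbers line up (256 bytes × 100,000 tasks ≈ 25.6 MB).

```java
// Sanity-check of the ~25MB estimate. The 100,000-task figure is an
// assumption for illustration; only the 256 bytes/task comes from the thread.
class MetricsMemoryEstimate {
    static long estimateBytes(long bytesPerTask, long tasks) {
        return bytesPerTask * tasks;
    }

    public static void main(String[] args) {
        long total = estimateBytes(256, 100_000);
        System.out.println(total + " bytes, i.e. about " + (total / 1_000_000) + " MB");
    }
}
```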
[jira] [Commented] (HIVE-8836) Enable automatic tests with remote spark client.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225049#comment-14225049 ] Marcelo Vanzin commented on HIVE-8836: -- I talked briefly with Brock about this, but the main thing here is that, right now, Spark is not very friendly to applications that are trying to embed it. As you've noticed, the assembly jar, which contains almost everything you need to run Spark, is not published to Maven or anywhere else. And not all artifacts used to build the assembly are published either - for example, the Yarn backend cannot be found anywhere in Maven, so without the assembly you cannot submit jobs to Yarn. I've suggested changing this in the past, but until Spark becomes more friendly to such use cases, I think Hive should require a full Spark install to work. If desired, we could use the hacks I added to the remote client to avoid needing the full install for unit tests, but even those are very limited; as some of you may have noticed, they probably only work with a local master. Enable automatic tests with remote spark client.[Spark Branch] -- Key: HIVE-8836 URL: https://issues.apache.org/jira/browse/HIVE-8836 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Rui Li Labels: Spark-M3 Attachments: HIVE-8836-brock-1.patch, HIVE-8836-brock-2.patch, HIVE-8836-brock-3.patch, HIVE-8836.1-spark.patch, HIVE-8836.2-spark.patch, HIVE-8836.3-spark.patch In a real production environment, the remote spark client will mostly be used to submit Spark jobs for Hive, so we should enable automatic tests with the remote spark client to make sure Hive features work with it.
[jira] [Commented] (HIVE-8956) Hive hangs while some error/exception happens beyond job execution[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225082#comment-14225082 ] Marcelo Vanzin commented on HIVE-8956: -- This is ok if it unblocks something right now. For the code, I'd suggest using {{System.nanoTime()}} to calculate durations, since it's monotonic. And use {{long}} instead of {{int}}. But I think a better approach is needed here. Currently the {{JobSubmitted}} message seems to only be sent when you use Spark's async APIs to submit a Spark job. A remote client job that does not use those APIs would never generate that message. Also, the backend uses a thread pool to execute jobs, so if you're queueing up multiple jobs, you may hit this timeout. I think we need more fine-grained remote-client-level events for tracking job progress. e.g., adding {{JobReceived}} and {{JobStarted}} messages would be a good start ({{JobResult}} already covers the job-finished case). I think these two extra messages should be enough to cover the problems described in this bug. Hive hangs while some error/exception happens beyond job execution[Spark Branch] Key: HIVE-8956 URL: https://issues.apache.org/jira/browse/HIVE-8956 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Rui Li Labels: Spark-M3 Attachments: HIVE-8956.1-spark.patch The remote spark client communicates with the remote spark context asynchronously. If an error/exception is thrown during job execution in the remote spark context, it is wrapped and sent back to the remote spark client; but if an error/exception is thrown outside job execution, such as job serialization failing, the remote spark client would never know what's going on in the remote spark context, and it would hang there. Setting a timeout on the remote spark client side may not be a great idea, as we are not sure how long the query will run on the spark cluster. We need to find a way to check whether a job has failed (over its whole life cycle) in the remote spark context.
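The {{System.nanoTime()}} suggestion above can be sketched as follows (a minimal illustration, not the actual patch; the class and variable names are made up):

```java
// Sketch of the suggestion above: use the monotonic System.nanoTime()
// for measuring durations, stored in a long (not an int) to avoid overflow.
// System.currentTimeMillis() can jump when the wall clock is adjusted,
// so it is unsafe for elapsed-time checks such as timeouts.
public class Durations {
    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(20); // stand-in for waiting on a job message
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        System.out.println("elapsed ms: " + elapsedMs);
    }
}
```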
[jira] [Commented] (HIVE-8574) Enhance metrics gathering in Spark Client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225276#comment-14225276 ] Marcelo Vanzin commented on HIVE-8574: -- Hey [~chengxiang li], I'd like to have a better understanding of how these metrics will be used by Hive, to come up with the proper fix here. I see two approaches: * Add an API to clean up the metrics. This keeps the current collect-all-metrics approach, but adds an API to delete metrics once they're processed. This assumes that Hive will always process the metrics of finished jobs, even if just to ask for them to be deleted. * Suggested by [~xuefuz]: add a timeout after a job finishes for cleaning up its metrics. This means Hive has some window after a job finishes during which the data is available, but after that, it's gone. I could also add some internal checks so that the collection doesn't keep accumulating data indefinitely if data is never deleted; e.g., track only the last x finished jobs, evicting the oldest when a new job starts. What do you think?
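The last option above (track only the last N finished jobs, evicting the oldest) can be sketched with an insertion-ordered {{LinkedHashMap}}; this is an illustration with made-up names, not the actual Spark client code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: keep metrics for at most maxFinishedJobs finished jobs, keyed by
// job ID, evicting the oldest entry whenever a newer job's metrics arrive.
public class FinishedJobMetrics<M> extends LinkedHashMap<String, M> {
    private final int maxFinishedJobs;

    public FinishedJobMetrics(int maxFinishedJobs) {
        super(16, 0.75f, false); // false = iterate in insertion (finish) order
        this.maxFinishedJobs = maxFinishedJobs;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, M> eldest) {
        // Called after each put(); returning true drops the oldest entry.
        return size() > maxFinishedJobs;
    }
}
```

For example, with {{maxFinishedJobs = 2}}, recording metrics for jobs "a", "b", and "c" in that order leaves only "b" and "c" in the map.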
[jira] [Commented] (HIVE-8574) Enhance metrics gathering in Spark Client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223383#comment-14223383 ] Marcelo Vanzin commented on HIVE-8574: -- Haven't had a chance to look at this yet. Hopefully this week.
[jira] [Commented] (HIVE-8951) Spark remote context doesn't work with local-cluster [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223388#comment-14223388 ] Marcelo Vanzin commented on HIVE-8951: -- `SparkClientImpl` has `stop()`, which should be cleaning things up and properly stopping the driver. Are you calling it? Spark remote context doesn't work with local-cluster [Spark Branch] --- Key: HIVE-8951 URL: https://issues.apache.org/jira/browse/HIVE-8951 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang What I did: {code} set spark.home=/home/xzhang/apache/spark; set spark.master=local-cluster[2,1,2048]; set hive.execution.engine=spark; set spark.executor.memory=2g; set spark.serializer=org.apache.spark.serializer.KryoSerializer; set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec; select name, avg(value) as v from dec group by name order by v; {code} Exceptions seen: {code} 14/11/23 10:42:15 INFO Worker: Spark home: /home/xzhang/apache/spark 14/11/23 10:42:15 INFO AppClient$ClientActor: Connecting to master spark://xzdt.local:55151... 
14/11/23 10:42:15 INFO Master: Registering app Hive on Spark 14/11/23 10:42:15 INFO Master: Registered app Hive on Spark with ID app-20141123104215- 14/11/23 10:42:15 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20141123104215- 14/11/23 10:42:15 INFO NettyBlockTransferService: Server created on 41676 14/11/23 10:42:15 INFO BlockManagerMaster: Trying to register BlockManager 14/11/23 10:42:15 INFO BlockManagerMasterActor: Registering block manager xzdt.local:41676 with 265.0 MB RAM, BlockManagerId(driver, xzdt.local, 41676) 14/11/23 10:42:15 INFO BlockManagerMaster: Registered BlockManager 14/11/23 10:42:15 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0 14/11/23 10:42:20 WARN AbstractLifeCycle: FAILED SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:174) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77) at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187) at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316) at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265) at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64) at org.eclipse.jetty.server.Server.doStart(Server.java:293) at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64) at org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:194) at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:204) at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:204) at 
org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1676) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1667) at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:204) at org.apache.spark.ui.WebUI.bind(WebUI.scala:102) at org.apache.spark.SparkContext$$anonfun$10.apply(SparkContext.scala:267) at org.apache.spark.SparkContext$$anonfun$10.apply(SparkContext.scala:267) at scala.Option.foreach(Option.scala:236) at org.apache.spark.SparkContext.init(SparkContext.scala:267) at org.apache.spark.api.java.JavaSparkContext.init(JavaSparkContext.scala:61) at org.apache.hive.spark.client.RemoteDriver.init(RemoteDriver.java:106) at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:362) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:353) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 14/11/23 10:42:20 WARN AbstractLifeCycle: FAILED org.eclipse.jetty.server.Server@4c9fd062: java.net.BindException: Address already in use
[jira] [Commented] (HIVE-8951) Spark remote context doesn't work with local-cluster [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223507#comment-14223507 ] Marcelo Vanzin commented on HIVE-8951: -- That `BindException` should not be fatal; Spark just retries on a different port when it happens. So something else must be going wrong.
[jira] [Commented] (HIVE-8951) Spark remote context doesn't work with local-cluster [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223661#comment-14223661 ] Marcelo Vanzin commented on HIVE-8951: -- Not from just those logs. Is this easily reproduced via some unit test? (Feel free to send me an e-mail with reproduction steps so I can try it myself.)