Re: Review Request 27987: HIVE-8833 implement remote spark client
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27987/#review61479
-----------------------------------------------------------

Ship it!

LGTM, just small nits.

ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClient.java
<https://reviews.apache.org/r/27987/#comment103130>

    nit: space before "{". Maybe implement Closeable?

ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java
<https://reviews.apache.org/r/27987/#comment103134>

    Use "properties.load(Reader)" instead, so you can force UTF-8 encoding.

ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java
<https://reviews.apache.org/r/27987/#comment103135>

    Doesn't this work? for (Map.Entry<String, String> entry : hiveConf)

ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java
<https://reviews.apache.org/r/27987/#comment103137>

    Is Hive still using commons-logging? slf4j makes this much better since it handles format strings for you...

ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java
<https://reviews.apache.org/r/27987/#comment103145>

    Don't you get warnings here since JobHandle needs a type parameter?

ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java
<https://reviews.apache.org/r/27987/#comment103148>

    You could use: new URI(path).getScheme() != null

ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java
<https://reviews.apache.org/r/27987/#comment103150>

    You could use: new File(path).toURI().toURL()

- Marcelo Vanzin


On Nov. 14, 2014, 3:43 a.m., chengxiang li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27987/
> -----------------------------------------------------------
> 
> (Updated Nov. 14, 2014, 3:43 a.m.)
> 
> 
> Review request for hive, Rui Li, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8833
>     https://issues.apache.org/jira/browse/HIVE-8833
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Hive should support submitting Spark jobs through both a local Spark client and a remote Spark client. We should unify the Spark client API and implement the remote Spark client through the Remote Spark Context.
> 
> 
> Diffs
> -----
> 
>   ql/pom.xml 06d7f27 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClient.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java ee16c9e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 2fea62d 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e3e6d16 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 51e0510 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobRef.java bf43b6e 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java d4d14a3 
>   spark-client/src/main/java/org/apache/hive/spark/client/SparkClient.java 8346b28 
>   spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 5af66ee 
> 
> Diff: https://reviews.apache.org/r/27987/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> chengxiang li
> 
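The reviewer's suggestions above (UTF-8-safe properties loading, URI scheme detection, and file-to-URL conversion) can be sketched as below. This is a hypothetical illustration of the suggested calls, not the actual Hive code; the helper names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class ReviewSuggestions {
    // Properties.load(InputStream) assumes ISO-8859-1; load(Reader) lets the
    // caller force UTF-8, which is what the review comment recommends.
    static Properties loadUtf8(InputStream in) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }

    // A path is already fully qualified if its URI has an explicit scheme.
    static boolean hasScheme(String path) throws URISyntaxException {
        return new URI(path).getScheme() != null;
    }

    // Converting a local path via File avoids hand-building "file:" URLs.
    static URL toUrl(String localPath) throws IOException {
        return new File(localPath).toURI().toURL();
    }

    public static void main(String[] args) throws Exception {
        Properties p = loadUtf8(new ByteArrayInputStream(
            "spark.app.name=Hive on Spark".getBytes(StandardCharsets.UTF_8)));
        System.out.println(p.getProperty("spark.app.name"));
        System.out.println(hasScheme("hdfs://namenode:8020/tmp/jar"));
        System.out.println(hasScheme("/tmp/jar"));
    }
}
```

The UTF-8 point matters because `Properties.load(InputStream)` is specified to decode the stream as ISO-8859-1, so non-Latin configuration values would be silently mangled without the `Reader` overload.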
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
E-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounterGroup.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounters.java PRE-CREATION 
  spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java PRE-CREATION 
  spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestKryoMessageCodec.java PRE-CREATION 
  spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java PRE-CREATION 

Diff: https://reviews.apache.org/r/28779/diff/


Testing
-------

spark-client unit tests, plus some qtests.


Thanks,

Marcelo Vanzin
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28779/
-----------------------------------------------------------

(Updated Dec. 8, 2014, 7:40 p.m.)


Review request for hive, Brock Noland, chengxiang li, Szehon Ho, and Xuefu Zhang.


Bugs: HIVE-9036
    https://issues.apache.org/jira/browse/HIVE-9036


Repository: hive-git


Description (updated)
-------

This patch replaces akka with a simple netty-based RPC layer. It doesn't add any features on top of the existing spark-client API, which is unchanged (except for the need to add empty constructors in some places).

With the new backend we can think about adding some nice features such as future listeners (which were awkward with akka because of Scala), but those are left for a different time.

The full change set, with more detailed descriptions, can be seen here:
https://github.com/vanzin/hive/commits/spark-client-netty


Diffs
-----

  pom.xml 630b10ce35032e4b2dee50ef3dfe5feb58223b78 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java PRE-CREATION 
  spark-client/pom.xml PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/client/ClientUtils.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/client/Protocol.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/client/metrics/InputMetrics.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/client/metrics/Metrics.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleReadMetrics.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleWriteMetrics.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/KryoMessageCodec.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/README.md PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcException.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounterGroup.java PRE-CREATION 
  spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounters.java PRE-CREATION 
  spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java PRE-CREATION 
  spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestKryoMessageCodec.java PRE-CREATION 
  spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java PRE-CREATION 

Diff: https://reviews.apache.org/r/28779/diff/


Testing
-------

spark-client unit tests, plus some qtests.


Thanks,

Marcelo Vanzin
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28779/
-----------------------------------------------------------

(Updated Dec. 8, 2014, 7:47 p.m.)


Review request for hive, Brock Noland, chengxiang li, Szehon Ho, and Xuefu Zhang.


Bugs: HIVE-9036
    https://issues.apache.org/jira/browse/HIVE-9036


Repository: hive-git


Diff: https://reviews.apache.org/r/28779/diff/


Testing
-------

spark-client unit tests, plus some qtests.


Thanks,

Marcelo Vanzin
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
> On Dec. 8, 2014, 9:03 p.m., Brock Noland wrote:
> > Hey Marcelo,
> >
> > When I send an HTTP request to the port where RSC is listening, the message below is printed. So it's doing a good job in that it's checking the max message size, which is awesome, but I feel we need to:
> >
> > 1) Add a small header so that when junk data is sent to this port we can log a better exception than the one below. As I mentioned, we've had massive problems with this in flume, which also uses netty for communication.
> >
> > 2) Ensure the incoming size is not negative.
> >
> > 2014-12-08 20:56:41,070 WARN [RPC-Handler-7]: rpc.RpcDispatcher (RpcDispatcher.java:exceptionCaught(154)) - [HelloDispatcher] Caught exception in channel pipeline.
> > io.netty.handler.codec.DecoderException: java.lang.IllegalArgumentException: Message exceeds maximum allowed size (10485760 bytes).
> >     at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:280)
> >     at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:149)
> >     at io.netty.handler.codec.ByteToMessageCodec.channelRead(ByteToMessageCodec.java:108)
> >     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
> >     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
> >     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
> >     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
> >     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> >     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> >     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> >     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> >     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
> >     at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.lang.IllegalArgumentException: Message exceeds maximum allowed size (10485760 bytes).
> >     at org.apache.hive.spark.client.rpc.KryoMessageCodec.checkSize(KryoMessageCodec.java:117)
> >     at org.apache.hive.spark.client.rpc.KryoMessageCodec.decode(KryoMessageCodec.java:77)
> >     at io.netty.handler.codec.ByteToMessageCodec$1.decode(ByteToMessageCodec.java:42)
> >     at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:249)
> >     ... 12 more

I can add the check for negative sizes, but I still don't understand why you want a header. It doesn't serve any practical purpose. The protocol itself has a handshake that needs to succeed for the connection to be established; adding a header will add nothing to the process, just complexity.

- Marcelo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28779/#review64279
-----------------------------------------------------------
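The two fixes discussed above (a clear error for junk data on the port, and rejecting negative length prefixes) can be sketched as a standalone size check. This is a hypothetical illustration of the idea, not the actual `KryoMessageCodec.checkSize()` from the patch; only the 10485760-byte limit is taken from the log in the thread.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the length-prefix validation discussed above.
// The real KryoMessageCodec in the patch may differ in detail.
public class FrameSizeCheck {
    static final int MAX_MESSAGE_SIZE = 10485760; // 10 MB, as in the log above

    // Validates the 4-byte big-endian length prefix of an incoming frame.
    static int checkSize(ByteBuffer in) {
        int size = in.getInt();
        if (size < 0) {
            // Negative sizes can only come from junk data (e.g. random bytes
            // hitting the RPC port), so fail fast with a clear message.
            throw new IllegalArgumentException("Invalid message size: " + size);
        }
        if (size > MAX_MESSAGE_SIZE) {
            throw new IllegalArgumentException(
                "Message exceeds maximum allowed size (" + MAX_MESSAGE_SIZE + " bytes).");
        }
        return size;
    }

    public static void main(String[] args) {
        // "GET " read as a big-endian int is 0x47455420 = 1195725856, far above
        // the 10 MB cap, so an HTTP request is rejected on the first 4 bytes.
        ByteBuffer httpJunk = ByteBuffer.wrap("GET /index.html".getBytes());
        try {
            checkSize(httpJunk);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that an HTTP request happens to decode to a huge positive length, which is why the max-size check already catches it; the negative check covers the remaining quarter of random 4-byte prefixes.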
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28779/
-----------------------------------------------------------

(Updated Dec. 8, 2014, 9:11 p.m.)


Review request for hive, Brock Noland, chengxiang li, Szehon Ho, and Xuefu Zhang.


Bugs: HIVE-9036
    https://issues.apache.org/jira/browse/HIVE-9036


Repository: hive-git


Diff: https://reviews.apache.org/r/28779/diff/


Testing
-------

spark-client unit tests, plus some qtests.


Thanks,

Marcelo Vanzin
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
> On Dec. 8, 2014, 9:03 p.m., Brock Noland wrote:
> > Hey Marcelo,
> >
> > When I send an HTTP request to the port where RSC is listening, the message below is printed. So it's doing a good job in that it's checking the max message size, which is awesome, but I feel we need to:
> >
> > 1) Add a small header so that when junk data is sent to this port we can log a better exception than the one below. As I mentioned, we've had massive problems with this in flume, which also uses netty for communication.
> >
> > 2) Ensure the incoming size is not negative.
> >
> > [...]
>
> Marcelo Vanzin wrote:
>     I can add the check for negative sizes, but I still don't understand why you want a header. It doesn't serve any practical purpose. The protocol itself has a handshake that needs to succeed for the connection to be established; adding a header will add nothing to the process, just complexity.
>
> Brock Noland wrote:
>     The only thing I would add is that it's easy for engineers who work on this to look at the exception and know that it's not related, but it's not easy for operations folks. When they turn on debug logging and see these exceptions, they will get taken off the trail of the real problem they are trying to debug.

Ops folks should not turn on debug logging unless they're told to; otherwise they'll potentially see a lot of these kinds of things. If they do turn on debug logging by themselves, then they shouldn't be surprised to see things they may not fully understand. There's a reason why it's called "debug", and not "just print the log messages specific to the problem I'm having".

- Marcelo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28779/#review64279
-----------------------------------------------------------
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28779/
-----------------------------------------------------------

(Updated Dec. 8, 2014, 9:52 p.m.)


Review request for hive, Brock Noland, chengxiang li, Szehon Ho, and Xuefu Zhang.


Bugs: HIVE-9036
    https://issues.apache.org/jira/browse/HIVE-9036


Repository: hive-git


Diff: https://reviews.apache.org/r/28779/diff/


Testing
-------

spark-client unit tests, plus some qtests.


Thanks,

Marcelo Vanzin
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28779/
-----------------------------------------------------------

(Updated Dec. 8, 2014, 9:54 p.m.)


Review request for hive, Brock Noland, chengxiang li, Szehon Ho, and Xuefu Zhang.


Bugs: HIVE-9036
    https://issues.apache.org/jira/browse/HIVE-9036


Repository: hive-git


Diff: https://reviews.apache.org/r/28779/diff/


Testing
-------

spark-client unit tests, plus some qtests.


Thanks,

Marcelo Vanzin
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28779/
-----------------------------------------------------------

(Updated Dec. 9, 2014, 1:01 a.m.)


Review request for hive, Brock Noland, chengxiang li, Szehon Ho, and Xuefu Zhang.


Bugs: HIVE-9036
    https://issues.apache.org/jira/browse/HIVE-9036


Repository: hive-git


Diff: https://reviews.apache.org/r/28779/diff/


Testing
-------

spark-client unit tests, plus some qtests.


Thanks,

Marcelo Vanzin
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28779/ --- (Updated Dec. 9, 2014, 6:49 p.m.) Review request for hive, Brock Noland, chengxiang li, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-9036 https://issues.apache.org/jira/browse/HIVE-9036 Repository: hive-git Description --- This patch replaces akka with a simple netty-based RPC layer. It doesn't add any features on top of the existing spark-client API, which is unchanged (except for the need to add empty constructors in some places). With the new backend we can think about adding some nice features such as future listeners (which were awkward with akka because of Scala), but those are left for a different time. The full change set, with more detailed descriptions, can be seen here: https://github.com/vanzin/hive/commits/spark-client-netty Diffs (updated) - pom.xml 630b10ce35032e4b2dee50ef3dfe5feb58223b78 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java PRE-CREATION spark-client/pom.xml PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/ClientUtils.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/Protocol.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/InputMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/Metrics.java PRE-CREATION 
spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleReadMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleWriteMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/KryoMessageCodec.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/README.md PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcException.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounterGroup.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounters.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestKryoMessageCodec.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java PRE-CREATION Diff: https://reviews.apache.org/r/28779/diff/ Testing --- spark-client unit tests, plus some qtests. Thanks, Marcelo Vanzin
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
> On Dec. 9, 2014, 7:05 p.m., Xuefu Zhang wrote: > > pom.xml, line 152 > > <https://reviews.apache.org/r/28779/diff/7/?file=786238#file786238line152> > > > > Is there a reason that we cannot keep 3.7.0? Upgrading a dep version > > usually gives some headaches. This version is not used anywhere in the Hive build. In fact, there is no version "3.7.0.Final" of "io.netty" (that's for the old "org.jboss.netty" package). - Marcelo --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28779/#review64411 ------- On Dec. 9, 2014, 6:49 p.m., Marcelo Vanzin wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/28779/ > --- > > (Updated Dec. 9, 2014, 6:49 p.m.) > > > Review request for hive, Brock Noland, chengxiang li, Szehon Ho, and Xuefu > Zhang. > > > Bugs: HIVE-9036 > https://issues.apache.org/jira/browse/HIVE-9036 > > > Repository: hive-git > > > Description > --- > > This patch replaces akka with a simple netty-based RPC layer. It doesn't add > any features on top of the existing spark-client API, which is unchanged > (except for the need to add empty constructors in some places). > > With the new backend we can think about adding some nice features such as > future listeners (which were awkward with akka because of Scala), but those > are left for a different time. 
> > The full change set, with more detailed descriptions, can be seen here: > https://github.com/vanzin/hive/commits/spark-client-netty > > > Diffs > - > > pom.xml 630b10ce35032e4b2dee50ef3dfe5feb58223b78 > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java > PRE-CREATION > spark-client/pom.xml PRE-CREATION > spark-client/src/main/java/org/apache/hive/spark/client/ClientUtils.java > PRE-CREATION > spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java > PRE-CREATION > spark-client/src/main/java/org/apache/hive/spark/client/Protocol.java > PRE-CREATION > spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java > PRE-CREATION > > spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java > PRE-CREATION > > spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java > PRE-CREATION > > spark-client/src/main/java/org/apache/hive/spark/client/metrics/InputMetrics.java > PRE-CREATION > > spark-client/src/main/java/org/apache/hive/spark/client/metrics/Metrics.java > PRE-CREATION > > spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleReadMetrics.java > PRE-CREATION > > spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleWriteMetrics.java > PRE-CREATION > > spark-client/src/main/java/org/apache/hive/spark/client/rpc/KryoMessageCodec.java > PRE-CREATION > spark-client/src/main/java/org/apache/hive/spark/client/rpc/README.md > PRE-CREATION > spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java > PRE-CREATION > > spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java > PRE-CREATION > > spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java > PRE-CREATION > > 
spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcException.java > PRE-CREATION > spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java > PRE-CREATION > spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java > PRE-CREATION > > spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounterGroup.java > PRE-CREATION > spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounters.java > PRE-CREATION > > spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java > PRE-CREATION > > spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestKryoMessageCodec.java > PRE-CREATION > spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java > PRE-CREATION > > Diff: https://reviews.apache.org/r/28779/diff/ > > > Testing > --- > > spark-client unit tests, plus some qtests. > > > Thanks, > > Marcelo Vanzin > >
Re: Review Request 28779: [spark-client] Netty-based RPC implementation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28779/ --- (Updated Dec. 9, 2014, 9:17 p.m.) Review request for hive, Brock Noland, chengxiang li, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-9036 https://issues.apache.org/jira/browse/HIVE-9036 Repository: hive-git Description --- This patch replaces akka with a simple netty-based RPC layer. It doesn't add any features on top of the existing spark-client API, which is unchanged (except for the need to add empty constructors in some places). With the new backend we can think about adding some nice features such as future listeners (which were awkward with akka because of Scala), but those are left for a different time. The full change set, with more detailed descriptions, can be seen here: https://github.com/vanzin/hive/commits/spark-client-netty Diffs (updated) - pom.xml 630b10ce35032e4b2dee50ef3dfe5feb58223b78 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java PRE-CREATION spark-client/pom.xml PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/ClientUtils.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/Protocol.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/InputMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/Metrics.java PRE-CREATION 
spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleReadMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleWriteMetrics.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/KryoMessageCodec.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/README.md PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcException.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounterGroup.java PRE-CREATION spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounters.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestKryoMessageCodec.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java PRE-CREATION Diff: https://reviews.apache.org/r/28779/diff/ Testing --- spark-client unit tests, plus some qtests. Thanks, Marcelo Vanzin
Re: Review Request 29145: HIVE-9094 TimeoutException when trying get executor count from RSC [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/#review65348 --- Ship it! +1 to Xuefu's comments. The config name also looks very generic, since it's only applied to a couple of jobs submitted to the client. But I don't have a good suggestion here. - Marcelo Vanzin On Dec. 17, 2014, 6:28 a.m., chengxiang li wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/29145/ > --- > > (Updated Dec. 17, 2014, 6:28 a.m.) > > > Review request for hive and Xuefu Zhang. > > > Bugs: HIVE-9094 > https://issues.apache.org/jira/browse/HIVE-9094 > > > Repository: hive-git > > > Description > --- > > RemoteHiveSparkClient::getExecutorCount times out after 5s because the Spark cluster has > not launched yet. > 1. Make the timeout value configurable. > 2. Set the default timeout value to 60s. > 3. Enable the timeout for getting Spark job info and Spark stage info. > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 22f052a > > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java > 5d6a02c > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java > e1946d5 > > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java > 6217de4 > > Diff: https://reviews.apache.org/r/29145/diff/ > > > Testing > --- > > > Thanks, > > chengxiang li > >
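The fix described in the patch boils down to threading a configurable timeout into the blocking waits on the remote client instead of a hard-coded 5s. A minimal sketch of the pattern, with illustrative names (`TimeoutDemo`, `DEFAULT_TIMEOUT_SECONDS`); the real Hive config key and classes differ:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class TimeoutDemo {
    // New default from the patch description; the previous hard-coded wait was 5s.
    static final long DEFAULT_TIMEOUT_SECONDS = 60;

    static int getExecutorCount(Future<Integer> pending, long timeoutSeconds)
            throws Exception {
        // Future.get(timeout, unit) throws TimeoutException if the Spark
        // cluster has not come up within the configured window.
        return pending.get(timeoutSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Integer> call = pool.submit(() -> 4);  // stand-in for the RSC call
        System.out.println(getExecutorCount(call, DEFAULT_TIMEOUT_SECONDS));
        pool.shutdown();
    }
}
```

The same `get(timeout, unit)` call then serves the job-info and stage-info requests mentioned in point 3.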
Review Request 29832: HIVE-9178. Add a synchronous RPC API to the remote Spark context.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29832/ --- Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9178 https://issues.apache.org/jira/browse/HIVE-9178 Repository: hive-git Description --- HIVE-9178. Add a synchronous RPC API to the remote Spark context. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 5c3ca018bb177ef9fd9fb24b054a9db29274b31e spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 spark-client/src/main/java/org/apache/hive/spark/client/SparkClient.java 5e767ef5eb47e493a332607204f4c522028d7d0e spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java f8b2202a465bb8abe3d2c34e49ade6387482738c Diff: https://reviews.apache.org/r/29832/diff/ Testing --- Thanks, Marcelo Vanzin
Re: Review Request 29832: HIVE-9178. Add a synchronous RPC API to the remote Spark context.
> On Jan. 13, 2015, 6:47 a.m., chengxiang li wrote: > > spark-client/src/main/java/org/apache/hive/spark/client/SparkClient.java, > > line 55 > > <https://reviews.apache.org/r/29832/diff/1/?file=818434#file818434line55> > > > > At the API level, it's still an asynchronous RPC API; given the use case for > > this API described in the javadoc, do you think it would be cleaner to > > supply a synchronous API like T run(Job job)? No. With a client-side synchronous API, it's awkward to specify things like timeouts - you either need explicit parameters that are not really part of the RPC, or extra configuration. Here, you just say `client.run().get(someTimeout)` if you want the call to be synchronous on the client side. - Marcelo --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29832/#review67813 ----------- On Jan. 13, 2015, 12:31 a.m., Marcelo Vanzin wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/29832/ > --- > > (Updated Jan. 13, 2015, 12:31 a.m.) > > > Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. > > > Bugs: HIVE-9178 > https://issues.apache.org/jira/browse/HIVE-9178 > > > Repository: hive-git > > > Description > --- > > HIVE-9178. Add a synchronous RPC API to the remote Spark context. 
> > > Diffs > - > > > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java > 5c3ca018bb177ef9fd9fb24b054a9db29274b31e > spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java > f9c10b196ab47b5b4f4c0126ad455869ab68f0ca > spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java > 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 > spark-client/src/main/java/org/apache/hive/spark/client/SparkClient.java > 5e767ef5eb47e493a332607204f4c522028d7d0e > > spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java > f8b2202a465bb8abe3d2c34e49ade6387482738c > > Diff: https://reviews.apache.org/r/29832/diff/ > > > Testing > --- > > > Thanks, > > Marcelo Vanzin > >
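The trade-off argued in this thread can be sketched in a few lines: a Future-returning API already subsumes a synchronous one, because the caller picks the timeout at the call site via `get()` rather than the API baking it into a parameter or configuration. The `Client` interface below is illustrative, not the real `SparkClient` API:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SyncOverAsync {
    // Hypothetical stand-in for an async client: run() returns a handle.
    interface Client {
        Future<String> run(Callable<String> job);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Client client = job -> pool.submit(job);

        // Asynchronous use: submit now, collect whenever convenient.
        Future<String> handle = client.run(() -> "done");

        // Synchronous use: same API, caller-chosen timeout,
        // i.e. client.run(job).get(someTimeout).
        String result = handle.get(5, TimeUnit.SECONDS);
        System.out.println(result);
        pool.shutdown();
    }
}
```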
Re: Review Request 29832: HIVE-9178. Add a synchronous RPC API to the remote Spark context.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29832/ --- (Updated Jan. 14, 2015, 8:45 p.m.) Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9178 https://issues.apache.org/jira/browse/HIVE-9178 Repository: hive-git Description (updated) --- Fix return value of synchronous RPCs. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 5c3ca018bb177ef9fd9fb24b054a9db29274b31e spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 spark-client/src/main/java/org/apache/hive/spark/client/SparkClient.java 5e767ef5eb47e493a332607204f4c522028d7d0e spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java f8b2202a465bb8abe3d2c34e49ade6387482738c spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 795d62c776cec5e9da2a24b7d40bc749a03186ab Diff: https://reviews.apache.org/r/29832/diff/ Testing --- Thanks, Marcelo Vanzin
Re: Review Request 29832: HIVE-9178. Add a synchronous RPC API to the remote Spark context.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29832/ --- (Updated Jan. 14, 2015, 8:47 p.m.) Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9178 https://issues.apache.org/jira/browse/HIVE-9178 Repository: hive-git Description (updated) --- Add a synchronous RPC API to the remote Spark context. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 5c3ca018bb177ef9fd9fb24b054a9db29274b31e spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 spark-client/src/main/java/org/apache/hive/spark/client/SparkClient.java 5e767ef5eb47e493a332607204f4c522028d7d0e spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java f8b2202a465bb8abe3d2c34e49ade6387482738c spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 795d62c776cec5e9da2a24b7d40bc749a03186ab Diff: https://reviews.apache.org/r/29832/diff/ Testing --- Thanks, Marcelo Vanzin
Review Request 29954: HIVE-9179. Add listener API to JobHandle.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/ --- Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9179 https://issues.apache.org/jira/browse/HIVE-9179 Repository: hive-git Description --- HIVE-9179. Add listener API to JobHandle. Diffs - spark-client/pom.xml 77016df61a0bcbd94058bcbd2825c6c210a70e14 spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca spark-client/src/main/java/org/apache/hive/spark/client/JobHandle.java e760ce35d92bedf4d301b08ec57d1c2dc37a39f0 spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java 1b8feedb0b23aa7897dc6ac37ea5c0209e71d573 spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java a30d8cbbaae9d25b1cffdc286b546f549e439545 spark-client/src/test/java/org/apache/hive/spark/client/TestJobHandle.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 795d62c776cec5e9da2a24b7d40bc749a03186ab Diff: https://reviews.apache.org/r/29954/diff/ Testing --- Thanks, Marcelo Vanzin
Re: Review Request 29954: HIVE-9179. Add listener API to JobHandle.
> On Jan. 16, 2015, 7:14 p.m., Xuefu Zhang wrote: > > One additional question for my understanding: > > > > Originally Hive has to poll to get the job ID after submitting a Spark job, in > > RemoteSparkJobStatus.getSparkJobInfo(). With this patch, do we still need > > to do this? Yeah, that's still needed. I thought about adding an `onSparkJobStarted` callback or something. If there's interest in that I can add it; it should be easy. > On Jan. 16, 2015, 7:14 p.m., Xuefu Zhang wrote: > > spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java, > > line 442 > > <https://reviews.apache.org/r/29954/diff/1/?file=823288#file823288line442> > > > > This method, together with the other existing handle() methods, is invoked > > using reflection, which makes the code hard to understand. I'm wondering if > > this can be improved. The alternative is having cascading `if..else if..else` blocks with a bunch of `instanceof` checks, as was done in the akka-based code before. I think that's much uglier and harder to read. - Marcelo --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/#review68430 --- On Jan. 16, 2015, 1:05 a.m., Marcelo Vanzin wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/29954/ > --- > > (Updated Jan. 16, 2015, 1:05 a.m.) > > > Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. > > > Bugs: HIVE-9179 > https://issues.apache.org/jira/browse/HIVE-9179 > > > Repository: hive-git > > > Description > --- > > HIVE-9179. Add listener API to JobHandle. 
> > > Diffs > - > > spark-client/pom.xml 77016df61a0bcbd94058bcbd2825c6c210a70e14 > spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java > f9c10b196ab47b5b4f4c0126ad455869ab68f0ca > spark-client/src/main/java/org/apache/hive/spark/client/JobHandle.java > e760ce35d92bedf4d301b08ec57d1c2dc37a39f0 > spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java > 1b8feedb0b23aa7897dc6ac37ea5c0209e71d573 > spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java > 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 > > spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java > a30d8cbbaae9d25b1cffdc286b546f549e439545 > spark-client/src/test/java/org/apache/hive/spark/client/TestJobHandle.java > PRE-CREATION > > spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java > 795d62c776cec5e9da2a24b7d40bc749a03186ab > > Diff: https://reviews.apache.org/r/29954/diff/ > > > Testing > --- > > > Thanks, > > Marcelo Vanzin > >
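The reflection-based dispatch being debated can be reduced to a small sketch: one `handle(T)` overload per message type, resolved by the message's runtime class, instead of a cascading chain of `instanceof` checks. Class and message names here are illustrative, not the real protocol:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class Dispatcher {
    // Hypothetical message types; the real protocol messages differ.
    static class JobStarted { }
    static class JobFinished { }

    final List<String> seen = new ArrayList<>();

    void handle(JobStarted msg) { seen.add("started"); }
    void handle(JobFinished msg) { seen.add("finished"); }

    // Locate the overload whose parameter type matches the message's
    // concrete class, then invoke it - no instanceof chain needed.
    void dispatch(Object msg) throws Exception {
        Method m = getClass().getDeclaredMethod("handle", msg.getClass());
        m.invoke(this, msg);
    }

    public static void main(String[] args) throws Exception {
        Dispatcher d = new Dispatcher();
        d.dispatch(new JobStarted());
        d.dispatch(new JobFinished());
        System.out.println(d.seen);
    }
}
```

The cost is that the link between a message and its handler is invisible to the compiler, which is the readability concern raised above; the benefit is that adding a message type only requires adding one overload.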
Re: Review Request 29954: HIVE-9179. Add listener API to JobHandle.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/ --- (Updated Jan. 16, 2015, 9:22 p.m.) Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9179 https://issues.apache.org/jira/browse/HIVE-9179 Repository: hive-git Description --- HIVE-9179. Add listener API to JobHandle. Diffs (updated) - spark-client/pom.xml 77016df61a0bcbd94058bcbd2825c6c210a70e14 spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca spark-client/src/main/java/org/apache/hive/spark/client/JobHandle.java e760ce35d92bedf4d301b08ec57d1c2dc37a39f0 spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java 1b8feedb0b23aa7897dc6ac37ea5c0209e71d573 spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java a30d8cbbaae9d25b1cffdc286b546f549e439545 spark-client/src/test/java/org/apache/hive/spark/client/TestJobHandle.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 795d62c776cec5e9da2a24b7d40bc749a03186ab Diff: https://reviews.apache.org/r/29954/diff/ Testing --- Thanks, Marcelo Vanzin
Re: Review Request 29954: HIVE-9179. Add listener API to JobHandle.
> On Jan. 16, 2015, 10:35 p.m., Xuefu Zhang wrote: > > spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java, > > line 179 > > <https://reviews.apache.org/r/29954/diff/1-2/?file=823286#file823286line179> > > > > Here sparkJobIds.add() is in the synchronized block. However, we have > > code accessing the same variable (sparkJobIds) such as in the > > RemoteSparkJobStatus class. Does that also need protection? No, it doesn't. The job id list itself is thread-safe. The synchronization happens here so that we notify all listeners of everything. We don't want a listener being registered concurrently with a new spark job arriving to miss that event. (That reminds me that I probably should switch the order of events around if a listener is added after the handle is in a final state. Stay tuned.) - Marcelo --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/#review68492 ----------- On Jan. 16, 2015, 9:22 p.m., Marcelo Vanzin wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/29954/ > --- > > (Updated Jan. 16, 2015, 9:22 p.m.) > > > Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. > > > Bugs: HIVE-9179 > https://issues.apache.org/jira/browse/HIVE-9179 > > > Repository: hive-git > > > Description > --- > > HIVE-9179. Add listener API to JobHandle. 
> > > Diffs > - > > spark-client/pom.xml 77016df61a0bcbd94058bcbd2825c6c210a70e14 > spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java > f9c10b196ab47b5b4f4c0126ad455869ab68f0ca > spark-client/src/main/java/org/apache/hive/spark/client/JobHandle.java > e760ce35d92bedf4d301b08ec57d1c2dc37a39f0 > spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java > 1b8feedb0b23aa7897dc6ac37ea5c0209e71d573 > spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java > 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 > > spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java > a30d8cbbaae9d25b1cffdc286b546f549e439545 > spark-client/src/test/java/org/apache/hive/spark/client/TestJobHandle.java > PRE-CREATION > > spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java > 795d62c776cec5e9da2a24b7d40bc749a03186ab > > Diff: https://reviews.apache.org/r/29954/diff/ > > > Testing > --- > > > Thanks, > > Marcelo Vanzin > >
Re: Review Request 29954: HIVE-9179. Add listener API to JobHandle.
> On Jan. 16, 2015, 10:35 p.m., Xuefu Zhang wrote: > > spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java, > > line 179 > > <https://reviews.apache.org/r/29954/diff/1-2/?file=823286#file823286line179> > > > > Here sparkJobIds.add() is in the synchronized block. However, we have > > code accessing the same variable (sparkJobIds) such as in the > > RemoteSparkJobStatus class. Does that also need protection? > > Marcelo Vanzin wrote: > No, it doesn't. The job id list itself is thread-safe. The synchronization > happens here so that we notify all listeners of everything. We don't want a > listener being registered concurrently with a new spark job arriving to miss > that event. > > (That reminds me that I probably should switch the order of events around > if a listener is added after the handle is in a final state. Stay tuned.) > > Xuefu Zhang wrote: > In that case, can we move sparkJobIds.add() outside the sync block? I don't think that works well. That can cause two different conditions depending on what "outside" means: - if you do it before the synchronized block, the listener may be notified twice of the same Spark job - if you do it after the synchronized block, the listener will be called with a Spark job that is not yet listed in `handle.getSparkJobIds()`. Since I don't believe this will cause any performance issue at all, I'd rather keep the behavior consistent. - Marcelo --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/#review68492 --- On Jan. 16, 2015, 9:22 p.m., Marcelo Vanzin wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/29954/ > --- > > (Updated Jan. 16, 2015, 9:22 p.m.) > > > Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. > > > Bugs: HIVE-9179 > https://issues.apache.org/jira/browse/HIVE-9179 > > > Repository: hive-git > > > Description > --- > > HIVE-9179. Add listener API to JobHandle. 
> > > Diffs > - > > spark-client/pom.xml 77016df61a0bcbd94058bcbd2825c6c210a70e14 > spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java > f9c10b196ab47b5b4f4c0126ad455869ab68f0ca > spark-client/src/main/java/org/apache/hive/spark/client/JobHandle.java > e760ce35d92bedf4d301b08ec57d1c2dc37a39f0 > spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java > 1b8feedb0b23aa7897dc6ac37ea5c0209e71d573 > spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java > 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 > > spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java > a30d8cbbaae9d25b1cffdc286b546f549e439545 > spark-client/src/test/java/org/apache/hive/spark/client/TestJobHandle.java > PRE-CREATION > > spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java > 795d62c776cec5e9da2a24b7d40bc749a03186ab > > Diff: https://reviews.apache.org/r/29954/diff/ > > > Testing > --- > > > Thanks, > > Marcelo Vanzin > >
Re: Review Request 29954: HIVE-9179. Add listener API to JobHandle.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/ --- (Updated Jan. 16, 2015, 11:24 p.m.) Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9179 https://issues.apache.org/jira/browse/HIVE-9179 Repository: hive-git Description --- HIVE-9179. Add listener API to JobHandle. Diffs (updated) - spark-client/pom.xml 77016df61a0bcbd94058bcbd2825c6c210a70e14 spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca spark-client/src/main/java/org/apache/hive/spark/client/JobHandle.java e760ce35d92bedf4d301b08ec57d1c2dc37a39f0 spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java 1b8feedb0b23aa7897dc6ac37ea5c0209e71d573 spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java a30d8cbbaae9d25b1cffdc286b546f549e439545 spark-client/src/test/java/org/apache/hive/spark/client/TestJobHandle.java PRE-CREATION spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 795d62c776cec5e9da2a24b7d40bc749a03186ab Diff: https://reviews.apache.org/r/29954/diff/ Testing --- Thanks, Marcelo Vanzin
Re: Review Request 29954: HIVE-9179. Add listener API to JobHandle.
> On Jan. 17, 2015, 12:19 a.m., Xuefu Zhang wrote: > > spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java, > > line 179 > > <https://reviews.apache.org/r/29954/diff/1-2/?file=823286#file823286line179> > > > > Sorry I didn't get it, but why? > > Clarity but not perf is my concern. Here we are notifying listeners > > with a new Spark job ID, which is done in the for loop, which is > > synchronized. This means no listener may be added or removed from the > > listeners. On the other hand, sparkJobIds.add(sparkJobId) seems irrelevant > > to any changes to listeners, unless I missed anything. I don't understand > > why either of the two cases might happen as you suggested. Threads: T1 updating the job handle, T2 adding a listener Case 1: Statement 1 (S1): sparkJobIds.add(sparkJobId); Statement 2 (S2): synchronized (listeners) { /* call onSparkJobStarted(newSparkJobId) on every listener */ } Timeline: T1: executes S1 T2: calls addListener(), new listener is notified of the sparkJobId added above T1: executes S2. New listener is notified again of new spark job ID. Case 2: Invert S1 and S2. T2: calls addListener() T1: executes S1. Listener is called with the current state of the handle and new Spark job ID. Listener checks `handle.getSparkJobIDs().contains(newSparkJobId)`, check fails. Those seem pretty easy to understand to me. The current code avoids both of them. - Marcelo --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/#review68513 ----------- On Jan. 16, 2015, 11:24 p.m., Marcelo Vanzin wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/29954/ > --- > > (Updated Jan. 16, 2015, 11:24 p.m.) > > > Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. > > > Bugs: HIVE-9179 > https://issues.apache.org/jira/browse/HIVE-9179 > > > Repository: hive-git > > > Description > --- > > HIVE-9179. Add listener API to JobHandle. 
Re: Review Request 29954: HIVE-9179. Add listener API to JobHandle.
> On Jan. 17, 2015, 12:19 a.m., Xuefu Zhang wrote: > > spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java, > > line 179 > > <https://reviews.apache.org/r/29954/diff/1-2/?file=823286#file823286line179> > > > > Sorry I didn't get it, but why? > > Clarity but not perf is my concern. Here we are notifying listeners > > with a new Spark job ID, which is done in the for loop, which is > > synchronized. This means no listener may be added or removed from the > > listeners. On the other hand, sparkJobIds.add(sparkJobId) seems irrelevant > > to any changes to listeners, unless I missed anything. I don't understand > > why either of the two cases might happen as you suggested. > > Marcelo Vanzin wrote: > Threads: T1 updating the job handle, T2 adding a listener > > Case 1: >Statement 1 (S1): sparkJobIds.add(sparkJobId); >Statement 2 (S2): synchronized (listeners) { /* call > onSparkJobStarted(newSparkJobId) on every listener */ } > > Timeline: > T1: executes S1 > T2: calls addListener(), new listener is notified of the sparkJobId added > above > T1: executes S2. New listener is notified again of new spark job ID. > > > Case 2: > Invert S1 and S2. > > T2: calls addListener() > T1: executes S1. Listener is called with the current state of the handle > and new Spark job ID. Listener checks > `handle.getSparkJobIDs().contains(newSparkJobId)`, check fails. > > > Those seem pretty easy to understand to me. The current code avoids both > of them. > > Xuefu Zhang wrote: > I see. So the shared state of the job handler consists of state, > listeners, and sparkJobIds, which needs to be protected. Thus, I'd suggest we > change synchronize(listeners) to synchronized(this) or declare the method as > "synchronized". No essential difference, but for better clarity. The synchronization is *only* needed because of the listeners. 
It's there so that when you add a listener, you never miss an event - if they didn't exist, you wouldn't need any synchronization anywhere in this class. So it makes better sense to synchronize on the listeners. - Marcelo --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29954/#review68513 ---
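[Editor's note] The locking scheme discussed in this thread can be sketched as follows. This is an illustrative toy, not the actual JobHandleImpl code; the class and method names are made up. The point is that the job-ID update (S1) and the listener notification (S2) share one critical section, and addListener() replays past IDs under the same lock, so a concurrently added listener neither misses an ID (case 2) nor is notified of it twice (case 1).

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the synchronization pattern discussed in the thread above.
public class HandleSketch {
    public interface Listener {
        void onSparkJobStarted(int sparkJobId);
    }

    private final List<Integer> sparkJobIds = new ArrayList<>();
    private final List<Listener> listeners = new ArrayList<>();

    public void addListener(Listener l) {
        synchronized (listeners) {
            listeners.add(l);
            // Replay IDs recorded so far; the lock keeps a concurrent
            // sparkJobStarted() from interleaving with this replay.
            for (int id : sparkJobIds) {
                l.onSparkJobStarted(id);
            }
        }
    }

    public void sparkJobStarted(int sparkJobId) {
        synchronized (listeners) {
            // S1 and S2 from the thread, inside a single critical section.
            sparkJobIds.add(sparkJobId);
            for (Listener l : listeners) {
                l.onSparkJobStarted(sparkJobId);
            }
        }
    }

    public List<Integer> getSparkJobIds() {
        synchronized (listeners) {
            return new ArrayList<>(sparkJobIds);
        }
    }
}
```

As Marcelo notes, the lock exists only for the listeners; locking on the listeners list (rather than `this`) makes that ownership explicit.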
Review Request 30385: Use SASL to establish the remote context connection.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30385/ --- Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9487 https://issues.apache.org/jira/browse/HIVE-9487 Repository: hive-git Description --- Instead of the insecure, ad-hoc auth mechanism currently used, perform a SASL negotiation to establish trust. This requires the secret to be distributed through some secure channel (just like before). Using SASL with DIGEST-MD5 (or GSSAPI, which hasn't been tested and probably wouldn't work well here) also allows us to add encryption without the need for SSL (yay?). Only DIGEST-MD5 has been really tested. Supporting other mechanisms will probably mean adding new callback handlers in the client and server portions, but shouldn't be hard if desired. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java d4d98d7c0c28cdb1d19c700e20537ef405be2e01 spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java ce2f9b6b132dc47f899798e47d18a1f6b0dd707f spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java 3a7149341bac086e5efe931595143d3bebbdb5db spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 5f9be658a855cc15c576f1a98376fcd85475e3b7 spark-client/src/main/java/org/apache/hive/spark/client/rpc/KryoMessageCodec.java 0c29c9441fb3e9daf690510a2c9b5716671e2571 spark-client/src/main/java/org/apache/hive/spark/client/rpc/README.md 2c858a121aaeca6af20f5e332de207694348a030 spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java fffe24b3cbe6a5d7387e751adbc65f5b140c9089 spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java eff640f7b24348043dbce734510698d9294579c6 spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 5e18a3c0b5ea4f1b9c83f78faa3408e2dd479c2c spark-client/src/main/java/org/apache/hive/spark/client/rpc/SaslHandler.java PRE-CREATION 
spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestKryoMessageCodec.java af534375a3ed86a3a9ad57c2f21a9a8bf6113714 spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java ec7842398d3c4112f83f00e8cd3e5d4f9fdf8ca9 Diff: https://reviews.apache.org/r/30385/diff/ Testing --- Unit tests. Thanks, Marcelo Vanzin
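[Editor's note] The negotiation described above uses the standard javax.security.sasl API. The sketch below runs a DIGEST-MD5 handshake in-process with a shared secret; the protocol name, server name, and secret are made-up values for illustration, not the ones the patch uses, and in the real code the challenge/response bytes travel over the RPC channel rather than a local loop.

```java
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.sasl.AuthorizeCallback;
import javax.security.sasl.RealmCallback;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;
import javax.security.sasl.SaslServer;

// Hedged sketch of a DIGEST-MD5 SASL handshake with a pre-shared secret.
public class SaslSketch {
    static final String USER = "client";
    static final char[] SECRET = "shared-secret".toCharArray();

    static CallbackHandler clientHandler() {
        return callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof NameCallback) {
                    ((NameCallback) cb).setName(USER);
                } else if (cb instanceof PasswordCallback) {
                    ((PasswordCallback) cb).setPassword(SECRET);
                } else if (cb instanceof RealmCallback) {
                    RealmCallback rc = (RealmCallback) cb;
                    rc.setText(rc.getDefaultText());
                }
            }
        };
    }

    static CallbackHandler serverHandler() {
        return callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof PasswordCallback) {
                    // Look up the secret for the authenticating user.
                    ((PasswordCallback) cb).setPassword(SECRET);
                } else if (cb instanceof RealmCallback) {
                    RealmCallback rc = (RealmCallback) cb;
                    rc.setText(rc.getDefaultText());
                } else if (cb instanceof AuthorizeCallback) {
                    AuthorizeCallback ac = (AuthorizeCallback) cb;
                    ac.setAuthorized(ac.getAuthenticationID().equals(ac.getAuthorizationID()));
                }
            }
        };
    }

    public static boolean handshake() {
        try {
            SaslServer server = Sasl.createSaslServer("DIGEST-MD5", "rsc",
                "hive-server2", null, serverHandler());
            SaslClient client = Sasl.createSaslClient(new String[] {"DIGEST-MD5"},
                null, "rsc", "hive-server2", null, clientHandler());

            // DIGEST-MD5 starts with a server challenge; loop until both
            // sides report completion.
            byte[] challenge = server.evaluateResponse(new byte[0]);
            while (!client.isComplete() || !server.isComplete()) {
                byte[] response = client.evaluateChallenge(challenge);
                if (server.isComplete()) break;
                challenge = server.evaluateResponse(response != null ? response : new byte[0]);
            }
            return client.isComplete() && server.isComplete();
        } catch (SaslException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Supporting another mechanism would, as the description says, mostly mean swapping in different callback handlers on each side.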
Re: Review Request 30385: Use SASL to establish the remote context connection.
> On Jan. 29, 2015, 12:36 a.m., Xuefu Zhang wrote: > > spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java, line > > 20 > > <https://reviews.apache.org/r/30385/diff/1/?file=839319#file839319line20> > > > > Nit: if you need to submit another patch, let's not auto reorg the > > imports. I changed this because someone broke it... now it's in line with the usual order you see in the rest of Hive code. - Marcelo --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30385/#review70119 ------- On Jan. 28, 2015, 11:22 p.m., Marcelo Vanzin wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/30385/ > --- > > (Updated Jan. 28, 2015, 11:22 p.m.) > > > Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. > > > Bugs: HIVE-9487 > https://issues.apache.org/jira/browse/HIVE-9487 > > > Repository: hive-git > > > Description > --- > > Instead of the insecure, ad-hoc auth mechanism currently used, perform > a SASL negotiation to establish trust. This requires the secret to be > distributed through some secure channel (just like before). > > Using SASL with DIGEST-MD5 (or GSSAPI, which hasn't been tested and > probably wouldn't work well here) also allows us to add encryption > without the need for SSL (yay?). > > Only DIGEST-MD5 has been really tested. Supporting other mechanisms > will probably mean adding new callback handlers in the client and > server portions, but shouldn't be hard if desired. 
Review Request 32631: [HIVE-10143] Properly clean up client state when client times out.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32631/ --- Review request for hive, Szehon Ho and Xuefu Zhang. Repository: hive-git Description --- Clean up needs to occur whenever the client future fails, not just when it's explicitly cancelled. Diffs - spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java b923acf78c8459cf49d47268233b328957a1ae6e spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java 8207514342bed544e1a01fc41c892825f330cf3c Diff: https://reviews.apache.org/r/32631/diff/ Testing --- Thanks, Marcelo Vanzin
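[Editor's note] The idea of the fix, cleanup on any failed completion rather than only on explicit cancellation, can be illustrated as below. The actual RpcServer code uses netty futures; this CompletableFuture version is only a sketch of the pattern, and the names are made up.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: register cleanup on every failure path of the client future.
public class CleanupSketch {
    public static final AtomicInteger cleanups = new AtomicInteger();

    public static CompletableFuture<String> track(CompletableFuture<String> clientFuture) {
        clientFuture.whenComplete((result, error) -> {
            // error is non-null for timeouts and other failures as well as
            // for cancellation, so cleanup here covers every failure path,
            // not just an explicit cancel().
            if (error != null) {
                cleanups.incrementAndGet();
            }
        });
        return clientFuture;
    }
}
```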
Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/#review81103 --- spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java <https://reviews.apache.org/r/33422/#comment131349> This will throw an exception if the child process exits with a non-zero status after the RSC connects back to HS2. I don't think you want that. spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java <https://reviews.apache.org/r/33422/#comment131351> While the only current call site reflects the error message, this method seems more generic than that. Maybe pass the error message as a parameter to the method? - Marcelo Vanzin On April 22, 2015, 12:30 a.m., Chao Sun wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/33422/ > --- > > (Updated April 22, 2015, 12:30 a.m.) > > > Review request for hive and Marcelo Vanzin. > > > Bugs: HIVE-10434 > https://issues.apache.org/jira/browse/HIVE-10434 > > > Repository: hive-git > > > Description > --- > > This patch cancels the connection from HS2 to remote process once the latter > has failed and exited with error code, to > avoid potential long timeout. > It add a new public method cancelClient to the RpcServer class - not sure > whether there's an easier way to do this.. > > > Diffs > - > > > spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java > 71e432d > spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java > 32d4c46 > > Diff: https://reviews.apache.org/r/33422/diff/ > > > Testing > --- > > Tested on my own cluster, and it worked. > > > Thanks, > > Chao Sun > >
Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/#review81328 --- Ship it! Just a minor thing left to fix. spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java <https://reviews.apache.org/r/33422/#comment131664> To avoid races, I'd do: final ClientInfo cinfo = pendingClients.remove(clientId); if (cinfo == null) { /* nothing to do */ } - Marcelo Vanzin
Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
> On April 23, 2015, 6:22 p.m., Xuefu Zhang wrote: > > spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java, > > line 176 > > <https://reviews.apache.org/r/33422/diff/2/?file=939013#file939013line176> > > > > I'm wondering if cinfo can be null here. After the contains() check > > above, things might have changed. So, cinfo is not guaranteed to be not > > null. Yeah, that was my suggestion above. Don't use `containsKey`, instead just remove and check for null. - Marcelo --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/#review81361 ---
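[Editor's note] The remove-and-null-check suggestion can be illustrated with a plain ConcurrentHashMap; the real code's ClientInfo type is replaced by String here purely for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrates the review suggestion: a separate containsKey() check can
// pass and the entry can then vanish before get(), whereas a single
// remove() is atomic.
public class CancelSketch {
    public static final Map<String, String> pendingClients = new ConcurrentHashMap<>();

    // Racy shape: another thread may remove the entry between the two
    // calls, so get() can return null even though containsKey() was true.
    public static String cancelRacy(String clientId) {
        if (pendingClients.containsKey(clientId)) {
            return pendingClients.get(clientId);
        }
        return null;
    }

    // Suggested shape: one atomic operation; a null result simply means
    // there is nothing to cancel.
    public static String cancelAtomic(String clientId) {
        String cinfo = pendingClients.remove(clientId);
        if (cinfo == null) {
            return null; // nothing to do
        }
        return cinfo;
    }
}
```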
Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/#review81520 --- Ship it! Ship It! - Marcelo Vanzin
[jira] [Created] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
Marcelo Vanzin created HIVE-8528: Summary: Add remote Spark client to Hive [Spark Branch] Key: HIVE-8528 URL: https://issues.apache.org/jira/browse/HIVE-8528 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin For the time being, at least, we've decided to build the Spark client (see SPARK-3215) inside Hive. This task tracks merging the ongoing work into the Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-8528: - Attachment: 0001-HIVE-8528-Add-Spark-Client.patch > Add remote Spark client to Hive [Spark Branch] > -- > > Key: HIVE-8528 > URL: https://issues.apache.org/jira/browse/HIVE-8528 > Project: Hive > Issue Type: Sub-task > Components: Spark > Reporter: Marcelo Vanzin > Attachments: 0001-HIVE-8528-Add-Spark-Client.patch > > > For the time being, at least, we've decided to build the Spark client (see > SPARK-3215) inside Hive. This task tracks merging the ongoing work into the > Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-8528: - Attachment: HIVE-8528-spark-client.patch > Add remote Spark client to Hive [Spark Branch] > -- > > Key: HIVE-8528 > URL: https://issues.apache.org/jira/browse/HIVE-8528 > Project: Hive > Issue Type: Sub-task > Components: Spark > Reporter: Marcelo Vanzin > Attachments: HIVE-8528-spark-client.patch > > > For the time being, at least, we've decided to build the Spark client (see > SPARK-3215) inside Hive. This task tracks merging the ongoing work into the > Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-8528: - Attachment: (was: 0001-HIVE-8528-Add-Spark-Client.patch) > Add remote Spark client to Hive [Spark Branch] > -- > > Key: HIVE-8528 > URL: https://issues.apache.org/jira/browse/HIVE-8528 > Project: Hive > Issue Type: Sub-task > Components: Spark > Reporter: Marcelo Vanzin > Attachments: HIVE-8528-spark-client.patch > > > For the time being, at least, we've decided to build the Spark client (see > SPARK-3215) inside Hive. This task tracks merging the ongoing work into the > Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-8528: - Attachment: HIVE-8528.1-spark-client.patch > Add remote Spark client to Hive [Spark Branch] > -- > > Key: HIVE-8528 > URL: https://issues.apache.org/jira/browse/HIVE-8528 > Project: Hive > Issue Type: Sub-task > Components: Spark > Reporter: Marcelo Vanzin > Assignee: Marcelo Vanzin > Attachments: HIVE-8528.1-spark-client.patch > > > For the time being, at least, we've decided to build the Spark client (see > SPARK-3215) inside Hive. This task tracks merging the ongoing work into the > Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-8528: - Attachment: (was: HIVE-8528-spark-client.patch) > Add remote Spark client to Hive [Spark Branch] > -- > > Key: HIVE-8528 > URL: https://issues.apache.org/jira/browse/HIVE-8528 > Project: Hive > Issue Type: Sub-task > Components: Spark > Reporter: Marcelo Vanzin > Assignee: Marcelo Vanzin > Attachments: HIVE-8528.1-spark-client.patch > > > For the time being, at least, we've decided to build the Spark client (see > SPARK-3215) inside Hive. This task tracks merging the ongoing work into the > Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178575#comment-14178575 ] Marcelo Vanzin commented on HIVE-8528: -- I upgraded because my tests use {{assertNotEquals}} which was added in 4.11. I'll revert that change and change the test to see if it fixes the issues. > Add remote Spark client to Hive [Spark Branch] > -- > > Key: HIVE-8528 > URL: https://issues.apache.org/jira/browse/HIVE-8528 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin > Attachments: HIVE-8528.1-spark-client.patch, HIVE-8528.1-spark.patch > > > For the time being, at least, we've decided to build the Spark client (see > SPARK-3215) inside Hive. This task tracks merging the ongoing work into the > Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178613#comment-14178613 ] Marcelo Vanzin commented on HIVE-8528: -- Ah, I'll also have to update the code to match changes in the Spark API, so it will take a little longer... > Add remote Spark client to Hive [Spark Branch] > -- > > Key: HIVE-8528 > URL: https://issues.apache.org/jira/browse/HIVE-8528 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin > Attachments: HIVE-8528.1-spark-client.patch, HIVE-8528.1-spark.patch > > > For the time being, at least, we've decided to build the Spark client (see > SPARK-3215) inside Hive. This task tracks merging the ongoing work into the > Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-8528: - Attachment: HIVE-8528.2-spark.patch > Add remote Spark client to Hive [Spark Branch] > -- > > Key: HIVE-8528 > URL: https://issues.apache.org/jira/browse/HIVE-8528 > Project: Hive > Issue Type: Sub-task > Components: Spark > Reporter: Marcelo Vanzin > Assignee: Marcelo Vanzin > Attachments: HIVE-8528.1-spark-client.patch, HIVE-8528.1-spark.patch, > HIVE-8528.2-spark.patch > > > For the time being, at least, we've decided to build the Spark client (see > SPARK-3215) inside Hive. This task tracks merging the ongoing work into the > Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178656#comment-14178656 ] Marcelo Vanzin commented on HIVE-8528: -- Hmmm, seems the rest of the Spark-related code in Hive needs to be updated to match the recent changes in Spark... > Add remote Spark client to Hive [Spark Branch] > -- > > Key: HIVE-8528 > URL: https://issues.apache.org/jira/browse/HIVE-8528 > Project: Hive > Issue Type: Sub-task > Components: Spark > Reporter: Marcelo Vanzin > Assignee: Marcelo Vanzin > Attachments: HIVE-8528.1-spark-client.patch, HIVE-8528.1-spark.patch, > HIVE-8528.2-spark.patch > > > For the time being, at least, we've decided to build the Spark client (see > SPARK-3215) inside Hive. This task tracks merging the ongoing work into the > Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178937#comment-14178937 ] Marcelo Vanzin commented on HIVE-8528: -- https://reviews.apache.org/r/26993/ > Add remote Spark client to Hive [Spark Branch] > -- > > Key: HIVE-8528 > URL: https://issues.apache.org/jira/browse/HIVE-8528 > Project: Hive > Issue Type: Sub-task > Components: Spark > Reporter: Marcelo Vanzin > Assignee: Marcelo Vanzin > Attachments: HIVE-8528.1-spark-client.patch, HIVE-8528.1-spark.patch, > HIVE-8528.2-spark.patch, HIVE-8528.2-spark.patch > > > For the time being, at least, we've decided to build the Spark client (see > SPARK-3215) inside Hive. This task tracks merging the ongoing work into the > Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-8528: - Attachment: HIVE-8528.3-spark.patch > Add remote Spark client to Hive [Spark Branch] > -- > > Key: HIVE-8528 > URL: https://issues.apache.org/jira/browse/HIVE-8528 > Project: Hive > Issue Type: Sub-task > Components: Spark > Reporter: Marcelo Vanzin > Assignee: Marcelo Vanzin > Attachments: HIVE-8528.1-spark-client.patch, HIVE-8528.1-spark.patch, > HIVE-8528.2-spark.patch, HIVE-8528.2-spark.patch, HIVE-8528.3-spark.patch > > > For the time being, at least, we've decided to build the Spark client (see > SPARK-3215) inside Hive. This task tracks merging the ongoing work into the > Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181576#comment-14181576 ] Marcelo Vanzin commented on HIVE-8528: -- Yeah, the metrics code in general is a little hacky and sort of ugly to use. I need to spend more time thinking about it. > Add remote Spark client to Hive [Spark Branch] > -- > > Key: HIVE-8528 > URL: https://issues.apache.org/jira/browse/HIVE-8528 > Project: Hive > Issue Type: Sub-task > Components: Spark > Reporter: Marcelo Vanzin > Assignee: Marcelo Vanzin > Attachments: HIVE-8528.1-spark-client.patch, HIVE-8528.1-spark.patch, > HIVE-8528.2-spark.patch, HIVE-8528.2-spark.patch, HIVE-8528.3-spark.patch > > > For the time being, at least, we've decided to build the Spark client (see > SPARK-3215) inside Hive. This task tracks merging the ongoing work into the > Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8574) Enhance metrics gathering in Spark Client
Marcelo Vanzin created HIVE-8574: Summary: Enhance metrics gathering in Spark Client Key: HIVE-8574 URL: https://issues.apache.org/jira/browse/HIVE-8574 Project: Hive Issue Type: Sub-task Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin The current implementation of metrics gathering in the Spark client is a little hacky. First, it's awkward to use (and the implementation is also pretty ugly). Second, it will just collect metrics indefinitely, so in the long term it turns into a huge memory leak. We need a simplified interface and some mechanism for disposing of old metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8548) Integrate with remote Spark context after HIVE-8528 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181753#comment-14181753 ] Marcelo Vanzin commented on HIVE-8548: -- BTW, you can still use the client for local mode. It just means the "remote" context and executors will be on the same machine (but still on a different process, which is still a gain). Might actually be better, since it will mean tests still go through the remote interface. > Integrate with remote Spark context after HIVE-8528 [Spark Branch] > -- > > Key: HIVE-8548 > URL: https://issues.apache.org/jira/browse/HIVE-8548 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Chengxiang Li > > With HIVE-8528, HiveServer2 should use remote Spark context to submit job and > monitor progress, etc. This is necessary if Hive runs on standalone cluster, > Yarn, or Mesos. If Hive runs with spark.master=local, we should continue > using SparkContext in current way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182278#comment-14182278 ] Marcelo Vanzin commented on HIVE-8528: -- Hi Lefty, what kind of documentation are you looking for? This is, at the moment, targeted at internal Hive use only, so having nice end-user documentation is not currently a goal. (In fact, I should probably go and add those annotations to the classes.)
[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183060#comment-14183060 ] Marcelo Vanzin commented on HIVE-8528: -- Actually, Lefty, that's a good point: this might need some end-user documentation, since the recommended setup is to have a full Spark installation available on the HS2 node. I don't know if the plan is to somehow package that with HS2 or leave it as a configuration step.
[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183079#comment-14183079 ] Marcelo Vanzin commented on HIVE-8528: -- It is optional, but I don't really think we should encourage that. A full install should be the recommended setup.
[jira] [Created] (HIVE-8599) Add InterfaceAudience annotations to spark-client [Spark Branch]
Marcelo Vanzin created HIVE-8599: Summary: Add InterfaceAudience annotations to spark-client [Spark Branch] Key: HIVE-8599 URL: https://issues.apache.org/jira/browse/HIVE-8599 Project: Hive Issue Type: Sub-task Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8599) Add InterfaceAudience annotations to spark-client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-8599: - Attachment: HIVE-8599.1-spark.patch > Add InterfaceAudience annotations to spark-client [Spark Branch] > > > Key: HIVE-8599 > URL: https://issues.apache.org/jira/browse/HIVE-8599 > Project: Hive > Issue Type: Sub-task > Components: Spark > Reporter: Marcelo Vanzin > Assignee: Marcelo Vanzin > Attachments: HIVE-8599.1-spark.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8599) Add InterfaceAudience annotations to spark-client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185457#comment-14185457 ] Marcelo Vanzin commented on HIVE-8599: -- There isn't really any code in the change, but well: https://reviews.apache.org/r/27235/
[jira] [Commented] (HIVE-8548) Integrate with remote Spark context after HIVE-8528 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208576#comment-14208576 ] Marcelo Vanzin commented on HIVE-8548: -- Suhas, I'm not sure I understand your question. First, you can't use YARN in unit tests, since Spark does not publish the classes needed to run against YARN in any artifact that you can depend on. Second, the choice of "client" vs. "cluster" should make no difference when running unit tests. A "local" master is not Spark standalone; it's Spark running in a single JVM, with no cluster manager. A "local-cluster" master is standalone mode, similar to running a MiniYARNCluster, and should support both client and cluster mode.
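The distinction between master strings in the comment above can be sketched as follows. This helper is purely illustrative (not Hive or Spark code); it just encodes the three categories described:

```java
// Illustrative classifier for Spark master strings (assumed names, not a
// real Spark API): "local[*]" runs everything in one JVM with no cluster
// manager, while "local-cluster[N,cores,mem]" spins up a standalone-like
// cluster in separate processes, suitable for exercising both client and
// cluster deploy modes in tests.
public class MasterKind {
  public static String kind(String master) {
    // Check the more specific prefix first: "local-cluster" also
    // starts with "local".
    if (master.startsWith("local-cluster")) {
      return "standalone-like test cluster";
    } else if (master.startsWith("local")) {
      return "single JVM, no cluster manager";
    } else {
      return "external cluster manager";
    }
  }
}
```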
[jira] [Commented] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210079#comment-14210079 ] Marcelo Vanzin commented on HIVE-8854: -- Hmmm. The {{Optional}} mess is Spark's doing and shading doesn't help there, since Spark exposes it in its public API. I think the easier solution here is for the remote client to not use {{Optional}} at all in its communication (and, in general, avoid third-party libraries for types that are serialized). > Guava dependency conflict between hive driver and remote spark context[Spark > Branch] > > > Key: HIVE-8854 > URL: https://issues.apache.org/jira/browse/HIVE-8854 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Chengxiang Li > Labels: Spark-M3 > > Hive driver would load guava 11.0.2 from hadoop/tez, while remote spark > context depends on guava 14.0.1, It should be JobMetrics deserialize failed > on Hive driver side since Absent is used in Metrics, here is the hive driver > log: > {noformat} > java.lang.IllegalAccessError: tried to access method > com.google.common.base.Optional.()V from class > com.google.common.base.Absent > at com.google.common.base.Absent.(Absent.java:35) > at com.google.common.base.Absent.(Absent.java:33) > at sun.misc.Unsafe.ensureClassInitialized(Native Method) > at > sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43) > at > sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:140) > at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1057) > at java.lang.reflect.Field.getFieldAccessor(Field.java:1038) > at java.lang.reflect.Field.getLong(Field.java:591) > at > java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1663) > at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72) > at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480) > at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468) > at 
java.security.AccessController.doPrivileged(Native Method) > at java.io.ObjectStreamClass.(ObjectStreamClass.java:468) > at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365) > at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) > at > java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) > at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) > at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) > at > akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) > at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136) > at > akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104) > at scala.util.Try$.apply(Try.scala:161) > at > akka.serialization.Serialization.deserialize(Serialization.scala:98) > at > akka.remote.serialization.MessageContainerSerializer.fromBinary(MessageContainerSerializer.scala:63) > at > akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104) > at scala.util.Try$.apply(Try.scala:161) > at > 
akka.serialization.Serialization.deserialize(Serialization.scala:98) > at > akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23) > at > akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:58) >
[jira] [Commented] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210080#comment-14210080 ] Marcelo Vanzin commented on HIVE-8854: -- Actually, thinking about it, the exception looks a little weird, since spark-core should contain both {{Optional}} and {{Absent}} from the same Guava version. [~chengxiang li], could you run the Hive driver with {{-verbose:class}} and provide the generated output? I want to see where the classes are being loaded from, to check whether Spark's shading is somehow missing something.
[jira] [Commented] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212660#comment-14212660 ] Marcelo Vanzin commented on HIVE-8854: -- bq. Note: com.google.common.base.Absent does not exist in Guava 11; its counterpart should be com.google.common.base.Optional$Absent. Damn, that's why, then... let me change the spark-client code to not use Guava classes in types that will be serialized.
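The proposed fix — keeping serialized message types free of Guava (and other third-party) classes — might look like the following. The class and field names here are hypothetical, not the actual spark-client types:

```java
import java.io.Serializable;

// Hypothetical sketch of the approach described above: a message type that
// uses a plain nullable field instead of Guava's Optional, so that both
// sides of the connection can deserialize it regardless of which Guava
// version is on their classpath. Names are illustrative, not Hive's API.
public class JobResult implements Serializable {
  private static final long serialVersionUID = 1L;

  // null means the job succeeded; a non-null value carries the error
  // message. No third-party classes appear in the serialized form.
  private final String error;

  public JobResult(String error) {
    this.error = error;
  }

  public boolean succeeded() {
    return error == null;
  }

  public String error() {
    return error;
  }
}
```

Since only JDK types are referenced, the serialized stream never mentions `com.google.common.base.Absent`, sidestepping the version conflict entirely.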
[jira] [Assigned] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned HIVE-8854: Assignee: Marcelo Vanzin
[jira] [Commented] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212676#comment-14212676 ] Marcelo Vanzin commented on HIVE-8854: -- Not sure. I'm looking at HIVE-8833 (https://reviews.apache.org/r/27987), and if you really want to support an in-process SparkContext like that, then you need Guava 14.
[jira] [Commented] (HIVE-8833) Unify spark client API and implement remote spark client.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212720#comment-14212720 ] Marcelo Vanzin commented on HIVE-8833: -- bq. SparkClientImpl ignores Spark driver parameters while submitting jobs through the SparkSubmit class to a Spark standalone cluster; I'm not sure why. Can you clarify what you mean here? What exactly is the launch path (in-process, the Spark client directly executing SparkSubmit, or the Spark client executing the out-of-process spark-submit script)? In the first two cases, there are some driver options that may not take effect, since the driver will be executing in the same process as the caller.
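The third launch path mentioned above — forking the out-of-process spark-submit script — is the one where driver JVM options can reliably take effect, since the driver runs in its own process. The following is an illustrative sketch of assembling such a command line; all names are assumptions for illustration, not Hive's actual launcher:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not Hive code) of building a spark-submit command
// line for the out-of-process launch path. Driver options like
// --driver-java-options are honored here because the driver is forked as
// a new JVM rather than reusing the caller's process.
public class SparkSubmitCommand {
  public static List<String> build(String master, String driverOpts, String appJar) {
    List<String> cmd = new ArrayList<>();
    cmd.add("spark-submit");
    cmd.add("--master");
    cmd.add(master);
    if (driverOpts != null) {
      cmd.add("--driver-java-options");
      cmd.add(driverOpts);
    }
    cmd.add(appJar);
    return cmd;
  }
}
```

The resulting list would typically be handed to `java.lang.ProcessBuilder` to start the child process; in the in-process paths, by contrast, options such as driver memory are fixed by the already-running JVM and silently ignored.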
[jira] [Updated] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-8854: - Attachment: HIVE-8854.1-spark.patch
[jira] [Updated] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-8854: - Status: Patch Available (was: Open) > Guava dependency conflict between hive driver and remote spark context[Spark > Branch] > > > Key: HIVE-8854 > URL: https://issues.apache.org/jira/browse/HIVE-8854 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Chengxiang Li > Assignee: Marcelo Vanzin > Labels: Spark-M3 > Attachments: HIVE-8854.1-spark.patch, > hive-dirver-classloader-info.output
[jira] [Commented] (HIVE-8574) Enhance metrics gathering in Spark Client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223383#comment-14223383 ] Marcelo Vanzin commented on HIVE-8574: -- Haven't had a chance to look at this yet. Hopefully this week. > Enhance metrics gathering in Spark Client [Spark Branch] > > > Key: HIVE-8574 > URL: https://issues.apache.org/jira/browse/HIVE-8574 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin > > The current implementation of metrics gathering in the Spark client is a > little hacky. First, it's awkward to use (and the implementation is also > pretty ugly). Second, it will just collect metrics indefinitely, so in the > long term it turns into a huge memory leak. > We need a simplified interface and some mechanism for disposing of old > metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8951) Spark remote context doesn't work with local-cluster [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223388#comment-14223388 ] Marcelo Vanzin commented on HIVE-8951: -- `SparkClientImpl` has `stop()`, which should be cleaning things up and properly stopping the driver. Are you calling it? > Spark remote context doesn't work with local-cluster [Spark Branch] > --- > > Key: HIVE-8951 > URL: https://issues.apache.org/jira/browse/HIVE-8951 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang > > What I did: > {code} > set spark.home=/home/xzhang/apache/spark; > set spark.master=local-cluster[2,1,2048]; > set hive.execution.engine=spark; > set spark.executor.memory=2g; > set spark.serializer=org.apache.spark.serializer.KryoSerializer; > set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec; > select name, avg(value) as v from dec group by name order by v; > {code} > Exceptions seen: > {code} > 14/11/23 10:42:15 INFO Worker: Spark home: /home/xzhang/apache/spark > 14/11/23 10:42:15 INFO AppClient$ClientActor: Connecting to master > spark://xzdt.local:55151... 
> 14/11/23 10:42:15 INFO Master: Registering app Hive on Spark > 14/11/23 10:42:15 INFO Master: Registered app Hive on Spark with ID > app-20141123104215- > 14/11/23 10:42:15 INFO SparkDeploySchedulerBackend: Connected to Spark > cluster with app ID app-20141123104215- > 14/11/23 10:42:15 INFO NettyBlockTransferService: Server created on 41676 > 14/11/23 10:42:15 INFO BlockManagerMaster: Trying to register BlockManager > 14/11/23 10:42:15 INFO BlockManagerMasterActor: Registering block manager > xzdt.local:41676 with 265.0 MB RAM, BlockManagerId(<driver>, xzdt.local, > 41676) > 14/11/23 10:42:15 INFO BlockManagerMaster: Registered BlockManager > 14/11/23 10:42:15 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready > for scheduling beginning after reached minRegisteredResourcesRatio: 0.0 > 14/11/23 10:42:20 WARN AbstractLifeCycle: FAILED > SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already > in use > java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:174) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77) > at > org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187) > at > org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316) > at > org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265) > at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64) > at org.eclipse.jetty.server.Server.doStart(Server.java:293) > at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64) > at > org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:194) > at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:204) > at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:204) > 
at > org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1676) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) > at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1667) > at > org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:204) > at org.apache.spark.ui.WebUI.bind(WebUI.scala:102) > at > org.apache.spark.SparkContext$$anonfun$10.apply(SparkContext.scala:267) > at > org.apache.spark.SparkContext$$anonfun$10.apply(SparkContext.scala:267) > at scala.Option.foreach(Option.scala:236) > at org.apache.spark.SparkContext.<init>(SparkContext.scala:267) > at > org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61) > at > org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:106) > at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:362) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:616) >
[jira] [Commented] (HIVE-8951) Spark remote context doesn't work with local-cluster [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223507#comment-14223507 ] Marcelo Vanzin commented on HIVE-8951: -- That `BindException` should not be fatal; Spark just retries on a different port when that happens. So something else must be going wrong. > Spark remote context doesn't work with local-cluster [Spark Branch] > --- > > Key: HIVE-8951 > URL: https://issues.apache.org/jira/browse/HIVE-8951 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang
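The retry-on-bind behavior described in this comment (Spark's `Utils.startServiceOnPort`, visible in the stack trace quoted in this thread) boils down to something like the following sketch; the helper name and retry policy here are illustrative, not Spark's actual code:

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical sketch of retry-on-bind: if a port is already in use, try the
// next one instead of failing, up to maxRetries additional attempts.
public class PortRetry {
  static ServerSocket bindWithRetry(int startPort, int maxRetries) throws IOException {
    IOException last = null;
    for (int i = 0; i <= maxRetries; i++) {
      try {
        return new ServerSocket(startPort + i);
      } catch (BindException e) {
        last = e; // "Address already in use": retry on the next port
      }
    }
    throw last;
  }
}
```

This is why a single `BindException` warning in the logs is not fatal by itself: the service comes up on a nearby port instead.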
[jira] [Commented] (HIVE-8951) Spark remote context doesn't work with local-cluster [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223661#comment-14223661 ] Marcelo Vanzin commented on HIVE-8951: -- Not from just those logs. Is this easily reproduced via some unit test? (Feel free to send me an e-mail with reproduction steps so I can try it myself.) > Spark remote context doesn't work with local-cluster [Spark Branch] > --- > > Key: HIVE-8951 > URL: https://issues.apache.org/jira/browse/HIVE-8951 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang
[jira] [Commented] (HIVE-8951) Spark remote context doesn't work with local-cluster [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224022#comment-14224022 ] Marcelo Vanzin commented on HIVE-8951: -- Xuefu, increasing the timeout should be fine, but you also mentioned that the child driver stuck around after the timeout. If that's the case we should still fix that bug. > Spark remote context doesn't work with local-cluster [Spark Branch] > --- > > Key: HIVE-8951 > URL: https://issues.apache.org/jira/browse/HIVE-8951 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-8951.1-spark.patch
[jira] [Commented] (HIVE-8836) Enable automatic tests with remote spark client.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225049#comment-14225049 ] Marcelo Vanzin commented on HIVE-8836: -- I talked briefly with Brock about this, but the main thing here is that, right now, Spark is not very friendly to applications that are trying to embed it. As you've noticed, the assembly jar, which contains almost everything you need to run Spark, is not published in maven or anywhere. And not all artifacts used to build the assembly are published - for example, the Yarn backend cannot be found anywhere in maven, so without the assembly you cannot submit jobs to Yarn. I've suggested it in the past, but I think right now, or until Spark makes itself more friendly to such use cases, Hive should require a full Spark install to work. If desired we could use the hacks I added to the remote client to not need the full install for unit tests, but even those are very limited; it probably only works with a "local" master as some of you may have noticed. > Enable automatic tests with remote spark client.[Spark Branch] > -- > > Key: HIVE-8836 > URL: https://issues.apache.org/jira/browse/HIVE-8836 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Chengxiang Li >Assignee: Rui Li > Labels: Spark-M3 > Attachments: HIVE-8836-brock-1.patch, HIVE-8836-brock-2.patch, > HIVE-8836-brock-3.patch, HIVE-8836.1-spark.patch, HIVE-8836.2-spark.patch, > HIVE-8836.3-spark.patch > > > In real production environment, remote spark client should be used to submit > spark job for Hive mostly, we should enable automatic test with remote spark > client to make sure the Hive feature workable with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8956) Hive hangs while some error/exception happens beyond job execution[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225082#comment-14225082 ] Marcelo Vanzin commented on HIVE-8956: -- This is ok if it unblocks something right now. For the code, I'd suggest using {{System.nanoTime()}} to calculate durations, since it's monotonic. And use {{long}} instead of {{int}}. But I think a better approach is needed here. Currently the {{JobSubmitted}} message seems to only be sent when you use Spark's async APIs to submit a Spark job. A remote client job that does not use those APIs would never generate that message. Also, the backend uses a thread pool to execute jobs - so if you're queueing up multiple jobs, you may hit this timeout. I think we need more fine-grained remote client-level events for tracking job progress. e.g., adding {{JobReceived}} and {{JobStarted}} messages would be a good start ({{JobResult}} already covers the "job finished" case). I think these two extra messages should be enough to cover the problems described in this bug. > Hive hangs while some error/exception happens beyond job execution[Spark > Branch] > > > Key: HIVE-8956 > URL: https://issues.apache.org/jira/browse/HIVE-8956 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Chengxiang Li >Assignee: Rui Li > Labels: Spark-M3 > Attachments: HIVE-8956.1-spark.patch > > > The remote spark client communicates with the remote spark context asynchronously. If an > error/exception is thrown during job execution in the remote spark context, it > is wrapped and sent back to the remote spark client, but if an error/exception > is thrown outside job execution, such as when job serialization fails, the remote > spark client never knows what's going on in the remote spark context, and it > hangs there. > Setting a timeout on the remote spark client side may not be a great idea, as we are not > sure how long the query will execute in the spark cluster. 
we need to find a way to > check whether the job has failed (whole life cycle) in the remote spark context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
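The timeout suggestion in the comment above — a monotonic {{System.nanoTime()}} deadline held in a {{long}} — can be sketched as follows; the class and method names are hypothetical, not from the patch under review:

```java
// Hypothetical sketch: track a deadline with System.nanoTime(), which is
// monotonic, unlike System.currentTimeMillis(), which can jump backwards
// when the wall clock is adjusted.
public class MonotonicTimeout {
  private final long deadlineNanos;

  public MonotonicTimeout(long timeoutMillis) {
    this.deadlineNanos = System.nanoTime() + timeoutMillis * 1_000_000L;
  }

  /** True once the timeout has elapsed; the subtraction is wrap-around safe. */
  public boolean expired() {
    return System.nanoTime() - deadlineNanos >= 0;
  }

  /** Remaining time in milliseconds, never negative. */
  public long remainingMillis() {
    return Math.max(0L, (deadlineNanos - System.nanoTime()) / 1_000_000L);
  }
}
```

Comparing `nanoTime()` values by subtraction (rather than `<`/`>` directly) is the idiom the javadoc recommends, since the counter may wrap.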
[jira] [Commented] (HIVE-8574) Enhance metrics gathering in Spark Client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225276#comment-14225276 ] Marcelo Vanzin commented on HIVE-8574: -- Hey [~chengxiang li], I'd like to have a better understanding of how these metrics will be used by Hive to come up with the proper fix here. I see two approaches: * Add an API to clean up the metrics. This keeps the current "collect all metrics" approach, but adds APIs to delete the metrics. This assumes that Hive will always process metrics of finished jobs, even if just to ask for them to be deleted. * Suggested by [~xuefuz]: add a timeout after a job is finished for cleaning up the metrics. This means that Hive has some time after a job finishes during which this data will be available, but after that, it's gone. I could also add some internal checks so that the collection doesn't keep accumulating data indefinitely if data is never deleted; like track only the last "x" finished jobs, evicting the oldest when a new job starts. What do you think? > Enhance metrics gathering in Spark Client [Spark Branch] > > > Key: HIVE-8574 > URL: https://issues.apache.org/jira/browse/HIVE-8574 > Project: Hive > Issue Type: Sub-task > Components: Spark > Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
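The last option above — keeping only the last "x" finished jobs and evicting the oldest — maps naturally onto `LinkedHashMap`'s eviction hook. The class name and shape below are hypothetical, not from the Spark client code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: a size-bounded metrics store that drops the metrics
// of the oldest finished job once more than maxFinishedJobs are retained.
public class BoundedMetricsStore<M> extends LinkedHashMap<String, M> {
  private final int maxFinishedJobs;

  public BoundedMetricsStore(int maxFinishedJobs) {
    super(16, 0.75f, false); // insertion order: the eldest entry is the oldest job
    this.maxFinishedJobs = maxFinishedJobs;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<String, M> eldest) {
    // Called after each put(); returning true evicts the oldest entry.
    return size() > maxFinishedJobs;
  }
}
```

A timeout-based variant (Xuefu's suggestion) would instead store a finish timestamp per job and sweep expired entries periodically; the bounded map is simply the cheaper of the two to implement.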
[jira] [Commented] (HIVE-8956) Hive hangs while some error/exception happens beyond job execution [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226504#comment-14226504 ] Marcelo Vanzin commented on HIVE-8956: -- I haven't looked at akka in that much detail to see if there is some API to catch those. You can enable akka logging (set {{spark.akka.logLifecycleEvents}} to true) and that will print these errors to the logs. Spark tries to serialize data before sending it to akka, to try to catch serialization issues, but that adds overhead, and it also doesn't help in the deserialization path... > Hive hangs while some error/exception happens beyond job execution [Spark > Branch] > - > > Key: HIVE-8956 > URL: https://issues.apache.org/jira/browse/HIVE-8956 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Chengxiang Li >Assignee: Rui Li > Labels: Spark-M3 > Fix For: spark-branch > > Attachments: HIVE-8956.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
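Assuming the same session-level {{set}} passthrough for {{spark.*}} properties used in the reproduction steps elsewhere in this digest, enabling the suggested akka lifecycle logging would look like:

```sql
-- Debug aid for akka-level failures (Spark 1.x setting):
set spark.akka.logLifecycleEvents=true;
```

This only surfaces the errors in the logs; it does not change how the remote client reacts to them.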
[jira] [Commented] (HIVE-8957) Remote spark context needs to clean up itself in case of connection timeout [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226515#comment-14226515 ] Marcelo Vanzin commented on HIVE-8957: -- I think a fix here will be a little more complicated than that. Let me look at the code and think about it. > Remote spark context needs to clean up itself in case of connection timeout > [Spark Branch] > -- > > Key: HIVE-8957 > URL: https://issues.apache.org/jira/browse/HIVE-8957 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-8957.1-spark.patch > > > In the current SparkClient implementation (class SparkClientImpl), the > constructor does some initialization and in the end waits for the remote > driver to connect. In case of timeout, it just throws an exception without > cleaning itself. The cleanup is necessary to release system resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
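The general shape of the fix discussed here — release whatever the constructor already acquired before letting the timeout exception propagate — can be sketched as follows. All names are illustrative; this is not the actual {{SparkClientImpl}} code:

```java
// Hypothetical sketch of the cleanup-on-timeout pattern: run the "wait for
// the remote driver" step, and if it fails, release already-acquired
// resources before rethrowing instead of leaking them.
public class CleanupOnTimeout {
  interface Resource { void release(); }

  /** Runs {@code connect}; on failure, releases {@code acquired} and rethrows. */
  static void connectOrCleanup(Runnable connect, Resource acquired) {
    try {
      connect.run();
    } catch (RuntimeException e) {
      acquired.release(); // don't leak system resources when the wait fails
      throw e;
    }
  }
}
```

As the follow-up comment notes, the real fix is likely more involved (the client holds several resources: the RPC server registration, the child driver process, threads), but each of them follows this release-on-failure shape.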
[jira] [Commented] (HIVE-8574) Enhance metrics gathering in Spark Client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226517#comment-14226517 ] Marcelo Vanzin commented on HIVE-8574: -- Actually, after a quick look at the code again, this might not be a problem. Metrics are kept per-job handle. Job handles are managed by the code submitting jobs - leave them for garbage collection and metrics go away. So unless we're worried about a single job creating so many tasks that it will run the driver out of memory with all the metrics data, this shouldn't really be an issue. > Enhance metrics gathering in Spark Client [Spark Branch] > > > Key: HIVE-8574 > URL: https://issues.apache.org/jira/browse/HIVE-8574 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8574) Enhance metrics gathering in Spark Client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226668#comment-14226668 ] Marcelo Vanzin commented on HIVE-8574: -- Rounding up, each task metrics data structure will take around 256 bytes. So ~25MB?
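As a back-of-the-envelope check of the figure above — note the 100,000-task count below is an assumption chosen to make the ~25MB number concrete; it does not appear in this thread:

```java
// Hypothetical estimate: bytes retained by per-task metrics for one job.
public class MetricsEstimate {
  static final long BYTES_PER_TASK = 256; // rounded-up figure from the comment

  static long totalBytes(long tasks) {
    return BYTES_PER_TASK * tasks;
  }

  public static void main(String[] args) {
    // 100,000 tasks * 256 bytes = 25,600,000 bytes, i.e. roughly 25 MB
    System.out.println(totalBytes(100_000));
  }
}
```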
[jira] [Commented] (HIVE-8991) Fix custom_input_output_format [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230227#comment-14230227 ] Marcelo Vanzin commented on HIVE-8991: -- Hi [~lirui], the patch looks good if it unblocks the unit tests. I have to think a bit about whether it would work in a real deployment scenario, since IIRC hive-exec shades a lot of dependencies and it might cause problems with Spark. But the main one (Guava) should be solved in Spark, so hopefully there won't be other cases like that. > Fix custom_input_output_format [Spark Branch] > - > > Key: HIVE-8991 > URL: https://issues.apache.org/jira/browse/HIVE-8991 > Project: Hive > Issue Type: Sub-task > Components: Spark > Reporter: Rui Li > Assignee: Rui Li > Attachments: HIVE-8991.1-spark.patch > > > After HIVE-8836, {{custom_input_output_format}} fails because of missing > hive-it-util in remote driver's class path.
[jira] [Commented] (HIVE-8995) Find thread leak in RSC Tests
[ https://issues.apache.org/jira/browse/HIVE-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230254#comment-14230254 ] Marcelo Vanzin commented on HIVE-8995: -- The three threads are from akka; I wonder if the test code is failing to properly shut down clients or the library itself (i.e. call {{SparkClientFactory.stop()}}). > Find thread leak in RSC Tests > - > > Key: HIVE-8995 > URL: https://issues.apache.org/jira/browse/HIVE-8995 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Brock Noland > > I was regenerating output as part of the merge: > {noformat} > mvn test -Dtest=TestSparkCliDriver -Phadoop-2 -Dtest.output.overwrite=true > -Dqfile=annotate_stats_join.q,auto_join0.q,auto_join1.q,auto_join10.q,auto_join11.q,auto_join12.q,auto_join13.q,auto_join14.q,auto_join15.q,auto_join16.q,auto_join17.q,auto_join18.q,auto_join18_multi_distinct.q,auto_join19.q,auto_join2.q,auto_join20.q,auto_join21.q,auto_join22.q,auto_join23.q,auto_join24.q,auto_join26.q,auto_join27.q,auto_join28.q,auto_join29.q,auto_join3.q,auto_join30.q,auto_join31.q,auto_join32.q,auto_join9.q,auto_join_reordering_values.q > > auto_join_without_localtask.q,auto_smb_mapjoin_14.q,auto_sortmerge_join_1.q,auto_sortmerge_join_10.q,auto_sortmerge_join_11.q,auto_sortmerge_join_12.q,auto_sortmerge_join_14.q,auto_sortmerge_join_15.q,auto_sortmerge_join_2.q,auto_sortmerge_join_3.q,auto_sortmerge_join_4.q,auto_sortmerge_join_5.q,auto_sortmerge_join_6.q,auto_sortmerge_join_7.q,auto_sortmerge_join_8.q,auto_sortmerge_join_9.q,bucket_map_join_1.q,bucket_map_join_2.q,bucket_map_join_tez1.q,bucket_map_join_tez2.q,bucketmapjoin1.q,bucketmapjoin10.q,bucketmapjoin11.q,bucketmapjoin12.q,bucketmapjoin13.q,bucketmapjoin2.q,bucketmapjoin3.q,bucketmapjoin4.q,bucketmapjoin5.q,bucketmapjoin7.q > > 
bucketmapjoin8.q,bucketmapjoin9.q,bucketmapjoin_negative.q,bucketmapjoin_negative2.q,bucketmapjoin_negative3.q,column_access_stats.q,cross_join.q,ctas.q,custom_input_output_format.q,groupby4.q,groupby7_noskew_multi_single_reducer.q,groupby_complex_types.q,groupby_complex_types_multi_single_reducer.q,groupby_multi_single_reducer2.q,groupby_multi_single_reducer3.q,groupby_position.q,groupby_sort_1_23.q,groupby_sort_skew_1_23.q,having.q,index_auto_self_join.q,infer_bucket_sort_convert_join.q,innerjoin.q,input12.q,join0.q,join1.q,join11.q,join12.q,join13.q,join14.q,join15.q > > join17.q,join18.q,join18_multi_distinct.q,join19.q,join2.q,join20.q,join21.q,join22.q,join23.q,join25.q,join26.q,join27.q,join28.q,join29.q,join3.q,join30.q,join31.q,join32.q,join32_lessSize.q,join33.q,join35.q,join36.q,join37.q,join38.q,join39.q,join40.q,join41.q,join9.q,join_alt_syntax.q,join_cond_pushdown_1.q > > join_cond_pushdown_2.q,join_cond_pushdown_3.q,join_cond_pushdown_4.q,join_cond_pushdown_unqual1.q,join_cond_pushdown_unqual2.q,join_cond_pushdown_unqual3.q,join_cond_pushdown_unqual4.q,join_filters_overlap.q,join_hive_626.q,join_map_ppr.q,join_merge_multi_expressions.q,join_merging.q,join_nullsafe.q,join_rc.q,join_reorder.q,join_reorder2.q,join_reorder3.q,join_reorder4.q,join_star.q,join_thrift.q,join_vc.q,join_view.q,limit_pushdown.q,load_dyn_part13.q,load_dyn_part14.q,louter_join_ppr.q,mapjoin1.q,mapjoin_decimal.q,mapjoin_distinct.q,mapjoin_filter_on_outerjoin.q > > mapjoin_hook.q,mapjoin_mapjoin.q,mapjoin_memcheck.q,mapjoin_subquery.q,mapjoin_subquery2.q,mapjoin_test_outer.q,mergejoins.q,mergejoins_mixed.q,multi_insert.q,multi_insert_gby.q,multi_insert_gby2.q,multi_insert_gby3.q,multi_insert_lateral_view.q,multi_insert_mixed.q,multi_insert_move_tasks_share_dependencies.q,multi_join_union.q,optimize_nullscan.q,outer_join_ppr.q,parallel.q,parallel_join0.q,parallel_join1.q,parquet_join.q,pcr.q,ppd_gby_join.q,ppd_join.q,ppd_join2.q,ppd_join3.q,ppd_join4.q,ppd_join5.q,ppd_join_filter.q 
> > ppd_multi_insert.q,ppd_outer_join1.q,ppd_outer_join2.q,ppd_outer_join3.q,ppd_outer_join4.q,ppd_outer_join5.q,ppd_transform.q,reduce_deduplicate_exclude_join.q,router_join_ppr.q,sample10.q,sample8.q,script_pipe.q,semijoin.q,skewjoin.q,skewjoin_noskew.q,skewjoin_union_remove_1.q,skewjoin_union_remove_2.q,skewjoinopt1.q,skewjoinopt10.q,skewjoinopt11.q,skewjoinopt12.q,skewjoinopt13.q,skewjoinopt14.q,skewjoinopt15.q,skewjoinopt16.q,skewjoinopt17.q,skewjoinopt18.q,skewjoinopt19.q,skewjoinopt2.q,skewjoinopt20.q > > skewjoinopt3.q,skewjoinopt4.q,skewjoinopt5.q,skewjoinopt6.q,skewjoinopt7.q,skewjoinopt8.q,skewjoinopt9.q,smb_mapjoin9.q,smb_mapjoin_1.q,smb_mapjoin_10.q,smb_mapjoin_13.q,smb_mapjoin_14.q,smb_mapjoin_15.q,smb_mapjoin_16.q,smb_mapjoin_17.q,smb_mapjoin_2.q,smb_mapjoin_25.q,smb_mapjoin_3.q,smb_mapjoin_4.q,smb_mapjoin_5.q,smb_mapjoin_6.q,smb_mapjoin_7.q,sort_merge_join_desc_1.q,sort_merge_
[jira] [Commented] (HIVE-8995) Find thread leak in RSC Tests
[ https://issues.apache.org/jira/browse/HIVE-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230273#comment-14230273 ] Marcelo Vanzin commented on HIVE-8995: -- You don't need to call that method for every session. The pattern here is: * Call {{SparkClientFactory.initialize()}} once * Create / use as many clients as you want * When app shuts down, call {{SparkClientFactory.stop()}} So this should work nicely for HS2 (call initialize during bring up, call stop during shut down). I see {{RemoteHiveSparkClient}} calls initialize; that seems wrong, if my understanding of that class is correct (that it will be instantiated once for each session). Another option is to make {{initialize}} idempotent; right now it will just leak the old akka actor system, which is bad. This should be a trivial change (just add a check for {{initialized}}).
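The lifecycle and the idempotent-{{initialize}} suggestion above can be sketched as follows. The class below is a stand-in, not Hive's actual SparkClientFactory: a guard flag makes a repeated {{initialize()}} a no-op instead of leaking the old actor system, and {{stop()}} tears things down exactly once.

```java
// Hypothetical factory sketch (names and fields are illustrative).
public class FactorySketch {
  private static boolean initialized = false;
  static int initCount = 0; // only here to make the no-op behavior observable

  public static synchronized void initialize() {
    if (initialized) {
      return; // idempotent: repeat calls do nothing, nothing leaks
    }
    // ... bring up the RPC layer / actor system here ...
    initCount++;
    initialized = true;
  }

  public static synchronized void stop() {
    if (initialized) {
      // ... tear down the RPC layer / actor system here ...
      initialized = false;
    }
  }
}
```

The intended usage matches the pattern in the comment: initialize once at HS2 startup, create as many clients as needed, and stop once at shutdown.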
[jira] [Commented] (HIVE-8957) Remote spark context needs to clean up itself in case of connection timeout [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230478#comment-14230478 ] Marcelo Vanzin commented on HIVE-8957: -- If you don't mind the bug remaining unattended for several days, sure. I have my hands full with all sorts of other things at the moment.
[jira] [Commented] (HIVE-8991) Fix custom_input_output_format [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231866#comment-14231866 ] Marcelo Vanzin commented on HIVE-8991: -- I didn't mean to stop you guys from checking in this patch. I just said that while this may fix the test, it's an indication of something that we need to understand better (i.e. how to properly add jars to the Spark job's classpath without causing conflicts).
[jira] [Resolved] (HIVE-8574) Enhance metrics gathering in Spark Client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved HIVE-8574. -- Resolution: Not a Problem I'll close this as "not a problem" for now. If we decide the overhead is too much, we can revisit it. As for the ugly API, I currently can't think of a way to avoid it. Spark's API is just not very friendly in this area.
[jira] [Created] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
Marcelo Vanzin created HIVE-9036: Summary: Replace akka for remote spark client RPC [Spark Branch] Key: HIVE-9036 URL: https://issues.apache.org/jira/browse/HIVE-9036 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin We've had weird issues with akka, especially when something goes wrong and it becomes a little hard to debug. Let's replace it with a simple(r) RPC system built on top of netty.
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Attachment: HIVE-9036.1-spark.patch
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Status: Patch Available (was: Open) Patch is rather large but shouldn't be too complicated; and there are unit tests! (Plus I've run some of the qtests.)
[jira] [Commented] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236447#comment-14236447 ] Marcelo Vanzin commented on HIVE-9036: -- I'll look at why the patch isn't applying later... probably need to rebase my branch.
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Attachment: HIVE-9036.2-spark.patch
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Attachment: (was: HIVE-9036.2-spark.patch)
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Attachment: HIVE-9036.2-spark.patch
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Attachment: HIVE-9036.3-spark.patch
[jira] [Commented] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238670#comment-14238670 ] Marcelo Vanzin commented on HIVE-9036: -- I have a live job in that state, should be better for debugging. > Attachments: HIVE-9036.1-spark.patch, HIVE-9036.2-spark.patch, > HIVE-9036.3-spark.patch, rsc-problem-1.tar.gz
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Attachment: HIVE-9036.4-spark.patch
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Attachment: HIVE-9036.5-spark.patch
[jira] [Updated] (HIVE-9036) Replace akka for remote spark client RPC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-9036: - Attachment: HIVE-9036.6-spark.patch
[jira] [Commented] (HIVE-9085) Spark Client RPC should have larger default max message size [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243516#comment-14243516 ] Marcelo Vanzin commented on HIVE-9085: -- LGTM (as discussed by e-mail). > Spark Client RPC should have larger default max message size [Spark Branch] > --- > > Key: HIVE-9085 > URL: https://issues.apache.org/jira/browse/HIVE-9085 > Project: Hive > Issue Type: Sub-task > Components: Spark > Affects Versions: spark-branch > Reporter: Brock Noland > Assignee: Brock Noland > Attachments: HIVE-9085-spark.1.patch
[jira] [Commented] (HIVE-9085) Spark Client RPC should have larger default max message size [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243535#comment-14243535 ] Marcelo Vanzin commented on HIVE-9085: -- When an exception is thrown in the write path, it's not safe to use the RPC channel anymore. Partial data may have been written to the socket, which may cause both endpoints to get out of sync. Right now the code takes the approach of closing the socket on any error. If, in the long term, we'd prefer a more resilient approach, more modifications will be needed.
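The "close the socket on any error" policy described above can be illustrated with a minimal stand-in. The {{Channel}} interface below is hypothetical (not Netty's or Hive's); the point is the reasoning: once a write fails mid-message, the stream position is unknown, so the only safe recovery is to drop the connection.

```java
import java.io.IOException;

public class RpcWriteSketch {
  // Hypothetical channel abstraction, standing in for the real RPC channel.
  interface Channel {
    void write(byte[] data) throws IOException;
    void close();
  }

  static void safeWrite(Channel ch, byte[] data) {
    try {
      ch.write(data);
    } catch (IOException e) {
      // Partial bytes may already be on the wire, so the endpoints could be
      // out of sync; close rather than attempt to resynchronize the stream.
      ch.close();
    }
  }
}
```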
[jira] [Commented] (HIVE-9017) Clean up temp files of RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244944#comment-14244944 ] Marcelo Vanzin commented on HIVE-9017: -- These files are created by Spark when downloading resources for the app (e.g. application jars). In standalone mode, by default, these files will end up in /tmp (java.io.tmpdir). The problem is that the app doesn't clean up these files; in fact, it can't, because they are supposed to be shared in case multiple executors run on the same host - so one executor cannot unilaterally decide to delete them. (That's not entirely true; I guess it could, but then it would cause other executors to re-download the file when needed, so more overhead.) This is not a problem in Yarn mode, since the temp dir is under a Yarn-managed directory that is deleted when the app shuts down. So, while I think of a clean way to fix this in Spark, the following can be done on the Hive side: - create an app-specific temp directory before launching the Spark app - set {{spark.local.dir}} to that location - delete the directory when the client shuts down > Clean up temp files of RSC [Spark Branch] > - > > Key: HIVE-9017 > URL: https://issues.apache.org/jira/browse/HIVE-9017 > Project: Hive > Issue Type: Sub-task > Components: Spark > Reporter: Rui Li > > Currently RSC will leave a lot of temp files in {{/tmp}}, including > {{*_lock}}, {{*_cache}}, {{spark-submit.*.properties}}, etc. > We should clean up these files or it will exhaust disk space.
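The three steps suggested above can be sketched like this. {{spark.local.dir}} is a real Spark property, but the helper class and method names below are illustrative, not Hive's actual code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LocalDirSketch {
  // Step 1 + 2: create an app-specific scratch directory and point
  // spark.local.dir at it before launching the Spark app.
  public static Path createScratchDir(Map<String, String> sparkConf) throws IOException {
    Path dir = Files.createTempDirectory("hive-spark-client-");
    sparkConf.put("spark.local.dir", dir.toString());
    return dir;
  }

  // Step 3: delete the directory when the client shuts down.
  public static void deleteRecursively(Path dir) throws IOException {
    try (Stream<Path> walk = Files.walk(dir)) {
      // Reverse order so children are deleted before their parents.
      List<Path> paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
      for (Path p : paths) {
        Files.delete(p);
      }
    }
  }
}
```

As the follow-up comment notes, this only helps where the client controls the node the files land on; in real standalone mode the directory would have to exist on every worker.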
[jira] [Commented] (HIVE-9017) Clean up temp files of RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244948#comment-14244948 ] Marcelo Vanzin commented on HIVE-9017: -- P.S.: that solution will probably not work very well in real standalone mode, since {{spark.local.dir}} would have to be created / deleted on every node in the cluster, and the client probably doesn't have the means to do that.
[jira] [Commented] (HIVE-9017) Clean up temp files of RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244951#comment-14244951 ] Marcelo Vanzin commented on HIVE-9017: -- Correct. All files written by Spark will end up under that directory (right now they all end up in /tmp since it's not set).