Re: Problem with current spark

2015-05-15 Thread Shixiong Zhu
Could you provide the full driver log? Looks like a bug. Thank you!

Best Regards,
Shixiong Zhu

2015-05-13 14:02 GMT-07:00 Giovanni Paolo Gibilisco gibb...@gmail.com:

 Hi,
 I'm trying to run an application that uses a Hive context to perform some
 queries over JSON files.
 The code of the application is here:
 https://github.com/GiovanniPaoloGibilisco/spark-log-processor/tree/fca93d95a227172baca58d51a4d799594a0429a1
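 At its core the application does something like the following (just a
 minimal sketch of the pattern, with a placeholder path and query; the
 actual code is in the repository above):

 import org.apache.spark.{SparkConf, SparkContext}
 import org.apache.spark.sql.hive.HiveContext

 object SketchApp {
   def main(args: Array[String]): Unit = {
     val sc = new SparkContext(new SparkConf().setAppName("spark-log-processor"))
     val hiveContext = new HiveContext(sc)

     // load the JSON logs into a DataFrame and register it as a temp table
     val events = hiveContext.jsonFile("/path/to/event-logs/*.json") // placeholder path
     events.registerTempTable("events")

     // run a HiveQL query over the JSON data (placeholder query)
     hiveContext.sql("SELECT COUNT(*) FROM events").collect().foreach(println)

     sc.stop()
   }
 }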

 I can run it on Spark 1.3.1 after rebuilding it with Hive support
 using: mvn -Phive -Phive-thriftserver -DskipTests clean package
 but when I try to run the same application on Spark built from the
 current master branch (at this commit from today
 https://github.com/apache/spark/tree/bec938f777a2e18757c7d04504d86a5342e2b49e),
 again built with Hive support, the application gets stuck: Stage 2 is
 never submitted, and after a while the application is killed.
 The logs look like this:

 15/05/13 16:54:37 INFO SparkContext: Starting job: run at <unknown>:0
 15/05/13 16:54:37 INFO DAGScheduler: Got job 2 (run at <unknown>:0) with 2
 output partitions (allowLocal=false)
 15/05/13 16:54:37 INFO DAGScheduler: Final stage: ResultStage 4(run at
 <unknown>:0)
 15/05/13 16:54:37 INFO DAGScheduler: Parents of final stage: List()
 15/05/13 16:54:37 INFO Exchange: Using SparkSqlSerializer2.
 15/05/13 16:54:37 INFO SparkContext: Starting job: run at <unknown>:0
 15/05/13 16:54:37 INFO SparkContext: Starting job: run at <unknown>:0
 15/05/13 16:54:37 INFO SparkContext: Starting job: run at <unknown>:0
 ^C15/05/13 16:54:42 INFO SparkContext: Invoking stop() from shutdown hook
 15/05/13 16:54:42 INFO SparkUI: Stopped Spark web UI at
 http://192.168.230.130:4040
 15/05/13 16:54:42 INFO DAGScheduler: Stopping DAGScheduler
 15/05/13 16:54:42 INFO SparkDeploySchedulerBackend: Shutting down all
 executors
 15/05/13 16:54:42 INFO SparkDeploySchedulerBackend: Asking each executor
 to shut down
 15/05/13 16:54:52 INFO
 OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
 OutputCommitCoordinator stopped!
 15/05/13 16:54:52 ERROR TaskSchedulerImpl: Lost executor 0 on
 192.168.230.130: remote Rpc client disassociated
 15/05/13 16:54:53 INFO AppClient$ClientActor: Executor updated:
 app-20150513165402-/0 is now EXITED (Command exited with code 0)
 15/05/13 16:54:53 INFO SparkDeploySchedulerBackend: Executor
 app-20150513165402-/0 removed: Command exited with code 0
 15/05/13 16:54:53 ERROR SparkDeploySchedulerBackend: Asked to remove
 non-existent executor 0
 15/05/13 16:56:42 WARN AkkaRpcEndpointRef: Error sending message [message
 = StopExecutors] in 1 attempts
 java.util.concurrent.TimeoutException: Futures timed out after [120
 seconds]
 at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
 at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
 at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
 at
 scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
 at scala.concurrent.Await$.result(package.scala:107)
 at
 org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
 at
 org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
 at
 org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stopExecutors(CoarseGrainedSchedulerBackend.scala:257)
 at
 org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stop(CoarseGrainedSchedulerBackend.scala:266)
 at
 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.stop(SparkDeploySchedulerBackend.scala:95)
 at
 org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:416)
 at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1404)
 at org.apache.spark.SparkContext.stop(SparkContext.scala:1562)
 at
 org.apache.spark.SparkContext$$anonfun$3.apply$mcV$sp(SparkContext.scala:551)
 at org.apache.spark.util.SparkShutdownHook.run(Utils.scala:2252)
 at
 org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(Utils.scala:)
 at
 org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(Utils.scala:)
 at
 org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(Utils.scala:)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1764)
 at
 org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(Utils.scala:)
 at
 org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(Utils.scala:)
 at
 org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(Utils.scala:)
 at scala.util.Try$.apply(Try.scala:161)
 at org.apache.spark.util.SparkShutdownHookManager.runAll(Utils.scala:)
 at
 org.apache.spark.util.SparkShutdownHookManager$$anon$6.run(Utils.scala:2204)
 at
 org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

 Should I submit an Issue for this?
 What is the best way to do it?
 Best




