Re: Upgrade from Spark 1.1.0 to 1.1.1+ Issues
Thank you for your help, Akhil! We found that, starting from v1.2.0 (v1.1.1 and below are fine), we can no longer connect to the remote Spark cluster from our laptop; it only works if the client is on the remote cluster as well. We are not sure whether this is related to Spark's internal communication being upgraded to a Netty-based implementation in v1.2.0 (https://issues.apache.org/jira/browse/SPARK-2468), which may not fit the firewall / network setup between our laptop and the remote servers.

This is not very good for development and debugging, since for every little change we now need to recompile the entire jar, upload it to the remote server, and execute it there, instead of running it right away on the local machine. But at least it works now.

Best,
Eason

On Thu, Mar 19, 2015 at 11:35 PM, Akhil Das ak...@sigmoidanalytics.com wrote:

Are you submitting your application from a local machine to a remote host? If you want to run the Spark application from a remote machine, then you have to set at least the following configurations properly (a sketch of both settings in code follows below):

- *spark.driver.host* - the IP/host from which you are submitting the job (make sure the cluster is able to ping it)
- *spark.driver.port* - a port number that is accessible from the Spark cluster

You can look at more configuration options here: http://spark.apache.org/docs/latest/configuration.html#networking

Thanks
Best Regards
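For reference, a minimal sketch of what Akhil's advice looks like in code. This is not from the thread: the app name and master URL are placeholders, and the host/port values are borrowed from the log output further down; adjust them to your own setup.

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: spark.driver.host must be an address the cluster can reach,
    // and spark.driver.port must be open through any firewall in between.
    val conf = new SparkConf()
      .setAppName("RemoteSubmitExample")          // hypothetical app name
      .setMaster("spark://your-master:7077")      // assumed standalone master URL
      .set("spark.driver.host", "192.168.1.53")   // IP of the submitting laptop
      .set("spark.driver.port", "65001")          // port reachable from the cluster
    val sc = new SparkContext(conf)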
Re: Upgrade from Spark 1.1.0 to 1.1.1+ Issues
Hi Akhil,

Thank you for your help. I just found that the problem was related to my local Spark application: I ran it from IntelliJ and didn't reload the project after recompiling the jar via Maven. Without the reload, IntelliJ used stale cached classes to run the application, which effectively put two different versions in play. After reloading the project and rerunning, v1.1.1 works fine and I no longer see the class-incompatibility issue.

However, I now encounter a new issue starting from v1.2.0 and above:

    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    15/03/19 01:10:17 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
    15/03/19 01:10:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    15/03/19 01:10:17 INFO SecurityManager: Changing view acls to: hduser,eason.hu
    15/03/19 01:10:17 INFO SecurityManager: Changing modify acls to: hduser,eason.hu
    15/03/19 01:10:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hduser, eason.hu); users with modify permissions: Set(hduser, eason.hu)
    15/03/19 01:10:18 INFO Slf4jLogger: Slf4jLogger started
    15/03/19 01:10:18 INFO Remoting: Starting remoting
    15/03/19 01:10:18 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@hduser-07:59122]
    15/03/19 01:10:18 INFO Utils: Successfully started service 'driverPropsFetcher' on port 59122.
    15/03/19 01:10:21 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@192.168.1.53:65001] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://sparkDriver@192.168.1.53:65001]].
    15/03/19 01:10:48 ERROR UserGroupInformation: PriviledgedActionException as:eason.hu (auth:SIMPLE) cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1421)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:128)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:224)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
    Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        ... 4 more
    Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:144)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:59)
        ... 7 more

Do you have any clues why this happens only on v1.2.0 and above? Nothing else changed.

Thanks,
Eason

On Tue, Mar 17, 2015 at 8:39 PM, Akhil Das ak...@sigmoidanalytics.com wrote:

It's clearly saying:

    java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 2439208141545036836, local class serialVersionUID = -7366074099953117729

That's a version incompatibility. Can you double-check your version?
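The serialVersionUID mismatch Akhil quotes means the jar on the client was built against a different Spark release than the one running on the cluster. Since Eason mentions building via Maven, a sketch of the relevant client-side pom.xml entry, assuming the cluster runs v1.2.0 and the Scala 2.10 builds of Spark 1.x:

    <!-- Sketch: keep the client's spark-core version in lockstep with the cluster. -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.2.0</version>
    </dependency>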
Re: Upgrade from Spark 1.1.0 to 1.1.1+ Issues
Hi Akhil,

sc.parallelize(1 to 1).collect() in the Spark shell on Spark v1.2.0 runs fine. However, if I run the following remotely, it throws an exception:

    val sc: SparkContext = new SparkContext(conf)
    val NUM_SAMPLES = 10
    val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
      val x = Math.random()
      val y = Math.random()
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)

Exception:

    15/03/17 17:33:52 ERROR scheduler.TaskSchedulerImpl: Lost executor 1 on hcompute32228.sjc9.service-now.com: remote Akka client disassociated
    15/03/17 17:33:52 INFO scheduler.TaskSetManager: Re-queueing tasks for 1 from TaskSet 0.0
    15/03/17 17:33:52 WARN scheduler.TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3, hcompute32228): ExecutorLostFailure (executor lost)
    15/03/17 17:33:52 INFO scheduler.DAGScheduler: Executor lost: 1 (epoch 3)
    15/03/17 17:33:52 INFO storage.BlockManagerMasterActor: Trying to remove executor 1 from BlockManagerMaster.
    15/03/17 17:33:52 INFO storage.BlockManagerMaster: Removed 1 successfully in removeExecutor
    15/03/17 17:34:39 ERROR Remoting: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 2439208141545036836, local class serialVersionUID = -7366074099953117729
    java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 2439208141545036836, local class serialVersionUID = -7366074099953117729
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:604)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1620)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)

v1.1.0 is totally fine, but v1.1.1 and v1.2.0+ are not. Are there any special instructions for setting up a Spark cluster on the later versions? Do you know if there is anything I'm missing?

Thank you for your help,
Eason

On Mon, Mar 16, 2015 at 11:51 PM, Akhil Das ak...@sigmoidanalytics.com wrote:

Could you tell me what all you did to change the version of Spark? Can you fire up a spark-shell, write this line, and see what happens:

    sc.parallelize(1 to 1).collect()

Thanks
Best Regards
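Since the InvalidClassException points at different BlockManagerId classes on the two sides, one quick check is to compare versions directly. A hedged sketch, assuming SparkContext's version accessor (present in Spark 1.x) and a placeholder master URL; compare the printed value against the version shown on the cluster's master web UI:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: a throwaway context used only to compare versions and run a smoke test.
    val conf = new SparkConf()
      .setAppName("VersionCheck")
      .setMaster("spark://your-master:7077")  // hypothetical master URL
    val sc = new SparkContext(conf)
    println("Client Spark version: " + sc.version)       // compare with the cluster's version
    println(sc.parallelize(1 to 1000).collect().length)  // minimal end-to-end smoke test
    sc.stop()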
Re: Upgrade from Spark 1.1.0 to 1.1.1+ Issues
Hi Akhil,

Yes, I did change both versions, on the project and on the cluster. Any clues? Even the sample code from the Spark website failed to work.

Thanks,
Eason

On Sun, Mar 15, 2015 at 11:56 PM, Akhil Das ak...@sigmoidanalytics.com wrote:

Did you change both of the versions? The one in the build file of your project and the Spark version of your cluster?

Thanks
Best Regards

On Sat, Mar 14, 2015 at 6:47 AM, EH eas...@gmail.com wrote:

Hi all,

I've been using Spark 1.1.0 for a while and now would like to upgrade to Spark 1.1.1 or above. However, it throws the following errors:

    18:05:31.522 [sparkDriver-akka.actor.default-dispatcher-3] ERROR TaskSchedulerImpl - Lost executor 37 on hcompute001: remote Akka client disassociated
    18:05:31.530 [sparkDriver-akka.actor.default-dispatcher-3] WARN TaskSetManager - Lost task 0.0 in stage 1.0 (TID 0, hcompute001): ExecutorLostFailure (executor lost)
    18:05:31.567 [sparkDriver-akka.actor.default-dispatcher-2] ERROR TaskSchedulerImpl - Lost executor 3 on hcompute001: remote Akka client disassociated
    18:05:31.568 [sparkDriver-akka.actor.default-dispatcher-2] WARN TaskSetManager - Lost task 1.0 in stage 1.0 (TID 1, hcompute001): ExecutorLostFailure (executor lost)
    18:05:31.988 [sparkDriver-akka.actor.default-dispatcher-23] ERROR TaskSchedulerImpl - Lost executor 24 on hcompute001: remote Akka client disassociated

Do you know what may go wrong? I didn't change any code, just the version of Spark.

Thank you all,
Eason

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Upgrade-from-Spark-1-1-0-to-1-1-1-Issues-tp22045.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.