Re: Upgrade from Spark 1.1.0 to 1.1.1+ Issues
Thank you for your help, Akhil! We found that, starting from v1.2.0 (v1.1.1 and below are fine), we can no longer connect to the remote Spark cluster from our laptop; it only works if the client is on the remote cluster as well. We are not sure whether this is related to Spark's internal communication being upgraded to a Netty-based implementation in v1.2.0 (https://issues.apache.org/jira/browse/SPARK-2468), which may not fit the firewall / network setup between our laptop and the remote servers.

This is not very good for development and debugging, since for every little change we now need to recompile the entire jar, upload it to the remote server, and execute it there, instead of running it right away on the local machine. But at least it works now.

Best,
Eason

On Thu, Mar 19, 2015 at 11:35 PM, Akhil Das ak...@sigmoidanalytics.com wrote:

Are you submitting your application from a local machine to a remote host? If you want to run the Spark application from a remote machine, then you have to set at least the following configurations properly (a sketch of both settings in code follows below):

- *spark.driver.host* - the IP/host from which you are submitting the job (make sure the cluster is able to ping it)
- *spark.driver.port* - a port number that is accessible from the Spark cluster

You can look at more configuration options here: http://spark.apache.org/docs/latest/configuration.html#networking

Thanks
Best Regards
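For reference, a minimal sketch of what Akhil's advice looks like in code. This is not from the thread: the app name and master URL are placeholders, and the host/port values are borrowed from the log output further down; adjust them to your own setup.

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: spark.driver.host must be an address the cluster can reach,
    // and spark.driver.port must be open through any firewall in between.
    val conf = new SparkConf()
      .setAppName("RemoteSubmitExample")          // hypothetical app name
      .setMaster("spark://your-master:7077")      // assumed standalone master URL
      .set("spark.driver.host", "192.168.1.53")   // IP of the submitting laptop
      .set("spark.driver.port", "65001")          // port reachable from the cluster
    val sc = new SparkContext(conf)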
Re: Upgrade from Spark 1.1.0 to 1.1.1+ Issues
Hi Akhil,

Thank you for your help. I just found that the problem was related to my local Spark application: I ran it from IntelliJ and didn't reload the project after recompiling the jar via Maven. Without the reload, IntelliJ used stale cached classes to run the application, which effectively put two different versions in play. After reloading the project and rerunning, v1.1.1 works fine and I no longer see the class-incompatibility issue.

However, I now encounter a new issue starting from v1.2.0 and above:

    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    15/03/19 01:10:17 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
    15/03/19 01:10:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    15/03/19 01:10:17 INFO SecurityManager: Changing view acls to: hduser,eason.hu
    15/03/19 01:10:17 INFO SecurityManager: Changing modify acls to: hduser,eason.hu
    15/03/19 01:10:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hduser, eason.hu); users with modify permissions: Set(hduser, eason.hu)
    15/03/19 01:10:18 INFO Slf4jLogger: Slf4jLogger started
    15/03/19 01:10:18 INFO Remoting: Starting remoting
    15/03/19 01:10:18 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@hduser-07:59122]
    15/03/19 01:10:18 INFO Utils: Successfully started service 'driverPropsFetcher' on port 59122.
    15/03/19 01:10:21 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@192.168.1.53:65001] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://sparkDriver@192.168.1.53:65001]].
    15/03/19 01:10:48 ERROR UserGroupInformation: PriviledgedActionException as:eason.hu (auth:SIMPLE) cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1421)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:128)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:224)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
    Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        ... 4 more
    Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:144)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:59)
        ... 7 more

Do you have any clues why this happens only on v1.2.0 and above? Nothing else changed.

Thanks,
Eason

On Tue, Mar 17, 2015 at 8:39 PM, Akhil Das ak...@sigmoidanalytics.com wrote:

It's clearly saying:

    java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 2439208141545036836, local class serialVersionUID = -7366074099953117729

That's a version incompatibility. Can you double-check your version?
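The serialVersionUID mismatch Akhil quotes means the jar on the client was built against a different Spark release than the one running on the cluster. Since Eason mentions building via Maven, a sketch of the relevant client-side pom.xml entry, assuming the cluster runs v1.2.0 and the Scala 2.10 builds of Spark 1.x:

    <!-- Sketch: keep the client's spark-core version in lockstep with the cluster. -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.2.0</version>
    </dependency>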
Re: Upgrade from Spark 1.1.0 to 1.1.1+ Issues
Hi Akhil,

sc.parallelize(1 to 1).collect() in the Spark shell on Spark v1.2.0 runs fine. However, if I run the following remotely, it throws an exception:

    val sc: SparkContext = new SparkContext(conf)
    val NUM_SAMPLES = 10
    val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
      val x = Math.random()
      val y = Math.random()
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)

Exception:

    15/03/17 17:33:52 ERROR scheduler.TaskSchedulerImpl: Lost executor 1 on hcompute32228.sjc9.service-now.com: remote Akka client disassociated
    15/03/17 17:33:52 INFO scheduler.TaskSetManager: Re-queueing tasks for 1 from TaskSet 0.0
    15/03/17 17:33:52 WARN scheduler.TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3, hcompute32228): ExecutorLostFailure (executor lost)
    15/03/17 17:33:52 INFO scheduler.DAGScheduler: Executor lost: 1 (epoch 3)
    15/03/17 17:33:52 INFO storage.BlockManagerMasterActor: Trying to remove executor 1 from BlockManagerMaster.
    15/03/17 17:33:52 INFO storage.BlockManagerMaster: Removed 1 successfully in removeExecutor
    15/03/17 17:34:39 ERROR Remoting: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 2439208141545036836, local class serialVersionUID = -7366074099953117729
    java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 2439208141545036836, local class serialVersionUID = -7366074099953117729
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:604)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1620)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)

v1.1.0 is totally fine, but v1.1.1 and v1.2.0+ are not. Are there any special instructions for setting up a Spark cluster on the later versions? Do you know if there is anything I'm missing?

Thank you for your help,
Eason

On Mon, Mar 16, 2015 at 11:51 PM, Akhil Das ak...@sigmoidanalytics.com wrote:

Could you tell me what all you did to change the version of Spark? Can you fire up a spark-shell, write this line, and see what happens:

    sc.parallelize(1 to 1).collect()

Thanks
Best Regards
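Since the InvalidClassException points at different BlockManagerId classes on the two sides, one quick check is to compare versions directly. A hedged sketch, assuming SparkContext's version accessor (present in Spark 1.x) and a placeholder master URL; compare the printed value against the version shown on the cluster's master web UI:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: a throwaway context used only to compare versions and run a smoke test.
    val conf = new SparkConf()
      .setAppName("VersionCheck")
      .setMaster("spark://your-master:7077")  // hypothetical master URL
    val sc = new SparkContext(conf)
    println("Client Spark version: " + sc.version)       // compare with the cluster's version
    println(sc.parallelize(1 to 1000).collect().length)  // minimal end-to-end smoke test
    sc.stop()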
Re: Upgrade from Spark 1.1.0 to 1.1.1+ Issues
Hi Akhil,

Yes, I did change both versions, on the project and on the cluster. Any clues? Even the sample code from the Spark website failed to work.

Thanks,
Eason

On Sun, Mar 15, 2015 at 11:56 PM, Akhil Das ak...@sigmoidanalytics.com wrote:

Did you change both of the versions? The one in the build file of your project and the Spark version of your cluster?

Thanks
Best Regards

On Sat, Mar 14, 2015 at 6:47 AM, EH eas...@gmail.com wrote:

Hi all,

I've been using Spark 1.1.0 for a while and now would like to upgrade to Spark 1.1.1 or above. However, it throws the following errors:

    18:05:31.522 [sparkDriver-akka.actor.default-dispatcher-3] ERROR TaskSchedulerImpl - Lost executor 37 on hcompute001: remote Akka client disassociated
    18:05:31.530 [sparkDriver-akka.actor.default-dispatcher-3] WARN TaskSetManager - Lost task 0.0 in stage 1.0 (TID 0, hcompute001): ExecutorLostFailure (executor lost)
    18:05:31.567 [sparkDriver-akka.actor.default-dispatcher-2] ERROR TaskSchedulerImpl - Lost executor 3 on hcompute001: remote Akka client disassociated
    18:05:31.568 [sparkDriver-akka.actor.default-dispatcher-2] WARN TaskSetManager - Lost task 1.0 in stage 1.0 (TID 1, hcompute001): ExecutorLostFailure (executor lost)
    18:05:31.988 [sparkDriver-akka.actor.default-dispatcher-23] ERROR TaskSchedulerImpl - Lost executor 24 on hcompute001: remote Akka client disassociated

Do you know what may go wrong? I didn't change any code, just the version of Spark.

Thank you all,
Eason

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Upgrade-from-Spark-1-1-0-to-1-1-1-Issues-tp22045.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.