Thank you for your help, Akhil!  We found that, starting from version 1.2.0
(v1.1.1 and below are fine), remotely connecting from our laptop to the
remote Spark cluster no longer works, although it does work when the client
also runs on the remote cluster.  We are not sure whether this is related to
Spark's internal communication being upgraded to a Netty-based implementation
in v1.2.0 (https://issues.apache.org/jira/browse/SPARK-2468), which may not
fit the firewall / network setup between the laptop and the remote servers.
This is not great for development and debugging, since for every little
change we have to recompile the entire jar, upload it to the remote server,
and execute it there instead of running it right away on the local machine,
but at least it works now.

Best,
Eason

On Thu, Mar 19, 2015 at 11:35 PM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> Are you submitting your application from your local machine to a remote host?
> If you want to run the Spark application from a remote machine, then you have
> to at least set the following configurations properly.
>
>  - *spark.driver.host* - points to the IP/host from which you are submitting
>    the job (make sure it can be pinged from the cluster)
>
>  - *spark.driver.port* - set it to a port number which is accessible from
>    the Spark cluster.
>
>  You can look at more configuration options over here.
> <http://spark.apache.org/docs/latest/configuration.html#networking>
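> For example, a minimal sketch on the submitting side could look like this
> (the host and port below are placeholders for your own setup, not real values):
>
>   import org.apache.spark.SparkConf
>
>   val conf = new SparkConf()
>     .set("spark.driver.host", "<laptop-ip-reachable-from-cluster>")  // the machine you submit from
>     .set("spark.driver.port", "59123")  // any fixed port that is open in your firewall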
>
>
> Thanks
> Best Regards
>
> On Fri, Mar 20, 2015 at 4:02 AM, Eason Hu <eas...@gmail.com> wrote:
>
>> Hi Akhil,
>>
>> Thank you for your help.  I just found that the problem was with my local
>> Spark application: I ran it in IntelliJ and didn't reload the project after
>> recompiling the jar via Maven.  Without a reload, IntelliJ uses cached build
>> artifacts to run the application, which ends up mixing two different Spark
>> versions.  After I reloaded the project and reran, it ran fine on v1.1.1 and
>> I no longer saw the incompatible-class issue.
>>
>> However, I now encounter a new issue with v1.2.0 and above.
>>
>> Using Spark's default log4j profile: 
>> org/apache/spark/log4j-defaults.properties
>> 15/03/19 01:10:17 INFO CoarseGrainedExecutorBackend: Registered signal 
>> handlers for [TERM, HUP, INT]
>> 15/03/19 01:10:17 WARN NativeCodeLoader: Unable to load native-hadoop 
>> library for your platform... using builtin-java classes where applicable
>> 15/03/19 01:10:17 INFO SecurityManager: Changing view acls to: 
>> hduser,eason.hu
>> 15/03/19 01:10:17 INFO SecurityManager: Changing modify acls to: 
>> hduser,eason.hu
>> 15/03/19 01:10:17 INFO SecurityManager: SecurityManager: authentication 
>> disabled; ui acls disabled; users with view permissions: Set(hduser, 
>> eason.hu); users with modify permissions: Set(hduser, eason.hu)
>> 15/03/19 01:10:18 INFO Slf4jLogger: Slf4jLogger started
>> 15/03/19 01:10:18 INFO Remoting: Starting remoting
>> 15/03/19 01:10:18 INFO Remoting: Remoting started; listening on addresses 
>> :[akka.tcp://driverPropsFetcher@hduser-07:59122]
>> 15/03/19 01:10:18 INFO Utils: Successfully started service 
>> 'driverPropsFetcher' on port 59122.
>> 15/03/19 01:10:21 WARN ReliableDeliverySupervisor: Association with remote 
>> system [akka.tcp://sparkDriver@192.168.1.53:65001] has failed, address is 
>> now gated for [5000] ms. Reason is: [Association failed with 
>> [akka.tcp://sparkDriver@192.168.1.53:65001]].
>> 15/03/19 01:10:48 ERROR UserGroupInformation: PriviledgedActionException 
>> as:eason.hu (auth:SIMPLE) cause:java.util.concurrent.TimeoutException: 
>> Futures timed out after [30 seconds]
>> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: 
>> Unknown exception in doAs
>>      at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1421)
>>      at 
>> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)
>>      at 
>> org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:128)
>>      at 
>> org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:224)
>>      at 
>> org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
>> Caused by: java.security.PrivilegedActionException: 
>> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>>      at java.security.AccessController.doPrivileged(Native Method)
>>      at javax.security.auth.Subject.doAs(Subject.java:415)
>>      at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>      ... 4 more
>> Caused by: java.util.concurrent.TimeoutException: Futures timed out after 
>> [30 seconds]
>>      at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>      at 
>> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>      at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>      at 
>> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>      at scala.concurrent.Await$.result(package.scala:107)
>>      at 
>> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:144)
>>      at 
>> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
>>      at 
>> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:59)
>>      ... 7 more
>>
>> Do you have any clue why this happens only with v1.2.0 and above?
>> Nothing else changed.
>>
>> Thanks,
>> Eason
>>
>> On Tue, Mar 17, 2015 at 8:39 PM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>
>>> It's clearly saying:
>>>
>>> java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId;
>>> local class incompatible: stream classdesc serialVersionUID =
>>> 2439208141545036836, local class serialVersionUID = -7366074099953117729
>>>
>>> That's a version incompatibility; can you double-check your versions?
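>>>
>>> In practice that usually means the Spark dependency in your project's build
>>> file does not match the Spark version running on the cluster. A rough sketch
>>> of the relevant line, assuming an sbt build (with Maven it is the same version
>>> pinned on the spark-core dependency in the pom):
>>>
>>>   // build.sbt -- this version string must match the cluster's Spark release exactly
>>>   libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0"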
>>> On 18 Mar 2015 06:08, "Eason Hu" <eas...@gmail.com> wrote:
>>>
>>>> Hi Akhil,
>>>>
>>>> sc.parallelize(1 to 10000).collect() runs fine in the Spark shell on Spark
>>>> v1.2.0.  However, if I run the following remotely, it throws an
>>>> exception:
>>>>
>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>>
>>>> val sc: SparkContext = new SparkContext(conf)
>>>>
>>>> val NUM_SAMPLES = 10
>>>> val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
>>>>   val x = Math.random()
>>>>   val y = Math.random()
>>>>   if (x*x + y*y < 1) 1 else 0
>>>> }.reduce(_ + _)
>>>> println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)
>>>>
>>>> Exception:
>>>> 15/03/17 17:33:52 ERROR scheduler.TaskSchedulerImpl: Lost executor 1 on
>>>> hcompute32228.sjc9.service-now.com: remote Akka client disassociated
>>>> 15/03/17 17:33:52 INFO scheduler.TaskSetManager: Re-queueing tasks for
>>>> 1 from TaskSet 0.0
>>>> 15/03/17 17:33:52 WARN scheduler.TaskSetManager: Lost task 1.1 in stage
>>>> 0.0 (TID 3, hcompute32228): ExecutorLostFailure (executor lost)
>>>> 15/03/17 17:33:52 INFO scheduler.DAGScheduler: Executor lost: 1 (epoch
>>>> 3)
>>>> 15/03/17 17:33:52 INFO storage.BlockManagerMasterActor: Trying to
>>>> remove executor 1 from BlockManagerMaster.
>>>> 15/03/17 17:33:52 INFO storage.BlockManagerMaster: Removed 1
>>>> successfully in removeExecutor
>>>> 15/03/17 17:34:39 ERROR Remoting:
>>>> org.apache.spark.storage.BlockManagerId; local class incompatible: stream
>>>> classdesc serialVersionUID = 2439208141545036836, local class
>>>> serialVersionUID = -7366074099953117729
>>>> java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId;
>>>> local class incompatible: stream classdesc serialVersionUID =
>>>> 2439208141545036836, local class serialVersionUID = -7366074099953117729
>>>>     at
>>>> java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:604)
>>>>     at
>>>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1620)
>>>>     at
>>>> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)
>>>>
>>>> v1.1.0 is totally fine, but v1.1.1 and v1.2.0+ are not.  Are there any
>>>> special instructions for setting up a Spark cluster on the later versions?
>>>> Do you know if there is anything I'm missing?
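>>>>
>>>> For reference, the conf above is built roughly along these lines (a sketch
>>>> only, assuming a standalone master; the app name, master URL, and jar path
>>>> are placeholders rather than our exact values):
>>>>
>>>>   import org.apache.spark.SparkConf
>>>>
>>>>   val conf = new SparkConf()
>>>>     .setAppName("RemotePiTest")  // placeholder app name
>>>>     .setMaster("spark://<cluster-master-host>:7077")  // remote standalone master
>>>>     .setJars(Seq("target/my-app.jar"))  // ship the application jar to the executors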
>>>>
>>>>
>>>> Thank you for your help,
>>>> Eason
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Mar 16, 2015 at 11:51 PM, Akhil Das <ak...@sigmoidanalytics.com
>>>> > wrote:
>>>>
>>>>> Could you tell me everything you did to change the version of Spark?
>>>>>
>>>>> Can you fire up a spark-shell, run this line, and see what happens:
>>>>>
>>>>> sc.parallelize(1 to 10000).collect()
>>>>>
>>>>>
>>>>> Thanks
>>>>> Best Regards
>>>>>
>>>>> On Mon, Mar 16, 2015 at 11:13 PM, Eason Hu <eas...@gmail.com> wrote:
>>>>>
>>>>>> Hi Akhil,
>>>>>>
>>>>>> Yes, I did change both versions on the project and the cluster.  Any
>>>>>> clues?
>>>>>>
>>>>>> Even the sample code from Spark website failed to work.
>>>>>>
>>>>>> Thanks,
>>>>>> Eason
>>>>>>
>>>>>> On Sun, Mar 15, 2015 at 11:56 PM, Akhil Das <
>>>>>> ak...@sigmoidanalytics.com> wrote:
>>>>>>
>>>>>>> Did you change both versions: the one in your project's build file and
>>>>>>> the Spark version on your cluster?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Best Regards
>>>>>>>
>>>>>>> On Sat, Mar 14, 2015 at 6:47 AM, EH <eas...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I've been using Spark 1.1.0 for a while, and now would like to
>>>>>>>> upgrade to
>>>>>>>> Spark 1.1.1 or above.  However, it throws the following errors:
>>>>>>>>
>>>>>>>> 18:05:31.522 [sparkDriver-akka.actor.default-dispatcher-3hread]
>>>>>>>> ERROR
>>>>>>>> TaskSchedulerImpl - Lost executor 37 on hcompute001: remote Akka
>>>>>>>> client
>>>>>>>> disassociated
>>>>>>>> 18:05:31.530 [sparkDriver-akka.actor.default-dispatcher-3hread] WARN
>>>>>>>> TaskSetManager - Lost task 0.0 in stage 1.0 (TID 0, hcompute001):
>>>>>>>> ExecutorLostFailure (executor lost)
>>>>>>>> 18:05:31.567 [sparkDriver-akka.actor.default-dispatcher-2hread]
>>>>>>>> ERROR
>>>>>>>> TaskSchedulerImpl - Lost executor 3 on hcompute001: remote Akka
>>>>>>>> client
>>>>>>>> disassociated
>>>>>>>> 18:05:31.568 [sparkDriver-akka.actor.default-dispatcher-2hread] WARN
>>>>>>>> TaskSetManager - Lost task 1.0 in stage 1.0 (TID 1, hcompute001):
>>>>>>>> ExecutorLostFailure (executor lost)
>>>>>>>> 18:05:31.988 [sparkDriver-akka.actor.default-dispatcher-23hread]
>>>>>>>> ERROR
>>>>>>>> TaskSchedulerImpl - Lost executor 24 on hcompute001: remote Akka
>>>>>>>> client
>>>>>>>> disassociated
>>>>>>>>
>>>>>>>> Do you know what might have gone wrong?  I didn't change any code, just
>>>>>>>> the version of Spark.
>>>>>>>>
>>>>>>>> Thank you all,
>>>>>>>> Eason
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Upgrade-from-Spark-1-1-0-to-1-1-1-Issues-tp22045.html
>>>>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>>>>> Nabble.com.
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>
