Re: Spark worker abruptly dying after 2 days

2016-02-14 Thread Kartik Mathur
Yes, you are right, I initially started the workers from the master node. But
what I am interested in knowing is what happened suddenly after 2 days that
made the workers die. Is it possible that the workers got disconnected because
of some network issue and then tried restarting themselves but kept failing?

On Sun, Feb 14, 2016 at 11:21 PM, Prabhu Joseph 
wrote:

> Kartik,
>
>  Spark workers won't start if SPARK_MASTER_IP is wrong. Maybe you
> used start-slaves.sh from the master node to start all the worker nodes, in
> which case the workers would have got the correct SPARK_MASTER_IP initially.
> Any later restart from the slave nodes would then have failed because of the
> wrong SPARK_MASTER_IP on the worker nodes.
>
>    Check the logs of the other running workers to see which master address
> they have connected to; I don't think they are using a wrong master IP.
>
>
> Thanks,
> Prabhu Joseph
>
> On Mon, Feb 15, 2016 at 12:34 PM, Kartik Mathur 
> wrote:
>
>> Thanks Prabhu,
>>
>> I had wrongly configured SPARK_MASTER_IP on the worker nodes to `hostname -f`,
>> which resolves to the worker itself and not the master.
>>
>> But now the question is *why the cluster was up for the first 2 days* and the
>> workers only noticed this invalid configuration after 2 days. And why are the
>> other workers still up even though they have the same setting?
>>
>> Really appreciate your help
>>
>> Thanks,
>> Kartik
>>
>> On Sun, Feb 14, 2016 at 10:53 PM, Prabhu Joseph <
>> prabhujose.ga...@gmail.com> wrote:
>>
>>> Kartik,
>>>
>>>    The exception stack trace
>>> *java.util.concurrent.RejectedExecutionException* can happen if
>>> SPARK_MASTER_IP on the worker nodes is configured wrongly, for example if
>>> SPARK_MASTER_IP is set to the hostname of the master node while the workers
>>> try to connect to the IP of the master node. Check whether SPARK_MASTER_IP on
>>> the worker nodes is exactly the same as what the Spark Master GUI shows.
>>>
>>>
>>> Thanks,
>>> Prabhu Joseph
>>>
>>> On Mon, Feb 15, 2016 at 11:51 AM, Kartik Mathur 
>>> wrote:
>>>
On Spark 1.5.2, I have a Spark standalone cluster with 6 workers. I left the
cluster idle for 3 days, and after 3 days I saw only 4 workers on the Spark
master UI; 2 workers had died with the same exception below.

The strange part is that the cluster was running stable for 2 days, but on the
third day 2 workers abruptly died. I see this error in one of the affected
workers. No job ran for those 2 days.



[worker log and stack trace snipped -- identical to the full trace in the original post below]

Re: Spark worker abruptly dying after 2 days

2016-02-14 Thread Prabhu Joseph
Kartik,

 Spark workers won't start if SPARK_MASTER_IP is wrong. Maybe you used
start-slaves.sh from the master node to start all the worker nodes, in which
case the workers would have got the correct SPARK_MASTER_IP initially. Any
later restart from the slave nodes would then have failed because of the wrong
SPARK_MASTER_IP on the worker nodes.
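
For context, a rough sketch of the two start paths, assuming the standard
standalone sbin scripts (<master-host> below is just a placeholder):

    # run on the master: reads conf/slaves and starts a worker on every
    # listed host, passing a master URL built on the master side
    $SPARK_HOME/sbin/start-slaves.sh

    # run on a single slave node: the master URL is whatever is passed here,
    # so the value used on that node has to be correct
    $SPARK_HOME/sbin/start-slave.sh spark://<master-host>:7077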

   Check the logs of the other running workers to see which master address they
have connected to; I don't think they are using a wrong master IP.
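
The exact wording varies by version, but the address a worker actually used is
usually visible in its log; assuming the default standalone log location,
something like:

    grep -i "connecting to master" \
        $SPARK_HOME/logs/spark-*-org.apache.spark.deploy.worker.Worker-*.out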


Thanks,
Prabhu Joseph

On Mon, Feb 15, 2016 at 12:34 PM, Kartik Mathur  wrote:

> Thanks Prabhu,
>
> I had wrongly configured SPARK_MASTER_IP on the worker nodes to `hostname -f`,
> which resolves to the worker itself and not the master.
>
> But now the question is *why the cluster was up for the first 2 days* and the
> workers only noticed this invalid configuration after 2 days. And why are the
> other workers still up even though they have the same setting?
>
> Really appreciate your help
>
> Thanks,
> Kartik
>
> On Sun, Feb 14, 2016 at 10:53 PM, Prabhu Joseph <
> prabhujose.ga...@gmail.com> wrote:
>
>> Kartik,
>>
>>    The exception stack trace
>> *java.util.concurrent.RejectedExecutionException* can happen if
>> SPARK_MASTER_IP on the worker nodes is configured wrongly, for example if
>> SPARK_MASTER_IP is set to the hostname of the master node while the workers
>> try to connect to the IP of the master node. Check whether SPARK_MASTER_IP on
>> the worker nodes is exactly the same as what the Spark Master GUI shows.
>>
>>
>> Thanks,
>> Prabhu Joseph
>>
>> On Mon, Feb 15, 2016 at 11:51 AM, Kartik Mathur 
>> wrote:
>>
>>> On Spark 1.5.2, I have a Spark standalone cluster with 6 workers. I left the
>>> cluster idle for 3 days, and after 3 days I saw only 4 workers on the Spark
>>> master UI; 2 workers had died with the same exception below.
>>>
>>> The strange part is that the cluster was running stable for 2 days, but on the
>>> third day 2 workers abruptly died. I see this error in one of the affected
>>> workers. No job ran for those 2 days.
>>>
>>> [worker log and stack trace snipped -- identical to the full trace in the original post below]

Re: Spark worker abruptly dying after 2 days

2016-02-14 Thread Kartik Mathur
Thanks Prabhu,

I had wrongly configured SPARK_MASTER_IP on the worker nodes to `hostname -f`,
which resolves to the worker itself and not the master.
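
For anyone hitting the same thing, the broken line in conf/spark-env.sh on the
workers was effectively the first one below; the fix is to point it at the real
master host (the hostname in the second line is only a placeholder):

    # wrong: resolves to each worker's own FQDN
    export SPARK_MASTER_IP=`hostname -f`

    # fixed: the actual master host, matching what the master web UI shows
    export SPARK_MASTER_IP=spark-master.example.com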

But now the question is *why the cluster was up for the first 2 days* and the
workers only noticed this invalid configuration after 2 days. And why are the
other workers still up even though they have the same setting?

Really appreciate your help

Thanks,
Kartik

On Sun, Feb 14, 2016 at 10:53 PM, Prabhu Joseph 
wrote:

> Kartik,
>
>    The exception stack trace
> *java.util.concurrent.RejectedExecutionException* can happen if
> SPARK_MASTER_IP on the worker nodes is configured wrongly, for example if
> SPARK_MASTER_IP is set to the hostname of the master node while the workers
> try to connect to the IP of the master node. Check whether SPARK_MASTER_IP on
> the worker nodes is exactly the same as what the Spark Master GUI shows.
>
>
> Thanks,
> Prabhu Joseph
>
> On Mon, Feb 15, 2016 at 11:51 AM, Kartik Mathur 
> wrote:
>
>> On Spark 1.5.2, I have a Spark standalone cluster with 6 workers. I left the
>> cluster idle for 3 days, and after 3 days I saw only 4 workers on the Spark
>> master UI; 2 workers had died with the same exception below.
>>
>> The strange part is that the cluster was running stable for 2 days, but on the
>> third day 2 workers abruptly died. I see this error in one of the affected
>> workers. No job ran for those 2 days.
>>
>> [worker log and stack trace snipped -- identical to the full trace in the original post below]

Re: Spark worker abruptly dying after 2 days

2016-02-14 Thread Prabhu Joseph
Kartik,

   The exception stack trace
*java.util.concurrent.RejectedExecutionException* can happen if
SPARK_MASTER_IP on the worker nodes is configured wrongly, for example if
SPARK_MASTER_IP is set to the hostname of the master node while the workers
try to connect to the IP of the master node. Check whether SPARK_MASTER_IP on
the worker nodes is exactly the same as what the Spark Master GUI shows.
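
A quick way to compare the two (the path below is the default conf location;
adjust if your layout differs):

    # on each worker node
    grep SPARK_MASTER_IP $SPARK_HOME/conf/spark-env.sh

Then check the spark://<host>:<port> URL shown at the top of the master web UI;
the host part must match what the workers are configured with.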


Thanks,
Prabhu Joseph

On Mon, Feb 15, 2016 at 11:51 AM, Kartik Mathur  wrote:

> On Spark 1.5.2, I have a Spark standalone cluster with 6 workers. I left the
> cluster idle for 3 days, and after 3 days I saw only 4 workers on the Spark
> master UI; 2 workers had died with the same exception below.
>
> The strange part is that the cluster was running stable for 2 days, but on the
> third day 2 workers abruptly died. I see this error in one of the affected
> workers. No job ran for those 2 days.
>
> [worker log and stack trace snipped -- identical to the full trace in the original post below]


Spark worker abruptly dying after 2 days

2016-02-14 Thread Kartik Mathur
On Spark 1.5.2, I have a Spark standalone cluster with 6 workers. I left the
cluster idle for 3 days, and after 3 days I saw only 4 workers on the Spark
master UI; 2 workers had died with the same exception below.

The strange part is that the cluster was running stable for 2 days, but on the
third day 2 workers abruptly died. I see this error in one of the affected
workers. No job ran for those 2 days.



2016-02-14 01:12:59 ERROR Worker:75 - Connection to master failed! Waiting for master to reconnect...
2016-02-14 01:12:59 ERROR Worker:75 - Connection to master failed! Waiting for master to reconnect...
2016-02-14 01:13:10 ERROR SparkUncaughtExceptionHandler:96 - Uncaught exception in thread Thread[sparkWorker-akka.actor.default-dispatcher-2,5,main]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@514b13ad rejected from java.util.concurrent.ThreadPoolExecutor@17f8ec8d[Running, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 3]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
    at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$reregisterWithMaster$1.apply$mcV$sp(Worker.scala:269)
    at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1119)
    at org.apache.spark.deploy.worker.Worker.org$apache$spark$deploy$worker$Worker$$reregisterWithMaster(Worker.scala:234)
    at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:521)
    at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:177)
    at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:126)
    at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197)
    at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:125)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
    at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
    at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
    at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:92)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


