Hi Ningjun

I just wanted to check that the master didn't "kick out" the worker, as the "Disassociated" can come from the master.

Here it looks like the worker killed the executor before shutting itself down.
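To check that ordering in a worker log, a rough sketch like the following could list what was logged in the few seconds before the kill request. It is only an illustration, not part of Spark: the function names `parse_entries` and `events_before_kill` are hypothetical, and it assumes every entry starts with a timestamp and level in the form seen in the excerpt below (`2015-10-13 09:58:06,087 INFO`), with wrapped lines belonging to the preceding entry.

```python
import re
from datetime import datetime, timedelta

# Assumed entry header, taken from the worker log excerpt in this thread:
# "2015-10-13 09:58:06,087 INFO ..."
ENTRY_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(\w+)\s*(.*)$"
)


def parse_entries(text):
    """Group wrapped log lines into (timestamp, level, message) tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        match = ENTRY_START.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            entries.append([stamp, match.group(2), match.group(3)])
        elif entries and line:
            # Continuation of a wrapped entry: append to the previous message.
            entries[-1][2] = (entries[-1][2] + " " + line).strip()
    return [tuple(entry) for entry in entries]


def events_before_kill(entries, window_seconds=5):
    """Return entries logged within window_seconds before the first kill request."""
    kill = next((e for e in entries if "Asked to kill executor" in e[2]), None)
    if kill is None:
        return []
    start = kill[0] - timedelta(seconds=window_seconds)
    return [e for e in entries if start <= e[0] <= kill[0]]
```

Running `events_before_kill(parse_entries(log_text))` over the worker log text, and the same over the master log for the same time window, would show whether a "Disassociated" warning precedes the kill request on each side.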

Which Spark version are you using?

Regards
JB

On 10/14/2015 04:42 PM, Wang, Ningjun (LNG-NPV) wrote:
I checked the master log earlier and did not find anything wrong.
Unfortunately, I no longer have the master log.

So you think the master log will tell you why the executor went down?

Regards,

Ningjun Wang


-----Original Message-----
From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net]
Sent: Tuesday, October 13, 2015 10:42 AM
To: user@spark.apache.org
Subject: Re: Why is my spark executor is terminated?

Hi Ningjun,

Is there anything unusual in the master log?

Regards
JB

On 10/13/2015 04:34 PM, Wang, Ningjun (LNG-NPV) wrote:
We use Spark on Windows 2008 R2 servers. We use one Spark context,
which creates one Spark executor. We run the Spark master, slave,
driver, and executor on a single machine.

From time to time, we find that the executor Java process has been
terminated. I cannot figure out why. Can anybody help me find out why
the executor was terminated?

The Spark slave log shows that it killed the executor process:

2015-10-13 09:58:06,087 INFO
[sparkWorker-akka.actor.default-dispatcher-16] worker.Worker
(Logging.scala:logInfo(59)) - Asked to kill executor
app-20151009201453-0000/0

But why does it do that?

Here are the detailed logs from the Spark slave:

2015-10-13 09:58:04,915 WARN
[sparkWorker-akka.actor.default-dispatcher-16]
remote.ReliableDeliverySupervisor (Slf4jLogger.scala:apply$mcV$sp(71))
- Association with remote system
[akka.tcp://sparkexecu...@qa1-cas01.pcc.lexisnexis.com:61234] has
failed, address is now gated for [5000] ms. Reason is: [Disassociated].

2015-10-13 09:58:05,134 INFO
[sparkWorker-akka.actor.default-dispatcher-16] actor.LocalActorRef
(Slf4jLogger.scala:apply$mcV$sp(74)) - Message
[akka.remote.EndpointWriter$AckIdleCheckTimer$] from
Actor[akka://sparkWorker/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FsparkExecutor%40QA1-CAS01.pcc.lexisnexis.com%3A61234-2/endpointWriter#-175670388]
to
Actor[akka://sparkWorker/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FsparkExecutor%40QA1-CAS01.pcc.lexisnexis.com%3A61234-2/endpointWriter#-175670388]
was not delivered. [2] dead letters
encountered. This logging can be turned off or adjusted with
configuration settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.

2015-10-13 09:58:05,134 INFO
[sparkWorker-akka.actor.default-dispatcher-16] actor.LocalActorRef
(Slf4jLogger.scala:apply$mcV$sp(74)) - Message
[akka.remote.transport.AssociationHandle$Disassociated] from
Actor[akka://sparkWorker/deadLetters] to
Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%4010.196.116.184%3A61236-3#-1210125680]
was not delivered. [3] dead letters encountered. This logging can
be turned off or adjusted with configuration settings
'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

2015-10-13 09:58:05,134 INFO
[sparkWorker-akka.actor.default-dispatcher-16] actor.LocalActorRef
(Slf4jLogger.scala:apply$mcV$sp(74)) - Message
[akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying]
from Actor[akka://sparkWorker/deadLetters] to
Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%4010.196.116.184%3A61236-3#-1210125680]
was not delivered. [4] dead letters encountered. This logging can
be turned off or adjusted with configuration settings
'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

2015-10-13 09:58:06,087 INFO
[sparkWorker-akka.actor.default-dispatcher-16] worker.Worker
(Logging.scala:logInfo(59)) - Asked to kill executor
app-20151009201453-0000/0

2015-10-13 09:58:06,103 INFO  [ExecutorRunner for
app-20151009201453-0000/0] worker.ExecutorRunner
(Logging.scala:logInfo(59)) - Runner thread for executor
app-20151009201453-0000/0 interrupted

2015-10-13 09:58:06,118 INFO  [ExecutorRunner for
app-20151009201453-0000/0] worker.ExecutorRunner
(Logging.scala:logInfo(59)) - Killing process!

2015-10-13 09:58:06,509 INFO
[sparkWorker-akka.actor.default-dispatcher-16] worker.Worker
(Logging.scala:logInfo(59)) - Executor app-20151009201453-0000/0
finished with state KILLED exitStatus 1

2015-10-13 09:58:06,509 INFO
[sparkWorker-akka.actor.default-dispatcher-16] worker.Worker
(Logging.scala:logInfo(59)) - Cleaning up local directories for
application app-20151009201453-0000

Thanks

Ningjun Wang


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional 
commands, e-mail: user-h...@spark.apache.org


