Re: Spark Executor Lost issue

2016-09-28 Thread Sushrut Ikhar
Can you add more details: are you using RDDs/Datasets/SQL? Are you doing
group-bys/joins? Is your input splittable?
By the way, you can pass the config the same way you are passing memoryOverhead,
e.g.
--conf spark.default.parallelism=1000, or through the SparkContext in code.
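
The two ways of passing the setting could look like this; a minimal sketch
assuming PySpark is available (the app name is illustrative, and the property
values are just the ones discussed above):

```python
from pyspark import SparkConf, SparkContext

# Equivalent to: spark-submit --conf spark.default.parallelism=1000 ...
conf = (SparkConf()
        .setAppName("my-job")  # illustrative name
        .set("spark.default.parallelism", "1000")
        .set("spark.yarn.executor.memoryOverhead", "600"))  # MB, YARN mode only
sc = SparkContext(conf=conf)
```

Note that properties set in code this way take effect only when the
SparkContext is created, so they cannot be changed mid-job.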

Regards,

Sushrut Ikhar
https://about.me/sushrutikhar



On Wed, Sep 28, 2016 at 7:30 PM, Aditya wrote:

> Hi All,
>
> Any updates on this?
>
> On Wednesday 28 September 2016 12:22 PM, Sushrut Ikhar wrote:
>
> Try increasing the parallelism by repartitioning; you may also
> increase spark.default.parallelism.
> You can also try decreasing the number of executor cores.
> Basically, this happens when the executor uses considerably more memory than
> it requested, and YARN kills the executor.
>
> Regards,
>
> Sushrut Ikhar
> https://about.me/sushrutikhar
> 
>
>
> On Wed, Sep 28, 2016 at 12:17 PM, Aditya wrote:
>
>> I have a Spark job which runs fine for small data, but when the data
>> increases it gives an executor-lost error. My executor and driver memory are
>> already set at their maximum. I have also tried increasing --conf
>> spark.yarn.executor.memoryOverhead=600, but that did not fix the
>> problem. Is there any other solution?
>>
>>
>
>
>


Re: Spark Executor Lost issue

2016-09-28 Thread Aditya

Hi All,

Any updates on this?



Re: Spark Executor Lost issue

2016-09-28 Thread Aditya


Thanks Sushrut for the reply.

Currently I have not defined the spark.default.parallelism property.
Can you let me know what I should set it to?


Regards,
Aditya Calangutkar
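
As a rough answer to the question above: common heuristics are two to three
tasks per CPU core in the cluster, or about one partition per HDFS block
(~128 MB) of input. A hypothetical sketch of the latter; the helper name and
the 128 MB block size are assumptions, not Spark defaults:

```python
def suggested_partitions(input_bytes, target_partition_bytes=128 * 1024 * 1024):
    # Ceiling division: at least one partition, roughly one per 128 MB chunk.
    return max(1, -(-input_bytes // target_partition_bytes))
```

So a 1 GB input would suggest 8 partitions; a skewed or shuffle-heavy job may
need considerably more.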


Spark Executor Lost issue

2016-09-28 Thread Aditya
I have a Spark job which runs fine for small data, but when the data
increases it gives an executor-lost error. My executor and driver memory are
already set at their maximum. I have also tried increasing --conf
spark.yarn.executor.memoryOverhead=600, but that did not fix the
problem. Is there any other solution?
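
For context on sizing memoryOverhead: in YARN mode Spark reserves off-heap
headroom per executor, commonly described as about 10% of executor memory with
a 384 MB floor. The exact factor varies by Spark version, so treat this as a
sketch; the helper name is made up:

```python
def yarn_memory_overhead_mb(executor_memory_mb, factor=0.10, floor_mb=384):
    # Rough sketch of the YARN overhead sizing rule; verify the factor
    # against your Spark version's documentation before relying on it.
    return max(floor_mb, int(executor_memory_mb * factor))
```

Under this rule a 10 GB executor would want roughly 1 GB of overhead, well
above the 600 MB tried above, which may explain why the increase did not help.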





Re: Spark executor lost because of GC overhead limit exceeded even though using 20 executors using 25GB each

2015-08-18 Thread Ted Yu
Do you mind providing a bit more information?

release of Spark

code snippet of your app

version of Java

Thanks

On Tue, Aug 18, 2015 at 8:57 AM, unk1102 umesh.ka...@gmail.com wrote:

 Hi, this GC overhead limit error is driving me crazy. I have 20 executors
 using 25 GB each; I don't understand at all how it can throw GC overhead
 errors, and I don't have that big a dataset. Once this GC error occurs in an
 executor it gets lost, and slowly other executors get lost because of
 IOException, Rpc client disassociated, shuffle not found, etc. Please help me
 solve this; I am going mad as I am new to Spark. Thanks in advance.




Spark executor lost because of GC overhead limit exceeded even though using 20 executors using 25GB each

2015-08-18 Thread unk1102
Hi, this GC overhead limit error is driving me crazy. I have 20 executors
using 25 GB each; I don't understand at all how it can throw GC overhead
errors, and I don't have that big a dataset. Once this GC error occurs in an
executor it gets lost, and slowly other executors get lost because of
IOException, Rpc client disassociated, shuffle not found, etc. Please help me
solve this; I am going mad as I am new to Spark. Thanks in advance.

WARN scheduler.TaskSetManager: Lost task 7.0 in stage 363.0 (TID 3373,
myhost.com): java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.spark.sql.types.UTF8String.toString(UTF8String.scala:150)
        at org.apache.spark.sql.catalyst.expressions.GenericRow.getString(rows.scala:120)
        at org.apache.spark.sql.columnar.STRING$.actualSize(ColumnType.scala:312)
        at org.apache.spark.sql.columnar.compression.DictionaryEncoding$Encoder.gatherCompressibilityStats(compressionSchemes.scala:224)
        at org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.gatherCompressibilityStats(CompressibleColumnBuilder.scala:72)
        at org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.appendFrom(CompressibleColumnBuilder.scala:80)
        at org.apache.spark.sql.columnar.NativeColumnBuilder.appendFrom(ColumnBuilder.scala:87)
        at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:148)
        at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:124)
        at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:277)
        at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:70)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
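
The frames above show the executor dying while building Spark SQL's in-memory
columnar cache (InMemoryColumnarTableScan via MemoryStore.unrollSafely), so
the cache is a likely culprit even if the raw dataset seems small. A rough
capacity check, using the Spark 1.x defaults for the two storage fractions
(verify them against your version's docs):

```python
def storage_capacity_mb(executor_heap_mb, memory_fraction=0.6, safety_fraction=0.9):
    # Spark 1.x sizing: cache capacity ~= heap * spark.storage.memoryFraction
    #                                          * spark.storage.safetyFraction
    return executor_heap_mb * memory_fraction * safety_fraction
```

With 25 GB executors that is only about 13.8 GB of cache space per executor;
persisting with a serialized level such as MEMORY_AND_DISK_SER instead of the
default MEMORY_ONLY is one common way to relieve GC pressure here.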



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-executor-lost-because-of-GC-overhead-limit-exceeded-even-though-using-20-executors-using-25GB-h-tp24322.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark executor lost because of time out even after setting quite long time out value 1000 seconds

2015-08-17 Thread Akhil Das
It could be stuck on a GC pause. Can you dig a bit more into the executor
logs and see what's going on? Also, from the driver UI you would get to know
at which stage it is stuck, etc.

Thanks
Best Regards

On Sun, Aug 16, 2015 at 11:45 PM, unk1102 umesh.ka...@gmail.com wrote:

 Hi, I have written a Spark job which seems to work fine for almost an hour,
 and after that executors start getting lost because of timeouts. I see the
 following in the log:

 15/08/16 12:26:46 WARN spark.HeartbeatReceiver: Removing executor 10 with
 no recent heartbeats: 1051638 ms exceeds timeout 100 ms



Spark executor lost because of time out even after setting quite long time out value 1000 seconds

2015-08-16 Thread unk1102
Hi, I have written a Spark job which seems to work fine for almost an hour,
and after that executors start getting lost because of timeouts. I see the
following in the log:

15/08/16 12:26:46 WARN spark.HeartbeatReceiver: Removing executor 10 with no
recent heartbeats: 1051638 ms exceeds timeout 100 ms

I don't see any errors, but I do see the above warning, and because of it the
executor gets removed by YARN, and then I see Rpc client disassociated errors,
IOException connection refused, and FetchFailedException.

After an executor gets removed I see it being added again and starting to
work, and then some other executor fails again. My question: is it normal for
executors to get lost? What happens to the tasks the lost executors were
working on? My Spark job keeps running since it is long, around 4-5 hours. I
have a very good cluster with 1.2 TB of memory and a good number of CPU cores.
To solve the above timeout issue I tried increasing spark.akka.timeout to
1000 seconds, but no luck. I am using the following command to run my Spark
job. Please guide me; I am new to Spark. I am using Spark 1.4.1. Thanks in
advance.

/spark-submit --class com.xyz.abc.MySparkJob  --conf
spark.executor.extraJavaOptions=-XX:MaxPermSize=512M --driver-java-options
-XX:MaxPermSize=512m --driver-memory 4g --master yarn-client
--executor-memory 25G --executor-cores 8 --num-executors 5 --jars
/path/to/spark-job.jar
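
An alternative to adding more flags to the launch command is setting the
timeout-related properties in code. A minimal sketch, assuming PySpark;
spark.akka.timeout applies to Spark 1.x, while spark.network.timeout is the
newer umbrella setting, so verify the property names against your version:

```python
from pyspark import SparkConf

# Sketch only: property names and units vary by Spark version.
conf = (SparkConf()
        .set("spark.akka.timeout", "1000")       # Spark 1.x RPC timeout, seconds
        .set("spark.network.timeout", "1000s"))  # newer umbrella network timeout
```

Raising timeouts only masks long GC pauses, though; if the executors are
pausing for minutes, the heap and caching settings are the real fix.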



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-executor-lost-because-of-time-out-even-after-setting-quite-long-time-out-value-1000-seconds-tp24289.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark executor lost

2014-12-04 Thread Akhil Das
It says connection refused; just make sure the network is configured
properly (open the ports between the master and the worker nodes). If the
ports are configured correctly, then I assume the process is getting killed
for some reason, hence the connection refused.
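
A quick way to check the first possibility is a plain TCP probe from one node
to another before digging into Spark itself. A minimal sketch (the helper name
is illustrative):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run it from the master against each worker's ports (and vice versa); a False
from an otherwise-healthy node points at a firewall rather than a dead process.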

Thanks
Best Regards

On Fri, Dec 5, 2014 at 12:30 AM, S. Zhou myx...@yahoo.com.invalid wrote:

 Here is a sample exception I collected from a Spark worker node (there are
 many such errors across the worker nodes). It looks to me like the Spark
 worker failed to communicate with the executor locally.

 14/12/04 04:26:37 ERROR EndpointWriter: AssociationError
 [akka.tcp://sparkwor...@spark-prod1.xxx:7079] -
 [akka.tcp://sparkexecu...@spark-prod1.xxx:47710]: Error [Association
 failed with [akka.tcp://sparkexecu...@spark-prod1.xxx:47710]] [
 akka.remote.EndpointAssociationException: Association failed with
 [akka.tcp://sparkexecu...@spark-prod1.xxx:47710]
 Caused by:
 akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
 Connection refused: spark-prod1.XXX/10.51.XX.XX:47710










Spark executor lost

2014-12-03 Thread S. Zhou
We are using Spark Jobserver to submit Spark jobs (our Spark version is 0.9.1).
After running the job server for a while, we often see the following errors
(executor lost) in its log. As a consequence, the Spark driver (allocated
inside the job server) gradually loses executors, and finally the job server
is no longer able to submit jobs. We tried to google for solutions but so far
no luck. Please help if you have any ideas. Thanks!

[2014-11-25 01:37:36,250] INFO  parkDeploySchedulerBackend [] [akka://JobServer/user/context-supervisor/next-staging] - Executor 6 disconnected, so removing it
[2014-11-25 01:37:36,252] ERROR cheduler.TaskSchedulerImpl [] [akka://JobServer/user/context-supervisor/next-staging] - Lost executor 6 on : remote Akka client disassociated
[2014-11-25 01:37:36,252] INFO  ark.scheduler.DAGScheduler [] [] - Executor lost: 6 (epoch 8)
[2014-11-25 01:37:36,252] INFO  ge.BlockManagerMasterActor [] [] - Trying to remove executor 6 from BlockManagerMaster.
[2014-11-25 01:37:36,252] INFO  storage.BlockManagerMaster [] [] - Removed 6 successfully in removeExecutor
[2014-11-25 01:37:36,286] INFO  ient.AppClient$ClientActor [] [akka://JobServer/user/context-supervisor/next-staging] - Executor updated: app-20141125002023-0037/6 is now FAILED (Command exited with code 143)



RE: Spark executor lost

2014-12-03 Thread Ganelin, Ilya
You want to look further up the stack (there are almost certainly other errors
before this happens), and those other errors may give you a better idea of what
is going on. Also, if you are running on YARN you can run yarn logs
-applicationId yourAppId to get the logs from the data nodes.



Sent with Good (www.good.com)


-Original Message-
From: S. Zhou [myx...@yahoo.com.INVALID]
Sent: Wednesday, December 03, 2014 06:30 PM Eastern Standard Time
To: user@spark.apache.org
Subject: Spark executor lost





The information contained in this e-mail is confidential and/or proprietary to 
Capital One and/or its affiliates. The information transmitted herewith is 
intended only for use by the individual or entity to which it is addressed.  If 
the reader of this message is not the intended recipient, you are hereby 
notified that any review, retransmission, dissemination, distribution, copying 
or other use of, or taking of any action in reliance upon this information is 
strictly prohibited. If you have received this communication in error, please 
contact the sender and delete the material from your computer.


Re: Spark executor lost

2014-12-03 Thread Ted Yu
bq.  to get the logs from the data nodes

Minor correction: the logs are collected from machines where node managers
run.

Cheers

On Wed, Dec 3, 2014 at 3:39 PM, Ganelin, Ilya ilya.gane...@capitalone.com
wrote:

  You want to look further up the stack (there are almost certainly other
 errors before this happens), and those other errors may give you a better
 idea of what is going on. Also, if you are running on YARN you can run yarn
 logs -applicationId yourAppId to get the logs from the data nodes.


