Re: ERROR TaskSchedulerImpl: Lost an executor

2015-01-19 Thread Suresh Lankipalli
Hi Yu,

I am able to run the Spark examples, but I am unable to run the SparkR
examples (only the Pi example runs on SparkR).

Thank you

Regards
Suresh

On Mon, Jan 19, 2015 at 3:08 PM, Ted Yu  wrote:

> Have you seen this thread ?
> http://search-hadoop.com/m/JW1q5PgA7X
>
> What Spark release are you running ?
>
> Cheers
>
> On Mon, Jan 19, 2015 at 12:04 PM, suresh  wrote:
>
>> I am trying to run SparkR shell on aws
>>
>> I am unable to access worker nodes webUI access.
>>
>> 15/01/19 19:57:17 ERROR TaskSchedulerImpl: Lost an executor 0 (already
>> removed): remote Akka client disassociated
>> 15/01/19 19:57:17 ERROR TaskSchedulerImpl: Lost an executor 1 (already
>> removed): remote Akka client disassociated
>> 15/01/19 19:57:17 ERROR TaskSchedulerImpl: Lost an executor 2 (already
>> removed): remote Akka client disassociated
>>
>> > 15/01/19 19:57:50 ERROR Remoting:
>> org.apache.spark.storage.BlockManagerId;
>> > local class incompatible: stream classdesc serialVersionUID =
>> > 2439208141545036836, local class serialVersionUID = -7366074099953117729
>> java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId;
>> local class incompatible: stream classdesc serialVersionUID =
>> 2439208141545036836, local class serialVersionUID = -7366074099953117729
>> at
>> java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
>> at
>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>> at
>> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>> at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>> at
>> akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
>> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>> at
>> akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
>> at
>>
>> akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
>> at scala.util.Try$.apply(Try.scala:161)
>> at
>> akka.serialization.Serialization.deserialize(Serialization.scala:98)
>> at
>> akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
>> at
>>
>> akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:55)
>> at
>> akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:55)
>> at
>> akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:73)
>> at
>>
>> akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:764)
>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>> at
>>
>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>> at
>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> at
>>
>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>> at
>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> at
>>
>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> 15/01/19 19:57:50 ERROR Remoting: org.apache.spark.storage.BlockManagerId;
>> local class incompatible: stream classdesc serialVersionUID =
>> 2439208141545036836, local class serialVersionUID = -7366074099953117729
>> java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId;
>> local class incompatible: stream classdesc serialVersionUID =
>> 2439208141545036836, local class serialVersionUID = -7366074099953117729
>> at
>> java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
>> at
>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>> at
>> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>> at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at
>>

Re: ERROR TaskSchedulerImpl: Lost an executor

2015-01-19 Thread Suresh Lankipalli
I am running Spark version 1.2.0.

On Mon, Jan 19, 2015 at 3:08 PM, Ted Yu  wrote:

> Have you seen this thread ?
> http://search-hadoop.com/m/JW1q5PgA7X
>
> What Spark release are you running ?
>
> Cheers
>
> On Mon, Jan 19, 2015 at 12:04 PM, suresh  wrote:
>
>> I am trying to run SparkR shell on aws
>>
>> I am unable to access worker nodes webUI access.
>>
>> 15/01/19 19:57:17 ERROR TaskSchedulerImpl: Lost an executor 0 (already
>> removed): remote Akka client disassociated
>> 15/01/19 19:57:17 ERROR TaskSchedulerImpl: Lost an executor 1 (already
>> removed): remote Akka client disassociated
>> 15/01/19 19:57:17 ERROR TaskSchedulerImpl: Lost an executor 2 (already
>> removed): remote Akka client disassociated
>>
>> > 15/01/19 19:57:50 ERROR Remoting:
>> org.apache.spark.storage.BlockManagerId;
>> > local class incompatible: stream classdesc serialVersionUID =
>> > 2439208141545036836, local class serialVersionUID = -7366074099953117729
>> java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId;
>> local class incompatible: stream classdesc serialVersionUID =
>> 2439208141545036836, local class serialVersionUID = -7366074099953117729
>> at
>> java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
>> at
>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>> at
>> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>> at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>> at
>> akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
>> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>> at
>> akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
>> at
>>
>> akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
>> at scala.util.Try$.apply(Try.scala:161)
>> at
>> akka.serialization.Serialization.deserialize(Serialization.scala:98)
>> at
>> akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
>> at
>>
>> akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:55)
>> at
>> akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:55)
>> at
>> akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:73)
>> at
>>
>> akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:764)
>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>> at
>>
>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>> at
>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> at
>>
>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>> at
>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> at
>>
>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> 15/01/19 19:57:50 ERROR Remoting: org.apache.spark.storage.BlockManagerId;
>> local class incompatible: stream classdesc serialVersionUID =
>> 2439208141545036836, local class serialVersionUID = -7366074099953117729
>> java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId;
>> local class incompatible: stream classdesc serialVersionUID =
>> 2439208141545036836, local class serialVersionUID = -7366074099953117729
>> at
>> java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
>> at
>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>> at
>> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>> at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.

Re: ERROR TaskSchedulerImpl: Lost an executor

2015-01-19 Thread Ted Yu
Have you seen this thread ?
http://search-hadoop.com/m/JW1q5PgA7X

What Spark release are you running ?

Cheers

On Mon, Jan 19, 2015 at 12:04 PM, suresh  wrote:

> I am trying to run SparkR shell on aws
>
> I am unable to access worker nodes webUI access.
>
> 15/01/19 19:57:17 ERROR TaskSchedulerImpl: Lost an executor 0 (already
> removed): remote Akka client disassociated
> 15/01/19 19:57:17 ERROR TaskSchedulerImpl: Lost an executor 1 (already
> removed): remote Akka client disassociated
> 15/01/19 19:57:17 ERROR TaskSchedulerImpl: Lost an executor 2 (already
> removed): remote Akka client disassociated
>
> > 15/01/19 19:57:50 ERROR Remoting:
> org.apache.spark.storage.BlockManagerId;
> > local class incompatible: stream classdesc serialVersionUID =
> > 2439208141545036836, local class serialVersionUID = -7366074099953117729
> java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId;
> local class incompatible: stream classdesc serialVersionUID =
> 2439208141545036836, local class serialVersionUID = -7366074099953117729
> at
> java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
> at
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
> at
> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at
> akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
> at
> akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
> at
>
> akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
> at scala.util.Try$.apply(Try.scala:161)
> at
> akka.serialization.Serialization.deserialize(Serialization.scala:98)
> at
> akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
> at
>
> akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:55)
> at
> akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:55)
> at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:73)
> at
>
> akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:764)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at
>
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
>
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
>
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15/01/19 19:57:50 ERROR Remoting: org.apache.spark.storage.BlockManagerId;
> local class incompatible: stream classdesc serialVersionUID =
> 2439208141545036836, local class serialVersionUID = -7366074099953117729
> java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId;
> local class incompatible: stream classdesc serialVersionUID =
> 2439208141545036836, local class serialVersionUID = -7366074099953117729
> at
> java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
> at
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
> at
> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at
> akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>   

Re: ERROR TaskSchedulerImpl: Lost an executor

2015-01-19 Thread suresh
I am trying to run the SparkR shell on AWS.

I am unable to access the worker nodes' web UI.

15/01/19 19:57:17 ERROR TaskSchedulerImpl: Lost an executor 0 (already
removed): remote Akka client disassociated
15/01/19 19:57:17 ERROR TaskSchedulerImpl: Lost an executor 1 (already
removed): remote Akka client disassociated
15/01/19 19:57:17 ERROR TaskSchedulerImpl: Lost an executor 2 (already
removed): remote Akka client disassociated

> 15/01/19 19:57:50 ERROR Remoting: org.apache.spark.storage.BlockManagerId;
> local class incompatible: stream classdesc serialVersionUID =
> 2439208141545036836, local class serialVersionUID = -7366074099953117729
java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId;
local class incompatible: stream classdesc serialVersionUID =
2439208141545036836, local class serialVersionUID = -7366074099953117729
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at 
akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
at
akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
at scala.util.Try$.apply(Try.scala:161)
at akka.serialization.Serialization.deserialize(Serialization.scala:98)
at 
akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
at
akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:55)
at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:55)
at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:73)
at
akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:764)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/01/19 19:57:50 ERROR Remoting: org.apache.spark.storage.BlockManagerId;
local class incompatible: stream classdesc serialVersionUID =
2439208141545036836, local class serialVersionUID = -7366074099953117729
java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId;
local class incompatible: stream classdesc serialVersionUID =
2439208141545036836, local class serialVersionUID = -7366074099953117729
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at 
akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
at
akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
at scala.util.Try$.apply(Try.scala:161)
at akka.serialization.Serialization.deserialize(Serialization.scala:98)
at 
akka.remote.MessageSerializer$.deserialize(Messa

Re: ERROR TaskSchedulerImpl: Lost an executor

2014-04-23 Thread Parviz Deyhim
It means you're out of disk space. Check to see if you have enough free
disk space left on your node(s).
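
If the space is being eaten by shuffle and spill files, one option (a rough
sketch, not something verified in this thread) is to point spark.local.dir at a
volume with more room when building the SparkConf; the path below is just a
placeholder for whatever large disk your nodes actually have:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical scratch directory on a volume with plenty of free space.
val conf = new SparkConf()
  .setAppName("DiskSpaceExample")
  .set("spark.local.dir", "/mnt/bigdisk/spark-tmp")

val sc = new SparkContext(conf)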


On Wed, Apr 23, 2014 at 2:08 PM, jaeholee  wrote:

> After doing that, I ran my code once with a smaller example, and it worked.
> But ever since then, I get the "No space left on device" message for the
> same sample, even if I re-start the master...
>
> ERROR TaskSetManager: Task 29.0:20 failed 4 times; aborting job
> org.apache.spark.SparkException: Job aborted: Task 29.0:20 failed 4 times
> (most recent failure: Exception failure: java.io.IOException: No space left
> on device)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
> at
>
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at
> org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> at scala.Option.foreach(Option.scala:236)
> at
>
> org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at
>
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
>
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
>
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566p4699.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>


Re: ERROR TaskSchedulerImpl: Lost an executor

2014-04-23 Thread jaeholee
So if I am using GraphX on Spark and I create a graph that gets used a lot
later, do I want to cache the graph? Or do I want to cache the vertices and
edges (the actual data) that I use to create the graph?

e.g.
val graph = Graph(vertices, edges)

graph.blahblahblah
graph.blahblahblah
graph.blahblahblah


FYI,
I wanted to measure the time it takes to run my algorithm, so once I create
the graph, I force Spark to read the data in by calling
graph.vertices.count and graph.edges.count, since it does lazy
evaluation.

Then I run the actual algorithm with timing on. But basically it
doesn't even get to the algorithm portion, because it breaks at
graph.edges.count when it reads the data...
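
For what it's worth, a minimal sketch of the caching-and-timing pattern being
described, assuming the GraphX API; the tiny vertex/edge sets and the existing
SparkContext sc are placeholders, not the real data:

import org.apache.spark.graphx.{Edge, Graph}

// Placeholder data standing in for the real vertices and edges.
val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b")))
val edges    = sc.parallelize(Seq(Edge(1L, 2L, 1)))

// Caching the graph keeps both its vertex and edge RDDs in memory.
val graph = Graph(vertices, edges).cache()

// Force evaluation so loading is not charged to the algorithm's timing.
graph.vertices.count()
graph.edges.count()

val start = System.nanoTime()
// ... run the algorithm on graph here ...
val elapsedMs = (System.nanoTime() - start) / 1e6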



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566p4701.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: ERROR TaskSchedulerImpl: Lost an executor

2014-04-23 Thread jaeholee
After doing that, I ran my code once with a smaller example, and it worked.
But ever since then, I get the "No space left on device" message for the
same sample, even if I re-start the master...

ERROR TaskSetManager: Task 29.0:20 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted: Task 29.0:20 failed 4 times
(most recent failure: Exception failure: java.io.IOException: No space left
on device)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566p4699.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: ERROR TaskSchedulerImpl: Lost an executor

2014-04-23 Thread Parviz Deyhim
You need to set SPARK_MEM or SPARK_EXECUTOR_MEMORY (for Spark 1.0) to the
amount of memory your application needs to consume on each node. Try
setting those variables (example: export SPARK_MEM=10g) or set it via
SparkConf.set as suggested by jholee.
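
For the SparkConf route, a minimal sketch (the 10g value and the master URL are
placeholders, not recommendations from this thread):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://master-host:7077")   // hypothetical master URL
  .setAppName("MemoryExample")
  .set("spark.executor.memory", "10g")     // heap given to each executor

val sc = new SparkContext(conf)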


On Tue, Apr 22, 2014 at 4:25 PM, jaeholee  wrote:

> Ok. I tried setting the partition number to 128 and numbers greater than
> 128,
> and now I get another error message about "Java heap space". Is it possible
> that there is something wrong with the setup of my Spark cluster to begin
> with? Or is it still an issue with partitioning my data? Or do I just need
> more worker nodes?
>
>
> ERROR TaskSetManager: Task 194.0:14 failed 4 times; aborting job
> org.apache.spark.SparkException: Job aborted: Task 194.0:14 failed 4 times
> (most recent failure: Exception failure: java.lang.OutOfMemoryError: Java
> heap space)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
> at
>
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at
> org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> at scala.Option.foreach(Option.scala:236)
> at
>
> org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at
>
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
>
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
>
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566p4623.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>


Re: ERROR TaskSchedulerImpl: Lost an executor

2014-04-23 Thread Daniel Darabos
With the right program you can always exhaust any amount of memory :).
There is no silver bullet. You have to figure out what is happening in your
code that causes a high memory use and address that. I spent all of last
week doing this for a simple program of my own. Lessons I learned that may
or may not apply to your case:

 - If you don't cache (persist) an RDD, it is not stored. This can save
memory at the cost of possibly repeating computation. (I read around a TB
of files twice, for example, rather than cache them.)
 - Use combineByKey instead of groupByKey if you can process values one by
one. This means they do not all need to be stored.
 - If you have a lot of keys per partition, set mapSideCombine=false for
combineByKey. This avoids creating a large map per partition (see the sketch
after this list).
 - If you have a key with a disproportionate number of values (like the
empty string for a missing name), discard it before the computation.
 - Read https://spark.apache.org/docs/latest/tuning.html for more (and more
accurate) information.
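
A rough sketch of the combineByKey point above (values folded one at a time
into a running aggregate instead of being grouped into a collection); the pair
RDD and the existing SparkContext sc are made up for illustration:

import org.apache.spark.HashPartitioner
import org.apache.spark.SparkContext._   // pair-RDD functions

val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

// Running sum per key; mapSideCombine = false avoids building a large
// per-partition map when there are many distinct keys.
val sums = pairs.combineByKey(
  (v: Int) => v,                  // createCombiner
  (acc: Int, v: Int) => acc + v,  // mergeValue
  (a: Int, b: Int) => a + b,      // mergeCombiners
  new HashPartitioner(pairs.partitions.length),
  mapSideCombine = false)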

Good luck.


On Wed, Apr 23, 2014 at 1:25 AM, jaeholee  wrote:

> Ok. I tried setting the partition number to 128 and numbers greater than
> 128,
> and now I get another error message about "Java heap space". Is it possible
> that there is something wrong with the setup of my Spark cluster to begin
> with? Or is it still an issue with partitioning my data? Or do I just need
> more worker nodes?
>
>
> ERROR TaskSetManager: Task 194.0:14 failed 4 times; aborting job
> org.apache.spark.SparkException: Job aborted: Task 194.0:14 failed 4 times
> (most recent failure: Exception failure: java.lang.OutOfMemoryError: Java
> heap space)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
> at
>
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at
> org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> at scala.Option.foreach(Option.scala:236)
> at
>
> org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at
>
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
>
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
>
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566p4623.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>


Re: ERROR TaskSchedulerImpl: Lost an executor

2014-04-23 Thread wxhsdp
I have a similar question.
I am testing in standalone mode on only one PC.
I use ./sbin/start-master.sh to start a master and
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://ubuntu:7077
to connect to the master.

From the web UI, I can see the local worker registered.

But when I run any application (for example, SimpleApp.scala from the quick
start, with an input file of 4.5 KB), it fails with:
14/04/23 16:39:13 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0)
14/04/23 16:39:13 WARN scheduler.TaskSetManager: Loss was due to
java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: Java heap space
at
org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java:183)
at 
org.apache.hadoop.conf.Configuration.readFields(Configuration.java:2378)
at 
org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
at 
org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:77)
at
org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
at 
org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:165)
at
org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)

I use sbt to run the application:
SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled
-XX:MaxPermSize=256M"
java $SBT_OPTS -jar `dirname $0`/sbt-launch.jar "$@"





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566p4647.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: ERROR TaskSchedulerImpl: Lost an executor

2014-04-22 Thread Praveen R
I guess you need to limit the heap size. Add the line below to spark-env.sh
and make sure to rsync it to all workers.

SPARK_JAVA_OPTS+=" -Xms512m -Xmx512m "



On Wed, Apr 23, 2014 at 4:55 AM, jaeholee  wrote:

> Ok. I tried setting the partition number to 128 and numbers greater than
> 128,
> and now I get another error message about "Java heap space". Is it possible
> that there is something wrong with the setup of my Spark cluster to begin
> with? Or is it still an issue with partitioning my data? Or do I just need
> more worker nodes?
>
>
> ERROR TaskSetManager: Task 194.0:14 failed 4 times; aborting job
> org.apache.spark.SparkException: Job aborted: Task 194.0:14 failed 4 times
> (most recent failure: Exception failure: java.lang.OutOfMemoryError: Java
> heap space)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
> at
>
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at
> org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> at scala.Option.foreach(Option.scala:236)
> at
>
> org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at
>
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
>
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
>
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566p4623.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>


Re: ERROR TaskSchedulerImpl: Lost an executor

2014-04-22 Thread jaeholee
Ok. I tried setting the partition number to 128 and numbers greater than 128,
and now I get another error message about "Java heap space". Is it possible
that there is something wrong with the setup of my Spark cluster to begin
with? Or is it still an issue with partitioning my data? Or do I just need
more worker nodes?


ERROR TaskSetManager: Task 194.0:14 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted: Task 194.0:14 failed 4 times
(most recent failure: Exception failure: java.lang.OutOfMemoryError: Java
heap space)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566p4623.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: ERROR TaskSchedulerImpl: Lost an executor

2014-04-22 Thread Daniel Darabos
On Wed, Apr 23, 2014 at 12:06 AM, jaeholee  wrote:

> How do you determine the number of partitions? For example, I have 16
> workers, and the number of cores and the worker memory set in spark-env.sh
> are:
>
> CORE = 8
> MEMORY = 16g
>

So you have the capacity to work on 16 * 8 = 128 tasks at a time. If you
use fewer than 128 partitions, some CPUs will go unused, and the job will
complete more slowly than it could. But depending on what you are actually
doing with the data, you may want to use many more partitions than that. I
find it fairly difficult to estimate the memory use -- it is much easier to
just try more partitions and see if that is enough :). The overhead of
using more partitions than necessary is pretty small.
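
As a rough sketch of that arithmetic (the path and the factor of 2 are only
illustrative, assuming an existing SparkContext sc):

val slots = 16 * 8   // workers * cores per worker = 128 task slots
val data  = sc.textFile("somedirectory/data.csv", slots * 2)  // more, smaller partitions

println("partitions: " + data.partitions.length)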

The .csv data I have is about 500MB, but I am eventually going to use a file
> that is about 15GB.
>
> Is the MEMORY variable in spark-env.sh different from spark.executor.memory
> that you mentioned? If they're different, how do I set
> spark.executor.memory?
>

It is probably the same thing. Sorry for confusing you. To be sure, check
the Spark master web UI while the application is connected. You will see
how much RAM is used per worker on the main page. You can set
"spark.executor.memory" through SparkConf.set("spark.executor.memory",
"10g") when creating the SparkContext.


Re: ERROR TaskSchedulerImpl: Lost an executor

2014-04-22 Thread jaeholee
How do you determine the number of partitions? For example, I have 16
workers, and the number of cores and the worker memory set in spark-env.sh
are:

CORE = 8
MEMORY = 16g

The .csv data I have is about 500MB, but I am eventually going to use a file
that is about 15GB.

Is the MEMORY variable in spark-env.sh different from spark.executor.memory
that you mentioned? If they're different, how do I set
spark.executor.memory?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566p4621.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: ERROR TaskSchedulerImpl: Lost an executor

2014-04-22 Thread Daniel Darabos
Most likely the data is not "just too big". For most operations the data is
processed partition by partition. The partitions may be too big. This is
what your last question hints at too:

> val numWorkers = 10
> val data = sc.textFile("somedirectory/data.csv", numWorkers)

This will work, but it is not quite what you want to do. The second parameter to
textFile is the number of partitions you want. Given the error you are
seeing, I'd recommend asking for more partitions -- they will be smaller.

Also make sure you set spark.executor.memory to the capacity of the worker
machines.
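
In other words, something along these lines (the name numPartitions and the
value 256 are only illustrative):

// The second argument to textFile is a minimum number of partitions, not a
// worker count; more partitions means smaller, more memory-friendly tasks.
val numPartitions = 256
val data = sc.textFile("somedirectory/data.csv", numPartitions)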


On Tue, Apr 22, 2014 at 11:09 PM, jaeholee  wrote:

> Spark is running fine, but I get this message. Does this mean that my data
> is
> just too big?
>
> 14/04/22 17:06:20 ERROR TaskSchedulerImpl: Lost executor 2 on WORKER#2:
> OutOfMemoryError
> 14/04/22 17:06:20 ERROR TaskSetManager: Task 550.0:2 failed 4 times;
> aborting job
> org.apache.spark.SparkException: Job aborted: Task 550.0:2 failed 4 times
> (most recent failure: unknown)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
> at
>
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at
> org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> at scala.Option.foreach(Option.scala:236)
> at
>
> org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at
>
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
>
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
>
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566p4618.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>


Re: ERROR TaskSchedulerImpl: Lost an executor

2014-04-22 Thread jaeholee
Spark is running fine, but I get this message. Does this mean that my data is
just too big?

14/04/22 17:06:20 ERROR TaskSchedulerImpl: Lost executor 2 on WORKER#2:
OutOfMemoryError
14/04/22 17:06:20 ERROR TaskSetManager: Task 550.0:2 failed 4 times;
aborting job
org.apache.spark.SparkException: Job aborted: Task 550.0:2 failed 4 times
(most recent failure: unknown)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566p4618.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: ERROR TaskSchedulerImpl: Lost an executor

2014-04-22 Thread jaeholee
Wow! It worked! Thank you so much!

So now, all I need to do is put the number of workers that I want to use
when I read the data, right?

e.g.
val numWorkers = 10
val data = sc.textFile("somedirectory/data.csv", numWorkers)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566p4615.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: ERROR TaskSchedulerImpl: Lost an executor

2014-04-22 Thread Praveen R
Could you try setting the MASTER variable in spark-env.sh?

export MASTER=spark://:7077

For starting the standalone cluster, ./sbin/start-all.sh should work as long
as you have passwordless access to all machines. Any errors here?




On Tue, Apr 22, 2014 at 10:10 PM, jaeholee  wrote:

> No, I am not using the aws. I am using one of the national lab's cluster.
> But
> as I mentioned, I am pretty new to computer science, so I might not be
> answering your question right... but 7077 is accessible.
>
> Maybe I got it wrong from the get-go? I will just write down what I did...
>
> Basically I have a cluster with bunch of nodes (call them #1 ~ #10), picked
> one node (call it #1) to be the master (and one of the workers)
>
> I updated the conf/spark-env.sh file with MASTER_IP, MASTER_PORT,
> MASTER_WEBUI_PORT, CORES, MEMORY, WORKER_PORT, WORKER_WEBUI_PORT
>
> I start the master on #1 with ./sbin/start-master.sh
>
> But ./sbin/start-slaves.sh doesn't work for me, so I wrote a script that
> ssh
> into the worker nodes (#1 ~ #10) and start the worker:
>
> for server in $(cat /somedirectory/hostnames.txt)
> do
> ssh $server "nohup /somedirectory/somedirectory/spark-0.9.1/bin/spark-class
> org.apache.spark.deploy.worker.Worker spark://MASTER_IP:MASTER_PORT >
> /somedirectory/nohup.out & exit"
> done
>
>
> then I go to #1, and start ./bin/spark-shell and that's when I get that
> error message.
>
> Sorry if it got more confusing..
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566p4610.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>


Re: ERROR TaskSchedulerImpl: Lost an executor

2014-04-22 Thread jaeholee
No, I am not using AWS. I am using one of the national labs' clusters. But
as I mentioned, I am pretty new to computer science, so I might not be
answering your question right... but port 7077 is accessible.

Maybe I got it wrong from the get-go? I will just write down what I did...

Basically, I have a cluster with a bunch of nodes (call them #1 ~ #10), and I
picked one node (call it #1) to be the master (and one of the workers).

I updated the conf/spark-env.sh file with MASTER_IP, MASTER_PORT,
MASTER_WEBUI_PORT, CORES, MEMORY, WORKER_PORT, WORKER_WEBUI_PORT

I start the master on #1 with ./sbin/start-master.sh

But ./sbin/start-slaves.sh doesn't work for me, so I wrote a script that sshes
into the worker nodes (#1 ~ #10) and starts the worker:

for server in $(cat /somedirectory/hostnames.txt)
do
ssh $server "nohup /somedirectory/somedirectory/spark-0.9.1/bin/spark-class
org.apache.spark.deploy.worker.Worker spark://MASTER_IP:MASTER_PORT >
/somedirectory/nohup.out & exit"
done


then I go to #1, and start ./bin/spark-shell and that's when I get that
error message.

Sorry if it got more confusing..



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566p4610.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: ERROR TaskSchedulerImpl: Lost an executor

2014-04-21 Thread Praveen R
Do you have the cluster deployed on AWS? Could you try checking if port 7077
is accessible from the worker nodes?


On Tue, Apr 22, 2014 at 2:56 AM, jaeholee  wrote:

> Hi, I am trying to set up my own standalone Spark, and I started the master
> node and worker nodes. Then I ran ./bin/spark-shell, and I get this
> message:
>
> 14/04/21 16:31:51 ERROR TaskSchedulerImpl: Lost an executor 1 (already
> removed): remote Akka client disassociated
> 14/04/21 16:31:51 ERROR TaskSchedulerImpl: Lost an executor 2 (already
> removed): remote Akka client disassociated
> 14/04/21 16:31:51 ERROR TaskSchedulerImpl: Lost an executor 3 (already
> removed): remote Akka client disassociated
> 14/04/21 16:31:51 ERROR TaskSchedulerImpl: Lost an executor 5 (already
> removed): remote Akka client disassociated
> 14/04/21 16:31:52 ERROR TaskSchedulerImpl: Lost an executor 6 (already
> removed): remote Akka client disassociated
> 14/04/21 16:31:52 ERROR TaskSchedulerImpl: Lost an executor 8 (already
> removed): remote Akka client disassociated
> 14/04/21 16:31:52 ERROR TaskSchedulerImpl: Lost an executor 9 (already
> removed): remote Akka client disassociated
> 14/04/21 16:31:53 ERROR TaskSchedulerImpl: Lost an executor 7 (already
> removed): remote Akka client disassociated
> 14/04/21 16:31:54 ERROR TaskSchedulerImpl: Lost an executor 10 (already
> removed): remote Akka client disassociated
> 14/04/21 16:31:54 ERROR TaskSchedulerImpl: Lost an executor 12 (already
> removed): remote Akka client disassociated
> 14/04/21 16:31:54 ERROR TaskSchedulerImpl: Lost an executor 11 (already
> removed): remote Akka client disassociated
> 14/04/21 16:31:54 ERROR AppClient$ClientActor: Master removed our
> application: FAILED; stopping client
> 14/04/21 16:31:54 ERROR TaskSchedulerImpl: Lost an executor 13 (already
> removed): remote Akka client disassociated
> 14/04/21 16:31:55 ERROR TaskSchedulerImpl: Lost an executor 4 (already
> removed): remote Akka client disassociated
> 14/04/21 16:31:55 ERROR TaskSchedulerImpl: Lost an executor 0 (already
> removed): remote Akka client disassociated
> 14/04/21 16:31:56 ERROR TaskSchedulerImpl: Lost an executor 14 (already
> removed): remote Akka client disassociated
>
>
> I am pretty new to Spark, or even programming... so I am not sure what is
> wrong. Any idea what could be wrong?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>