Re: ERROR TaskSchedulerImpl: Lost an executor

2014-04-23 Thread Parviz Deyhim
It means you're out of disk space. Check whether you have enough free
disk space left on your node(s).
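
A minimal sketch of one workaround, assuming shuffle/spill files are what filled the disk: point Spark's scratch directory at a volume with more room via spark.local.dir. The mount point and app/master names below are placeholders, not from this thread.

    // Hedged sketch: redirect Spark's scratch space to a larger volume.
    // "/mnt/bigdisk/spark" and "spark://master:7077" are hypothetical.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("spark://master:7077")              // placeholder cluster URL
      .setAppName("disk-space-example")
      .set("spark.local.dir", "/mnt/bigdisk/spark")  // where shuffle and spill files are written
    val sc = new SparkContext(conf)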


On Wed, Apr 23, 2014 at 2:08 PM, jaeholee  wrote:

> After doing that, I ran my code once with a smaller example, and it worked.
> But ever since then, I get the "No space left on device" message for the
> same sample, even if I re-start the master...
>
> ERROR TaskSetManager: Task 29.0:20 failed 4 times; aborting job
> org.apache.spark.SparkException: Job aborted: Task 29.0:20 failed 4 times
> (most recent failure: Exception failure: java.io.IOException: No space left
> on device)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> at scala.Option.foreach(Option.scala:236)
> at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566p4699.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>


Re: ERROR TaskSchedulerImpl: Lost an executor

2014-04-23 Thread Parviz Deyhim
You need to set SPARK_MEM or SPARK_EXECUTOR_MEMORY (for Spark 1.0) to the
amount of memory your application needs to consume on each node. Try
setting those variables (for example: export SPARK_MEM=10g) or set it via
SparkConf.set as suggested by jaeholee.
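
For completeness, a hedged sketch of the SparkConf route (spark.executor.memory is the standard property; the 10g value and the app/master names are placeholders):

    // Hedged sketch: set per-executor memory via SparkConf instead of environment variables.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("spark://master:7077")        // placeholder cluster URL
      .setAppName("memory-example")
      .set("spark.executor.memory", "10g")     // heap given to each executor
    val sc = new SparkContext(conf)

If the job still hits "Java heap space", splitting the data into more partitions (so each task holds less in memory at once) is another lever to try alongside the memory setting.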


On Tue, Apr 22, 2014 at 4:25 PM, jaeholee  wrote:

> Ok. I tried setting the partition number to 128 and numbers greater than
> 128,
> and now I get another error message about "Java heap space". Is it possible
> that there is something wrong with the setup of my Spark cluster to begin
> with? Or is it still an issue with partitioning my data? Or do I just need
> more worker nodes?
>
>
> ERROR TaskSetManager: Task 194.0:14 failed 4 times; aborting job
> org.apache.spark.SparkException: Job aborted: Task 194.0:14 failed 4 times
> (most recent failure: Exception failure: java.lang.OutOfMemoryError: Java
> heap space)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> at scala.Option.foreach(Option.scala:236)
> at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566p4623.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>


Re: spark-0.9.1 compiled with Hadoop 2.3.0 doesn't work with S3?

2014-04-21 Thread Parviz Deyhim
I ran into the same issue. The problem seems to be with the jets3t library
that Spark uses in project/SparkBuild.scala.

change this:

"net.java.dev.jets3t"  % "jets3t"   % "0.7.1"

to

"net.java.dev.jets3t"  % "jets3t"   % "0.9.0"

"0.7.1" is not the right version of jets3t for Hadoop 2.3.0


On Mon, Apr 21, 2014 at 11:30 AM, Nan Zhu  wrote:

>  Hi, all
>
> I’m writing a Spark application to load S3 data to HDFS,
>
> the HDFS version is 2.3.0, so I have to compile Spark with Hadoop 2.3.0
>
> after I execute
>
> val allfiles = sc.textFile("s3n://abc/*.txt")
>
> val output = allfiles.saveAsTextFile("hdfs://x.x.x.x:9000/dataset")
>
> Spark throws an exception (actually related to Hadoop?):
>
> java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException
>
> at org.apache.hadoop.fs.s3.S3FileSystem.createDefaultStore(S3FileSystem.java:100)
> at org.apache.hadoop.fs.s3.S3FileSystem.initialize(S3FileSystem.java:90)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2316)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2350)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2332)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
> at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221)
> at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:140)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
> at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
> at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:891)
> at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:741)
> at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:692)
> at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:574)
> at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:900)
> at $iwC$$iwC$$iwC$$iwC.<init>(<console>:14)
> at $iwC$$iwC$$iwC.<init>(<console>:19)
> at $iwC$$iwC.<init>(<console>:21)
> at $iwC.<init>(<console>:23)
> at <init>(<console>:25)
> at .<init>(<console>:29)
> at .<clinit>(<console>)
> at .<init>(<console>:7)
> at .<clinit>(<console>)
> at $print(<console>)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:772)
> at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1040)
> at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:609)
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:640)
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:604)
> at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:793)
> at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:838)
> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:750)
> at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:598)
> at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:605)
> at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:608)
> at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:931)
> at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:881)
> at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:881)
> at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:881)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:973)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
>
> Caused by: java.lang.ClassNotFoundException: org.jets3t.service.ServiceException
>
> at java.net.URLClas

Re: Spark Streaming source from Amazon Kinesis

2014-04-21 Thread Parviz Deyhim
sorry Matei. Will definitely start working on making the changes soon :)


On Mon, Apr 21, 2014 at 1:10 PM, Matei Zaharia wrote:

> There was a patch posted a few weeks ago (
> https://github.com/apache/spark/pull/223), but it needs a few changes in
> packaging because it uses a license that isn’t fully compatible with
> Apache. I’d like to get this merged when the changes are made though — it
> would be a good input source to support.
>
> Matei
>
>
> On Apr 21, 2014, at 1:00 PM, Nicholas Chammas 
> wrote:
>
> I'm looking to start experimenting with Spark Streaming, and I'd like to
> use Amazon Kinesis  as my data source.
> Looking at the list of supported Spark Streaming sources,
> I don't see any mention of Kinesis.
>
> Is it possible to use Spark Streaming with Amazon Kinesis? If not, are
> there plans to add such support in the future?
>
> Nick
>
>
> --
> View this message in context: Spark Streaming source from Amazon Kinesis
> Sent from the Apache Spark User List mailing list archive at
> Nabble.com.
>
>
>


Re: Spark Streaming source from Amazon Kinesis

2014-04-21 Thread Parviz Deyhim
it is possible Nick. Please take a look here:
https://aws.amazon.com/articles/Elastic-MapReduce/4926593393724923

the source code is here as a pull request:
https://github.com/apache/spark/pull/223

let me know if you have any questions.


On Mon, Apr 21, 2014 at 1:00 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> I'm looking to start experimenting with Spark Streaming, and I'd like to
> use Amazon Kinesis  as my data source.
> Looking at the list of supported Spark Streaming sources,
> I don't see any mention of Kinesis.
>
> Is it possible to use Spark Streaming with Amazon Kinesis? If not, are
> there plans to add such support in the future?
>
> Nick
>
>
> --
> View this message in context: Spark Streaming source from Amazon Kinesis
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>


Re: JMX with Spark

2014-04-15 Thread Parviz Deyhim
Home directory, or the conf directory under it ($SPARK_HOME/conf)? It works
for me with metrics.properties placed under the conf directory.
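
For reference, a minimal conf/metrics.properties sketch that enables Spark's built-in JMX sink; the wildcard scoping shown simply applies it to all instances (master, worker, driver, executor):

    # Hedged example: expose metrics from all instances over JMX
    *.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink

You would still need the usual JMX remote JVM flags (or a local jconsole attach) to actually browse the beans.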


On Tue, Apr 15, 2014 at 6:08 PM, Paul Schooss wrote:

> Has anyone gotten this working? I have enabled the properties for it in the
> metrics.conf file and ensured that it is placed under Spark's home
> directory. Any ideas why I don't see Spark beans?
>


Largest Spark Cluster

2014-04-04 Thread Parviz Deyhim
Spark community,


What's the size of the largest Spark cluster ever deployed? I've heard
Yahoo is running Spark on several hundred nodes but don't know the actual
number.

Can someone share?

Thanks