Re: Error when testing with large sparse svm

2014-07-16 Thread Xiangrui Meng
Then it may be a new issue. Do you mind creating a JIRA to track this
issue? It would be great if you could help locate the line in
BinaryClassificationMetrics that causes the problem. Thanks! -Xiangrui

On Tue, Jul 15, 2014 at 10:56 PM, crater cq...@ucmerced.edu wrote:
 I don't really have my own code; I was just running the example program in:
 examples/src/main/scala/org/apache/spark/examples/mllib/BinaryClassification.scala

 What I did was simply try this example on 13M-feature sparse data, and I got the
 error I posted.
 Today I managed to run it after I commented out the prediction part.





Re: Error when testing with large sparse svm

2014-07-16 Thread crater
I don't really know how to create a JIRA :(

Specifically, the code I commented out is:

//val prediction = model.predict(test.map(_.features))
//val predictionAndLabel = prediction.zip(test.map(_.label))
//val prediction = model.predict(training.map(_.features))
//val predictionAndLabel = prediction.zip(training.map(_.label))

//val metrics = new BinaryClassificationMetrics(predictionAndLabel)

//println(s"Test areaUnderPR = ${metrics.areaUnderPR()}.")
//println(s"Test areaUnderROC = ${metrics.areaUnderROC()}.")

in
examples/src/main/scala/org/apache/spark/examples/mllib/BinaryClassification.scala.
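
For reference, here is a minimal sketch (not the example's own code) of the same prediction/metrics step written so that the trained model is broadcast once instead of being serialized into every task closure; it assumes sc, model, and test as defined in BinaryClassification.scala:

  // A sketch: broadcast the trained model so the large weight vector is shipped
  // once per executor rather than with every task.
  import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

  val bcModel = sc.broadcast(model)
  val predictionAndLabel = test.map(p => (bcModel.value.predict(p.features), p.label))
  val metrics = new BinaryClassificationMetrics(predictionAndLabel)

  println(s"Test areaUnderPR = ${metrics.areaUnderPR()}.")
  println(s"Test areaUnderROC = ${metrics.areaUnderROC()}.")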







Re: Error when testing with large sparse svm

2014-07-15 Thread crater
I made a bit of progress. I think the problem is with BinaryClassificationMetrics:
as long as I comment out all the prediction-related metrics, I can run the
SVM example with my data.
So the problem should be there, I guess.






Re: Error when testing with large sparse svm

2014-07-15 Thread Xiangrui Meng
crater, was the error message the same as what you posted before:

14/07/14 11:32:20 ERROR TaskSchedulerImpl: Lost executor 1 on node7: remote
Akka client disassociated
14/07/14 11:32:20 WARN TaskSetManager: Lost TID 20 (task 13.0:0)
14/07/14 11:32:21 ERROR TaskSchedulerImpl: Lost executor 3 on node8: remote
Akka client disassociated
14/07/14 11:32:21 WARN TaskSetManager: Lost TID 21 (task 13.0:1)
14/07/14 11:32:23 ERROR TaskSchedulerImpl: Lost executor 6 on node3: remote
Akka client disassociated
14/07/14 11:32:23 WARN TaskSetManager: Lost TID 22 (task 13.0:0)
14/07/14 11:32:25 ERROR TaskSchedulerImpl: Lost executor 0 on node4: remote
Akka client disassociated
14/07/14 11:32:25 WARN TaskSetManager: Lost TID 23 (task 13.0:1)
14/07/14 11:32:26 ERROR TaskSchedulerImpl: Lost executor 5 on node1: remote
Akka client disassociated
14/07/14 11:32:26 WARN TaskSetManager: Lost TID 24 (task 13.0:0)
14/07/14 11:32:28 ERROR TaskSchedulerImpl: Lost executor 7 on node6: remote
Akka client disassociated
14/07/14 11:32:28 WARN TaskSetManager: Lost TID 26 (task 13.0:0)
14/07/14 11:32:28 ERROR TaskSetManager: Task 13.0:0 failed 4 times; aborting
job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task 13.0:0 failed 4 times, most recent failure: TID 26 on
host node6 failed for unknown reason
Driver stacktrace:

Could you paste your code in a gist? It may help to identify the problem. Thanks!

Xiangrui

On Tue, Jul 15, 2014 at 2:51 PM, crater cq...@ucmerced.edu wrote:
 I made a bit of progress. I think the problem is with BinaryClassificationMetrics:
 as long as I comment out all the prediction-related metrics, I can run the
 SVM example with my data.
 So the problem should be there, I guess.






Re: Error when testing with large sparse svm

2014-07-14 Thread Xiangrui Meng
You need to set a larger `spark.akka.frameSize`, e.g., 128, for the
serialized weight vector. There is a JIRA about switching
automatically between sending through Akka and broadcast:
https://issues.apache.org/jira/browse/SPARK-2361 -Xiangrui
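
For reference, a rough size check against the frame-size exception quoted in the original message below (a sketch; the byte counts come from the log, the threshold arithmetic is added here):

  // The failed task was 94,453,098 bytes, while the default spark.akka.frameSize
  // allows 10,485,760 bytes (10 MB), so the setting must be raised to at least
  // ~91 MB; 128 leaves comfortable headroom.
  val serializedTaskBytes = 94453098L
  val defaultFrameBytes = 10485760L
  val neededMb = math.ceil(serializedTaskBytes / (1024.0 * 1024.0)).toInt  // = 91
  println(s"need spark.akka.frameSize >= $neededMb MB (default is ${defaultFrameBytes / (1024 * 1024)} MB)")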

On Mon, Jul 14, 2014 at 12:15 AM, crater cq...@ucmerced.edu wrote:
 Hi,

 I encountered an error when testing the SVM example on very large sparse
 data. The dataset I ran on was a toy dataset with only ten examples, but each
 is a 13-million-dimensional sparse vector with a few thousand non-zero entries.

 The error is shown below. I am wondering: is this a bug, or am I missing
 something?

 14/07/13 23:59:44 INFO SecurityManager: Using Spark's default log4j profile:
 org/apache/spark/log4j-defaults.properties
 14/07/13 23:59:44 INFO SecurityManager: Changing view acls to: chengjie
 14/07/13 23:59:44 INFO SecurityManager: SecurityManager: authentication
 disabled; ui acls disabled; users with view permissions: Set(chengjie)
 14/07/13 23:59:45 INFO Slf4jLogger: Slf4jLogger started
 14/07/13 23:59:45 INFO Remoting: Starting remoting
 14/07/13 23:59:45 INFO Remoting: Remoting started; listening on addresses
 :[akka.tcp://spark@master:53173]
 14/07/13 23:59:45 INFO Remoting: Remoting now listens on addresses:
 [akka.tcp://spark@master:53173]
 14/07/13 23:59:45 INFO SparkEnv: Registering MapOutputTracker
 14/07/13 23:59:45 INFO SparkEnv: Registering BlockManagerMaster
 14/07/13 23:59:45 INFO DiskBlockManager: Created local directory at
 /tmp/spark-local-20140713235945-c78f
 14/07/13 23:59:45 INFO MemoryStore: MemoryStore started with capacity 14.4
 GB.
 14/07/13 23:59:45 INFO ConnectionManager: Bound socket to port 37674 with id
 = ConnectionManagerId(master,37674)
 14/07/13 23:59:45 INFO BlockManagerMaster: Trying to register BlockManager
 14/07/13 23:59:45 INFO BlockManagerInfo: Registering block manager
 master:37674 with 14.4 GB RAM
 14/07/13 23:59:45 INFO BlockManagerMaster: Registered BlockManager
 14/07/13 23:59:45 INFO HttpServer: Starting HTTP Server
 14/07/13 23:59:45 INFO HttpBroadcast: Broadcast server started at
 http://10.10.255.128:41838
 14/07/13 23:59:45 INFO HttpFileServer: HTTP File server directory is
 /tmp/spark-ac459d4b-a3c4-4577-bad4-576ac427d0bf
 14/07/13 23:59:45 INFO HttpServer: Starting HTTP Server
 14/07/13 23:59:51 INFO SparkUI: Started SparkUI at http://master:4040
 14/07/13 23:59:51 WARN NativeCodeLoader: Unable to load native-hadoop
 library for your platform... using builtin-java classes where applicable
 14/07/13 23:59:52 INFO EventLoggingListener: Logging events to
 /tmp/spark-events/binaryclassification-with-params(hdfs---master-9001-splice.small,1,1.0,svm,l1,0.1)-1405317591776
 14/07/13 23:59:52 INFO SparkContext: Added JAR
 file:/home/chengjie/spark-1.0.1/examples/target/scala-2.10/spark-examples-1.0.1-hadoop2.3.0.jar
 at http://10.10.255.128:54689/jars/spark-examples-1.0.1-hadoop2.3.0.jar with
 timestamp 1405317592653
 14/07/13 23:59:52 INFO AppClient$ClientActor: Connecting to master
 spark://master:7077...
 14/07/14 00:00:08 WARN TaskSchedulerImpl: Initial job has not accepted any
 resources; check your cluster UI to ensure that workers are registered and
 have sufficient memory
 14/07/14 00:00:23 WARN TaskSchedulerImpl: Initial job has not accepted any
 resources; check your cluster UI to ensure that workers are registered and
 have sufficient memory
 14/07/14 00:00:38 WARN TaskSchedulerImpl: Initial job has not accepted any
 resources; check your cluster UI to ensure that workers are registered and
 have sufficient memory
 14/07/14 00:00:53 WARN TaskSchedulerImpl: Initial job has not accepted any
 resources; check your cluster UI to ensure that workers are registered and
 have sufficient memory
 Training: 10
 14/07/14 00:01:09 WARN BLAS: Failed to load implementation from:
 com.github.fommil.netlib.NativeSystemBLAS
 14/07/14 00:01:09 WARN BLAS: Failed to load implementation from:
 com.github.fommil.netlib.NativeRefBLAS
 Exception in thread "main" org.apache.spark.SparkException: Job aborted due
 to stage failure: Serialized task 20:0 was 94453098 bytes which exceeds
 spark.akka.frameSize (10485760 bytes). Consider using broadcast variables
 for large values.
 at
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
 at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
 at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
 at
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 at
 org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
 at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
 at
 

Re: Error when testing with large sparse svm

2014-07-14 Thread crater
Hi Xiangrui,


Where can I set spark.akka.frameSize?





Re: Error when testing with large sparse svm

2014-07-14 Thread Srikrishna S
If you use Scala, you can do:

  val conf = new SparkConf()
    .setMaster("yarn-client")
    .setAppName("Logistic regression SGD fixed")
    .set("spark.akka.frameSize", "100")
    .setExecutorEnv("SPARK_JAVA_OPTS", "-Dspark.akka.frameSize=100")
  val sc = new SparkContext(conf)


I have been struggling with this too. I was trying to run Spark on the
KDDB dataset, which has about 29M features. It implodes and dies. Let
me know if you are able to figure out how to get things to work well
on really, really wide datasets.

Regards,
Krishna

On Mon, Jul 14, 2014 at 10:18 AM, crater cq...@ucmerced.edu wrote:
 Hi Xiangrui,


 Where can I set spark.akka.frameSize?





Re: Error when testing with large sparse svm

2014-07-14 Thread crater
Hi Krishna,

Thanks for your help. Have you been able to get your 29M data running yet? I fixed
the previous problem by setting a larger spark.akka.frameSize, but now I get
some other errors (below). Did you get these errors before?


14/07/14 11:32:20 ERROR TaskSchedulerImpl: Lost executor 1 on node7: remote
Akka client disassociated
14/07/14 11:32:20 WARN TaskSetManager: Lost TID 20 (task 13.0:0)
14/07/14 11:32:21 ERROR TaskSchedulerImpl: Lost executor 3 on node8: remote
Akka client disassociated
14/07/14 11:32:21 WARN TaskSetManager: Lost TID 21 (task 13.0:1)
14/07/14 11:32:23 ERROR TaskSchedulerImpl: Lost executor 6 on node3: remote
Akka client disassociated
14/07/14 11:32:23 WARN TaskSetManager: Lost TID 22 (task 13.0:0)
14/07/14 11:32:25 ERROR TaskSchedulerImpl: Lost executor 0 on node4: remote
Akka client disassociated
14/07/14 11:32:25 WARN TaskSetManager: Lost TID 23 (task 13.0:1)
14/07/14 11:32:26 ERROR TaskSchedulerImpl: Lost executor 5 on node1: remote
Akka client disassociated
14/07/14 11:32:26 WARN TaskSetManager: Lost TID 24 (task 13.0:0)
14/07/14 11:32:28 ERROR TaskSchedulerImpl: Lost executor 7 on node6: remote
Akka client disassociated
14/07/14 11:32:28 WARN TaskSetManager: Lost TID 26 (task 13.0:0)
14/07/14 11:32:28 ERROR TaskSetManager: Task 13.0:0 failed 4 times; aborting
job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task 13.0:0 failed 4 times, most recent failure: TID 26 on
host node6 failed for unknown reason
Driver stacktrace:
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)






Re: Error when testing with large sparse svm

2014-07-14 Thread Srikrishna S
That is exactly the same error that I got. I am still having no success.

Regards,
Krishna

On Mon, Jul 14, 2014 at 11:50 AM, crater cq...@ucmerced.edu wrote:
 Hi Krishna,

 Thanks for your help. Have you been able to get your 29M data running yet? I fixed
 the previous problem by setting a larger spark.akka.frameSize, but now I get
 some other errors (below). Did you get these errors before?




Re: Error when testing with large sparse svm

2014-07-14 Thread Xiangrui Meng
Is it on a standalone server? There are several settings worth checking:

1) the number of partitions, which should match the number of cores (a sketch
follows below)
2) driver memory (you can see it on the Executors tab of the Spark
WebUI and set it with --driver-memory 10g)
3) the version of Spark you are running

Best,
Xiangrui
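
Regarding item 1 above, a minimal sketch of matching partitions to cores (the node and core counts are illustrative, not taken from this thread; training is the RDD from the example):

  // Illustrative only: with 8 worker nodes and, say, 4 cores each, aim for
  // roughly one partition per core before training.
  val numWorkers = 8
  val coresPerWorker = 4
  val repartitionedTraining = training.repartition(numWorkers * coresPerWorker).cache()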

On Mon, Jul 14, 2014 at 12:14 PM, Srikrishna S srikrishna...@gmail.com wrote:
 That is exactly the same error that I got. I am still having no success.

 Regards,
 Krishna

 On Mon, Jul 14, 2014 at 11:50 AM, crater cq...@ucmerced.edu wrote:
 Hi Krishna,

 Thanks for your help. Have you been able to get your 29M data running yet? I fixed
 the previous problem by setting a larger spark.akka.frameSize, but now I get
 some other errors (below). Did you get these errors before?




Re: Error when testing with large sparse svm

2014-07-14 Thread Srikrishna S
I am running Spark 1.0.1 on a 5-node YARN cluster. I have set the
driver memory to 8G and the executor memory to about 12G.

Regards,
Krishna


On Mon, Jul 14, 2014 at 5:56 PM, Xiangrui Meng men...@gmail.com wrote:
 Is it on a standalone server? There are several settings worth checking:

 1) the number of partitions, which should match the number of cores
 2) driver memory (you can see it on the Executors tab of the Spark
 WebUI and set it with --driver-memory 10g)
 3) the version of Spark you are running

 Best,
 Xiangrui

 On Mon, Jul 14, 2014 at 12:14 PM, Srikrishna S srikrishna...@gmail.com 
 wrote:
 That is exactly the same error that I got. I am still having no success.

 Regards,
 Krishna

 On Mon, Jul 14, 2014 at 11:50 AM, crater cq...@ucmerced.edu wrote:
 Hi Krishna,

 Thanks for your help. Have you been able to get your 29M data running yet? I fixed
 the previous problem by setting a larger spark.akka.frameSize, but now I get
 some other errors (below). Did you get these errors before?




Re: Error when testing with large sparse svm

2014-07-14 Thread crater


(1) What is the number of partitions? Is it the number of workers per node?
(2) I already set the driver memory pretty big: 25g.
(3) I am running Spark 1.0.1 on a standalone cluster with 9 nodes; one of them
works as the master, and the others are workers.


