Re: Error when testing with large sparse svm
I don't really know how to create a JIRA ticket :( Specifically, the lines I commented out in examples/src/main/scala/org/apache/spark/examples/mllib/BinaryClassification.scala are:

    //val prediction = model.predict(test.map(_.features))
    //val predictionAndLabel = prediction.zip(test.map(_.label))
    //val prediction = model.predict(training.map(_.features))
    //val predictionAndLabel = prediction.zip(training.map(_.label))
    //val metrics = new BinaryClassificationMetrics(predictionAndLabel)
    //println(s"Test areaUnderPR = ${metrics.areaUnderPR()}.")
    //println(s"Test areaUnderROC = ${metrics.areaUnderROC()}.")
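For what it's worth, a possible workaround rather than deleting the evaluation entirely: in Spark 1.0.x, model.predict on an RDD appears to capture the model's weight vector in the task closure, which would make every prediction task carry the full 13-million-entry dense vector. A minimal sketch (assuming sparse features and the default decision threshold of 0) that broadcasts the weights once and predicts manually:

    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
    import org.apache.spark.mllib.linalg.SparseVector

    // Ship the 13M-dimensional weight vector to the executors once, as a
    // broadcast variable, instead of inside every task closure.
    val bcWeights = sc.broadcast(model.weights.toArray)
    val intercept = model.intercept

    val prediction = test.map { point =>
      val w = bcWeights.value
      val sv = point.features.asInstanceOf[SparseVector] // the data here is sparse
      // Manual sparse dot product: only touch the non-zero entries.
      var margin = intercept
      var i = 0
      while (i < sv.indices.length) {
        margin += w(sv.indices(i)) * sv.values(i)
        i += 1
      }
      if (margin > 0) 1.0 else 0.0 // default SVM threshold of 0 assumed
    }
    val predictionAndLabel = prediction.zip(test.map(_.label))
    val metrics = new BinaryClassificationMetrics(predictionAndLabel)

The same broadcast trick would apply to the training-set predictions as well.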
Re: Error when testing with large sparse svm
I've made a bit of progress. I think the problem is with BinaryClassificationMetrics: as long as I comment out all the prediction-related metrics, I can run the SVM example with my data. So the problem should be there, I guess.
Error when testing with large sparse svm
Hi, I encountered an error when testing the SVM example on very large sparse data. The dataset I ran on was a toy set with only ten examples, but each one is a 13-million-dimensional sparse vector with a few thousand non-zero entries. The error is shown below. I am wondering: is this a bug, or am I missing something?

14/07/13 23:59:44 INFO SecurityManager: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/07/13 23:59:44 INFO SecurityManager: Changing view acls to: chengjie
14/07/13 23:59:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(chengjie)
14/07/13 23:59:45 INFO Slf4jLogger: Slf4jLogger started
14/07/13 23:59:45 INFO Remoting: Starting remoting
14/07/13 23:59:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@master:53173]
14/07/13 23:59:45 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@master:53173]
14/07/13 23:59:45 INFO SparkEnv: Registering MapOutputTracker
14/07/13 23:59:45 INFO SparkEnv: Registering BlockManagerMaster
14/07/13 23:59:45 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140713235945-c78f
14/07/13 23:59:45 INFO MemoryStore: MemoryStore started with capacity 14.4 GB.
14/07/13 23:59:45 INFO ConnectionManager: Bound socket to port 37674 with id = ConnectionManagerId(master,37674)
14/07/13 23:59:45 INFO BlockManagerMaster: Trying to register BlockManager
14/07/13 23:59:45 INFO BlockManagerInfo: Registering block manager master:37674 with 14.4 GB RAM
14/07/13 23:59:45 INFO BlockManagerMaster: Registered BlockManager
14/07/13 23:59:45 INFO HttpServer: Starting HTTP Server
14/07/13 23:59:45 INFO HttpBroadcast: Broadcast server started at http://10.10.255.128:41838
14/07/13 23:59:45 INFO HttpFileServer: HTTP File server directory is /tmp/spark-ac459d4b-a3c4-4577-bad4-576ac427d0bf
14/07/13 23:59:45 INFO HttpServer: Starting HTTP Server
14/07/13 23:59:51 INFO SparkUI: Started SparkUI at http://master:4040
14/07/13 23:59:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/13 23:59:52 INFO EventLoggingListener: Logging events to /tmp/spark-events/binaryclassification-with-params(hdfs---master-9001-splice.small,1,1.0,svm,l1,0.1)-1405317591776
14/07/13 23:59:52 INFO SparkContext: Added JAR file:/home/chengjie/spark-1.0.1/examples/target/scala-2.10/spark-examples-1.0.1-hadoop2.3.0.jar at http://10.10.255.128:54689/jars/spark-examples-1.0.1-hadoop2.3.0.jar with timestamp 1405317592653
14/07/13 23:59:52 INFO AppClient$ClientActor: Connecting to master spark://master:7077...
14/07/14 00:00:08 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/07/14 00:00:23 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/07/14 00:00:38 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/07/14 00:00:53 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
Training: 10
14/07/14 00:01:09 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
14/07/14 00:01:09 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Serialized task 20:0 was 94453098 bytes which exceeds spark.akka.frameSize (10485760 bytes). Consider using broadcast variables for large values.
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at
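For what it's worth, the numbers here are consistent with the model's dense weight vector being serialized into the task: roughly 13,000,000 features x 8 bytes per double ≈ 104 MB, the same order of magnitude as the 94,453,098-byte task, and far over the 10,485,760-byte (10 MB) default frame size. That would match the exception's hint to use broadcast variables for large values.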
Re: Error when testing with large sparse svm
Hi Xiangrui, Where can I set spark.akka.frameSize?
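For reference, spark.akka.frameSize is an ordinary Spark configuration property. A minimal sketch of setting it programmatically (the value is a number of megabytes; 128 here is an assumed value chosen to exceed the ~94 MB task above):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("BinaryClassification")
      .set("spark.akka.frameSize", "128") // in MB; must exceed the largest serialized task
    val sc = new SparkContext(conf)

The same property can also go in conf/spark-defaults.conf as a line like "spark.akka.frameSize 128".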
Re: Error when testing with large sparse svm
Hi Krishna, Thanks for your help. Were you able to get your 29M dataset running yet? I fixed the previous problem by setting a larger spark.akka.frameSize, but now I get some other errors, shown below. Did you get these errors before?

14/07/14 11:32:20 ERROR TaskSchedulerImpl: Lost executor 1 on node7: remote Akka client disassociated
14/07/14 11:32:20 WARN TaskSetManager: Lost TID 20 (task 13.0:0)
14/07/14 11:32:21 ERROR TaskSchedulerImpl: Lost executor 3 on node8: remote Akka client disassociated
14/07/14 11:32:21 WARN TaskSetManager: Lost TID 21 (task 13.0:1)
14/07/14 11:32:23 ERROR TaskSchedulerImpl: Lost executor 6 on node3: remote Akka client disassociated
14/07/14 11:32:23 WARN TaskSetManager: Lost TID 22 (task 13.0:0)
14/07/14 11:32:25 ERROR TaskSchedulerImpl: Lost executor 0 on node4: remote Akka client disassociated
14/07/14 11:32:25 WARN TaskSetManager: Lost TID 23 (task 13.0:1)
14/07/14 11:32:26 ERROR TaskSchedulerImpl: Lost executor 5 on node1: remote Akka client disassociated
14/07/14 11:32:26 WARN TaskSetManager: Lost TID 24 (task 13.0:0)
14/07/14 11:32:28 ERROR TaskSchedulerImpl: Lost executor 7 on node6: remote Akka client disassociated
14/07/14 11:32:28 WARN TaskSetManager: Lost TID 26 (task 13.0:0)
14/07/14 11:32:28 ERROR TaskSetManager: Task 13.0:0 failed 4 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 13.0:0 failed 4 times, most recent failure: TID 26 on host node6 failed for unknown reason
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
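A side note: "remote Akka client disassociated" usually just means the executor JVM died, often from running out of memory, so the executor logs on the worker nodes are the place to look for the real failure. A minimal sketch of giving the executors more headroom when launching the example (the exact arguments and the 20g figure are assumptions, not recommendations; adjust to your launch command and hardware):

    ./bin/spark-submit \
      --class org.apache.spark.examples.mllib.BinaryClassification \
      --master spark://master:7077 \
      --executor-memory 20g \
      examples/target/scala-2.10/spark-examples-1.0.1-hadoop2.3.0.jar \
      hdfs://master:9001/splice.small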
Re: Error when testing with large sparse svm
(1) What is the number of partitions? Is it the number of workers per node?
(2) I already set the driver memory pretty big: 25g.
(3) I am running Spark 1.0.1 on a standalone cluster with 9 nodes; one of them works as the master, the others are workers.
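On (1): the number of partitions is the number of pieces an RDD is split into, and hence the number of tasks that can run in parallel; it is independent of the number of workers per node. A minimal sketch of controlling it when loading the data (32 is an arbitrary example value; a common rule of thumb is 2-4 partitions per CPU core in the cluster, and the path is the one from the logs above):

    import org.apache.spark.mllib.util.MLUtils

    // Load the LIBSVM data, then split it into 32 partitions, i.e. 32 tasks.
    val examples = MLUtils.loadLibSVMFile(sc, "hdfs://master:9001/splice.small")
      .repartition(32)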
Putting block rdd failed when running example svm on large data
Hi, I am trying to run the example BinaryClassification (org.apache.spark.examples.mllib.BinaryClassification) on a 202G file. I am constantly getting messages like the ones below; is this normal, or am I missing something?

14/07/12 09:49:04 WARN BlockManager: Block rdd_4_196 could not be dropped from memory as it does not exist
14/07/12 09:49:04 WARN BlockManager: Putting block rdd_4_196 failed
14/07/12 09:49:05 WARN BlockManager: Block rdd_4_201 could not be dropped from memory as it does not exist
14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_201 failed
14/07/12 09:49:05 WARN BlockManager: Block rdd_4_202 could not be dropped from memory as it does not exist
14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_202 failed
14/07/12 09:49:05 WARN BlockManager: Block rdd_4_198 could not be dropped from memory as it does not exist
14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_198 failed
14/07/12 09:49:05 WARN BlockManager: Block rdd_4_199 could not be dropped from memory as it does not exist
14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_199 failed
14/07/12 09:49:05 WARN BlockManager: Block rdd_4_204 could not be dropped from memory as it does not exist
14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_204 failed
14/07/12 09:49:06 WARN BlockManager: Block rdd_4_203 could not be dropped from memory as it does not exist
14/07/12 09:49:06 WARN BlockManager: Putting block rdd_4_203 failed
14/07/12 09:49:07 WARN BlockManager: Block rdd_4_205 could not be dropped from memory as it does not exist
14/07/12 09:49:07 WARN BlockManager: Putting block rdd_4_205 failed

Some info: an 8-node cluster with 28G of RAM per node; I configured 25G of memory for Spark. (So the dataset does not seem to fit in memory.)
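These warnings typically mean the BlockManager tried to cache an RDD partition but had no room left, so the block was dropped; with 202G of data against roughly 200G of aggregate Spark memory (only a fraction of which, spark.storage.memoryFraction, is reserved for caching), that is expected rather than fatal. If you adapt the example code yourself, a minimal sketch of letting the cache spill to local disk instead of dropping blocks (the input path is a placeholder):

    import org.apache.spark.mllib.util.MLUtils
    import org.apache.spark.storage.StorageLevel

    // MEMORY_AND_DISK writes partitions that do not fit in memory to local
    // disk, instead of dropping them and recomputing them later (MEMORY_ONLY).
    val examples = MLUtils.loadLibSVMFile(sc, "hdfs://master:9001/train-202g")
      .persist(StorageLevel.MEMORY_AND_DISK)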
Re: Putting block rdd failed when running example svm on large data
Hi Xiangrui, Thanks for the information. Also, is it possible to figure out the execution time per iteration for SVM?
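On timing: each SGD iteration runs as roughly one job, so the per-stage times in the driver's INFO logs (or on the web UI at port 4040) give an approximate per-iteration breakdown. Failing that, a crude sketch that estimates the marginal cost per iteration by timing two runs that differ only in numIterations (assuming training is a cached RDD[LabeledPoint], as in the example):

    import org.apache.spark.mllib.classification.SVMWithSGD

    def timeRun(numIterations: Int): Double = {
      val start = System.nanoTime()
      SVMWithSGD.train(training, numIterations)
      (System.nanoTime() - start) / 1e9 // elapsed seconds
    }
    // Subtracting cancels the fixed startup cost shared by both runs.
    val secondsPerIteration = (timeRun(100) - timeRun(50)) / 50.0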