Re: Error when testing with large sparse svm

2014-07-16 Thread crater
I don't really know how to create a JIRA :(

Specifically, the code I commented out is:

//val prediction = model.predict(test.map(_.features))
//val predictionAndLabel = prediction.zip(test.map(_.label))
//val prediction = model.predict(training.map(_.features))
//val predictionAndLabel = prediction.zip(training.map(_.label))

//val metrics = new BinaryClassificationMetrics(predictionAndLabel)

//println(s"Test areaUnderPR = ${metrics.areaUnderPR()}.")
//println(s"Test areaUnderROC = ${metrics.areaUnderROC()}.")

in
examples/src/main/scala/org/apache/spark/examples/mllib/BinaryClassification.scala.
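
In case it helps anyone, here is a sketch of how I believe the same metrics
could be computed without shipping the model inside every task, by broadcasting
it as the error message suggests (untested by me; it assumes the example's sc,
model, and test RDD[LabeledPoint] are in scope):

import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

val modelBc = sc.broadcast(model)
val predictionAndLabel = test.map(p => (modelBc.value.predict(p.features), p.label))
val metrics = new BinaryClassificationMetrics(predictionAndLabel)

println(s"Test areaUnderPR = ${metrics.areaUnderPR()}.")
println(s"Test areaUnderROC = ${metrics.areaUnderROC()}.")

This way the weight vector travels to each executor once per broadcast rather
than once per task.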







Re: Error when testing with large sparse svm

2014-07-15 Thread crater
I made a bit of progress. I think the problem is with
BinaryClassificationMetrics: as long as I comment out all the
prediction-related metrics, I can run the SVM example with my data. So the
problem should be there, I guess.






Error when testing with large sparse svm

2014-07-14 Thread crater
Hi,

I encountered an error when testing SVM (the example one) on very large sparse
data. The dataset I ran on was a toy set with only ten examples, but each is a
13-million-dimensional sparse vector with a few thousand non-zero entries.

The error is shown below. Am I missing something, or is this a bug?

14/07/13 23:59:44 INFO SecurityManager: Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
14/07/13 23:59:44 INFO SecurityManager: Changing view acls to: chengjie
14/07/13 23:59:44 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(chengjie)
14/07/13 23:59:45 INFO Slf4jLogger: Slf4jLogger started
14/07/13 23:59:45 INFO Remoting: Starting remoting
14/07/13 23:59:45 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://spark@master:53173]
14/07/13 23:59:45 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://spark@master:53173]
14/07/13 23:59:45 INFO SparkEnv: Registering MapOutputTracker
14/07/13 23:59:45 INFO SparkEnv: Registering BlockManagerMaster
14/07/13 23:59:45 INFO DiskBlockManager: Created local directory at
/tmp/spark-local-20140713235945-c78f
14/07/13 23:59:45 INFO MemoryStore: MemoryStore started with capacity 14.4
GB.
14/07/13 23:59:45 INFO ConnectionManager: Bound socket to port 37674 with id
= ConnectionManagerId(master,37674)
14/07/13 23:59:45 INFO BlockManagerMaster: Trying to register BlockManager
14/07/13 23:59:45 INFO BlockManagerInfo: Registering block manager
master:37674 with 14.4 GB RAM
14/07/13 23:59:45 INFO BlockManagerMaster: Registered BlockManager
14/07/13 23:59:45 INFO HttpServer: Starting HTTP Server
14/07/13 23:59:45 INFO HttpBroadcast: Broadcast server started at
http://10.10.255.128:41838
14/07/13 23:59:45 INFO HttpFileServer: HTTP File server directory is
/tmp/spark-ac459d4b-a3c4-4577-bad4-576ac427d0bf
14/07/13 23:59:45 INFO HttpServer: Starting HTTP Server
14/07/13 23:59:51 INFO SparkUI: Started SparkUI at http://master:4040
14/07/13 23:59:51 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/07/13 23:59:52 INFO EventLoggingListener: Logging events to
/tmp/spark-events/binaryclassification-with-params(hdfs---master-9001-splice.small,1,1.0,svm,l1,0.1)-1405317591776
14/07/13 23:59:52 INFO SparkContext: Added JAR
file:/home/chengjie/spark-1.0.1/examples/target/scala-2.10/spark-examples-1.0.1-hadoop2.3.0.jar
at http://10.10.255.128:54689/jars/spark-examples-1.0.1-hadoop2.3.0.jar with
timestamp 1405317592653
14/07/13 23:59:52 INFO AppClient$ClientActor: Connecting to master
spark://master:7077...
14/07/14 00:00:08 WARN TaskSchedulerImpl: Initial job has not accepted any
resources; check your cluster UI to ensure that workers are registered and
have sufficient memory
14/07/14 00:00:23 WARN TaskSchedulerImpl: Initial job has not accepted any
resources; check your cluster UI to ensure that workers are registered and
have sufficient memory
14/07/14 00:00:38 WARN TaskSchedulerImpl: Initial job has not accepted any
resources; check your cluster UI to ensure that workers are registered and
have sufficient memory
14/07/14 00:00:53 WARN TaskSchedulerImpl: Initial job has not accepted any
resources; check your cluster UI to ensure that workers are registered and
have sufficient memory
Training: 10
14/07/14 00:01:09 WARN BLAS: Failed to load implementation from:
com.github.fommil.netlib.NativeSystemBLAS
14/07/14 00:01:09 WARN BLAS: Failed to load implementation from:
com.github.fommil.netlib.NativeRefBLAS
Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Serialized task 20:0 was 94453098 bytes which exceeds
spark.akka.frameSize (10485760 bytes). Consider using broadcast variables
for large values.
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at 
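
(As a rough sanity check, assuming the model's weight vector is stored as a
dense array over the full feature space: 13,000,000 features x 8 bytes per
double is roughly 104 MB, the same order of magnitude as the 94,453,098-byte
serialized task above, while 10485760 bytes is just the 10 MB default for
spark.akka.frameSize. That would suggest the task closure is carrying the full
weight vector, rather than the training data itself being the problem.)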

Re: Error when testing with large sparse svm

2014-07-14 Thread crater
Hi xiangrui,


Where can I set spark.akka.frameSize?
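
In case anyone else is looking for this: a minimal sketch of what I understand
to work on Spark 1.0.x, where the value is interpreted in MB. It can be set on
the SparkConf before the SparkContext is created (or, equivalently, as a line
"spark.akka.frameSize 100" in conf/spark-defaults.conf):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("BinaryClassification")
  .set("spark.akka.frameSize", "100")  // 100 MB, up from the 10 MB default
val sc = new SparkContext(conf)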





Re: Error when testing with large sparse svm

2014-07-14 Thread crater
Hi Krishna,

Thanks for your help. Were you able to get your 29M data running yet? I fixed
the previous problem by setting a larger spark.akka.frameSize, but now I get
some other errors, shown below. Did you get these errors before?


14/07/14 11:32:20 ERROR TaskSchedulerImpl: Lost executor 1 on node7: remote
Akka client disassociated
14/07/14 11:32:20 WARN TaskSetManager: Lost TID 20 (task 13.0:0)
14/07/14 11:32:21 ERROR TaskSchedulerImpl: Lost executor 3 on node8: remote
Akka client disassociated
14/07/14 11:32:21 WARN TaskSetManager: Lost TID 21 (task 13.0:1)
14/07/14 11:32:23 ERROR TaskSchedulerImpl: Lost executor 6 on node3: remote
Akka client disassociated
14/07/14 11:32:23 WARN TaskSetManager: Lost TID 22 (task 13.0:0)
14/07/14 11:32:25 ERROR TaskSchedulerImpl: Lost executor 0 on node4: remote
Akka client disassociated
14/07/14 11:32:25 WARN TaskSetManager: Lost TID 23 (task 13.0:1)
14/07/14 11:32:26 ERROR TaskSchedulerImpl: Lost executor 5 on node1: remote
Akka client disassociated
14/07/14 11:32:26 WARN TaskSetManager: Lost TID 24 (task 13.0:0)
14/07/14 11:32:28 ERROR TaskSchedulerImpl: Lost executor 7 on node6: remote
Akka client disassociated
14/07/14 11:32:28 WARN TaskSetManager: Lost TID 26 (task 13.0:0)
14/07/14 11:32:28 ERROR TaskSetManager: Task 13.0:0 failed 4 times; aborting
job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task 13.0:0 failed 4 times, most recent failure: TID 26 on
host node6 failed for unknown reason
Driver stacktrace:
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)






Re: Error when testing with large sparse svm

2014-07-14 Thread crater


(1) What is the number of partitions? Is it the number of workers per node?
(My current understanding is sketched below.)
(2) I have already set the driver memory pretty big: 25g.
(3) I am running Spark 1.0.1 on a standalone cluster with 9 nodes; one of them
works as the master, the others are workers.
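
On (1), as a small sketch (not authoritative): the partition count is a
property of the RDD, i.e. how many splits the input is cut into and therefore
how many tasks each stage runs, not the number of workers per node. A common
rule of thumb is a few partitions per CPU core in the whole cluster, and it can
be changed after loading:

// assuming the example's `examples` RDD[LabeledPoint] is in scope;
// 64 is purely illustrative, pick something proportional to your total cores
val repartitioned = examples.repartition(64)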





Putting block rdd failed when running example svm on large data

2014-07-12 Thread crater
Hi,

I am trying to run the example BinaryClassification
(org.apache.spark.examples.mllib.BinaryClassification) on a 202G file. I am
constantly getting messages like the ones below. Is this normal, or am I
missing something?

14/07/12 09:49:04 WARN BlockManager: Block rdd_4_196 could not be dropped
from memory as it does not exist
14/07/12 09:49:04 WARN BlockManager: Putting block rdd_4_196 failed
14/07/12 09:49:05 WARN BlockManager: Block rdd_4_201 could not be dropped
from memory as it does not exist
14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_201 failed
14/07/12 09:49:05 WARN BlockManager: Block rdd_4_202 could not be dropped
from memory as it does not exist
14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_202 failed
14/07/12 09:49:05 WARN BlockManager: Block rdd_4_198 could not be dropped
from memory as it does not exist
14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_198 failed
14/07/12 09:49:05 WARN BlockManager: Block rdd_4_199 could not be dropped
from memory as it does not exist
14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_199 failed
14/07/12 09:49:05 WARN BlockManager: Block rdd_4_204 could not be dropped
from memory as it does not exist
14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_204 failed
14/07/12 09:49:06 WARN BlockManager: Block rdd_4_203 could not be dropped
from memory as it does not exist
14/07/12 09:49:06 WARN BlockManager: Putting block rdd_4_203 failed
14/07/12 09:49:07 WARN BlockManager: Block rdd_4_205 could not be dropped
from memory as it does not exist
14/07/12 09:49:07 WARN BlockManager: Putting block rdd_4_205 failed

Some info:
8-node cluster with 28G of RAM per node; I configured 25G of memory for Spark.
(So the data does not seem to fit in memory.)
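
One thing I am considering (a sketch, not something I have verified): since the
cached RDD is larger than the available storage memory, persisting it with
MEMORY_AND_DISK instead of the example's plain cache() should let the blocks
that do not fit spill to disk rather than fail to be put:

import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.storage.StorageLevel

// roughly in place of the example's loadLibSVMFile(...).cache() call;
// sc and params.input are assumed to be the ones the example already uses
val examples = MLUtils.loadLibSVMFile(sc, params.input)
  .persist(StorageLevel.MEMORY_AND_DISK)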






Re: Putting block rdd failed when running example svm on large data

2014-07-12 Thread crater
Hi Xiangrui, 

Thanks for the information. Also, is it possible to figure out the execution
time per iteration for the SVM?
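
In case it is useful, here is a rough sketch of how I would estimate it myself
(untested; the HDFS path is only a placeholder, and sc is assumed to be the
running SparkContext): time two SVMWithSGD runs that differ only in the
iteration count, after the input is cached, and take the difference.

import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.util.MLUtils

val training = MLUtils.loadLibSVMFile(sc, "hdfs://master:9001/path/to/data").cache()
training.count()  // materialize the cache so the load cost is not charged to the first run

def timeRun(numIterations: Int): Double = {
  val start = System.nanoTime()
  SVMWithSGD.train(training, numIterations)
  (System.nanoTime() - start) / 1e9
}

val t10 = timeRun(10)
val t20 = timeRun(20)
println(s"Approx. seconds per iteration: ${(t20 - t10) / 10.0}")

The per-stage times on the web UI (http://master:4040 in the log above) should
give a similar picture without the extra run.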


