Re: Putting block rdd failed when running example svm on large data

2014-07-12 Thread Aaron Davidson
Also, check the driver web UI for that: each iteration will have one or more
stages associated with it, so you can read the per-stage durations there.


On Sat, Jul 12, 2014 at 6:47 PM, crater  wrote:

> Hi Xiangrui,
>
> Thanks for the information. Also, is it possible to figure out the
> execution time per iteration for SVM?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Putting-block-rdd-failed-when-running-example-svm-on-large-data-tp9515p9535.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>


Re: Putting block rdd failed when running example svm on large data

2014-07-12 Thread crater
Hi Xiangrui, 

Thanks for the information. Also, is it possible to figure out the execution
time per iteration for SVM?





Re: Putting block rdd failed when running example svm on large data

2014-07-12 Thread Xiangrui Meng
By default, Spark uses half of the memory for caching RDDs
(configurable by spark.storage.memoryFraction). That is about 25 * 8 /
2 = 100G for your setup, which is smaller than the 202G data size. So
you don't have enough memory to fully cache the RDD. You can confirm
it in the storage tab of the WebUI. SVM is still able to run, but
slower. -Xiangrui
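
The memory arithmetic above can be sketched as a quick back-of-the-envelope
check (figures taken from this thread; the 0.5 fraction is the default
Xiangrui describes via spark.storage.memoryFraction):

```python
# Estimate how much cluster memory is available for caching RDD blocks,
# using the numbers reported in this thread.
nodes = 8
executor_mem_gb = 25      # memory configured for Spark per node
memory_fraction = 0.5     # fraction reserved for RDD storage, as stated above
data_gb = 202             # size of the input file

cache_gb = nodes * executor_mem_gb * memory_fraction
print(cache_gb)           # 100.0 GB usable for cached RDD blocks
print(data_gb > cache_gb) # True: the RDD cannot be fully cached
```

Since the data exceeds the cache capacity, some partitions are recomputed on
each iteration instead of being served from memory, which is why the job still
runs but more slowly.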

On Sat, Jul 12, 2014 at 11:10 AM, crater  wrote:
> Hi,
>
> I am trying to run the example BinaryClassification
> (org.apache.spark.examples.mllib.BinaryClassification) on a 202G file. I am
> constantly getting messages like the ones below. Is this normal, or am I
> missing something?
>
> 14/07/12 09:49:04 WARN BlockManager: Block rdd_4_196 could not be dropped
> from memory as it does not exist
> 14/07/12 09:49:04 WARN BlockManager: Putting block rdd_4_196 failed
> 14/07/12 09:49:05 WARN BlockManager: Block rdd_4_201 could not be dropped
> from memory as it does not exist
> 14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_201 failed
> 14/07/12 09:49:05 WARN BlockManager: Block rdd_4_202 could not be dropped
> from memory as it does not exist
> 14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_202 failed
> 14/07/12 09:49:05 WARN BlockManager: Block rdd_4_198 could not be dropped
> from memory as it does not exist
> 14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_198 failed
> 14/07/12 09:49:05 WARN BlockManager: Block rdd_4_199 could not be dropped
> from memory as it does not exist
> 14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_199 failed
> 14/07/12 09:49:05 WARN BlockManager: Block rdd_4_204 could not be dropped
> from memory as it does not exist
> 14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_204 failed
> 14/07/12 09:49:06 WARN BlockManager: Block rdd_4_203 could not be dropped
> from memory as it does not exist
> 14/07/12 09:49:06 WARN BlockManager: Putting block rdd_4_203 failed
> 14/07/12 09:49:07 WARN BlockManager: Block rdd_4_205 could not be dropped
> from memory as it does not exist
> 14/07/12 09:49:07 WARN BlockManager: Putting block rdd_4_205 failed
>
> Some info:
> 8-node cluster with 28G RAM per node; I configured 25G of memory for Spark.
> (So the data does not seem to fit in memory.)
>
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Putting-block-rdd-failed-when-running-example-svm-on-large-data-tp9515.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
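

For anyone reproducing this setup, an invocation along these lines would run
the example (the class name and 25G executor memory come from the thread; the
master URL, jar path, and data path are placeholders to fill in for your
cluster):

```shell
# Hypothetical spark-submit invocation; adjust master URL and paths.
bin/spark-submit \
  --class org.apache.spark.examples.mllib.BinaryClassification \
  --master spark://your-master:7077 \
  --executor-memory 25g \
  examples/target/spark-examples.jar \
  hdfs:///path/to/your/202g/dataset
```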