Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-08-21 Thread Ravi Mody
I've been able to almost halve my memory usage with no instability issues. I lowered my storage.memoryFraction and increased my shuffle.memoryFraction (essentially swapping them). I set spark.yarn.executor.memoryOverhead to 6GB. And I lowered executor-cores in case other jobs are using the
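The settings described in this message could be written out roughly as below. This is only a sketch: the message does not give the exact fractions, so 0.2 and 0.6 assume the Spark 1.x defaults (storage 0.6, shuffle 0.2) were simply swapped, and `tuned_settings` is a hypothetical helper name. In Spark 1.x, `spark.yarn.executor.memoryOverhead` is given in megabytes, so 6 GB is written as 6144.

```python
def tuned_settings(overhead_gb=6):
    """Return Spark 1.x properties mirroring the changes described above."""
    return {
        # Assumed values: the Spark 1.x defaults (0.6 storage, 0.2 shuffle) swapped.
        "spark.storage.memoryFraction": "0.2",
        "spark.shuffle.memoryFraction": "0.6",
        # Off-heap headroom per executor on YARN, in megabytes.
        "spark.yarn.executor.memoryOverhead": str(overhead_gb * 1024),
    }


settings = tuned_settings()
for key in sorted(settings):
    print(key, settings[key])

# With PySpark installed, the same pairs can be applied to a SparkConf:
#   from pyspark import SparkConf
#   conf = SparkConf().setAll(list(settings.items()))
```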

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS : Too many values to unpack

2015-07-28 Thread Xiangrui Meng
It seems that the error happens before ALS iterations. Could you try `ratings.first()` right after `ratings = newrdd.map(lambda l: Rating(int(l[1]),int(l[2]),l[4])).partitionBy(50)`? -Xiangrui On Fri, Jun 26, 2015 at 2:28 PM, Ayman Farahat ayman.fara...@yahoo.com wrote: I tried something similar
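One possible explanation for the "too many values to unpack" error, offered as a guess rather than a confirmed diagnosis: `partitionBy` operates on key-value pairs, while each `Rating` is a three-field named tuple, so unpacking it as a pair fails. The sketch below reproduces that in plain Python, using a stand-in `Rating` namedtuple rather than the PySpark class; `unpack_as_pair` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for pyspark.mllib.recommendation.Rating (user, product, rating).
Rating = namedtuple("Rating", ["user", "product", "rating"])


def unpack_as_pair(record):
    """Unpack a record the way a key-value operation would."""
    key, value = record
    return key, value


r = Rating(1, 2, 0.5)

try:
    unpack_as_pair(r)
    error = None
except ValueError as exc:
    error = str(exc)

print(error)

# Wrapping the rating in a (key, value) pair unpacks cleanly.
key, value = unpack_as_pair((r.user, r))
```

If this is the cause, calling `ratings.first()` as suggested would raise the error immediately, before any ALS iteration runs.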

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS : Same error after the re-partition

2015-06-27 Thread Sabarish Sasidharan
Try setting the YARN executor memory overhead to a higher value like 1g or 1.5g or more. Regards Sab On 28-Jun-2015 9:22 am, Ayman Farahat ayman.fara...@yahoo.com wrote: That's correct, this is YARN and Spark 1.4. Also using the Anaconda tar for NumPy and other libs. Sent from my iPhone On

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS : Same error after the re-partition

2015-06-27 Thread Sabarish Sasidharan
Are you running on top of YARN? Also, please provide your infrastructure details. Regards Sab On 28-Jun-2015 8:47 am, Ayman Farahat ayman.fara...@yahoo.com.invalid wrote: Hello; I tried to adjust the number of blocks by repartitioning the input. Here is how I do it (I am partitioning by users):

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS : Same error after the re-partition

2015-06-27 Thread Ayman Farahat
Where do I do that? Thanks Sent from my iPhone On Jun 27, 2015, at 8:59 PM, Sabarish Sasidharan sabarish.sasidha...@manthan.com wrote: Try setting the YARN executor memory overhead to a higher value like 1g or 1.5g or more. Regards Sab On 28-Jun-2015 9:22 am, Ayman Farahat
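One place to set this is the `spark-submit` command line, through `--conf`. The sketch below only assembles the argument list; the script name `als_job.py` and the helper `submit_args` are hypothetical, and the value is in megabytes, as `spark.yarn.executor.memoryOverhead` expects in Spark 1.x.

```python
def submit_args(script, overhead_mb=1536):
    """Build a spark-submit argument list that raises the YARN memory overhead."""
    return [
        "spark-submit",
        "--conf",
        "spark.yarn.executor.memoryOverhead={}".format(overhead_mb),
        script,  # hypothetical application script
    ]


args = submit_args("als_job.py")
print(" ".join(args))
```

The same property can also be placed in `conf/spark-defaults.conf` so that it applies to every job submitted from that installation.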

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS : Same error after the re-partition

2015-06-27 Thread Sabarish Sasidharan
Are you running on top of YARN? Also, please provide your infrastructure details. Regards Sab On 28-Jun-2015 9:20 am, Sabarish Sasidharan sabarish.sasidha...@manthan.com wrote: Are you running on top of YARN? Also, please provide your infrastructure details. Regards Sab On 28-Jun-2015 8:47 am,

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS : Same error after the re-partition

2015-06-27 Thread Ayman Farahat
That's correct, this is YARN and Spark 1.4. Also using the Anaconda tar for NumPy and other libs. Sent from my iPhone On Jun 27, 2015, at 8:50 PM, Sabarish Sasidharan sabarish.sasidha...@manthan.com wrote: Are you running on top of YARN? Also, please provide your infrastructure details.

Failed stages and dropped executors when running implicit matrix factorization/ALS : Same error after the re-partition

2015-06-27 Thread Ayman Farahat
Hello; I tried to adjust the number of blocks by repartitioning the input. Here is how I do it (I am partitioning by users): tot = newrdd.map(lambda l: (l[1],Rating(int(l[1]),int(l[2]),l[4]))).partitionBy(50).cache() ratings = tot.values() numIterations = 8 rank = 80 model =
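The idea of keying each rating by user before partitioning can be illustrated without a cluster. The sketch below is plain Python, not Spark's implementation: it assigns ratings to buckets by `hash(user) % num_partitions`, an assumed stand-in for the hash partitioning that `partitionBy` applies, and the sample rows are made up.

```python
from collections import defaultdict, namedtuple

Rating = namedtuple("Rating", ["user", "product", "rating"])


def partition_by_user(ratings, num_partitions):
    """Group ratings into buckets keyed by hash(user) % num_partitions."""
    buckets = defaultdict(list)
    for r in ratings:
        buckets[hash(r.user) % num_partitions].append(r)
    return buckets


# Made-up sample data: (user, product, rating).
sample = [Rating(7, 1, 1.0), Rating(57, 2, 3.0), Rating(7, 3, 2.0), Rating(8, 1, 5.0)]
buckets = partition_by_user(sample, 50)

# Every rating for a given user ends up in exactly one bucket.
for index in sorted(buckets):
    print(index, [(r.user, r.product) for r in buckets[index]])
```

Several users can share a bucket, but one user's ratings are never split across buckets, which is the property the keyed `partitionBy` call is after.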

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Ravi Mody
Forgot to mention: rank of 100 usually works ok, 120 consistently cannot finish. On Fri, Jun 26, 2015 at 10:18 AM, Ravi Mody rmody...@gmail.com wrote: 1. These are my settings: rank = 100 iterations = 12 users = ~20M items = ~2M training examples = ~500M-1B (I'm running into the issue even

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Ayman Farahat
I use mllib, not ml. Does that make a difference? Sent from my iPhone On Jun 26, 2015, at 7:19 AM, Ravi Mody rmody...@gmail.com wrote: Forgot to mention: rank of 100 usually works ok, 120 consistently cannot finish. On Fri, Jun 26, 2015 at 10:18 AM, Ravi Mody rmody...@gmail.com

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Ravi Mody
1. These are my settings: rank = 100 iterations = 12 users = ~20M items = ~2M training examples = ~500M-1B (I'm running into the issue even with 500M training examples) 2. The memory storage never seems to go too high. The user blocks may go up to ~10 GB, and each executor will have a few GB used
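A rough back-of-the-envelope check on the factor sizes implied by these settings, assuming each factor value is stored as a 4-byte float and ignoring all per-object and shuffle overhead (both are assumptions, not figures from the thread):

```python
def factor_bytes(num_rows, rank, bytes_per_value=4):
    """Raw size of a num_rows x rank factor matrix, excluding any overhead."""
    return num_rows * rank * bytes_per_value


GB = 10 ** 9
users, items = 20 * 10 ** 6, 2 * 10 ** 6

for rank in (100, 120):
    user_gb = factor_bytes(users, rank) / GB
    item_gb = factor_bytes(items, rank) / GB
    print("rank={}: users ~{:.1f} GB, items ~{:.2f} GB".format(rank, user_gb, item_gb))
```

At rank 100 the user factors alone come to about 8 GB raw, which is in the same range as the user-block size reported in this message; raising the rank to 120 adds another 20%.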

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Xiangrui Meng
No, they use the same implementation. On Fri, Jun 26, 2015 at 8:05 AM, Ayman Farahat ayman.fara...@yahoo.com wrote: I use mllib, not ml. Does that make a difference? Sent from my iPhone On Jun 26, 2015, at 7:19 AM, Ravi Mody rmody...@gmail.com wrote: Forgot to mention: rank of 100

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Xiangrui Meng
Please see my comments inline. It would be helpful if you can attach the full stack trace. -Xiangrui On Fri, Jun 26, 2015 at 7:18 AM, Ravi Mody rmody...@gmail.com wrote: 1. These are my settings: rank = 100 iterations = 12 users = ~20M items = ~2M training examples = ~500M-1B (I'm running

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Ayman Farahat
Hello; I checked on my partitions/storage and here is what I have: 80 executors with 5 GB per executor. Do I need to set additional params, say cores? spark.serializer org.apache.spark.serializer.KryoSerializer # spark.driver.memory 5g #

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS : Too many values to unpack

2015-06-26 Thread Ayman Farahat
I tried something similar and got an error. I had 10 executors and 10 8 cores ratings = newrdd.map(lambda l: Rating(int(l[1]),int(l[2]),l[4])).partitionBy(50) mypart = ratings.getNumPartitions() mypart 50 numIterations = 10 rank = 100 model = ALS.trainImplicit(ratings, rank,

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Ravi Mody
I set the number of partitions on the input dataset at 50. The number of CPU cores I'm using is 84 (7 executors, 12 cores). I'll look into getting a full stack trace. Any idea what my errors mean, and why increasing memory causes them to go away? Thanks. On Fri, Jun 26, 2015 at 11:26 AM,

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Ayman Farahat
How do I set these partitions? Is this in the call to ALS, model = ALS.trainImplicit(ratings, rank, numIterations)? On Jun 26, 2015, at 12:33 PM, Xiangrui Meng men...@gmail.com wrote: So you have 100 partitions (blocks). This might be too many for your dataset. Try setting a smaller number of
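In the PySpark MLlib API, the number of blocks is an optional argument of `ALS.trainImplicit` (`blocks`, default -1, which lets Spark choose). A minimal sketch, assuming an existing `ratings` RDD of `Rating` objects; the helper name `als_params` and the value 20 are illustrative, not a recommendation from the thread:

```python
def als_params(rank, num_iterations, num_blocks):
    """Keyword arguments for pyspark.mllib.recommendation.ALS.trainImplicit."""
    return {
        "rank": rank,
        "iterations": num_iterations,
        "blocks": num_blocks,  # -1 lets Spark pick the number of blocks
    }


params = als_params(rank=100, num_iterations=10, num_blocks=20)
print(params)

# With PySpark available and `ratings` defined:
#   from pyspark.mllib.recommendation import ALS
#   model = ALS.trainImplicit(ratings, **params)
```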

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-25 Thread Ayman Farahat
Was there any resolution to that problem? I am also seeing it with PySpark 1.4: 380 million observations, 100 factors, and 5 iterations. Thanks, Ayman On Jun 23, 2015, at 6:20 PM, Xiangrui Meng men...@gmail.com wrote: It shouldn't be hard to handle 1 billion ratings in 1.3. Just need more

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-23 Thread Xiangrui Meng
It shouldn't be hard to handle 1 billion ratings in 1.3. Just need more information to guess what happened: 1. Could you share the ALS settings, e.g., number of blocks, rank and number of iterations, as well as number of users/items in your dataset? 2. If you monitor the progress in the WebUI,

Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-19 Thread Ravi Mody
Hi, I'm running implicit matrix factorization/ALS in Spark 1.3.1 on fairly large datasets (1+ billion input records). As I grow my dataset I often run into issues with a lot of failed stages and dropped executors, ultimately leading to the whole application failing. The errors are like