I've been able to almost halve my memory usage with no instability issues.
I lowered my storage.memoryFraction and increased my shuffle.memoryFraction
(essentially swapping them). I set spark.yarn.executor.memoryOverhead to
6GB. And I lowered executor-cores in case other jobs are using the same nodes.
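For concreteness, a hedged sketch of how those settings could be passed on the command line in Spark 1.x; the fraction values assume the 1.x defaults of 0.6 (storage) and 0.2 (shuffle) being swapped as described, the overhead is the 6GB mentioned above, and the script name and core count are placeholders:

```shell
# Sketch only: swapped memory fractions, 6GB YARN overhead
# (spark.yarn.executor.memoryOverhead takes MB in Spark 1.x),
# and a reduced executor-core count. als_job.py is a placeholder.
spark-submit \
  --master yarn \
  --executor-cores 4 \
  --conf spark.storage.memoryFraction=0.2 \
  --conf spark.shuffle.memoryFraction=0.6 \
  --conf spark.yarn.executor.memoryOverhead=6144 \
  als_job.py
```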
It seems that the error happens before the ALS iterations. Could you try
`ratings.first()` right after
`ratings = newrdd.map(lambda l: Rating(int(l[1]), int(l[2]), l[4])).partitionBy(50)`?
(`first()` forces the lazy map/partitionBy pipeline to actually execute, so it
isolates whether the failure is in loading and partitioning the data rather
than in ALS itself.) -Xiangrui
On Fri, Jun 26, 2015 at 2:28 PM, Ayman Farahat ayman.fara...@yahoo.com wrote:
I tried something similar
Try setting the yarn executor memory overhead to a higher value like 1g or
1.5g or more.
Regards
Sab
On 28-Jun-2015 9:22 am, Ayman Farahat ayman.fara...@yahoo.com wrote:
That's correct, this is YARN,
and Spark 1.4.
Also using the Anaconda tar for NumPy and other libs.
Sent from my iPhone
Are you running on top of YARN? Plus pls provide your infrastructure
details.
Regards
Sab
On 28-Jun-2015 8:47 am, Ayman Farahat ayman.fara...@yahoo.com.invalid
wrote:
Hello;
I tried to adjust the number of blocks by repartitioning the input.
Here is how I do it (I am partitioning by users):
Where do I do that?
Thanks
Sent from my iPhone
Hello;
I tried to adjust the number of blocks by repartitioning the input.
Here is how I do it (I am partitioning by users):
tot = newrdd.map(lambda l: (l[1], Rating(int(l[1]), int(l[2]), l[4]))).partitionBy(50).cache()
ratings = tot.values()
numIterations = 8
rank = 80
model = ALS.trainImplicit(ratings, rank, numIterations)
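As an aside on what partitionBy(50) does with those user keys: it hash-partitions the pairs by key, so all ratings for a given user land in the same partition. A minimal pure-Python sketch of the idea (the bucketing logic here is illustrative; PySpark uses its own portable hash function):

```python
# Illustrative sketch of hash partitioning by key, as done by
# RDD.partitionBy: each (user, rating) pair is routed to a bucket
# determined by hashing the key modulo the partition count.
def partition_by(pairs, num_partitions):
    buckets = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        buckets[hash(key) % num_partitions].append((key, value))
    return buckets

pairs = [(7, 4.0), (7, 3.5), (12, 5.0), (99, 1.0)]
buckets = partition_by(pairs, 4)

# All ratings for user 7 end up in the same bucket.
bucket_of_7 = [b for b in buckets if any(k == 7 for k, _ in b)]
```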
Forgot to mention: rank of 100 usually works ok, 120 consistently cannot
finish.
On Fri, Jun 26, 2015 at 10:18 AM, Ravi Mody rmody...@gmail.com wrote:
1. These are my settings:
rank = 100
iterations = 12
users = ~20M
items = ~2M
training examples = ~500M-1B (I'm running into the issue even with 500M
training examples)
2. The memory storage never seems to go too high. The user blocks may go up
to ~10GB, and each executor will have a few GB used.
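For scale, a rough back-of-the-envelope on why rank 120 might tip this workload over while rank 100 survives: the user and item factor matrices grow linearly with rank. This arithmetic is mine, not from the thread, and it ignores ALS's intermediate shuffle data, which is often the real memory consumer:

```python
# Rough size of the ALS factor matrices alone (float64), ignoring
# shuffle blocks and JVM object overhead. Assumes the counts stated
# above: ~20M users, ~2M items.
users, items = 20_000_000, 2_000_000
bytes_per_float = 8

def factor_gb(rank):
    return (users + items) * rank * bytes_per_float / 1e9

gb_100 = factor_gb(100)  # rank that usually works: 17.6 GB
gb_120 = factor_gb(120)  # rank that consistently fails: 21.12 GB
```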
No, they use the same implementation.
On Fri, Jun 26, 2015 at 8:05 AM, Ayman Farahat ayman.fara...@yahoo.com wrote:
I use the mllib not the ML. Does that make a difference ?
Sent from my iPhone
Please see my comments inline. It would be helpful if you can attach
the full stack trace. -Xiangrui
Hello;
I checked on my partitions/storage and here is what I have:
I have 80 executors,
5 GB per executor.
Do I need to set additional params, say cores?
spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory 5g
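That snippet looks like a spark-defaults.conf fragment. For context, a hedged sketch of how the properties discussed in this thread could sit together in that file; the Kryo and 5g lines come from the snippet above, the executor memory matches the 5 GB per executor mentioned, the overhead value is the 1g Sab suggested, and the core count is purely illustrative:

```
spark.serializer                    org.apache.spark.serializer.KryoSerializer
spark.driver.memory                 5g
spark.executor.memory               5g
spark.yarn.executor.memoryOverhead  1024
spark.executor.cores                4
```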
I tried something similar and got an error.
I had 10 executors with 8 cores each.
ratings = newrdd.map(lambda l: Rating(int(l[1]), int(l[2]), l[4])).partitionBy(50)
mypart = ratings.getNumPartitions()
mypart
50
numIterations = 10
rank = 100
model = ALS.trainImplicit(ratings, rank, numIterations)
I set the number of partitions on the input dataset at 50. The number of
CPU cores I'm using is 84 (7 executors, 12 cores).
I'll look into getting a full stack trace. Any idea what my errors mean,
and why increasing memory causes them to go away? Thanks.
On Fri, Jun 26, 2015 at 11:26 AM,
How do I set these partitions? Is this in the call to ALS:
model = ALS.trainImplicit(ratings, rank, numIterations)?
On Jun 26, 2015, at 12:33 PM, Xiangrui Meng men...@gmail.com wrote:
So you have 100 partitions (blocks). This might be too many for your dataset.
Try setting a smaller number of partitions.
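One hedged way to see why more blocks can hurt: mllib's ALS shuffles factor data between user blocks and item blocks, so the number of block-to-block exchanges grows roughly with the product of the two block counts. A toy calculation (my own illustration, not from the thread):

```python
# Toy illustration: the number of user-block x item-block pairs that
# ALS may exchange data across grows quadratically in the block count
# (assuming the same count on both sides, as when a single number of
# blocks is passed to train/trainImplicit).
def block_pairs(num_blocks):
    return num_blocks * num_blocks

pairs_at_50 = block_pairs(50)    # 2500 block pairs
pairs_at_100 = block_pairs(100)  # 10000 block pairs
```

Doubling the block count quadruples the number of pairs, which is why a smaller partition count can reduce shuffle overhead even when cores sit idle.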
Was there any resolution to that problem?
I am also having it with PySpark 1.4:
380 million observations,
100 factors and 5 iterations.
Thanks
Ayman
On Jun 23, 2015, at 6:20 PM, Xiangrui Meng men...@gmail.com wrote:
It shouldn't be hard to handle 1 billion ratings in 1.3. Just need
more information to guess what happened:
1. Could you share the ALS settings, e.g., number of blocks, rank and
number of iterations, as well as number of users/items in your
dataset?
2. If you monitor the progress in the WebUI,
Hi, I'm running implicit matrix factorization/ALS in Spark 1.3.1 on fairly
large datasets (1+ billion input records). As I grow my dataset I often run
into issues with a lot of failed stages and dropped executors, ultimately
leading to the whole application failing. The errors are like