this exact problem but I'm sure we've seen similar issues. Please let me know if you have other questions.
From: Bharath Ravi Kumar <reachb...@gmail.com>
Date: Thursday, November 27, 2014 at 1:30 PM
To: user@spark.apache.org
Subject: ALS failure with size > Integer.MAX_VALUE
Any suggestions to address the described problem? In particular, given the skewed degree of some of the item nodes in the graph, I believe it should be possible to define better block sizes to reflect that fact, but I'm unsure how to arrive at those sizes.
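
(For reference, a minimal sketch of what adjusting the block count looks like with the MLlib 1.1 RDD API. The input path, the "user,item,rating" text format, and the block count of 2400 are illustrative assumptions, not values from this thread. In this API a single blocks argument sets both the user and item block counts; raising it shrinks each block's serialized slice of the ratings, which is one way to stay under the 2GB / Integer.MAX_VALUE buffer limit that this failure points at.)

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object AlsBlockTuning {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("als-block-tuning"))

    // Hypothetical input location and format; substitute the real
    // 1200-partition HDFS dataset here.
    val ratings = sc.textFile("hdfs:///path/to/ratings").map { line =>
      val Array(user, item, rating) = line.split(',')
      Rating(user.toInt, item.toInt, rating.toDouble)
    }

    // The fifth argument is the number of user/item blocks. The 2400 here
    // is an illustrative guess: the point is that more blocks mean smaller
    // per-block rating slices at shuffle time.
    val model = ALS.train(ratings, 10 /* rank */, 10 /* iterations */,
      0.01 /* lambda */, 2400 /* blocks */)

    println(s"Trained rank-${model.rank} factors for " +
      s"${model.userFeatures.count()} users")
    sc.stop()
  }
}

Note that this splits blocks uniformly, so it does not by itself account for the item-degree skew raised above; heavily rated items can still dominate whichever block they land in.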
We're training a recommender with ALS in mllib 1.1 against a dataset of
150M users and 4.5K items, with the total number of training records being
1.2 Billion (~30GB data). The input data is spread across 1200 partitions
on HDFS. For the training, rank=10, and we've configured {number of user
data