Thanks Xiangrui. I'll try setting a smaller number of item blocks. And
yes, I've been following the JIRA for the new ALS implementation. I'll try
it out when it's ready for testing.
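
For anyone following along, a minimal sketch of what lowering the block
count might look like with the RDD-based mllib ALS API; the parameter
values and the `ratings` RDD below are illustrative, not taken from the
actual job:

    import org.apache.spark.mllib.recommendation.{ALS, Rating}
    import org.apache.spark.rdd.RDD

    // ratings: RDD[Rating] already parsed from the raw (user, item, rating)
    // records; assumed to exist in the surrounding job.
    def trainWithFewBlocks(ratings: RDD[Rating]) = {
      val rank = 10          // as in the original job
      val iterations = 10    // illustrative
      val lambda = 0.01      // illustrative
      val numBlocks = 30     // small block count, per Xiangrui's suggestion
      // The single blocks argument sets both user and item blocks.
      ALS.train(ratings, rank, iterations, lambda, numBlocks)
    }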

On Wed, Dec 3, 2014 at 4:24 AM, Xiangrui Meng <men...@gmail.com> wrote:

> Hi Bharath,
>
> You can try setting a small number of item blocks in this case. 1200 is
> definitely too large for ALS. Please try 30 or even smaller. I'm not
> sure whether this could solve the problem because you have 100 items
> connected with 10^8 users. There is a JIRA for this issue:
>
> https://issues.apache.org/jira/browse/SPARK-3735
>
> which I will try to implement in 1.3. I'll ping you when it is ready.
>
> Best,
> Xiangrui
>
> On Tue, Dec 2, 2014 at 10:40 AM, Bharath Ravi Kumar <reachb...@gmail.com>
> wrote:
> > Yes, the issue appears to be due to the 2GB block size limitation. I am
> > hence looking for (user, product) block sizing suggestions to work around
> > that limitation.
> >
> > On Sun, Nov 30, 2014 at 3:01 PM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> (It won't be that, since you see the error occur when reading a
> >> block from disk. I think this is an instance of the 2GB block size
> >> limitation.)
> >>
> >> On Sun, Nov 30, 2014 at 4:36 AM, Ganelin, Ilya
> >> <ilya.gane...@capitalone.com> wrote:
> >> > Hi Bharath – I’m unsure if this is your problem, but the
> >> > MatrixFactorizationModel in MLlib, which is the underlying component
> >> > for ALS, expects your User/Product fields to be integers. Specifically,
> >> > the input to ALS is an RDD[Rating], and Rating is an (Int, Int,
> >> > Double). I am wondering if perhaps one of your identifiers exceeds
> >> > MAX_INT; could you write a quick check for that?
> >> >
> >> > I have been running a very similar use case to yours (with more
> >> > constrained hardware resources) and I haven’t seen this exact problem,
> >> > but I’m sure we’ve seen similar issues. Please let me know if you have
> >> > other questions.
> >> >
> >> > From: Bharath Ravi Kumar <reachb...@gmail.com>
> >> > Date: Thursday, November 27, 2014 at 1:30 PM
> >> > To: "user@spark.apache.org" <user@spark.apache.org>
> >> > Subject: ALS failure with size > Integer.MAX_VALUE
> >> >
> >> > We're training a recommender with ALS in MLlib 1.1 against a dataset
> >> > of 150M users and 4.5K items, with the total number of training
> >> > records being 1.2 billion (~30GB of data). The input data is spread
> >> > across 1200 partitions on HDFS. For the training, rank=10, and we've
> >> > configured {number of user data blocks = number of item data blocks}.
> >> > The number of user/item blocks was varied between 50 and 1200.
> >> > Irrespective of the block count (e.g. at 1200 blocks each), there are
> >> > at least a couple of tasks that end up shuffle reading > 9.7G each in
> >> > the aggregate stage (ALS.scala:337) and failing with the following
> >> > exception:
> >> >
> >> > java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
> >> >         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:745)
> >> >         at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:108)
> >> >         at org.apache.spark.storage.DiskStore.getValues(DiskStore.scala:124)
> >> >         at org.apache.spark.storage.BlockManager.getLocalFromDisk(BlockManager.scala:332)
> >> >         at org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator$$anonfun$getLocalBlocks$1.apply(BlockFetcherIterator.scala:204)
> >> >
> >
> >
>
