Scaling problem in RandomForest?

2015-03-11 Thread insperatum
Hi, the Random Forest implementation (1.2.1) is reproducibly crashing when I increase the depth to 20. I generate random synthetic data (36 workers, 1,000,000 examples per worker, 30 features per example) as follows: val data = sc.parallelize(1 to 36, 36).mapPartitionsWithIndex((i, _) => {
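
A minimal, self-contained sketch of the setup described above, assuming a Spark shell (sc is the SparkContext); only the partition count, examples per worker, feature count, and maxDepth come from the post, while the label scheme, feature distribution, and tree count are illustrative assumptions:

import scala.util.Random
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest

// 36 partitions, 1,000,000 random examples each, 30 features per example.
val data = sc.parallelize(1 to 36, 36).mapPartitionsWithIndex((i, _) => {
  val rng = new Random(i)
  Iterator.fill(1000000)(LabeledPoint(
    rng.nextInt(2).toDouble,                         // binary label (assumed)
    Vectors.dense(Array.fill(30)(rng.nextDouble()))))
}).cache()

// Training reportedly succeeds with maxDepth = 15 but crashes at maxDepth = 20.
val model = RandomForest.trainClassifier(
  data,
  2,                // numClasses
  Map[Int, Int](),  // categoricalFeaturesInfo (no categorical features)
  50,               // numTrees (assumed; not stated in the post)
  "auto",           // featureSubsetStrategy
  "gini",           // impurity
  20,               // maxDepth
  32)               // maxBins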

Requested array size exceeds VM limit

2015-02-23 Thread insperatum
Hi, I'm using MLlib to train a random forest. It's working fine to depth 15, but if I use depth 20 I get a *java.lang.OutOfMemoryError: Requested array size exceeds VM limit* on the driver, from the collectAsMap operation in DecisionTree.scala, around line 642. It doesn't happen until a good hour
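
Not from the thread itself, but a rough back-of-the-envelope for why depth 20 behaves so differently from depth 15: maxDepth bounds a full binary tree, so the per-tree state gathered on the driver can grow roughly as 2^depth.

// A tree of depth d has at most 2^(d+1) - 1 nodes, so going from depth 15
// to depth 20 is roughly a 32x increase in per-tree bookkeeping.
def maxNodes(depth: Int): Long = (1L << (depth + 1)) - 1
maxNodes(15)  // 65535   (~64K nodes per tree)
maxNodes(20)  // 2097151 (~2M nodes per tree)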

Caching RDDs with shared memory - bug or feature?

2014-12-09 Thread insperatum
If all RDD elements within a partition contain pointers to a single shared object, Spark persists the RDD as expected when it is small. However, if the RDD has more than *200 elements* then Spark reports requiring much more memory than it actually does. This becomes a problem for large RDDs, as Spark
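
A minimal sketch of the pattern being described (class and field names are illustrative, not from the post): every element in a partition holds a reference to one large object that is built once per partition.

import org.apache.spark.storage.StorageLevel

case class Element(s: String)              // each element just points at the shared string

val rdd = sc.parallelize(1 to 1, 1).mapPartitions { _ =>
  val shared = "a" * 50000000              // one large (~100MB) string, built once per partition
  Iterator.fill(1000)(Element(shared))     // 1000 elements all referencing the same object
}
rdd.persist(StorageLevel.MEMORY_ONLY)
rdd.count()
// Only one copy of the string exists on the heap, but once the partition
// grows past a couple of hundred elements the reported storage size behaves
// as if every element carried its own copy, per the symptom described above.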

RDD with object shared across elements within a partition. Magic number 200?

2014-11-22 Thread insperatum
Hi all, I am trying to persist a Spark RDD in which the elements of each partition all share access to a single, large object. However, this object seems to get stored in memory several times. Reducing my problem down to the toy case of just a single partition with only 200 elements: *val*
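
A hypothetical reconstruction of the toy case (the original code is cut off above; names and sizes here are illustrative): a single partition whose 200 elements all point at one large array, cached and then inspected through the storage info.

case class Wrapper(data: Array[Byte])           // illustrative element type

val toy = sc.parallelize(1 to 1, 1).mapPartitions { _ =>
  val big = new Array[Byte](100000000)          // one ~100MB object per partition
  Iterator.fill(200)(Wrapper(big))              // 200 elements sharing it
}.cache()
toy.count()

// Reported in-memory size of the cached RDD; with up to 200 elements this tracks
// the true footprint, but reportedly balloons once the element count goes higher.
sc.getRDDStorageInfo.foreach(info => println(s"${info.name}: ${info.memSize} bytes"))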

Re: RDD with object shared across elements within a partition. Magic number 200?

2014-11-22 Thread insperatum
Some more details: Adding a println to the function reveals that it is indeed called only once. Furthermore, running rdd.map(_.s.hashCode).min == rdd.map(_.s.hashCode).max // returns true ...reveals that all 1000 elements do indeed point to the same object, and so the data structure
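
A compact way to run the same check described above (assuming the toy rdd from the earlier post, whose elements expose a shared field s): identical hash codes across the whole RDD are consistent with every element referencing one object, and a per-partition identity check makes the comparison stricter.

// All elements report the same hashCode for the shared field, which is
// consistent with them pointing at a single object in each executor JVM.
val hashes = rdd.map(_.s.hashCode)
val allSame = hashes.min == hashes.max           // true in the reported case

// Stricter per-partition check: compare object identity rather than hashCode.
val allIdentical = rdd.mapPartitions { it =>
  val elems = it.toSeq
  Iterator(elems.isEmpty || elems.forall(_.s eq elems.head.s))
}.reduce(_ && _)                                 // true when each partition shares one object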