Do you have a small test case that reproduces the out-of-memory error? I have also seen some errors in large-scale experiments but haven't managed to narrow them down.
Thanks
Shivaram

On Fri, Mar 13, 2015 at 6:20 AM, Jaonary Rabarisoa <jaon...@gmail.com> wrote:

> It runs faster, but there are some drawbacks. It seems to consume more
> memory. I get a java.lang.OutOfMemoryError: Java heap space error if I don't
> have enough partitions for a fixed amount of memory. With the older
> (ampcamp) implementation I didn't get it for the same data size.
>
> On Thu, Mar 12, 2015 at 11:36 PM, Shivaram Venkataraman <
> shiva...@eecs.berkeley.edu> wrote:
>
>>
>> On Thu, Mar 12, 2015 at 3:05 PM, Jaonary Rabarisoa <jaon...@gmail.com>
>> wrote:
>>
>>> In fact, by activating netlib with native libraries it goes faster.
>>>
>> Glad you got it to work! Better performance was one of the reasons we made
>> the switch.
>>
>>> Thanks
>>>
>>> On Tue, Mar 10, 2015 at 7:03 PM, Shivaram Venkataraman <
>>> shiva...@eecs.berkeley.edu> wrote:
>>>
>>>> There are a couple of differences between the ml-matrix implementation
>>>> and the one used in AMPCamp:
>>>>
>>>> - I think the AMPCamp one uses JBLAS, which tends to ship native BLAS
>>>> libraries along with it. In ml-matrix we switched to using Breeze + Netlib
>>>> BLAS, which is faster but needs some setup [1] to pick up native libraries.
>>>> If native libraries are not found, it falls back to a JVM implementation, so
>>>> that might explain the slowdown.
>>>>
>>>> - The other difference, if you are comparing the whole image pipeline, is
>>>> that I think the AMPCamp version used NormalEquations, which is around 2-3x
>>>> faster (just in terms of number of flops) compared to TSQR.
>>>>
>>>> [1]
>>>> https://github.com/fommil/netlib-java#machine-optimised-system-libraries
>>>>
>>>> Thanks
>>>> Shivaram
>>>>
>>>> On Tue, Mar 10, 2015 at 9:57 AM, Jaonary Rabarisoa <jaon...@gmail.com>
>>>> wrote:
>>>>
>>>>> I'm trying to play with the implementation of the least squares solver (Ax
>>>>> = b) in mlmatrix.TSQR, where A is a 50000*1024 matrix and b a 50000*10
>>>>> matrix. It works, but I notice
>>>>> that it's 8 times slower than the implementation given in the latest
>>>>> ampcamp:
>>>>> http://ampcamp.berkeley.edu/5/exercises/image-classification-with-pipelines.html
>>>>> . As far as I know, these two implementations come from the same basis.
>>>>> What is the difference between these two codes?
>>>>>
>>>>> On Tue, Mar 3, 2015 at 8:02 PM, Shivaram Venkataraman <
>>>>> shiva...@eecs.berkeley.edu> wrote:
>>>>>
>>>>>> There are a couple of solvers that I've written that are part of the
>>>>>> AMPLab ml-matrix repo [1,2]. These aren't part of MLlib yet, though, and if
>>>>>> you are interested in porting them I'd be happy to review it.
>>>>>>
>>>>>> Thanks
>>>>>> Shivaram
>>>>>>
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/TSQR.scala
>>>>>> [2]
>>>>>> https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/NormalEquations.scala
>>>>>>
>>>>>> On Tue, Mar 3, 2015 at 9:01 AM, Jaonary Rabarisoa <jaon...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Dear all,
>>>>>>>
>>>>>>> Is there a least squares solver based on DistributedMatrix that we
>>>>>>> can use out of the box in the current (or the master) version of Spark?
>>>>>>> It seems that the only least squares solver available in Spark is
>>>>>>> private to the recommender package.
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Jao
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
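
The flop difference between NormalEquations and TSQR mentioned in the thread can be illustrated with a minimal single-machine sketch in plain Python. This is not the ml-matrix code: the function names below are hypothetical, the solver handles a single right-hand-side column (the thread's b has 10), and the flop formulas are the usual textbook leading-order counts for Cholesky-based normal equations and for one Householder QR, so they only approximate what a distributed TSQR would cost.

```python
"""Sketch: least squares via the normal equations, plus rough flop counts."""


def normal_equations_solve(A, b):
    """Solve min ||A x - b||_2 by forming (A^T A) x = A^T b.

    A is a list of m rows with n entries each; b is a list of m values.
    Assumes A has full column rank, so A^T A is symmetric positive definite.
    """
    m, n = len(A), len(A[0])
    # Gram matrix A^T A (n x n) and right-hand side A^T b (length n).
    ata = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]

    # Cholesky factorization: A^T A = L L^T with L lower triangular.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = ata[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s ** 0.5 if i == j else s / L[j][j]

    # Forward substitution: L y = A^T b.
    y = [0.0] * n
    for i in range(n):
        y[i] = (atb[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]

    # Back substitution: L^T x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def normal_equations_flops(m, n):
    # About m*n^2 to form the symmetric Gram matrix, n^3/3 for Cholesky.
    return m * n**2 + n**3 / 3


def householder_qr_flops(m, n):
    # Textbook leading-order count for Householder QR of an m x n matrix.
    return 2 * m * n**2 - 2 * n**3 / 3


if __name__ == "__main__":
    # Points on the line y = 1 + 2t, so the fit should recover [1, 2].
    A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
    b = [1.0, 3.0, 5.0]
    print(normal_equations_solve(A, b))
    # Problem size from the thread: A is 50000 x 1024.
    print(householder_qr_flops(50000, 1024) / normal_equations_flops(50000, 1024))
```

For the 50000 x 1024 problem in the thread, these estimates put the QR route at roughly twice the flops of the normal equations, which is in line with the "around 2-3x" figure quoted above. The trade-off is that forming A^T A squares the condition number of the problem, so the QR-based route is generally the more numerically robust choice.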