Do you have a small test case that can reproduce the out-of-memory error?
I have also seen some errors on large-scale experiments but haven't managed
to narrow them down.

Thanks
Shivaram

On Fri, Mar 13, 2015 at 6:20 AM, Jaonary Rabarisoa <jaon...@gmail.com>
wrote:

> It runs faster, but there are some drawbacks. It seems to consume more
> memory: I get a java.lang.OutOfMemoryError: Java heap space if I don't
> have enough partitions for a fixed amount of memory. With the older
> (ampcamp) implementation I didn't get this error for the same data size.
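>
> For reference, a minimal sketch of the kind of loading code where partitioning
> matters (the path, parser and partition count below are placeholders, not the
> real pipeline; the point is only that more partitions keep each per-partition
> block smaller):
>
>   val numParts = 512  // placeholder: raise until each partition fits in the heap
>   val rows = sc.textFile("features.csv", minPartitions = numParts)
>     .map(_.split(',').map(_.toDouble))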
>
> On Thu, Mar 12, 2015 at 11:36 PM, Shivaram Venkataraman <
> shiva...@eecs.berkeley.edu> wrote:
>
>>
>> On Thu, Mar 12, 2015 at 3:05 PM, Jaonary Rabarisoa <jaon...@gmail.com>
>> wrote:
>>
>>> In fact, by activating netlib with native libraries it goes faster.
>>>
>> Glad you got it to work! Better performance was one of the reasons we made
>> the switch.
>>
>>> Thanks
>>>
>>> On Tue, Mar 10, 2015 at 7:03 PM, Shivaram Venkataraman <
>>> shiva...@eecs.berkeley.edu> wrote:
>>>
>>>> There are a couple of differences between the ml-matrix implementation
>>>> and the one used in AMPCamp
>>>>
>>>> - I think the AMPCamp one uses JBLAS, which tends to ship native BLAS
>>>> libraries along with it. In ml-matrix we switched to using Breeze + Netlib
>>>> BLAS, which is faster but needs some setup [1] to pick up native libraries.
>>>> If native libraries are not found, it falls back to a JVM implementation,
>>>> which might explain the slowdown (a quick way to check which BLAS was loaded
>>>> is sketched after the link below).
>>>>
>>>> - The other difference, if you are comparing the whole image pipeline, is
>>>> that I think the AMPCamp version used NormalEquations, which is around 2-3x
>>>> faster than TSQR (just in terms of the number of flops).
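>>>>
>>>> (Rough back-of-the-envelope, using textbook flop counts rather than anything
>>>> measured from the code: forming the normal equations A^T A for an n x d matrix
>>>> is about n*d^2 flops, while a Householder QR is about 2*n*d^2, so for
>>>> 50000 x 1024 that is roughly 5e10 vs 1e11; the constant factors in the two
>>>> implementations will move the ratio around, hence 2-3x.)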
>>>>
>>>> [1]
>>>> https://github.com/fommil/netlib-java#machine-optimised-system-libraries
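>>>>
>>>> A quick way to check which BLAS actually got picked up (just a sketch against
>>>> netlib-java's public API, not something in ml-matrix):
>>>>
>>>>   import com.github.fommil.netlib.BLAS
>>>>   // Prints e.g. com.github.fommil.netlib.NativeSystemBLAS when native
>>>>   // libraries were found, or com.github.fommil.netlib.F2jBLAS if it fell
>>>>   // back to the pure-JVM implementation.
>>>>   println(BLAS.getInstance().getClass.getName)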
>>>>
>>>> Thanks
>>>> Shivaram
>>>>
>>>> On Tue, Mar 10, 2015 at 9:57 AM, Jaonary Rabarisoa <jaon...@gmail.com>
>>>> wrote:
>>>>
>>>>> I'm trying to play with the implementation of the least squares solver
>>>>> (Ax = b) in mlmatrix.TSQR, where A is a 50000x1024 matrix and b is a
>>>>> 50000x10 matrix. It works, but I notice that it's 8 times slower than the
>>>>> implementation given in the latest ampcamp exercise:
>>>>> http://ampcamp.berkeley.edu/5/exercises/image-classification-with-pipelines.html
>>>>> As far as I know these two implementations come from the same basis.
>>>>> What is the difference between the two codes?
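>>>>>
>>>>> If it helps, here is a minimal local sanity check with Breeze (sizes scaled
>>>>> down as placeholders; this assumes Breeze's \ operator, which as far as I
>>>>> know falls back to a least-squares solve for tall systems and uses LAPACK
>>>>> when native BLAS is available):
>>>>>
>>>>>   import breeze.linalg.DenseMatrix
>>>>>   // Placeholder sizes; the real A is 50000x1024 and b is 50000x10.
>>>>>   val A = DenseMatrix.rand(5000, 128)
>>>>>   val b = DenseMatrix.rand(5000, 10)
>>>>>   val x = A \ b   // least-squares solution of Ax = b
>>>>>   println(s"x: ${x.rows} x ${x.cols}")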
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 3, 2015 at 8:02 PM, Shivaram Venkataraman <
>>>>> shiva...@eecs.berkeley.edu> wrote:
>>>>>
>>>>>> There are a couple of solvers that I've written as part of the
>>>>>> AMPLab ml-matrix repo [1,2]. These aren't part of MLlib yet though, and if
>>>>>> you are interested in porting them I'd be happy to review it.
>>>>>>
>>>>>> Thanks
>>>>>> Shivaram
>>>>>>
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/TSQR.scala
>>>>>> [2]
>>>>>> https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/NormalEquations.scala
>>>>>>
>>>>>> On Tue, Mar 3, 2015 at 9:01 AM, Jaonary Rabarisoa <jaon...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Dear all,
>>>>>>>
>>>>>>> Is there a least squares solver based on DistributedMatrix that we
>>>>>>> can use out of the box in the current (or the master) version of Spark?
>>>>>>> It seems that the only least squares solver available in Spark is
>>>>>>> private to the recommender package.
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Jao
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
