JBlas works nicely; it was easy to drop in as a replacement solver.
The nice thing is having the JNI code pre-packaged and bundled in the
JAR file. I am getting a lot of SIGSEGVs, but I will report that on
the list.

I still find this about 4x slower at solving Ax=B, though, than the
Commons Math QR decomposition. Of course, this is doing an SVD, so
that's probably the difference.
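
For reference, here is roughly the comparison, as a sketch (assuming
the commons-math3 package names; the sizes are made up):

  import org.apache.commons.math3.linear.Array2DRowRealMatrix;
  import org.apache.commons.math3.linear.QRDecomposition;
  import org.apache.commons.math3.linear.RealMatrix;
  import org.jblas.DoubleMatrix;
  import org.jblas.Solve;

  // jblas path: hands the dense system to LAPACK through JNI
  DoubleMatrix a = DoubleMatrix.randn(100, 100);
  DoubleMatrix b = DoubleMatrix.randn(100, 1);
  DoubleMatrix x1 = Solve.solve(a, b);

  // Commons Math path: pure-Java QR decomposition of the same system
  RealMatrix am = new Array2DRowRealMatrix(a.toArray2());
  RealMatrix bm = new Array2DRowRealMatrix(b.toArray2());
  RealMatrix x2 = new QRDecomposition(am).getSolver().solve(bm);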

Indeed, if I benchmark just the wrapped 'geqrf' function, which does
most of a QR decomposition, it's about 3x faster than Java. I'm still
looking at how exactly this works so I can hook it up properly, and
hoping that the extra work needed to complete this approach will
still leave it faster. I'll post the code if it does.
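
In the meantime, the micro-benchmark is nothing fancy; something like
this sketch. I'm assuming the generated wrapper signature
NativeBlas.dgeqrf(m, n, a, aIdx, lda, tau, tauIdx); double-check it
against NativeBlas.java, since those wrappers are auto-generated:

  import org.jblas.DoubleMatrix;
  import org.jblas.NativeBlas;

  int n = 100;
  DoubleMatrix a = DoubleMatrix.randn(n, n);
  double[] tau = new double[n];
  long start = System.nanoTime();
  for (int i = 0; i < 1000; i++) {
    // geqrf factors in place, so copy the input each round; the copy
    // cost is included in the timing, which is crude but fair if the
    // Java version is timed the same way.
    double[] work = a.dup().data;
    NativeBlas.dgeqrf(n, n, work, 0, n, tau, 0);
  }
  System.out.printf("%.1f us per call%n",
      (System.nanoTime() - start) / 1000.0 / 1000);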

On Thu, Apr 18, 2013 at 10:19 PM, Robin Anil <robin.a...@gmail.com> wrote:
> Sean, please also add a benchmark in the integration module so we can
> track progress.
>
> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>
>
> On Thu, Apr 18, 2013 at 4:12 PM, Sebastian Schelter <s...@apache.org> wrote:
>>
>> Let us know the results! :)
>>
>> I think in the case of ALS, we can even use Solve.solveSymmetric()
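>>
>> Roughly like this, as a sketch (YtY, Ytr, rank and lambda stand in
>> for the ALS normal equations and the regularization parameter):
>>
>>   import org.jblas.DoubleMatrix;
>>   import org.jblas.Solve;
>>
>>   // (Y'Y + lambda*I) x = Y'r: the left side is symmetric, so the
>>   // symmetric solver applies instead of a general LU solve.
>>   DoubleMatrix lhs = YtY.add(DoubleMatrix.eye(rank).muli(lambda));
>>   DoubleMatrix x = Solve.solveSymmetric(lhs, Ytr);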
>>
>> Best,
>> Sebastian
>>
>> On 18.04.2013 23:07, Sean Owen wrote:
>> > Good lead -- from
>> >
>> > https://github.com/mikiobraun/jblas/blob/master/src/main/java/org/jblas/Solve.java
>> > it looks like it's an SVD. It definitely took a search to figure out
>> > what 'gelsd' does in LAPACK! I'll see if I can test-drive this too to
>> > see if it bumps performance. That would be great; JNI is a much
>> > smaller requirement than a GPU!
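>> >
>> > If the SVD route turns out to be the slow part, the gelsd-backed
>> > call also has a direct entry point; a sketch, assuming current
>> > jblas still exposes it as Solve.solveLeastSquares():
>> >
>> >   import org.jblas.DoubleMatrix;
>> >   import org.jblas.Solve;
>> >
>> >   // gelsd minimizes ||A*x - b||_2 via an SVD, so it handles
>> >   // rank-deficient A, at the cost of being slower than a QR solve.
>> >   DoubleMatrix x = Solve.solveLeastSquares(A, b);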
>> >
>> > On Thu, Apr 18, 2013 at 10:01 PM, Sebastian Schelter <s...@apache.org>
>> > wrote:
>> >> Hi Sean,
>> >>
>> >> I simply used the Solve.solve() method; I guess it uses a QR
>> >> decomposition internally. I can provide a copy of the code if you
>> >> want to have a look.
>> >>
>> >> Best,
>> >> Sebastian
>> >>
>> >> On 18.04.2013 22:56, Sean Owen wrote:
>> >>> I'm always interested in optimizing the part where you solve Ax=B,
>> >>> which I went on about so recently. That's a dense-matrix problem. Is
>> >>> there a QR decomposition available?
>> >>>
>> >>> I tried getting this part to run on a GPU, and it worked, but it
>> >>> wasn't faster. Somehow it was still slower to push the smallish
>> >>> dense matrix onto the card so many times per second. The same issue
>> >>> is identified here, so I'm interested to hear whether the direct
>> >>> buffer approach makes this a win.
>> >>>
>> >>> On Thu, Apr 18, 2013 at 9:51 PM, Dmitriy Lyubimov <dlie...@gmail.com>
>> >>> wrote:
>> >>>> I looked at jblas some time ago, a year or two back.
>> >>>>
>> >>>> It's a fast bridge to LAPACK, and LAPACK is hard to beat. But I
>> >>>> think I convinced myself that it lacks support for sparse matrices.
>> >>>> It should still work nicely for many blockified algorithms, such as
>> >>>> ALS-WR, which try to avoid doing BLAS level 3 operations on sparse
>> >>>> data.
>> >>>>
>> >>>>
>> >>>> On Thu, Apr 18, 2013 at 1:45 PM, Robin Anil <robin.a...@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> BTW, did this include the changes I made in trunk recently? I
>> >>>>> would also like to profile that code and see if we can squeeze
>> >>>>> more out of our Vectors and Matrices. Could you point me to how I
>> >>>>> can run the 1M example?
>> >>>>>
>> >>>>> Robin
>> >>>>>
>> >>>>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>> >>>>>
>> >>>>>
>> >>>>> On Thu, Apr 18, 2013 at 3:43 PM, Robin Anil <robin.a...@gmail.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> I was just emailing something similar on Mahout (see my email). I
>> >>>>>> saw the TU Berlin name and thought you would do something about it
>> >>>>>> :) This is excellent. Investigating this is maybe one of the
>> >>>>>> next-generation pieces of work on Vectors.
>> >>>>>>
>> >>>>>>
>> >>>>>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>> >>>>>>
>> >>>>>>
>> >>>>>> On Thu, Apr 18, 2013 at 3:37 PM, Sebastian Schelter
>> >>>>>> <s...@apache.org> wrote:
>> >>>>>>
>> >>>>>>> Hi there,
>> >>>>>>>
>> >>>>>>> with regard to Robin mentioning JBlas [1] recently when we
>> >>>>>>> talked about the performance of our vector operations: I ported
>> >>>>>>> the solving code for ALS to JBlas today and got some awesome
>> >>>>>>> results.
>> >>>>>>>
>> >>>>>>> For the movielens 1M dataset and a factorization of rank 100,
>> >>>>>>> the runtime per iteration dropped from 50 seconds to less than 7
>> >>>>>>> seconds. I will run some tests with the distributed version and
>> >>>>>>> larger datasets in the coming days, but from what I've seen, we
>> >>>>>>> should really take a closer look at JBlas, at least for
>> >>>>>>> operations on dense matrices.
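>> >>>>>>>
>> >>>>>>> The port itself is mostly a boundary conversion; something like
>> >>>>>>> this sketch (assuming Mahout's Matrix accessors rowSize(),
>> >>>>>>> columnSize() and getQuick()):
>> >>>>>>>
>> >>>>>>>   // Copy a dense Mahout matrix into jblas's layout, then let
>> >>>>>>>   // LAPACK do the solve on the jblas side.
>> >>>>>>>   DoubleMatrix toJBlas(org.apache.mahout.math.Matrix m) {
>> >>>>>>>     DoubleMatrix out =
>> >>>>>>>         new DoubleMatrix(m.rowSize(), m.columnSize());
>> >>>>>>>     for (int i = 0; i < m.rowSize(); i++)
>> >>>>>>>       for (int j = 0; j < m.columnSize(); j++)
>> >>>>>>>         out.put(i, j, m.getQuick(i, j));
>> >>>>>>>     return out;
>> >>>>>>>   }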
>> >>>>>>>
>> >>>>>>> Best,
>> >>>>>>> Sebastian
>> >>>>>>>
>> >>>>>>> [1] http://mikiobraun.github.io/jblas/
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>
>>
>
