Yes. SVD.
http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-EC167C45-1A4E-4D3C-8652-9B48C788CDF0.htm

On Thu, Apr 18, 2013 at 2:07 PM, Sean Owen <sro...@gmail.com> wrote:
> Good lead -- from
> https://github.com/mikiobraun/jblas/blob/master/src/main/java/org/jblas/Solve.java
> it looks like it's an SVD. It definitely took a search to figure out what
> 'gelsd' does in LAPACK! I'll see if I can test-drive this too to see
> if it bumps performance. That would be great -- JNI is a much smaller
> requirement than a GPU!
>
> On Thu, Apr 18, 2013 at 10:01 PM, Sebastian Schelter <s...@apache.org> wrote:
> > Hi Sean,
> >
> > I simply used the Solve.solve() method; I guess it uses a QR
> > decomposition internally. I can provide a copy of the code if you want
> > to have a look.
> >
> > Best,
> > Sebastian
> >
> > On 18.04.2013 22:56, Sean Owen wrote:
> >> I'm always interested in optimizing the bit where you solve Ax=B, which
> >> I so recently went on about. That's a dense-matrix problem. Is there a
> >> QR decomposition available?
> >>
> >> I tried getting this part to run on a GPU, and it worked, but wasn't
> >> faster. Somehow it was still slower to push the smallish dense matrix
> >> onto the card so many times per second. The same issue is identified here,
> >> so I'm interested to hear whether this is a win using the direct-buffer
> >> approach.
> >>
> >> On Thu, Apr 18, 2013 at 9:51 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >>> I've looked at jblas some time ago, a year or two back.
> >>>
> >>> It's a fast bridge to LAPACK, and LAPACK is very hard to beat. But I
> >>> think I convinced myself that it lacks support for sparse matrices. It
> >>> will still work nicely for many blockified algorithms, such as ALS-WR,
> >>> which try to avoid doing BLAS level-3 operations on sparse data.
> >>>
> >>> On Thu, Apr 18, 2013 at 1:45 PM, Robin Anil <robin.a...@gmail.com> wrote:
> >>>
> >>>> BTW, did this include the changes I made in trunk recently? I would also
> >>>> like to profile that code and see if we can squeeze more performance out
> >>>> of our Vectors and Matrices. Could you point me to how I can run the 1M
> >>>> example?
> >>>>
> >>>> Robin
> >>>>
> >>>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
> >>>>
> >>>> On Thu, Apr 18, 2013 at 3:43 PM, Robin Anil <robin.a...@gmail.com> wrote:
> >>>>
> >>>>> I was just emailing something similar on Mahout (see my email). I saw
> >>>>> the TU Berlin name and I thought you would do something about it :)
> >>>>> This is excellent. Investigating this is maybe one of the next-gen
> >>>>> pieces of work on our Vectors.
> >>>>>
> >>>>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
> >>>>>
> >>>>> On Thu, Apr 18, 2013 at 3:37 PM, Sebastian Schelter <s...@apache.org> wrote:
> >>>>>
> >>>>>> Hi there,
> >>>>>>
> >>>>>> With regard to Robin mentioning JBlas [1] recently when we talked about
> >>>>>> the performance of our vector operations, I ported the solving code for
> >>>>>> ALS to JBlas today and got some awesome results.
> >>>>>>
> >>>>>> For the MovieLens 1M dataset and a factorization of rank 100, the
> >>>>>> runtime per iteration dropped from 50 seconds to less than 7 seconds. I
> >>>>>> will run some tests with the distributed version and larger datasets in
> >>>>>> the next few days, but from what I've seen we should really take a
> >>>>>> closer look at JBlas, at least for operations on dense matrices.
> >>>>>>
> >>>>>> Best,
> >>>>>> Sebastian
> >>>>>>
> >>>>>> [1] http://mikiobraun.github.io/jblas/
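
For anyone who wants to try this themselves, here is a minimal sketch of what the jblas-based ALS solve discussed above might look like. This is not Sebastian's actual ported code; the rank, regularization constant, and random input data are made up purely for illustration. It solves the ALS-WR normal equations (M^T M + lambda * n * I) x = M^T r for a single user:

    import org.jblas.DoubleMatrix;
    import org.jblas.Solve;

    public class AlsSolveSketch {
        public static void main(String[] args) {
            int rank = 100;         // factorization rank, as in the thread
            double lambda = 0.065;  // hypothetical regularization constant

            // M: item-feature matrix restricted to the items this user rated
            // (random data here, just to exercise the solver)
            int numRatedItems = 30;
            DoubleMatrix M = DoubleMatrix.rand(numRatedItems, rank);
            DoubleMatrix ratings = DoubleMatrix.rand(numRatedItems, 1);

            // Normal equations of ALS-WR: (M^T M + lambda * n * I) x = M^T r
            DoubleMatrix A = M.transpose().mmul(M)
                    .add(DoubleMatrix.eye(rank).mul(lambda * numRatedItems));
            DoubleMatrix b = M.transpose().mmul(ratings);

            // Solve.solve() dispatches to LAPACK's gesv (an LU solve). Since A
            // is symmetric positive definite here, Solve.solvePositive()
            // (posv, a Cholesky solve) should also work; the SVD-based gelsd
            // routine Sean found sits behind the least-squares solver.
            DoubleMatrix x = Solve.solve(A, b);

            System.out.println("user feature vector: "
                    + x.rows + " x " + x.columns);
        }
    }

Note that the whole system stays dense and small (rank x rank), which matches Dmitriy's point: even without sparse support, jblas fits blockified algorithms like ALS-WR that avoid BLAS level-3 operations on sparse data.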