Hi Andy, Lars. I was picking up a colleague's code here; I think he used
pairwise_kernels just because it was handy. I agree that if I just compute
X*X.T I get the sparse result that I'm after. It was more that I was
confused that various methods in that call stack made a point of preserving
sparsity until the final step, which deliberately took it away.
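
For reference, here is a minimal sketch of the behaviour I mean. It assumes a
small random CSR matrix as a stand-in for my real data (the shape and density
are made up for illustration) and calls safe_sparse_dot directly rather than
going through pairwise_kernels:

```python
import numpy as np
from scipy import sparse
from sklearn.utils.extmath import safe_sparse_dot

# Hypothetical stand-in data: 1000 x 500 CSR matrix, about 1% non-zero.
X = sparse.random(1000, 500, density=0.01, format="csr", random_state=0)

# Sparse inputs, dense_output=False: the product stays a scipy sparse matrix.
K_sparse = safe_sparse_dot(X, X.T, dense_output=False)

# Sparse inputs, dense_output=True: the same product comes back as a
# NumPy array, so all 1000 x 1000 entries are materialised in memory.
K_dense = safe_sparse_dot(X, X.T, dense_output=True)

print(sparse.issparse(K_sparse), type(K_dense).__name__)
print("stored entries:", K_sparse.nnz, "vs dense entries:", K_dense.size)
```

The values are the same either way; only the storage differs, which is where
the RAM goes on a large input.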

Anyhow, I've now sparsified this particular routine so I'm back in the
happy state that 16GB is not a bottleneck (at all).
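
Roughly what the sparse route looks like, as a sketch only and not the actual
routine: it follows Lars's suggestion of multiplying the row-normalised
matrices, and the toy matrix and variable names are assumptions for
illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import normalize

# Hypothetical stand-in data: 200 x 300 CSR matrix, about 2% non-zero.
X = sparse.random(200, 300, density=0.02, format="csr", random_state=0)

# Scale each row to unit L2 norm (all-zero rows are left as zeros).
X_normalized = normalize(X, norm="l2", axis=1)

# Cosine similarity as a sparse product; rows with no shared columns
# simply have no stored entry.
S = X_normalized @ X_normalized.T

# Cosine distance is 1 - similarity, so every missing (zero) similarity
# becomes a 1 and the result has to be dense.
D = 1.0 - S.toarray()

print("similarity stored entries:", S.nnz)
print("distance non-zero entries:", np.count_nonzero(D))
```

This is also why sticking with the similarity rather than the distance
matters for memory: the distance matrix is non-zero almost everywhere.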

Cheers, i.

On 27 November 2014 at 16:37, Lars Buitinck <[email protected]> wrote:

> 2014-11-27 17:26 GMT+01:00 Ian Ozsvald <[email protected]>:
> > If safe_sparse_dot is called with dense_output=False then I get a sparse
> > result and everything looks sensible with low RAM usage.
> >
> > I'm using 0.15, the current github shows the line:
> >
> > https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/pairwise.py#L692
> >
> > Was there a design decision to force dense matrices at this point? Maybe
> > some call paths assume a dense result?
>
> Practically all the consumers of kernels/distances expect to get dense
> outputs. If you just want pairwise cosine similarities and you expect
> lots of zeros, try X_normalized * Y_normalized.T.
>
> But note that pairwise_distances('cosine') computes the "cosine
> distance", which is 1 - cosine similarity, so if the result of the
> matrix multiplication is sparse, the "distance" result is guaranteed
> not to be sparse.
>
>
> ------------------------------------------------------------------------------
> Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
> from Actuate! Instantly Supercharge Your Business Reports and Dashboards
> with Interactivity, Sharing, Native Excel Exports, App Integration & more
> Get technology previously reserved for billion-dollar corporations, FREE
>
> http://pubads.g.doubleclick.net/gampad/clk?id=157005751&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>



-- 
Ian Ozsvald (A.I. researcher)
[email protected]

http://IanOzsvald.com
http://ModelInsight.io
http://MorConsulting.com
http://Annotate.IO
http://SocialTiesApp.com
http://TheScreencastingHandbook.com
http://FivePoundApp.com
http://twitter.com/IanOzsvald
http://ShowMeDo.com