I understand that LinearSVC is implemented using liblinear, which I thought
should work well with large datasets. However, when I pass LinearSVC.fit a
design matrix of size 40,000 x 14,400 (in float32 format, so 2.3 gigabytes)
it ends up using at least 8 additional gigabytes of RAM!
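For concreteness, here's the back-of-the-envelope arithmetic behind that 2.3 GB figure (just plain Python, no assumptions beyond float32 being 4 bytes per value):

```python
# Size of the design matrix: 40,000 rows x 14,400 columns,
# at 4 bytes per float32 value.
rows, cols = 40_000, 14_400
bytes_per_value = 4  # float32
total_bytes = rows * cols * bytes_per_value
print(total_bytes / 1e9)  # ~2.3 GB
```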
I know that the numpy array passed to scikits.learn needs to be C-contiguous
to avoid an internal copy. I've checked that mine is, so that's not the
issue.
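For reference, this is roughly how I checked (X here is a small placeholder array, not my real data; the same flags apply either way):

```python
import numpy as np

# Small stand-in for the real design matrix, with the same dtype.
X = np.zeros((4, 3), dtype=np.float32)

# Freshly allocated numpy arrays are C-contiguous; a transposed view,
# by contrast, is not, and would trigger an internal copy.
print(X.flags['C_CONTIGUOUS'])    # True
print(X.T.flags['C_CONTIGUOUS'])  # False
print(X.dtype)                    # float32
```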
Is it normal for LinearSVC.fit to use so much memory? And if so, is this due
to some intrinsic requirement of the algorithm, the implementation of
liblinear, or the implementation of LinearSVC?
I'm using scikits.learn version 0.4, installed via apt-get on Ubuntu 11.04,
if that's relevant.

Thanks in advance,
Ian
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general