Any ideas for online learning with scikit-learn? I have a data set larger
than 20 GB that I want to train on. I don't think I can do that easily, so
what should I do?

Thanks,
Shomiron Ghose
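
For reference, one common way to approach the question above is incremental
(out-of-core) training: feed the data to an estimator in chunks through
partial_fit instead of loading everything at once. The following is only a
minimal sketch under stated assumptions, not a recommendation specific to this
data set. The iter_batches helper and its synthetic data are hypothetical
stand-ins for reading a large file chunk by chunk, and SGDClassifier is just
one of the scikit-learn estimators that expose partial_fit.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

N_FEATURES = 10
# Hypothetical "true" weights, used only to generate synthetic labels.
TRUE_W = np.random.RandomState(42).randn(N_FEATURES)


def iter_batches(n_batches, batch_size=500, seed=0):
    """Hypothetical stand-in for streaming chunks of a large file from disk."""
    rng = np.random.RandomState(seed)
    for _ in range(n_batches):
        X = rng.randn(batch_size, N_FEATURES)
        y = (np.dot(X, TRUE_W) > 0).astype(int)
        yield X, y


clf = SGDClassifier(random_state=0)
# partial_fit needs the full set of class labels on the first call,
# because no single batch is guaranteed to contain every class.
classes = np.array([0, 1])

for X_batch, y_batch in iter_batches(n_batches=20):
    # Only one batch is held in memory at a time.
    clf.partial_fit(X_batch, y_batch, classes=classes)

# Evaluate on a batch generated with a different seed (held-out data).
X_test, y_test = next(iter_batches(n_batches=1, seed=123))
accuracy = clf.score(X_test, y_test)
```

With real data, the generator would instead read successive chunks from disk,
and any feature extraction would also need to work without seeing the whole
data set.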


On 15 November 2012 15:45, Fred Mailhot <[email protected]> wrote:

> Dear list,
>
> I'm using GridSearchCV to do some simple model selection for a text
> classification task. I've got it working (see below for caveat), but I'm
> not convinced that I'm making the best use of this tool. If someone has the
> time/inclination, I'd love a set of eyes to check the following gist to see
> if I'm doing this correctly:
>
> https://gist.github.com/e2ca1910450819a8a28
>
> Also, for some reason this is throwing errors when I set n_jobs to
> anything other than 1. I'm on OS X 10.7.4, using sklearn 0.13. The
> traceback looks like:
>
> Process PoolWorker-1:
> Traceback (most recent call last):
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 232, in _bootstrap
>     self.run()
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 88, in run
>     self._target(*self._args, **self._kwargs)
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 59, in worker
>     task = get()
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 352, in get
>     return recv()
> TypeError: ('data type not understood', <type 'numpy.dtype'>, ('S0', 0, 1))
> Process PoolWorker-2:
> [...etc etc ad infinitum]
>
> Has anyone else come across this, or perhaps have any insight into what's
> going on? Needless to say, this grid search is taking FOREVER (ca. 10hrs
> thus far, and only about halfway through), and I'd love to be able to
> parallelize it.
>
> Many thanks,
> Fred.
>
>
>
> ------------------------------------------------------------------------------
> Monitor your physical, virtual and cloud infrastructure from a single
> web console. Get in-depth insight into apps, servers, databases, vmware,
> SAP, cloud infrastructure, etc. Download 30-day Free Trial.
> Pricing starts from $795 for 25 servers or applications!
> http://p.sf.net/sfu/zoho_dev2dev_nov
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>