Ahh, sorry >_<. I thought I had started a new thread... sigh.

On 16 November 2012 15:33, Fred Mailhot <[email protected]> wrote:

> Check out SGDClassifier and partial_fit()...I've used these to good effect.
>
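
For anyone finding this in the archive: the SGDClassifier / partial_fit() suggestion above could look roughly like the sketch below. It is only a sketch under assumptions. The iter_batches() generator is a hypothetical stand-in for whatever reads chunks of the real data from disk, and the synthetic data, batch size and number of batches are made up for illustration. The one firm requirement is that partial_fit() is told the full set of class labels on the first call, since no single batch is guaranteed to contain all of them.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def iter_batches(n_batches=20, batch_size=500, n_features=10, seed=0):
    """Hypothetical stand-in for streaming chunks of a large file from disk."""
    # Fixed "true" weights so that every call labels data the same way.
    true_w = np.random.RandomState(42).randn(n_features)
    rng = np.random.RandomState(seed)
    for _ in range(n_batches):
        X = rng.randn(batch_size, n_features)
        y = (X @ true_w > 0).astype(int)
        yield X, y


clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # all labels must be declared up front

# Only one batch is held in memory at a time.
for X_batch, y_batch in iter_batches():
    clf.partial_fit(X_batch, y_batch, classes=classes)

# Evaluate on a batch the model has not seen.
X_test, y_test = next(iter_batches(n_batches=1, seed=1))
print("held-out accuracy:", clf.score(X_test, y_test))
```

With real data the generator would read and vectorize one chunk per iteration; everything else stays the same.
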
> Also, PROTIP: if you want decent help, don't piggy-back on threads that
> have nothing to do with your question.
>
> Just sayin'.
>
>
>
> On 16 November 2012 12:23, Ronnie Ghose <[email protected]> wrote:
>
>> Any ideas for online learning with scikit-learn? I have a data set larger
>> than 20 GB that I want to train on. I don't think I can do that easily, so
>> what should I do?
>>
>> Thanks,
>> Shomiron Ghose
>>
>>
>> On 15 November 2012 15:45, Fred Mailhot <[email protected]> wrote:
>>
>>> Dear list,
>>>
>>> I'm using GridSearchCV to do some simple model selection for a text
>>> classification task. I've got it working (see below for caveat), but I'm
>>> not convinced that I'm making the best use of this tool. If someone has the
>>> time/inclination, I'd love a set of eyes to check the following gist to see
>>> if I'm doing this correctly:
>>>
>>> https://gist.github.com/e2ca1910450819a8a28
>>>
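
Since the gist is not reproduced in this thread, here is a minimal sketch of the kind of setup described above: a grid search over a text-classification pipeline. It is an illustration only, not the code from the gist. The toy corpus, the choice of TfidfVectorizer plus SGDClassifier, and the parameter grid are all assumptions. The import uses sklearn.model_selection, where GridSearchCV lives in current releases; in the 0.13-era versions discussed here it was in sklearn.grid_search. n_jobs is left at 1, and the sketch makes no attempt to diagnose the multiprocessing error reported above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy corpus (made up for illustration): 1 = about sport, 0 = about cooking.
docs = [
    "the team won the football match",
    "a late goal decided the game",
    "the striker scored twice in the match",
    "fans cheered as the team scored",
    "simmer the sauce and add fresh basil",
    "bake the bread until golden",
    "chop the onions and fry them in butter",
    "season the soup with salt and pepper",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", SGDClassifier(random_state=0)),
])

# Keys are "<step name>__<parameter name>" for the pipeline steps above.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__alpha": [1e-4, 1e-3],
}

search = GridSearchCV(pipeline, param_grid, cv=2, n_jobs=1)
search.fit(docs, labels)
print(search.best_params_)
```

Searching over the whole pipeline, rather than vectorizing once up front, means the vectorizer is refit on each training fold, so its parameters can be tuned alongside the classifier's.
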
>>> Also, for some reason this is throwing errors when I set n_jobs to
>>> anything other than 1. I'm on OS X 10.7.4, using sklearn 0.13. The
>>> traceback looks like:
>>>
>>> Process PoolWorker-1:
>>> Traceback (most recent call last):
>>>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 232, in _bootstrap
>>>     self.run()
>>>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 88, in run
>>>     self._target(*self._args, **self._kwargs)
>>>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 59, in worker
>>>     task = get()
>>>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 352, in get
>>>     return recv()
>>> TypeError: ('data type not understood', <type 'numpy.dtype'>, ('S0', 0, 1))
>>> Process PoolWorker-2:
>>> [...etc etc ad infinitum]
>>>
>>> Has anyone else come across this, or perhaps have any insight into
>>> what's going on? Needless to say, this grid search is taking FOREVER (ca.
>>> 10hrs thus far, and only about halfway through), and I'd love to be able to
>>> parallelize it.
>>>
>>> Many thanks,
>>> Fred.
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Monitor your physical, virtual and cloud infrastructure from a single
>>> web console. Get in-depth insight into apps, servers, databases, vmware,
>>> SAP, cloud infrastructure, etc. Download 30-day Free Trial.
>>> Pricing starts from $795 for 25 servers or applications!
>>> http://p.sf.net/sfu/zoho_dev2dev_nov
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
