I already know that things work with n_jobs=1. I just tried n_jobs=-1 with
a few smaller datasets (100 and 1000 items) and everything seems to have
worked fine (without LinearSVC, see below). Possibly there's something wrong
with the larger dataset... investigating now.
A couple of points related to grid search:
1) There are a few LinearSVC options (penalty/loss, penalty/dual) for which
certain combinations of values are incompatible, but they are not documented
as such, which makes grid search a bit of a pain. Note, however, that the
errors thrown by this aren't the same as the ones I was getting previously.
2) How would I go about grid searching over different vectorizers (e.g.
CountVectorizer(analyzer="word"), CountVectorizer(analyzer="char_wb"), and
a FeatureUnion of the two)?
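
For the incompatible LinearSVC options in point 1, one possible workaround,
sketched here under some assumptions: GridSearchCV's param_grid can be given
as a list of dicts, and each dict is expanded on its own, so invalid
penalty/loss/dual combinations are never generated. The snippet below builds
such a list and expands it with plain itertools (no scikit-learn import),
just to show which settings would be tried. The loss names follow recent
scikit-learn releases ("hinge"/"squared_hinge"); older releases, presumably
including 0.13, spell them "l1"/"l2". The C values and the expand() helper
are illustrative only.

```python
from itertools import product

# Each dict is expanded separately, so values are never combined across
# dicts. Loss names are an assumption based on recent scikit-learn
# releases; older releases are assumed to use "l1"/"l2" instead.
C_VALUES = [0.1, 1, 10]  # illustrative values only

param_grid = [
    # L1 penalty: squared hinge loss, primal solver only.
    {"penalty": ["l1"], "loss": ["squared_hinge"], "dual": [False], "C": C_VALUES},
    # L2 penalty with hinge loss: dual solver only.
    {"penalty": ["l2"], "loss": ["hinge"], "dual": [True], "C": C_VALUES},
    # L2 penalty with squared hinge loss: either solver.
    {"penalty": ["l2"], "loss": ["squared_hinge"], "dual": [True, False], "C": C_VALUES},
]


def expand(grid):
    """Yield one parameter dict per combination, one sub-grid at a time."""
    for sub_grid in grid:
        keys = sorted(sub_grid)
        for values in product(*(sub_grid[key] for key in keys)):
            yield dict(zip(keys, values))


combos = list(expand(param_grid))
for params in combos:
    print(params)
```

The same param_grid list could then be passed unchanged as the second
argument to GridSearchCV, e.g. GridSearchCV(LinearSVC(), param_grid).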
Thanks!
Fred.
On 15 November 2012 14:18, Andreas Mueller <[email protected]> wrote:
> Are you sure the error is related to n_jobs, not a specific classifier?
> Could you run with n_jobs=1 and a very small training set (like 100
> examples or something)
> and see if it runs through?
> (Actually I'm totally clueless but that doesn't look like a
> multiprocessing error to me)
>
>
>
> On 11/15/2012 10:06 PM, Fred Mailhot wrote:
>
> Argh, copy-paste error:
>
> https://gist.github.com/e2ca1910450819a8a287
>
> As for Accelerate, I'm not 100% sure how to check that (I cloned & ran
> "setup.py build" and "setup.py install" without making any changes, if
> memory serves), but this leads me to think "yes":
>
> $ otool -L /Users/aboutuser/Development/Personal/scikit-learn/build/lib.macosx-10.7-intel-2.7/sklearn/svm/liblinear.so
> /Users/aboutuser/Development/Personal/scikit-learn/build/lib.macosx-10.7-intel-2.7/sklearn/svm/liblinear.so:
>     /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib (compatibility version 1.0.0, current version 1.0.0)
>     /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)
>     /usr/lib/libstdc++.6.dylib (compatibility version 7.0.0, current version 52.0.0)
>     /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 159.1.0)
>
> Thanks,
> Fred.
>
>
>
> On 15 November 2012 12:56, Andreas Mueller <[email protected]> wrote:
>
>> Hi Fred.
>> The link is dead for me.
>> Do you link against Accelerate (not sure if this is relevant)?
>>
>> Cheers,
>> Andy
>>
>>
>> On 11/15/2012 08:45 PM, Fred Mailhot wrote:
>>
>> Dear list,
>>
>> I'm using GridSearchCV to do some simple model selection for a text
>> classification task. I've got it working (see below for caveat), but I'm
>> not convinced that I'm making the best use of this tool. If someone has the
>> time/inclination, I'd love a set of eyes to check the following gist to see
>> if I'm doing this correctly:
>>
>> https://gist.github.com/e2ca1910450819a8a28
>>
>> Also, for some reason this is throwing errors when I set n_jobs to
>> anything other than 1. I'm on OS X 10.7.4, using sklearn 0.13. The
>> traceback looks like:
>>
>> Process PoolWorker-1:
>> Traceback (most recent call last):
>>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 232, in _bootstrap
>>     self.run()
>>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 88, in run
>>     self._target(*self._args, **self._kwargs)
>>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 59, in worker
>>     task = get()
>>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 352, in get
>>     return recv()
>> TypeError: ('data type not understood', <type 'numpy.dtype'>, ('S0', 0, 1))
>> Process PoolWorker-2:
>> [...etc etc ad infinitum]
>>
>> Has anyone else come across this, or perhaps have any insight into
>> what's going on? Needless to say, this grid search is taking FOREVER (ca.
>> 10hrs thus far, and only about halfway through), and I'd love to be able to
>> parallelize it.
>>
>> Many thanks,
>> Fred.
>
------------------------------------------------------------------------------
Monitor your physical, virtual and cloud infrastructure from a single
web console. Get in-depth insight into apps, servers, databases, vmware,
SAP, cloud infrastructure, etc. Download 30-day Free Trial.
Pricing starts from $795 for 25 servers or applications!
http://p.sf.net/sfu/zoho_dev2dev_nov
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general