Re: [Scikit-learn-general] Multilabel and differences between 0.14 and Master

Arnaud Joly Tue, 10 Jun 2014 23:43:26 -0700

Hi,

Could you provide a minimal example, with some data, that reproduces this behavior?
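
Something along the following lines would be enough. It is only a sketch under stated assumptions: the label tuples, the random `X`, and all shapes are made-up toy data, not your actual setup. It shows that the output of MultiLabelBinarizer is a 2-D indicator matrix, and that a mask built from a 2-D `y` cannot index a 1-D per-sample array, which looks like what the `test_folds[y == label]` line in your traceback runs into.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# Toy multilabel targets: each sample carries a set of labels (made-up data).
y_raw = [("a", "b"), ("b",), ("a", "c"), ("c",), ("a",), ("b", "c")]

# Random features, one row per sample (made-up data).
rng = np.random.RandomState(0)
X = rng.rand(len(y_raw), 4)

# Binarize into a label indicator matrix of shape (n_samples, n_labels).
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_raw)
print("classes:", list(mlb.classes_))
print("X shape:", X.shape, "Y shape:", Y.shape)

# Comparing a 2-D target with a single label value gives a 2-D boolean mask.
mask = (Y == 1)
print("mask ndim:", mask.ndim)

# A 1-D array with one entry per sample cannot be indexed by that 2-D mask.
# The exception type and message depend on the NumPy version.
per_sample = np.zeros(len(y_raw), dtype=int)
indexing_failed = False
try:
    per_sample[mask]
except (IndexError, ValueError) as exc:
    indexing_failed = True
    print("indexing failed:", type(exc).__name__)
```

If your script differs from this in how `y` is built or passed to the cross-validation object, that difference is the part worth posting.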


Best regards,
Arnaud


On 10 Jun 2014, at 16:53, Miguel Fernando Cabrera <[email protected]> wrote:

> Hi Everyone,
> 
> This is my first post to the list. I have been using scikit-learn actively 
> for the last six months in my M.Sc. thesis, and now at my new job I want to 
> use it for some tasks. I hope I can eventually become a collaborator on the 
> project.
> 
> But let's start with a question :) - I wasn't sure if I should use 
> StackOverflow for this. Please let me know if so.
> 
> I am using scikit-learn for some multilabel classification. I was 
> trying to use both 0.14 and master. However, when using master I get an 
> error, even when using MultiLabelBinarizer.
> 
> So here's the code working in 0.14.
> 
> # I instantiate the label binarizer to get the possible labels
> lb = LabelBinarizer().fit()
> 
> # then I transform the existing values (list of possible labels)
> y_train = lb.transform(y_val)
> 
> 
> svm = LinearSVC()
> 
> ovr_svm = OneVsRestClassifier(svm)
> 
> C_range = 2.0 ** np.arange(-2, 7)
> 
> param_grid = dict(estimator__C=C_range)
> 
> grid = GridSearchCV(estimator=ovr_svm,
>                     param_grid=param_grid,
>                     n_jobs=1,
>                     scoring='f1',
>                     cv=StratifiedKFold(y=y_train, n_folds=3),
>                     verbose=2)
> 
> grid.fit(X_train, y_train)
> 
> This works OK. However, when switching to 0.15 and using MultiLabelBinarizer, 
> I get the following error:
> 
> 
> 
> /Users/miguel/anaconda/envs/hclassifier/lib/python2.7/site-packages/sklearn/cross_validation.pyc in __init__(self, y, n_folds, indices, shuffle, random_state)
>     427         for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
>     428             for label, (_, test_split) in zip(unique_labels, per_label_splits):
> --> 429                 label_test_folds = test_folds[y == label]
>     430                 # the test split can be too big because we used
>     431                 # KFold(max(c, self.n_folds), self.n_folds) instead of
> 
> ValueError: boolean index array should have 1 dimension
> 
> 
> I have not been following the development of 0.15, but based on the latest 
> e-mails there were some changes to the multilabel representation. Maybe it is 
> related? What should I change to make my code work with 0.15?
> 
> 
> Thanks in advance,
> 
> 
> Cheers
> -- 
> Miguel Cabrera 
> http://mfcabrera.com
> "A los hombres fuertes les pasa lo que a los barriletes; se elevan cuando es
> mayor el viento que se opone a su ascenso." - José Ingenieros
> ------------------------------------------------------------------------------
> HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
> Find What Matters Most in Your Big Data with HPCC Systems
> Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
> Leverages Graph Analysis for Fast Processing & Easy Data Exploration
> http://p.sf.net/sfu/hpccsystems
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
