Re: [Scikit-learn-general] Possible bug in the prediction of linear classifier using Sparse Matrices

Lars Buitinck Sat, 22 Mar 2014 03:19:25 -0700

2014-03-22 0:04 GMT+01:00 Anitha Gollamudi <[email protected]>:
> Here the shape of X_train and X_test are obviously different.
>
>>>> print X_train.shape
> (11, 1617899)
>>>> print X_test.shape
> (3, 83715)
>>>>
>
> So an exception is raised:
>
> ValueError: X has 83715 features per sample; expecting 1617899
>
> Is this an expected behaviour?


Yes, or we wouldn't do this explicit check. The number of columns in X
should *always* be equal to the number at training time and the same
columns should be used to indicate the same features. The vectorizers
in sklearn.feature_extraction enforce this, so that the models
themselves can be kept agnostic of the meaning of the columns.
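To illustrate, here is a minimal sketch of the usual pattern. It assumes the features come from a text vectorizer such as CountVectorizer, since the original post does not say how they were extracted. The toy corpus and labels are hypothetical, and SGDClassifier just stands in for whichever linear classifier was used. The vectorizer is fit once on the training data, and only transform() is called on the test data. Refitting a fresh vectorizer on the test set builds a different vocabulary and reproduces the kind of column-count mismatch reported above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical toy corpus standing in for the real training and test documents.
train_docs = [
    "the cat sat on the mat",
    "dogs chase cats",
    "the dog barked loudly",
    "cats purr when happy",
]
train_labels = [0, 1, 1, 0]
test_docs = ["a cat and a dog", "the mat is red"]

vectorizer = CountVectorizer()
# fit_transform learns the vocabulary (and thus the column order) from training data only.
X_train = vectorizer.fit_transform(train_docs)
clf = SGDClassifier(random_state=0).fit(X_train, train_labels)

# Correct: transform reuses the training vocabulary, so the column count matches.
X_test = vectorizer.transform(test_docs)
predictions = clf.predict(X_test)

# Incorrect: a new vectorizer fit on the test set has its own, smaller vocabulary.
X_test_wrong = CountVectorizer().fit_transform(test_docs)
print("train:", X_train.shape, "refit test:", X_test_wrong.shape)
try:
    clf.predict(X_test_wrong)
except ValueError as exc:
    # The linear model refuses input whose number of columns differs from training.
    print("ValueError:", exc)
```

The exact wording of the ValueError depends on the scikit-learn version. If keeping a fitted vocabulary around is inconvenient, HashingVectorizer is stateless and always produces n_features columns, so train and test sets hashed with the same settings have matching widths.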

Question: how did you do the feature extraction?

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
