[Scikit-learn-general] Possible bug in the prediction of linear classifier using Sparse Matrices

Anitha Gollamudi Fri, 21 Mar 2014 16:04:23 -0700

Hi,

I am using sparse matrices to train a logistic regression estimator
wrapped in OneVsRestClassifier. The feature set is quite large
(~1.6 million features).


When the classifier has to predict, it raises an exception saying that the
number of features in the "test data" and the "train data" are not equal.

I fail to understand why it expects the number of features to be
equal when the data is in a sparse matrix representation. For instance,
here is a snippet of my rudimentary code:

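>>>
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# X_train, y_train and X_test are built elsewhere (sparse matrices)
classifier = OneVsRestClassifier(LogisticRegression())
classifier = classifier.fit(X_train, y_train)

predicted = classifier.predict(X_test)
<<<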

Here the shapes of X_train and X_test are obviously different.

>>> print X_train.shape
(11, 1617899)
>>> print X_test.shape
(3, 83715)
>>>

So an exception is raised:

ValueError: X has 83715 features per sample; expecting 1617899
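
The same failure can be reproduced at a small scale. This is only a sketch, assuming scikit-learn and SciPy are installed; the random data, the column counts (20 and 5) and the labels are made up for illustration and are not the data from this post. The exact wording of the message may differ between scikit-learn versions.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.RandomState(0)

# Hypothetical stand-ins for the real data: 11 training rows with 20 columns,
# 3 test rows with only 5 columns.
X_train = sparse.csr_matrix(rng.rand(11, 20))
y_train = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1])
X_test = sparse.csr_matrix(rng.rand(3, 5))

classifier = OneVsRestClassifier(LogisticRegression()).fit(X_train, y_train)

try:
    classifier.predict(X_test)
    raised = False
except ValueError as exc:
    # The column counts differ, so prediction is refused.
    raised = True
    print(exc)
```

The sparse format only changes how the entries are stored; the declared shape of the matrix, including its number of columns, is still what gets compared.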


(A little probing of the source code tells me that linear_model/base.py
does this comparison in decision_function().)
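
A rough sketch of why such a comparison is needed, written with NumPy and SciPy only. It is not the scikit-learn source: `decision_scores`, `coef` and `intercept` are hypothetical names used for illustration. A linear model keeps one weight per training feature, so the scores are a product of the input with the weight matrix, which is only defined when the column counts agree. The last part assumes that column j means the same feature in both matrices and that the narrower one merely lacks trailing all-zero columns; if the two matrices were built from separately fitted vocabularies, that assumption does not hold and widening would give meaningless results.

```python
import numpy as np
from scipy import sparse

n_features = 20
rng = np.random.RandomState(0)
coef = rng.rand(3, n_features)   # hypothetical weights: one row per class
intercept = np.zeros(3)


def decision_scores(X, coef, intercept):
    """Illustrative linear scoring with a feature-count check."""
    if X.shape[1] != coef.shape[1]:
        raise ValueError("X has %d features per sample; expecting %d"
                         % (X.shape[1], coef.shape[1]))
    return np.asarray(X @ coef.T) + intercept


X_narrow = sparse.csr_matrix(rng.rand(3, 5))

try:
    decision_scores(X_narrow, coef, intercept)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Re-declare the same stored entries with the training width.
# Valid only under the shared-column-meaning assumption stated above.
X_wide = sparse.csr_matrix(
    (X_narrow.data, X_narrow.indices, X_narrow.indptr),
    shape=(X_narrow.shape[0], n_features),
)
scores = decision_scores(X_wide, coef, intercept)
```

Widening leaves the stored values untouched and only changes the declared shape, so the missing columns are treated as zeros.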

Is this expected behaviour?

-Anitha

