Hello,
I am having difficulty with a cross-validation problem, and any help would be
much appreciated.
I have a large number of research subjects from 15 different data collection
sites. I want to assess whether "site" has any influence on the data.
It occurred to me that one way to do this would be to train a classifier to
predict "site" from the data and evaluate it with stratified k-fold
cross-validation (stratified, because some sites have more subjects than
others). Unless I am mistaken, if the classifier predicts the site better than
chance, that would indicate that "site" has an influence on the data.
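Because the sites are unequal in size, the chance level for predicting "site"
is not simply 1/15: always guessing the largest site already scores that site's
share of the subjects. A minimal sketch of that baseline, on made-up site
counts (majority_baseline and y_demo are hypothetical names, not part of my
script):

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always guessing the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / float(len(labels))


# Made-up labels: site 1 has 40 subjects, sites 2 and 3 have 30 each.
y_demo = [1] * 40 + [2] * 30 + [3] * 30
baseline = majority_baseline(y_demo)
print(baseline)  # 0.4
```

A cross-validated accuracy would presumably need to sit clearly above this
number before it says anything about "site".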
However, I am running into a problem because my training set is a different
shape than the test data, which causes the analysis to fail.
My data structure is pretty simple.
X is a 3 by 1000 matrix of datapoints (that is, 3 datapoints per subject).
y is a 1 by 1000 matrix indicating the site (expressed as an integer from
1 to 15).
Here is the code that I use, and below it is the error that is produced.
    from sklearn import cross_validation
    skf = cross_validation.StratifiedKFold(y, 15)
    for train_index, test_index in skf:
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        clf = svm.SVC(kernel='rbf', C=1.0)
        clf.fit(X_train, X_test)
    Traceback (most recent call last):
      File "cross_val.py", line 132, in <module>
        clf.fit(X_train, X_test)
      File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/scikit_learn-0.13.1-py2.7-macosx-10.5-x86_64.egg/sklearn/svm/base.py", line 166, in fit
        (X.shape[0], y.shape[0]))
    ValueError: X and y have incompatible shapes.
    X has 966 samples, but y has 210.
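For reference, this is how I understand the per-fold shapes should line up:
the training rows pair with the training labels, and the test rows with the
test labels. Since fit takes the data first and the labels second, a mismatch
like the one above would presumably arise whenever the second argument is not
the label array for the same rows. The following is only a plain-Python sketch
on made-up data; stratified_folds is a hypothetical helper, not scikit-learn's
StratifiedKFold, and it does no shuffling.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds):
    """Yield (train_index, test_index) lists, spreading each label over folds.

    Hypothetical helper for illustration only.
    """
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)
    # Deal the members of each label out to the folds in round-robin order.
    fold_of = {}
    for indices in by_label.values():
        for position, i in enumerate(indices):
            fold_of[i] = position % n_folds
    for fold in range(n_folds):
        test_index = [i for i in range(len(labels)) if fold_of[i] == fold]
        train_index = [i for i in range(len(labels)) if fold_of[i] != fold]
        yield train_index, test_index


# Made-up data: 12 subjects, 3 datapoints each, 2 sites of unequal size.
X_demo = [[float(i), float(i) + 0.5, float(i) * 2.0] for i in range(12)]
y_demo = [1] * 8 + [2] * 4

shapes = []
for train_index, test_index in stratified_folds(y_demo, 4):
    X_train = [X_demo[i] for i in train_index]
    y_train = [y_demo[i] for i in train_index]
    X_test = [X_demo[i] for i in test_index]
    y_test = [y_demo[i] for i in test_index]
    # A classifier would be fit on (X_train, y_train) and scored on
    # (X_test, y_test); within each pair the lengths must match.
    shapes.append((len(X_train), len(y_train), len(X_test), len(y_test)))
print(shapes)
```

In every fold the first two numbers agree and the last two agree, which is the
property the error message is checking.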
Thanks for any help you can offer!
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general