Hello, 

I am having difficulty with a cross validation problem, and any help would be 
much appreciated. 

I have a large number of research subjects from 15 different data collection 
sites. I want to assess whether "site" has any influence on the data. 

It occurred to me that one way to do this would be to run a cross-validation 
using stratified k-folds (stratified because some sites have more subjects than 
others). Unless I am mistaken, the results of this analysis should reveal 
whether "site" has an influence on the data: if a classifier can predict a 
subject's site from their datapoints better than chance, the site must be 
leaving a mark on the data. However, I am running into a problem: my training 
set seems to have a different shape from my test set, which causes the 
analysis to fail. 
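The "better than chance" idea can be made concrete with a label-permutation test. This is a suggestion, not the method described above: scikit-learn's `permutation_test_score` cross-validates a classifier that predicts site from the measurements, then repeats the cross-validation with shuffled site labels to see how often chance alone does as well. The data below are hypothetical and synthetic, with a site effect deliberately built into the first feature; it assumes a recent scikit-learn where splitters live in `sklearn.model_selection`.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, permutation_test_score
from sklearn.svm import SVC

# Hypothetical synthetic data: 600 subjects from 15 sites, 3 measurements each.
# A site effect is added on purpose (feature 0 drifts with site), so the test
# below should detect it.
rng = np.random.default_rng(0)
n_subjects, n_sites = 600, 15
y = rng.integers(1, n_sites + 1, size=n_subjects)  # site label per subject
X = rng.normal(size=(n_subjects, 3))               # (n_samples, n_features)
X[:, 0] += y                                       # artificial site effect

clf = SVC(kernel="rbf", C=1.0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# score: cross-validated accuracy at predicting site from the measurements.
# perm_scores: the same accuracy after randomly shuffling the site labels.
# pvalue: (number of shuffled runs scoring >= score, plus 1) / (n_permutations + 1).
score, perm_scores, pvalue = permutation_test_score(
    clf, X, y, cv=cv, n_permutations=30, random_state=0
)

chance = 1.0 / n_sites  # rough chance level if sites were balanced
print(f"accuracy={score:.3f}  chance~{chance:.3f}  p={pvalue:.3f}")
```

With real data, a small p-value suggests the measurements carry site information; a large one means no detectable site effect under this classifier, which is weaker than proof of no effect.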

My data structure is pretty simple.  

X is a 1000 by 3 matrix of datapoints (one row per subject, 3 datapoints per 
subject). 
y is a vector of 1000 labels indicating the site (an integer between 1 and 15). 
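scikit-learn estimators expect `X` with shape `(n_samples, n_features)` and `y` as a 1-D array of length `n_samples`. A minimal sketch of checking and fixing that layout with NumPy; the arrays here are hypothetical stand-ins, not the real measurements.

```python
import numpy as np

# Hypothetical stand-in data: 1000 subjects, 3 measurements each,
# and one site label (1..15) per subject.
rng = np.random.default_rng(0)
n_subjects, n_sites = 1000, 15

X = rng.normal(size=(n_subjects, 3))                 # one row per subject
y = rng.integers(1, n_sites + 1, size=n_subjects)    # 1-D vector of site labels

# If the measurements were stored the other way round (3 x 1000),
# transposing gives the (n_samples, n_features) layout scikit-learn expects.
X_wide = X.T              # shape (3, 1000)
X_fixed = X_wide.T        # back to (1000, 3)

# A 1 x 1000 label matrix can be flattened into a 1-D array.
y_row = y.reshape(1, -1)  # shape (1, 1000)
y_flat = y_row.ravel()    # shape (1000,)

# Rows of X and entries of y must line up one-to-one.
print(X_fixed.shape, y_flat.shape)
```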


Here is the code that I use, and below it is the error that is produced. 


from sklearn import cross_validation, svm

# Split the subjects into 15 folds, keeping site proportions similar in each fold
skf = cross_validation.StratifiedKFold(y, 15)

for train_index, test_index in skf:
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    clf = svm.SVC(kernel='rbf', C=1.0)
    clf.fit(X_train, X_test)  # fit() treats its second argument as the labels




Traceback (most recent call last):
  File "cross_val.py", line 132, in <module>
    clf.fit(X_train, X_test)
  File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/scikit_learn-0.13.1-py2.7-macosx-10.5-x86_64.egg/sklearn/svm/base.py", line 166, in fit
    (X.shape[0], y.shape[0]))
ValueError: X and y have incompatible shapes.
X has 966 samples, but y has 210.
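Reading the traceback, `fit(X, y)` received `X_test` (210 rows) as its labels, while `X_train` has 966 rows; passing `y_train` as the second argument should remove the mismatch. The sketch below shows the loop in that form using the newer API: in later scikit-learn releases the `cross_validation` module was replaced by `sklearn.model_selection`, where the splitter takes `n_splits` and yields indices from `.split(X, y)`. The data are hypothetical, and 5 folds are used only for illustration; the number of folds is independent of the number of sites (in the 0.13-era call above, the second argument to `StratifiedKFold` is the fold count).

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Hypothetical stand-in data with the layout scikit-learn expects:
# one row per subject (1000 x 3) and a 1-D vector of site labels 1..15.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = rng.integers(1, 16, size=1000)

# Stratification keeps each site's share of subjects roughly equal across folds.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_accuracies = []
for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    clf = SVC(kernel="rbf", C=1.0)
    clf.fit(X_train, y_train)                           # features + matching site labels
    fold_accuracies.append(clf.score(X_test, y_test))   # accuracy on the held-out fold

print("mean accuracy over folds:", np.mean(fold_accuracies))
```

On this random data the accuracy should hover near chance (about 1/15), since no site effect was built in.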


Thanks for any help you can offer! 


_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general