Hi Amy,

On Wed, Jan 6, 2010 at 4:33 PM, Amy Hessen <amy_4_5...@hotmail.com> wrote:
> Hi Steve,
>
> Thank you very much for your reply.
>
> I’m trying to do something systematic/general in the program so that I can
> try different datasets without changing much in the program (without knowing
> the name of the class label that has different name from dataset to
> another…)
>
> Could you please tell me your opinion about this code:-
>
> library(e1071)
>
> mydata<-read.delim("the_whole_dataset.txt")
>
> class_label <- names(mydata)[1] # I’ll always put the class label in the first column.
>
> myformula <- formula(paste(class_label,"~ ."))
>
> x <- subset(mydata, select = - mydata[, 1])
>
> mymodel<-(svm(myformula, x, cross=3))
>
> summary(model)
>
> ################

Since you're not doing anything funky with the formula, my preference is
to skip this way of calling SVM and go "straight" to the svm(x, y, ...)
method:

R> mydata <- as.matrix(read.delim("the_whole_dataset.txt"))
R> train.x <- mydata[,-1]
R> train.y <- mydata[,1]
R> mymodel <- svm(train.x, train.y, cross=3, type="C-classification")
## or
R> mymodel <- svm(train.x, train.y, cross=3, type="eps-regression")

As an aside, I also like to be explicit about the type="" parameter to
say what I want my SVM to do (regression or classification). If it's not
specified, svm() picks one based on whether your y vector is a factor
(it does classification) or not (it does regression).

> Do I have to the same steps with testingset? i.e. the testing set must not
> contain the label too? But contains the same structure as the training set?
> Is it correct?

I guess you'll want to report your accuracy/MSE/something for your model
on your testing set? Just load the data in the same way, then use
`predict` to calculate the metric you're after. You'll need the labels
for your test data to do that, though, e.g.:

R> testdata <- as.matrix(read.delim('testdata.txt'))
R> test.x <- testdata[,-1]
R> test.y <- testdata[,1]
R> preds <- predict(mymodel, test.x)

Let's assume you're doing classification, so let's report the accuracy:

R> acc <- sum(preds == test.y) / length(test.y)

Does that help?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
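
For reference, the train/test steps discussed in this thread could be put
together roughly as follows. This is only a sketch under stated
assumptions, not the code from the thread: the built-in iris data stands
in for the tab-delimited files, and split_label_first() and accuracy()
are hypothetical helper names introduced here for illustration. It also
takes the label out before calling as.matrix(), because as.matrix() on a
data frame with any non-numeric column returns a character matrix. The
svm() call is skipped if e1071 is not installed.

```r
# Editorial sketch; iris stands in for the_whole_dataset.txt (assumption).
# Put the class label in the first column, as agreed in the thread.
mydata <- iris[, c(5, 1:4)]

# Hypothetical helper: split a data frame whose first column is the label
# into a numeric feature matrix x and a factor y, whatever the label is named.
split_label_first <- function(df) {
  list(x = as.matrix(df[, -1, drop = FALSE]),
       y = factor(df[[1]]))
}

# Hypothetical helper: fraction of predictions that match the true labels.
accuracy <- function(preds, truth) {
  mean(as.character(preds) == as.character(truth))
}

# Hold out every third row as the testing set.
test.idx <- seq(3, nrow(mydata), by = 3)
train <- split_label_first(mydata[-test.idx, ])
test  <- split_label_first(mydata[test.idx, ])

# Fit and evaluate only if e1071 is available.
if (requireNamespace("e1071", quietly = TRUE)) {
  mymodel <- e1071::svm(train$x, train$y, type = "C-classification")
  preds <- predict(mymodel, test$x)
  cat("test accuracy:", accuracy(preds, test$y), "\n")
}
```

The testing set goes through exactly the same split as the training set,
so it keeps the same column structure, and its labels are only used
afterwards to score the predictions.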