Dear R-list!
I am using the e1071 package in R to solve a binary classification problem
on a dataset of around 180 predictor variables (blood metabolites) measured
in two groups of subjects (patients and healthy controls). I am confused
about the correct way to assess the classification accuracy of the trained
SVM.
(A) The svm() command lets me request k-fold cross-validation via the
'cross = k' parameter. This returns k values of classification accuracy
plus their average. (B) On the other hand, most textbooks and tutorials I
have browsed recommend splitting the data into a training set and a test
set, and then assessing prediction accuracy by applying the trained SVM to
the test set.
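To make option (A) concrete, here is a minimal sketch of what 'cross = k' does in e1071::svm(). It assumes simulated data (the names x and y, the 100 x 180 matrix and the shift in five "metabolites" are all hypothetical, not the real dataset). Each fold is held out once while the model is refit on the remaining folds, and the per-fold and overall accuracies are stored in the fitted object as percentages.

```r
library(e1071)

## Hypothetical simulated data standing in for the metabolite matrix:
## 100 subjects x 180 predictors, two equally sized groups.
set.seed(1)
n <- 100; p <- 180
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("m", seq_len(p))))
y <- factor(rep(c("patient", "control"), each = n / 2))
x[y == "patient", 1:5] <- x[y == "patient", 1:5] + 1  # shift 5 predictors

## (A) 10-fold cross-validation inside svm(): the model is refit on 9/10 of
## the data and scored on the held-out 1/10, ten times. svm() also fits a
## final model on all n subjects, which is the object returned.
fit_cv <- svm(x, y, kernel = "radial", cross = 10)

fit_cv$accuracies    # per-fold accuracy in percent (length 10)
fit_cv$tot.accuracy  # overall accuracy in percent across all held-out folds
```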
I am not sure whether (A) and (B) are alternative ways to assess
prediction accuracy. Or does option (A) only give the accuracy of the SVM
when applied to the test set, so that I should implement option (B) after
using option (A)?
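For comparison, a sketch of option (B) on the same kind of hypothetical simulated data. As far as I understand the e1071 documentation, the 'cross = k' accuracies are already computed on folds held out from whatever data is passed to svm(), so (A) and (B) are both estimates of out-of-sample accuracy, just from different splits. The 30% stratified holdout and the name test_idx are assumptions chosen for illustration.

```r
library(e1071)

## Hypothetical simulated data (same layout as the real problem).
set.seed(2)
n <- 100; p <- 180
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("m", seq_len(p))))
y <- factor(rep(c("patient", "control"), each = n / 2))
x[y == "patient", 1:5] <- x[y == "patient", 1:5] + 1

## (B) hold out 30% of subjects as a test set, sampled within each group
## so both classes keep their proportions.
test_idx <- unlist(lapply(split(seq_len(n), y),
                          function(i) sample(i, round(0.3 * length(i)))))

## Train on the remaining 70% only, then predict the unseen test subjects.
fit <- svm(x[-test_idx, ], y[-test_idx], kernel = "radial")
pred <- predict(fit, x[test_idx, ])

conf_mat <- table(predicted = pred, observed = y[test_idx])
test_acc <- mean(pred == y[test_idx])  # proportion correct on the test set
conf_mat
test_acc
```

With only around 100 subjects a single holdout estimate can vary noticeably with the random split, which is one reason cross-validation is often preferred for small samples.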
So would the correct approach be to first use (A), then run a grid search
(via the tune command) to find the best hyperparameters, and finally test
the trained SVM by applying it to the test set? And if I use a linear
kernel instead of an RBF kernel, I guess I do not need to run a grid search,
as there are no hyperparameters to estimate?
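One common way to combine these steps, sketched here under assumptions (simulated data, a 30% holdout, and arbitrary grid values): tune.svm() performs its own internal cross-validation, so the grid search runs on the training portion only and the test set is used exactly once at the end. Note that svm() with kernel = "linear" has no gamma, but it still has the cost parameter, so a one-dimensional search over cost is still possible for the linear kernel.

```r
library(e1071)

## Hypothetical simulated data and stratified 30% holdout.
set.seed(3)
n <- 100; p <- 180
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("m", seq_len(p))))
y <- factor(rep(c("patient", "control"), each = n / 2))
x[y == "patient", 1:5] <- x[y == "patient", 1:5] + 1
test_idx <- unlist(lapply(split(seq_len(n), y),
                          function(i) sample(i, round(0.3 * length(i)))))

## 10-fold CV on the training set only, used to score each grid point.
ctrl <- tune.control(sampling = "cross", cross = 10)

## RBF kernel: grid over gamma and cost (illustrative values only).
tuned_rbf <- tune.svm(x[-test_idx, ], y[-test_idx], kernel = "radial",
                      gamma = 10^(-4:-1), cost = 10^(-1:2),
                      tunecontrol = ctrl)

## Linear kernel: no gamma, but cost is still a hyperparameter.
tuned_lin <- tune.svm(x[-test_idx, ], y[-test_idx], kernel = "linear",
                      cost = 10^(-3:1), tunecontrol = ctrl)

tuned_rbf$best.parameters   # chosen gamma and cost
tuned_rbf$best.performance  # CV misclassification rate on the training set
tuned_lin$best.parameters   # chosen cost

## best.model is refit on the whole training set with the chosen values;
## evaluate each once on the untouched test set.
acc_rbf <- mean(predict(tuned_rbf$best.model, x[test_idx, ]) == y[test_idx])
acc_lin <- mean(predict(tuned_lin$best.model, x[test_idx, ]) == y[test_idx])
c(rbf = acc_rbf, linear = acc_lin)
```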

Best,
Jokel

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
