Hello,

I have a very unusual situation with an SVM and wanted to get the group's opinion.

We developed an experiment where we train the SVM with one set of data (train data) and then test with a completely independent set of data (test data). The results were VERY good.
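
In case it helps to frame the question, here is a rough sketch of the kind of workflow I mean. (I'm using e1071::svm and the built-in iris data purely as stand-ins; our real data and tuning are not shown.)

library(e1071)

set.seed(1)
idx   <- sample(nrow(iris), 100)        # independent train/test split
train <- iris[idx, ]
test  <- iris[-idx, ]

fit  <- svm(Species ~ ., data = train, scale = TRUE)   # predictors scaled internally
pred <- predict(fit, newdata = test)                   # predict on unseen data only
table(pred, test$Species)                              # out-of-sample performance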

I found an error in how we generate one of our training variables. We discovered that it was indirectly influenced by future events. Clearly that needed to be fixed. Fixing the variable immediately changed our results from good to terrible. (Not a surprise, since the erroneous variable had future influence.)

A friend, who knows NOTHING of statistics or math, innocently asked, "Why don't you just keep that variable, since it seems to make your results so much better?" The idea, while naive, got me thinking. We can include future data in the training set, since it occurred in the past, but what do we do with the test data from today? As a test, I tried simply setting the variable to the average of its values in the training data. The results were great! And since the data is scaled, and we set the variable to the same constant value (the training average), it scaled to 0 in the test set. Still, great results.
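
To illustrate what I mean by "it scaled to 0" (toy numbers only, not our data):

train_var <- c(2, 4, 6, 8)       # the leaky variable as seen in training
mu <- mean(train_var)            # training average
s  <- sd(train_var)
test_var <- rep(mu, 3)           # test rows all set to the training average
(test_var - mu) / s              # every test value scales to exactly 0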

To summarize:

Bad var in training + Bad var in testing = great results
Good var in training + Good var in testing = bad results
Bad var in training + Constant in testing = great results


I'm not an expert on the internals of the SVM, but clearly the bad variable is setting some kind of threshold or intercept when the model is defined. Can someone help me figure out why/how this is working?

Thanks!

--
N
