[R] Random Forests Variable Importance Question
I am trying to use the randomForest package for classification in R. The variable importance measures it lists are:

- mean raw importance score of variable x for class 0
- mean raw importance score of variable x for class 1
- MeanDecreaseAccuracy
- MeanDecreaseGini

I know what these mean, in the sense that I know their definitions. What I want to know is how to use them. I am trying to figure out what these values mean purely in terms of accuracy: what is a good value, what is a bad value, what are the maximums and minimums, etc. If a variable has a high MeanDecreaseAccuracy or MeanDecreaseGini, does that mean it is important or unimportant? Any information on the raw scores would be really helpful too. I want to know everything about these numbers that is relevant to applying them.

I don't really want a technical explanation that uses words like 'error', 'summation', or 'permutated', but rather a simpler explanation that doesn't involve any discussion of how random forests work (I have read all about that and didn't find it very helpful). If I wanted someone to explain how to use a radio, I wouldn't expect the explanation to cover how a radio converts radio waves into sound.

If anyone can help me out at all it would be really great. I have read many lectures on random forests and other data mining topics, but I have never found simple answers about how to read the variable importance measures.

Thanks,
Paul Fisch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
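A minimal sketch of how these numbers are usually read in practice, assuming the randomForest package and using the built-in iris data purely as a stand-in (with your data the per-class columns would be named after your class labels, e.g. "0" and "1", instead of "setosa", "versicolor", "virginica"). For all four measures a larger value means the variable matters more; there is no fixed scale or universal "good" cutoff, so the values are best compared against each other within one fitted forest.

```r
# Assumes the randomForest package is installed: install.packages("randomForest")
library(randomForest)

set.seed(1)  # make the forest reproducible
# iris is only a stand-in for your own data. importance = TRUE is needed to
# get the per-class columns and MeanDecreaseAccuracy, not just MeanDecreaseGini.
rf <- randomForest(Species ~ ., data = iris, importance = TRUE)

# One row per predictor; one column per class, then the two overall measures
imp <- importance(rf)
print(round(imp, 2))

# Read the table by ranking: the larger the number, the more the forest
# relies on that variable. MeanDecreaseAccuracy values near zero (or even
# negative) suggest the variable adds little.
ranked <- imp[order(imp[, "MeanDecreaseAccuracy"], decreasing = TRUE), ]
print(round(ranked, 2))
```

By default `importance()` rescales the accuracy-based columns (`scale = TRUE`), which is another reason to treat them as a ranking rather than as absolute amounts. `varImpPlot(rf)` from the same package draws the two overall measures as dot charts.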
[R] need help with stat functions (like adaboost, random forests and glm)
Ok, so basically I have a data frame named data_frame. data_frame contains:

startdate
startprice
endpricethreshold1
endpricethreshold2
endpricethreshold3

All of the endpricethreshold columns are TRUE/FALSE binary vectors. Each is TRUE or FALSE depending on whether the endprice was above or below that particular threshold.

Now I want to try to use, let's say, the generalized linear model to predict which endprice thresholds will be TRUE or FALSE based on startdate and startprice. So I have a formula like:

glm(endpricethreshold1 ~ ., data=data_frame[,c(1,2,3)], family=binomial(logit));

But for the response term (since I really have tons of endpricethresholds and would like to do this in a loop), I don't want to refer to endpricethreshold1 by its name but by its column index, like this:

glm(data_frame[[3]] ~ ., data=data_frame[,c(1,2,3)], family=binomial(logit));

However, when I do this I get completely different results and I have no idea why. If anyone could help it would be greatly appreciated.

Thanks,
Paul Fisch
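A likely explanation, offered tentatively: in a formula, `.` means "every column of `data` not already used in the formula", and R decides that by variable name. With `data_frame[[3]] ~ .` the left-hand side is not called endpricethreshold1, so `.` still pulls endpricethreshold1 in as a predictor and the model ends up predicting the column from itself. The sketch below reproduces that and then loops by column index while building each formula from the column name with base R's `reformulate()`. The toy `data_frame` (including the numeric day standing in for startdate) is hypothetical and only mimics the layout described above.

```r
# Hypothetical toy data with the same layout as data_frame in the question
set.seed(42)
n <- 200
data_frame <- data.frame(
  startdate  = sample(1:365, n, replace = TRUE),  # day number, standing in for a date
  startprice = runif(n, 10, 20)
)
endprice <- data_frame$startprice + rnorm(n)
data_frame$endpricethreshold1 <- endprice > 14
data_frame$endpricethreshold2 <- endprice > 15
data_frame$endpricethreshold3 <- endprice > 16

# The problematic call: "." still includes endpricethreshold1, because the
# left-hand side is data_frame[[3]], not a column named endpricethreshold1.
# Warnings about fitted probabilities of 0 or 1 are expected, so they are silenced.
leaky <- suppressWarnings(
  glm(data_frame[[3]] ~ ., data = data_frame[, c(1, 2, 3)], family = binomial(logit))
)
print(names(coef(leaky)))  # includes a coefficient for endpricethreshold1 itself

# Loop by column index, but build the formula from the column *name*
response_cols <- 3:ncol(data_frame)
fits <- lapply(response_cols, function(i) {
  f <- reformulate(c("startdate", "startprice"), response = names(data_frame)[i])
  glm(f, data = data_frame, family = binomial(logit))
})
names(fits) <- names(data_frame)[response_cols]
print(lapply(fits, coef))
```

Listing the predictors explicitly in `reformulate()` avoids `.` altogether, so no response column can leak into the right-hand side.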
[R] Some help with dates.
Hey, I'm new to R but familiar with other programming languages. Basically, I want to store an array of dates. For each of these dates I want to store only the day of the week and the hour. For example, "Monday 12" would be Monday at 12 o'clock and "Tuesday 20" would be Tuesday at 8 p.m. Alternatively, the day could be stored as 0-6 for Sunday to Saturday, so Tuesday at 11 a.m. would be something like "2 11". The dates don't need to be stored exactly as I have written them; I would just like to know how R would store a day of the week combined with an hour in a variable.

Thanks,
Paul
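A minimal sketch in base R, assuming the original timestamps are available as date-times (the three values below are hypothetical and chosen to match the examples above). `as.POSIXlt()` exposes the weekday as `$wday` (0 = Sunday to 6 = Saturday, the same coding as above) and the hour as `$hour`, so the pair can be kept as two integer columns, folded into a single "hour of the week" number, or formatted as a readable label.

```r
# Hypothetical timestamps: Monday 12:00, Tuesday 20:00, Tuesday 11:00
times <- as.POSIXct(c("2009-06-01 12:00", "2009-06-02 20:00", "2009-06-02 11:00"),
                    tz = "UTC")
lt <- as.POSIXlt(times)

# Option 1: two small integer columns (wday: 0 = Sunday ... 6 = Saturday)
dow_hour <- data.frame(wday = lt$wday, hour = lt$hour)
print(dow_hour)

# Option 2: one number per value, 0..167 = hours since Sunday 00:00
hour_of_week <- lt$wday * 24L + lt$hour
print(hour_of_week)

# Option 3: a readable label such as "Monday 12"
labels <- format(times, "%A %H")
print(labels)
```

The weekday names produced by `%A` depend on the current locale; the integer forms do not.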