Re: [R] predict() an rpart() model: how to ignore missing levels in a factor
Many thanks, that's perfect. Excluding records on a rep-by-rep basis is just what I was hoping for, but I probably didn't explain myself that well!

James

--
View this message in context: http://r.789695.n4.nabble.com/predict-an-rpart-model-how-to-ignore-missing-levels-in-a-factor-tp3049218p3050670.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] predict() an rpart() model: how to ignore missing levels in a factor
I am using an algorithm that repeatedly splits my data set into two random sections, constructs a model with rpart() on one, tests it on the other, and averages the results. One of my variables is a factor (crop), where each crop type has a code. Some crop types occur infrequently or only once. When the data set is randomly split, the second (test) data set may contain a crop type that is not present in the first (training) data set, and so predict() fails with the error:

    Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object, : factor 'factor(c2001)' has new level(s) 13, 24, 35

where c2001 is the crop. I would like the predict function to ignore these records. Is there an option that allows this as part of the predict() call? For crop types with only a few records (e.g. 3-4), I would hope that some of the splits would have the right balance (records in both halves) to allow a prediction to be made.

--
View this message in context: http://r.789695.n4.nabble.com/predict-an-rpart-model-how-to-ignore-missing-levels-in-a-factor-tp3049218p3049218.html
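A minimal sketch of the rep-by-rep exclusion, assuming base R plus the recommended rpart package. The data frame `dat`, its columns `x` and `y`, the simulated values, and the mean-squared-error summary are hypothetical stand-ins for the real data; only `c2001` and the `factor(c2001)` term come from the question. In each repetition, test rows whose crop code does not occur in that repetition's training half are dropped before `predict()` is called, so `model.frame()` never meets an unknown level.

```r
library(rpart)

# Hypothetical data: crop codes 1-5 are common, 13, 24 and 35 occur once each
set.seed(1)
dat <- data.frame(
  c2001 = c(rep(1:5, each = 20), 13, 24, 35),
  x     = runif(103),
  y     = rnorm(103)
)

n_reps  <- 10
results <- numeric(n_reps)

for (i in seq_len(n_reps)) {
  # Random half/half split for this repetition
  idx   <- sample(nrow(dat), size = floor(nrow(dat) / 2))
  train <- dat[idx, ]
  test  <- dat[-idx, ]

  fit <- rpart(y ~ factor(c2001) + x, data = train)

  # Keep only test rows whose crop code was seen in this training half
  keep <- test$c2001 %in% unique(train$c2001)
  pred <- predict(fit, newdata = test[keep, ])

  # Hypothetical performance measure: mean squared error on the kept rows
  results[i] <- mean((test$y[keep] - pred)^2)
}

print(mean(results))
```

Because `keep` is recomputed in every repetition, a rare crop type is scored only in those splits where it happens to fall in both halves, and is silently skipped otherwise.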
[R] best predictive model for mixed categorical/continuous variables
Would anybody be able to advise which package would offer the best approach for building a model that predicts the probability of species occupation from a range of variables? Some of them are categorical (e.g. ten soil types whose numeric codes are not related to any qualitative or quantitative continuum, or vegetation type) and others are continuous, such as field size or vegetation height. I have tried the tree package, but the models it produces seem too simplistic: they discard most variables, with the result that there is no predictive power. I would expect interactions between variables: e.g. if the vegetation is grassland, vegetation height will mediate the effect, whereas if the vegetation is arable, crop type will be more important. Would it be possible to use GLM or GAM models for this type of predictive modelling? Any assistance would be greatly appreciated; it's several years since I last used R for this type of work, and unfortunately I don't have the support network of a university to turn to for advice these days!

--
View this message in context: http://r.789695.n4.nabble.com/best-predictive-model-for-mixed-catagorical-continuous-variables-tp3009275p3009275.html
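One option the question raises is a binomial (logistic) GLM, which takes nominal predictors via `factor()` and continuous ones in the same model and lets interactions be stated explicitly. A minimal sketch in base R, assuming presence/absence data; the data frame `survey`, its column names and the simulated occupancy are all hypothetical. The term `veg_type * veg_height` fits a separate vegetation-height slope for grassland and arable, which is one way to encode the interaction described above.

```r
# Hypothetical survey data: presence/absence plus mixed predictors
set.seed(42)
n <- 200
survey <- data.frame(
  soil       = factor(sample(1:10, n, replace = TRUE)),  # nominal codes, no ordering
  veg_type   = factor(sample(c("grassland", "arable"), n, replace = TRUE)),
  veg_height = runif(n, 0, 100),
  field_size = runif(n, 1, 50)
)

# Simulated occupancy: height matters only on grassland
eta <- -1 + 0.03 * survey$veg_height * (survey$veg_type == "grassland") +
  0.02 * survey$field_size
survey$present <- rbinom(n, 1, plogis(eta))

# Logistic GLM: soil as a factor, veg_type x veg_height interaction
fit <- glm(present ~ soil + veg_type * veg_height + field_size,
           family = binomial, data = survey)
summary(fit)

# Predicted probability of occupation for two hypothetical new sites;
# factor levels must match those used in fitting
new_sites <- data.frame(
  soil       = factor(c(3, 7), levels = levels(survey$soil)),
  veg_type   = factor(c("grassland", "arable"), levels = levels(survey$veg_type)),
  veg_height = c(40, 10),
  field_size = c(12, 30)
)
p <- predict(fit, newdata = new_sites, type = "response")
print(p)
```

If the height effect is not expected to be linear, a GAM from the recommended mgcv package can be written along the same lines, e.g. `gam(present ~ soil + veg_type + s(veg_height, by = veg_type) + field_size, family = binomial, data = survey)`; with a factor `by` variable the `veg_type` main effect should be kept in the formula, because the smooths are centred.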