Re: [R] predict() an rpart() model: how to ignore missing levels in a factor

2010-11-19 Thread jamessc

Many thanks - that's perfect. Excluding records on a rep-by-rep basis is just
what I was hoping for, but I probably didn't explain myself that well!

James
-- 
View this message in context: 
http://r.789695.n4.nabble.com/predict-an-rpart-model-how-to-ignore-missing-levels-in-a-factor-tp3049218p3050670.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] predict() an rpart() model: how to ignore missing levels in a factor

2010-11-18 Thread jamessc

I am using an algorithm to split my data set into two random sections
repeatedly, construct a model using rpart() on one, test it on the other and
average out the results.
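
For reference, a minimal sketch of that repeated-split procedure. It assumes the kyphosis data set that ships with rpart stands in for my own data, and the number of repetitions (n_reps) and the use of classification accuracy as the score are illustrative choices, not what I actually ran:

```r
library(rpart)

set.seed(1)                      # reproducible splits
n_reps <- 20                     # hypothetical number of repetitions
acc <- numeric(n_reps)

for (i in seq_len(n_reps)) {
  # random half of the rows for training, the rest for testing
  train_idx <- sample(nrow(kyphosis), floor(nrow(kyphosis) / 2))
  train <- kyphosis[train_idx, ]
  test  <- kyphosis[-train_idx, ]

  fit  <- rpart(Kyphosis ~ Age + Number + Start, data = train, method = "class")
  pred <- predict(fit, newdata = test, type = "class")

  acc[i] <- mean(pred == test$Kyphosis)   # proportion correctly classified
}

mean(acc)   # average accuracy over all repetitions
```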

One of my variables is a factor (crop), where each crop type has a code. Some
crop types occur infrequently or only once. When the data set is randomly
split, the test half may contain a crop type that is not present in the
training half, and so using predict() I get the error:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
attr(object,  : 
  factor 'factor(c2001)' has new level(s) 13, 24, 35

where c2001 is the crop. I would like the predict function to ignore these
records. Is there an argument that will allow this as part of the predict()
call? For crop types with a small number of records (e.g. 3-4), I would hope
that some of the splits would have the right balance to allow a prediction to
be made.
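
One way to get this behaviour, sketched under assumptions: within each repetition, drop the test records whose crop code does not occur in that repetition's training half before calling predict(). The data frame dat, the columns height and y, and n_reps are hypothetical stand-ins; only c2001 and the factor(c2001) term come from my real model.

```r
library(rpart)

set.seed(42)
# Hypothetical data: numeric crop codes, three of them (13, 24, 35) rare
dat <- data.frame(
  c2001  = c(sample(1:5, 194, replace = TRUE), 13, 13, 24, 24, 35, 35),
  height = runif(200)
)
dat$y <- dat$height + 0.1 * dat$c2001 + rnorm(200, sd = 0.1)

n_reps    <- 10
rmse      <- numeric(n_reps)
n_dropped <- integer(n_reps)

for (i in seq_len(n_reps)) {
  train_idx <- sample(nrow(dat), nrow(dat) / 2)
  train <- dat[train_idx, ]
  test  <- dat[-train_idx, ]

  fit <- rpart(y ~ factor(c2001) + height, data = train)

  # keep only test rows whose crop code was seen in this training half
  keep <- test$c2001 %in% unique(train$c2001)
  n_dropped[i] <- sum(!keep)
  test <- test[keep, ]

  pred <- predict(fit, newdata = test)
  rmse[i] <- sqrt(mean((pred - test$y)^2))
}

mean(rmse)   # error averaged over repetitions, rare crops scored when possible
n_dropped    # how many test records were skipped in each repetition
```

Because the filter is recomputed for every split, a rare crop is still scored in any repetition where at least one of its records lands in the training half.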
-- 
View this message in context: 
http://r.789695.n4.nabble.com/predict-an-rpart-model-how-to-ignore-missing-levels-in-a-factor-tp3049218p3049218.html
Sent from the R help mailing list archive at Nabble.com.



[R] best predictive model for mixed categorical/continuous variables

2010-10-24 Thread jamessc

Would anybody be able to advise on which package would offer the best
approach for producing a model able to predict the probability of species
occupation based upon a range of variables, some of them categorical (e.g.
vegetation type, or ten soil types where the numbers assigned are not related
to any qualitative/quantitative continuum) and others continuous, such as
field size or vegetation height?

I have tried using the tree package, but the models produced seem too
simplistic and discard most variables, with the result that the fitted tree
has no predictive power.

I would expect that there will be interactions between variables, e.g. if the
vegetation is grassland then vegetation height will matter most, whereas if
the vegetation is arable then crop type will be more significant.

Would it be possible to use GLM or GAM models for this type of predictive
modelling?
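
To make the question concrete, here is the kind of model I have in mind, as a sketch only: a binomial glm() from base R with factors entered as categorical terms and an interaction between vegetation type and height. The data are simulated and every column name (soil, veg, veg_height, field_size, present) is invented for illustration.

```r
set.seed(7)
n <- 300
# Hypothetical survey data; all column names are invented for illustration
fields <- data.frame(
  soil       = factor(sample(1:10, n, replace = TRUE)),  # codes are labels, not a scale
  veg        = factor(sample(c("grassland", "arable"), n, replace = TRUE)),
  veg_height = runif(n, 0, 100),
  field_size = runif(n, 1, 20)
)

# Simulated presence/absence in which height only matters in grassland
lin <- -1 + 0.03 * fields$veg_height * (fields$veg == "grassland")
fields$present <- rbinom(n, 1, plogis(lin))

# veg * veg_height expands to both main effects plus their interaction
fit <- glm(present ~ soil + field_size + veg * veg_height,
           family = binomial, data = fields)

# Predicted probability of occupation for each field
p <- predict(fit, newdata = fields, type = "response")
summary(p)
```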

Any assistance would be greatly appreciated - it's several years since I
last used R for this type of work and unfortunately I don't have the support
network of a university to turn to for advice these days!
-- 
View this message in context: 
http://r.789695.n4.nabble.com/best-predictive-model-for-mixed-catagorical-continuous-variables-tp3009275p3009275.html
Sent from the R help mailing list archive at Nabble.com.
