[R] how to handle missing values in the data?

uttam . phulwale Wed, 05 Oct 2005 21:18:10 -0700

Hello everybody,
I am referring to David Meyer's "Benchmarking Support Vector Machines",
Report No. 78 (Nov. 2002). I am new to R, and I am not sure how missing
values are handled in the benchmark datasets. I would be very thankful if
you could let me know how to handle missing numerical and categorical
variables in the data (e.g. BreastCancer).


The reason I ask is that, for SVM, I get fewer predictions from the trained
model than there are test observations, so I cannot calculate a confusion
matrix. At the same time, lda(), fda() and rpart() did return one
prediction per test observation. So I am quite confused about how these
functions handle the missing values: are they imputed with the mean, the
median, or a new category?
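
A minimal sketch of the two usual options, using base R only. The data frame and its columns (x1, x2) are made-up stand-ins for a benchmark set such as BreastCancer. As far as I know, svm() and its predict() method in e1071 default to na.action = na.omit, which silently drops incomplete rows and would explain the shorter prediction vector, while rpart() can fall back on surrogate splits; please check the help pages of the versions you have installed.

```r
# Hypothetical toy data with missing numeric and categorical values
test <- data.frame(
  x1    = c(1.2, NA, 3.1, 4.0, 5.5),
  x2    = factor(c("a", "b", NA, "a", "b")),
  Class = factor(c("benign", "malignant", "benign", "malignant", "benign"))
)

# How many values are missing in each column?
na_per_column <- colSums(is.na(test))

# Option 1: drop incomplete rows up front, so that predictions and
# true labels are guaranteed to have the same length
keep <- complete.cases(test)
test_complete <- test[keep, ]
n_dropped <- sum(!keep)

# Option 2: simple imputation (median for numeric, new level for factors)
imputed <- test
imputed$x1[is.na(imputed$x1)] <- median(imputed$x1, na.rm = TRUE)
imputed$x2 <- addNA(imputed$x2)                              # NA becomes a level
levels(imputed$x2)[is.na(levels(imputed$x2))] <- "missing"   # give it a name
```

With option 1, the confusion matrix is computed against test_complete$Class rather than the full test$Class. With option 2, any imputation values should be estimated on the training data and then reused on the test data.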

I have another problem with the generalized linear model function, glm().
I may have made an error, but I am not sure where.

The script I tried for glm() is:

trdata <- data.frame(train, row.names = NULL)
attach(trdata)

glmmod <- glm(Class ~ ., family = binomial(link = "logit"), data = trdata, maxit = 50)

tstdata <- data.frame(test, row.names = NULL)
attach(tstdata)

xtst <- subset(tstdata, select = -Class)
ytst <- Class

pred <- predict(glmmod, xtst)
library(mda)
confusion(pred, ytst)
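
One likely cause, offered as a sketch rather than a definitive diagnosis: predict() on a glm object returns values on the link (log-odds) scale by default, so pred is a numeric vector and not a vector of class labels. The example below uses simulated data (the variables x1 and x2, the 0.5 cut-off and the train/test split are all assumptions), asks for type = "response" to get probabilities, and converts them to labels. It uses base table() instead of mda's confusion() to stay self-contained.

```r
set.seed(1)

# Simulated two-class data standing in for the real train/test sets
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$Class <- factor(ifelse(dat$x1 + rnorm(n) > 0, "malignant", "benign"))
train <- dat[1:150, ]
test  <- dat[151:200, ]

# No attach() needed: the data argument tells glm() where to look
glmmod <- glm(Class ~ ., family = binomial(link = "logit"), data = train)

# type = "response" gives the probability of the second factor level
prob <- predict(glmmod, newdata = test, type = "response")

# Turn probabilities into class labels with an assumed 0.5 cut-off
lev  <- levels(train$Class)
pred <- factor(ifelse(prob > 0.5, lev[2], lev[1]), levels = lev)

# Cross-tabulate predicted against true classes
conf <- table(predicted = pred, true = test$Class)
```

If the test data contain missing values, the corresponding entries of prob will be NA, so those rows need to be dropped or imputed before the table is built.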

Can you help me sort out these problems?

Uttam Phulwale
Tata Consultancy Services Limited
Mailto: [EMAIL PROTECTED]
Website: http://www.tcs.com



______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
