[R] NaiveBayes fails with one input variable (caret and klaR packages)

2009-06-30 Thread Damian Krstajic

Hello,

We have a system that creates thousands of regression/classification models, 
and in cases where there is only one input variable NaiveBayes throws an error. 
Maybe I am mistaken and should not expect to be able to fit a model with only 
one input variable.

We use R version 2.6.0 (2007-10-03) and caret (v4.1.19). We have also tested 
similar code with klaR (v.0.5.8) directly, because caret relies on the 
NaiveBayes implementation from klaR. The error messages from caret differ from 
those from klaR, so I provide the code for both.

Here is the code which uses the iris dataset.

> library(klaR);
Loading required package: MASS
> X<-iris["Sepal.Length"];
> Y<-iris["Species"];
> mnX<-as.matrix (X);
> mnY<-as.matrix (Y);
> cY<-factor(mnY);
> d <- data.frame (cbind(mnX,cY));
> m<-NaiveBayes(cY~mnX, data=d);
> predict(m);
Error in as.vector(x, mode) : invalid argument 'mode'
> library(caret);
Loading required package: lattice
> mCaret<-train(mnX,cY,method="nb",trControl = trainControl(method = "cv", 
> number = 10));
Loading required package: class
Fitting: usekernel=TRUE
Fitting: usekernel=FALSE
> predicted <- predict(mCaret, newdata=mnX);
Error in 1:nrow(newdata) : NA/NaN argument
>

When we call NaiveBayes through caret with more than one input variable, we do 
not get any error messages.
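
A possible explanation for the caret message, offered as an assumption and not 
confirmed anywhere in this thread: in base R, extracting the only column of a 
one-column matrix without drop = FALSE returns a plain vector, and nrow() of a 
vector is NULL, so an expression such as 1:nrow(newdata) cannot be evaluated. 
The sketch below shows only this base R behaviour. It does not call caret or 
klaR, and the names dropped and kept are illustrative.

```r
# One-column matrix, built the same way as mnX in the post.
mnX <- as.matrix(iris["Sepal.Length"])
dim(mnX)                          # 150 rows, 1 column

# Default subsetting drops the dimensions and returns a plain vector.
dropped <- mnX[, 1]
is.null(nrow(dropped))            # TRUE: a vector has no row count

# drop = FALSE keeps the result as a 150 x 1 matrix.
kept <- mnX[, 1, drop = FALSE]
nrow(kept)                        # 150
```

If internal code subsets a one-column input without drop = FALSE, this would 
explain why the problem appears only when there is a single input variable.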

Cheers
DK

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] nearZeroVar in caret fails

2009-06-17 Thread Damian Krstajic

I am using R version 2.6.0 on Linux (CentOS 4.5) and have a problem 
executing the nearZeroVar function in the caret package.

I am using the latest release of caret, v4.17.

I have a matrix X with 266 rows and 4 columns, and when I call the nearZeroVar 
function from the caret package I get the following error message.

> C <- nearZeroVar(X);
Error in table(data, useNA = "no") :
  all arguments must have the same length
Calls: nearZeroVar -> apply -> FUN -> table

I stepped through the commands inside nearZeroVar and found that it fails at 
this call:
> t<- table(X,useNa = "no")
Error in table(X, useNa = "no") : all arguments must have the same length

If I try without useNa="no" it works fine
> t<- table(X)
> t
X
    0    1    9   11   12   14   17   18   21   22   37   39   66  123
 1026   10    1    5    8    1    1    1    1    1    2    4    1    2
>
When I looked at the source code of table, there was no mention of useNA:
> table
function (..., exclude = c(NA, NaN), dnn = list.names(...), deparse.level = 1)
{
Could this be a problem with my current version of R (2.6.0)? According to the 
caret package description, it depends on R>2.5.1.
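
The check below is a small sketch for confirming whether the installed R's 
table() accepts useNA at all. It uses only base R; the vector x and the 
argument name notAnArgument are made up for illustration. It also shows that a 
named argument which table() does not recognise is passed through '...' and 
treated as another vector to tabulate, which is consistent with the "same 
length" error above.

```r
# Does this R installation's table() have a useNA argument?
has_useNA <- "useNA" %in% names(formals(table))
cat("R version:", R.version.string, "\n")
cat("table() has a useNA argument:", has_useNA, "\n")

# An unrecognised named argument is absorbed by '...' and handled as one
# more vector to cross-tabulate, so its length is compared with that of x.
x <- c(0, 0, 1, 9, 0)
res <- tryCatch(table(x, notAnArgument = "no"),
                error = function(e) conditionMessage(e))
print(res)
```

If has_useNA is FALSE, any function that calls table(..., useNA = "no") can be 
expected to fail in the same way on that installation.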

Thanks in advance 
DK


[R] predict.rpart question

2008-02-26 Thread Damian Krstajic

Dear All,

I have a question regarding predict.rpart. I use
rpart to build classification and regression trees, and I deal with data with a
relatively large number of input variables (predictors). For example, I build an
rpart model like this

rpartModel <- rpart(Y ~ X, method="class",
minsplit =1, minbucket=nMinBucket,cp=nCp);

and get the predictors used in building the model like
this

colnamesUsed<-unique(rownames(rpartModel$splits));

When I later apply the rpart model to predict new
data, I strip the unnecessary columns from the input data and keep only the X
columns that exist in colnamesUsed. Unfortunately, I get an error message like this

Error: variable 'X' was fitted with type
"nmatrix.3522" but type "nmatrix.19" was supplied

The error message is correct. The documentation
clearly specifies that the predictors referred to in the right side of
formula(object) must be present by name in newdata, but I wonder why, if they
are not used?
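
A sketch of one possible workaround, under assumptions that are not from the 
original post: the model is fitted from a data frame (so each predictor is a 
separate variable and not a column of a single matrix term X), all predictors 
are numeric, and the predictors actually used have no missing values in the new 
data. The iris data stands in for the real data, and usedVars, allVars, padded 
and full are illustrative names. The idea is to keep the columns the tree 
splits on and fill the remaining predictors with NA, so that predict() still 
finds every name it expects. Note also that rownames(rpartModel$splits) lists 
competitor and surrogate splits as well as primary ones, so the sketch reads 
the primary split variables from the frame component.

```r
library(rpart)

# Fit from a data frame so each predictor is a separate variable
# (iris is a stand-in for the real data).
fit <- rpart(Species ~ ., data = iris, method = "class")

# Variables that appear as primary splits in the fitted tree.
usedVars <- setdiff(unique(as.character(fit$frame$var)), "<leaf>")

# Keep the used columns; pad every other predictor with a numeric NA so
# that predict() still finds all the variable names from the formula.
allVars <- attr(fit$terms, "term.labels")
newdata <- iris[, usedVars, drop = FALSE]
for (v in setdiff(allVars, usedVars)) newdata[[v]] <- NA_real_

# Compare against predictions made from the full data.
padded <- predict(fit, newdata, type = "class")
full   <- predict(fit, iris, type = "class")
all(padded == full)
```

Factor predictors would need NA columns of the same factor type and levels, 
which this sketch does not cover.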

Thanks

DK


[R] genetic algorithms in solving classification problems

2008-01-07 Thread Damian Krstajic

Hello,

I am planning to use genetic algorithms (GA) to solve classification 
problems. As far as I can see, genalg is the only package that I can use. There 
are also the gafit and rgenoud packages, which can be used for 
regression and optimisation problems but not for classification. Is there any 
other R package that I can use?

Any ideas on how to implement a GA for a classification problem without 
reinventing the wheel would be much appreciated. I can see that there is some 
good stuff in C++ for MATLAB, but I am keen to do it in R.

Thanks.
Damjan Krstajic
Director 
Research Centre for Cheminformatics
www.rcc.org.yu

