On Feb 7, 2012, at 12:43 PM, Ali Tofigh wrote:

Hi,

I'm currently using the R package e1071 to train naive bayes
classifiers and came across a bug: When the posterior probabilities of
all classes are small, the result from the predict.naiveBayes function
become NaNs.

This should be sent to the maintainer of the package. The name of the maintainer can always be found in the DESCRIPTION file. Several of the authors are regular readers of rhelp, but I do not know whether David Meyer is. I'm sure a well-documented bug report, as this appears to be, will be welcomed.

--
David.
This is an issue with the treatment of the
log-transformed probabilities inside the predict.naiveBayes function.
Here is an example to demonstrate the problem (you might need to
increase 'nvar' depending on your machine):

-------------------- 8< --------------------
N <- 100
nvar <- 60
varnames <- paste("v", 1:nvar, sep="")

dat <- sapply(1:nvar, function(dummy) {c(rnorm(N/2, 0, 1), rnorm(N/ 2, 10, 1))})
colnames(dat) <- varnames

out <- rep(c("a","b"), each=N/2)
names(dat) <- varnames

nb <- naiveBayes(x=dat, y=out)

new.dat <- t(rnorm(nvar, 5, 0.1))
colnames(new.dat) <- varnames

predict(nb, new.dat, type="raw")
-------------------- 8< --------------------

the results of the last line is usually NaNs. As for the solution:

To protect agains very small numbers, the e1071:::predict.naiveBayes
function takes the probabilities into log-space and adds instead of
multiplying probabilities. However, when calculating the posterior
probabilities of each class (when type = "raw"), the log of the
probabilities are exponentiated, which defeats the purpose of the
logspace transformation. I suggest the following change to the code:

Towards the end of the predict.naiveBayes function, you currently do:

L <- exp(L)
L / sum(L)   # this is what is returned

you can instead use

sapply(L, function(lp) {1 / sum(exp(L - lp))})

the above comes from the following equality:

x / (x + y + z) = 1 / (1 + exp(log(y) - log(x)) + exp(log(z) - log(x)))

Best wishes,
/Ali Tofigh

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to