We don't have the data, but my guess is that some of the variables you meant to be factors were stored as integers when you ran the code below.
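
If that is the case, naiveBayes() in e1071 treats each integer column as a numeric predictor, and (per its documentation) the table printed for a numeric predictor holds the per-class mean and standard deviation, which would account for the two columns. A minimal sketch of the conversion follows. The toy `users` and `platform` objects and the column names `visits` and `clicked` are made up for illustration, and whether every integer column in the real data should become a factor is an assumption you would have to check.

```r
# Toy stand-in for the real data: integer-coded predictors (hypothetical columns).
users <- data.frame(
  visits  = c(0L, 1L, 2L, 0L, 1L, 2L),
  clicked = c(0L, 0L, 1L, 1L, 0L, 1L)
)
platform <- factor(c("mac", "windows", "windows", "mac", "android", "windows"))

# Integer columns are handled as numeric predictors by naiveBayes().
print(sapply(users, class))

# Convert every integer column to a factor so that each distinct value
# becomes its own level.
is_int <- sapply(users, is.integer)
users_f <- users
users_f[is_int] <- lapply(users[is_int], factor)
print(sapply(users_f, class))

# If e1071 is installed, refit on the converted data and inspect one table.
if (requireNamespace("e1071", quietly = TRUE)) {
  nb <- e1071::naiveBayes(users_f, platform)
  print(nb$tables$visits)          # one column per level of `visits`
  pl <- predict(nb, users_f)
  print(table(pl, platform))
}
```

With factor predictors the conditional probability tables should show one column per observed value rather than two summary columns. Note that a factor with many levels gives sparse tables, so the `laplace` argument of naiveBayes() may be worth a look.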

Uwe Ligges


On 10.02.2012 03:43, Sam Steingold wrote:
I did this:
nb <- naiveBayes(users, platform)
pl <- predict(nb, users)
nrow(users)  # 314781
ncol(users)  # 109

1. naiveBayes() was quite fast (~20 seconds), while predict() was slow
(tens of minutes).  why?

2. the predict results were completely off the mark (quite the opposite
of the expected overfitting).  suffice it to show the tables:

pl:

    android blackberry       ipad     iphone         lg      linux        mac
          3          5         11         14     312723          5         11
     mobile      nokia    samsung    symbian    unknown    windows
       1864         17         16        112          0          0

platform:
    android blackberry       ipad     iphone         lg      linux        mac
      18013       1221       2647       1328          4       2936      34336
     mobile      nokia    samsung    symbian    unknown    windows
         18         88         39        103       2660     251388

i.e., nb classified nearly everything as "lg" while in the actual data
"lg" is virtually nonexistent.

3. when I print "nb", I see "A-priori probabilities" (which are what I
expected) and "Conditional probabilities" which are confusing because
there are only two of them, e.g.:

              android    0.048464998 0.43946764
              blackberry 0.001638002 0.04045564
              ipad       0.322251606 1.84940588
              iphone     0.030873494 0.23250250
              lg         0.000000000 0.00000000
              linux      0.023501362 0.34698919
              mac        0.082653774 1.22535027
              mobile     0.000000000 0.00000000
              nokia      0.000000000 0.00000000
              samsung    0.000000000 0.00000000
              symbian    0.000000000 0.00000000
              unknown    0.003759398 0.08219078
              windows    0.021158528 0.32916970

the predictors are integers.
is the first column for the 0 predictors and the second for all non-0?
Is there a way to ask naiveBayes to differentiate between non-0 values?

thanks!


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
