[R] e1071/svm: never finishes

2012-02-10 Thread Sam Steingold
When I tried to run svm on the same data frame, memory usage as reported
by top(1) doubled to 4GB almost right away and the function never
returned (it has been running for ~15 hours now). ^C does not stop it.
This is most unusual; libsvm has always seemed very fast.

This is R version 2.13.1 (2011-07-08) (as distributed with Ubuntu).

 * Sam Steingold f...@tah.bet [2012-02-09 21:43:30 -0500]:

 I did this:
 nb <- naiveBayes(users, platform)
 pl <- predict(nb, users)
 nrow(users) == 314781
 ncol(users) == 109

 1. naiveBayes() was quite fast (~20 seconds), while predict() was slow
 (tens of minutes).  Why?

 2. The predict results were completely off the mark (quite the opposite
 of the expected overfitting).  Suffice it to show the tables:

 pl:

    android blackberry       ipad     iphone         lg      linux        mac
          3          5         11         14     312723          5         11
     mobile      nokia    samsung    symbian    unknown    windows
       1864         17         16        112          0          0

 platform:
    android blackberry       ipad     iphone         lg      linux        mac
      18013       1221       2647       1328          4       2936      34336
     mobile      nokia    samsung    symbian    unknown    windows
         18         88         39        103       2660     251388

 i.e., nb classified nearly everything as lg while in the actual data
 lg is virtually nonexistent.

 3. When I print nb, I see A-priori probabilities (which are what I
 expected) and Conditional probabilities, which are confusing because
 there are only two values per class, e.g.:

   android    0.048464998 0.43946764
   blackberry 0.001638002 0.04045564
   ipad       0.322251606 1.84940588
   iphone     0.030873494 0.23250250
   lg         0.0         0.
   linux      0.023501362 0.34698919
   mac        0.082653774 1.22535027
   mobile     0.0         0.
   nokia      0.0         0.
   samsung    0.0         0.
   symbian    0.0         0.
   unknown    0.003759398 0.08219078
   windows    0.021158528 0.32916970

 The predictors are integers.
 Is the first column for the 0 predictors and the second for all non-0?
 Is there a way to ask naiveBayes to differentiate between non-0 values?

 thanks!
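
On question 3: according to the naiveBayes help page in e1071, a numeric predictor gets a table holding, for each class, the mean and the standard deviation of that predictor. That would explain the two columns; they are not probabilities for 0 and non-0. A minimal base-R sketch of that computation on made-up data (x, y and cond are hypothetical names, and the numbers are not taken from the users data frame):

```r
# Toy data (hypothetical): one integer predictor and a class label.
x <- c(0, 0, 1, 3, 0, 2, 5, 0, 0)
y <- factor(c("android", "android", "android",
              "ipad", "ipad", "ipad", "ipad",
              "lg", "lg"))

# Per-class mean and standard deviation of the predictor:
# one row per class, columns "mean" and "sd".
cond <- t(sapply(split(x, y), function(v) c(mean = mean(v), sd = sd(v))))
print(cond)
```

A class whose predictor is always 0 comes out as a row of zeros, which matches the lg, mobile, nokia, samsung and symbian rows in the printed table. If the counts should be treated as categories rather than as numbers, converting the columns to factors before calling naiveBayes is one option, at the cost of one level per distinct value.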

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
http://www.childpsy.net/ http://openvotingconsortium.org http://iris.org.il
http://jihadwatch.org http://camera.org http://www.memritv.org
Don't ascribe to malice what can be adequately explained by stupidity.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] e1071/svm: never finishes

2012-02-10 Thread Sam Steingold
 * Sam Steingold f...@tah.bet [2012-02-10 10:01:54 -0500]:

 When I tried to run svm on the same data frame, memory usage as reported
 by top(1) doubled to 4GB almost right away and the function never
 returned (it has been running for ~15 hours now). ^C does not stop it.
 This is most unusual; libsvm has always seemed very fast.

looks like it _is_ libsvm:

#0  0x72aedc64 in Solver::select_working_set (this=0x7fff97f0,
    out_i=@0x7fff95a0, out_j=@0x7fff95b0) at svm.cpp:852
#1  0x72aef91d in Solver::Solve (this=0x7fff97f0, l=285724, Q=...,
    p_=<optimized out>, y_=<optimized out>, alpha_=0x6023fb60, Cp=1,
    Cn=1, eps=<optimized out>, si=0x7fff9980, shrinking=1) at svm.cpp:573
#2  0x72af1747 in solve_c_svc (Cn=1, Cp=1, si=0x7fff9980,
    alpha=0x6023fb60, param=<optimized out>, prob=0x7fff9c30) at svm.cpp:1444
#3  svm_train_one (prob=0x7fff9c30, param=<optimized out>, Cp=1, Cn=1)
    at svm.cpp:1641
#4  0x72af4a8e in svm_train (prob=<optimized out>,
    param=0x7fff9d40) at svm.cpp:2179
#5  0x72aea281 in svmtrain (x=0x7fff7e698038, r=0x11c9b1e0,
    c=<optimized out>, y=<optimized out>, rowindex=<optimized out>,
    colindex=<optimized out>, svm_type=0x11c9b2a0, kernel_type=0x11c9b2d0,
    degree=0x11c9b300, gamma=0x356e3a28, coef0=0x356e3a60, cost=0x356e3ad0,
    nu=0x103589a8, weightlabels=0x0, weights=0x0, nweights=0x11c9b330,
    cache=0x103589e0, tolerance=0x10358a18, epsilon=0x10358a50,
    shrinking=0x11c9b360, cross=0x11c9b390, sparse=0x11c9b3c0,
    probability=0x1524dbb0, seed=0x1524dbe0, nclasses=0x1524dc10, nr=0x1524dc40,
    index=0x148a0fa8, labels=0xa3303b8, nSV=0xa330420, rho=0x170083e8,
    coefs=0x391dbb48, sigma=0x10358a88, probA=0xdf94678, probB=0xcbb7eb8,
    cresults=0x0, ctotal1=0x10358ac0, ctotal2=0x10358af8, error=0x10358b30)
    at Rsvm.c:275
#6  0x7792cefc in ?? () from /usr/lib/R/lib/libR.so
#7  0x7795da1d in Rf_eval () from /usr/lib/R/lib/libR.so
#8  0x7795f540 in ?? () from /usr/lib/R/lib/libR.so
#9  0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#10 0x7795f6c9 in ?? () from /usr/lib/R/lib/libR.so
#11 0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#12 0x77960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#13 0x779ad784 in Rf_usemethod () from /usr/lib/R/lib/libR.so
#14 0x779ada47 in ?? () from /usr/lib/R/lib/libR.so
#15 0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#16 0x77960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#17 0x7795d6e0 in Rf_eval () from /usr/lib/R/lib/libR.so
#18 0x7795f540 in ?? () from /usr/lib/R/lib/libR.so
#19 0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#20 0x7795db9b in ?? () from /usr/lib/R/lib/libR.so
#21 0x7795dad9 in Rf_eval () from /usr/lib/R/lib/libR.so
#22 0x7795f6c9 in ?? () from /usr/lib/R/lib/libR.so
#23 0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#24 0x77960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#25 0x7795d6e0 in Rf_eval () from /usr/lib/R/lib/libR.so
#26 0x77998055 in Rf_ReplIteration () from /usr/lib/R/lib/libR.so
#27 0x779982e0 in ?? () from /usr/lib/R/lib/libR.so
#28 0x77998370 in run_Rmainloop () from /usr/lib/R/lib/libR.so
#29 0x0040078b in main ()
#30 0x772d930d in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#31 0x004007bd in _start ()
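
A rough sense of scale for the frames above, offered as a back-of-the-envelope sketch rather than a diagnosis: the l=285724 passed to Solver::Solve equals the windows and mac counts from the platform table added together (251388 + 34336), which would fit libsvm training a single pairwise (one-against-one) subproblem. Assuming 4-byte kernel entries (Qfloat is a float in svm.cpp), the full kernel matrix for that subproblem alone would be far larger than RAM, so kernel rows have to be recomputed through the kernel cache. The variable names below are illustrative only.

```r
# Size of the subproblem reported in the backtrace (l=285724).
l <- 285724

# Cross-check against the class counts in the platform table.
windows <- 251388
mac     <- 34336
stopifnot(windows + mac == l)

# Number of entries in the full l x l kernel matrix.
entries <- l^2

# Assumption: 4 bytes per entry (libsvm stores kernel values as float).
bytes_per_entry <- 4
kernel_gib <- entries * bytes_per_entry / 2^30

cat(sprintf("kernel matrix: %.3g entries, about %.0f GiB\n",
            entries, kernel_gib))
```

With a matrix on the order of 300 GiB, a run that does not finish in hours is not necessarily a hang. Fitting on a random subsample of a few thousand rows first is a cheap way to check whether the call completes at all; svm() in e1071 also takes a cachesize argument (in MB) that may be worth raising.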


#0  0x72aeeb67 in Kernel::dot (px=0x48eeb220, py=0x4b21890) at svm.cpp:295
#1  0x72af7a25 in Kernel::kernel_rbf (this=<optimized out>,
    i=<optimized out>, j=<optimized out>) at svm.cpp:239
#2  0x72af782c in SVC_Q::get_Q (this=0x7fff9870, i=187701,
    len=208039) at svm.cpp:1271
#3  0x72aef9ab in Solver::Solve (this=0x7fff97f0, l=285724, Q=...,
    p_=<optimized out>, y_=<optimized out>, alpha_=0x6023fb60, Cp=1,
    Cn=1, eps=<optimized out>, si=0x7fff9980, shrinking=1) at svm.cpp:591
#4  0x72af1747 in solve_c_svc (Cn=1, Cp=1, si=0x7fff9980,
    alpha=0x6023fb60, param=<optimized out>, prob=0x7fff9c30) at svm.cpp:1444
#5  svm_train_one (prob=0x7fff9c30, param=<optimized out>, Cp=1, Cn=1)
    at svm.cpp:1641
#6  0x72af4a8e in svm_train (prob=<optimized out>,
    param=0x7fff9d40) at svm.cpp:2179
#7  0x72aea281 in svmtrain (x=0x7fff7e698038, r=0x11c9b1e0,
    c=<optimized out>, y=<optimized out>, rowindex=<optimized out>,
    colindex=<optimized out>, svm_type=0x11c9b2a0, kernel_type=0x11c9b2d0,
    degree=0x11c9b300, gamma=0x356e3a28, coef0=0x356e3a60, cost=0x356e3ad0,
    nu=0x103589a8, weightlabels=0x0, weights=0x0, nweights=0x11c9b330,
    cache=0x103589e0, tolerance=0x10358a18, epsilon=0x10358a50,
    shrinking=0x11c9b360, cross=0x11c9b390, sparse=0x11c9b3c0,
    probability=0x1524dbb0, seed=0x1524dbe0, nclasses=0x1524dc10, nr=0x1524dc40,
    index=0x148a0fa8, labels=0xa3303b8, nSV=0xa330420, rho=0x170083e8,
    coefs=0x391dbb48, sigma=0x10358a88,