Re: [R] Can this code be written more efficiently?

2010-09-30 Thread jim holtman
Have you tried using Rprof to determine where time is being spent in
the current code?  Have you looked at how much memory you are using?
Are you paging?  Have you run with a size 'x', then '2x' then '4x' to
see what the growth in both CPU time and memory usage is?  This is
what I would do if I were trying to debug/optimize one of my scripts.
Before I would run something for a day, I would understand how the
processing time increases with the size of the input file so that I
would have an idea of how long to wait.
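
A minimal sketch of that workflow in base R, under stated assumptions: `fit_once` is a hypothetical stand-in for the expensive step (here `dist()`, whose cost grows roughly quadratically with the number of rows), and the simulated 21-column data frame only mimics the shape of the data in the post. Swap in the real svm() call and a subsample of the real training data to get meaningful numbers.

```r
# Hypothetical stand-in for the expensive step; replace with the real fit.
fit_once <- function(d) dist(as.matrix(d))

set.seed(1)
full <- as.data.frame(matrix(rnorm(4000 * 21), ncol = 21))

# Time the step at sizes x, 2x and 4x to see how the cost grows.
sizes <- c(1000, 2000, 4000)
elapsed <- sapply(sizes, function(n) {
  idx <- sample(nrow(full), n)
  system.time(fit_once(full[idx, ]))[["elapsed"]]
})
growth <- data.frame(n = sizes, seconds = elapsed)
print(growth)

# Rough view of memory in use after the runs.
print(gc())

# Profile one full-size run to see where the time is spent.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
invisible(fit_once(full))
Rprof(NULL)
print(head(summaryRprof(prof_file)$by.self))
```

If the elapsed time roughly quadruples each time n doubles, the cost is growing about quadratically, which gives a rough basis for estimating the full-size run before committing a day to it.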

On Thu, Sep 30, 2010 at 1:40 PM, Guelman, Leo wrote:
> Dear users,
>
> I'm working on a binary classification problem using Support Vector
> Machines (SVM). My objective is to train a series of SVM models on a
> grid of hyperparameters and then select those that maximize the AUC
> based on an independent validation sample.
>
> My attempted code is shown below. It runs well on "small" data sets but
> when I use it on a slightly larger sample (e.g., my train data is
> composed of about 8,000 observations on each class and 21 inputs), it
> takes "forever" to run (more than 1 day already and still running). I'm
> wondering if there's any way I can optimize this code. Thanks in advance
> for any help.
>
> I'm using 64-bit R 2.11.1 on Win 7.
>
> Start Code
>
> library(e1071)
> library(ROCR)
>
> ### Create grid of hyperparameters
>
> G <- 2^seq(-15, 3, 2)
> C <- 2^seq(-5, 13, 2)
> mygrid <- expand.grid(C = C, G = G)
>
> ### Train models
>
> svm.models <- lapply(seq_len(nrow(mygrid)), function(i) {
>   # 'type', not 'method', is the svm() argument that selects C-classification
>   svm(churn.form, data = mytraindata,
>       type = "C-classification", kernel = "radial",
>       cost = mygrid$C[i], gamma = mygrid$G[i],
>       probability = TRUE)
> })
>
> ### Predict on validation set
>
> pred.step3 <- numeric(length(svm.models))
>
> for (i in seq_along(svm.models)) {
>   # Class probabilities for the validation sample
>   pred.step1 <- predict(svm.models[[i]], myvaliddata,
>                         decision.values = FALSE, probability = TRUE)
>   pred.step2 <- prediction(predictions = attr(pred.step1, "probabilities")[, 1],
>                            labels = myvaliddata$churn)
>   # AUC of model i on the validation sample
>   pred.step3[i] <- performance(pred.step2, "auc")@y.values[[1]]
> }
>
> pred.step3
>
> End Code
>
>
> Thanks,
> Leo.
>
> ___
>
> This e-mail may be privileged and/or confidential, and the sender does not
> waive any related rights and obligations. Any distribution, use or copying
> of this e-mail or the information it contains by other than an intended
> recipient is unauthorized. If you received this e-mail in error, please
> advise me (by return e-mail or otherwise) immediately.
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



[R] Can this code be written more efficiently?

2010-09-30 Thread Guelman, Leo
Dear users,

I'm working on a binary classification problem using Support Vector
Machines (SVM). My objective is to train a series of SVM models on a
grid of hyperparameters and then select those that maximize the AUC
based on an independent validation sample. 

My attempted code is shown below. It runs well on "small" data sets but
when I use it on a slightly larger sample (e.g., my train data is
composed of about 8,000 observations on each class and 21 inputs), it
takes "forever" to run (more than 1 day already and still running). I'm
wondering if there's any way I can optimize this code. Thanks in advance
for any help.

I'm using 64-bit R 2.11.1 on Win 7. 

Start Code

library(e1071)
library(ROCR)

### Create grid of hyperparameters

G <- 2^seq(-15, 3, 2)
C <- 2^seq(-5, 13, 2)
mygrid <- expand.grid(C = C, G = G)

### Train models

svm.models <- lapply(seq_len(nrow(mygrid)), function(i) {
  # 'type', not 'method', is the svm() argument that selects C-classification
  svm(churn.form, data = mytraindata,
      type = "C-classification", kernel = "radial",
      cost = mygrid$C[i], gamma = mygrid$G[i],
      probability = TRUE)
})

### Predict on validation set

pred.step3 <- numeric(length(svm.models))

for (i in seq_along(svm.models)) {
  # Class probabilities for the validation sample
  pred.step1 <- predict(svm.models[[i]], myvaliddata,
                        decision.values = FALSE, probability = TRUE)
  pred.step2 <- prediction(predictions = attr(pred.step1, "probabilities")[, 1],
                           labels = myvaliddata$churn)
  # AUC of model i on the validation sample
  pred.step3[i] <- performance(pred.step2, "auc")@y.values[[1]]
}

pred.step3

End Code
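
For the scoring step, a minimal base R sketch, assuming numeric scores where a larger value means "more likely positive". The helper name `auc_rank` and the toy vectors are hypothetical and not part of e1071 or ROCR. It computes the AUC from ranks (the Mann-Whitney form), so each model could be scored right after fitting and then discarded, instead of keeping every fitted model in a list.

```r
# AUC from ranks (Mann-Whitney form); tied scores get average ranks.
# scores: numeric, larger = more likely positive (assumed orientation).
# is_pos: logical, TRUE for observations in the positive class.
auc_rank <- function(scores, is_pos) {
  stopifnot(length(scores) == length(is_pos))
  n_pos <- as.numeric(sum(is_pos))
  n_neg <- as.numeric(sum(!is_pos))
  (sum(rank(scores)[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy check: 3 of the 4 positive/negative pairs are ordered correctly.
toy_auc <- auc_rank(c(0.1, 0.4, 0.35, 0.8), c(FALSE, FALSE, TRUE, TRUE))
print(toy_auc)  # 0.75

# The same 10 x 10 hyperparameter grid as above, built without rep().
mygrid <- expand.grid(C = 2^seq(-5, 13, 2), G = 2^seq(-15, 3, 2))
print(nrow(mygrid))  # 100
```

Whether this helps depends on where the time actually goes; with 100 grid points the model fits themselves are likely to dominate, which is worth confirming by profiling before changing anything.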


Thanks,
Leo.
