Re: [R] R parallel - slow speed

2015-07-31 Thread Martin Spindler
Thank you very much to you both for your help.

I knew that parallelizing has some additional "overhead" costs, but I was 
surprised by the order of magnitude (it was 10 times slower). Therefore I 
thought I had made some mistake or that there was a more clever way to do it.

Best,

Martin
 
 


Re: [R] R parallel - slow speed

2015-07-31 Thread Martin Spindler
Thank you very much for your help.
 
I tried it under Unix, and there the parallel version was faster than under 
Windows (but still slower than the non-parallel version). This is an important 
point to keep in mind. Thanks for this.
 
Best,
 
Martin
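
As an aside on the Unix result above: a rough, untested sketch of the fork-based
route that the parallel package offers on Unix-alikes. mclapply() forks the
running R session, so y, Xc, Xd, Weights and npnewpar are shared with the
workers copy-on-write rather than serialized to them. Object names are those
from the original post further down the thread.

library(parallel)

# Fork-based parallelism; Unix-alikes only (not available on Windows).
# The forked children see the parent's objects without any explicit export.
fit <- unlist(mclapply(seq_len(nrow(Xeval)), function(i)
  npnewpar(y = y, Xc = Xc, Xd = Xd, Weights = Weights, h = 0.5,
           xeval = Xeval[i, ]),
  mc.cores = 4))

A cluster-based equivalent would be makeCluster(4, type = "FORK"), which shares
memory the same way; again, only on Unix-alikes.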

 
 


Re: [R] R parallel - slow speed

2015-07-30 Thread jim holtman
I ran a test on my Windows box with 4 CPUs.  There were 4 RScript processes
started in response to the request for a cluster of 4.  Each of these ran
for an elapsed time of around 23 seconds, making the median time around 0.2
seconds per iteration for the 100 iterations reported by microbenchmark.  The
'apply' only takes about 0.003 seconds for a single iteration - again what
microbenchmark is reporting.

The 4 RScript processes each use about 3 CPU seconds in the 23 seconds of
elapsed time; most of that is probably the communication and startup time
for the processes and reporting results.

So, as was pointed out previously, there is overhead in running in parallel.
You probably need at least several seconds of heavy computation per
iteration to make parallelizing worthwhile.  You should also investigate
exactly what is happening on your system so that you can account for the
time being spent.


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
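
Following up on Jim's point about task size, here is a rough, untested sketch of
handing each worker one block of rows of Xeval instead of one row per task, so
that each task does more computation per round of communication. Object names
are those from Martin's example.

library(parallel)

cl <- makeCluster(4)
# one consecutive block of rows of Xeval per worker
blocks <- lapply(clusterSplit(cl, seq_len(nrow(Xeval))),
                 function(i) Xeval[i, , drop = FALSE])
res <- parLapply(cl, blocks,
                 function(block, f, y, Xc, Xd, Weights, h)
                   apply(block, 1, f, y = y, Xc = Xc, Xd = Xd,
                         Weights = Weights, h = h),
                 f = npnewpar, y = y, Xc = Xc, Xd = Xd,
                 Weights = Weights, h = 0.5)
fit <- unlist(res)
stopCluster(cl)

Whether this pays off still depends on how much of the elapsed time is spent
inside npnewpar versus shipping the data; the 500 x 500 Weights matrix alone is
roughly 2 MB per transfer.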


Re: [R] R parallel - slow speed

2015-07-30 Thread Jeff Newmiller
Parallelizing comes at a price... and there is no guarantee that you can afford 
it. Vectorizing your algorithms is often a better approach. Microbenchmarking 
is usually overkill for evaluating parallelization.
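
To illustrate the vectorizing suggestion, an untested sketch that computes all
evaluation points at once with matrix operations, so no cluster is needed. It
keeps the Gaussian kernel from npnewpar but, as a simplifying assumption,
replaces the Weights-matrix lookup for the discrete regressor with a plain
exact-match indicator; object names are those from the original post.

h <- 0.5
# n1 x n kernel weights for the continuous regressor
# (dnorm is symmetric, so the sign of the difference does not matter)
K <- dnorm(outer(Xeval[, 1], Xc, FUN = "-") / h)
# n1 x n weights for the discrete regressor (assumption: exact matching)
L <- outer(Xeval[, 2], Xd, FUN = "==") * 1
W <- K * L
ghat <- as.vector(W %*% y) / rowSums(W)   # one estimate per row of Xeval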

You assume 4 cores... but many CPUs have 2 cores and use hyperthreading to make 
each core look like two.
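
For reference, a quick way to check this (the physical-core count is only
reported on some platforms):

library(parallel)
detectCores()                  # logical CPUs, i.e. including hyperthreads
detectCores(logical = FALSE)   # physical cores, where the platform reports them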

The operating system can make a difference also... Windows processes are more 
expensive to start and communicate between than *nix processes are. In 
particular, Windows seems to require duplicated RAM pages while *nix can share 
process RAM (at least until they are written to), so you end up needing more 
memory, and disk paging of virtual memory becomes more likely.
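
One rough way to see what a socket (PSOCK) cluster has to hold on every worker,
since nothing is shared, is to look at the in-memory size of the objects each
task needs; names are from Martin's example:

print(object.size(Weights), units = "MB")                     # the 500 x 500 weight matrix
print(object.size(list(y, Xc, Xd, npnewpar)), units = "KB")   # everything else is small
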
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.



[R] R parallel - slow speed

2015-07-30 Thread Martin Spindler
Dear all,

I am trying to parallelize the function npnewpar given below. When I compare an 
application of "apply" with "parApply", the parallelized version seems to be 
much slower (cf. output below). Therefore I would like to ask how the function 
could be parallelized more efficiently. (With increasing sample size the 
difference becomes smaller, but I was wondering about this big difference and 
how it could be improved.)

Thank you very much in advance for your help!

Best,

Martin


library(microbenchmark)
library(doParallel)

n <- 500
y <- rnorm(n)
Xc <- rnorm(n)
Xd <- sample(c(0,1), n, replace=TRUE)  # discrete regressor; size argument n so Xd has length n like Xc
Weights <- diag(n)
n1 <- 50
Xeval <- cbind(rnorm(n1), sample(c(0,1), n1, replace=TRUE))


detectCores()
cl <- makeCluster(4)
registerDoParallel(cl)
microbenchmark(apply(Xeval, 1, npnewpar, y=y, Xc=Xc, Xd = Xd, Weights=Weights, 
h=0.5),  parApply(cl, Xeval, 1, npnewpar, y=y, Xc=Xc, Xd = Xd, Weights=Weights, 
h=0.5), times=100)
stopCluster(cl)


Unit: milliseconds

 expr: apply(Xeval, 1, npnewpar, y = y, Xc = Xc, Xd = Xd, Weights = Weights, h = 0.5)
       min 4.674914   lq 4.726463   mean 5.455323   median 4.771016   uq 4.843324   max 57.01519    neval 100
 expr: parApply(cl, Xeval, 1, npnewpar, y = y, Xc = Xc, Xd = Xd, Weights = Weights, h = 0.5)
       min 34.168250  lq 35.434829  mean 56.553296  median 39.438899  uq 49.777265  max 347.77887   neval 100

npnewpar <- function(y, Xc, Xd, Weights, h, xeval) {
  xc <- xeval[1]            # continuous coordinate of the evaluation point
  xd <- xeval[2]            # discrete coordinate of the evaluation point
  l <- function(x, X) {     # look up the weights for the discrete regressor
    w <- Weights[x, X]
    return(w)
  }
  u <- (Xc - xc)/h
  #K <- kernel(u)
  K <- dnorm(u)             # Gaussian kernel for the continuous regressor
  L <- l(xd, Xd)
  nom <- sum(y*K*L)         # numerator of the locally weighted estimate
  denom <- sum(K*L)
  ghat <- nom/denom
  return(ghat)
}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.