[Rd] An example of very slow computation

2011-08-17 Thread John C Nash
This message is about a curious difference in timing between two ways of 
computing the
same function. One uses expm, so is expected to be a bit slower, but a bit 
turned out to
be a factor of 1000. The code is below. We would be grateful if anyone can 
point out any
egregious bad practice in our code, or enlighten us on why one approach is so 
much slower
than the other. The problem arose in an activity to develop guidelines for 
nonlinear
modeling in ecology (at NCEAS, Santa Barbara, with worldwide participants), and 
we will be
trying to include suggestions of how to prepare problems like this for 
efficient and
effective solution. The code for nlogL was the original from the worker who 
supplied the
problem.

Best,

John Nash

--

cat(mineral-timing.R == benchmark MIN functions in R\n)
#  J C Nash July 31, 2011

require(microbenchmark)
require(numDeriv)
library(Matrix)
library(optimx)
# dat-read.table('min.dat', skip=3, header=FALSE)
# t-dat[,1]
t - c(0.77,  1.69,  2.69,  3.67,  4.69,  5.71,  7.94,  9.67, 11.77, 17.77,
 23.77, 32.77, 40.73, 47.75, 54.90, 62.81, 72.88, 98.77, 125.92, 160.19,
 191.15, 223.78, 287.70, 340.01, 340.95, 342.01)

# y-dat[,2] # ?? tidy up
 y- c(1.396, 3.784, 5.948, 7.717, 9.077, 10.100, 11.263, 11.856, 12.251, 
12.699,
12.869, 13.048, 13.222, 13.347, 13.507, 13.628, 13.804, 14.087, 14.185, 14.351,
14.458, 14.756, 15.262, 15.703, 15.703, 15.703)


ones-rep(1,length(t))
theta-c(-2,-2,-2,-2)


nlogL-function(theta){
  k-exp(theta[1:3])
  sigma-exp(theta[4])
  A-rbind(
  c(-k[1], k[2]),
  c( k[1], -(k[2]+k[3]))
  )
  x0-c(0,100)
  sol-function(t)100-sum(expm(A*t)%*%x0)
  pred-sapply(dat[,1],sol)
  -sum(dnorm(dat[,2],mean=pred,sd=sigma, log=TRUE))
}

getpred-function(theta, t){
  k-exp(theta[1:3])
  sigma-exp(theta[4])
  A-rbind(
  c(-k[1], k[2]),
  c( k[1], -(k[2]+k[3]))
  )
  x0-c(0,100)
  sol-function(tt)100-sum(expm(A*tt)%*%x0)
  pred-sapply(t,sol)
}

Mpred - function(theta) {
# WARNING: assumes t global
kvec-exp(theta[1:3])
k1-kvec[1]
k2-kvec[2]
k3-kvec[3]
#   MIN problem terbuthylazene disappearance
z-k1+k2+k3
y-z*z-4*k1*k3
l1-0.5*(-z+sqrt(y))
l2-0.5*(-z-sqrt(y))
val-100*(1-((k1+k2+l2)*exp(l2*t)-(k1+k2+l1)*exp(l1*t))/(l2-l1))
} # val should be a vector if t is a vector

negll - function(theta){
# non expm version JN 110731
  pred-Mpred(theta)
  sigma-exp(theta[4])
  -sum(dnorm(dat[,2],mean=pred,sd=sigma, log=TRUE))
}

theta-rep(-2,4)
fand-nlogL(theta)
fsim-negll(theta)
cat(Check fn vals: expm =,fand,   simple=,fsim,  diff=,fand-fsim,\n)

cat(time the function in expm form\n)
tnlogL-microbenchmark(nlogL(theta), times=100L)
tnlogL

cat(time the function in simpler form\n)
tnegll-microbenchmark(negll(theta), times=100L)
tnegll

ftimes-data.frame(texpm=tnlogL$time, tsimp=tnegll$time)
# ftimes


boxplot(log(ftimes))
title(Log times in nanoseconds for matrix exponential and simple MIN fn)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] R cmd check and multicore foreach loop

2011-08-17 Thread Renaud Gaujoux

Hi,

in R 2.12.1, R CMD check hangs when building a vignette that uses a 
foreach loop with the doMC parallel backend.

This does not happen in R 2.13.1, nor if I use doSEQ instead of doMC.
All versions of multicore, doMC and foreach are the same on both my R 
installations.


Has anybody encountered a similar issue?

Thank you.
Renaud




###
UNIVERSITY OF CAPE TOWN 


This e-mail is subject to the UCT ICT policies and e-mai...{{dropped:5}}

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] An example of very slow computation

2011-08-17 Thread Michael Lachmann
I think one difference is that negll() is fully vectorized - no loops, whereas
nlogL calls the function sol() inside sapply, i.e. a loop.

Michael


On 17 Aug 2011, at 10:27AM, John C Nash wrote:

 This message is about a curious difference in timing between two ways of 
 computing the
 same function. One uses expm, so is expected to be a bit slower, but a bit 
 turned out to
 be a factor of 1000. The code is below. We would be grateful if anyone can 
 point out any
 egregious bad practice in our code, or enlighten us on why one approach is so 
 much slower
 than the other. The problem arose in an activity to develop guidelines for 
 nonlinear
 modeling in ecology (at NCEAS, Santa Barbara, with worldwide participants), 
 and we will be
 trying to include suggestions of how to prepare problems like this for 
 efficient and
 effective solution. The code for nlogL was the original from the worker who 
 supplied the
 problem.
 
 Best,
 
 John Nash
 
 --
 
 cat(mineral-timing.R == benchmark MIN functions in R\n)
 #  J C Nash July 31, 2011
 
 require(microbenchmark)
 require(numDeriv)
 library(Matrix)
 library(optimx)
 # dat-read.table('min.dat', skip=3, header=FALSE)
 # t-dat[,1]
 t - c(0.77,  1.69,  2.69,  3.67,  4.69,  5.71,  7.94,  9.67, 11.77, 17.77,
 23.77, 32.77, 40.73, 47.75, 54.90, 62.81, 72.88, 98.77, 125.92, 160.19,
 191.15, 223.78, 287.70, 340.01, 340.95, 342.01)
 
 # y-dat[,2] # ?? tidy up
 y- c(1.396, 3.784, 5.948, 7.717, 9.077, 10.100, 11.263, 11.856, 12.251, 
 12.699,
 12.869, 13.048, 13.222, 13.347, 13.507, 13.628, 13.804, 14.087, 14.185, 
 14.351,
 14.458, 14.756, 15.262, 15.703, 15.703, 15.703)
 
 
 ones-rep(1,length(t))
 theta-c(-2,-2,-2,-2)
 
 
 nlogL-function(theta){
  k-exp(theta[1:3])
  sigma-exp(theta[4])
  A-rbind(
  c(-k[1], k[2]),
  c( k[1], -(k[2]+k[3]))
  )
  x0-c(0,100)
  sol-function(t)100-sum(expm(A*t)%*%x0)
  pred-sapply(dat[,1],sol)
  -sum(dnorm(dat[,2],mean=pred,sd=sigma, log=TRUE))
 }
 
 getpred-function(theta, t){
  k-exp(theta[1:3])
  sigma-exp(theta[4])
  A-rbind(
  c(-k[1], k[2]),
  c( k[1], -(k[2]+k[3]))
  )
  x0-c(0,100)
  sol-function(tt)100-sum(expm(A*tt)%*%x0)
  pred-sapply(t,sol)
 }
 
 Mpred - function(theta) {
 # WARNING: assumes t global
 kvec-exp(theta[1:3])
 k1-kvec[1]
 k2-kvec[2]
 k3-kvec[3]
 #   MIN problem terbuthylazene disappearance
z-k1+k2+k3
y-z*z-4*k1*k3
l1-0.5*(-z+sqrt(y))
l2-0.5*(-z-sqrt(y))
val-100*(1-((k1+k2+l2)*exp(l2*t)-(k1+k2+l1)*exp(l1*t))/(l2-l1))
 } # val should be a vector if t is a vector
 
 negll - function(theta){
 # non expm version JN 110731
  pred-Mpred(theta)
  sigma-exp(theta[4])
  -sum(dnorm(dat[,2],mean=pred,sd=sigma, log=TRUE))
 }
 
 theta-rep(-2,4)
 fand-nlogL(theta)
 fsim-negll(theta)
 cat(Check fn vals: expm =,fand,   simple=,fsim,  diff=,fand-fsim,\n)
 
 cat(time the function in expm form\n)
 tnlogL-microbenchmark(nlogL(theta), times=100L)
 tnlogL
 
 cat(time the function in simpler form\n)
 tnegll-microbenchmark(negll(theta), times=100L)
 tnegll
 
 ftimes-data.frame(texpm=tnlogL$time, tsimp=tnegll$time)
 # ftimes
 
 
 boxplot(log(ftimes))
 title(Log times in nanoseconds for matrix exponential and simple MIN fn)
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 
 
 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Referencing 'inst' directory in installed package

2011-08-17 Thread Jonathan Malmaud
Hi,
My R package has files in the 'inst' directory that it needs to reference. How 
can the R scripts in my package find out the full path to the 'inst' directory 
after the package is installed, given that different users may have installed 
the package to different libraries? 

Thanks,
Jon Malmaud
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R cmd check and multicore foreach loop

2011-08-17 Thread Renaud Gaujoux
I did notice some strange behaviour once, but things were working 
without a problem for a while now.
foreach loops and concepts are nice concepts, which I'd like to carry on 
using.
Maybe somebody from the foreach-dopar packages could explain why they 
have such issues?

Tim do you have concrete examples of the issues you talked with other 
developers?

Thank you.

Renaud

On 17/08/2011 12:45, Tim Triche, Jr. wrote:
 yes -- doMC (and doSMP) are kind of bogus and I just use 
 multicore::mclapply() to good effect these days.  I checked around 
 with other people doing similar things at the Bioconductor developer 
 day and pretty much everyone confirmed that doWHATEVER seems to have 
 issues, where as multicore itself... doesn't.

 Just my $0.02,

 --t


 On Wed, Aug 17, 2011 at 1:45 AM, Renaud Gaujoux 
 ren...@mancala.cbio.uct.ac.za mailto:ren...@mancala.cbio.uct.ac.za 
 wrote:

 Hi,

 in R 2.12.1, R CMD check hangs when building a vignette that uses
 a foreach loop with the doMC parallel backend.
 This does not happen in R 2.13.1, nor if I use doSEQ instead of doMC.
 All versions of multicore, doMC and foreach are the same on both
 my R installations.

 Has anybody encountered a similar issue?

 Thank you.
 Renaud




 ###
 UNIVERSITY OF CAPE TOWN
 This e-mail is subject to the UCT ICT policies and
 e-mai...{{dropped:5}}

 __
 R-devel@r-project.org mailto:R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel




 -- 


 If people do not believe that mathematics is simple, it is
 only because they do not realize how complicated life is.

 John von Neumann 
 http://www-groups.dcs.st-and.ac.uk/%7Ehistory/Biographies/Von_Neumann.html


 

###
UNIVERSITY OF CAPE TOWN 

This e-mail is subject to the UCT ICT policies and e-mai...{{dropped:9}}

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R cmd check and multicore foreach loop

2011-08-17 Thread Renaud Gaujoux
hopefully you'll be able to create a reproducible example, as my hanging 
issues seem to come and go without any obvious reason.


On 17/08/2011 13:50, Tim Triche, Jr. wrote:
 I'll see if I can put together a self-contained example.  Primarily, 
 the times that I use multicore (and attempted to use doSMP, mostly 
 because one of my users refuses to ditch Windows) are when I am 
 reading a ton of binary files, none of which depend on the others. 
  This is a blindingly obvious use-case for e.g. doMC and doSMP, yet 
 what typically happens is that the entire operation wedges.  I'm told 
 that doSMP really only works well with Revolution R, but per above, I 
 will try to put together a working self-contained example to show how.


 On Wed, Aug 17, 2011 at 4:39 AM, Renaud Gaujoux ren...@cbio.uct.ac.za 
 mailto:ren...@cbio.uct.ac.za wrote:

 I did notice some strange behaviour once, but things were working
 without a problem for a while now.
 foreach loops and concepts are nice concepts, which I'd like to
 carry on using.
 Maybe somebody from the foreach-dopar packages could explain why
 they have such issues?

 Tim do you have concrete examples of the issues you talked with
 other developers?

 Thank you.

 Renaud

 On 17/08/2011 12:45, Tim Triche, Jr. wrote:
 yes -- doMC (and doSMP) are kind of bogus and I just use
 multicore::mclapply() to good effect these days.  I checked
 around with other people doing similar things at the Bioconductor
 developer day and pretty much everyone confirmed that doWHATEVER
 seems to have issues, where as multicore itself... doesn't.

 Just my $0.02,

 --t


 On Wed, Aug 17, 2011 at 1:45 AM, Renaud Gaujoux
 ren...@mancala.cbio.uct.ac.za
 mailto:ren...@mancala.cbio.uct.ac.za wrote:

 Hi,

 in R 2.12.1, R CMD check hangs when building a vignette that
 uses a foreach loop with the doMC parallel backend.
 This does not happen in R 2.13.1, nor if I use doSEQ instead
 of doMC.
 All versions of multicore, doMC and foreach are the same on
 both my R installations.

 Has anybody encountered a similar issue?

 Thank you.
 Renaud




 ###
 UNIVERSITY OF CAPE TOWN
 This e-mail is subject to the UCT ICT policies and
 e-mai...{{dropped:5}}

 __
 R-devel@r-project.org mailto:R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel




 -- 


 If people do not believe that mathematics is simple, it
 is only because they do not realize how complicated life is.

 John von Neumann
 
 http://www-groups.dcs.st-and.ac.uk/%7Ehistory/Biographies/Von_Neumann.html


 ###

 UNIVERSITY OF CAPE TOWN

 This e-mail is subject to the UCT ICT policies and e-mail
 disclaimer published on our website at
 http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable
 from +27 21 650 9111 tel:%2B27%2021%20650%209111. This e-mail is
 intended only for the person(s) to whom it is addressed. If the
 e-mail has reached you in error, please notify the author. If you
 are not the intended recipient of the e-mail you may not use,
 disclose, copy, redirect or print the content. If this e-mail is
 not related to the business of UCT it is sent by the sender in the
 sender's individual capacity.

 ###




 -- 
 When you emerge in a few years, you can ask someone what you missed, 
 and you'll find it can be summed up in a few minutes.

 Derek Sivers http://sivers.org/berklee


 

###
UNIVERSITY OF CAPE TOWN 

This e-mail is subject to the UCT ICT policies and e-mai...{{dropped:9}}

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R cmd check and multicore foreach loop

2011-08-17 Thread Tim Triche, Jr.
I'll see if I can put together a self-contained example.  Primarily, the
times that I use multicore (and attempted to use doSMP, mostly because one
of my users refuses to ditch Windows) are when I am reading a ton of binary
files, none of which depend on the others.  This is a blindingly obvious
use-case for e.g. doMC and doSMP, yet what typically happens is that the
entire operation wedges.  I'm told that doSMP really only works well with
Revolution R, but per above, I will try to put together a working
self-contained example to show how.


On Wed, Aug 17, 2011 at 4:39 AM, Renaud Gaujoux ren...@cbio.uct.ac.zawrote:

 **
 I did notice some strange behaviour once, but things were working without a
 problem for a while now.
 foreach loops and concepts are nice concepts, which I'd like to carry on
 using.
 Maybe somebody from the foreach-dopar packages could explain why they have
 such issues?

 Tim do you have concrete examples of the issues you talked with other
 developers?

 Thank you.

 Renaud

 On 17/08/2011 12:45, Tim Triche, Jr. wrote:

 yes -- doMC (and doSMP) are kind of bogus and I just use
 multicore::mclapply() to good effect these days.  I checked around with
 other people doing similar things at the Bioconductor developer day and
 pretty much everyone confirmed that doWHATEVER seems to have issues, where
 as multicore itself... doesn't.

  Just my $0.02,

  --t


 On Wed, Aug 17, 2011 at 1:45 AM, Renaud Gaujoux 
 ren...@mancala.cbio.uct.ac.za wrote:

 Hi,

 in R 2.12.1, R CMD check hangs when building a vignette that uses a
 foreach loop with the doMC parallel backend.
 This does not happen in R 2.13.1, nor if I use doSEQ instead of doMC.
 All versions of multicore, doMC and foreach are the same on both my R
 installations.

 Has anybody encountered a similar issue?

 Thank you.
 Renaud




 ###
 UNIVERSITY OF CAPE TOWN
 This e-mail is subject to the UCT ICT policies and e-mai...{{dropped:5}}

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel




  --
  If people do not believe that mathematics is simple, it is only because
 they do not realize how complicated life is.
 John von 
 Neumannhttp://www-groups.dcs.st-and.ac.uk/%7Ehistory/Biographies/Von_Neumann.html


  ###

 UNIVERSITY OF CAPE TOWN

  This e-mail is subject to the UCT ICT policies and e-mail disclaimer
 published on our website at
 http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable from +27
 21 650 9111. This e-mail is intended only for the person(s) to whom it is
 addressed. If the e-mail has reached you in error, please notify the author.
 If you are not the intended recipient of the e-mail you may not use,
 disclose, copy, redirect or print the content. If this e-mail is not related
 to the business of UCT it is sent by the sender in the sender's individual
 capacity.

  ###




-- 
When you emerge in a few years, you can ask someone what you missed, and
you'll find it can be summed up in a few minutes.

Derek Sivers http://sivers.org/berklee

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Referencing 'inst' directory in installed package

2011-08-17 Thread Kasper Daniel Hansen
On Tue, Aug 16, 2011 at 11:17 PM, Jonathan Malmaud malm...@gmail.com wrote:
 Hi,
 My R package has files in the 'inst' directory that it needs to reference. 
 How can the R scripts in my package find out the full path to the 'inst' 
 directory after the package is installed, given that different users may have 
 installed the package to different libraries?


The inst directory does not exists after installation (this is
described in R-exts).  Use something like
   system.file(extdata, package = MyPackage)
to locate a directory etc.

Kasper




 Thanks,
 Jon Malmaud
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Referencing 'inst' directory in installed package

2011-08-17 Thread Marc Schwartz
On Aug 16, 2011, at 10:17 PM, Jonathan Malmaud wrote:

 Hi,
 My R package has files in the 'inst' directory that it needs to reference. 
 How can the R scripts in my package find out the full path to the 'inst' 
 directory after the package is installed, given that different users may have 
 installed the package to different libraries? 
 
 Thanks,
 Jon Malmaud


See ?path.package and ?file.path

Example:

 require(WriteXLS)
Loading required package: WriteXLS

# I am on OSX
 path.package(WriteXLS)
[1] /Library/Frameworks/R.framework/Versions/2.13/Resources/library/WriteXLS

I have Perl scripts in my package, which are in the /inst/Perl folder in the 
package source, so:

 file.path(path.package(WriteXLS), Perl/WriteXLS.pl)
[1] 
/Library/Frameworks/R.framework/Versions/2.13/Resources/library/WriteXLS/Perl/WriteXLS.pl


HTH,

Marc Schwartz

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Referencing 'inst' directory in installed package

2011-08-17 Thread Dirk Eddelbuettel

On 16 August 2011 at 23:17, Jonathan Malmaud wrote:
| Hi,
| My R package has files in the 'inst' directory that it needs to reference. 
How can the R scripts in my package find out the full path to the 'inst' 
directory after the package is installed, given that different users may have 
installed the package to different libraries? 

It is slightly different:  files and directories below the inst/ directory in
the _sources_ will be installed in the toplevel diretory of the _installed 
package_.

You can then use system.file() to get the location at run-time for the
installed package. E.g. to get the example file 'fib.r' from the Fibonacci
directory within the examples of Rcpp of my installed version:

R system.file(package=Rcpp, examples, Fibonacci, fib.r)
[1] /usr/local/lib/R/site-library/Rcpp/examples/Fibonacci/fib.r

The result of that system.file() call could now be fed to source() etc. 

system.file() can be used for other files within the package too. 'Writing R
Extensions' details what other files are installed by default -- but as
stated, everything below inst/ gets copied as is, without the layer of inst/
itself.

Hope this helps,  Dirk

-- 
Two new Rcpp master classes for R and C++ integration scheduled for 
New York (Sep 24) and San Francisco (Oct 8), more details are at
http://dirk.eddelbuettel.com/blog/2011/08/04#rcpp_classes_2011-09_and_2011-10
http://www.revolutionanalytics.com/products/training/public/rcpp-master-class.php

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R cmd check and multicore foreach loop

2011-08-17 Thread Brian G. Peterson
On Wed, 2011-08-17 at 04:50 -0700, Tim Triche, Jr. wrote:
 I'll see if I can put together a self-contained example.  Primarily,
 the times that I use multicore (and attempted to use doSMP, mostly
 because one of my users refuses to ditch Windows) are when I am
 reading a ton of binary files, none of which depend on the others.
 This is a blindingly obvious use-case for e.g. doMC and doSMP, yet
 what typically happens is that the entire operation wedges.  I'm told
 that doSMP really only works well with Revolution R, but per above, I
 will try to put together a working self-contained example to show
 how. 

Remember that physical I/O can bind up the processes too.  Having a
bunch of processes all trying to read from local disk at the same time
(especially when they are all trying to read the same file(s), a problem
it seems you may not have) is a recipe for I/O locks that can seize up
your processes.

So, if the problem only occurs with physical I/O, the first thing I
would try is to move that storage to a storage device on another machine
that is tuned for that level of disk I/O.

Regards,

   - Brian

-- 
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R cmd check and multicore foreach loop

2011-08-17 Thread Renaud Gaujoux

Uhm... maybe this is the issue.
The issue seems to specially occur when I am building the vignette, 
which performs some parallel computations on a reduced example, 
therefore faster than in a normal usage of the package.
The two processes (on my dual core) output some logging information 
using cat, which are normally sent to the console, but I guess that in 
the case of a vignette these are written to tex file. It is very few 
data though (a loop counter), so writing should be also very quick, even 
in a file.


Could it be possible that a I/O deadlock happens because of this output?
I actually use mutexes, at the end of each loop to perform bigger 
writing operation on a common file, but I hadn't think these would be 
required for the logging output,  assuming that stdout and stderr were 
thread safe. Maybe I should use mutexes at this level too.
Does multicore or doMC provide optional separate logging as doMPI does? 
(I guess this might be better posted to R-hpc)


Thank you.
Renaud



On 17/08/2011 14:56, Brian G. Peterson wrote:

On Wed, 2011-08-17 at 04:50 -0700, Tim Triche, Jr. wrote:

I'll see if I can put together a self-contained example.  Primarily,
the times that I use multicore (and attempted to use doSMP, mostly
because one of my users refuses to ditch Windows) are when I am
reading a ton of binary files, none of which depend on the others.
This is a blindingly obvious use-case for e.g. doMC and doSMP, yet
what typically happens is that the entire operation wedges.  I'm told
that doSMP really only works well with Revolution R, but per above, I
will try to put together a working self-contained example to show
how.

Remember that physical I/O can bind up the processes too.  Having a
bunch of processes all trying to read from local disk at the same time
(especially when they are all trying to read the same file(s), a problem
it seems you may not have) is a recipe for I/O locks that can seize up
your processes.

So, if the problem only occurs with physical I/O, the first thing I
would try is to move that storage to a storage device on another machine
that is tuned for that level of disk I/O.

Regards,

- Brian





###
UNIVERSITY OF CAPE TOWN 


This e-mail is subject to the UCT ICT policies and e-mai...{{dropped:5}}

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Referencing 'inst' directory in installed package

2011-08-17 Thread Mehmet Suzen
You can use
system.file(package=your_package_name)
which will return you library directory.


-Original Message-
From: r-devel-boun...@r-project.org
[mailto:r-devel-boun...@r-project.org] On Behalf Of Jonathan Malmaud
Sent: 17 August 2011 04:18
To: r-devel@r-project.org
Subject: [Rd] Referencing 'inst' directory in installed package

Hi,
My R package has files in the 'inst' directory that it needs to
reference. How can the R scripts in my package find out the full path to
the 'inst' directory after the package is installed, given that
different users may have installed the package to different libraries? 

Thanks,
Jon Malmaud
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
LEGAL NOTICE
This message is intended for the use o...{{dropped:10}}

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] getNativeSymbolInfo(user_unif_rand) returns different results on windows and linux

2011-08-17 Thread Renaud Gaujoux

Hi,

When loading a package that provides the user-supplied RNG hook 
user_unif_rand, calling getNativeSymbolInfo(user_unif_rand) returns 
informations about the loaded symbol.

I am using this to identify which package currently provides the RNG hook.
The results are the same on windows and linux if only one library 
provides the hook.


If one loads a second package that provides this hook, then on linux 
(Ubuntu 10.10), calling again getNativeSymbolInfo(user_unif_rand) 
returns the latest symbol information (which I presume is the correct 
result).
On windows (XP and Vista) however the symbol information does not change 
(same pointer address) and the package data is NULL.
See results for both systems below. I tested this with R 2.12.1, 2.13.0 
and 2.13.1 on Windows.


Is this a normal behaviour for dll loaded on Windows?
Thank you.

Renaud


#
# LINUX
#
 library(rlecuyer)
 getNativeSymbolInfo(user_unif_rand)
$name
[1] user_unif_rand

$address
pointer: 0x7ff55acffed0
attr(,class)
[1] NativeSymbol

$package
DLL name: rlecuyer
Filename: 
/home/renaud/R/x86_64-pc-linux-gnu-library/2.12/rlecuyer/libs/rlecuyer.so

Dynamic lookup: TRUE

attr(,class)
[1] NativeSymbolInfo
 library(rstream)
 getNativeSymbolInfo(user_unif_rand)
$name
[1] user_unif_rand

$address
pointer: 0x7ff55aaf58c0
attr(,class)
[1] NativeSymbol

$package
DLL name: rstream
Filename: 
/home/renaud/R/x86_64-pc-linux-gnu-library/2.12/rstream/libs/rstream.so

Dynamic lookup: TRUE

attr(,class)
[1] NativeSymbolInfo

#
# WINDOWS
#
 library(rlecuyer)
 getNativeSymbolInfo(user_unif_rand)
$name
[1] user_unif_rand

$address
pointer: 0x6bb84fb8
attr(,class)
[1] NativeSymbol

$package
DLL name: rlecuyer
Filename: C:/Program
Files/R/R-2.13.1/library/rlecuyer/libs/i386/rlecuyer.dll
Dynamic lookup: TRUE

attr(,class)
[1] NativeSymbolInfo
 library(rstream)
 getNativeSymbolInfo(user_unif_rand)
$name
[1] user_unif_rand

$address
pointer: 0x6bb84fb8
attr(,class)
[1] NativeSymbol

$package
NULL

attr(,class)
[1] NativeSymbolInfo








###
UNIVERSITY OF CAPE TOWN 


This e-mail is subject to the UCT ICT policies and e-mai...{{dropped:5}}

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] An example of very slow computation

2011-08-17 Thread cberry
John C Nash nas...@uottawa.ca writes:

 This message is about a curious difference in timing between two ways of 
 computing the
 same function. One uses expm, so is expected to be a bit slower, but a bit 
 turned out to
 be a factor of 1000. The code is below. We would be grateful if anyone can 
 point out any
 egregious bad practice in our code, or enlighten us on why one approach is so 
 much slower
 than the other. 

Looks like A*t in expm(A*t) is a matrix.

'getMethod(expm,matrix)' coerces it arg to a Matrix, then calls
expm(), whose method coerces its arg to a dMatrix and calls expm(),
whose method coerces its arg to a 'dgeMatrix' and calls expm(), whose
method finally calls '.Call(dgeMatrix_exp, x)' 

Whew!

The time difference between 'expm( diag(10)+1 )' and 'expm( as( diag(10)+1,
dgeMatrix ))' is a factor of 10 on my box. 

Dunno 'bout the other factor of 100.

Chuck




 The problem arose in an activity to develop guidelines for nonlinear
 modeling in ecology (at NCEAS, Santa Barbara, with worldwide participants), 
 and we will be
 trying to include suggestions of how to prepare problems like this for 
 efficient and
 effective solution. The code for nlogL was the original from the worker who 
 supplied the
 problem.

 Best,

 John Nash

 --

 cat(mineral-timing.R == benchmark MIN functions in R\n)
 #  J C Nash July 31, 2011

 require(microbenchmark)
 require(numDeriv)
 library(Matrix)
 library(optimx)
 # dat-read.table('min.dat', skip=3, header=FALSE)
 # t-dat[,1]
 t - c(0.77,  1.69,  2.69,  3.67,  4.69,  5.71,  7.94,  9.67, 11.77, 17.77,
  23.77, 32.77, 40.73, 47.75, 54.90, 62.81, 72.88, 98.77, 125.92, 160.19,
  191.15, 223.78, 287.70, 340.01, 340.95, 342.01)

 # y-dat[,2] # ?? tidy up
  y- c(1.396, 3.784, 5.948, 7.717, 9.077, 10.100, 11.263, 11.856, 12.251, 
 12.699,
 12.869, 13.048, 13.222, 13.347, 13.507, 13.628, 13.804, 14.087, 14.185, 
 14.351,
 14.458, 14.756, 15.262, 15.703, 15.703, 15.703)


 ones-rep(1,length(t))
 theta-c(-2,-2,-2,-2)


 nlogL-function(theta){
   k-exp(theta[1:3])
   sigma-exp(theta[4])
   A-rbind(
   c(-k[1], k[2]),
   c( k[1], -(k[2]+k[3]))
   )
   x0-c(0,100)
   sol-function(t)100-sum(expm(A*t)%*%x0)
   pred-sapply(dat[,1],sol)
   -sum(dnorm(dat[,2],mean=pred,sd=sigma, log=TRUE))
 }

 getpred-function(theta, t){
   k-exp(theta[1:3])
   sigma-exp(theta[4])
   A-rbind(
   c(-k[1], k[2]),
   c( k[1], -(k[2]+k[3]))
   )
   x0-c(0,100)
   sol-function(tt)100-sum(expm(A*tt)%*%x0)
   pred-sapply(t,sol)
 }

 Mpred - function(theta) {
 # WARNING: assumes t global
 kvec-exp(theta[1:3])
 k1-kvec[1]
 k2-kvec[2]
 k3-kvec[3]
 #   MIN problem terbuthylazene disappearance
 z-k1+k2+k3
 y-z*z-4*k1*k3
 l1-0.5*(-z+sqrt(y))
 l2-0.5*(-z-sqrt(y))
 val-100*(1-((k1+k2+l2)*exp(l2*t)-(k1+k2+l1)*exp(l1*t))/(l2-l1))
 } # val should be a vector if t is a vector

 negll - function(theta){
 # non expm version JN 110731
   pred-Mpred(theta)
   sigma-exp(theta[4])
   -sum(dnorm(dat[,2],mean=pred,sd=sigma, log=TRUE))
 }

 theta-rep(-2,4)
 fand-nlogL(theta)
 fsim-negll(theta)
 cat(Check fn vals: expm =,fand,   simple=,fsim,  diff=,fand-fsim,\n)

 cat(time the function in expm form\n)
 tnlogL-microbenchmark(nlogL(theta), times=100L)
 tnlogL

 cat(time the function in simpler form\n)
 tnegll-microbenchmark(negll(theta), times=100L)
 tnegll

 ftimes-data.frame(texpm=tnlogL$time, tsimp=tnegll$time)
 # ftimes


 boxplot(log(ftimes))
 title(Log times in nanoseconds for matrix exponential and simple MIN fn)


-- 
Charles C. Berry cbe...@tajo.ucsd.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] An example of very slow computation

2011-08-17 Thread Ravi Varadhan
Yes, the culprit is the evaluation of expm(A*t).  This is a lazy way of solving 
the system of ODEs, where you save analytic effort, but you pay for it dearly 
in terms of computational effort!

Even in this lazy approach, I am sure there ought to be faster ways to 
evaluating exponent of a matrix than that in Matrix package.

Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins 
University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu

-Original Message-
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
Behalf Of cbe...@tajo.ucsd.edu
Sent: Wednesday, August 17, 2011 1:08 PM
To: r-de...@stat.math.ethz.ch
Subject: Re: [Rd] An example of very slow computation

John C Nash nas...@uottawa.ca writes:

 This message is about a curious difference in timing between two ways of 
 computing the
 same function. One uses expm, so is expected to be a bit slower, but a bit 
 turned out to
 be a factor of 1000. The code is below. We would be grateful if anyone can 
 point out any
 egregious bad practice in our code, or enlighten us on why one approach is so 
 much slower
 than the other. 

Looks like A*t in expm(A*t) is a matrix.

'getMethod(expm,matrix)' coerces it arg to a Matrix, then calls
expm(), whose method coerces its arg to a dMatrix and calls expm(),
whose method coerces its arg to a 'dgeMatrix' and calls expm(), whose
method finally calls '.Call(dgeMatrix_exp, x)' 

Whew!

The time difference between 'expm( diag(10)+1 )' and 'expm( as( diag(10)+1,
dgeMatrix ))' is a factor of 10 on my box. 

Dunno 'bout the other factor of 100.

Chuck




 The problem arose in an activity to develop guidelines for nonlinear
 modeling in ecology (at NCEAS, Santa Barbara, with worldwide participants), 
 and we will be
 trying to include suggestions of how to prepare problems like this for 
 efficient and
 effective solution. The code for nlogL was the original from the worker who 
 supplied the
 problem.

 Best,

 John Nash

 --

 cat(mineral-timing.R == benchmark MIN functions in R\n)
 #  J C Nash July 31, 2011

 require(microbenchmark)
 require(numDeriv)
 library(Matrix)
 library(optimx)
 # dat-read.table('min.dat', skip=3, header=FALSE)
 # t-dat[,1]
 t - c(0.77,  1.69,  2.69,  3.67,  4.69,  5.71,  7.94,  9.67, 11.77, 17.77,
  23.77, 32.77, 40.73, 47.75, 54.90, 62.81, 72.88, 98.77, 125.92, 160.19,
  191.15, 223.78, 287.70, 340.01, 340.95, 342.01)

 # y-dat[,2] # ?? tidy up
  y- c(1.396, 3.784, 5.948, 7.717, 9.077, 10.100, 11.263, 11.856, 12.251, 
 12.699,
 12.869, 13.048, 13.222, 13.347, 13.507, 13.628, 13.804, 14.087, 14.185, 
 14.351,
 14.458, 14.756, 15.262, 15.703, 15.703, 15.703)


 ones-rep(1,length(t))
 theta-c(-2,-2,-2,-2)


 nlogL-function(theta){
   k-exp(theta[1:3])
   sigma-exp(theta[4])
   A-rbind(
   c(-k[1], k[2]),
   c( k[1], -(k[2]+k[3]))
   )
   x0-c(0,100)
   sol-function(t)100-sum(expm(A*t)%*%x0)
   pred-sapply(dat[,1],sol)
   -sum(dnorm(dat[,2],mean=pred,sd=sigma, log=TRUE))
 }

 getpred-function(theta, t){
   k-exp(theta[1:3])
   sigma-exp(theta[4])
   A-rbind(
   c(-k[1], k[2]),
   c( k[1], -(k[2]+k[3]))
   )
   x0-c(0,100)
   sol-function(tt)100-sum(expm(A*tt)%*%x0)
   pred-sapply(t,sol)
 }

 Mpred - function(theta) {
 # WARNING: assumes t global
 kvec-exp(theta[1:3])
 k1-kvec[1]
 k2-kvec[2]
 k3-kvec[3]
 #   MIN problem terbuthylazene disappearance
 z-k1+k2+k3
 y-z*z-4*k1*k3
 l1-0.5*(-z+sqrt(y))
 l2-0.5*(-z-sqrt(y))
 val-100*(1-((k1+k2+l2)*exp(l2*t)-(k1+k2+l1)*exp(l1*t))/(l2-l1))
 } # val should be a vector if t is a vector

 negll - function(theta){
 # non expm version JN 110731
   pred-Mpred(theta)
   sigma-exp(theta[4])
   -sum(dnorm(dat[,2],mean=pred,sd=sigma, log=TRUE))
 }

 theta-rep(-2,4)
 fand-nlogL(theta)
 fsim-negll(theta)
 cat(Check fn vals: expm =,fand,   simple=,fsim,  diff=,fand-fsim,\n)

 cat(time the function in expm form\n)
 tnlogL-microbenchmark(nlogL(theta), times=100L)
 tnlogL

 cat(time the function in simpler form\n)
 tnegll-microbenchmark(negll(theta), times=100L)
 tnegll

 ftimes-data.frame(texpm=tnlogL$time, tsimp=tnegll$time)
 # ftimes


 boxplot(log(ftimes))
 title(Log times in nanoseconds for matrix exponential and simple MIN fn)


-- 
Charles C. Berry cbe...@tajo.ucsd.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] getNativeSymbolInfo(user_unif_rand) returns different results on windows and linux

2011-08-17 Thread Duncan Temple Lang

Hi Renaud

  I cannot presently step through the code on Windows to verify the cause of 
the problem,
but looking at the code, I would _presume_ the reason is that symbol lookup is 
cached on
Windows but not on Linux or OS X (at least by default).
Thus when we perform the second search for user_unif_rand, we find it in the 
cache
rather than searching all the DLLs.

This happens because we did not specify a value for the PACKAGE parameter.

When we load a new DLL, the cache should probably be cleared or we
intelligently update it as necessary for the new DLLs on demand, i.e.
for a new search for a symbol we look in the new DLLs and then use the cache.
(This involves some book-keeping that could become convoluted.)

If we had specified a value for PACKAGE, e.g.

  getNativeSymbolInfo(user_unif_rand, rstream)

we would get the version in that package's DLL.

So this gives a workaround that you can use with just R code

  findNativeSymbolInfo =
  function(sym,  dlls = rev(getLoadedDLLs()))  {

  for(d in dlls) {
 z = getNativeSymbolInfo(sym, d)
 if(!is.null(z))
  return(z)
  }

NULL
  }

We'll think about whether to change the behaviour on Windows and how to do it
without affecting performance excessively.

 Best,
   D.


On 8/17/11 9:12 AM, Renaud Gaujoux wrote:
 Hi,
 
 When loading a package that provides the user-supplied RNG hook 
 user_unif_rand, calling
 getNativeSymbolInfo(user_unif_rand) returns informations about the loaded 
 symbol.
 I am using this to identify which package currently provides the RNG hook.
 The results are the same on windows and linux if only one library provides 
 the hook.
 
 If one loads a second package that provides this hook, then on linux (Ubuntu 
 10.10), calling again
 getNativeSymbolInfo(user_unif_rand) returns the latest symbol information 
 (which I presume is the correct result).
 On windows (XP and Vista) however the symbol information does not change 
 (same pointer address) and the package data is
 NULL.
 See results for both systems below. I tested this with R 2.12.1, 2.13.0 and 
 2.13.1 on Windows.
 
 Is this a normal behaviour for dll loaded on Windows?
 Thank you.
 
 Renaud
 
 
 #
 # LINUX
 #
 library(rlecuyer)
 getNativeSymbolInfo(user_unif_rand)
 $name
 [1] user_unif_rand
 
 $address
 pointer: 0x7ff55acffed0
 attr(,class)
 [1] NativeSymbol
 
 $package
 DLL name: rlecuyer
 Filename: 
 /home/renaud/R/x86_64-pc-linux-gnu-library/2.12/rlecuyer/libs/rlecuyer.so
 Dynamic lookup: TRUE
 
 attr(,class)
 [1] NativeSymbolInfo
 library(rstream)
 getNativeSymbolInfo(user_unif_rand)
 $name
 [1] user_unif_rand
 
 $address
 pointer: 0x7ff55aaf58c0
 attr(,class)
 [1] NativeSymbol
 
 $package
 DLL name: rstream
 Filename: 
 /home/renaud/R/x86_64-pc-linux-gnu-library/2.12/rstream/libs/rstream.so
 Dynamic lookup: TRUE
 
 attr(,class)
 [1] NativeSymbolInfo
 
 #
 # WINDOWS
 #
 library(rlecuyer)
 getNativeSymbolInfo(user_unif_rand)
 $name
 [1] user_unif_rand
 
 $address
 pointer: 0x6bb84fb8
 attr(,class)
 [1] NativeSymbol
 
 $package
 DLL name: rlecuyer
 Filename: C:/Program
 Files/R/R-2.13.1/library/rlecuyer/libs/i386/rlecuyer.dll
 Dynamic lookup: TRUE
 
 attr(,class)
 [1] NativeSymbolInfo
 library(rstream)
 getNativeSymbolInfo(user_unif_rand)
 $name
 [1] user_unif_rand
 
 $address
 pointer: 0x6bb84fb8
 attr(,class)
 [1] NativeSymbol
 
 $package
 NULL
 
 attr(,class)
 [1] NativeSymbolInfo
 
 
 
 
 
 
 
 
 ###
 UNIVERSITY OF CAPE TOWN
 This e-mail is subject to the UCT ICT policies and e-mai...{{dropped:5}}
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] An example of very slow computation

2011-08-17 Thread Ravi Varadhan
A principled way to solve this system of ODEs is to use the idea of 
fundamental matrix, which is the same idea as that of diagonalization of a 
matrix (see Boyce and DiPrima or any ODE text).

Here is the code for that:


nlogL2 - function(theta){
  k - exp(theta[1:3])
  sigma - exp(theta[4])
  A - rbind(
  c(-k[1], k[2]),
  c( k[1], -(k[2]+k[3]))
  )
eA - eigen(A)
T - eA$vectors
r - eA$values
x0 - c(0,100)
Tx0 - T %*% x0

sol - function(t) 100 - sum(T %*% diag(exp(r*t)) %*% Tx0)
pred - sapply(dat[,1], sol)
-sum(dnorm(dat[,2], mean=pred, sd=sigma, log=TRUE)) 
}
This is much faster than using expm(A*t), but much slower than by hand 
calculations since we have to repeatedly do the diagonalization.  An obvious 
advantage of this fuunction is that it applies to *any* linear system of ODEs 
for which the eigenvalues are real (and negative).

Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins 
University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


-Original Message-
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
Behalf Of Ravi Varadhan
Sent: Wednesday, August 17, 2011 2:33 PM
To: 'cbe...@tajo.ucsd.edu'; r-de...@stat.math.ethz.ch; 'nas...@uottawa.ca'
Subject: Re: [Rd] An example of very slow computation

Yes, the culprit is the evaluation of expm(A*t).  This is a lazy way of solving 
the system of ODEs, where you save analytic effort, but you pay for it dearly 
in terms of computational effort!

Even in this lazy approach, I am sure there ought to be faster ways to 
evaluating exponent of a matrix than that in Matrix package.

Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins 
University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu

-Original Message-
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
Behalf Of cbe...@tajo.ucsd.edu
Sent: Wednesday, August 17, 2011 1:08 PM
To: r-de...@stat.math.ethz.ch
Subject: Re: [Rd] An example of very slow computation

John C Nash nas...@uottawa.ca writes:

 This message is about a curious difference in timing between two ways of 
 computing the
 same function. One uses expm, so is expected to be a bit slower, but a bit 
 turned out to
 be a factor of 1000. The code is below. We would be grateful if anyone can 
 point out any
 egregious bad practice in our code, or enlighten us on why one approach is so 
 much slower
 than the other. 

Looks like A*t in expm(A*t) is a matrix.

'getMethod(expm,matrix)' coerces it arg to a Matrix, then calls
expm(), whose method coerces its arg to a dMatrix and calls expm(),
whose method coerces its arg to a 'dgeMatrix' and calls expm(), whose
method finally calls '.Call(dgeMatrix_exp, x)' 

Whew!

The time difference between 'expm( diag(10)+1 )' and 'expm( as( diag(10)+1,
dgeMatrix ))' is a factor of 10 on my box. 

Dunno 'bout the other factor of 100.

Chuck




 The problem arose in an activity to develop guidelines for nonlinear
 modeling in ecology (at NCEAS, Santa Barbara, with worldwide participants), 
 and we will be
 trying to include suggestions of how to prepare problems like this for 
 efficient and
 effective solution. The code for nlogL was the original from the worker who 
 supplied the
 problem.

 Best,

 John Nash

 --

 cat(mineral-timing.R == benchmark MIN functions in R\n)
 #  J C Nash July 31, 2011

 require(microbenchmark)
 require(numDeriv)
 library(Matrix)
 library(optimx)
 # dat-read.table('min.dat', skip=3, header=FALSE)
 # t-dat[,1]
 t - c(0.77,  1.69,  2.69,  3.67,  4.69,  5.71,  7.94,  9.67, 11.77, 17.77,
  23.77, 32.77, 40.73, 47.75, 54.90, 62.81, 72.88, 98.77, 125.92, 160.19,
  191.15, 223.78, 287.70, 340.01, 340.95, 342.01)

 # y-dat[,2] # ?? tidy up
  y- c(1.396, 3.784, 5.948, 7.717, 9.077, 10.100, 11.263, 11.856, 12.251, 
 12.699,
 12.869, 13.048, 13.222, 13.347, 13.507, 13.628, 13.804, 14.087, 14.185, 
 14.351,
 14.458, 14.756, 15.262, 15.703, 15.703, 15.703)


 ones-rep(1,length(t))
 theta-c(-2,-2,-2,-2)


 nlogL-function(theta){
   k-exp(theta[1:3])
   sigma-exp(theta[4])
   A-rbind(
   c(-k[1], k[2]),
   c( k[1], -(k[2]+k[3]))
   )
   x0-c(0,100)
   sol-function(t)100-sum(expm(A*t)%*%x0)
   pred-sapply(dat[,1],sol)
   -sum(dnorm(dat[,2],mean=pred,sd=sigma, log=TRUE))
 }

 getpred-function(theta, t){
   k-exp(theta[1:3])
   sigma-exp(theta[4])
   A-rbind(
   c(-k[1], k[2]),
   c( k[1], -(k[2]+k[3]))
   )
   x0-c(0,100)
   sol-function(tt)100-sum(expm(A*tt)%*%x0)
   pred-sapply(t,sol)
 }

 Mpred - function(theta) {
 # WARNING: assumes t global
 kvec-exp(theta[1:3])
 k1-kvec[1]
 k2-kvec[2]
 k3-kvec[3]
 #   MIN problem 

Re: [Rd] An example of very slow computation

2011-08-17 Thread Michael Lachmann

On 17 Aug 2011, at 7:08PM, cbe...@tajo.ucsd.edu cbe...@tajo.ucsd.edu wrote:

 John C Nash nas...@uottawa.ca writes:
 
 This message is about a curious difference in timing between two ways of 
 computing the
 same function. One uses expm, so is expected to be a bit slower, but a bit 
 turned out to
 be a factor of 1000. 
 
 Looks like A*t in expm(A*t) is a matrix.
 
 'getMethod(expm,matrix)' coerces it arg to a Matrix, then calls
 expm(), whose method coerces its arg to a dMatrix and calls expm(),
 whose method coerces its arg to a 'dgeMatrix' and calls expm(), whose
 method finally calls '.Call(dgeMatrix_exp, x)' 
 
 Whew!
 
 The time difference between 'expm( diag(10)+1 )' and 'expm( as( diag(10)+1,
 dgeMatrix ))' is a factor of 10 on my box. 
 

You are right! 

I was testing running nlogL below 100 times. expm() is then called 2500 times.
The total running time on my machine was 13 seconds.

If you replace in nlogL the part:
--
 A-rbind(
 c(-k[1], k[2]),
 c( k[1], -(k[2]+k[3]))
 )
 x0-c(0,100)
 sol-function(t)100-sum(expm(A*t)%*%x0)
--
with:
--
 A-rbind(
 c(-k[1], k[2]),
 c( k[1], -(k[2]+k[3]))
 )
 A-as(A,dgeMatrix)  # --- this is the difference
  sol-function(t)100-sum(expm(A*t)%*%x0)
--

this time drops to 1.5 seconds (!).

At that point, expm() takes up much less time than, for example, calculating 
A*t in sol(), and the sum() - I think because conversions have to be done.

Thus, if m is a 2x2 dgeMatrix, then 
 system.time({for(i in 1:2500) m*3.2})
   user  system elapsed 
  0.425   0.004   0.579 
(i.e. 1/3 of the total time for nlogL() above)

whereas if mm=as.matrix(m), then
 system.time({for(i in 1:2500) mm*3.2})
   user  system elapsed 
  0.004   0.000   0.005 

(!!)

and, similarly:
--
 system.time({for(i in 1:2500) sum(m)})
   user  system elapsed 
  0.399   0.002   0.494 
 system.time({for(i in 1:2500) sum(mm)})
   user  system elapsed 
  0.002   0.000   0.028 
--
whereas
 system.time({for(i in 1:2500) expm(m)})
   user  system elapsed 
  0.106   0.001   0.118 

to sum it up, of 13 seconds, 11.5 were spent on conversions to dgeMatrix
0.5 are spent on multiplying a dgeMatrix by a double
0.5 are spent on summing a dgeMatrix
and 0.1 are actually spent in expm.

Michael

P.S. You could have used Rprof() to see these times, only that interpreting 
summaryRprof() is a bit hard. (Is there something that does summaryRprof(), but 
also shows the call graph?)





 Dunno 'bout the other factor of 100.
 
 Chuck
 
 
 
 
 The problem arose in an activity to develop guidelines for nonlinear
 modeling in ecology (at NCEAS, Santa Barbara, with worldwide participants), 
 and we will be
 trying to include suggestions of how to prepare problems like this for 
 efficient and
 effective solution. The code for nlogL was the original from the worker 
 who supplied the
 problem.
 
 Best,
 
 John Nash
 
 --
 
 cat(mineral-timing.R == benchmark MIN functions in R\n)
 #  J C Nash July 31, 2011
 
 require(microbenchmark)
 require(numDeriv)
 library(Matrix)
 library(optimx)
 # dat-read.table('min.dat', skip=3, header=FALSE)
 # t-dat[,1]
 t - c(0.77,  1.69,  2.69,  3.67,  4.69,  5.71,  7.94,  9.67, 11.77, 17.77,
 23.77, 32.77, 40.73, 47.75, 54.90, 62.81, 72.88, 98.77, 125.92, 160.19,
 191.15, 223.78, 287.70, 340.01, 340.95, 342.01)
 
 # y-dat[,2] # ?? tidy up
 y- c(1.396, 3.784, 5.948, 7.717, 9.077, 10.100, 11.263, 11.856, 12.251, 
 12.699,
 12.869, 13.048, 13.222, 13.347, 13.507, 13.628, 13.804, 14.087, 14.185, 
 14.351,
 14.458, 14.756, 15.262, 15.703, 15.703, 15.703)
 
 
 ones-rep(1,length(t))
 theta-c(-2,-2,-2,-2)
 
 
 nlogL-function(theta){
  k-exp(theta[1:3])
  sigma-exp(theta[4])
  A-rbind(
  c(-k[1], k[2]),
  c( k[1], -(k[2]+k[3]))
  )
  x0-c(0,100)
  sol-function(t)100-sum(expm(A*t)%*%x0)
  pred-sapply(dat[,1],sol)
  -sum(dnorm(dat[,2],mean=pred,sd=sigma, log=TRUE))
 }
 
 getpred-function(theta, t){
  k-exp(theta[1:3])
  sigma-exp(theta[4])
  A-rbind(
  c(-k[1], k[2]),
  c( k[1], -(k[2]+k[3]))
  )
  x0-c(0,100)
  sol-function(tt)100-sum(expm(A*tt)%*%x0)
  pred-sapply(t,sol)
 }
 
 Mpred - function(theta) {
 # WARNING: assumes t global
 kvec-exp(theta[1:3])
 k1-kvec[1]
 k2-kvec[2]
 k3-kvec[3]
 #   MIN problem terbuthylazene disappearance
z-k1+k2+k3
y-z*z-4*k1*k3
l1-0.5*(-z+sqrt(y))
l2-0.5*(-z-sqrt(y))
val-100*(1-((k1+k2+l2)*exp(l2*t)-(k1+k2+l1)*exp(l1*t))/(l2-l1))
 } # val should be a vector if t is a vector
 
 negll - function(theta){
 # non expm version JN 110731
  pred-Mpred(theta)
  sigma-exp(theta[4])
  -sum(dnorm(dat[,2],mean=pred,sd=sigma, log=TRUE))
 }
 
 theta-rep(-2,4)
 fand-nlogL(theta)
 fsim-negll(theta)
 cat(Check fn vals: expm =,fand,   simple=,fsim,  diff=,fand-fsim,\n)
 
 cat(time the function in expm form\n)
 tnlogL-microbenchmark(nlogL(theta), times=100L)
 tnlogL
 
 cat(time the function in simpler form\n)
 tnegll-microbenchmark(negll(theta), times=100L)
 tnegll
 
 

Re: [Rd] An example of very slow computation

2011-08-17 Thread Michael Lachmann
Just a small addition:

If you replace below
 sol-function(t)100-sum(expm(A*t)%*%x0)
by
sol-function(t){A@x=A@x*t;100-sum(expm(A)@x * x0)}

(ugly! But avoiding the conversions and generics)

The time on my machine drop further down to 0.3 seconds. (from the original 13 
seconds, and then from the 1.5 seconds change mentioned below.

So, overall a ~40 fold improvement, though on my machine, the initial ratio was 
~3200 times slower, so a ~80 fold slowdown is still present.

Michael

On 17 Aug 2011, at 11:27PM, Michael Lachmann wrote:

 
 On 17 Aug 2011, at 7:08PM, cbe...@tajo.ucsd.edu cbe...@tajo.ucsd.edu 
 wrote:
 
 John C Nash nas...@uottawa.ca writes:
 
 This message is about a curious difference in timing between two ways of 
 computing the
 same function. One uses expm, so is expected to be a bit slower, but a 
 bit turned out to
 be a factor of 1000. 
 
 Looks like A*t in expm(A*t) is a matrix.
 
 'getMethod(expm,matrix)' coerces it arg to a Matrix, then calls
 expm(), whose method coerces its arg to a dMatrix and calls expm(),
 whose method coerces its arg to a 'dgeMatrix' and calls expm(), whose
 method finally calls '.Call(dgeMatrix_exp, x)' 
 
 Whew!
 
 The time difference between 'expm( diag(10)+1 )' and 'expm( as( diag(10)+1,
 dgeMatrix ))' is a factor of 10 on my box. 
 
 
 You are right! 
 
 I was testing running nlogL below 100 times. expm() is then called 2500 times.
 The total running time on my machine was 13 seconds.
 
 If you replace in nlogL the part:
 --
 A-rbind(
 c(-k[1], k[2]),
 c( k[1], -(k[2]+k[3]))
 )
 x0-c(0,100)
 sol-function(t)100-sum(expm(A*t)%*%x0)
 --
 with:
 --
 A-rbind(
 c(-k[1], k[2]),
 c( k[1], -(k[2]+k[3]))
 )
 A-as(A,dgeMatrix)  # --- this is the difference
  sol-function(t)100-sum(expm(A*t)%*%x0)
 --
 
 this time drops to 1.5 seconds (!).
 
 At that point, expm() takes up much less time than, for example, calculating 
 A*t in sol(), and the sum() - I think because conversions have to be done.
 
 Thus, if m is a 2x2 dgeMatrix, then 
 system.time({for(i in 1:2500) m*3.2})
   user  system elapsed 
  0.425   0.004   0.579 
 (i.e. 1/3 of the total time for nlogL() above)
 
 whereas if mm=as.matrix(m), then
 system.time({for(i in 1:2500) mm*3.2})
   user  system elapsed 
  0.004   0.000   0.005 
 
 (!!)
 
 and, similarly:
 --
 system.time({for(i in 1:2500) sum(m)})
   user  system elapsed 
  0.399   0.002   0.494 
 system.time({for(i in 1:2500) sum(mm)})
   user  system elapsed 
  0.002   0.000   0.028 
 --
 whereas
 system.time({for(i in 1:2500) expm(m)})
   user  system elapsed 
  0.106   0.001   0.118 
 
 to sum it up, of 13 seconds, 11.5 were spent on conversions to dgeMatrix
 0.5 are spent on multiplying a dgeMatrix by a double
 0.5 are spent on summing a dgeMatrix
 and 0.1 are actually spent in expm.
 
 Michael
 
 P.S. You could have used Rprof() to see these times, only that interpreting 
 summaryRprof() is a bit hard. (Is there something that does summaryRprof(), 
 but also shows the call graph?)
 
 
 
 
 
 Dunno 'bout the other factor of 100.
 
 Chuck
 
 
 
 
 The problem arose in an activity to develop guidelines for nonlinear
 modeling in ecology (at NCEAS, Santa Barbara, with worldwide participants), 
 and we will be
 trying to include suggestions of how to prepare problems like this for 
 efficient and
 effective solution. The code for nlogL was the original from the worker 
 who supplied the
 problem.
 
 Best,
 
 John Nash
 
 --
 
 cat(mineral-timing.R == benchmark MIN functions in R\n)
 #  J C Nash July 31, 2011
 
 require(microbenchmark)
 require(numDeriv)
 library(Matrix)
 library(optimx)
 # dat-read.table('min.dat', skip=3, header=FALSE)
 # t-dat[,1]
 t - c(0.77,  1.69,  2.69,  3.67,  4.69,  5.71,  7.94,  9.67, 11.77, 17.77,
 23.77, 32.77, 40.73, 47.75, 54.90, 62.81, 72.88, 98.77, 125.92, 160.19,
 191.15, 223.78, 287.70, 340.01, 340.95, 342.01)
 
 # y-dat[,2] # ?? tidy up
 y- c(1.396, 3.784, 5.948, 7.717, 9.077, 10.100, 11.263, 11.856, 12.251, 
 12.699,
 12.869, 13.048, 13.222, 13.347, 13.507, 13.628, 13.804, 14.087, 14.185, 
 14.351,
 14.458, 14.756, 15.262, 15.703, 15.703, 15.703)
 
 
 ones-rep(1,length(t))
 theta-c(-2,-2,-2,-2)
 
 
 nlogL-function(theta){
 k-exp(theta[1:3])
 sigma-exp(theta[4])
 A-rbind(
 c(-k[1], k[2]),
 c( k[1], -(k[2]+k[3]))
 )
 x0-c(0,100)
 sol-function(t)100-sum(expm(A*t)%*%x0)
 pred-sapply(dat[,1],sol)
 -sum(dnorm(dat[,2],mean=pred,sd=sigma, log=TRUE))
 }
 
 getpred-function(theta, t){
 k-exp(theta[1:3])
 sigma-exp(theta[4])
 A-rbind(
 c(-k[1], k[2]),
 c( k[1], -(k[2]+k[3]))
 )
 x0-c(0,100)
 sol-function(tt)100-sum(expm(A*tt)%*%x0)
 pred-sapply(t,sol)
 }
 
 Mpred - function(theta) {
 # WARNING: assumes t global
 kvec-exp(theta[1:3])
 k1-kvec[1]
 k2-kvec[2]
 k3-kvec[3]
 #   MIN problem terbuthylazene disappearance
   z-k1+k2+k3
   y-z*z-4*k1*k3
   l1-0.5*(-z+sqrt(y))
   l2-0.5*(-z-sqrt(y))