Re: [Rd] Speeding up transpose

2010-08-27 Thread Hervé Pagès

Hi Radford,

On 08/25/2010 07:50 PM, Radford Neal wrote:

I've looked at how to speed up the transpose function in R
(ie, t(X)).

The existing code does the work with loops like the following:

for (i = 0; i<  len; i++)
REAL(r)[i] = REAL(a)[(i / ncol) + (i % ncol) * nrow];

It seems a bit optimistic to expect a compiler to produce good code
from this.  I've re-written these loops as follows:

 for (i = 0, j = 0; i=len) j -= (len-1);
 REAL(r)[i] = REAL(a)[j];
 }


You can even avoid the j>=len test within the loop by using 2 nested
loops:

   dest = REAL(r);
   src = REAL(a);
   // walk the dest col by col (i.e. linearly)
   // walk the src row by row (i.e. not linearly)
   for (i = 0; i < nrow; i++, src++)
   for (j = 0; j < len; j += nrow, dest++)
   *dest = src[j];

1 test, 3 additions (src[j] being equiv to *(src + j)), 1 assignment
in the inner loop. I wonder if it's possible to get rid of one of the
additions.

This gives me another 10% gain in speed (on my 64-bit Ubuntu laptop),
not that much, but still...

BTW isn't that surprising that using div() is actually slower than using
of / and %, at least on Linux? Since div() is kind of the atomic version
of it, I would have expected it to be slightly faster :-/

Cheers,
H.



The resulting improvement is sometimes dramatic.  Here's a test
program:

 M<- matrix(seq(0,1,12000),200,60)

 print(system.time({for (i in 1:1) S<- t(M)}))
 print(system.time({for (i in 1:1) R<- t(S)}))

 v<- seq(0,2,12000)

 print(system.time({for (i in 1:10) u<- t(v)}))
 print(system.time({for (i in 1:10) w<- t(u)}))

Here are the times on an Intel Linux system:

 R version 2.11.1:Modified version:

  user  system elapsed  user  system elapsed
 1.190   0.040   1.232 0.610   0.010   0.619
  user  system elapsed  user  system elapsed
 1.200   0.020   1.226 0.610   0.000   0.616
  user  system elapsed  user  system elapsed
 0.800   0.010   0.813 0.750   0.000   0.752
  user  system elapsed  user  system elapsed
 0.910   0.010   0.921 0.860   0.000   0.864

Here are the times on a SPARC Solaris system:

 R version 2.11.1:Modified version:

  user  system elapsed  user  system elapsed
18.643   0.041  18.685 2.994   0.039   3.033
  user  system elapsed  user  system elapsed
18.574   0.041  18.616 3.123   0.039   3.163
  user  system elapsed  user  system elapsed
 3.803   0.271   4.075 3.868   0.296   4.163
  user  system elapsed  user  system elapsed
 4.184   0.273   4.457 4.238   0.302   4.540

So with the modification, transpose for a 60x200 or 200x60 matrix is
about a factor of two faster on the Intel system, and a factor of six
faster on the SPARC system.  There is little or no gain from the
modification when transposing a row or column vector, however.  (I
think it must be that on these machines multiplies and divides do not
take constant time, but are faster when, for instance, dividing by 1.)

I've appended below the new version of the modified part of the
do_transpose function in src/main/array.c.

 Radford Neal

--

 PROTECT(r = allocVector(TYPEOF(a), len));
 switch (TYPEOF(a)) {
 case LGLSXP:
 case INTSXP:
 for (i = 0, j = 0; i=len) j -= (len-1);
 INTEGER(r)[i] = INTEGER(a)[j];
 }
 case REALSXP:
 for (i = 0, j = 0; i=len) j -= (len-1);
 REAL(r)[i] = REAL(a)[j];
 }
 break;
 case CPLXSXP:
 for (i = 0, j = 0; i=len) j -= (len-1);
 COMPLEX(r)[i] = COMPLEX(a)[j];
 }
 break;
 case STRSXP:
 for (i = 0, j = 0; i=len) j -= (len-1);
 SET_STRING_ELT(r, i, STRING_ELT(a,j));
 }
 break;
 case VECSXP:
 for (i = 0, j = 0; i=len) j -= (len-1);
 SET_VECTOR_ELT(r, i, VECTOR_ELT(a,j));
 }
 break;
 case RAWSXP:
 for (i = 0, j = 0; i=len) j -= (len-1);
 RAW(r)[i] = RAW(a)[j];
 }
 break;
 default:
 UNPROTECT(1);
 goto not_matrix;
 }

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
htt

Re: [Rd] Speeding up matrix multiplies

2010-08-27 Thread James Cloos
I just tried out the effects of using R's NA value¹ with C arithmetic
on an amd64 linux box.

I always got a NAN result for which R's R_IsNA() would return true.

At least on this platform, NAN's propagate w/o a change in their
lower 32 bits.

If NA is supposed to propagate the way NaN is spec'ed to in IEEE754,
then on some platforms it should be OK to skip the NA/NaN checks and
let the optimised routines handle all of the matrix multiplies.

A configure test could be used to determine whether the current
platform is one where NA/NaN checks are required.

-JimC

1] given:
 union rd { double d; uint64_t l; }; union rd u;
   then:
 u.d is an R NA iff both of:
 u.l & 0x7ff0 == 0x7ff0
 u.l & 0x == 1954;
 are true.
-- 
James Cloos  OpenPGP: 1024D/ED7DAEA6

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] NEWS and readNEWS

2010-08-27 Thread Kurt Hornik
> Hadley Wickham writes:

> readNEWS() states:
>  Read R's ‘NEWS’ file or a similarly formatted one.  This is an
>  experimental feature, new in R 2.4.0 and may change in several
>  ways

> and news() also indicates that this tool is supposed to work with
> non-R news files.  However, I've not been able to get readNEWS to read
> a package news file, even when following the format indicated in
> news().  Looking at the code for readNEWS() it seems there are couple
> of places where it assumes it's working with the main R NEWS file:

>   * s.post <- " SERIES NEWS"
>   * s.pre <- "^[\t ]*CHANGES IN R VERSION "

> Is this a bug or is the documentation incorrect?

readNEWS() is really for such 3-level files, but then R itself will move
to an 2-level Rd format in R 2.12.0.

Use news() to read package news files.

-k

> Hadley

> -- 
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Is it safe not to coerce matrices with as.double() in .C()?

2010-08-27 Thread Liaw, Andy
From: Prof Brian Ripley
> 
> On Fri, 27 Aug 2010, peter dalgaard wrote:
> 
> >
> > On Aug 27, 2010, at 2:44 PM, Liaw, Andy wrote:
> >
> >> I'd very much appreciate guidance on this.  A user 
> reported that the
> >> as.double() coercion used inside the .C() call for a function in my
> >> package (specifically, randomForest:::predict.randomForest()) is
> >> taking up significant amount of time when called repeatedly, and
> >> Removing some of these reduced run time by 30-40% in some cases.
> >> These arguments are components of the fitted model (thus do not
> >> change), and are matrices.  Some basic tests show no difference in
> >> The result when the coercions are removed (other than 
> faster run time).
> >> What I like to know is whether this is safe to do, or is 
> it likely to
> >> lead
> >> to trouble in the future?
> >
> > In a word: yes. It is safe as long as you are absolutely sure that 
> > the argument has the right mode. The unsafeness comes in 
> when people 
> > might unwittingly use, say, an integer vector where a double was 
> > expected, causing memory overruns and general mayhem.
> >
> > Notice, BTW, that if you switch to .Call or .External, then 
> you have 
> > much more scope for handling such details on the C-side. E.g. you 
> > could coerce only if the object has the wrong mode, avoid 
> > duplicating things you won't be modifying anyway, etc.
> 
> But as as.double is effectively .Call it has the same freedom, and it 
> does nothing if no coercion is required.  The crunch here is 
> likely to 
> be
> 
>   ‘as.double’ attempts to coerce its argument to be of 
> double type:
>   like ‘as.vector’ it strips attributes including names.  
> (To ensure
>   that an object is of double type without stripping 
> attributes, use
>   ‘storage.mode’.)
> 
> I suspect the issue is the copying to remove attributes, in which case

I can certainly believe this.  I've tried replacing as.double() to c(), 
thinking attributes need to be stripped.  That actually increased run time very 
slightly instead of reducing it.
 
> storage.mode(x) <- "double"
> 
> should be a null op and so both fast and safe.

Will follow this advise.  Thanks to both of you for the help!

Best,
Andy

 
> -- 
> Brian D. Ripley,  rip...@stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel:  +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UKFax:  +44 1865 272595
> 
Notice:  This e-mail message, together with any attachments, contains
information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
New Jersey, USA 08889), and/or its affiliates Direct contact information
for affiliates is available at 
http://www.merck.com/contact/contacts.html) that may be confidential,
proprietary copyrighted and/or legally privileged. It is intended solely
for the use of the individual or entity named on this message. If you are
not the intended recipient, and have received this message in error,
please notify us immediately by reply e-mail and then delete it from 
your system.
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Is it safe not to coerce matrices with as.double() in .C()?

2010-08-27 Thread Prof Brian Ripley

On Fri, 27 Aug 2010, peter dalgaard wrote:



On Aug 27, 2010, at 2:44 PM, Liaw, Andy wrote:


I'd very much appreciate guidance on this.  A user reported that the
as.double() coercion used inside the .C() call for a function in my
package (specifically, randomForest:::predict.randomForest()) is
taking up significant amount of time when called repeatedly, and
Removing some of these reduced run time by 30-40% in some cases.
These arguments are components of the fitted model (thus do not
change), and are matrices.  Some basic tests show no difference in
The result when the coercions are removed (other than faster run time).
What I like to know is whether this is safe to do, or is it likely to
lead
to trouble in the future?


In a word: yes. It is safe as long as you are absolutely sure that 
the argument has the right mode. The unsafeness comes in when people 
might unwittingly use, say, an integer vector where a double was 
expected, causing memory overruns and general mayhem.


Notice, BTW, that if you switch to .Call or .External, then you have 
much more scope for handling such details on the C-side. E.g. you 
could coerce only if the object has the wrong mode, avoid 
duplicating things you won't be modifying anyway, etc.


But as as.double is effectively .Call it has the same freedom, and it 
does nothing if no coercion is required.  The crunch here is likely to 
be


 ‘as.double’ attempts to coerce its argument to be of double type:
 like ‘as.vector’ it strips attributes including names.  (To ensure
 that an object is of double type without stripping attributes, use
 ‘storage.mode’.)

I suspect the issue is the copying to remove attributes, in which case

storage.mode(x) <- "double"

should be a null op and so both fast and safe.

--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] NEWS and readNEWS

2010-08-27 Thread Hadley Wickham
readNEWS() states:

 Read R's ‘NEWS’ file or a similarly formatted one.  This is an
 experimental feature, new in R 2.4.0 and may change in several
 ways

and news() also indicates that this tool is supposed to work with
non-R news files.  However, I've not been able to get readNEWS to read
a package news file, even when following the format indicated in
news().  Looking at the code for readNEWS() it seems there are couple
of places where it assumes it's working with the main R NEWS file:

  * s.post <- " SERIES NEWS"
  * s.pre <- "^[\t ]*CHANGES IN R VERSION "

Is this a bug or is the documentation incorrect?

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Is it safe not to coerce matrices with as.double() in .C()?

2010-08-27 Thread peter dalgaard

On Aug 27, 2010, at 2:44 PM, Liaw, Andy wrote:

> I'd very much appreciate guidance on this.  A user reported that the
> as.double() coercion used inside the .C() call for a function in my 
> package (specifically, randomForest:::predict.randomForest()) is 
> taking up significant amount of time when called repeatedly, and
> Removing some of these reduced run time by 30-40% in some cases.
> These arguments are components of the fitted model (thus do not 
> change), and are matrices.  Some basic tests show no difference in
> The result when the coercions are removed (other than faster run time).
> What I like to know is whether this is safe to do, or is it likely to
> lead
> to trouble in the future?

In a word: yes. It is safe as long as you are absolutely sure that the argument 
has the right mode. The unsafeness comes in when people might unwittingly use, 
say, an integer vector where a double was expected, causing memory overruns and 
general mayhem. 

Notice, BTW, that if you switch to .Call or .External, then you have much more 
scope for handling such details on the C-side. E.g. you could coerce only if 
the object has the wrong mode, avoid duplicating things you won't be modifying 
anyway, etc. 

> 
> Best,
> Andy
> Notice:  This e-mail message, together with any attachme...{{dropped:14}}
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Is it safe not to coerce matrices with as.double() in .C()?

2010-08-27 Thread Liaw, Andy
I'd very much appreciate guidance on this.  A user reported that the
 as.double() coercion used inside the .C() call for a function in my 
package (specifically, randomForest:::predict.randomForest()) is 
taking up significant amount of time when called repeatedly, and
Removing some of these reduced run time by 30-40% in some cases.
These arguments are components of the fitted model (thus do not 
change), and are matrices.  Some basic tests show no difference in
The result when the coercions are removed (other than faster run time).
What I like to know is whether this is safe to do, or is it likely to
lead
to trouble in the future?

Best,
Andy
Notice:  This e-mail message, together with any attachme...{{dropped:14}}

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] introspective capabilities

2010-08-27 Thread Christophe Rhodes
Hi,

Is there any way, from R code, to perform introspection as to where
certain names acquired their values?  The specific functionality I'm
looking for in this case is to be able to request my editor to view the
definition corresponding to a name in its original source location
(presuming for the moment that that location exists).  Other
functionality that I'm looking for is to identify where in the current
image a particular name is used -- "what functions use the value bound
to a particular name?"

The context for these questions, in case anyone is interested, is that I
am usually a Common Lisp programmer, and my programming environment for
that (SLIME) is what I'm used to.  R is sufficiently close to CL (the
discovery of withCallingHandlers/withRestarts was a pleasant surprise)
that I decided to experiment with implementing a SLIME backend for R --
and enough of it seems to work that I'm motivated to make it more
complete.  (The current state of the project is summarised at
).

Thanks,

Christophe

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel