Re: [Rd] Why does the lexical analyzer drop comments ?

2009-03-23 Thread Romain Francois

Duncan Murdoch wrote:

On 22/03/2009 4:50 PM, Romain Francois wrote:

Romain Francois wrote:

Peter Dalgaard wrote:

Duncan Murdoch wrote:

On 3/20/2009 2:56 PM, romain.franc...@dbmail.com wrote:

It happens in the token function in gram.c:
    c = SkipSpace();
    if (c == '#') c = SkipComment();

and then SkipComment goes like that:
static int SkipComment(void)
{
    int c;
    while ((c = xxgetc()) != '\n' && c != R_EOF) ;
    if (c == R_EOF) EndOfFile = 2;
    return c;
}

which effectively drops comments.

Would it be possible to keep the information somewhere ?
The source code says this:
 *  The function yylex() scans the input, breaking it into
 *  tokens which are then passed to the parser.  The lexical
 *  analyser maintains a symbol table (in a very messy fashion).

so my question is could we use this symbol table to keep track 
of, say, COMMENT tokens.

Why would I even care about that ? I'm writing a package that will
perform syntax highlighting of R source code based on the output 
of the parser, and it seems a waste to drop the comments.
And also, when you print a function to the R console, you don't 
get the comments, and some of them might be useful to the user.


Am I mad if I contemplate looking into this ? 

Comments are syntactically the same as whitespace.  You don't want 
them to affect the parsing.

Well, you might, but there is quite some madness lying that way.

Back in the bronze age, we did actually try to keep comments 
attached to (AFAIR) the preceding token. One problem is that the 
elements of the parse tree typically involve multiple tokens, and 
if comments after different tokens get stored in the same place 
something is not going back where it came from when deparsing. So 
we had problems with comments moving from one end of a loop to the 
other and the like.

Ouch. That helps picturing the kind of madness ...

Another way could be to record comments separately (similarly to 
srcfile attribute for example) instead of dropping them entirely, 
but I guess this is the same as Duncan's idea, which is easier to 
set up.


You could try extending the scheme by encoding which part of a 
syntactic structure the comment belongs to, but consider for 
instance how many places in a function call you can stick in a 
comment.


f #here
( #here
a #here (possibly)
= #here
1 #this one belongs to the argument, though
) #but here as well

Coming back on this. I actually get two expressions:

  p <- parse( "/tmp/parsing.R")
  str( p )
length 2 expression(f, (a = 1))
 - attr(*, "srcref")=List of 2
  ..$ :Class 'srcref'  atomic [1:6] 1 1 1 1 1 1
  .. .. ..- attr(*, "srcfile")=Class 'srcfile' <environment: 0x95c3c00>
  ..$ :Class 'srcref'  atomic [1:6] 2 1 6 1 1 1
  .. .. ..- attr(*, "srcfile")=Class 'srcfile' <environment: 0x95c3c00>
 - attr(*, "srcfile")=Class 'srcfile' <environment: 0x95c3c00>

But anyway, if I drop the first comment, then I get one expression 
with some srcref information:


  p <- parse( "/tmp/parsing.R")
  str( p )
length 1 expression(f(a = 1))
 - attr(*, "srcref")=List of 1
  ..$ :Class 'srcref'  atomic [1:6] 1 1 5 1 1 1
  .. .. ..- attr(*, "srcfile")=Class 'srcfile' <environment: 0x9bca314>
 - attr(*, "srcfile")=Class 'srcfile' <environment: 0x9bca314>

but as far as I can see, there is only srcref information for that 
expression as a whole, it does not go beyond, so I am not sure I can 
implement Duncan's proposal without more detailed information from 
the parser, since I will only have the chance to check if a 
whitespace is actually a comment if it is between two expressions 
with a srcref.


Currently srcrefs are only attached to whole statements.  Since your 
source only included one or two statements, you only get one or two 
srcrefs.  It would not be hard to attach a srcref to every 
subexpression; there hasn't been a need for that before, so I didn't 
do it just for the sake of efficiency.


I understand that. I wanted to make sure I did not miss something.

However, it might make sense for you to have your own parser, based on 
the grammar in R's parser, but handling white space differently. 
Certainly it would make sense to do that before making changes to the 
base R one.  The whole source is in src/main/gram.y; if you're not 
familiar with Bison, I can give you a hand.


Thank you, I appreciate your help. Having my own parser is the option I 
am slowly converging to.
I'll start with reading bison documentation. Besides bison documents, is 
there R specific documentation on how the R parser was written ?




Duncan Murdoch



Would it be sensible then to retain the comments and their srcref 
information, but separate from the tokens used for the actual 
parsing, in some other attribute of the output of parse ?


Romain

If you're doing syntax highlighting, you can determine the 
whitespace by looking at the srcref records, and then parse that to 
determine what isn't being counted as tokens.  (I think you'll find a 
few things there besides whitespace, but it is a 

[Rd] gsub('(.).(.)(.)', '\\3\\2\\1', 'gsub') (PR#13617)

2009-03-23 Thread waku
Full_Name: Wacek Kusnierczyk
Version: 2.10.0 r48181
OS: Ubuntu 8.04 Linux 32bit
Submission from: (NULL) (129.241.199.135)


there seems to be something wrong with r's regexing.  consider the following
example:

gregexpr('a*|b', 'ab')
# positions: 1 2
# lengths: 1 1

gsub('a*|b', '.', 'ab')
# ..

where the pattern matches any number of 'a's or one b, and replaces the match
with a dot, globally.  the answer is correct (assuming a dfa engine).  however,

gregexpr('a*|b', 'ab', perl=TRUE)
# positions: 1 2
# lengths: 1 0

gsub('a*|b', '.', 'ab', perl=TRUE)
# .b.

where the pattern is identical, but the result is wrong.  perl uses an nfa (if
it used a dfa, the result would still be wrong), and in the above example it
should find *four* matches, collectively including *all* letters in the input,
thus producing *four* dots (and *only* dots) in the output:

perl -le '
   $input = qq|ab|;
   print qq|match: $_| foreach $input =~ /a*|b/g;
   $input =~ s/a*|b/./g;
   print qq|output: $input|;'
# match: a
# match: 
# match: b
# match: 
# output: ....

since with perl=TRUE both gregexpr and gsub seem to use pcre, i've checked the
example with pcretest, and also with a trivial c program (available on demand)
using the pcre api;  there were four matches, exactly as in the perl bit above.

the results above are surprising, and suggest a bug in r's use of pcre rather
than in pcre itself.  possibly, the issue is that when an empty string is matched
(with a*, for example), the next attempt is not trying to match a non-empty
string at the same position, but rather an empty string again at the next
position.  for example,

gsub('a|b|c', '.', 'abc', perl=TRUE)
# ..., correct

gsub('a*|b|c', '.', 'abc', perl=TRUE)
# .b.c., wrong

gsub('a|b*|c', '.', 'abc', perl=TRUE)
# ..c., wrong (but now only 'c' remains)

gsub('a|b*|c', '.', 'aba', perl=TRUE)
# ..., incidentally correct


without detailed analysis of the code, i guess the bug is located somewhere in
src/main/pcre.c, and is distributed among the do_p* functions, so that multiple
fixes may be needed.
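
The comparison made in this report can be rerun with a short script. The following is a minimal sketch and not part of the original submission: it pushes the same three patterns through the default engine and through perl=TRUE and tabulates the two results side by side. The data frame `res` and its column names are made up for illustration, and no expected output is shown because the perl=TRUE column depends on the R version in use.

```r
# Patterns taken from the report; the input string is 'abc' throughout.
patterns <- c("a|b|c", "a*|b|c", "a|b*|c")

# Replace every match with a dot, once per engine.
run <- function(p, perl) gsub(p, ".", "abc", perl = perl)

res <- data.frame(
  pattern = patterns,
  default = vapply(patterns, run, character(1), perl = FALSE),
  perl    = vapply(patterns, run, character(1), perl = TRUE),
  stringsAsFactors = FALSE,
  row.names = NULL
)

# Rows where the two engines disagree are the ones worth inspecting.
res$agree <- res$default == res$perl
print(res)
```

A row with agree == FALSE marks a pattern for which the two engines still differ on the installation at hand.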

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] incoherent treatment of NULL

2009-03-23 Thread Wacek Kusnierczyk
somewhat related to a previous discussion [1] on how 'names<-' would
sometimes modify its argument in place, and sometimes produce a modified
copy without changing the original, here's another example of how it
becomes visible to the user when r makes or doesn't make a copy of an
object:

x = NULL
dput(x)
# NULL
class(x) = 'integer'
# error: invalid (NULL) left side of assignment

x = c()
dput(x)
# NULL
class(x) = 'integer'
dput(x)
# integer(0)

in both cases, x ends up with the value NULL (the no-value object).  in
both cases, dput explains that x is NULL.  in both cases, an attempt is
made to make x be an empty integer vector.  the first fails, because it
tries to modify NULL itself, the latter apparently does not and succeeds.

however, the following has a different pattern:

x = NULL
dput(x)
# NULL
names(x) = character(0)
# error: attempt to set an attribute on NULL

x = c()
dput(x)
# NULL
names(x) = character(0)
# error: attempt to set an attribute on NULL

and also:

x = c()
class(x) = 'integer'
# fine
class(x) = 'foo'
# error: attempt to set an attribute on NULL

how come?  the behaviour can obviously be explained by looking at the
source code (hardly surprisingly, because it is as it is because the
source is as it is), and referring to the NAMED property (i.e., the
sxpinfo.named field of a SEXPREC struct).  but can the *design* be
justified?  can the apparent incoherences visible above the interface be
defended? 

why should the first example above be unable to produce an empty integer
vector? 

why is it possible to set a class attribute, but not a names attribute,
on c()? 

why is it possible to set the class attribute in c() to 'integer', but
not to 'foo'? 

why are there different error messages for apparently the same problem?


vQ


[1] search the rd archives for 'surprising behaviour of names<-'
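
Since the outcome of these assignments varies between R versions, it can help to run them all in one go and record what each does. This is a minimal sketch under that assumption; the helper `attempt` and the list `results` are hypothetical names, and no particular outcome is claimed for any of the four assignments.

```r
# NULL and c() are the same object, so one starting value is enough.
stopifnot(identical(NULL, c()))

# Evaluate an expression; return its value, or the error message as text.
attempt <- function(expr) {
  tryCatch(expr, error = function(e) paste("error:", conditionMessage(e)))
}

results <- list(
  class_integer = attempt({ x <- NULL; class(x) <- "integer"; x }),
  class_foo     = attempt({ x <- NULL; class(x) <- "foo"; x }),
  names_empty   = attempt({ x <- NULL; names(x) <- character(0); x }),
  attr_class    = attempt({ x <- NULL; attr(x, "class") <- "integer"; x })
)

# One entry per assignment: either the resulting object or the error text.
str(results)
```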



Re: [Rd] incoherent treatment of NULL

2009-03-23 Thread Martin Maechler
>>>>> "WK" == Wacek Kusnierczyk <waclaw.marcin.kusnierc...@idi.ntnu.no>
>>>>>     on Mon, 23 Mar 2009 09:52:19 +0100 writes:

WK somewhat related to a previous discussion [1] on how 'names<-' would
WK sometimes modify its argument in place, and sometimes produce a modified
WK copy without changing the original, here's another example of how it
WK becomes visible to the user when r makes or doesn't make a copy of an
WK object:

WK x = NULL
WK dput(x)
WK # NULL
WK class(x) = 'integer'
WK # error: invalid (NULL) left side of assignment

does not happen for me in R-2.8.1,  R-patched or newer

So you must be using your own patched version of  R ?






Re: [Rd] Why does the lexical analyzer drop comments ?

2009-03-23 Thread Duncan Murdoch

On 23/03/2009 3:10 AM, Romain Francois wrote:

Duncan Murdoch wrote:

 
However, it might make sense for you to have your own parser, based on 
the grammar in R's parser, but handling white space differently. 
Certainly it would make sense to do that before making changes to the 
base R one.  The whole source is in src/main/gram.y; if you're not 
familiar with Bison, I can give you a hand.


Thank you, I appreciate your help. Having my own parser is the option I 
am slowly converging to.
I'll start with reading bison documentation. Besides bison documents, is 
there R specific documentation on how the R parser was written ?


I don't think so.

Duncan Murdoch
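
For readers on a recent R, the parse data kept alongside source references can be inspected directly. The following is a minimal sketch, assuming an R version in which utils::getParseData() is available (it did not exist when this thread was written); the variable names are illustrative only.

```r
# A call with comments in two of the positions discussed in this thread.
code <- "f( # here\n  a = 1 # belongs to the argument\n)"

# keep.source = TRUE makes the parser retain source references.
p <- parse(text = code, keep.source = TRUE)

# One row per token; comments appear with token type COMMENT.
pd <- utils::getParseData(p)
comments <- pd[pd$token == "COMMENT", c("line1", "col1", "text")]
print(comments)
```

Each row gives the line and column where a comment starts, which is the kind of information a syntax highlighter needs.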



Re: [Rd] incoherent treatment of NULL

2009-03-23 Thread Martin Maechler
>>>>> "WK" == Wacek Kusnierczyk <waclaw.marcin.kusnierc...@idi.ntnu.no>
>>>>>     on Mon, 23 Mar 2009 10:56:37 +0100 writes:

WK Martin Maechler wrote:
 WK == Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no
 
 
WK somewhat related to a previous discussion [1] on how 'names<-' would
WK sometimes modify its argument in place, and sometimes produce a modified
WK copy without changing the original, here's another example of how it
WK becomes visible to the user when r makes or doesn't make a copy of an
WK object:
 
WK x = NULL
WK dput(x)
WK # NULL
WK class(x) = 'integer'
WK # error: invalid (NULL) left side of assignment
 
 does not happen for me in R-2.8.1,  R-patched or newer
 
 So you must be using your own patched version of  R ?
 

WK oops, i meant to use 2.8.1 or devel for testing.  you're right, in this
WK example there is no error reported in  2.8.0, but see below.

ok

 [.. omitted part no longer relevant ]

WK however, the following has a different pattern:
 
WK x = NULL
WK dput(x)
WK # NULL
WK names(x) = character(0)
WK # error: attempt to set an attribute on NULL
 

WK i get the error in devel.

Yes,  NULL is NULL is NULL !   Do read  ?NULL !   [ ;-) ]

more verbosely,  all NULL objects in R are identical, or as the
help page says, there's only ``*The* NULL Object'' in R,
i.e., NULL cannot get any attributes.

WK x = c()
WK dput(x)
WK # NULL
WK names(x) = character(0)
WK # error: attempt to set an attribute on NULL
 

WK i get the error in devel.

of course!  
   [I think *you* should have noticed that  NULL and c()  *are* identical]

WK and also:
 
WK x = c()
WK class(x) = 'integer'
WK # fine
fine yes; 
here, the convention has been to change NULL into integer(0);
and no, this won't change, if you find it inconsistent.


WK class(x) = 'foo'
WK # error: attempt to set an attribute on NULL
 

WK i get the error in devel.

No, not if you evaluate the statements above (where 'x' has
become  'integer(0)' in the mean time).

But yes, you get it in something like

x <- c();  class(x) <- "foo"

and I do agree that there's a buglet : 
The error message should be slightly more precise,
--- improvement proposals are welcome ---
but an error nonetheless

WK it doesn't seem coherent to me:  why can i set the class, 

you cannot set it, you can *change* it.

WK but not names
WK attribute on both NULL and c()?  why can i set the class attribute to
WK 'integer', but not to 'foo', as i could on a non-empty vector:

WK x = 1
WK class(x) = 'foo'
WK # just fine

mainly because 'NULL is NULL is NULL' 
(NULL cannot have attributes)

WK i'd naively expect to be able to create an empty vector classed 'foo',

yes, but that expectation is wrong

WK displayed perhaps as

WK # speculation
WK x = NULL
WK class(x) = 'foo'
WK x
WK # foo(0)

WK or maybe as

WK x
WK # NULL
WK # attr(,"class")
WK # [1] "foo"

WK vQ



Re: [Rd] incoherent treatment of NULL

2009-03-23 Thread Martin Maechler
>>>>> "WK" == Wacek Kusnierczyk <waclaw.marcin.kusnierc...@idi.ntnu.no>
>>>>>     on Mon, 23 Mar 2009 16:11:04 +0100 writes:

WK Martin Maechler wrote:
 
 [.. omitted part no longer relevant ]
 
WK however, the following has a different pattern:
  
WK x = NULL
WK dput(x)
WK # NULL
WK names(x) = character(0)
WK # error: attempt to set an attribute on NULL
  
 
WK i get the error in devel.
 
 Yes,  NULL is NULL is NULL !   Do read  ?NULL !   [ ;-) ]
 
 more verbosely,  all NULL objects in R are identical, or as the
 help page says, there's only ``*The* NULL Object'' in R,
 i.e., NULL cannot get any attributes.
 

WK yes, but that's not the issue.  the issue is that names(x)<- seems to
WK try to attach an attribute to NULL, while it could, in principle, do the
WK same as class(x)<-, i.e., coerce x to some type (and hence attach the
WK name attribute not to NULL, but to the coerced-to object).

yes, it could;  but really, the  fact that  'class<-' works is
the exception.  The other variants (with the error message) are
the rule.

WK but, as someone else explained to me behind the scenes, the matters are
WK a little bit, so to speak, untidy:

WK x = NULL
WK class(x) = 'integer'
WK # just fine

WK x = NULL
WK attr(x, 'class') = 'integer'
WK # no go

WK where class()<-, but not attr(,'class')<-, will try to coerce x to an
WK object of the storage *mode* 'integer', hence the former succeeds
WK (because it sets, roughly, the 'integer' class on an empty integer
WK vector), while the latter fails (because it tries to set the 'integer'
WK class on NULL itself).

WK what was not clear to me is not why setting a class on NULL fails here,
WK but why it is setting on NULL in the first place.  after all,

WK x = 1
WK names(x) = 'foo'

WK is setting names on a *copy* of 1, not on *the* 1, so why could not
WK class()<- create a 'copy' of NULL, i.e., an empty vector of some type
WK (perhaps raw, as the lowest in the hierarchy).

yes, it could.  I personally don't think this would add any
value to R's behavior;  rather, for most useRs I'd think it
rather helps to get an error in such a case, than a  raw(0)
object.

Also, note (here and further below),
that using   class(.) <- "className"
is an S3 idiom   and S3 classes  ``don't really exist'', 
the class attribute being a useful hack,
and many of us would rather like to work and improve working
with S4 classes (and generics and methods) than to fiddle with  'class<-'.

In S4, you'd  use  setClass(.), new(.) and  setAs(.),
typically, for defining and changing classes of objects.
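
The S4 route named here can be sketched in a few lines. This is only an illustration under stated assumptions: the class name "Foo" and its slot "values" are hypothetical, chosen to mirror the empty 'foo' vector discussed in this thread, and the coercion method is one of several reasonable ways to define it.

```r
library(methods)

# A formal class wrapping an integer vector (hypothetical name and slot).
setClass("Foo", representation(values = "integer"))

# An empty instance is created explicitly with new(), not by tagging NULL.
empty <- new("Foo", values = integer(0))

# setAs() defines how as(<integer>, "Foo") changes the class of an object.
setAs("integer", "Foo", function(from) new("Foo", values = from))
obj <- as(1:3, "Foo")

print(is(obj, "Foo"))
print(length(empty@values))
```

In this design the empty object is a genuine instance of the class, so no attribute ever has to be attached to NULL.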

But maybe I have now led you into a direction I will later
regret, 

when you start telling us about the perceived inconsistencies of
S4 classes, methods, etc.
BTW: If you go there, please do use  R 2.9.0 (or newer)
 exclusively.

WK x = c()
WK dput(x)
WK # NULL
WK names(x) = character(0)
WK # error: attempt to set an attribute on NULL
  
 
WK i get the error in devel.
 
 of course!  
 [I think *you* should have noticed that  NULL and c()  *are* identical]
 
WK and also:
  
WK x = c()
WK class(x) = 'integer'
WK # fine
 fine yes; 
 here, the convention has been to change NULL into integer(0);
 and no, this won't change, if you find it inconsistent.
 

WK that's ok, this is what i'd expect in the other cases, too (modulo the
WK actual storage mode).


 
WK class(x) = 'foo'
WK # error: attempt to set an attribute on NULL
  
 
WK i get the error in devel.
 
 No, not if you evaluate the statements above (where 'x' has
 become  'integer(0)' in the mean time).
 
 But yes, you get it in something like
 
 x <- c();  class(x) <- "foo"
 

WK that's what i meant, must have forgotten the x = c().

 and I do agree that there's a buglet : 
 The error message should be slightly more precise,
 --- improvement proposals are welcome ---
 but an error nonetheless
 
WK it doesn't seem coherent to me:  why can i set the class, 
 
 you cannot set it, you can *change* it.
 

WK terminological wars? 

WK btw. the class of NULL is NULL;  why can't nullify an object by
WK setting its class to 'NULL'?

WK x = 1
WK class(x) = 'NULL'
WK x
WK # *not* NULL

see above {S4 / S3 / ...}; 
If you want to  nullify, rather use
more (S-language) idiomatic calls like

as(x, "NULL")
or  
as.null(x)

both of which do work.

Regards,
Martin


WK and one more interesting example:

WK x = 1:2
WK class(x) = 'NULL'
WK x
WK # [1] 1 2
WK # attr(,"class") "NULL"
WK x[1]
WK # 1
WK x[2]
WK # 2
WK is.vector(x)
WK # FALSE

WK hurray!!! apparently, i've alchemized a non-vector vector...  (you can
WK do it in r-devel, for that 

Re: [Rd] incoherent treatment of NULL

2009-03-23 Thread Wacek Kusnierczyk
Martin Maechler wrote:

  more verbosely,  all NULL objects in R are identical, or as the
  help page says, there's only ``*The* NULL Object'' in R,
  i.e., NULL cannot get any attributes.
  

 WK yes, but that's not the issue.  the issue is that names(x)<- seems to
 WK try to attach an attribute to NULL, while it could, in principle, do the
 WK same as class(x)<-, i.e., coerce x to some type (and hence attach the
 WK name attribute not to NULL, but to the coerced-to object).

 yes, it could;  but really, the  fact that  'class<-' works is
 the exception.  The other variants (with the error message) are
 the rule.
   

ok.

 Also, note (here and further below),
 that using   class(.) <- "className"
 is an S3 idiom   and S3 classes  ``don't really exist'', 
 the class attribute being a useful hack,
 and many of us would rather like to work and improve working
 with S4 classes (and generics and methods) than to fiddle with  'class<-'.

 In S4, you'd  use  setClass(.), new(.) and  setAs(.),
 typically, for defining and changing classes of objects.

 But maybe I have now led you into a direction I will later
 regret, 
 
 when you start telling us about the perceived inconsistencies of
 S4 classes, methods, etc.
 BTW: If you go there, please do use  R 2.9.0 (or newer)
   

using latest r-devel for the most part.

i think you will probably not regret your words;  from what i've seen
already, s4 classes are the last thing i'd ever try to learn in r.  but
yes, there would certainly be lots of issues to complain about.  i'll
rather wait for s5.

regards,
vQ



[Rd] Error in Package Description (PR#13618)

2009-03-23 Thread bonner . reed
In the Installer for R.8.1 for Mac OSX Tiger or higher, the  
description of the GNU Fortran package in the customize option writes  
Fortran as Fotran.  Just a minor error, but should be fixed if  
revisited.

-Bonner Reed
Yale Univ.



[Rd] matplot and lend=butt

2009-03-23 Thread Christophe Genolini

Hi the list,

I am using matplot with the option lend="butt", but only the first line 
(the black) is printed correctly  :


 matplot(matrix(1:9,3),type="c",lwd=10,lty=1,lend="butt")

Is it a bug ?
I am using R2.8.1 under windows XP pack3.

Christophe



Re: [Rd] matplot and lend=butt

2009-03-23 Thread Gabor Grothendieck
It looks to be a bug.  Here is the code and notice that ... is passed to
plot (which plots the first series) but not to lines (which plots the rest):

if (!add) {
    ii <- ii[-1]
    plot(x[, 1], y[, 1], type = type[1], xlab = xlab, ylab = ylab,
        xlim = xlim, ylim = ylim, lty = lty[1], lwd = lwd[1],
        pch = pch[1], col = col[1], cex = cex[1], bg = bg[1],
        ...)
}
for (i in ii) {
    lines(x[, i], y[, i], type = type[i], lty = lty[i], lwd = lwd[i],
        pch = pch[i], col = col[i], cex = cex[i], bg = bg[i])
}

This is from 2.8.1 patched but I noticed the same thing in
R version 2.9.0 Under development (unstable) (2009-03-02 r48041)
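
The difference comes down to whether ... is forwarded to both drawing calls. The following is a minimal sketch of that idea, not the matplot source and not a proposed patch: `plot_columns` is a hypothetical helper that draws each column of a matrix as a line and hands ... to plot() and to lines() alike. It assumes the caller does not pass col through ..., since col is set per series inside the loop.

```r
# Draw every column of y as a line, forwarding '...' to both calls.
plot_columns <- function(y, ...) {
  x <- seq_len(nrow(y))
  # First series: sets up the plot region.
  plot(x, y[, 1], type = "l", ylim = range(y), ...)
  # Remaining series: receive the same graphical parameters.
  for (i in seq_len(ncol(y))[-1]) {
    lines(x, y[, i], col = i, ...)
  }
  invisible(ncol(y))
}

pdf(NULL)  # null device, so nothing is written to disk
n <- plot_columns(matrix(1:9, 3), lwd = 10, lend = "butt")
invisible(dev.off())
```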


On Mon, Mar 23, 2009 at 6:25 PM, Christophe Genolini
cgeno...@u-paris10.fr wrote:
 Hi the list,

 I am using matplot with the option lend="butt", but only the first line (the
 black) is printed correctly  :

 matplot(matrix(1:9,3),type="c",lwd=10,lty=1,lend="butt")

 Is it a bug ?
 I am using R2.8.1 under windows XP pack3.

 Christophe



[Rd] matplot does not considere the parametre lend (PR#13619)

2009-03-23 Thread cgenolin
Full_Name: Christophe Genolini
Version: 2.8.1, but also 2.9
OS: Windows XP
Submission from: (NULL) (82.225.59.146)


I am using matplot with the option lend="butt", but only the first line (the
black) is printed correctly  :

 matplot(matrix(1:9,3),type="c",lwd=10,lty=1,lend="butt")

Gabor Grothendieck found the problem in the matplot code:
the ... is passed to plot (which plots the first series) but not to lines (which
plots the rest):

if (!add) {
    ii <- ii[-1]
    plot(x[, 1], y[, 1], type = type[1], xlab = xlab, ylab = ylab,
        xlim = xlim, ylim = ylim, lty = lty[1], lwd = lwd[1],
        pch = pch[1], col = col[1], cex = cex[1], bg = bg[1],
        ...)
}
for (i in ii) {
    lines(x[, i], y[, i], type = type[i], lty = lty[i], lwd = lwd[i],
        pch = pch[i], col = col[i], cex = cex[i], bg = bg[i])
}



[Rd] savePlot export strange eps (PR#13620)

2009-03-23 Thread cgenolin
Full_Name: Christophe Genolini
Version: 2.8.1
OS: Windows XP
Submission from: (NULL) (82.225.59.146)


savePlot exports EPS graphs that seem to be incorrect. 

Trying to incorporate them in a LaTeX file, I get : 
++
Cannot determine size of graphics in foo.eps (no BoundingBox)
--

Trying to open them with GSview, I get :
++
GSview 4.9 2007-11-18
AFPL Ghostscript 8.54 (2006-05-17)
Copyright (C) 2005 artofcode LLC, Benicia, CA.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Displaying non DSC file C:/Documents and Settings/Christophe/Mes
documents/Recherche/Trajectoires/kmeal/trajectories/testsDev/toti.eps
Error: /undefined in 
Operand stack:

Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--  
--nostringval--   2   %stopped_push   --nostringval--   --nostringval--   false 
 1   %stopped_push   1   3   %oparray_pop   1   3   %oparray_pop   1   3  
%oparray_pop   1   3   %oparray_pop   .runexec2   --nostringval--  
--nostringval--   --nostringval--   2   %stopped_push   --nostringval--
Dictionary stack:
   --dict:1130/1686(ro)(G)--   --dict:0/20(G)--   --dict:74/200(L)--
Current allocation mode is local
Last OS error: No such file or directory

--- Begin offending input ---
[binary data omitted; readable fragments: "EMF", "G r a p h A p p"]
--- End offending input ---
file offset = 1024
gsapi_run_string_continue returns -101



Re: [Rd] matplot does not considere the parametre lend (PR#13619)

2009-03-23 Thread Duncan Murdoch

On 23/03/2009 7:25 PM, cgeno...@u-paris10.fr wrote:

Full_Name: Christophe Genolini
Version: 2.8.1, but also 2.9
OS: Windows XP
Submission from: (NULL) (82.225.59.146)


I am using matplot with the option lend="butt", but only the first line (the
black) is printed correctly  :


matplot(matrix(1:9,3),type="c",lwd=10,lty=1,lend="butt")


I'd call this another case where it is performing as documented, but 
should probably be changed (but not by me).  In the meantime, there's 
the simple workaround:


save <- par(lend="butt")
matplot(matrix(1:9,3),type="c",lwd=10,lty=1)
par(save)

Duncan Murdoch



Gabor Grothendieck found the problem in the matplot code:
the ... is passed to plot (which plots the first series) but not to lines (which
plots the rest):

if (!add) {
    ii <- ii[-1]
    plot(x[, 1], y[, 1], type = type[1], xlab = xlab, ylab = ylab,
        xlim = xlim, ylim = ylim, lty = lty[1], lwd = lwd[1],
        pch = pch[1], col = col[1], cex = cex[1], bg = bg[1],
        ...)
}
for (i in ii) {
    lines(x[, i], y[, i], type = type[i], lty = lty[i], lwd = lwd[i],
        pch = pch[i], col = col[i], cex = cex[i], bg = bg[i])
}



Re: [Rd] dput(as.list(function...)...) bug

2009-03-23 Thread Wacek Kusnierczyk
Stavros Macrakis wrote:
 Tested in R 2.8.1 Windows

   
 ff <- formals(function(x)1)
 ff1 <- as.list(function(x)1)[1]
 
 # ff1 acts the same as ff in the examples below, but is a list rather
 than a pairlist

   
 dput( ff , control=c("warnIncomplete"))
 
 list(x = )

 This string is not parsable, but dput does not give a warning as specified.

   

same in 2.10.0 r48200, ubuntu 8.04 linux 32 bit


 dput( ff , control=c("all","warnIncomplete"))
 
 list(x = quote())
   

likewise.

 This string is parseable, but quote() is not evaluable, and again dput
 does not give a warning as specified.

 In fact, I don't know how to write out ff$x.  It appears to be the
 zero-length name:

 is.name(ff$x) => TRUE
 as.character(ff$x) => ""

 but there is no obvious way to create such an object:

 as.name("") => execution error
 quote(``) => parse error

 The above examples should either produce a parseable and evaluable
 output (preferable), or give a warning.
   

interestingly,

quote(NULL)
# NULL

as.name(NULL)
# Error in as.name(NULL) :
#  invalid type/length (symbol/0) in vector allocation

yuck.

vQ

 -s

 PS As a matter of comparative linguistics, many versions of Lisp allow
 zero-length symbols/names.  But R coerces strings to symbols/names in
 a way that Lisp does not, so that might be an invitation to obscure
 bugs in R where it is rarely problematic in Lisp.

 PPS dput(pairlist(23),control="all") also gives the same output as
 dput(list(23),control="all"), but as I understand it, pairlists will
 become non-user-visible at some point.

   


-- 
---
Wacek Kusnierczyk, MD PhD

Email: w...@idi.ntnu.no
Phone: +47 73591875, +47 72574609

Department of Computer and Information Science (IDI)
Faculty of Information Technology, Mathematics and Electrical Engineering (IME)
Norwegian University of Science and Technology (NTNU)
Sem Saelands vei 7, 7491 Trondheim, Norway
Room itv303

Bioinformatics & Gene Regulation Group
Department of Cancer Research and Molecular Medicine (IKM)
Faculty of Medicine (DMF)
Norwegian University of Science and Technology (NTNU)
Laboratory Center, Erling Skjalgsons gt. 1, 7030 Trondheim, Norway
Room 231.05.060



[Rd] dput(as.list(function...)...) bug

2009-03-23 Thread Stavros Macrakis
Tested in R 2.8.1 Windows

 ff <- formals(function(x)1)
 ff1 <- as.list(function(x)1)[1]
# ff1 acts the same as ff in the examples below, but is a list rather
than a pairlist

 dput( ff , control=c("warnIncomplete"))
list(x = )

This string is not parsable, but dput does not give a warning as specified.

 dput( ff , control=c("all","warnIncomplete"))
list(x = quote())

This string is parseable, but quote() is not evaluable, and again dput
does not give a warning as specified.

In fact, I don't know how to write out ff$x.  It appears to be the
zero-length name:

is.name(ff$x) => TRUE
as.character(ff$x) => ""

but there is no obvious way to create such an object:

as.name("") => execution error
quote(``) => parse error

The above examples should either produce a parseable and evaluable
output (preferable), or give a warning.

-s

PS As a matter of comparative linguistics, many versions of Lisp allow
zero-length symbols/names.  But R coerces strings to symbols/names in
a way that Lisp does not, so that might be an invitation to obscure
bugs in R where it is rarely problematic in Lisp.

PPS dput(pairlist(23),control="all") also gives the same output as
dput(list(23),control="all"), but as I understand it, pairlists will
become non-user-visible at some point.



Re: [Rd] [R] variance/mean

2009-03-23 Thread Wacek Kusnierczyk

(this post suggests a patch to the sources, so i allow myself to divert
it to r-devel)

Bert Gunter wrote:
> x a numeric vector, matrix or data frame.
> y NULL (default) or a vector, matrix or data frame with compatible
> dimensions to x. The default is equivalent to y = x (but more efficient).

   
bert points to an interesting fragment of ?var:  it suggests that
computing var(x) is more efficient than computing var(x,x), for any x
valid as input to var.  indeed:

set.seed(0)
x = matrix(rnorm(10000), 100, 100)

library(rbenchmark)
benchmark(replications=1000, columns=c('test', 'elapsed'),
   var(x),
   var(x, x))
#        test elapsed
# 1    var(x)   1.091
# 2 var(x, x)   2.051

that's of course, so to speak, unreasonable:  for what var(x) does is
actually computing the covariance of x and x, which should be the same
as var(x,x). 
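
A minimal sketch of the same comparison using only base R, assuming `system.time` is an acceptable stand-in for rbenchmark. The names `same`, `reps`, `t_one` and `t_two` are illustrative and not part of the original post; the timings depend on the machine and the R version, so no expected timings are shown.

```r
# Compare var(x) with var(x, x) using base R only (no rbenchmark needed).
set.seed(0)
x <- matrix(rnorm(10000), 100, 100)

# Both calls should return the same covariance matrix (up to rounding).
same <- isTRUE(all.equal(var(x), var(x, x)))

# Rough timings; 'reps' is an arbitrary, illustrative repetition count.
reps <- 100
t_one <- system.time(for (i in seq_len(reps)) var(x))[["elapsed"]]
t_two <- system.time(for (i in seq_len(reps)) var(x, x))[["elapsed"]]

cat("equal up to tolerance:", same, "\n")
cat("var(x):    ", t_one, "s\n")
cat("var(x, x): ", t_two, "s\n")
```

Whether the second timing is noticeably larger than the first depends on how the installed R handles the two-argument case.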

the catch is that when y is given, there's an overhead of memory allocation
for *both* x and y, as seen in src/main/cov.c:720+.
incidentally, it seems that the problem can be solved with a trivial fix
(see the attached patch), so that

set.seed(0)
x = matrix(rnorm(10000), 100, 100)

library(rbenchmark)
benchmark(replications=1000, columns=c('test', 'elapsed'),
   var(x),
   var(x, x))
#        test elapsed
# 1    var(x)   1.121
# 2 var(x, x)   1.107

with the quick checks

all.equal(var(x), var(x, x))
# TRUE
   
all(var(x) == var(x, x))
# TRUE

and for cor it seems to make cor(x,x) slightly faster than cor(x), while
originally it was about twice as slow:

# original
benchmark(replications=1000, columns=c('test', 'elapsed'),
   cor(x),
   cor(x, x))
#        test elapsed
# 1    cor(x)   1.196
# 2 cor(x, x)   2.253

# patched
benchmark(replications=1000, columns=c('test', 'elapsed'),
   cor(x),
   cor(x, x))
#        test elapsed
# 1    cor(x)   1.207
# 2 cor(x, x)   1.204

(there is a visible penalty due to an additional pointer test, but it's
10ms on 1000 replications with 10000 data points, which i think is
negligible.)

> This is as clear as I would know how to state.

i believe bert is right.

however, with the above fix, this can now be rewritten as:


x: a numeric vector, matrix or data frame. 
y: a vector, matrix or data frame with dimensions compatible to those of x. 
By default, y = x. 


which, to my simple mind, is even more clear than what bert would know
how to state, and less likely to cause the sort of confusion that
originated this thread.

the attached patch suggests modifications to src/main/cov.c and
src/library/stats/man/cor.Rd.
it has been prepared and checked as follows:

svn co https://svn.r-project.org/R/trunk trunk
cd trunk
# edited the sources
svn diff > cov.diff
svn revert -R src
patch -p0 < cov.diff

tools/rsync-recommended
./configure
make
make check
bin/R
# subsequent testing within R

if you happen to consider this patch for a commit, please be sure to
examine and test it carefully first.

vQ
Index: src/library/stats/man/cor.Rd
===
--- src/library/stats/man/cor.Rd	(revision 48200)
+++ src/library/stats/man/cor.Rd	(working copy)
@@ -6,9 +6,9 @@
 \name{cor}
 \title{Correlation, Variance and Covariance (Matrices)}
 \usage{
-var(x, y = NULL, na.rm = FALSE, use)
+var(x, y = x, na.rm = FALSE, use)
 
-cov(x, y = NULL, use = "everything",
+cov(x, y = x, use = "everything",
     method = c("pearson", "kendall", "spearman"))
 
 cor(x, y = NULL, use = "everything",
@@ -32,9 +32,7 @@
 }
 \arguments{
   \item{x}{a numeric vector, matrix or data frame.}
-  \item{y}{\code{NULL} (default) or a vector, matrix or data frame with
-compatible dimensions to \code{x}.   The default is equivalent to
-\code{y = x} (but more efficient).}
+  \item{y}{a vector, matrix or data frame with dimensions compatible to those of \code{x}. By default, y = x.}
   \item{na.rm}{logical. Should missing values be removed?}
   \item{use}{an optional character string giving a
 method for computing covariances in the presence
Index: src/main/cov.c
===
--- src/main/cov.c	(revision 48200)
+++ src/main/cov.c	(working copy)
@@ -689,7 +689,7 @@
 if (ansmat) PROTECT(ans = allocMatrix(REALSXP, ncx, ncy));
 else PROTECT(ans = allocVector(REALSXP, ncx * ncy));
 sd_0 = FALSE;
-if (isNull(y)) {
+if (isNull(y) || (DATAPTR(x) == DATAPTR(y))) {
 	if (everything) { /* NA's are propagated */
 	PROTECT(xm = allocVector(REALSXP, ncx));
 	PROTECT(ind = allocVector(LGLSXP, ncx));


Re: [Rd] dput(as.list(function...)...) bug

2009-03-23 Thread Duncan Murdoch

On 23/03/2009 7:37 PM, Stavros Macrakis wrote:

> Tested in R 2.8.1 Windows
>
> > ff <- formals(function(x)1)
> > ff1 <- as.list(function(x)1)[1]
> # ff1 acts the same as ff in the examples below, but is a list rather
> # than a pairlist
>
> > dput( ff , control=c("warnIncomplete"))
> list(x = )
>
> This string is not parsable, but dput does not give a warning as specified.


That's not what warnIncomplete is documented to do.  The docs (in 
?.deparseOpts) say


 'warnIncomplete' Some exotic objects such as environments,
  external pointers, etc. can not be deparsed properly.  This
  option causes a warning to be issued if any of those may give
  problems.

  Also, the parser in R < 2.7.0 would only accept strings of up
  to 8192 bytes, and this option gives a warning for longer
  strings.

As far as I can see, none of those conditions apply here:  ff is not one 
of those exotic objects or a very long string.  The really relevant 
comment is in the dput documentation:


Deparsing an object is difficult, and not always possible.

Yes, it would be nice if deparsing and parsing were mutual inverses, but 
they're not, and are documented not to be.
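
As a minimal sketch of that point, the following compares a value that survives the deparse/parse round trip with one of the "exotic objects" named in the documentation quoted above. The variable names (`v`, `v_back`, `vector_ok`, `env_text`, `env_ok`) are illustrative only, and the environment case is an assumption about what a typical failure looks like, not the case from the original report.

```r
# A plain numeric vector survives deparse -> parse -> eval unchanged.
v <- c(1, 2, 3)
v_back <- eval(parse(text = deparse(v)))
vector_ok <- identical(v, v_back)

# An environment deparses to the placeholder "<environment>", which is
# not valid R source, so parsing that text fails.
e <- new.env()
env_text <- deparse(e)
env_ok <- tryCatch({ parse(text = env_text); TRUE },
                   error = function(err) FALSE)

cat("vector round trip ok:     ", vector_ok, "\n")
cat("environment deparses to:  ", env_text, "\n")
cat("environment text parses:  ", env_ok, "\n")
```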




> > dput( ff , control=c("all","warnIncomplete"))
> list(x = quote())
>
> This string is parseable, but quote() is not evaluable, and again dput
> does not give a warning as specified.
>
> In fact, I don't know how to write out ff$x.


I don't know of any input that will parse to it.


> It appears to be the zero-length name:
>
> is.name(ff$x) = TRUE
> as.character(ff$x) = ""


This may give you a hint:

 > y <- ff$x
 > y
Error: argument "y" is missing, with no default

It's a special internal thing that triggers the missing value error when 
evaluated.  It probably shouldn't be user visible at all.
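
The behaviour described in this exchange can be reproduced with a short sketch. The helper names (`is_symbol`, `print_name`, `outcome`) are illustrative, and the exact wording of the error message may differ between R versions and locales, so the code only records whether an error occurred.

```r
# Inspect the default value stored for a formal argument without a default.
ff <- formals(function(x) 1)

is_symbol  <- is.name(ff$x)        # the default is stored as a symbol
print_name <- as.character(ff$x)   # its name is the zero-length string

# Assigning the value is allowed, but evaluating the variable afterwards
# raises the "argument is missing" error shown above.
y <- ff$x
outcome <- tryCatch({ y; "evaluated" },
                    error = function(err) "error on evaluation")

cat("is a symbol:", is_symbol, "\n")
cat("name length:", nchar(print_name), "\n")
cat("outcome:    ", outcome, "\n")
```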


Duncan Murdoch





Re: [Rd] dput(as.list(function...)...) bug

2009-03-23 Thread William Dunlap
> -----Original Message-----
> From: r-devel-boun...@r-project.org
> [mailto:r-devel-boun...@r-project.org] On Behalf Of Duncan Murdoch
> Sent: Monday, March 23, 2009 5:28 PM
> To: Stavros Macrakis
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] dput(as.list(function...)...) bug
>
> On 23/03/2009 7:37 PM, Stavros Macrakis wrote:
> > Tested in R 2.8.1 Windows
> >
> > > ff <- formals(function(x)1)
> > > ff1 <- as.list(function(x)1)[1]
> > # ff1 acts the same as ff in the examples below, but is a list rather
> > # than a pairlist
> >
> > > dput( ff , control=c("warnIncomplete"))
> > list(x = )
> >
> > This string is not parsable, but dput does not give a warning as specified.

The string list(x = ) is parsable:
  > z <- parse(text="list(x = )")
Evaluating the resulting expression results in a run-time error:
  > eval(z)
  Error in eval(expr, envir, enclos) :
    element 1 is empty;
     the part of the args list of 'list' being evaluated was:
     (x = )
That is the same sort of error you get from running list(,):
list wants all of its arguments to be present.
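
A minimal sketch of the parse-versus-evaluate distinction drawn above. The names `parsed_ok` and `result` are illustrative, and the error text is deliberately not reproduced because its wording varies between R versions.

```r
# The text parses without complaint...
z <- parse(text = "list(x = )")
parsed_ok <- is.expression(z) && length(z) == 1

# ...but evaluating the parsed call fails, because list() requires every
# argument to be present.
result <- tryCatch(eval(z[[1]]), error = function(err) "eval failed")

cat("parsed ok:", parsed_ok, "\n")
cat("result:   ", result, "\n")
```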

With other functions such a construct will run in R, although its result
does not match that of S+ (or SV4):

   > f<-function(x,y,z)c(x=if(missing(x))"missing" else x,
                         y=if(missing(y))"missing" else y,
                         z=if(missing(z))"missing" else z)
   R> f(x=,2,3)
           x         y         z
         "2"       "3" "missing"
   S+> f(x=,2,3)
           x         y         z
   "missing"       "2"       "3"
or
   R> f(y=,1,3)
           x         y         z
         "1"       "3" "missing"
   S+> f(y=,1,3)
           x         y         z
         "1" "missing"       "3"

R and S+ act the same if you skip an argument by position
   > f(1,,3)
           x         y         z
         "1" "missing"       "3"
but differ if you use name=nothing: in S+ it skips an argument by name
and in R it is ignored by ordinary functions (where
typeof(func)=="closure").

I wouldn't say this is recommended or often used or the point
of the original post.
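
For reference, the positional case described above can be written out as a self-contained sketch. It covers only the R side and only skipping by position; the name `res` is illustrative, and the S+ behaviour is not reproduced here.

```r
# Report, for each formal argument, either its value or the word "missing".
f <- function(x, y, z) {
  c(x = if (missing(x)) "missing" else x,
    y = if (missing(y)) "missing" else y,
    z = if (missing(z)) "missing" else z)
}

# Skip the second argument by position: y is reported as missing.
res <- f(1, , 3)
print(res)
```

Because c() combines numbers with the string "missing", the result is a character vector, so 1 and 3 are printed as "1" and "3".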
 
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com  

 


Re: [Rd] [R] variance/mean

2009-03-23 Thread William Dunlap
Doesn't Fortran still require that the arguments to
a function not alias each other (in whole or in part)?
I could imagine that var() might call into Fortran code
(BLAS or LAPACK).  Would you want to chance erroneous
results at a high optimization level to save a bit of
time in an unusual situation?

(I could also imagine someone changing the R interpreter
so that x and x[-length(x)] could share the same memory
block and that could cause Fortran aliasing problems as
well.)

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com  



Re: [Rd] [R] variance/mean

2009-03-23 Thread William Dunlap
Oops, I was thinking backwards.  This sort of
hack could avoid the Fortran aliasing rules, not
run afoul of them.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com  
