On 7/16/2009 2:34 PM, Hervé Pagès wrote:
Duncan Murdoch wrote:
On 15/07/2009 10:15 PM, Hervé Pagès wrote:
I have to confess that I'm a little bit puzzled by how the
PROTECT/UNPROTECT mechanism is used in the C code of R.
Duncan, you say the problem you just fixed was an easy one.
I looked at the C code too and was able to recognize a pattern
that is indeed easy to identify as problematic:
an unprotected call to allocVector() followed by a call
that can trigger garbage collection (in that case another
call to allocVector())
It only took me 1 minute to find another occurrence of this pattern.
It's in the do_grep() function (src/main/character.c, line 1168):
> gctorture(TRUE)
> grep("b", c(A="aa", B="aba"), value=TRUE)
B
"B"
Given that the overhead of PROTECTing the SEXP returned by
allocVector() can really be considered 0 (or almost), I'm
wondering why this is not done in a more systematic way.
This is an explanation, not a justification:
If you look at the history of that file, you'll see a hint: line 1168
was written in 1998, the other lines were written later, by other
people. It is simply a matter of someone thinking something was safe
when it wasn't, and it's not clear who was wrong: it may have been safe
when written, but susceptible to later changes.
Even when nothing between PROTECT(allocVector()) and the
corresponding UNPROTECT could trigger garbage collection
(e.g. PROTECT(allocVector()) is close to the return statement).
Because making exceptions like this can make your code
really hard to maintain in the long term.
There are a lot of people who object to anything that slows R at all.
That puts pressure on anyone writing code to do it in a way that wastes
as few cycles as possible. That in turn makes it harder for someone
else to analyze the code. And overuse of PROTECT also makes the code
harder to read.
Most of the calls to allocVector() are currently protected. There is a
very small percentage of calls to allocVector() that are not. Most of
the times because people apparently decided that, at the time they wrote
the code, it didn't seem necessary. It doesn't matter if they were wrong
or write. My point is that this game is not worth it.
I bet if you protected all the calls to allocVector() you wouldn't notice
any slow down in R. What is guaranteed though is that you end up with code
that sooner or later will break because of some changes that are made to
the function itself or to another function called by your function (because
this other function is now calling gc and you were assuming that it wouldn't
do that).
And this kind of breakage is one of the worst kinds: if you are lucky,
you get a segfault, but if you are not, you don't notice anything and
get the wrong answer, like in the length<-() and grep() examples (and
you can safely assume that there are many other places like this in R).
I guess 99.999% of R users would happily trade a 0.001% slow down for
a correct result.
First make it right, then make it fast. And sorry, but you're not going
to make it fast by saving a few calls to PROTECT() here and here.
As I said, I gave you an explanation, not a justification. I generally
agree with you, but not everyone does. For example, after posting the
first patch I received a private email suggesting that the following
PROTECT on xnames could be removed. I didn't remove it, because I think
it is mostly harmless, and it's not worth my time to analyze whether any
particular PROTECT is unneeded.
I generally agree with you, but I don't totally agree with you. The
protection stack is not infinite, so any time you add a PROTECT you have
to be sure it will be removed in a relatively short time. You can't
have something like
for (i=0; i < n; i++) { PROTECT( ans <- allocVector(...) ) ; ... }
UNPROTECT(n);
because you are likely to blow the stack when n is large. You need the
UNPROTECTs within the loop, but still after ans stops being vulnerable
to garbage collection. It's hard to place them automatically. And
every extra function/macro call adds to the obscurity of the code, so
it's harder to read it and know whether it really does what you wanted
it to. My inclination is to over-PROTECT things, but not to PROTECT
everything.
Duncan Murdoch
Cheers,
H.
As an example: just below line 1168 there's another unprotected
allocVector of nm, but I think that one is safe, because it is attached
as an attribute to ans (which is now PROTECT'd) before anything is done
that could trigger gc. And a few lines below that, on another branch of
the if, another unprotected but safe-looking allocation. Should I
protect those? Then I'd also need to call UNPROTECT again, or keep a
counter of PROTECT calls, and the code would be a little harder to read.
Thanks for tracking down these two bugs; I'll fix the grep bug too. If
you feel like looking for more, it would be appreciated. (Writing an
automatic tool to analyze code and determine where new ones are needed
and where existing ones could be eliminated might be a fun project, but
there are too many fun projects.)
Duncan Murdoch
Cheers,
H.
Hervé Pagès wrote:
murd...@stats.uwo.ca wrote:
On 15/07/2009 8:30 PM, murd...@stats.uwo.ca wrote:
On 15/07/2009 8:08 PM, Hervé Pagès wrote:
Hi,
> x <- c(a=10, b=20)
> length(x) <- 1
> x
a
10
But with gctorture turned on, I get:
> gctorture(TRUE)
> x <- c(a=10, b=20)
> length(x) <- 1
> x
a
"a" <---- ???
> x <- c(a=10, b=20)
> length(x) <- 3
*** caught segfault ***
address (nil), cause 'unknown'
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
This seems to have been around for a while (I get this with R 2.10,
2.9 and 2.8). Note that I don't get this with an unnamed vector.
This problem affects the methods package. I found it while
troubleshooting the "Protection stack overflow" I reported earlier
(see https://stat.ethz.ch/pipermail/r-devel/2009-July/054030.html)
but I can't tell yet whether the 2 issues are related or not.
That's clearly a bug (reproducible in today's R-devel build); I've
cc'd this reply to r-bugs. I'll take a look and see if I can track
it down.
That's got to be the easiest low-level bug I've worked on in a
while. Just a missing PROTECT. Now fixed, about to be committed to
R-devel.
Thanks Duncan! And the "Protection stack overflow" issue that was
affecting
the methods package is gone now :)
Cheers,
H.
Duncan Murdoch
It would be nice to see some reaction from the R developers
about these issues. Thanks in advance!
You should post them as bug reports if they are as clearly bugs as
this one; otherwise they can easily get lost in the noise. I'm not
going to offer to look into the other one; I don't know the insides
of the methods package.
Duncan Murdoch
H.
hpa...@fhcrc.org wrote:
Hi,
> gctorture(TRUE)
> setGeneric("foo", function(x, y) standardGeneric("foo"))
[1] "foo"
> setMethod("foo", c("ANY", "ANY"),
+ function(x, y) cat("calling foo,ANY,ANY method\n")
+ )
Error: protect(): protection stack overflow
Sorry this is something I already reported one week ago here
https://stat.ethz.ch/pipermail/r-devel/2009-July/053973.html
but I just had a 2nd look at it and realized that the problem
can in fact be reproduced out of the .onLoad() hook. So I'm
reporting it again with a different subject.
See my sessionInfo() below. Thanks!
H.
sessionInfo()
R version 2.10.0 Under development (unstable) (2009-06-26 r48837)
x86_64-unknown-linux-gnu
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel