Re: [Rd] [External] Possible ALTREP bug
Hi Toby, Just to be more concrete about why to avoid REAL and about the iteration macros Luke mentioned. The ITERATE_BY_REGION* macros in include/R_ext/Itermacros.h are built on top of *_GET_REGION rather than REAL/INTEGER. The key difference here is that REAL/INTEGER go down to the Dataptr method of the altclass, will generally "explode" the ALTREP, which returns a writable data pointer, thus invalidating the metadata, effectively turning it into a non-altrep form while Get_region method provides a way to return a contiguous region of the data in a way that (can and generally does) leave the ALTREPness intact. We can see the difference by looking at the two respective methods for the compact sequences ALTREP classes that luke wrote that ship with R: static void *compact_intseq_Dataptr(SEXP x, Rboolean writeable) { if (COMPACT_SEQ_EXPANDED(x) == R_NilValue) { /* no need to re-run if expanded data exists */ PROTECT(x); SEXP info = COMPACT_SEQ_INFO(x); R_xlen_t n = COMPACT_INTSEQ_INFO_LENGTH(info); int n1 = COMPACT_INTSEQ_INFO_FIRST(info); int inc = COMPACT_INTSEQ_INFO_INCR(info); * SEXP val = allocVector(INTSXP, n);* int *data = INTEGER(val); if (inc == 1) { /* compact sequences n1 : n2 with n1 <= n2 */ *for (R_xlen_t i = 0; i < n; i++)* * data[i] = (int) (n1 + i);* } else if (inc == -1) { /* compact sequences n1 : n2 with n1 > n2 */ for (R_xlen_t i = 0; i < n; i++) data[i] = (int) (n1 - i); } else error("compact sequences with increment %d not supported yet", inc); * SET_COMPACT_SEQ_EXPANDED(x, val);* UNPROTECT(1); } *return DATAPTR(COMPACT_SEQ_EXPANDED(x));* } So the above sets the "Expanded" altrep data to a SEXP that holds a normal SEXP with all the data if it isn't there already, and then returns DATAPTR of that. Get_region, on the other hand, looks like this: static R_xlen_t compact_intseq_Get_region(SEXP sx, R_xlen_t i, R_xlen_t n,* int *buf*) { /* should not get here if x is already expanded */ CHECK_NOT_EXPANDED(sx); SEXP info = COMPACT_SEQ_INFO(sx); R_xlen_t size = COMPACT_INTSEQ_INFO_LENGTH(info); R_xlen_t n1 = COMPACT_INTSEQ_INFO_FIRST(info); int inc = COMPACT_INTSEQ_INFO_INCR(info); R_xlen_t ncopy = size - i > n ? n : size - i; if (inc == 1) { * for (R_xlen_t k = 0; k < ncopy; k++)**buf[k] = (int) (n1 + k + i);* return ncopy; } else if (inc == -1) { for (R_xlen_t k = 0; k < ncopy; k++) buf[k] = (int) (n1 - k - i); return ncopy; } else error("compact sequences with increment %d not supported yet", inc); } Here we are passing a buffer to the method, and that buffer gets populated with the data. No new SEXP, no expanding the vector. ITERATE_BY_REGION wraps that nicely so you can loop over the whole thing, but the data is grabbed chunkwise, no huge allocation, no expanding of the altrep. One of the examples from summary.c is rsum, which looks like: static Rboolean rsum(SEXP sx, double *value, Rboolean narm) { LDOUBLE s = 0.0; Rboolean updated = FALSE; * ITERATE_BY_REGION(sx, x, i, nbatch, double, REAL, {* *for (R_xlen_t k = 0; k < nbatch; k++) {* * if (!narm || !ISNAN(x[k])) {* *if(!updated) updated = TRUE;* *s += x[k];* * }* *}* * });* if(s > DBL_MAX) *value = R_PosInf; else if (s < -DBL_MAX) *value = R_NegInf; else *value = (double) s; return updated; } sx is the SEXP input, x is the chosen name of a variable, declared in the scope of the macro, that holds the current batch of data that can be used within the bracketted expression that is the macro's last argument. nbatch is the name of a variable which will contain how many elements of data the current batch has in it, double is the raw data type (used in declaration of x, in fact), REAL is the R-macro .."type" I guess, used internally. i is the name chosen for another declared-within-the-macro variable which will always contain the position in the overall vector corresponding to the start of the current buffer. Hope that helps. Best, ~G On Thu, Jun 17, 2021 at 11:32 AM wrote: > On Thu, 17 Jun 2021, Toby Hocking wrote: > > > Oliver, for clarification that section in writing R extensions mentions > > VECTOR_ELT and REAL but not REAL_ELT nor any other *_ELT functions. I was > > looking for an explanation of all the *_ELT functions (which are > apparently > > new), not just VECTOR_ELT. > > Thanks Simon that response was very helpful. > > One more question: are there any circumstances in which one should use > > REAL_ELT(x,i) rather than REAL(x)[i] or vice versa? Or can they be used > > interchangeably? > > For a single call it is better to use REAL_ELT(x, i) since it doesn't > force allocating a possibly large object in order to get a pointer to > its data with REAL(x). If you are iterating over a whole object you > may want to get data in chunks. There are iteration macros that > help. Some examples are in src/main/summary.c. > > Best, > > luke > >
Re: [Rd] [External] Possible ALTREP bug
On Thu, 17 Jun 2021, Toby Hocking wrote: Oliver, for clarification that section in writing R extensions mentions VECTOR_ELT and REAL but not REAL_ELT nor any other *_ELT functions. I was looking for an explanation of all the *_ELT functions (which are apparently new), not just VECTOR_ELT. Thanks Simon that response was very helpful. One more question: are there any circumstances in which one should use REAL_ELT(x,i) rather than REAL(x)[i] or vice versa? Or can they be used interchangeably? For a single call it is better to use REAL_ELT(x, i) since it doesn't force allocating a possibly large object in order to get a pointer to its data with REAL(x). If you are iterating over a whole object you may want to get data in chunks. There are iteration macros that help. Some examples are in src/main/summary.c. Best, luke On Wed, Jun 16, 2021 at 4:29 PM Simon Urbanek wrote: The usual quote applies: "use the source, Luke": $ grep _ELT *.h | sort Rdefines.h:#define SET_ELEMENT(x, i, val) SET_VECTOR_ELT(x, i, val) Rinternals.h: The function STRING_ELT is used as an argument to arrayAssign even Rinternals.h:#define VECTOR_ELT(x,i) ((SEXP *) DATAPTR(x))[i] Rinternals.h://SEXP (STRING_ELT)(SEXP x, R_xlen_t i); Rinternals.h:Rbyte (RAW_ELT)(SEXP x, R_xlen_t i); Rinternals.h:Rbyte ALTRAW_ELT(SEXP x, R_xlen_t i); Rinternals.h:Rcomplex (COMPLEX_ELT)(SEXP x, R_xlen_t i); Rinternals.h:Rcomplex ALTCOMPLEX_ELT(SEXP x, R_xlen_t i); Rinternals.h:SEXP (STRING_ELT)(SEXP x, R_xlen_t i); Rinternals.h:SEXP (VECTOR_ELT)(SEXP x, R_xlen_t i); Rinternals.h:SEXP ALTSTRING_ELT(SEXP, R_xlen_t); Rinternals.h:SEXP SET_VECTOR_ELT(SEXP x, R_xlen_t i, SEXP v); Rinternals.h:double (REAL_ELT)(SEXP x, R_xlen_t i); Rinternals.h:double ALTREAL_ELT(SEXP x, R_xlen_t i); Rinternals.h:int (INTEGER_ELT)(SEXP x, R_xlen_t i); Rinternals.h:int (LOGICAL_ELT)(SEXP x, R_xlen_t i); Rinternals.h:int ALTINTEGER_ELT(SEXP x, R_xlen_t i); Rinternals.h:int ALTLOGICAL_ELT(SEXP x, R_xlen_t i); Rinternals.h:void ALTCOMPLEX_SET_ELT(SEXP x, R_xlen_t i, Rcomplex v); Rinternals.h:void ALTINTEGER_SET_ELT(SEXP x, R_xlen_t i, int v); Rinternals.h:void ALTLOGICAL_SET_ELT(SEXP x, R_xlen_t i, int v); Rinternals.h:void ALTRAW_SET_ELT(SEXP x, R_xlen_t i, Rbyte v); Rinternals.h:void ALTREAL_SET_ELT(SEXP x, R_xlen_t i, double v); Rinternals.h:void ALTSTRING_SET_ELT(SEXP, R_xlen_t, SEXP); Rinternals.h:void SET_INTEGER_ELT(SEXP x, R_xlen_t i, int v); Rinternals.h:void SET_LOGICAL_ELT(SEXP x, R_xlen_t i, int v); Rinternals.h:void SET_REAL_ELT(SEXP x, R_xlen_t i, double v); Rinternals.h:void SET_STRING_ELT(SEXP x, R_xlen_t i, SEXP v); So the indexing is with R_xlen_t and they return the value itself as one would expect. Cheers, Simon > On Jun 17, 2021, at 2:22 AM, Toby Hocking wrote: > > By the way, where is the documentation for INTEGER_ELT, REAL_ELT, etc? I > looked in Writing R Extensions and R Internals but I did not see any > mention. > REAL_ELT is briefly mentioned on > https://svn.r-project.org/R/branches/ALTREP/ALTREP.html > Would it be possible to please add some mention of them to Writing R > Extensions? > - how many of these _ELT functions are there? INTEGER, REAL, ... ? > - in what version of R were they introduced? > - I guess input types are always SEXP and int? > - What are the output types for each? > > On Fri, May 28, 2021 at 5:16 PM wrote: > >> Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly new it may >> be possible to check that places where they are used allow for them to >> allocate. I have fixed the one that got caught by Gabor's example, and >> a rchk run might be able to pick up others if rchk knows these could >> allocate. (I may also be forgetting other places where the _ELt >> methods are used.) Fixing all call sites for REAL, INTEGER, etc, was >> never realistic so there GC has to be suspended during the method >> call, and that is done in the dispatch mechanism. >> >> The bigger problem is jumps from inside things that existing code >> assumes will not do that. Catching those jumps is possible but >> expensive; doing anything sensible if one is caught is really not >> possible. >> >> Best, >> >> luke >> >> On Fri, 28 May 2021, Gabriel Becker wrote: >> >>> Hi Jim et al, >>> Just to hopefully add a bit to what Luke already answered, from what I am >>> recalling looking back at that bioconductor thread Elt methods are used >> in >>> places where there are har
Re: [Rd] [External] Possible ALTREP bug
Oliver, for clarification that section in writing R extensions mentions VECTOR_ELT and REAL but not REAL_ELT nor any other *_ELT functions. I was looking for an explanation of all the *_ELT functions (which are apparently new), not just VECTOR_ELT. Thanks Simon that response was very helpful. One more question: are there any circumstances in which one should use REAL_ELT(x,i) rather than REAL(x)[i] or vice versa? Or can they be used interchangeably? On Wed, Jun 16, 2021 at 4:29 PM Simon Urbanek wrote: > The usual quote applies: "use the source, Luke": > > $ grep _ELT *.h | sort > Rdefines.h:#define SET_ELEMENT(x, i, val) SET_VECTOR_ELT(x, i, val) > Rinternals.h: The function STRING_ELT is used as an argument to > arrayAssign even > Rinternals.h:#define VECTOR_ELT(x,i)((SEXP *) DATAPTR(x))[i] > Rinternals.h://SEXP (STRING_ELT)(SEXP x, R_xlen_t i); > Rinternals.h:Rbyte (RAW_ELT)(SEXP x, R_xlen_t i); > Rinternals.h:Rbyte ALTRAW_ELT(SEXP x, R_xlen_t i); > Rinternals.h:Rcomplex (COMPLEX_ELT)(SEXP x, R_xlen_t i); > Rinternals.h:Rcomplex ALTCOMPLEX_ELT(SEXP x, R_xlen_t i); > Rinternals.h:SEXP (STRING_ELT)(SEXP x, R_xlen_t i); > Rinternals.h:SEXP (VECTOR_ELT)(SEXP x, R_xlen_t i); > Rinternals.h:SEXP ALTSTRING_ELT(SEXP, R_xlen_t); > Rinternals.h:SEXP SET_VECTOR_ELT(SEXP x, R_xlen_t i, SEXP v); > Rinternals.h:double (REAL_ELT)(SEXP x, R_xlen_t i); > Rinternals.h:double ALTREAL_ELT(SEXP x, R_xlen_t i); > Rinternals.h:int (INTEGER_ELT)(SEXP x, R_xlen_t i); > Rinternals.h:int (LOGICAL_ELT)(SEXP x, R_xlen_t i); > Rinternals.h:int ALTINTEGER_ELT(SEXP x, R_xlen_t i); > Rinternals.h:int ALTLOGICAL_ELT(SEXP x, R_xlen_t i); > Rinternals.h:void ALTCOMPLEX_SET_ELT(SEXP x, R_xlen_t i, Rcomplex v); > Rinternals.h:void ALTINTEGER_SET_ELT(SEXP x, R_xlen_t i, int v); > Rinternals.h:void ALTLOGICAL_SET_ELT(SEXP x, R_xlen_t i, int v); > Rinternals.h:void ALTRAW_SET_ELT(SEXP x, R_xlen_t i, Rbyte v); > Rinternals.h:void ALTREAL_SET_ELT(SEXP x, R_xlen_t i, double v); > Rinternals.h:void ALTSTRING_SET_ELT(SEXP, R_xlen_t, SEXP); > Rinternals.h:void SET_INTEGER_ELT(SEXP x, R_xlen_t i, int v); > Rinternals.h:void SET_LOGICAL_ELT(SEXP x, R_xlen_t i, int v); > Rinternals.h:void SET_REAL_ELT(SEXP x, R_xlen_t i, double v); > Rinternals.h:void SET_STRING_ELT(SEXP x, R_xlen_t i, SEXP v); > > So the indexing is with R_xlen_t and they return the value itself as one > would expect. > > Cheers, > Simon > > > > On Jun 17, 2021, at 2:22 AM, Toby Hocking wrote: > > > > By the way, where is the documentation for INTEGER_ELT, REAL_ELT, etc? I > > looked in Writing R Extensions and R Internals but I did not see any > > mention. > > REAL_ELT is briefly mentioned on > > https://svn.r-project.org/R/branches/ALTREP/ALTREP.html > > Would it be possible to please add some mention of them to Writing R > > Extensions? > > - how many of these _ELT functions are there? INTEGER, REAL, ... ? > > - in what version of R were they introduced? > > - I guess input types are always SEXP and int? > > - What are the output types for each? > > > > On Fri, May 28, 2021 at 5:16 PM wrote: > > > >> Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly new it may > >> be possible to check that places where they are used allow for them to > >> allocate. I have fixed the one that got caught by Gabor's example, and > >> a rchk run might be able to pick up others if rchk knows these could > >> allocate. (I may also be forgetting other places where the _ELt > >> methods are used.) Fixing all call sites for REAL, INTEGER, etc, was > >> never realistic so there GC has to be suspended during the method > >> call, and that is done in the dispatch mechanism. > >> > >> The bigger problem is jumps from inside things that existing code > >> assumes will not do that. Catching those jumps is possible but > >> expensive; doing anything sensible if one is caught is really not > >> possible. > >> > >> Best, > >> > >> luke > >> > >> On Fri, 28 May 2021, Gabriel Becker wrote: > >> > >>> Hi Jim et al, > >>> Just to hopefully add a bit to what Luke already answered, from what I > am > >>> recalling looking back at that bioconductor thread Elt methods are used > >> in > >>> places where there are hard implicit assumptions that no garbage > >> collection > >>> will occur (ie they are called on things that aren't PROTECTed), and > >> beyond > >>> that, in places where there are hard assumptions that no error > (longjmp) > >>> will occur. I could be wrong, but I don't know that suspending garbage > >>> collection would protect from the second one. Ie it is possible that an > >>> error *ever* being raised from R code that implements an elt method > could > >>> cause all hell to break loose. > >>> > >>> Luke or Tomas Kalibera would know more. > >>> > >>> I was disappointed that implementing ALTREPs in R code was not in the > >> cards > >>> (it was in my original proposal back in 2016 to the DSC) but I trust > Luke > >>> that there are important reasons we can't safely allow that. > >>>
Re: [Rd] [External] Possible ALTREP bug
The usual quote applies: "use the source, Luke": $ grep _ELT *.h | sort Rdefines.h:#define SET_ELEMENT(x, i, val) SET_VECTOR_ELT(x, i, val) Rinternals.h: The function STRING_ELT is used as an argument to arrayAssign even Rinternals.h:#define VECTOR_ELT(x,i)((SEXP *) DATAPTR(x))[i] Rinternals.h://SEXP (STRING_ELT)(SEXP x, R_xlen_t i); Rinternals.h:Rbyte (RAW_ELT)(SEXP x, R_xlen_t i); Rinternals.h:Rbyte ALTRAW_ELT(SEXP x, R_xlen_t i); Rinternals.h:Rcomplex (COMPLEX_ELT)(SEXP x, R_xlen_t i); Rinternals.h:Rcomplex ALTCOMPLEX_ELT(SEXP x, R_xlen_t i); Rinternals.h:SEXP (STRING_ELT)(SEXP x, R_xlen_t i); Rinternals.h:SEXP (VECTOR_ELT)(SEXP x, R_xlen_t i); Rinternals.h:SEXP ALTSTRING_ELT(SEXP, R_xlen_t); Rinternals.h:SEXP SET_VECTOR_ELT(SEXP x, R_xlen_t i, SEXP v); Rinternals.h:double (REAL_ELT)(SEXP x, R_xlen_t i); Rinternals.h:double ALTREAL_ELT(SEXP x, R_xlen_t i); Rinternals.h:int (INTEGER_ELT)(SEXP x, R_xlen_t i); Rinternals.h:int (LOGICAL_ELT)(SEXP x, R_xlen_t i); Rinternals.h:int ALTINTEGER_ELT(SEXP x, R_xlen_t i); Rinternals.h:int ALTLOGICAL_ELT(SEXP x, R_xlen_t i); Rinternals.h:void ALTCOMPLEX_SET_ELT(SEXP x, R_xlen_t i, Rcomplex v); Rinternals.h:void ALTINTEGER_SET_ELT(SEXP x, R_xlen_t i, int v); Rinternals.h:void ALTLOGICAL_SET_ELT(SEXP x, R_xlen_t i, int v); Rinternals.h:void ALTRAW_SET_ELT(SEXP x, R_xlen_t i, Rbyte v); Rinternals.h:void ALTREAL_SET_ELT(SEXP x, R_xlen_t i, double v); Rinternals.h:void ALTSTRING_SET_ELT(SEXP, R_xlen_t, SEXP); Rinternals.h:void SET_INTEGER_ELT(SEXP x, R_xlen_t i, int v); Rinternals.h:void SET_LOGICAL_ELT(SEXP x, R_xlen_t i, int v); Rinternals.h:void SET_REAL_ELT(SEXP x, R_xlen_t i, double v); Rinternals.h:void SET_STRING_ELT(SEXP x, R_xlen_t i, SEXP v); So the indexing is with R_xlen_t and they return the value itself as one would expect. Cheers, Simon > On Jun 17, 2021, at 2:22 AM, Toby Hocking wrote: > > By the way, where is the documentation for INTEGER_ELT, REAL_ELT, etc? I > looked in Writing R Extensions and R Internals but I did not see any > mention. > REAL_ELT is briefly mentioned on > https://svn.r-project.org/R/branches/ALTREP/ALTREP.html > Would it be possible to please add some mention of them to Writing R > Extensions? > - how many of these _ELT functions are there? INTEGER, REAL, ... ? > - in what version of R were they introduced? > - I guess input types are always SEXP and int? > - What are the output types for each? > > On Fri, May 28, 2021 at 5:16 PM wrote: > >> Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly new it may >> be possible to check that places where they are used allow for them to >> allocate. I have fixed the one that got caught by Gabor's example, and >> a rchk run might be able to pick up others if rchk knows these could >> allocate. (I may also be forgetting other places where the _ELt >> methods are used.) Fixing all call sites for REAL, INTEGER, etc, was >> never realistic so there GC has to be suspended during the method >> call, and that is done in the dispatch mechanism. >> >> The bigger problem is jumps from inside things that existing code >> assumes will not do that. Catching those jumps is possible but >> expensive; doing anything sensible if one is caught is really not >> possible. >> >> Best, >> >> luke >> >> On Fri, 28 May 2021, Gabriel Becker wrote: >> >>> Hi Jim et al, >>> Just to hopefully add a bit to what Luke already answered, from what I am >>> recalling looking back at that bioconductor thread Elt methods are used >> in >>> places where there are hard implicit assumptions that no garbage >> collection >>> will occur (ie they are called on things that aren't PROTECTed), and >> beyond >>> that, in places where there are hard assumptions that no error (longjmp) >>> will occur. I could be wrong, but I don't know that suspending garbage >>> collection would protect from the second one. Ie it is possible that an >>> error *ever* being raised from R code that implements an elt method could >>> cause all hell to break loose. >>> >>> Luke or Tomas Kalibera would know more. >>> >>> I was disappointed that implementing ALTREPs in R code was not in the >> cards >>> (it was in my original proposal back in 2016 to the DSC) but I trust Luke >>> that there are important reasons we can't safely allow that. >>> >>> Best, >>> ~G >>> >>> On Fri, May 28, 2021 at 8:31 AM Jim Hester >> wrote: >>> From reading the discussion on the Bioconductor issue tracker it >>> seems like >>> the reason the GC is not suspended for the non-string ALTREP Elt >>> methods is >>> primarily due to performance concerns. >>> >>> If this is the case perhaps an additional flag could be added to >>> the >>> `R_set_altrep_*()` functions so ALTREP authors could indicate if >>> GC should >>> be halted when that particular method is called for that >>> particular ALTREP >>> class. >>> >>> This would avoid the performance hit (other than a boolean >>>
Re: [Rd] [External] Possible ALTREP bug
*_ELT accessor functions are described in "vector accessor functions" in Writing R extensions. https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Vector-accessor-functions On Wed, Jun 16, 2021 at 4:22 PM Toby Hocking wrote: > By the way, where is the documentation for INTEGER_ELT, REAL_ELT, etc? I > looked in Writing R Extensions and R Internals but I did not see any > mention. > REAL_ELT is briefly mentioned on > https://svn.r-project.org/R/branches/ALTREP/ALTREP.html > Would it be possible to please add some mention of them to Writing R > Extensions? > - how many of these _ELT functions are there? INTEGER, REAL, ... ? > - in what version of R were they introduced? > - I guess input types are always SEXP and int? > - What are the output types for each? > > On Fri, May 28, 2021 at 5:16 PM wrote: > > > Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly new it may > > be possible to check that places where they are used allow for them to > > allocate. I have fixed the one that got caught by Gabor's example, and > > a rchk run might be able to pick up others if rchk knows these could > > allocate. (I may also be forgetting other places where the _ELt > > methods are used.) Fixing all call sites for REAL, INTEGER, etc, was > > never realistic so there GC has to be suspended during the method > > call, and that is done in the dispatch mechanism. > > > > The bigger problem is jumps from inside things that existing code > > assumes will not do that. Catching those jumps is possible but > > expensive; doing anything sensible if one is caught is really not > > possible. > > > > Best, > > > > luke > > > > On Fri, 28 May 2021, Gabriel Becker wrote: > > > > > Hi Jim et al, > > > Just to hopefully add a bit to what Luke already answered, from what I > am > > > recalling looking back at that bioconductor thread Elt methods are used > > in > > > places where there are hard implicit assumptions that no garbage > > collection > > > will occur (ie they are called on things that aren't PROTECTed), and > > beyond > > > that, in places where there are hard assumptions that no error > (longjmp) > > > will occur. I could be wrong, but I don't know that suspending garbage > > > collection would protect from the second one. Ie it is possible that an > > > error *ever* being raised from R code that implements an elt method > could > > > cause all hell to break loose. > > > > > > Luke or Tomas Kalibera would know more. > > > > > > I was disappointed that implementing ALTREPs in R code was not in the > > cards > > > (it was in my original proposal back in 2016 to the DSC) but I trust > Luke > > > that there are important reasons we can't safely allow that. > > > > > > Best, > > > ~G > > > > > > On Fri, May 28, 2021 at 8:31 AM Jim Hester > > wrote: > > > From reading the discussion on the Bioconductor issue tracker it > > > seems like > > > the reason the GC is not suspended for the non-string ALTREP Elt > > > methods is > > > primarily due to performance concerns. > > > > > > If this is the case perhaps an additional flag could be added to > > > the > > > `R_set_altrep_*()` functions so ALTREP authors could indicate if > > > GC should > > > be halted when that particular method is called for that > > > particular ALTREP > > > class. > > > > > > This would avoid the performance hit (other than a boolean > > > check) for the > > > standard case when no allocations are expected, but allow > > > authors to > > > indicate that R should pause GC if needed for methods in their > > > class. > > > > > > On Fri, May 28, 2021 at 9:42 AM wrote: > > > > > > > integer and real Elt methods are not expected to allocate. You > > > would > > > > have to suspend GC to be able to do that. This currently can't > > > be done > > > > from package code. > > > > > > > > Best, > > > > > > > > luke > > > > > > > > On Fri, 28 May 2021, Gábor Csárdi wrote: > > > > > > > > > I have found some weird SEXP corruption behavior with > > > ALTREP, which > > > > > could be a bug. (Or I could be doing something wrong.) > > > > > > > > > > I have an integer ALTREP vector that calls back to R from > > > the Elt > > > > > method. When this vector is indexed in a lapply(), its first > > > element > > > > > gets corrupted. Sometimes it's just a type change to > > > logical, but > > > > > sometimes the corruption causes a crash. > > > > > > > > > > I saw this on macOS from R 3.5.3 to 4.2.0. I created a small > > > package > > > > > that demonstrates this: > > > https://github.com/gaborcsardi/redfish > > > > > > > > > > The R callback in this package calls > > > `loadNamespace("Matrix")`, but > > > > > the same crash happens for other packages as well,
Re: [Rd] [External] Possible ALTREP bug
By the way, where is the documentation for INTEGER_ELT, REAL_ELT, etc? I looked in Writing R Extensions and R Internals but I did not see any mention. REAL_ELT is briefly mentioned on https://svn.r-project.org/R/branches/ALTREP/ALTREP.html Would it be possible to please add some mention of them to Writing R Extensions? - how many of these _ELT functions are there? INTEGER, REAL, ... ? - in what version of R were they introduced? - I guess input types are always SEXP and int? - What are the output types for each? On Fri, May 28, 2021 at 5:16 PM wrote: > Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly new it may > be possible to check that places where they are used allow for them to > allocate. I have fixed the one that got caught by Gabor's example, and > a rchk run might be able to pick up others if rchk knows these could > allocate. (I may also be forgetting other places where the _ELt > methods are used.) Fixing all call sites for REAL, INTEGER, etc, was > never realistic so there GC has to be suspended during the method > call, and that is done in the dispatch mechanism. > > The bigger problem is jumps from inside things that existing code > assumes will not do that. Catching those jumps is possible but > expensive; doing anything sensible if one is caught is really not > possible. > > Best, > > luke > > On Fri, 28 May 2021, Gabriel Becker wrote: > > > Hi Jim et al, > > Just to hopefully add a bit to what Luke already answered, from what I am > > recalling looking back at that bioconductor thread Elt methods are used > in > > places where there are hard implicit assumptions that no garbage > collection > > will occur (ie they are called on things that aren't PROTECTed), and > beyond > > that, in places where there are hard assumptions that no error (longjmp) > > will occur. I could be wrong, but I don't know that suspending garbage > > collection would protect from the second one. Ie it is possible that an > > error *ever* being raised from R code that implements an elt method could > > cause all hell to break loose. > > > > Luke or Tomas Kalibera would know more. > > > > I was disappointed that implementing ALTREPs in R code was not in the > cards > > (it was in my original proposal back in 2016 to the DSC) but I trust Luke > > that there are important reasons we can't safely allow that. > > > > Best, > > ~G > > > > On Fri, May 28, 2021 at 8:31 AM Jim Hester > wrote: > > From reading the discussion on the Bioconductor issue tracker it > > seems like > > the reason the GC is not suspended for the non-string ALTREP Elt > > methods is > > primarily due to performance concerns. > > > > If this is the case perhaps an additional flag could be added to > > the > > `R_set_altrep_*()` functions so ALTREP authors could indicate if > > GC should > > be halted when that particular method is called for that > > particular ALTREP > > class. > > > > This would avoid the performance hit (other than a boolean > > check) for the > > standard case when no allocations are expected, but allow > > authors to > > indicate that R should pause GC if needed for methods in their > > class. > > > > On Fri, May 28, 2021 at 9:42 AM wrote: > > > > > integer and real Elt methods are not expected to allocate. You > > would > > > have to suspend GC to be able to do that. This currently can't > > be done > > > from package code. > > > > > > Best, > > > > > > luke > > > > > > On Fri, 28 May 2021, Gábor Csárdi wrote: > > > > > > > I have found some weird SEXP corruption behavior with > > ALTREP, which > > > > could be a bug. (Or I could be doing something wrong.) > > > > > > > > I have an integer ALTREP vector that calls back to R from > > the Elt > > > > method. When this vector is indexed in a lapply(), its first > > element > > > > gets corrupted. Sometimes it's just a type change to > > logical, but > > > > sometimes the corruption causes a crash. > > > > > > > > I saw this on macOS from R 3.5.3 to 4.2.0. I created a small > > package > > > > that demonstrates this: > > https://github.com/gaborcsardi/redfish > > > > > > > > The R callback in this package calls > > `loadNamespace("Matrix")`, but > > > > the same crash happens for other packages as well, and > > sometimes it > > > > also happens if I don't load any packages at all. (But that > > example > > > > was much more complicated, so I went with the package > > loading.) > > > > > > > > It is somewhat random, and sometimes turning off the JIT > > avoids the > > > > crash, but not always. > > > > > > > > Hopefully I am just doing something wrong in the ALTREP code > > (see > > > > > >
Re: [Rd] [External] Possible ALTREP bug
Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly new it may be possible to check that places where they are used allow for them to allocate. I have fixed the one that got caught by Gabor's example, and a rchk run might be able to pick up others if rchk knows these could allocate. (I may also be forgetting other places where the _ELt methods are used.) Fixing all call sites for REAL, INTEGER, etc, was never realistic so there GC has to be suspended during the method call, and that is done in the dispatch mechanism. The bigger problem is jumps from inside things that existing code assumes will not do that. Catching those jumps is possible but expensive; doing anything sensible if one is caught is really not possible. Best, luke On Fri, 28 May 2021, Gabriel Becker wrote: Hi Jim et al, Just to hopefully add a bit to what Luke already answered, from what I am recalling looking back at that bioconductor thread Elt methods are used in places where there are hard implicit assumptions that no garbage collection will occur (ie they are called on things that aren't PROTECTed), and beyond that, in places where there are hard assumptions that no error (longjmp) will occur. I could be wrong, but I don't know that suspending garbage collection would protect from the second one. Ie it is possible that an error *ever* being raised from R code that implements an elt method could cause all hell to break loose. Luke or Tomas Kalibera would know more. I was disappointed that implementing ALTREPs in R code was not in the cards (it was in my original proposal back in 2016 to the DSC) but I trust Luke that there are important reasons we can't safely allow that. Best, ~G On Fri, May 28, 2021 at 8:31 AM Jim Hester wrote: From reading the discussion on the Bioconductor issue tracker it seems like the reason the GC is not suspended for the non-string ALTREP Elt methods is primarily due to performance concerns. If this is the case perhaps an additional flag could be added to the `R_set_altrep_*()` functions so ALTREP authors could indicate if GC should be halted when that particular method is called for that particular ALTREP class. This would avoid the performance hit (other than a boolean check) for the standard case when no allocations are expected, but allow authors to indicate that R should pause GC if needed for methods in their class. On Fri, May 28, 2021 at 9:42 AM wrote: > integer and real Elt methods are not expected to allocate. You would > have to suspend GC to be able to do that. This currently can't be done > from package code. > > Best, > > luke > > On Fri, 28 May 2021, Gábor Csárdi wrote: > > > I have found some weird SEXP corruption behavior with ALTREP, which > > could be a bug. (Or I could be doing something wrong.) > > > > I have an integer ALTREP vector that calls back to R from the Elt > > method. When this vector is indexed in a lapply(), its first element > > gets corrupted. Sometimes it's just a type change to logical, but > > sometimes the corruption causes a crash. > > > > I saw this on macOS from R 3.5.3 to 4.2.0. I created a small package > > that demonstrates this: https://github.com/gaborcsardi/redfish > > > > The R callback in this package calls `loadNamespace("Matrix")`, but > > the same crash happens for other packages as well, and sometimes it > > also happens if I don't load any packages at all. (But that example > > was much more complicated, so I went with the package loading.) > > > > It is somewhat random, and sometimes turning off the JIT avoids the > > crash, but not always. > > > > Hopefully I am just doing something wrong in the ALTREP code (see > > https://github.com/gaborcsardi/redfish/blob/main/src/test.c), and it > > is not actually a bug. > > > > Thanks, > > Gabor > > > > __ > > R-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > > > -- > Luke Tierney > Ralph E. Wareham Professor of Mathematical Sciences > University of Iowa Phone: 319-335-3386 > Department of Statistics and Fax: 319-335-3017 > Actuarial Science > 241 Schaeffer Hall email: luke-tier...@uiowa.edu > Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >
Re: [Rd] [External] Possible ALTREP bug
Hi Jim et al, Just to hopefully add a bit to what Luke already answered, from what I am recalling looking back at that bioconductor thread Elt methods are used in places where there are hard implicit assumptions that no garbage collection will occur (ie they are called on things that aren't PROTECTed), and beyond that, in places where there are hard assumptions that no error (longjmp) will occur. I could be wrong, but I don't know that suspending garbage collection would protect from the second one. Ie it is possible that an error *ever* being raised from R code that implements an elt method could cause all hell to break loose. Luke or Tomas Kalibera would know more. I was disappointed that implementing ALTREPs in R code was not in the cards (it was in my original proposal back in 2016 to the DSC) but I trust Luke that there are important reasons we can't safely allow that. Best, ~G On Fri, May 28, 2021 at 8:31 AM Jim Hester wrote: > From reading the discussion on the Bioconductor issue tracker it seems like > the reason the GC is not suspended for the non-string ALTREP Elt methods is > primarily due to performance concerns. > > If this is the case perhaps an additional flag could be added to the > `R_set_altrep_*()` functions so ALTREP authors could indicate if GC should > be halted when that particular method is called for that particular ALTREP > class. > > This would avoid the performance hit (other than a boolean check) for the > standard case when no allocations are expected, but allow authors to > indicate that R should pause GC if needed for methods in their class. > > On Fri, May 28, 2021 at 9:42 AM wrote: > > > integer and real Elt methods are not expected to allocate. You would > > have to suspend GC to be able to do that. This currently can't be done > > from package code. > > > > Best, > > > > luke > > > > On Fri, 28 May 2021, Gábor Csárdi wrote: > > > > > I have found some weird SEXP corruption behavior with ALTREP, which > > > could be a bug. (Or I could be doing something wrong.) > > > > > > I have an integer ALTREP vector that calls back to R from the Elt > > > method. When this vector is indexed in a lapply(), its first element > > > gets corrupted. Sometimes it's just a type change to logical, but > > > sometimes the corruption causes a crash. > > > > > > I saw this on macOS from R 3.5.3 to 4.2.0. I created a small package > > > that demonstrates this: https://github.com/gaborcsardi/redfish > > > > > > The R callback in this package calls `loadNamespace("Matrix")`, but > > > the same crash happens for other packages as well, and sometimes it > > > also happens if I don't load any packages at all. (But that example > > > was much more complicated, so I went with the package loading.) > > > > > > It is somewhat random, and sometimes turning off the JIT avoids the > > > crash, but not always. > > > > > > Hopefully I am just doing something wrong in the ALTREP code (see > > > https://github.com/gaborcsardi/redfish/blob/main/src/test.c), and it > > > is not actually a bug. > > > > > > Thanks, > > > Gabor > > > > > > __ > > > R-devel@r-project.org mailing list > > > https://stat.ethz.ch/mailman/listinfo/r-devel > > > > > > > -- > > Luke Tierney > > Ralph E. Wareham Professor of Mathematical Sciences > > University of Iowa Phone: 319-335-3386 > > Department of Statistics andFax: 319-335-3017 > > Actuarial Science > > 241 Schaeffer Hall email: luke-tier...@uiowa.edu > > Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu > > __ > > R-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > > > [[alternative HTML version deleted]] > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Possible ALTREP bug
>From reading the discussion on the Bioconductor issue tracker it seems like the reason the GC is not suspended for the non-string ALTREP Elt methods is primarily due to performance concerns. If this is the case perhaps an additional flag could be added to the `R_set_altrep_*()` functions so ALTREP authors could indicate if GC should be halted when that particular method is called for that particular ALTREP class. This would avoid the performance hit (other than a boolean check) for the standard case when no allocations are expected, but allow authors to indicate that R should pause GC if needed for methods in their class. On Fri, May 28, 2021 at 9:42 AM wrote: > integer and real Elt methods are not expected to allocate. You would > have to suspend GC to be able to do that. This currently can't be done > from package code. > > Best, > > luke > > On Fri, 28 May 2021, Gábor Csárdi wrote: > > > I have found some weird SEXP corruption behavior with ALTREP, which > > could be a bug. (Or I could be doing something wrong.) > > > > I have an integer ALTREP vector that calls back to R from the Elt > > method. When this vector is indexed in a lapply(), its first element > > gets corrupted. Sometimes it's just a type change to logical, but > > sometimes the corruption causes a crash. > > > > I saw this on macOS from R 3.5.3 to 4.2.0. I created a small package > > that demonstrates this: https://github.com/gaborcsardi/redfish > > > > The R callback in this package calls `loadNamespace("Matrix")`, but > > the same crash happens for other packages as well, and sometimes it > > also happens if I don't load any packages at all. (But that example > > was much more complicated, so I went with the package loading.) > > > > It is somewhat random, and sometimes turning off the JIT avoids the > > crash, but not always. > > > > Hopefully I am just doing something wrong in the ALTREP code (see > > https://github.com/gaborcsardi/redfish/blob/main/src/test.c), and it > > is not actually a bug. > > > > Thanks, > > Gabor > > > > __ > > R-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > > > -- > Luke Tierney > Ralph E. Wareham Professor of Mathematical Sciences > University of Iowa Phone: 319-335-3386 > Department of Statistics andFax: 319-335-3017 > Actuarial Science > 241 Schaeffer Hall email: luke-tier...@uiowa.edu > Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Possible ALTREP bug
Thank you Luke, that makes a lot of sense, Gabor On Fri, May 28, 2021 at 3:41 PM wrote: > > integer and real Elt methods are not expected to allocate. You would > have to suspend GC to be able to do that. This currently can't be done > from package code. > > Best, > > luke > > On Fri, 28 May 2021, Gábor Csárdi wrote: > > > I have found some weird SEXP corruption behavior with ALTREP, which > > could be a bug. (Or I could be doing something wrong.) > > > > I have an integer ALTREP vector that calls back to R from the Elt > > method. When this vector is indexed in a lapply(), its first element > > gets corrupted. Sometimes it's just a type change to logical, but > > sometimes the corruption causes a crash. > > > > I saw this on macOS from R 3.5.3 to 4.2.0. I created a small package > > that demonstrates this: https://github.com/gaborcsardi/redfish > > > > The R callback in this package calls `loadNamespace("Matrix")`, but > > the same crash happens for other packages as well, and sometimes it > > also happens if I don't load any packages at all. (But that example > > was much more complicated, so I went with the package loading.) > > > > It is somewhat random, and sometimes turning off the JIT avoids the > > crash, but not always. > > > > Hopefully I am just doing something wrong in the ALTREP code (see > > https://github.com/gaborcsardi/redfish/blob/main/src/test.c), and it > > is not actually a bug. > > > > Thanks, > > Gabor > > > > __ > > R-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > > > -- > Luke Tierney > Ralph E. Wareham Professor of Mathematical Sciences > University of Iowa Phone: 319-335-3386 > Department of Statistics andFax: 319-335-3017 > Actuarial Science > 241 Schaeffer Hall email: luke-tier...@uiowa.edu > Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Possible ALTREP bug
integer and real Elt methods are not expected to allocate. You would have to suspend GC to be able to do that. This currently can't be done from package code. Best, luke On Fri, 28 May 2021, Gábor Csárdi wrote: I have found some weird SEXP corruption behavior with ALTREP, which could be a bug. (Or I could be doing something wrong.) I have an integer ALTREP vector that calls back to R from the Elt method. When this vector is indexed in a lapply(), its first element gets corrupted. Sometimes it's just a type change to logical, but sometimes the corruption causes a crash. I saw this on macOS from R 3.5.3 to 4.2.0. I created a small package that demonstrates this: https://github.com/gaborcsardi/redfish The R callback in this package calls `loadNamespace("Matrix")`, but the same crash happens for other packages as well, and sometimes it also happens if I don't load any packages at all. (But that example was much more complicated, so I went with the package loading.) It is somewhat random, and sometimes turning off the JIT avoids the crash, but not always. Hopefully I am just doing something wrong in the ALTREP code (see https://github.com/gaborcsardi/redfish/blob/main/src/test.c), and it is not actually a bug. Thanks, Gabor __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel