Re: [Rd] [External] Possible ALTREP bug

2021-06-17 Thread Gabriel Becker
Hi Toby,

Just to be more concrete about why to avoid REAL and about the iteration
macros Luke mentioned.

The ITERATE_BY_REGION* macros in include/R_ext/Itermacros.h are built on
top of *_GET_REGION rather than REAL/INTEGER. The key difference here is
that REAL/INTEGER  go down to the Dataptr method of the altclass, will
generally "explode" the ALTREP, which returns a writable data pointer, thus
invalidating the metadata, effectively turning it into a non-altrep form
while Get_region method provides a way to return a contiguous region of the
data in a way that (can and generally does) leave the ALTREPness intact.

We can see the difference by looking at the two respective methods for the
compact sequences ALTREP classes that luke wrote that ship with R:

static void *compact_intseq_Dataptr(SEXP x, Rboolean writeable)
{

if (COMPACT_SEQ_EXPANDED(x) == R_NilValue) {

/* no need to re-run if expanded data exists */

PROTECT(x);

SEXP info = COMPACT_SEQ_INFO(x);

R_xlen_t n = COMPACT_INTSEQ_INFO_LENGTH(info);

int n1 = COMPACT_INTSEQ_INFO_FIRST(info);

int inc = COMPACT_INTSEQ_INFO_INCR(info);

* SEXP val = allocVector(INTSXP, n);*

int *data = INTEGER(val);


if (inc == 1) {

   /* compact sequences n1 : n2 with n1 <= n2 */

*for (R_xlen_t i = 0; i < n; i++)*

*   data[i] = (int) (n1 + i);*

}

else if (inc == -1) {

   /* compact sequences n1 : n2 with n1 > n2 */

   for (R_xlen_t i = 0; i < n; i++)

data[i] = (int) (n1 - i);

}

else

   error("compact sequences with increment %d not supported yet", inc);


* SET_COMPACT_SEQ_EXPANDED(x, val);*
UNPROTECT(1);

}
*return DATAPTR(COMPACT_SEQ_EXPANDED(x));*
}

So the above sets the "Expanded" altrep data to a SEXP that holds a normal
SEXP with all the data if it isn't there already, and then returns DATAPTR
of that.

Get_region, on the other hand, looks like this:

static R_xlen_t
compact_intseq_Get_region(SEXP sx, R_xlen_t i, R_xlen_t n,* int *buf*)
{
/* should not get here if x is already expanded */
CHECK_NOT_EXPANDED(sx);

SEXP info = COMPACT_SEQ_INFO(sx);
R_xlen_t size = COMPACT_INTSEQ_INFO_LENGTH(info);
R_xlen_t n1 = COMPACT_INTSEQ_INFO_FIRST(info);
int inc = COMPACT_INTSEQ_INFO_INCR(info);

R_xlen_t ncopy = size - i > n ? n : size - i;
if (inc == 1) {


* for (R_xlen_t k = 0; k < ncopy; k++)**buf[k] = (int) (n1 + k + i);*
return ncopy;

}
else if (inc == -1) {

for (R_xlen_t k = 0; k < ncopy; k++)
   buf[k] = (int) (n1 - k - i);
return ncopy;

}
else

error("compact sequences with increment %d not supported yet", inc);

}

Here we are passing a buffer to the method, and that buffer gets populated
with the data. No new SEXP, no expanding the vector.

ITERATE_BY_REGION wraps that nicely so you can loop over the whole thing,
but the data is grabbed chunkwise, no huge allocation, no expanding of the
altrep.

One of the examples from summary.c is rsum, which looks like:

static Rboolean rsum(SEXP sx, double *value, Rboolean narm)
{
LDOUBLE s = 0.0;
Rboolean updated = FALSE;


*   ITERATE_BY_REGION(sx, x, i, nbatch, double, REAL, {*

*for (R_xlen_t k = 0; k < nbatch; k++) {*

* if (!narm || !ISNAN(x[k])) {*

*if(!updated) updated = TRUE;*

*s += x[k];*

* }*

*}*
* });*

if(s > DBL_MAX) *value = R_PosInf;
else if (s < -DBL_MAX) *value = R_NegInf;
else *value = (double) s;

return updated;
}

sx is the SEXP input, x is the chosen name of a variable, declared in the
scope of the macro, that holds the current batch of data that can be used
within the bracketted expression that is the macro's last argument. nbatch
is the name of a variable which will contain how many elements of data the
current batch has in it,  double is the raw data type (used in declaration
of x, in fact), REAL is the R-macro .."type" I guess, used internally. i is
the name chosen for another declared-within-the-macro variable which will
always contain the position in the overall vector corresponding to the
start of the current buffer.

Hope that helps.

Best,
~G

On Thu, Jun 17, 2021 at 11:32 AM  wrote:

> On Thu, 17 Jun 2021, Toby Hocking wrote:
>
> > Oliver, for clarification that section in writing R extensions mentions
> > VECTOR_ELT and REAL but not REAL_ELT nor any other *_ELT functions. I was
> > looking for an explanation of all the *_ELT functions (which are
> apparently
> > new), not just VECTOR_ELT.
> > Thanks Simon that response was very helpful.
> > One more question: are there any circumstances in which one should use
> > REAL_ELT(x,i) rather than REAL(x)[i] or vice versa? Or can they be used
> > interchangeably?
>
> For a single call it is better to use REAL_ELT(x, i) since it doesn't
> force allocating a possibly large object in order to get a pointer to
> its data with REAL(x).  If you are iterating over a whole object you
> may want to get data in chunks. There are iteration macros that
> help. Some examples are in src/main/summary.c.
>
> Best,
>
> luke
>
>

Re: [Rd] [External] Possible ALTREP bug

2021-06-17 Thread luke-tierney

On Thu, 17 Jun 2021, Toby Hocking wrote:


Oliver, for clarification that section in writing R extensions mentions
VECTOR_ELT and REAL but not REAL_ELT nor any other *_ELT functions. I was
looking for an explanation of all the *_ELT functions (which are apparently
new), not just VECTOR_ELT.
Thanks Simon that response was very helpful.
One more question: are there any circumstances in which one should use
REAL_ELT(x,i) rather than REAL(x)[i] or vice versa? Or can they be used
interchangeably?


For a single call it is better to use REAL_ELT(x, i) since it doesn't
force allocating a possibly large object in order to get a pointer to
its data with REAL(x).  If you are iterating over a whole object you
may want to get data in chunks. There are iteration macros that
help. Some examples are in src/main/summary.c.

Best,

luke



On Wed, Jun 16, 2021 at 4:29 PM Simon Urbanek 
wrote:
  The usual quote applies: "use the source, Luke":

  $ grep _ELT *.h | sort
  Rdefines.h:#define SET_ELEMENT(x, i, val)     
   SET_VECTOR_ELT(x, i, val)
  Rinternals.h:   The function STRING_ELT is used as an argument
  to arrayAssign even
  Rinternals.h:#define VECTOR_ELT(x,i)    ((SEXP *) DATAPTR(x))[i]
  Rinternals.h://SEXP (STRING_ELT)(SEXP x, R_xlen_t i);
  Rinternals.h:Rbyte (RAW_ELT)(SEXP x, R_xlen_t i);
  Rinternals.h:Rbyte ALTRAW_ELT(SEXP x, R_xlen_t i);
  Rinternals.h:Rcomplex (COMPLEX_ELT)(SEXP x, R_xlen_t i);
  Rinternals.h:Rcomplex ALTCOMPLEX_ELT(SEXP x, R_xlen_t i);
  Rinternals.h:SEXP (STRING_ELT)(SEXP x, R_xlen_t i);
  Rinternals.h:SEXP (VECTOR_ELT)(SEXP x, R_xlen_t i);
  Rinternals.h:SEXP ALTSTRING_ELT(SEXP, R_xlen_t);
  Rinternals.h:SEXP SET_VECTOR_ELT(SEXP x, R_xlen_t i, SEXP v);
  Rinternals.h:double (REAL_ELT)(SEXP x, R_xlen_t i);
  Rinternals.h:double ALTREAL_ELT(SEXP x, R_xlen_t i);
  Rinternals.h:int (INTEGER_ELT)(SEXP x, R_xlen_t i);
  Rinternals.h:int (LOGICAL_ELT)(SEXP x, R_xlen_t i);
  Rinternals.h:int ALTINTEGER_ELT(SEXP x, R_xlen_t i);
  Rinternals.h:int ALTLOGICAL_ELT(SEXP x, R_xlen_t i);
  Rinternals.h:void ALTCOMPLEX_SET_ELT(SEXP x, R_xlen_t i,
  Rcomplex v);
  Rinternals.h:void ALTINTEGER_SET_ELT(SEXP x, R_xlen_t i, int v);
  Rinternals.h:void ALTLOGICAL_SET_ELT(SEXP x, R_xlen_t i, int v);
  Rinternals.h:void ALTRAW_SET_ELT(SEXP x, R_xlen_t i, Rbyte v);
  Rinternals.h:void ALTREAL_SET_ELT(SEXP x, R_xlen_t i, double v);
  Rinternals.h:void ALTSTRING_SET_ELT(SEXP, R_xlen_t, SEXP);
  Rinternals.h:void SET_INTEGER_ELT(SEXP x, R_xlen_t i, int v);
  Rinternals.h:void SET_LOGICAL_ELT(SEXP x, R_xlen_t i, int v);
  Rinternals.h:void SET_REAL_ELT(SEXP x, R_xlen_t i, double v);
  Rinternals.h:void SET_STRING_ELT(SEXP x, R_xlen_t i, SEXP v);

  So the indexing is with R_xlen_t and they return the value
  itself as one would expect.

  Cheers,
  Simon


  > On Jun 17, 2021, at 2:22 AM, Toby Hocking 
  wrote:
  >
  > By the way, where is the documentation for INTEGER_ELT,
  REAL_ELT, etc? I
  > looked in Writing R Extensions and R Internals but I did not
  see any
  > mention.
  > REAL_ELT is briefly mentioned on
  > https://svn.r-project.org/R/branches/ALTREP/ALTREP.html
  > Would it be possible to please add some mention of them to
  Writing R
  > Extensions?
  > - how many of these _ELT functions are there? INTEGER, REAL,
  ... ?
  > - in what version of R were they introduced?
  > - I guess input types are always SEXP and int?
  > - What are the output types for each?
  >
  > On Fri, May 28, 2021 at 5:16 PM 
  wrote:
  >
  >> Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly
  new it may
  >> be possible to check that places where they are used allow
  for them to
  >> allocate. I have fixed the one that got caught by Gabor's
  example, and
  >> a rchk run might be able to pick up others if rchk knows
  these could
  >> allocate. (I may also be forgetting other places where the
  _ELt
  >> methods are used.)  Fixing all call sites for REAL, INTEGER,
  etc, was
  >> never realistic so there GC has to be suspended during the
  method
  >> call, and that is done in the dispatch mechanism.
  >>
  >> The bigger problem is jumps from inside things that existing
  code
  >> assumes will not do that. Catching those jumps is possible
  but
  >> expensive; doing anything sensible if one is caught is really
  not
  >> possible.
  >>
  >> Best,
  >>
  >> luke
  >>
  >> On Fri, 28 May 2021, Gabriel Becker wrote:
  >>
  >>> Hi Jim et al,
  >>> Just to hopefully add a bit to what Luke already answered,
  from what I am
  >>> recalling looking back at that bioconductor thread Elt
  methods are used
  >> in
  >>> places where there are har

Re: [Rd] [External] Possible ALTREP bug

2021-06-17 Thread Toby Hocking
Oliver, for clarification that section in writing R extensions mentions
VECTOR_ELT and REAL but not REAL_ELT nor any other *_ELT functions. I was
looking for an explanation of all the *_ELT functions (which are apparently
new), not just VECTOR_ELT.
Thanks Simon that response was very helpful.
One more question: are there any circumstances in which one should use
REAL_ELT(x,i) rather than REAL(x)[i] or vice versa? Or can they be used
interchangeably?

On Wed, Jun 16, 2021 at 4:29 PM Simon Urbanek 
wrote:

> The usual quote applies: "use the source, Luke":
>
> $ grep _ELT *.h | sort
> Rdefines.h:#define SET_ELEMENT(x, i, val)   SET_VECTOR_ELT(x, i, val)
> Rinternals.h:   The function STRING_ELT is used as an argument to
> arrayAssign even
> Rinternals.h:#define VECTOR_ELT(x,i)((SEXP *) DATAPTR(x))[i]
> Rinternals.h://SEXP (STRING_ELT)(SEXP x, R_xlen_t i);
> Rinternals.h:Rbyte (RAW_ELT)(SEXP x, R_xlen_t i);
> Rinternals.h:Rbyte ALTRAW_ELT(SEXP x, R_xlen_t i);
> Rinternals.h:Rcomplex (COMPLEX_ELT)(SEXP x, R_xlen_t i);
> Rinternals.h:Rcomplex ALTCOMPLEX_ELT(SEXP x, R_xlen_t i);
> Rinternals.h:SEXP (STRING_ELT)(SEXP x, R_xlen_t i);
> Rinternals.h:SEXP (VECTOR_ELT)(SEXP x, R_xlen_t i);
> Rinternals.h:SEXP ALTSTRING_ELT(SEXP, R_xlen_t);
> Rinternals.h:SEXP SET_VECTOR_ELT(SEXP x, R_xlen_t i, SEXP v);
> Rinternals.h:double (REAL_ELT)(SEXP x, R_xlen_t i);
> Rinternals.h:double ALTREAL_ELT(SEXP x, R_xlen_t i);
> Rinternals.h:int (INTEGER_ELT)(SEXP x, R_xlen_t i);
> Rinternals.h:int (LOGICAL_ELT)(SEXP x, R_xlen_t i);
> Rinternals.h:int ALTINTEGER_ELT(SEXP x, R_xlen_t i);
> Rinternals.h:int ALTLOGICAL_ELT(SEXP x, R_xlen_t i);
> Rinternals.h:void ALTCOMPLEX_SET_ELT(SEXP x, R_xlen_t i, Rcomplex v);
> Rinternals.h:void ALTINTEGER_SET_ELT(SEXP x, R_xlen_t i, int v);
> Rinternals.h:void ALTLOGICAL_SET_ELT(SEXP x, R_xlen_t i, int v);
> Rinternals.h:void ALTRAW_SET_ELT(SEXP x, R_xlen_t i, Rbyte v);
> Rinternals.h:void ALTREAL_SET_ELT(SEXP x, R_xlen_t i, double v);
> Rinternals.h:void ALTSTRING_SET_ELT(SEXP, R_xlen_t, SEXP);
> Rinternals.h:void SET_INTEGER_ELT(SEXP x, R_xlen_t i, int v);
> Rinternals.h:void SET_LOGICAL_ELT(SEXP x, R_xlen_t i, int v);
> Rinternals.h:void SET_REAL_ELT(SEXP x, R_xlen_t i, double v);
> Rinternals.h:void SET_STRING_ELT(SEXP x, R_xlen_t i, SEXP v);
>
> So the indexing is with R_xlen_t and they return the value itself as one
> would expect.
>
> Cheers,
> Simon
>
>
> > On Jun 17, 2021, at 2:22 AM, Toby Hocking  wrote:
> >
> > By the way, where is the documentation for INTEGER_ELT, REAL_ELT, etc? I
> > looked in Writing R Extensions and R Internals but I did not see any
> > mention.
> > REAL_ELT is briefly mentioned on
> > https://svn.r-project.org/R/branches/ALTREP/ALTREP.html
> > Would it be possible to please add some mention of them to Writing R
> > Extensions?
> > - how many of these _ELT functions are there? INTEGER, REAL, ... ?
> > - in what version of R were they introduced?
> > - I guess input types are always SEXP and int?
> > - What are the output types for each?
> >
> > On Fri, May 28, 2021 at 5:16 PM  wrote:
> >
> >> Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly new it may
> >> be possible to check that places where they are used allow for them to
> >> allocate. I have fixed the one that got caught by Gabor's example, and
> >> a rchk run might be able to pick up others if rchk knows these could
> >> allocate. (I may also be forgetting other places where the _ELt
> >> methods are used.)  Fixing all call sites for REAL, INTEGER, etc, was
> >> never realistic so there GC has to be suspended during the method
> >> call, and that is done in the dispatch mechanism.
> >>
> >> The bigger problem is jumps from inside things that existing code
> >> assumes will not do that. Catching those jumps is possible but
> >> expensive; doing anything sensible if one is caught is really not
> >> possible.
> >>
> >> Best,
> >>
> >> luke
> >>
> >> On Fri, 28 May 2021, Gabriel Becker wrote:
> >>
> >>> Hi Jim et al,
> >>> Just to hopefully add a bit to what Luke already answered, from what I
> am
> >>> recalling looking back at that bioconductor thread Elt methods are used
> >> in
> >>> places where there are hard implicit assumptions that no garbage
> >> collection
> >>> will occur (ie they are called on things that aren't PROTECTed), and
> >> beyond
> >>> that, in places where there are hard assumptions that no error
> (longjmp)
> >>> will occur. I could be wrong, but I don't know that suspending garbage
> >>> collection would protect from the second one. Ie it is possible that an
> >>> error *ever* being raised from R code that implements an elt method
> could
> >>> cause all hell to break loose.
> >>>
> >>> Luke or Tomas Kalibera would know more.
> >>>
> >>> I was disappointed that implementing ALTREPs in R code was not in the
> >> cards
> >>> (it was in my original proposal back in 2016 to the DSC) but I trust
> Luke
> >>> that there are important reasons we can't safely allow that.
> >>>

Re: [Rd] [External] Possible ALTREP bug

2021-06-16 Thread Simon Urbanek


The usual quote applies: "use the source, Luke":

$ grep _ELT *.h | sort
Rdefines.h:#define SET_ELEMENT(x, i, val)   SET_VECTOR_ELT(x, i, val)
Rinternals.h:   The function STRING_ELT is used as an argument to arrayAssign 
even
Rinternals.h:#define VECTOR_ELT(x,i)((SEXP *) DATAPTR(x))[i]
Rinternals.h://SEXP (STRING_ELT)(SEXP x, R_xlen_t i);
Rinternals.h:Rbyte (RAW_ELT)(SEXP x, R_xlen_t i);
Rinternals.h:Rbyte ALTRAW_ELT(SEXP x, R_xlen_t i);
Rinternals.h:Rcomplex (COMPLEX_ELT)(SEXP x, R_xlen_t i);
Rinternals.h:Rcomplex ALTCOMPLEX_ELT(SEXP x, R_xlen_t i);
Rinternals.h:SEXP (STRING_ELT)(SEXP x, R_xlen_t i);
Rinternals.h:SEXP (VECTOR_ELT)(SEXP x, R_xlen_t i);
Rinternals.h:SEXP ALTSTRING_ELT(SEXP, R_xlen_t);
Rinternals.h:SEXP SET_VECTOR_ELT(SEXP x, R_xlen_t i, SEXP v);
Rinternals.h:double (REAL_ELT)(SEXP x, R_xlen_t i);
Rinternals.h:double ALTREAL_ELT(SEXP x, R_xlen_t i);
Rinternals.h:int (INTEGER_ELT)(SEXP x, R_xlen_t i);
Rinternals.h:int (LOGICAL_ELT)(SEXP x, R_xlen_t i);
Rinternals.h:int ALTINTEGER_ELT(SEXP x, R_xlen_t i);
Rinternals.h:int ALTLOGICAL_ELT(SEXP x, R_xlen_t i);
Rinternals.h:void ALTCOMPLEX_SET_ELT(SEXP x, R_xlen_t i, Rcomplex v);
Rinternals.h:void ALTINTEGER_SET_ELT(SEXP x, R_xlen_t i, int v);
Rinternals.h:void ALTLOGICAL_SET_ELT(SEXP x, R_xlen_t i, int v);
Rinternals.h:void ALTRAW_SET_ELT(SEXP x, R_xlen_t i, Rbyte v);
Rinternals.h:void ALTREAL_SET_ELT(SEXP x, R_xlen_t i, double v);
Rinternals.h:void ALTSTRING_SET_ELT(SEXP, R_xlen_t, SEXP);
Rinternals.h:void SET_INTEGER_ELT(SEXP x, R_xlen_t i, int v);
Rinternals.h:void SET_LOGICAL_ELT(SEXP x, R_xlen_t i, int v);
Rinternals.h:void SET_REAL_ELT(SEXP x, R_xlen_t i, double v);
Rinternals.h:void SET_STRING_ELT(SEXP x, R_xlen_t i, SEXP v);

So the indexing is with R_xlen_t and they return the value itself as one would 
expect.

Cheers,
Simon


> On Jun 17, 2021, at 2:22 AM, Toby Hocking  wrote:
> 
> By the way, where is the documentation for INTEGER_ELT, REAL_ELT, etc? I
> looked in Writing R Extensions and R Internals but I did not see any
> mention.
> REAL_ELT is briefly mentioned on
> https://svn.r-project.org/R/branches/ALTREP/ALTREP.html
> Would it be possible to please add some mention of them to Writing R
> Extensions?
> - how many of these _ELT functions are there? INTEGER, REAL, ... ?
> - in what version of R were they introduced?
> - I guess input types are always SEXP and int?
> - What are the output types for each?
> 
> On Fri, May 28, 2021 at 5:16 PM  wrote:
> 
>> Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly new it may
>> be possible to check that places where they are used allow for them to
>> allocate. I have fixed the one that got caught by Gabor's example, and
>> a rchk run might be able to pick up others if rchk knows these could
>> allocate. (I may also be forgetting other places where the _ELt
>> methods are used.)  Fixing all call sites for REAL, INTEGER, etc, was
>> never realistic so there GC has to be suspended during the method
>> call, and that is done in the dispatch mechanism.
>> 
>> The bigger problem is jumps from inside things that existing code
>> assumes will not do that. Catching those jumps is possible but
>> expensive; doing anything sensible if one is caught is really not
>> possible.
>> 
>> Best,
>> 
>> luke
>> 
>> On Fri, 28 May 2021, Gabriel Becker wrote:
>> 
>>> Hi Jim et al,
>>> Just to hopefully add a bit to what Luke already answered, from what I am
>>> recalling looking back at that bioconductor thread Elt methods are used
>> in
>>> places where there are hard implicit assumptions that no garbage
>> collection
>>> will occur (ie they are called on things that aren't PROTECTed), and
>> beyond
>>> that, in places where there are hard assumptions that no error (longjmp)
>>> will occur. I could be wrong, but I don't know that suspending garbage
>>> collection would protect from the second one. Ie it is possible that an
>>> error *ever* being raised from R code that implements an elt method could
>>> cause all hell to break loose.
>>> 
>>> Luke or Tomas Kalibera would know more.
>>> 
>>> I was disappointed that implementing ALTREPs in R code was not in the
>> cards
>>> (it was in my original proposal back in 2016 to the DSC) but I trust Luke
>>> that there are important reasons we can't safely allow that.
>>> 
>>> Best,
>>> ~G
>>> 
>>> On Fri, May 28, 2021 at 8:31 AM Jim Hester 
>> wrote:
>>>  From reading the discussion on the Bioconductor issue tracker it
>>>  seems like
>>>  the reason the GC is not suspended for the non-string ALTREP Elt
>>>  methods is
>>>  primarily due to performance concerns.
>>> 
>>>  If this is the case perhaps an additional flag could be added to
>>>  the
>>>  `R_set_altrep_*()` functions so ALTREP authors could indicate if
>>>  GC should
>>>  be halted when that particular method is called for that
>>>  particular ALTREP
>>>  class.
>>> 
>>>  This would avoid the performance hit (other than a boolean
>>>  

Re: [Rd] [External] Possible ALTREP bug

2021-06-16 Thread Oliver Madsen
*_ELT accessor functions are described in "vector accessor functions" in
Writing R extensions.

https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Vector-accessor-functions

On Wed, Jun 16, 2021 at 4:22 PM Toby Hocking  wrote:

> By the way, where is the documentation for INTEGER_ELT, REAL_ELT, etc? I
> looked in Writing R Extensions and R Internals but I did not see any
> mention.
> REAL_ELT is briefly mentioned on
> https://svn.r-project.org/R/branches/ALTREP/ALTREP.html
> Would it be possible to please add some mention of them to Writing R
> Extensions?
> - how many of these _ELT functions are there? INTEGER, REAL, ... ?
> - in what version of R were they introduced?
> - I guess input types are always SEXP and int?
> - What are the output types for each?
>
> On Fri, May 28, 2021 at 5:16 PM  wrote:
>
> > Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly new it may
> > be possible to check that places where they are used allow for them to
> > allocate. I have fixed the one that got caught by Gabor's example, and
> > a rchk run might be able to pick up others if rchk knows these could
> > allocate. (I may also be forgetting other places where the _ELt
> > methods are used.)  Fixing all call sites for REAL, INTEGER, etc, was
> > never realistic so there GC has to be suspended during the method
> > call, and that is done in the dispatch mechanism.
> >
> > The bigger problem is jumps from inside things that existing code
> > assumes will not do that. Catching those jumps is possible but
> > expensive; doing anything sensible if one is caught is really not
> > possible.
> >
> > Best,
> >
> > luke
> >
> > On Fri, 28 May 2021, Gabriel Becker wrote:
> >
> > > Hi Jim et al,
> > > Just to hopefully add a bit to what Luke already answered, from what I
> am
> > > recalling looking back at that bioconductor thread Elt methods are used
> > in
> > > places where there are hard implicit assumptions that no garbage
> > collection
> > > will occur (ie they are called on things that aren't PROTECTed), and
> > beyond
> > > that, in places where there are hard assumptions that no error
> (longjmp)
> > > will occur. I could be wrong, but I don't know that suspending garbage
> > > collection would protect from the second one. Ie it is possible that an
> > > error *ever* being raised from R code that implements an elt method
> could
> > > cause all hell to break loose.
> > >
> > > Luke or Tomas Kalibera would know more.
> > >
> > > I was disappointed that implementing ALTREPs in R code was not in the
> > cards
> > > (it was in my original proposal back in 2016 to the DSC) but I trust
> Luke
> > > that there are important reasons we can't safely allow that.
> > >
> > > Best,
> > > ~G
> > >
> > > On Fri, May 28, 2021 at 8:31 AM Jim Hester 
> > wrote:
> > >   From reading the discussion on the Bioconductor issue tracker it
> > >   seems like
> > >   the reason the GC is not suspended for the non-string ALTREP Elt
> > >   methods is
> > >   primarily due to performance concerns.
> > >
> > >   If this is the case perhaps an additional flag could be added to
> > >   the
> > >   `R_set_altrep_*()` functions so ALTREP authors could indicate if
> > >   GC should
> > >   be halted when that particular method is called for that
> > >   particular ALTREP
> > >   class.
> > >
> > >   This would avoid the performance hit (other than a boolean
> > >   check) for the
> > >   standard case when no allocations are expected, but allow
> > >   authors to
> > >   indicate that R should pause GC if needed for methods in their
> > >   class.
> > >
> > >   On Fri, May 28, 2021 at 9:42 AM  wrote:
> > >
> > >   > integer and real Elt methods are not expected to allocate. You
> > >   would
> > >   > have to suspend GC to be able to do that. This currently can't
> > >   be done
> > >   > from package code.
> > >   >
> > >   > Best,
> > >   >
> > >   > luke
> > >   >
> > >   > On Fri, 28 May 2021, Gábor Csárdi wrote:
> > >   >
> > >   > > I have found some weird SEXP corruption behavior with
> > >   ALTREP, which
> > >   > > could be a bug. (Or I could be doing something wrong.)
> > >   > >
> > >   > > I have an integer ALTREP vector that calls back to R from
> > >   the Elt
> > >   > > method. When this vector is indexed in a lapply(), its first
> > >   element
> > >   > > gets corrupted. Sometimes it's just a type change to
> > >   logical, but
> > >   > > sometimes the corruption causes a crash.
> > >   > >
> > >   > > I saw this on macOS from R 3.5.3 to 4.2.0. I created a small
> > >   package
> > >   > > that demonstrates this:
> > >   https://github.com/gaborcsardi/redfish
> > >   > >
> > >   > > The R callback in this package calls
> > >   `loadNamespace("Matrix")`, but
> > >   > > the same crash happens for other packages as well,

Re: [Rd] [External] Possible ALTREP bug

2021-06-16 Thread Toby Hocking
By the way, where is the documentation for INTEGER_ELT, REAL_ELT, etc? I
looked in Writing R Extensions and R Internals but I did not see any
mention.
REAL_ELT is briefly mentioned on
https://svn.r-project.org/R/branches/ALTREP/ALTREP.html
Would it be possible to please add some mention of them to Writing R
Extensions?
- how many of these _ELT functions are there? INTEGER, REAL, ... ?
- in what version of R were they introduced?
- I guess input types are always SEXP and int?
- What are the output types for each?

On Fri, May 28, 2021 at 5:16 PM  wrote:

> Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly new it may
> be possible to check that places where they are used allow for them to
> allocate. I have fixed the one that got caught by Gabor's example, and
> a rchk run might be able to pick up others if rchk knows these could
> allocate. (I may also be forgetting other places where the _ELt
> methods are used.)  Fixing all call sites for REAL, INTEGER, etc, was
> never realistic so there GC has to be suspended during the method
> call, and that is done in the dispatch mechanism.
>
> The bigger problem is jumps from inside things that existing code
> assumes will not do that. Catching those jumps is possible but
> expensive; doing anything sensible if one is caught is really not
> possible.
>
> Best,
>
> luke
>
> On Fri, 28 May 2021, Gabriel Becker wrote:
>
> > Hi Jim et al,
> > Just to hopefully add a bit to what Luke already answered, from what I am
> > recalling looking back at that bioconductor thread Elt methods are used
> in
> > places where there are hard implicit assumptions that no garbage
> collection
> > will occur (ie they are called on things that aren't PROTECTed), and
> beyond
> > that, in places where there are hard assumptions that no error (longjmp)
> > will occur. I could be wrong, but I don't know that suspending garbage
> > collection would protect from the second one. Ie it is possible that an
> > error *ever* being raised from R code that implements an elt method could
> > cause all hell to break loose.
> >
> > Luke or Tomas Kalibera would know more.
> >
> > I was disappointed that implementing ALTREPs in R code was not in the
> cards
> > (it was in my original proposal back in 2016 to the DSC) but I trust Luke
> > that there are important reasons we can't safely allow that.
> >
> > Best,
> > ~G
> >
> > On Fri, May 28, 2021 at 8:31 AM Jim Hester 
> wrote:
> >   From reading the discussion on the Bioconductor issue tracker it
> >   seems like
> >   the reason the GC is not suspended for the non-string ALTREP Elt
> >   methods is
> >   primarily due to performance concerns.
> >
> >   If this is the case perhaps an additional flag could be added to
> >   the
> >   `R_set_altrep_*()` functions so ALTREP authors could indicate if
> >   GC should
> >   be halted when that particular method is called for that
> >   particular ALTREP
> >   class.
> >
> >   This would avoid the performance hit (other than a boolean
> >   check) for the
> >   standard case when no allocations are expected, but allow
> >   authors to
> >   indicate that R should pause GC if needed for methods in their
> >   class.
> >
> >   On Fri, May 28, 2021 at 9:42 AM  wrote:
> >
> >   > integer and real Elt methods are not expected to allocate. You
> >   would
> >   > have to suspend GC to be able to do that. This currently can't
> >   be done
> >   > from package code.
> >   >
> >   > Best,
> >   >
> >   > luke
> >   >
> >   > On Fri, 28 May 2021, Gábor Csárdi wrote:
> >   >
> >   > > I have found some weird SEXP corruption behavior with
> >   ALTREP, which
> >   > > could be a bug. (Or I could be doing something wrong.)
> >   > >
> >   > > I have an integer ALTREP vector that calls back to R from
> >   the Elt
> >   > > method. When this vector is indexed in a lapply(), its first
> >   element
> >   > > gets corrupted. Sometimes it's just a type change to
> >   logical, but
> >   > > sometimes the corruption causes a crash.
> >   > >
> >   > > I saw this on macOS from R 3.5.3 to 4.2.0. I created a small
> >   package
> >   > > that demonstrates this:
> >   https://github.com/gaborcsardi/redfish
> >   > >
> >   > > The R callback in this package calls
> >   `loadNamespace("Matrix")`, but
> >   > > the same crash happens for other packages as well, and
> >   sometimes it
> >   > > also happens if I don't load any packages at all. (But that
> >   example
> >   > > was much more complicated, so I went with the package
> >   loading.)
> >   > >
> >   > > It is somewhat random, and sometimes turning off the JIT
> >   avoids the
> >   > > crash, but not always.
> >   > >
> >   > > Hopefully I am just doing something wrong in the ALTREP code
> >   (see
> >   > >
> >  

Re: [Rd] [External] Possible ALTREP bug

2021-05-28 Thread luke-tierney

Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly new it may
be possible to check that places where they are used allow for them to
allocate. I have fixed the one that got caught by Gabor's example, and
a rchk run might be able to pick up others if rchk knows these could
allocate. (I may also be forgetting other places where the _ELt
methods are used.)  Fixing all call sites for REAL, INTEGER, etc, was
never realistic so there GC has to be suspended during the method
call, and that is done in the dispatch mechanism.

The bigger problem is jumps from inside things that existing code
assumes will not do that. Catching those jumps is possible but
expensive; doing anything sensible if one is caught is really not
possible.

Best,

luke

On Fri, 28 May 2021, Gabriel Becker wrote:


Hi Jim et al,
Just to hopefully add a bit to what Luke already answered, from what I am
recalling looking back at that bioconductor thread Elt methods are used in
places where there are hard implicit assumptions that no garbage collection
will occur (ie they are called on things that aren't PROTECTed), and beyond
that, in places where there are hard assumptions that no error (longjmp)
will occur. I could be wrong, but I don't know that suspending garbage
collection would protect from the second one. Ie it is possible that an
error *ever* being raised from R code that implements an elt method could
cause all hell to break loose.

Luke or Tomas Kalibera would know more.

I was disappointed that implementing ALTREPs in R code was not in the cards
(it was in my original proposal back in 2016 to the DSC) but I trust Luke
that there are important reasons we can't safely allow that.

Best,
~G

On Fri, May 28, 2021 at 8:31 AM Jim Hester  wrote:
  From reading the discussion on the Bioconductor issue tracker it
  seems like
  the reason the GC is not suspended for the non-string ALTREP Elt
  methods is
  primarily due to performance concerns.

  If this is the case perhaps an additional flag could be added to
  the
  `R_set_altrep_*()` functions so ALTREP authors could indicate if
  GC should
  be halted when that particular method is called for that
  particular ALTREP
  class.

  This would avoid the performance hit (other than a boolean
  check) for the
  standard case when no allocations are expected, but allow
  authors to
  indicate that R should pause GC if needed for methods in their
  class.

  On Fri, May 28, 2021 at 9:42 AM  wrote:

  > integer and real Elt methods are not expected to allocate. You
  would
  > have to suspend GC to be able to do that. This currently can't
  be done
  > from package code.
  >
  > Best,
  >
  > luke
  >
  > On Fri, 28 May 2021, Gábor Csárdi wrote:
  >
  > > I have found some weird SEXP corruption behavior with
  ALTREP, which
  > > could be a bug. (Or I could be doing something wrong.)
  > >
  > > I have an integer ALTREP vector that calls back to R from
  the Elt
  > > method. When this vector is indexed in a lapply(), its first
  element
  > > gets corrupted. Sometimes it's just a type change to
  logical, but
  > > sometimes the corruption causes a crash.
  > >
  > > I saw this on macOS from R 3.5.3 to 4.2.0. I created a small
  package
  > > that demonstrates this:
  https://github.com/gaborcsardi/redfish
  > >
  > > The R callback in this package calls
  `loadNamespace("Matrix")`, but
  > > the same crash happens for other packages as well, and
  sometimes it
  > > also happens if I don't load any packages at all. (But that
  example
  > > was much more complicated, so I went with the package
  loading.)
  > >
  > > It is somewhat random, and sometimes turning off the JIT
  avoids the
  > > crash, but not always.
  > >
  > > Hopefully I am just doing something wrong in the ALTREP code
  (see
  > >
  https://github.com/gaborcsardi/redfish/blob/main/src/test.c),
  and it
  > > is not actually a bug.
  > >
  > > Thanks,
  > > Gabor
  > >
  > > __
  > > R-devel@r-project.org mailing list
  > > https://stat.ethz.ch/mailman/listinfo/r-devel
  > >
  >
  > --
  > Luke Tierney
  > Ralph E. Wareham Professor of Mathematical Sciences
  > University of Iowa                  Phone:           
   319-335-3386
  > Department of Statistics and        Fax:             
   319-335-3017
  >     Actuarial Science
  > 241 Schaeffer Hall                  email: 
   luke-tier...@uiowa.edu
  > Iowa City, IA 52242                 WWW: 
  http://www.stat.uiowa.edu
  > __
  > R-devel@r-project.org mailing list
  > https://stat.ethz.ch/mailman/listinfo/r-devel
  >

 

Re: [Rd] [External] Possible ALTREP bug

2021-05-28 Thread Gabriel Becker
Hi Jim et al,

Just to hopefully add a bit to what Luke already answered, from what I am
recalling looking back at that bioconductor thread Elt methods are used in
places where there are hard implicit assumptions that no garbage collection
will occur (ie they are called on things that aren't PROTECTed), and beyond
that, in places where there are hard assumptions that no error (longjmp)
will occur. I could be wrong, but I don't know that suspending garbage
collection would protect from the second one. Ie it is possible that an
error *ever* being raised from R code that implements an elt method could
cause all hell to break loose.

Luke or Tomas Kalibera would know more.

I was disappointed that implementing ALTREPs in R code was not in the cards
(it was in my original proposal back in 2016 to the DSC) but I trust Luke
that there are important reasons we can't safely allow that.

Best,
~G

On Fri, May 28, 2021 at 8:31 AM Jim Hester  wrote:

> From reading the discussion on the Bioconductor issue tracker it seems like
> the reason the GC is not suspended for the non-string ALTREP Elt methods is
> primarily due to performance concerns.
>
> If this is the case perhaps an additional flag could be added to the
> `R_set_altrep_*()` functions so ALTREP authors could indicate if GC should
> be halted when that particular method is called for that particular ALTREP
> class.
>
> This would avoid the performance hit (other than a boolean check) for the
> standard case when no allocations are expected, but allow authors to
> indicate that R should pause GC if needed for methods in their class.
>
> On Fri, May 28, 2021 at 9:42 AM  wrote:
>
> > integer and real Elt methods are not expected to allocate. You would
> > have to suspend GC to be able to do that. This currently can't be done
> > from package code.
> >
> > Best,
> >
> > luke
> >
> > On Fri, 28 May 2021, Gábor Csárdi wrote:
> >
> > > I have found some weird SEXP corruption behavior with ALTREP, which
> > > could be a bug. (Or I could be doing something wrong.)
> > >
> > > I have an integer ALTREP vector that calls back to R from the Elt
> > > method. When this vector is indexed in a lapply(), its first element
> > > gets corrupted. Sometimes it's just a type change to logical, but
> > > sometimes the corruption causes a crash.
> > >
> > > I saw this on macOS from R 3.5.3 to 4.2.0. I created a small package
> > > that demonstrates this: https://github.com/gaborcsardi/redfish
> > >
> > > The R callback in this package calls `loadNamespace("Matrix")`, but
> > > the same crash happens for other packages as well, and sometimes it
> > > also happens if I don't load any packages at all. (But that example
> > > was much more complicated, so I went with the package loading.)
> > >
> > > It is somewhat random, and sometimes turning off the JIT avoids the
> > > crash, but not always.
> > >
> > > Hopefully I am just doing something wrong in the ALTREP code (see
> > > https://github.com/gaborcsardi/redfish/blob/main/src/test.c), and it
> > > is not actually a bug.
> > >
> > > Thanks,
> > > Gabor
> > >
> > > __
> > > R-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >
> >
> > --
> > Luke Tierney
> > Ralph E. Wareham Professor of Mathematical Sciences
> > University of Iowa  Phone: 319-335-3386
> > Department of Statistics andFax:   319-335-3017
> > Actuarial Science
> > 241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
> > Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Possible ALTREP bug

2021-05-28 Thread Jim Hester
>From reading the discussion on the Bioconductor issue tracker it seems like
the reason the GC is not suspended for the non-string ALTREP Elt methods is
primarily due to performance concerns.

If this is the case perhaps an additional flag could be added to the
`R_set_altrep_*()` functions so ALTREP authors could indicate if GC should
be halted when that particular method is called for that particular ALTREP
class.

This would avoid the performance hit (other than a boolean check) for the
standard case when no allocations are expected, but allow authors to
indicate that R should pause GC if needed for methods in their class.

On Fri, May 28, 2021 at 9:42 AM  wrote:

> integer and real Elt methods are not expected to allocate. You would
> have to suspend GC to be able to do that. This currently can't be done
> from package code.
>
> Best,
>
> luke
>
> On Fri, 28 May 2021, Gábor Csárdi wrote:
>
> > I have found some weird SEXP corruption behavior with ALTREP, which
> > could be a bug. (Or I could be doing something wrong.)
> >
> > I have an integer ALTREP vector that calls back to R from the Elt
> > method. When this vector is indexed in a lapply(), its first element
> > gets corrupted. Sometimes it's just a type change to logical, but
> > sometimes the corruption causes a crash.
> >
> > I saw this on macOS from R 3.5.3 to 4.2.0. I created a small package
> > that demonstrates this: https://github.com/gaborcsardi/redfish
> >
> > The R callback in this package calls `loadNamespace("Matrix")`, but
> > the same crash happens for other packages as well, and sometimes it
> > also happens if I don't load any packages at all. (But that example
> > was much more complicated, so I went with the package loading.)
> >
> > It is somewhat random, and sometimes turning off the JIT avoids the
> > crash, but not always.
> >
> > Hopefully I am just doing something wrong in the ALTREP code (see
> > https://github.com/gaborcsardi/redfish/blob/main/src/test.c), and it
> > is not actually a bug.
> >
> > Thanks,
> > Gabor
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa  Phone: 319-335-3386
> Department of Statistics andFax:   319-335-3017
> Actuarial Science
> 241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
> Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Possible ALTREP bug

2021-05-28 Thread Gábor Csárdi
Thank you Luke, that makes a lot of sense,
Gabor

On Fri, May 28, 2021 at 3:41 PM  wrote:
>
> integer and real Elt methods are not expected to allocate. You would
> have to suspend GC to be able to do that. This currently can't be done
> from package code.
>
> Best,
>
> luke
>
> On Fri, 28 May 2021, Gábor Csárdi wrote:
>
> > I have found some weird SEXP corruption behavior with ALTREP, which
> > could be a bug. (Or I could be doing something wrong.)
> >
> > I have an integer ALTREP vector that calls back to R from the Elt
> > method. When this vector is indexed in a lapply(), its first element
> > gets corrupted. Sometimes it's just a type change to logical, but
> > sometimes the corruption causes a crash.
> >
> > I saw this on macOS from R 3.5.3 to 4.2.0. I created a small package
> > that demonstrates this: https://github.com/gaborcsardi/redfish
> >
> > The R callback in this package calls `loadNamespace("Matrix")`, but
> > the same crash happens for other packages as well, and sometimes it
> > also happens if I don't load any packages at all. (But that example
> > was much more complicated, so I went with the package loading.)
> >
> > It is somewhat random, and sometimes turning off the JIT avoids the
> > crash, but not always.
> >
> > Hopefully I am just doing something wrong in the ALTREP code (see
> > https://github.com/gaborcsardi/redfish/blob/main/src/test.c), and it
> > is not actually a bug.
> >
> > Thanks,
> > Gabor
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa  Phone: 319-335-3386
> Department of Statistics andFax:   319-335-3017
> Actuarial Science
> 241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
> Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Possible ALTREP bug

2021-05-28 Thread luke-tierney

integer and real Elt methods are not expected to allocate. You would
have to suspend GC to be able to do that. This currently can't be done
from package code.

Best,

luke

On Fri, 28 May 2021, Gábor Csárdi wrote:


I have found some weird SEXP corruption behavior with ALTREP, which
could be a bug. (Or I could be doing something wrong.)

I have an integer ALTREP vector that calls back to R from the Elt
method. When this vector is indexed in a lapply(), its first element
gets corrupted. Sometimes it's just a type change to logical, but
sometimes the corruption causes a crash.

I saw this on macOS from R 3.5.3 to 4.2.0. I created a small package
that demonstrates this: https://github.com/gaborcsardi/redfish

The R callback in this package calls `loadNamespace("Matrix")`, but
the same crash happens for other packages as well, and sometimes it
also happens if I don't load any packages at all. (But that example
was much more complicated, so I went with the package loading.)

It is somewhat random, and sometimes turning off the JIT avoids the
crash, but not always.

Hopefully I am just doing something wrong in the ALTREP code (see
https://github.com/gaborcsardi/redfish/blob/main/src/test.c), and it
is not actually a bug.

Thanks,
Gabor

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel