Re: [Rd] [External] Re: Patches for CVE-2024-27322

2024-04-30 Thread Tierney, Luke via R-devel
That should do it

Sent from my iPad

On Apr 30, 2024, at 9:57 AM, Iñaki Ucar wrote:


Many thanks both. I'll wait for Luke's confirmation to trigger the update with 
the backported fix.

Iñaki

On Tue, 30 Apr 2024 at 12:42, Dirk Eddelbuettel <e...@debian.org> wrote:

On 30 April 2024 at 11:59, peter dalgaard wrote:
| svn diff -c 86235 ~/r-devel/R

Which is also available as
  
https://github.com/r-devel/r-svn/commit/f7c46500f455eb4edfc3656c3fa20af61b16abb7

Dirk

| (or 86238 for the port to the release branch) should be easily backported.
|
| (CC Luke in case there is more to it)
|
| - pd
|
| > On 30 Apr 2024, at 11:28, Iñaki Ucar <iu...@fedoraproject.org> wrote:
| >
| > Dear R-core,
| >
| > I just received notification of CVE-2024-27322 [1] in RedHat's Bugzilla. We
| > updated R to v4.4.0 in Fedora rawhide, F40, EPEL9 and EPEL8, so no problem
| > there. However, F38 and F39 will stay at v4.3.3, and I was wondering if
| > there's a specific patch available, or if you could point me to the commits
| > that fixed the issue, so that we can cherry-pick them for F38 and F39.
| > Thanks.
| >
| > [1] https://nvd.nist.gov/vuln/detail/CVE-2024-27322
| >
| > Best,
| > --
| > Iñaki Úcar
| >
| >
| > __
| > R-devel@r-project.org mailing list
| > https://stat.ethz.ch/mailman/listinfo/r-devel
|
| --
| Peter Dalgaard, Professor,
| Center for Statistics, Copenhagen Business School
| Solbjerg Plads 3, 2000 Frederiksberg, Denmark
| Phone: (+45)38153501
| Office: A 4.23
| Email: pd@cbs.dk  Priv: pda...@gmail.com
|

--
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org


--
Iñaki Úcar



Re: [Rd] [External] Re: rpois(9, 1e10)

2020-01-19 Thread Tierney, Luke
R uses the C 'int' type for its integer data and that is pretty much
universally 32 bit these days. In fact R won't compile if it is not.
That means the range for integer data is the integers in [-2^31,
+2^31).
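
A minimal C sketch of that range, for illustration only: the helper
name fits_in_r_integer is hypothetical and this is not R's source. It
also assumes, as current R does, that INT_MIN is reserved as the bit
pattern for NA_integer_, so the usable values run from -2^31 + 1 to
2^31 - 1.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: is the value x within the range of a non-NA
   R integer? Assumes a 32-bit int, with INT_MIN set aside for
   NA_integer_. */
bool fits_in_r_integer(double x) {
    assert(sizeof(int) * CHAR_BIT == 32);  /* the 32-bit assumption above */
    /* Comparisons with NaN are false, so NaN is rejected as well. */
    return x > (double)INT_MIN && x <= (double)INT_MAX;
}
```

With this check, 2147483647 is in range, while 2147483648 and a draw
near 1e10 are not, which is the situation behind the NAs reported for
rpois(9, 1e10).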

It would be good to allow for a larger integer range for R integer
objects, and several of us are thinking about how we might get there.
But it isn't easy to get right, so it may take some time. I doubt
anything can happen for R 4.0.0 this year, but 2021 may be possible.

A few notes inline below:

On Sun, 19 Jan 2020, Spencer Graves wrote:

> On my Mac:
>
>
> str(.Machine)
> ...
>  $ integer.max  : int 2147483647
>  $ sizeof.long  : int 8
>  $ sizeof.longlong  : int 8
>  $ sizeof.longdouble    : int 16
>  $ sizeof.pointer   : int 8
>
>
>   On a Windows 10 machine I have, $ sizeof.long : int 4; otherwise
> the same as on my Mac.

One of many annoyances of Windows -- done for compatibility with
ancient Windows apps.

>   Am I correct that $ sizeof.long = 4 means 4 bytes = 32 bits?
> log2(.Machine$integer.max) = 31.  Then 8 bytes is what used to be called
> double precision (2 words of 4 bytes each)?  And $ sizeof.longdouble =
> 16 = 4 words of 4 bytes each?

double precision is a floating point concept, not related to integers.

If you want to figure out whether you are running a 32 bit or 64 bit R
look at sizeof.pointer -- 4 means 32 bits, 8 means 64 bits.
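
The same check can be made from C. A small sketch (all function names
are made up for illustration) that reports the pointer width, plus the
width of long, which is the field that differs between the Mac and
Windows output quoted above:

```c
#include <assert.h>
#include <limits.h>

/* Width of a data pointer in bits: 32 on a 32-bit build, 64 on a
   64-bit build. */
int pointer_bits(void) {
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Width of C long in bits: typically 64 on 64-bit macOS and Linux,
   32 on Windows. */
int long_bits(void) {
    return (int)(sizeof(long) * CHAR_BIT);
}

/* Width of C int in bits: the type R uses for integer data. */
int int_bits(void) {
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Sanity checks matching the statements in this thread. */
void check_widths(void) {
    assert(int_bits() == 32);
    assert(pointer_bits() == 32 || pointer_bits() == 64);
    assert(long_bits() >= int_bits());
}
```

Note that long_bits() says nothing about whether the build is 32 or 64
bit; only pointer_bits() does.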

Best,

luke


>
>
>   Spencer
>
>
> On 2020-01-19 15:41, Avraham Adler wrote:
>> Floor (maybe round) of non-negative numerics, though. Poisson should
>> never have anything after decimal.
>>
>> Still think it’s worth allowing long long for R64 bit, just for purity
>> sake.
>>
>> Avi
>>
>> On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves
>> <spencer.gra...@prodsyse.com> wrote:
>>
>>
>>
>> On 2020-01-19 13:01, Avraham Adler wrote:
>>> Crazy thought, but being that a sum of Poissons is Poisson in the
>>> sum, can you break your “big” simulation into the sum of a few
>>> smaller ones? Or is the order of magnitude difference just too great?
>>
>>
>>   I don't perceive that as feasible.  Once I found what was
>> generating NAs, it was easy to code a function to return
>> pseudo-random numbers using the standard normal approximation to
>> the Poisson for those extreme cases.  [For a Poisson with mean =
>> 1e6, for example, the skewness (third standardized moment) is
>> 0.001.  At least for my purposes, that should be adequate.][1]
>>
>>
>>   What are the negative consequences of having rpois return
>> numerics that are always nonnegative?
>>
>>
>>   Spencer
>>
>>
>> [1]  In the code I reported before, I just changed the threshold
>> of 1e6 to 0.5*.Machine$integer.max.  On my Mac,
>> .Machine$integer.max = 2147483647 = 2^31 - 1 > 1e9. That still means
>> that a Poisson distributed pseudo-random number just under that
>> would have to be over 23000 standard deviations above the mean to
>> exceed .Machine$integer.max.
>>
>>>
>>> On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves wrote:
>>>
>>>   This issue arose for me in simulations to estimate
>>> confidence, prediction, and tolerance intervals from glm(.,
>>> family=poisson) fits embedded in a BMA::bic.glm fit using a
>>> simulate.bic.glm function I added to the development version
>>> of Ecfun, available at "https://github.com/sbgraves237/Ecfun".
>>> This is part of a vignette I'm developing, available at
>>> "https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd".
>>> This includes a simulated mean of a mixture of Poissons that
>>> exceeds 2e22.  It doesn't seem unreasonable to me to have
>>> rpois output numerics rather than integers when a number
>>> simulated exceeds .Machine$integer.max.  And it does seem to
>>> make less sense in such cases to return NAs.
>>>
>>>
>>>    Alternatively, might it make sense to add another
>>> argument to rpois to give the user the choice?  E.g., an
>>> argument "bigOutput" with (I hope) default = "numeric" and
>>> "NA" as a second option.  Or NA is the default, so no code
>>> that relied on that feature of the current code would be broken
>>> by the change.  If someone wanted to use arbitrary precision
>>> arithmetic, they could write their own version of this
>>> function with "arbitraryPrecision" as an optional value for
>>> the "bigOutput" argument.
>>>
>>>
>>>   Comments?
>>>   Thanks,
>>>   Spencer Graves
>>>
>>>
>>>
>>> On 2020-01-19 10:28, Avraham Adler wrote:
 

Re: [Rd] switch to reference counting in R-devel

2019-12-03 Thread Tierney, Luke
R-devel has been switched to use reference counting by default with
r77508. Building with -DSWITCH_TO_NAMED goes back to the NAMED
mechanism.
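
For readers unfamiliar with the idea, here is a toy C sketch of how a
reference count can decide whether an object may be mutated in place.
It is not R's implementation: the toy_vec type and all toy_* functions
are invented for illustration, and R's actual bookkeeping lives in its
own internals.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy vector with a reference count (illustration only). */
typedef struct {
    int refcnt;    /* number of live references to this vector */
    size_t len;
    double *data;
} toy_vec;

/* Create a zero-filled vector owned by one reference. */
toy_vec *toy_new(size_t len) {
    toy_vec *v = malloc(sizeof *v);
    if (v == NULL) abort();
    v->refcnt = 1;
    v->len = len;
    v->data = calloc(len ? len : 1, sizeof *v->data);
    if (v->data == NULL) abort();
    return v;
}

/* Take an additional reference to the same vector. */
toy_vec *toy_share(toy_vec *v) {
    v->refcnt++;
    return v;
}

/* Drop one reference; free the vector when none remain. */
void toy_release(toy_vec *v) {
    if (--v->refcnt == 0) {
        free(v->data);
        free(v);
    }
}

/* Set element i through one reference. If the vector is shared, copy
   it first so the other references do not see the change. Returns the
   vector this reference should use from now on. */
toy_vec *toy_set(toy_vec *v, size_t i, double x) {
    assert(i < v->len);
    if (v->refcnt > 1) {
        toy_vec *copy = toy_new(v->len);
        memcpy(copy->data, v->data, v->len * sizeof *v->data);
        v->refcnt--;    /* this reference moves over to the copy */
        v = copy;
    }
    v->data[i] = x;
    return v;
}
```

In this sketch a vector with a single reference is modified in place,
while a shared one is duplicated before the write.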

Best,

luke

On Sun, 24 Nov 2019, luke-tier...@uiowa.edu wrote:

> Barring any unforeseen issues R-devel will switch in about a week from
> the NAMED mechanism to reference counting for determining when objects
> can be safely mutated in base C code. This is expected to have minimal
> impact on packages not using unsupported coding practices in their C
> code.
>
>
> The transition to reference counting has been in progress for a
> number of years. Some older notes on this are available at
> http://developer.r-project.org/Refcnt.html.  These may no longer be
> completely accurate but should give you an idea of what is going on.
>
> If you want to test your package under reference counting you can do
> so by building R with -DSWITCH_TO_REFCNT added to CFLAGS or DEFS in a
> config.site file.
>
> A small number of packages are still using the NAMED or SET_NAMED
> functions even though this has been discouraged for some time.
> For now these will not produce errors but also not do anything useful.
> They will probably be removed before R 4.0.0 is released, so you
> should look at why you are using them and adjust accordingly.
>
> Best,
>
> luke
>
>
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:   319-335-3386
Department of Statistics and        Fax:     319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:     http://www.stat.uiowa.edu



[Rd] switch to reference counting in R-devel

2019-11-24 Thread Tierney, Luke
Barring any unforeseen issues R-devel will switch in about a week from
the NAMED mechanism to reference counting for determining when objects
can be safely mutated in base C code. This is expected to have minimal
impact on packages not using unsupported coding practices in their C
code.


The transition to reference counting has been in progress for a
number of years. Some older notes on this are available at
http://developer.r-project.org/Refcnt.html.  These may no longer be
completely accurate but should give you an idea of what is going on.

If you want to test your package under reference counting you can do
so by building R with -DSWITCH_TO_REFCNT added to CFLAGS or DEFS in a
config.site file.

A small number of packages are still using the NAMED or SET_NAMED
functions even though this has been discouraged for some time.
For now these will not produce errors but also not do anything useful.
They will probably be removed before R 4.0.0 is released, so you
should look at why you are using them and adjust accordingly.

Best,

luke




Re: [Rd] [External] R C api for 'inherits' S3 and S4 objects

2019-11-01 Thread Tierney, Luke
On Fri, 1 Nov 2019, Jan Gorecki wrote:

> Thank you Luke.
> That is why I don't use Rf_inherits but INHERITS which does not
> allocate, provided in the email body.

Your definition can allocate because STRING_ELT can allocate.
getAttrib can GC in general. Currently it would not GC or allocate in
this case, but this could change.

You can't assume thread-safety for calls into the R API, or any API
for that matter, unless they are documented to be thread-safe.

You would be better off using Rf_inherits as it does not make the
assumption that you can use pointer comparisons to check for identical
strings.  CHARSXPs are almost always cached but they are not
guaranteed to be, and the caching strategy might change in the future.
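
The pointer-comparison hazard is not specific to R. A toy sketch in
plain C (ordinary char arrays standing in for CHARSXPs, with made-up
function names) shows two strings with equal contents but different
addresses, and a class check that compares contents instead:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Identity test: true only if both pointers refer to the same storage. */
bool same_by_pointer(const char *a, const char *b) {
    return a == b;
}

/* Content test: true if the two strings hold the same characters. */
bool same_by_content(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}

/* Two separate buffers with equal contents: the pointer test fails
   even though the strings are the same. */
bool equal_but_distinct(void) {
    char a[] = "data.frame";
    char b[] = "data.frame";
    return !same_by_pointer(a, b) && same_by_content(a, b);
}

/* Toy class check that does not rely on any caching of strings. */
bool toy_inherits(const char *klass[], int n, const char *what) {
    assert(klass != NULL && what != NULL);
    for (int i = 0; i < n; i++) {
        if (same_by_content(klass[i], what)) return true;
    }
    return false;
}
```

A check written with same_by_pointer is only correct while every equal
string is guaranteed to share storage, which is the assumption being
questioned here.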

Best,

luke

> I cannot do similarly for S4 classes, thus asking for some API for that.
>
> On Fri, Nov 1, 2019 at 5:56 PM Tierney, Luke wrote:
>>
>> On Fri, 1 Nov 2019, Jan Gorecki wrote:
>>
>>> Dear R developers,
>>>
>>> Motivated by discussion about checking inheritance of S3 and S4
>>> objects (in head matrix/array topic) I would like to shed some light
>>> on a minor gap about that matter in R C API.
>>> Currently we are able to check inheritance for S3 class objects from C
>>> in a robust way (no allocation, thread safe). This is unfortunately
>>
>> Your premise is not correct. Rf_inherits will not GC but it can
>> allocate and is not thread safe.
>>
>> Best,
>>
>> luke
>>
>>> not possible for S4 classes. I would kindly request new function in R
>>> C api so it can be achieved for S4 classes with no risk of allocation.
>>> For reference mentioned functions below. Thank you.
>>> Jan Gorecki
>>>
>>> // S3 inheritance
>>> bool INHERITS(SEXP x, SEXP char_) {
>>>  SEXP klass;
>>>  if (isString(klass = getAttrib(x, R_ClassSymbol))) {
>>>    for (int i=0; i<length(klass); i++) {
>>>      if (STRING_ELT(klass, i) == char_) return true;
>>>    }
>>>  }
>>>  return false;
>>> }
>>> // S4 inheritance
>>> bool Rinherits(SEXP x, SEXP char_) {
>>>  SEXP vec = PROTECT(ScalarString(char_));
>>>  SEXP call = PROTECT(lang3(sym_inherits, x, vec));
>>>  bool ans = LOGICAL(eval(call, R_GlobalEnv))[0]==1;
>>>  UNPROTECT(2);
>>>  return ans;
>>> }
>>>
>



Re: [Rd] [External] R C api for 'inherits' S3 and S4 objects

2019-11-01 Thread Tierney, Luke
On Fri, 1 Nov 2019, Jan Gorecki wrote:

> Dear R developers,
>
> Motivated by discussion about checking inheritance of S3 and S4
> objects (in head matrix/array topic) I would like to shed some light
> on a minor gap about that matter in R C API.
> Currently we are able to check inheritance for S3 class objects from C
> in a robust way (no allocation, thread safe). This is unfortunately

Your premise is not correct. Rf_inherits will not GC but it can
allocate and is not thread safe.

Best,

luke

> not possible for S4 classes. I would kindly request new function in R
> C api so it can be achieved for S4 classes with no risk of allocation.
> For reference mentioned functions below. Thank you.
> Jan Gorecki
>
> // S3 inheritance
> bool INHERITS(SEXP x, SEXP char_) {
>  SEXP klass;
>  if (isString(klass = getAttrib(x, R_ClassSymbol))) {
>    for (int i=0; i<length(klass); i++) {
>      if (STRING_ELT(klass, i) == char_) return true;
>    }
>  }
>  return false;
> }
> // S4 inheritance
> bool Rinherits(SEXP x, SEXP char_) {
>  SEXP vec = PROTECT(ScalarString(char_));
>  SEXP call = PROTECT(lang3(sym_inherits, x, vec));
>  bool ans = LOGICAL(eval(call, R_GlobalEnv))[0]==1;
>  UNPROTECT(2);
>  return ans;
> }
>
>



Re: [Rd] [External] Re: should base R have a piping operator ?

2019-10-07 Thread Tierney, Luke
Just for the record, and not that using return() calls like this is
necessarily a good idea, it is possible to make a nested-call-based
pipe that handles return() calls the way you want using delayedAssign.
I've added it to the end of the file on gitlab.

Time to move on to the stuff I've been avoiding ...

Best,

luke

On Mon, 7 Oct 2019, Tierney, Luke wrote:

> On Mon, 7 Oct 2019, Lionel Henry wrote:
>
>>
>>
>>> On 7 Oct 2019, at 17:04, Tierney, Luke wrote:
>>>
>>>  Think about what happens if an
>>> argument in a pipe stage contains a pipe. (Not completely
>>> unreasonable, e.g. for a left_join).
>>
>> It should work exactly as it does in a local environment.
>>
>> ```
>> `%foo%` <- function(x, y) {
>>  env <- parent.frame()
>>
>>  # Use `:=` to avoid partial matching on .env/.frame
>>  rlang::scoped_bindings(. := x, .env = env)
>>
>>  eval(substitute(y), env)
>> }
>>
>> "A" %foo% {
>>  print(.)
>>  "B" %foo% print(.)
>>  print(.)
>> }
>> #> [1] "A"
>> #> [1] "B"
>> #> [1] "A"
>>
>> print(.)
>> #> Error in print(.) : object '.' not found
>>
>> ```
>>
>> The advantage is that side effects (such as assigning variables or calling
>> `return()`) will occur in the expected environment.
>
> You get the assignment behavior with the nested call approach. (Not
> that doing this is necessarily a good idea).
>
>> I don't see it causing
>> problems except in artificial cases. Am I missing something?
>
> Here is a stylized example:
>
> f <- function(x, y) {
> assign("xx", x, parent.frame())
> on.exit(rm(xx, envir = parent.frame()))
> y
> get("xx") + 1
> }
>
> ## This is fine:
>> f(1, 2)
> [1] 2
>
> ## This is not:
>> f(1, f(1, 2))
> Error in get("xx") : object 'xx' not found
>
> If you play these games whether you get the result you want, or an
> obvious error, or just the wrong answer depends on argument evaluation
> order and the like. You really don't want to go there. Not to mention
> that you would be telling users they are not allowed to use '.' as a
> variable name for their own purposes or you would be polluting their
> environment with some other artificial symbol that they would see in
> debugging. Just don't.
>
> Anything going in base needs to worry even about artificial cases.
> Yes, there are things in base that don't meet that standard. No, that
> is not a reason to add more.
>
>> I agree that restraining the pipe to a single placeholder (to avoid
>> double evaluation) would be a good design too.
>>
>> I can't access https://gitlab.com/luke-tierney/pipes, it appears to be 404.
>
> Should be able to get there now. Needed to change the visibility ---
> still learning my way around gitlab.
>
> Best,
>
> luke
>
>> Best,
>> Lionel
>>
>>
>
>



Re: [Rd] [External] Re: should base R have a piping operator ?

2019-10-07 Thread Tierney, Luke
Yes, you can make my little example work by implementing dynamic
scope with a stack for saving/restoring binding values. Given R's
reflection capabilities and rm() with an envir argument, that has its
own issues. If you want to try to get this right and maintain it in
your own packages that is up to you. I can't see the cost/benefit
calculation justifying having it in base.

Best,

luke

On Mon, 7 Oct 2019, Lionel Henry wrote:

> On 7 Oct 2019, at 18:17, Tierney, Luke wrote:
>
>> Here is a stylized example:
>
> The previous value of the binding should only be restored if it
> existed:
>
> g <- function(x, y) {
>  rlang::scoped_bindings(xx = x, .env = parent.frame())
>  y
>  get("xx") + 10
> }
>
> # Good
> g(1, 2)
> #> [1] 11
>
> # Still good?
> g(1, g(1, 2))
> #> [1] 11
>
>
>> If you play these games whether you get the result you want, or an
>> obvious error, or just the wrong answer depends on argument evaluation
>> order and the like.
>
> I think the surprises are limited because the pattern has stack-like
> semantics. We get in a new context where `.` gains a new meaning, and
> when we exit the previous meaning is restored.
>
> One example where this could lead to unexpected behaviour is trying to
> capture the value of the placeholder in a closure:
>
> f <- function(x) {
>   x %>% {
>     identity(function() .)
>   }
> }
>
> # This makes sense:
> f("A")()
> #> Error: object '.' not found
>
> # This doesn't:
> "B" %>% { f("A")() }
> #> [1] "B"
>
>
>> Not to mention that you would be telling users they are not allowed
>> to use '.' as a variable name for their own purposes or you would be
>> polluting their environment with some other artificial symbol that
>> they would see in debugging.
>
> That's a good point. Debugging allows to move up the call stack before
> the context is exited, so you'd see the last value of `.` in examples
> of nested pipes like `foo %>% bar( f %>% g() )`. That could be confusing.
>
>
>> Anything going in base needs to worry even about artificial cases.
>> Yes, there are things in base that don't meet that standard. No, that
>> is not a reason to add more.
>
> Agreed. What I meant by artificial cases is functions making
> questionable assumptions after peeking into foreign contexts etc.
>
> I'm worried about what happens with important language constructs like
> `<-` and `return()` when code is evaluated in a local context. That
> said, I think binding pipe values to `.` is more important than these
> particular semantics because the placeholder is an obvious binding to
> inspect while debug-stepping through a pipeline. So evaluating in a
> child is probably preferable to giving up the placeholder altogether.
>
> Best,
> Lionel
>



Re: [Rd] [External] Re: should base R have a piping operator ?

2019-10-07 Thread Tierney, Luke
On Mon, 7 Oct 2019, Lionel Henry wrote:

>
>
>> On 7 Oct 2019, at 17:04, Tierney, Luke wrote:
>>
>>  Think about what happens if an
>> argument in a pipe stage contains a pipe. (Not completely
>> unreasonable, e.g. for a left_join).
>
> It should work exactly as it does in a local environment.
>
> ```
> `%foo%` <- function(x, y) {
>  env <- parent.frame()
>
>  # Use `:=` to avoid partial matching on .env/.frame
>  rlang::scoped_bindings(. := x, .env = env)
>
>  eval(substitute(y), env)
> }
>
> "A" %foo% {
>  print(.)
>  "B" %foo% print(.)
>  print(.)
> }
> #> [1] "A"
> #> [1] "B"
> #> [1] "A"
>
> print(.)
> #> Error in print(.) : object '.' not found
>
> ```
>
> The advantage is that side effects (such as assigning variables or calling
> `return()`) will occur in the expected environment.

You get the assignment behavior with the nested call approach. (Not
that doing this is necessarily a good idea).

> I don't see it causing
> problems except in artificial cases. Am I missing something?

Here is a stylized example:

f <- function(x, y) {
 assign("xx", x, parent.frame())
 on.exit(rm(xx, envir = parent.frame()))
 y
 get("xx") + 1
}

## This is fine:
> f(1, 2) 
[1] 2

## This is not:
> f(1, f(1, 2))
Error in get("xx") : object 'xx' not found

If you play these games whether you get the result you want, or an
obvious error, or just the wrong answer depends on argument evaluation
order and the like. You really don't want to go there. Not to mention
that you would be telling users they are not allowed to use '.' as a
variable name for their own purposes or you would be polluting their
environment with some other artificial symbol that they would see in
debugging. Just don't.

Anything going in base needs to worry even about artificial cases.
Yes, there are things in base that don't meet that standard. No, that
is not a reason to add more.

> I agree that restraining the pipe to a single placeholder (to avoid
> double evaluation) would be a good design too.
>
> I can't access https://gitlab.com/luke-tierney/pipes, it appears to be 404.

Should be able to get there now. Needed to change the visibility ---
still learning my way around gitlab.

Best,

luke

> Best,
> Lionel
>
>



Re: [Rd] [External] Re: should base R have a piping operator ?

2019-10-07 Thread Tierney, Luke
On Mon, 7 Oct 2019, Lionel Henry wrote:

> Hi Gabe,
>
>> There is another way the pipe could go into base R that could not be
>> done in package space and has the potential to mitigate some pretty
>> serious downsides to the pipes relating to debugging
>
> I assume you're thinking about the large stack trace of the magrittr
> pipe? You don't need a parser transformation to solve this problem
> though, the pipe could be implemented as a regular function with a
> very limited impact on the stack. And if implemented as a SPECIALSXP,
> it would be completely invisible. We've been planning to rewrite %>%
> to fix the performance and the stack print, it's just low priority.
>
> About the semantics of local evaluation that were proposed in this
> thread, I think that wouldn't be right. A native pipe should be
> consistent with other control flow constructs like `if` and `for` and
> evaluate in the current environment. In that case, the `.` binding, if
> any, would be restored to its original value in `on.exit()` (or through
> unwind-protection if implemented in C).
>

Sorry to be blunt but adding/removing a variable from a caller's
environment is a horrible design idea. Think about what happens if an
argument in a pipe stage contains a pipe. (Not completely
unreasonable, e.g. for a left_join). We already have such a design
lurking in (at least) one place in base code and it keeps biting. It's
pretty high on my list to be expunged.

If a variable is to be used it needs to be in its own
scope/environment.  There is another option, which is to rewrite the
pipe as a nested call and evaluate that in the parent frame. Not likely
to be much worse for debugging and might even be better.  Some
tinkering with these ideas is at

https://gitlab.com/luke-tierney/pipes

All that said, there is nothing that can be done with pipes that can't
be done without them. They may be the most visible aspect of the
tidyverse but they are also the least essential. I don't find them
useful, mostly because they make debugging harder and add to the
cognitive load of figuring out what is actually going on in the
evaluation process. So I don't use them in my work or my teaching (I
do mention them in teaching so students can understand them when they
see them). Many people clearly like them, and that's fine. But they
are not in any way, shape, or form essential.

I can't speak for all of R core on this, but this is how I look at the
question of inclusion in base: R core developer time is a (very)
scarce resource. Any part of that resource that is used to incorporate
and maintain in base something that can be implemented reasonably well
in a package is then not available for improving and maintaining parts
of R that have to be in base. There would need to be extremely strong
reasons for reallocating resources in this way and I just don't see
how that case can be made here.

It is certainly possible that thinking about pipes might suggest some
useful low level primitives to add that would have to live in base and
might be useful in other contexts. Those might be worth considering.
[Some kind of 'exec()' or 'tailcall()' primitive to reuse a call
frame, for example.]

Best,

luke

> Best,
> Lionel
>
>
>> On 6 Oct 2019, at 01:50, Gabriel Becker wrote:
>>
>> Hi all,
>>
>> I think there's some nuance here that makes me agree partially with
>> each "side".
>>
>> The pipe is inarguably extremely popular. Many probably think of it as a
>> core feature of R, along with the tidyverse that (as was pointed out)
>> largely surrounds it and drives its popularity. Whether it's a good or bad
>> thing that they think that doesn't change the fact that, by my estimation,
>> Ant is correct that they do. BUT, I don't agree with him that that, by
>> itself, is a reason to put it in base R in the form that it exists now. For
>> the current form, there aren't really any major downsides that I see to
>> having people just use the package version.
>>
>> Sure it may be a little weird, but it doesn't ever really stop the
>> people from using it or present a significant barrier. Another major point
>> is that many (most?) base R functions are not necessarily tooled to be
>> endomorphic, which in my personal opinion is *largely* the only place that
>> the pipes are really compelling.
>>
>> That was for pipes as they exist in package space, though. There is another
>> way the pipe could go into base R that could not be done in package space
>> and has the potential to mitigate some pretty serious downsides to the
>> pipes relating to debugging, which would be to implement them in the parser.
>>
>> If
>>
>> iris %>% group_by(Species) %>% summarize(mean_sl = mean(Sepal.Length)) %>%
>> filter(mean_sl > 5)
>>
>>
>> were *parsed*, for example, into
>>
>> local({
>>    . = group_by(iris, Species)
>>
>>    ._tmp2 = summarize(., mean_sl = mean(Sepal.Length))
>>
>>    filter(., mean_sl > 5)
>>   })
>>
>>
>>
>>
>> Then 

Re: [Rd] [External] REprintf could be caught by tryCatch(message)

2019-09-15 Thread Tierney, Luke
You can file it as a wishlist item in the bug tracking system. Without
a compelling case or a complete and well tested patch or both I doubt
it will rise to the top of anyone's priority list.

Best,

luke

On Sun, 15 Sep 2019, Jan Gorecki wrote:

> Thank you Luke for prompt reply.
> Is it possible then to request a new function to R C API "message"
> that would be equivalent to R "message" function? Similarly as we now
> have C "warning" and C "error" functions.
>
> Best,
> Jan
>
> On Sun, Sep 15, 2019 at 5:25 PM Tierney, Luke wrote:
>>
>> On Sun, 15 Sep 2019, Jan Gorecki wrote:
>>
>>> Dear R-devel community,
>>>
>>> There appears to be an inconsistency in R C API about the exceptions
>>> that can be raised from C code.
>>> Mapping of R C funs to corresponding R functions is as follows.
>>>
>>> error    -> stop
>>> warning  -> warning
>>> REprintf -> message
>>
>> This is wrong: REprintf is like cat with file = stderr(). If this claim
>> is made somewhere in R documentation please report it as a bug.
>>
>>> Rprintf  -> cat
>>>
>>> Rprintf/cat is of course not an exception, I listed it just for completeness.
>>> The inconsistency I would like to report is about REprintf. It cannot
>>> be caught by tryCatch(message). Warnings and errors are being caught
>>> as expected.
>>>
>>> Is there any chance to "fix"/"improve" REprintf so tryCatch(message)
>>> can catch it?
>>
>> No: this is behaving as intended.
>>
>> Best,
>>
>> luke
>>
>>> So in the example below catch(Cmessage()) would behave consistently to
>>> R's catch(message("a"))?
>>>
>>> Regards,
>>> Jan Gorecki
>>>
>>> catch = function(expr) {
>>>  tryCatch(expr,
>>>    message=function(m) cat("caught message\n"),
>>>    warning=function(w) cat("caught warning\n"),
>>>    error=function(e) cat("caught error\n")
>>>  )
>>> }
>>> library(inline)
>>> Cstop = cfunction(c(), 'error("%s\\n","a"); return R_NilValue;')
>>> Cwarning = cfunction(c(), 'warning("%s\\n","a"); return R_NilValue;')
>>> Cmessage = cfunction(c(), 'REprintf("%s\\n","a"); return R_NilValue;')
>>>
>>> catch(stop("a"))
>>> #caught error
>>> catch(warning("a"))
>>> #caught warning
>>> catch(message("a"))
>>> #caught message
>>>
>>> catch(Cstop())
>>> #caught error
>>> catch(Cwarning())
>>> #caught warning
>>> catch(Cmessage())
>>> #a
>>> #NULL
>>>
>



Re: [Rd] [External] REprintf could be caught by tryCatch(message)

2019-09-15 Thread Tierney, Luke
On Sun, 15 Sep 2019, Jan Gorecki wrote:

> Dear R-devel community,
>
> There appears to be an inconsistency in R C API about the exceptions
> that can be raised from C code.
> Mapping of R C funs to corresponding R functions is as follows.
>
> error-> stop
> warning  -> warning
> REprintf -> message

This is wrong: REprintf is like cat with file = stderr(). If this claim
is made somewhere in R documentation please report it as a bug.

> Rprintf  -> cat
>
> Rprintf/cat is of course not an exception, I listed it just for completeness.
> The inconsistency I would like to report is about REprintf. It cannot
> be caught by tryCatch(message). Warnings and errors are being caught
> as expected.
>
> Is there any chance to "fix"/"improve" REprintf so tryCatch(message)
> can catch it?

No: this is behaving as intended.
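
A minimal R-level sketch of the distinction, assuming only base R:
message() signals a condition that tryCatch() can intercept, whereas
writing to stderr() (which is what REprintf amounts to) signals nothing.
The helper name catch_message is made up for illustration.

```r
# Hypothetical helper: returns a marker string if a message condition is caught.
catch_message <- function(expr) {
  tryCatch(expr, message = function(m) "caught message")
}

# message() signals a condition of class "message", so the handler runs.
r1 <- catch_message(message("a"))

# cat() to stderr just writes text; no condition is signaled, so the
# handler never runs and the value of cat() (NULL) is returned.
r2 <- catch_message(cat("a\n", file = stderr()))
```

C code that needs a catchable message would have to signal a condition
explicitly, for example by calling back into R's message(); that is a
choice for the package author, not something REprintf does.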

Best,

luke

> So in the example below catch(Cmessage()) would behave consistently to
> R's catch(message("a"))?
>
> Regards,
> Jan Gorecki
>
> catch = function(expr) {
>  tryCatch(expr,
>message=function(m) cat("caught message\n"),
>warning=function(w) cat("caught warning\n"),
>error=function(e) cat("caught error\n")
>  )
> }
> library(inline)
> Cstop = cfunction(c(), 'error("%s\\n","a"); return R_NilValue;')
> Cwarning = cfunction(c(), 'warning("%s\\n","a"); return R_NilValue;')
> Cmessage = cfunction(c(), 'REprintf("%s\\n","a"); return R_NilValue;')
>
> catch(stop("a"))
> #caught error
> catch(warning("a"))
> #caught warning
> catch(message("a"))
> #caught message
>
> catch(Cstop())
> #caught error
> catch(Cwarning())
> #caught warning
> catch(Cmessage())
> #a
> #NULL
>


Re: [Rd] [External] Missing function Rf_findFun3

2019-09-08 Thread Tierney, Luke
On Sun, 8 Sep 2019, Laurent Gautier wrote:

> I am not using the C API from a package but with an embedded R.

Same rules apply.

> Why have it declared in the include/ if it cannot be accessed then?

Rinternals.h is used by R internally, as the name suggests. Only a
small fraction of the things declared there are part of the API.

Would it be better to separate out the things that are in the API into
a separate header? Probably. Would doing this be a good use of our
limited time resources? Probably not. Would doing this prevent people
from using things they shouldn't? Not likely.

Best,

luke

> 
> Best,
> 
> Laurent
> 
> On Sun, Sep 8, 2019, 8:27 AM Tierney, Luke  wrote:
>   On Sat, 7 Sep 2019, Laurent Gautier wrote:
>
>   > Hi,
>   >
>   >
>   > The function `Rf_findFun3` is declared in
>   > `$(R CMD CONFIG HOME)/lib/R/include/Rinternals.h`
>   > but appears to be missing from R's shared library (R.so).
>   >
>   > Is this an oversight?
>
>   No. This is not part of the API supported for use in packages.
>
>   Best,
>
>   luke
>
>   >
>   > Best,
>   >
>   > Laurent
>   >


Re: [Rd] [External] Missing function Rf_findFun3

2019-09-08 Thread Tierney, Luke
On Sat, 7 Sep 2019, Laurent Gautier wrote:

> Hi,
>
>
> The function `Rf_findFun3` is declared in
> `$(R CMD CONFIG HOME)/lib/R/include/Rinternals.h`
> but appears to be missing from R's shared library (R.so).
>
> Is this an oversight?

No. This is not part of the API supported for use in packages.

Best,

luke

>
> Best,
>
> Laurent
>


Re: [R-pkg-devel] [External] Re: Farming out methods to other packages

2019-08-10 Thread Tierney, Luke
You could have your default method handle the cases you can handle; if
you want that to dispatch you can use something like

recover_data.default <- function(object, ...)
 default_recover_data(object, ...)
default_recover_data <- function(object, ...)
 UseMethod("default_recover_data")
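
As a sketch of how this could be used (the class name "foo" and the
returned strings are made up for illustration): the package keeps its
own methods on recover_data, and another package supplies methods for
the second generic, default_recover_data.

```r
# Generic owned by the package.
recover_data <- function(object, ...) UseMethod("recover_data")

# Built-in method for a class the package handles itself.
recover_data.lm <- function(object, ...) "internal method for lm"

# The default method re-dispatches on a second generic that other
# packages can supply methods for.
recover_data.default <- function(object, ...)
  default_recover_data(object, ...)
default_recover_data <- function(object, ...)
  UseMethod("default_recover_data")

# Final fallback when nobody handles the class.
default_recover_data.default <- function(object, ...)
  stop("no recover_data method for class ", class(object)[1])

# Method that a hypothetical other package would provide.
default_recover_data.foo <- function(object, ...) "external method for foo"

res_lm  <- recover_data(structure(list(), class = "lm"))
res_foo <- recover_data(structure(list(), class = "foo"))
```

Internal methods win because they are found on the first dispatch;
external methods are only consulted once recover_data.default is reached.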

Best,

luke

On Sat, 10 Aug 2019, Lenth, Russell V wrote:

> Thanks, Duncan. That's helpful.
>
> In addition, I confess I had a closing parenthesis in the wrong place.
>Should be:. . .   .GlobalEnv), silent = TRUE)
>
> Cheers,
>
> Russ
>
> -Original Message-
> From: Duncan Murdoch 
> Sent: Saturday, August 10, 2019 2:43 PM
> To: Lenth, Russell V ; Iñaki Ucar 
> 
> Cc: r-package-devel@r-project.org
> Subject: Re: [R-pkg-devel] [External] Re: Farming out methods to other 
> packages
>
> On 10/08/2019 3:27 p.m., Lenth, Russell V wrote:
>> H, I thought of an approach -- a kind of manual dispatch
>> technique. My generic is
>>
>> recover_data <- function(object, ...) {
>>  rd <- try(getS3method("recover_data", class(object)[1], envir = 
>> .GlobalEnv, silent = TRUE))
>>  if (!inherits(rd, "try-error"))
>>  rd(object, ...)
>>  else
>>  UseMethod("recover_data")
>> }
>>
>> and similar for emm_basis. The idea is it tries to find the method among 
>> globally registered ones, and if so, it uses it; otherwise, the internal one 
>> is used.
>
> That's a bad test:  class(object) might be a vector c("nomethod", 
> "hasmethod").  You're only looking for recover_data.nomethod, and maybe only 
> recover_data.hasmethod exists.
>
> The getS3method() function won't automatically iterate through the class, 
> you'll need to do that yourself, for example
>
> S3methodOrDefault <- function(object, generic, default) {
>   for (c in class(object)) {
>     rd <- try(getS3method(generic, c, envir = .GlobalEnv), silent = TRUE)
>     if (!inherits(rd, "try-error"))
>       return(rd)
>   }
>   return(default)
> }
>
> used as
>
>   S3methodOrDefault(object, "recover_data", internal_recover_data)
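
A runnable variant of the same idea, offered as a sketch: getS3method()
has an optional argument, so with optional = TRUE it returns NULL
instead of raising an error and try() is not needed (silent = TRUE is an
argument of try(), not of getS3method()). The example looks up methods
of the base generic summary; the class names "nomethod" and "zzz" and
the function fallback are invented.

```r
# Walk the class vector and return the first S3 method found,
# or `default` if no class has a method.
S3methodOrDefault <- function(object, generic, default, envir = parent.frame()) {
  for (cl in class(object)) {
    m <- utils::getS3method(generic, cl, optional = TRUE, envir = envir)
    if (!is.null(m))
      return(m)
  }
  default
}

fallback <- function(object, ...) "fallback"

# First class has no summary method, second one does.
obj <- structure(data.frame(a = 1:3), class = c("nomethod", "data.frame"))
found <- S3methodOrDefault(obj, "summary", fallback)

# No class has a method: the default is returned.
none <- S3methodOrDefault(structure(list(), class = "zzz"), "summary", fallback)
```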
>
> Duncan Murdoch
>


Re: [Rd] [External] Questions regarding ALTREP_SET_ELT APIs

2019-07-30 Thread Tierney, Luke
On Tue, 30 Jul 2019, Wang Jiefei wrote:

> Hi all,
>
> I'm wondering if there is any way to define a `SET_ELT` function for an
> ALTREP class? I see there are `ALTINTEGER_SET_ELT` etc. functions exported
> in Rinternals.h, but there are no corresponding ALTREP APIs to define them.
> The only way to set the value of an ALTREP is through a pointer, which will
> require that the ALTREP data is in memory. Is it on purpose?

For now, yes. We do support a Set_elt method for ALTSTRING classes but
not yet for others. I seem to recall that there are some issues with
going there for others, but we'll probably take a closer look later
this year.

One thing to keep in mind is that the R pass-by-value semantics
require that C code duplicate an object for which MAYBE_REFERENCED is
true, and the assumption in existing code is that duplicate returns an
object that can safely be mutated. That places a lot of limitations on
what can be done. You can see some notes on the issues in the
README.md and the vignette in
https://github.com/ALTREP-examples/Rpkg-mutable.
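
The value semantics referred to here can be seen from R itself. This
small sketch (variable names are arbitrary) only illustrates the
user-visible behavior that C code has to preserve, not how ALTREP
implements it.

```r
x <- c(1, 2, 3)
y <- x        # y and x may share the same underlying vector at this point
y[1] <- 10    # modifying y must not be visible through x
```

At the C level the corresponding step is to duplicate the object before
writing to it when MAYBE_REFERENCED is true, as described above.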

Best,

luke

> Will there be
> any plan to develop these ALTREP set element APIs?
>
> Best,
> Jiefei
>


Re: [Rd] [External] Re: Any plans for ALTREP lists (VECSXP)?

2019-07-24 Thread Tierney, Luke
If one of you wanted to try to create a patch to support ALTREP
generic vectors here are some notes:

The main challenge I am aware of (there might be others): Allowing
DATAPTR to return a writable pointer would be too dangerous because
the GC write barrier needs to see all mutations. So it would be best
if Dataptr and Dataptr_or_null methods were not allowed to be
defined. The default methods in altrep.c should do the right thing.

A reasonable name for the abstract class would be 'altlist'.

'altrep' methods that a class can provide:

   Unserialize or UnserializeEX
   Serialized_state
   Duplicate or DuplicateEX
   Coerce
   Inspect
   Length

'altvec' methods a class should provide:

   Extract_subset
   not Dataptr
   not Dataptr_or_null

'altlist' specific methods:

   Elt
   Set_elt

Best,

luke

On Tue, 23 Jul 2019, Gabriel Becker wrote:

> Hi Kylie,
>
> Is it a list with only numerics in it? (I only see REALSXPs there, but
> obviously inspect isn't showing all of them). If so, you could load it up
> into one big vector and then also keep partitioning information around.
> Bioconductor does this (see ?IRanges::CompressedList ). The potential
> benefit here being that the underlying large vector could then be a big
> out-of-memory altrep. How helpful this would be depends somewhat on what
> you want to do with it, of course, but it is something that comes to mind.
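
A rough sketch of that layout in plain R, assuming an ordinary in-memory
vector in place of an out-of-memory ALTREP one. The names flat, ends and
get_elt are invented here and are not the IRanges API.

```r
# Three list elements stored back to back in one flat vector.
flat <- c(404.093, 404.096, 409.924, 400.3, 400.303, 400.306)
# Partitioning information: index of the last value of each element.
ends <- c(2L, 3L, 6L)

# Extract element i on demand without materializing the whole list.
get_elt <- function(i) {
  start <- if (i == 1L) 1L else ends[i - 1L] + 1L
  flat[seq.int(start, length.out = ends[i] - start + 1L)]
}

# Materializing everything gives back an ordinary list.
as_plain_list <- lapply(seq_along(ends), get_elt)
```

Only the flat vector would need an ALTREP representation; the
partitioning vector stays small even for long lists.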
>
> Also, I would expect some overhead but that seems like a lot (without
> having done super much in the way of benchmarking). What exactly is
> as.altrep doing?
>
> Best,
> ~G
>
> On Tue, Jul 23, 2019 at 9:54 AM Michael Lawrence via R-devel <
> r-devel@r-project.org> wrote:
>
>> Hi Kylie,
>>
>> As an alternative in the short term, you could consider deriving from
>> S4Vector's List class, implementing the getListElement() method to
>> lazily create the objects.
>>
>> Michael
>>
>> On Tue, Jul 23, 2019 at 9:09 AM Bemis, Kylie 
>> wrote:
>>>
>>> Hello,
>>>
>>> I was wondering if there were any plans for ALTREP lists (VECSXP)?
>>>
>>> It seems to me that they could be supported in a similar way to how
>> ALTSTRING works, with Elt() and Set_elt() methods, or would there be some
>> problems with that I’m not seeing due to lists not being atomic vectors?
>>>
>>> I was taking an approach of converting each list element (of a
>> file-based list data structure) to an ALTREP representation to build up an
>> “ALTREP list”.
>>>
>>> This seems fine for shorter lists with large elements, but I noticed
>> that for longer lists with smaller elements, this could be far more
>> time-consuming than simply reading the entire list into memory and
>> returning a non-ALTREP list:
>>>
>>> > x
>>> <34840 length> matter_list :: out-of-memory list
>>> (1.1 MB real | 543.3 MB virtual)
>>>
>>> > system.time(y <- as.list(x))
>>>user  system elapsed
>>>   1.116   2.175   5.053
>>>
>>> > system.time(z <- as.altrep(x))
>>>user  system elapsed
>>>  36.295   4.717  41.216
>>>
>>> > .Internal(inspect(y))
>>> @108255000 19 VECSXP g1c7 [MARK,NAM(7)] (len=34840, tl=0)
>>>   @7f9044d9fc00 14 REALSXP g1c7 [MARK] (len=1129, tl=0)
>> 404.093,404.096,404.099,404.102,404.105,...
>>>   @7f9044d25e00 14 REALSXP g1c7 [MARK] (len=890, tl=0)
>> 409.924,409.927,409.931,409.934,409.937,...
>>>   @7f9044da6000 14 REALSXP g1c7 [MARK] (len=1878, tl=0)
>> 400.3,400.303,400.306,400.309,400.312,...
>>>   @7f9031a6b000 14 REALSXP g1c7 [MARK] (len=2266, tl=0)
>> 402.179,402.182,402.185,402.188,402.191,...
>>>   @7f9031a77a00 14 REALSXP g1c7 [MARK] (len=1981, tl=0)
>> 403.021,403.024,403.027,403.03,403.033,...
>>>   ...
>>>
>>> > .Internal(inspect(z))
>>> @10821 19 VECSXP g1c7 [MARK,NAM(7)] (len=34840, tl=0)
>>>   @7f904eea7660 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4,
>> len=1129, mem=0)
>>>   @7f9050347498 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4,
>> len=890, mem=0)
>>>   @7f904d286b20 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4,
>> len=1878, mem=0)
>>>   @7f904fd38820 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4,
>> len=2266, mem=0)
>>>   @7f904c75ce90 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4,
>> len=1981, mem=0)
>>>   ...
>>>
>>> In this situation, it would be much faster and simpler for me to return
>> a theoretical ALTREP list that serves SEXP elements on-demand, similar to
>> how ALTSTRING seems to be implemented.
>>>
>>> I don’t know how many other people would get a use out of ALTREP lists,
>> but I certainly would.
>>>
>>> Are there any plans for this?
>>>
>>> Thanks!
>>>
>>> ~~~
>>> Kylie Ariel Bemis
>>> Khoury College of Computer Sciences
>>> Northeastern University
>>> kuwisdelu.github.io
>>>

Re: [Rd] [External] Any plans for ALTREP lists (VECSXP)?

2019-07-23 Thread Tierney, Luke
Eventually, but probably not in the next release. There are many more
issues to think through for vectors where the elements can be
arbitrary R objects, and I don't think there will be time for that soon
given other issues on the table.

Best,

luke

On Tue, 23 Jul 2019, Bemis, Kylie wrote:

> Hello,
>
> I was wondering if there were any plans for ALTREP lists (VECSXP)?
>
> It seems to me that they could be supported in a similar way to how ALTSTRING 
> works, with Elt() and Set_elt() methods, or would there be some problems with 
> that I’m not seeing due to lists not being atomic vectors?
>
> I was taking an approach of converting each list element (of a file-based 
> list data structure) to an ALTREP representation to build up an “ALTREP list”.
>
> This seems fine for shorter lists with large elements, but I noticed that for 
> longer lists with smaller elements, this could be far more time-consuming 
> than simply reading the entire list into memory and returning a non-ALTREP 
> list:
>
>> x
> <34840 length> matter_list :: out-of-memory list
> (1.1 MB real | 543.3 MB virtual)
>
>> system.time(y <- as.list(x))
>   user  system elapsed
>  1.116   2.175   5.053
>
>> system.time(z <- as.altrep(x))
>   user  system elapsed
> 36.295   4.717  41.216
>
>> .Internal(inspect(y))
> @108255000 19 VECSXP g1c7 [MARK,NAM(7)] (len=34840, tl=0)
>  @7f9044d9fc00 14 REALSXP g1c7 [MARK] (len=1129, tl=0) 
> 404.093,404.096,404.099,404.102,404.105,...
>  @7f9044d25e00 14 REALSXP g1c7 [MARK] (len=890, tl=0) 
> 409.924,409.927,409.931,409.934,409.937,...
>  @7f9044da6000 14 REALSXP g1c7 [MARK] (len=1878, tl=0) 
> 400.3,400.303,400.306,400.309,400.312,...
>  @7f9031a6b000 14 REALSXP g1c7 [MARK] (len=2266, tl=0) 
> 402.179,402.182,402.185,402.188,402.191,...
>  @7f9031a77a00 14 REALSXP g1c7 [MARK] (len=1981, tl=0) 
> 403.021,403.024,403.027,403.03,403.033,...
>  ...
>
>> .Internal(inspect(z))
> @10821 19 VECSXP g1c7 [MARK,NAM(7)] (len=34840, tl=0)
>  @7f904eea7660 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=1129, 
> mem=0)
>  @7f9050347498 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=890, 
> mem=0)
>  @7f904d286b20 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=1878, 
> mem=0)
>  @7f904fd38820 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=2266, 
> mem=0)
>  @7f904c75ce90 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=1981, 
> mem=0)
>  ...
>
> In this situation, it would be much faster and simpler for me to return a 
> theoretical ALTREP list that serves SEXP elements on-demand, similar to how 
> ALTSTRING seems to be implemented.
>
> I don’t know how many other people would get a use out of ALTREP lists, but I 
> certainly would.
>
> Are there any plans for this?
>
> Thanks!
>
> ~~~
> Kylie Ariel Bemis
> Khoury College of Computer Sciences
> Northeastern University
> kuwisdelu.github.io
>


Re: [Rd] [External] Re: ALTREP wrappers and factors

2019-07-19 Thread Tierney, Luke
On Fri, 19 Jul 2019, Gabriel Becker wrote:

> Hi Jiefei and Kylie,
>
> Great to see people engaging with the ALTREP framework and identifying
> places we may need more tooling. Comments inline.
>
> On Thu, Jul 18, 2019 at 12:22 PM King Jiefei  wrote:
>
>>
>> If that is the case and you are 100% sure the reference number should be 1
>> for your variable *y*, my solution is to call *SET_NAMED *in C++ to reset
>> the reference number. Note that you need to unbind your local variable
>> before you reset the number. To return an unbound SEXP,  the C++ function
>> should be placed at the end of your *matter:::as.altrep *function. I don't
>> know if there is any simpler way to do that and I'll be happy to see any
>> opinion.
>>
>
> So as far as I know, manually setting the NAMED value on any SEXP the
> garbage collector is aware of is a direct violation of C-API contract and
> not something that package code should ever be doing.
>
> It's not at all clear to me that you can *ever* be 100% sure that the
> reference number should be 1 when it is not currently one for an R object
> that exists at the R-level (as opposed to only in pure C code). Sure, maybe
> the object is created within the body of your R function instead of being
> passed in, but what if someone is debugging your function and assigns the
> value to the global environment using <<-  for later inspection; now  you
> have an invalidly low NAMED value, ie you have a segfault coming. I know of
> no way for you to prevent this or even know it has happened.

SET_NAMED should NEVER be used in a package. In fact it will hopefully
disappear at some point not too far in the future.

>> On Thu, Jul 18, 2019 at 3:28 AM Bemis, Kylie 
>> wrote:
>>
>>> Hello,
>>>
>>> I’m experimenting with ALTREP and was wondering if there is a preferred
>>> way to create an ALTREP wrapper vector without using
>>> .Internal(wrap_meta(…)), which R CMD check doesn’t like since it uses an
>>> .Internal() function.
>>
>
> So there is the .doSortWrap  (and its currently inexplicably identical
> clone .doWrap) function in base, which is an R level function that calls
> down to .Internal(wrap_meta(...)), which you can use, but it doesn't look
> general enough for what  I think you need (it was written for things that
> have just been sorted, thus the name). Specifically, it's not able to
> indicate that things are of unknown sortedness as currently written.  If
> matter vectors are guaranteed to be sorted for some reason, though, you can
> use this. I'll talk to Luke about whether we want to generalize this, it
> would be easy to have this support the full space of metadata for wrappers
> and be a general purpose wrapper-maker, but that isn't what it is right now.
>
> At the C-level, it looks like we do make R_tryWrap available (it appears in
> Rinternals.h, and not within a USE_RINTERNALS section), so you can call that
> from your own C(++) code. This creates a wrapper that has no metadata on it
> (or rather it has metadata but  the metadata indicates that no special info
> is known about the vector).

At this point we are not ready to cast in stone an interface to
creating wrappers from R.  The C R_tryWrap could be used, but it is
still subject to change.

You might try your example with a larger vector. In R 3.6.x
structure() should produce a wrapper for length 100 or more.

Best,

luke

>>
>>> I was trying to create a factor that used an ALTREP integer, but
>>> attempting to set the class and levels attributes always ended up
>>> duplicating and materializing the integer vector. Using the wrapper
>> avoided
>>> this issue.
>>>
>>> Here is my initial ALTREP integer vector:
>>>
>>> > fc0 <- factor(c("a", "a", "b"))
>>> >
>>> > y <- matter::as.matter(as.integer(fc0))
>>> > y <- matter:::as.altrep(y)
>>> >
>>> > .Internal(inspect(y))
>>> @7fb0ce78c0f0 13 INTSXP g0c0 [NAM(7)] matter vector (mode=3, len=3,
>> mem=0)
>>>
>>> Here is what I get without a wrapper:
>>>
>>> > fc1 <- structure(y, class="factor", levels=levels(x))
>>> > .Internal(inspect(fc1))
>>> @7fb0cae66408 13 INTSXP g0c2 [OBJ,NAM(2),ATT] (len=3, tl=0) 1,1,2
>>> ATTRIB:
>>>   @7fb0ce771868 02 LISTSXP g0c0 []
>>> TAG: @7fb0c80043d0 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "class" (has
>>> value)
>>> @7fb0c9fcbe90 16 STRSXP g0c1 [NAM(7)] (len=1, tl=0)
>>>   @7fb0c80841a0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached]
>>> "factor"
>>> TAG: @7fb0c8004050 01 SYMSXP g1c0 [MARK,NAM(7),LCK,gp=0x4000]
>> "levels"
>>> (has value)
>>> @7fb0d1dd58c8 16 STRSXP g0c2 [MARK,NAM(7)] (len=2, tl=0)
>>>   @7fb0c81bf4c0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "a"
>>>   @7fb0c90ba728 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "b"
>>>
>>> Here is what I get with a wrapper:
>>>
>>> > fc2 <- structure(.Internal(wrap_meta(y, 0, 0)), class="factor",
>>> levels=levels(x))
>>> > .Internal(inspect(fc2))
>>> @7fb0ce764630 13 INTSXP g0c0 [OBJ,NAM(2),ATT]  wrapper [srt=0,no_na=0]
>>>   @7fb0ce78c0f0 13 INTSXP g0c0 [NAM(7)] matter 

Re: [Rd] [External] What is the best way to determine the version of an `.rds`?

2019-07-16 Thread Tierney, Luke
Can you add a wishlist item to bugzilla?

Thanks,

luke

On Mon, 15 Jul 2019, Jennifer Bryan wrote:

> Hi,
>
> I am writing a test that consults the serialization version of an `.rds`
> file.
>
> An attractive way to get this is:
>
> tools:::get_serialization_version() # reports just version
>
> which calls
>
> .Internal(serializeInfoFromConn() # reports much more
>
> but neither is truly exported for public use.
>
> Is there an official, exported way to get the serialization version? It is
> possible to get this information with R code yourself, but it doesn't feel
> very elegant.
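
One way to do this by hand, sketched under the assumption that the file
was written by saveRDS() in the default binary (XDR) format: after
decompression the stream starts with the two bytes "X\n", followed by
the serialization version as a 4-byte big-endian integer. The function
name rds_version is made up.

```r
rds_version <- function(path) {
  con <- gzfile(path, "rb")   # also reads uncompressed files
  on.exit(close(con))
  fmt <- readBin(con, what = "raw", n = 2L)
  if (!identical(fmt, charToRaw("X\n")))
    stop("not a binary XDR serialization stream")
  readBin(con, what = "integer", n = 1L, size = 4L, endian = "big")
}

f2 <- tempfile(fileext = ".rds")
f3 <- tempfile(fileext = ".rds")
saveRDS(1:3, f2, version = 2)
saveRDS(1:3, f3, version = 3)
v2 <- rds_version(f2)
v3 <- rds_version(f3)
```

Files written with ascii = TRUE use a different header and would need
separate handling; the sketch simply stops on them.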
>
> If not, could we have this? It's pretty easy these days to acquire a
> version 3 file, without real intent, which risks making the package require
> R >= 3.5.
>
> Thanks,
> Jenny
>


Re: [Rd] [External] Re: Unexpected behaviour when comparing (==) long quoted expressions

2019-07-16 Thread Tierney, Luke
On Tue, 16 Jul 2019, Martin Maechler wrote:

>> Daniel Chen
>> on Fri, 12 Jul 2019 13:53:21 -0500 writes:
>
>> Hi everyone:
>> I’m one of the interns at RStudio this summer working on a project that
>> helps teachers grade student code. I found an unexpected behaviour with
>> the |==| operator when comparing |quote|d expressions.
>
>> Example 1:
>
>> |u <- quote(tidyr::gather(key = key, value = value,
>> new_sp_m014:newrel_f65, na.rm = TRUE)) s <- quote(tidyr::gather(key =
>> key, value = value, new_sp_m014:newrel_f65, na.rm = FALSE)) u == s #
>> TRUE u <- quote(tidyr::gather(key = key, value = value, na.rm = TRUE)) s
>> <- quote(tidyr::gather(key = key, value = value, na.rm = FALSE)) u == s
>> # FALSE |
>
> Unfortunately the above is almost unreadable, as you "forgot" to
> click (in the lower right corner of your Gmail interface with
> the three vertical dots) "plain text mode".
>
>> Example 2:
>
>> |u <-
>> quote(f(x123456789012345678901234567890123456789012345678901234567890,
>> 1)) s <-
>> quote(f(x123456789012345678901234567890123456789012345678901234567890,
>> 2)) u == s #> [1] TRUE |
>
> this is even readable after html - de-html mangling
>
>> Winston Chang pointed out in the help page for |==|:
>
>> Language objects such as symbols and calls are deparsed to character
>> strings before comparison.
>
>> and the source code that does the comparison [1] shows that it
>> deparses each language object and then only extracts the first element
>> from the resulting character vector:
>
>> |SET_STRING_ELT(tmp, 0, (iS) ? PRINTNAME(x) : STRING_ELT(deparse1(x, 0,
>> DEFAULTDEPARSE), 0)); |
>
>> Is this a fix that needs to happen within the |==| documentation? or an
>> actual bug with the operator?
>
> This is a good question.
>
> Thank you, Daniel, for providing the link to the source code in
> /src/main/relop.c .
>
> Looking at that and its context, I think we (R core) should
> reconsider that implementation of '=='  which indeed does about
> the same thing as deparse {which also truncates at some point by
> default; something very very reasonable for error messages, but
> undesirable in other cases}.
>
> But I think it's a fair expectation that comparing calls ["language"]
> with '==' should compare the full call's syntax even if that may
> occasionally be very long.

Before going there I think we should reconsider whether allowing ==
comparisons on calls is a good idea. We already don't allow it for
expression() objects. It is probably unavoidable to allow symbols
(there are probably lots of things that would break if quote(x) == "x"
did not work and return TRUE), but for calls it makes little sense to
have "f(x)" == quote(f(x)). These are very different objects that
happen to have identical string representations. For computing on the
language identical() is the right way to go (that is what is used in
the byte code compiler and codetools).
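
For illustration, a small sketch using the calls from Example 2 of the
report: identical() compares the call objects themselves rather than any
deparsed text, so the length of the expression does not matter.

```r
u <- quote(f(x123456789012345678901234567890123456789012345678901234567890, 1))
s <- quote(f(x123456789012345678901234567890123456789012345678901234567890, 2))

# The calls differ in their last argument.
same_call <- identical(u, s)

# A freshly quoted copy of the same call is identical to u.
same_self <- identical(
  u,
  quote(f(x123456789012345678901234567890123456789012345678901234567890, 1))
)

# A symbol still compares equal to its name as a string.
sym_eq <- quote(x) == "x"
```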

Best,

luke

>
> Martin
>
>> For more context the original issue we had is here:
>> https://github.com/rstudio-education/grader/issues/28
>
>> Workaround:
>
>> You can get around this issue by using |all.equal| or |identical|
>
>> |u <- quote(tidyr::gather(key = key, value = value,
>> new_sp_m014:newrel_f65, na.rm = TRUE)) s <- quote(tidyr::gather(key =
>> key, value = value, new_sp_m014:newrel_f65, na.rm = FALSE)) u == s #
>> TRUE all.equal(u, s) # "target, current do not match when deparsed"
>> identical(u, s) # FALSE |
>
>> Thanks,
>
>> Dan
>
>> [1] 
> https://github.com/wch/r-source/blob/e647f78cb85282263f88ea30c6337b77a30743d9/src/main/relop.c#L140-L155
>


Re: [Rd] [External] Potential bug with data.frame replacement

2019-07-15 Thread Tierney, Luke
Thanks for the report. The buffer overflow should be fixed in
R-patched and R-devel.

Best,

luke

On Sun, 14 Jul 2019, Benjamin Jean-Marie Tremblay wrote:

> Dear R-devel,
>
> I have encountered a crash-inducing scenario and would like to enquire as to
> whether this would be considered a bug. To reproduce the crash:
>
> X <- sample(letters, 3000, TRUE)
> D <- data.frame(X, 1:3000, X, X, X, X, X)
> D$X1.3000 <- paste0("GSM", D)
>
> The reason why I'm not sure if this would be considered a bug is because I
> typed this by accident, when what I meant was:
>
> D$X1.3000 <- paste0("GSM", D$X1.3000)
>
> I can never imagine a scenario where I would intentionally perform the former.
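
The difference between the two calls can be seen on a small data frame
(a sketch with invented data): given a whole data frame, paste0() treats
it as a list of columns and returns one string per column, whereas with
a single column it returns one string per row.

```r
D <- data.frame(A = 1:3, B = c("x", "y", "z"))

per_column <- paste0("GSM", D)     # one (deparsed) string per column
per_row    <- paste0("GSM", D$A)   # one string per row
```

With 3000 rows, each per-column string is the deparsed text of an entire
column, which is why the accidental form produces such large values.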
>
> This issue seems to have something to do with the size of the data.frame, as
> smaller examples will work fine:
>
> D <- data.frame(A = 1:10, B = letters[1:10])
> D$A <- paste0("A", D)
>
> Also just doing the paste0 part without trying to replace a data.frame column
> does not crash R for me.
>
> I can submit this on Bugzilla should this be deemed sufficiently buggy.
>
> I am running 3.6.0 on macOS (x86_64-apple-darwin15.6.0).
>
> Sincerely,
>
> B.T.
>


Re: [Rd] [External] Mitigating Stalls Caused by Call Deparse on Error

2019-07-15 Thread Tierney, Luke
Better to add this to the wishlist item. This all needs to be looked
at together, and nothing is likely to happen until after
vacation/conference season.  It will disappear from everyone's radar
if it is just in R-devel.

Best,

luke

On Sun, 14 Jul 2019, brodie gaslam wrote:

> Luke, thanks for considering the issue.  I would like to
> try to separate the problem into two parts, as I _think_
> your comments address primarily part 2 below:
>
> 1. How can we avoid significant and possibly crippling
>    stalls on error with these non-standard calls.
> 2. What is the best way to view these non-standard calls.
>
> I agree that issue 2. requires further thought and
> discussion under a wishlist issue ([on bugzilla now][1]). 
> While I did raise issue 2., the patch itself makes no
> attempt to resolve it.
>
> The proposed patch resolves issue 1., which is a big
> usability problem.  Right now if you have the misfortune of
> using `do.call` with a big object and trigger an error, you
> have the choice of waiting a possibly long time for
> the deparse to complete, or killing your entire R session
> externally.
>
> It seems a shame to allow a big usability issue for `do.call`
> to remain when there is a simple solution at hand, especially
> since the complete deparse of large objects likely serves no
> purpose in this case. Obviously, if storing the actual calls
> instead of their deparsed equivalents in .Traceback causes
> problems I'm not anticipating, then that's different. 
> Is that the case?
>
> Best,
>
> Brodie.
>
> [1]: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17580
>
> On Sunday, July 14, 2019, 8:52:45 AM EDT, Tierney, Luke 
>  wrote:
>
>
>
>
>
> This is probably best viewed in the context of other issues with
> displaying calls, such as issues arising from calls constructed in
> non-standard evaluation contexts. Might be good to move to a wishlist
> item in bugzilla.
>
> Best,
>
> luke
>
> On Sat, 13 Jul 2019, brodie gaslam via R-devel wrote:
>
>> When large calls cause errors R may stall for extended periods.  This
>> is particularly likely to happen with `do.call`, as in this example
>> with a 24 second stall:
>>
>>     x <- runif(1e7)
>>     system.time(do.call(paste0, list(abs, x)))  # intentional error
>>     ## Error in (function (..., collapse = NULL)  :
>>     ##   cannot coerce type 'builtin' to vector of type 'character'
>>     ## Calls: system.time -> do.call -> <Anonymous>
>>     ## Timing stopped at: 23.81 0.149 24.04
>>
>>     str(.Traceback)
>>     ## Dotted pair list of 3
>>     ##  $ : chr [1:2500488] "(function (..., collapse = NULL) " 
>> ".Internal(paste0(list(...), collapse)))(.Primitive(\"abs\"), 
>> c(0.718117154669017, " "0.494785501621664, 0.1453434410505, 
>> 0.635028422810137, 0.0353180423844606, " "0.688418723642826, 
>> 0.889682895969599, 0.728154224809259, 0.292572240810841, " ...
>>     ##  $ : chr "do.call(paste0, list(abs, x))"
>>     ##  $ : chr "system.time(do.call(paste0, list(abs, x)))"
>>
>> The first time I noticed this I thought my session had frozen/crashed
>> as the standard interrupt ^C does not work during the deparse.  The
>> stall happens when on error the call stack is deparsed prior to being
>> saved to `.Traceback`.  The deparsing is done by `deparse1m` in native
>> code, with the value of `getOption('deparse.max.lines')` which
>> defaults to all lines.
>>
>> Since there is little value to seeing millions of lines of deparsed
>> objects in `traceback()`, a simple work-around is to change the
>> `deparse.max.lines` value:
>>
>>     options(deparse.max.lines=1)
>>     system.time(do.call(paste0, list(abs, x)))
>>     ## Error in (function (..., collapse = NULL)  :
>>     ##   cannot coerce type 'builtin' to vector of type 'character'
>>     ## Calls: system.time -> do.call -> <Anonymous>
>>     ## Timing stopped at: 0 0 0
>>
>> Unfortunately this will affect all `deparse` calls, and it seems
>> undesirable to pre-emptively enable it just for calls that might cause
>> large deparses on error.
>>
>> An alternative is to store the actual calls instead of their deparsed
>> character equivalents in `.Traceback`.  This defers the deparsing to
>> when `traceback()` is used.  As per `?traceback`, it should be
>> relatively safe to modify `.Traceback` in this way:
>>
>>> It is undocumented where .Traceback is stored nor that it is
>>> visible, and this is subject to change.
>>
>> Deferring the deparsing to `traceback

Re: [Rd] [External] Re: Possible bug in `class<-` when a class-specific '[[.' method is defined

2019-07-15 Thread Tierney, Luke
Pasting the entire example into RStudio and hitting return to evaluate
does not show this. Evaluating the final line to print counttt
separately does.

Looks like RStudio is calling `[[` on your object when examining the
environment for the Environment panel. If this concerns you then you
should contact RStudio.

Best,

luke

On Mon, 15 Jul 2019, Rui Barradas wrote:

> Hello,
>
> Clean R 3.6.1 session on Ubuntu 19.04, RStudio 1.1.453. sessionInfo() at the 
> end.
>
> I can reproduce this.
>
> counttt <- 0
>
> `[[.MYCLASS` = function(x, ...) {
>  counttt <<- counttt + 1
>  # browser()
>  x = NextMethod()
>  return(x)
> }
>
> df <- as.data.frame(matrix(1:20, nrow=5))
> class(df) <- c("MYCLASS","data.frame")
> counttt
> #[1] 9
>
>
> But there's more. I tried to print the values of x in the method and got 
> really strange results
>
> counttt <- 0
>
> `[[.MYCLASS` = function(x, ...) {
>  counttt <<- counttt + 1
>  print(x)
>  # browser()
>  x = NextMethod()
>  return(x)
> }
>
> df <- as.data.frame(matrix(1:20, nrow=5))
> class(df) <- c("MYCLASS","data.frame")
> counttt
> #[1] 151
>
>
> If I change print to print.data.frame it goes up to
>
> counttt
> #[1] 176
>
> With print.default back to 9. What is the print method called in the second 
> example?
>
>
> sessionInfo()
> R version 3.6.1 (2019-07-05)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 19.04
>
> Matrix products: default
> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
>
> locale:
>  [1] LC_CTYPE=pt_PT.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=pt_PT.UTF-8        LC_COLLATE=pt_PT.UTF-8
>  [5] LC_MONETARY=pt_PT.UTF-8    LC_MESSAGES=pt_PT.UTF-8
>  [7] LC_PAPER=pt_PT.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods
> [7] base
>
> loaded via a namespace (and not attached):
>   [1] sos_2.0-0           nlme_3.1-140        matrixStats_0.54.0
>   [4] fs_1.2.7            xts_0.11-2          usethis_1.5.0
>   [7] lubridate_1.7.4     devtools_2.0.2      RColorBrewer_1.1-2
>  [10] rprojroot_1.3-2     rbenchmark_1.0.0    tools_3.6.1
>  [13] backports_1.1.4     R6_2.4.0            rpart_4.1-15
>  [16] Hmisc_4.2-0         lazyeval_0.2.2      colorspace_1.4-1
>  [19] nnet_7.3-12         npsurv_0.4-0        withr_2.1.2
>  [22] tidyselect_0.2.5    gridExtra_2.3       prettyunits_1.0.2
>  [25] processx_3.3.0      curl_3.3            compiler_3.6.1
>  [28] cli_1.1.0           htmlTable_1.13.1    randomNames_1.4-0.0
>  [31] dvmisc_1.1.3        desc_1.2.0          tseries_0.10-46
>  [34] scales_1.0.0        checkmate_1.9.1     lmtest_0.9-36
>  [37] fracdiff_1.4-2      mvtnorm_1.0-10      quadprog_1.5-6
>  [40] callr_3.2.0         stringr_1.4.0       digest_0.6.18
>  [43] foreign_0.8-71      rio_0.5.16          base64enc_0.1-3
>  [46] stocks_1.1.4        pkgconfig_2.0.2     htmltools_0.3.6
>  [49] sessioninfo_1.1.1   readxl_1.3.1        htmlwidgets_1.3
>  [52] rlang_0.3.4         TTR_0.23-4          rstudioapi_0.10
>  [55] quantmod_0.4-14     MLmetrics_1.1.1     zoo_1.8-5
>  [58] zip_2.0.1           acepack_1.4.1       dplyr_0.8.0.1
>  [61] car_3.0-2           magrittr_1.5        Formula_1.2-3
>  [64] Matrix_1.2-17       Rcpp_1.0.1          munsell_0.5.0
>  [67] abind_1.4-5         stringi_1.4.3       forecast_8.6
>  [70] yaml_2.2.0          carData_3.0-2       MASS_7.3-51.3
>  [73] pkgbuild_1.0.3      plyr_1.8.4          grid_3.6.1
>  [76] parallel_3.6.1      forcats_0.4.0       crayon_1.3.4
>  [79] lattice_0.20-38     haven_2.1.0         splines_3.6.1
>  [82] hms_0.4.2           knitr_1.22          ps_1.3.0
>  [85] pillar_1.4.0        pkgload_1.0.2       urca_1.3-0
>  [88] glue_1.3.1          lsei_1.2-0          babynames_1.0.0
>  [91] latticeExtra_0.6-28 data.table_1.12.2   remotes_2.0.4
>  [94] cellranger_1.1.0    testthat_2.1.0      gtable_0.3.0
>  [97] purrr_0.3.2         assertthat_0.2.1    ggplot2_3.1.1
> [100] openxlsx_4.1.0      xfun_0.6            survey_3.35-1
> [103] survival_2.44-1.1   timeDate_3043.102   tibble_2.1.1
> [106] memoise_1.1.0       cluster_2.0.8       toOrdinal_1.1-0.0
> [109] fitdistrplus_1.0-14 brew_1.0-6
>
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> Às 13:16 de 15/07/19, Duncan Murdoch escreveu:
>> On 07/07/2019 11:49 a.m., Ghiggi Gionata wrote:
>>> Hi all !
>>> 
>>> I noticed a strange behaviour of the function `class<-` when a 
>>> class-specific '[[.' method is defined.
>>> 
>>> Here below a reproducible example :
>>> 
>>> 
>>> #---.
>>> 
>>> counttt <- 0
>>> 
>>> `[[.MYCLASS` = function(x, ...) {
>>>    counttt <<- counttt + 1
>>>    # browser()
>>>    x = NextMethod()
>>>    return(x)
>>> }
>>> 
>>> df <- as.data.frame(matrix(1:20, nrow=5))
>>> class(df) <- c("MYCLASS","data.frame")
>>> counttt
>>> 
>>> # The same occurs when using structure(, class=) or 

Re: [Rd] [External] Mitigating Stalls Caused by Call Deparse on Error

2019-07-14 Thread Tierney, Luke
This is probably best viewed in the context of other issues with
displaying calls, such as issues arising from calls constructed in
non-standard evaluation contexts. Might be good to move to a wishlist
item in bugzilla.
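
For reference, the cost under discussion comes from deparsing a call that carries a large object inline. A minimal sketch of the difference a line limit makes, assuming a much smaller vector than in the report so that it runs quickly (the names `big_call`, `full` and `short` are only illustrative):

```r
# A call whose argument is a long numeric vector stored inline, similar in
# spirit to what do.call() constructs (kept small here on purpose).
x <- runif(1e4)
big_call <- as.call(list(quote(paste0), x))

# Full deparse: one character string per output line.
full <- deparse(big_call)

# Limited deparse: ask for at most three lines of output.
short <- deparse(big_call, nlines = 3)

length(full)    # number of lines in the complete deparse
length(short)   # no more than the requested limit
```

Keeping the call object itself and deparsing it only when it is displayed, with a limit like this, is the general idea behind the proposal quoted below.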

Best,

luke

On Sat, 13 Jul 2019, brodie gaslam via R-devel wrote:

> When large calls cause errors R may stall for extended periods.  This
> is particularly likely to happen with `do.call`, as in this example
> with a 24 second stall:
>
>     x <- runif(1e7)
>     system.time(do.call(paste0, list(abs, x)))  # intentional error
>     ## Error in (function (..., collapse = NULL)  :
>     ##   cannot coerce type 'builtin' to vector of type 'character'
>     ## Calls: system.time -> do.call -> <Anonymous>
>     ## Timing stopped at: 23.81 0.149 24.04
>
>     str(.Traceback)
>     ## Dotted pair list of 3
>     ##  $ : chr [1:2500488] "(function (..., collapse = NULL) " 
> ".Internal(paste0(list(...), collapse)))(.Primitive(\"abs\"), 
> c(0.718117154669017, " "0.494785501621664, 0.1453434410505, 
> 0.635028422810137, 0.0353180423844606, " "0.688418723642826, 
> 0.889682895969599, 0.728154224809259, 0.292572240810841, " ...
>     ##  $ : chr "do.call(paste0, list(abs, x))"
>     ##  $ : chr "system.time(do.call(paste0, list(abs, x)))"
>
> The first time I noticed this I thought my session had frozen/crashed
> as the standard interrupt ^C does not work during the deparse.  The
> stall happens when on error the call stack is deparsed prior to being
> saved to `.Traceback`.  The deparsing is done by `deparse1m` in native
> code, with the value of `getOption('deparse.max.lines')` which
> defaults to all lines.
>
> Since there is little value to seeing millions of lines of deparsed
> objects in `traceback()`, a simple work-around is to change the
> `deparse.max.lines` value:
>
>     options(deparse.max.lines=1)
>     system.time(do.call(paste0, list(abs, x)))
>     ## Error in (function (..., collapse = NULL)  :
>     ##   cannot coerce type 'builtin' to vector of type 'character'
>     ## Calls: system.time -> do.call -> <Anonymous>
>     ## Timing stopped at: 0 0 0
>
> Unfortunately this will affect all `deparse` calls, and it seems
> undesirable to pre-emptively enable it just for calls that might cause
> large deparses on error.
>
> An alternative is to store the actual calls instead of their deparsed
> character equivalents in `.Traceback`.  This defers the deparsing to
> when `traceback()` is used.  As per `?traceback`, it should be
> relatively safe to modify `.Traceback` in this way:
>
>> It is undocumented where .Traceback is stored nor that it is
>> visible, and this is subject to change.
>
> Deferring the deparsing to `traceback()` will give us the 
> opportunity to use a different `max.lines` setting as we do here
> with the patch applied:
>
>     system.time(do.call(paste0, list(abs, x)))
>     ## Error in (function (..., collapse = NULL)  :
>     ##   cannot coerce type 'builtin' to vector of type 'character'
>     ## Timing stopped at: 0.028 0 0.029
>
>     system.time(traceback(max.lines=3))
>     ## 3: (function (..., collapse = NULL)
>     ##    .Internal(paste0(list(...), collapse)))(.Primitive("abs"), 
> c(0.535468587651849,
>     ##    0.0540027911774814, 0.732930393889546, 0.565360915614292, 
> 0.544816034380347,
>     ## ...
>     ## 2: do.call(paste0, list(abs, x))
>     ## 1: system.time(do.call(paste0, list(abs, x)))
>     ##    user  system elapsed
>     ##   0.000   0.000   0.003
>
>
> More generally, it might be better to have a different smaller default
> value for the lines to deparse when calls  are _displayed_ as parts of
> lists, as is the case with `traceback()`, or in `print(sys.calls())` and
> similar.
>
> I attach a patch that does this.  I have run some basic tests
> and `make check-devel` passes. I can file an issue on bugzilla
> if that is a better place to have this conversation (assuming there
> is interest in it).
>
> Best,
>
> Brodie
>
> PS: for some reason my mail client is refusing to attach the patch so I paste 
> it
> starting on the next line.
> Index: src/gnuwin32/Rdll.hide
> ===
> --- src/gnuwin32/Rdll.hide    (revision 76827)
> +++ src/gnuwin32/Rdll.hide    (working copy)
> @@ -94,6 +94,7 @@
>   R_GetMaxNSize
>   R_GetMaxVSize
>   R_GetTraceback
> + R_GetTracebackParsed
>   R_GetVarLocSymbol
>   R_GetVarLocValue
>   R_HandlerStack
> Index: src/include/Defn.h
> ===
> --- src/include/Defn.h    (revision 76827)
> +++ src/include/Defn.h    (working copy)
> @@ -1296,6 +1296,7 @@
>  void NORET ErrorMessage(SEXP, int, ...);
>  void WarningMessage(SEXP, R_WARNING, ...);
>  SEXP R_GetTraceback(int);
> +SEXP R_GetTracebackParsed(int);
>  
>  R_size_t R_GetMaxVSize(void);
>  void R_SetMaxVSize(R_size_t);
> Index: src/library/base/R/traceback.R
> ===
> --- src/library/base/R/traceback.R   

Re: [Rd] [External] Re: Suggested Patch: Library returns matching installed packages when typo present

2019-06-22 Thread Tierney, Luke
On Fri, 21 Jun 2019, Henrik Bengtsson wrote:

>> On 6/21/19 10:56 AM, Tierney, Luke wrote:
>> [...]
>> Something that would be useful and is being considered is having a
>> mechanism for registering default condition handlers. This would allow
>> the condition to be re-signaled with a custom class and then having
>> a custom conditionMessage method is less likely to cause conflicts.
>
> Is it correct that you are proposing something that allows us to do:
>
> registerDefaultConditionHandlers(
>  packageStartupMessage = function(c) {
>    ## Do something with condition 'c'
>    ...
>    ## Suppress further processing
>    invokeRestart("muffleMessage")
>  }
> )
>
> at the core, which avoids having us to wrap up calls in
> withCallingHandlers() at top-level calls, e.g.
>
> withCallingHandlers({
>  library("foobar")
> }, packageStartupMessage = function(c) {
>  ## Do something with condition 'c'
>  ...
>  ## Suppress further processing
>  invokeRestart("muffleMessage")
> })
>
> ?
>
> Then, if I read this correctly, I'd say, this would be a very useful
> addition to base R.  This will provide a core framework that opens up
> for several neat extensions, e.g. the one that Marcel suggests - some
> people prefer a message/warning on misspelled package names, while
> others might want to see if it can be automatically installed, and so
> on.  And, it will (=should) be all in the hands of the end user to
> control this, i.e. various packages should not override this similar
> to how we don't expect them to override other personal R settings we
> have.

Something along those lines. Not thoroughly thought through yet, and
there are lots of other things ahead in the queue ...

>
> With this in place, it's not hard to imagine a third-party package
> that provides useful handlers that users can pick from, e.g.
>
>  buttlr::i_am_a("first_time_r_user")
>
> to get extra information for some of the common warnings and errors,
> which is in the same spirit as your idea on:
>
>> [...] This would allow an IDE, for example, to provide a dialog for
>> choosing the alternate package and retrying without the need to call
>> library() again. [...]
>
> I also think such a framework could replace some of the "legacy
> handlers" we currently have in place, e.g. R options 'warn',
> 'warnPartialMatchArgs', '...', and even 'error', and give more
> granular control over those use cases.  For instance, instead of a
> warning on a partial argument match, I might want to produce an error
> unless it comes from one particular package, say, which is something
> that is a bit tricky to do today.

As I am sure you know changing things that have been around for a long
time is a lot more complicated than adding new things ...

>
> /Henrik
>
> PS. Somewhat related to this, standardizing muffling and signaling of
> conditions could be worth looking into as well.  For instance, being
> able to resignal an error 'e' with a *generic* signalCondition(e)
> instead of having to know that you should call stop(e) for 'error'
> conditions and maybe another function if the error is of another
> class.

No. This is working as intended. Signaling protocols and class
hierarchies are separate. _Usually_ you will signal an error with
stop() and a warning with warning() but you can do it the other way as
well. Condition classes determine what handlers are eligible. The
choice of signaling function determines the protocol for signaling
whatever condition the function is given, including what extra
restarts might be available and what happens if no handler is
available or all handlers decline by returning. stop() and
warning/message both use signalCondition to allow handlers to handle
the condition.  For stop(), the protocol guarantees that stop will
never return since it invokes an abort restart if the condition is not
handled. warning/message signal with a muffling restart in place. If
you use signalCondition directly you can establish your own protocol.
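
A minimal sketch of the distinction, assuming a made-up condition class `myError` (the class name, message and result variables are illustrative only). The same condition object is passed to three different signaling functions; the class decides which handlers are eligible, the function decides what happens around the signal:

```r
# A condition object with a custom class layered over "error".
cond <- structure(
    list(message = "something went wrong", call = NULL),
    class = c("myError", "error", "condition")
)

# signalCondition(): the handler is eligible because of the class; it
# declines by returning, so signalCondition() returns and we carry on.
res1 <- withCallingHandlers(
    { signalCondition(cond); "continued" },
    myError = function(e) NULL
)

# stop(): same handler lookup, but stop() itself never returns; here an
# exiting handler established by tryCatch() takes over.
res2 <- tryCatch(stop(cond), myError = function(e) "caught by tryCatch")

# warning(): signals the very same object with a "muffleWarning" restart
# in place, even though its class says "error".
res3 <- withCallingHandlers(
    { warning(cond); "continued after muffling" },
    myError = function(e) invokeRestart("muffleWarning")
)
```

In all three cases the handler is selected by the `myError` class alone; only the surrounding protocol differs.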

Best,

luke


> On Fri, Jun 21, 2019 at 8:55 AM Marcel Ramos
>  wrote:
>>
>> Hi Luke,
>>
>> Thank you for your response.
>>
>> On 6/21/19 10:56 AM, Tierney, Luke wrote:
>>
>> Thanks for the suggestion. However I don't think it is the right way
>> to go. I also don't care for what install.packages() does. Signaling a
>> warning and then an error means someone has to catch both the error
>> and the warning, or suppress the warning, in order to handle the error
>> programmatically.
>>
>> I do care for what install.packages() does because it's preferable
>> to have consistency in the user interface.
>>
>>

Re: [Rd] [External] Suggested Patch: Library returns matching installed packages when typo present

2019-06-21 Thread Tierney, Luke
Thanks for the suggestion. However I don't think it is the right way
to go. I also don't care for what install.packages() does. Signaling a
warning and then an error means someone has to catch both the error
and the warning, or suppress the warning, in order to handle the error
programmatically.

Now that library() signals a structured error there are other options.
One possibility, at least as an interim, is to define a
conditionMessage method, e.g. as

conditionMessage.packageNotFoundError <- function(c) {
    lib.loc <- c$lib.loc
    msg <- c$message
    package <- c$package
    if(length(lib.loc)) {
        allpkgs <- .packages(TRUE, lib.loc)
        if (!is.na(w <- match(tolower(package), tolower(allpkgs)))) {
            msg2 <- sprintf("Perhaps you meant %s ?", sQuote(allpkgs[w]))
            return(paste(msg, msg2, sep = "\n"))
        }
    }
    msg
}

This is something you can do yourself, though it is generally not a
good idea to define a method when you don't own either the generic or
the class.
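
One way to get a similar hint without defining a method on a class you don't own is to catch the classed error in your own wrapper. This is only a sketch: `suggest_package()` and `library_with_hint()` are hypothetical names, and it assumes the error carries `package` and `lib.loc` fields as used above:

```r
# Hypothetical helper: case-insensitive match against installed packages.
suggest_package <- function(package, lib.loc = .libPaths()) {
    allpkgs <- .packages(all.available = TRUE, lib.loc = lib.loc)
    w <- match(tolower(package), tolower(allpkgs))
    if (is.na(w)) NULL else allpkgs[w]
}

# Hypothetical wrapper: catch the classed error from library() and
# re-signal it with a hint appended, leaving base methods untouched.
library_with_hint <- function(package) {
    tryCatch(
        library(package, character.only = TRUE),
        packageNotFoundError = function(e) {
            hint <- suggest_package(e$package, e$lib.loc)
            if (!is.null(hint))
                e$message <- paste0(conditionMessage(e),
                                    "\nPerhaps you meant ",
                                    sQuote(hint), " ?")
            stop(e)
        })
}
```

The error that comes out keeps its original class, so callers can still handle it programmatically without also having to deal with a separate warning.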

Something that would be useful and is being considered is having a
mechanism for registering default condition handlers. This would allow
the condition to be re-signaled with a custom class and then having
a custom conditionMessage method is less likely to cause conflicts.

Also worth looking into is establishing a restart around the error
signal.  This would allow an IDE, for example, to provide a dialog for
choosing the alternate package and retrying without the need to call
library() again. This is currently done in loadNamespace() but not yet
in library(). Can have downsides as well -- if the library() call is
in a notebook, for example, then you do want to fix the call ...  It
is arguably more useful in loadNamespace since that can get called
implicitly inside a longer computation that you don't necessarily want
to start over.
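
A rough sketch of what a restart around the error signal could look like. Everything here is made up for illustration (the loader `load_with_retry()`, the class `demoPackageNotFound`, the restart name `retry_with`, and the fixed list of "available" packages); it is not what loadNamespace() actually does:

```r
# Hypothetical loader: signals a classed error with a restart established
# around the signal, so a handler can supply an alternative name.
load_with_retry <- function(name, available = c("org.Hs.eg.db", "stats")) {
    repeat {
        if (name %in% available)
            return(paste("loaded", name))
        cond <- structure(
            list(message = paste("no package called", name),
                 call = NULL, package = name),
            class = c("demoPackageNotFound", "error", "condition"))
        # If a handler invokes the restart, its argument becomes the new
        # name and the loop tries again; otherwise stop() aborts as usual.
        name <- withRestarts(stop(cond), retry_with = function(alt) alt)
    }
}

# A calling handler standing in for an IDE dialog: pick a case-insensitive
# match and retry without re-running the original call.
result <- withCallingHandlers(
    load_with_retry("ORG.Hs.eg.db"),
    demoPackageNotFound = function(e) {
        candidates <- c("org.Hs.eg.db", "stats")
        w <- match(tolower(e$package), tolower(candidates))
        if (!is.na(w)) invokeRestart("retry_with", candidates[w])
    })
```

If the handler finds no candidate it simply returns, and the error proceeds as if no restart had been offered.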

Best,

luke

On Fri, 21 Jun 2019, Marcel Ramos wrote:

> Dear R-core devs,
>
> I hope this email finds you well.
>
> Please see the proposed patch to R-devel below:
>
> Scenario:
>
> When loading a package using `library`, a package may not be found if the 
> cases are not matching:
>
> ```
>> library(ORG.Hs.eg.db)
> Error in library(ORG.Hs.eg.db) :
>  there is no package called 'ORG.Hs.eg.db'
> ```
>
>
> Suggested Patch:
>
> Returns a message matching what `install.packages` returns in such situations:
>
> ```
>> library("ORG.Hs.eg.db")
> Error in library("ORG.Hs.eg.db") :
>   there is no package called 'ORG.Hs.eg.db'
> In addition: Warning message:
> Perhaps you meant 'org.Hs.eg.db' ?
> ```
>
> This patch will be helpful with 'fat-finger' typos. It will match a package
> called with `library` against installed packages.
>
>
> svn diff:
>
> Index: src/library/base/R/library.R
> ===
> --- src/library/base/R/library.R    (revision 76727)
> +++ src/library/base/R/library.R    (working copy)
> @@ -300,8 +300,13 @@
> pkgpath <- find.package(package, lib.loc, quiet = TRUE,
> verbose = verbose)
> if(length(pkgpath) == 0L) {
> -if(length(lib.loc) && !logical.return)
> +if(length(lib.loc) && !logical.return) {
> +allpkgs <- .packages(TRUE, lib.loc)
> +if (!is.na(w <- match(tolower(package), tolower(allpkgs))))
> +warning(sprintf("Perhaps you meant %s ?",
> +sQuote(allpkgs[w])), call. = FALSE, domain = NA)
> stop(packageNotFoundError(package, lib.loc, sys.call()))
> +}
> txt <- if(length(lib.loc))
> gettextf("there is no package called %s", sQuote(package))
> else
>
>
> Thank you!
>
> Best regards,
>
> Marcel
>
>
>
> --
> Marcel Ramos
> Bioconductor Core Team
> Roswell Park Comprehensive Care Center
> Dept. of Biostatistics & Bioinformatics
> Elm & Carlton Streets
> Buffalo, New York 14263
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone: 319-335-3386
Department of Statistics and        Fax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:

Re: [Rd] [External] Re: R C API resize matrix

2019-06-17 Thread Tierney, Luke
On Mon, 17 Jun 2019, Simon Urbanek wrote:

> Matrix is just a vector with the dim attribute. Assuming it is not referenced 
> by anyone, you can set any values to the dim attribute. As for the vector, 
> you can use SET_LENGTH() to shorten it - but I'm not sure how official it is 
> - it was originally designed to work, but there were abuses of TRUELENGTH so 
> not sure where we stand now (shortened vectors used to fool the garbage 
> collector as far as object sizes go). I wouldn't do it unless you're dealing 
> with really huge matrices.

Don't do that. SET_LENGTH isn't part of the API and using it outside
specific internal code confuses the garbage collector.

There is support for a growable vector but it's not at a point where
the interface is stable enough to be used in packages. So again please
don't go there.

Also for a matrix unless you are just dropping trailing columns you
would have to move data in memory.
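
The layout point can be seen from R itself. A small sketch using the 10 x 2 integer matrix from the question (the variable names are illustrative):

```r
a <- matrix(1:20, nrow = 10, ncol = 2)

# Matrices are stored column by column, so dropping the trailing column
# keeps a contiguous prefix of the underlying vector.
prefix_idx <- which(col(a) <= 1)

# Keeping the first 5 rows of both columns needs two separate runs of
# elements, so the second run would have to be moved up in memory.
rows_idx <- which(row(a) <= 5)

# Subsetting allocates a new 5 x 2 matrix and copies the values over.
b <- a[1:5, , drop = FALSE]
```

In C the equivalent would be to allocate the smaller matrix and copy the wanted elements into it, as the original question suggests.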

Best,

luke

>
> Cheers,
> Simon
>
>
>> On Jun 14, 2019, at 5:31 PM, Morgan Morgan  wrote:
>>
>> Hi,
>>
>> Is there a way to resize a matrix defined as follows:
>>
>> SEXP a = PROTECT(allocMatrix(INTSXP, 10, 2));
>> int *pa  = INTEGER(a);
>>
>> To row = 5 and col = 1 or do I have to allocate a second matrix "b" with
>> pointer *pb and do a "for" loop to transfer the value of a to b?
>>
>> Thank you
>> Best regards
>> Morgan

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone: 319-335-3386
Department of Statistics and        Fax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email: luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:   http://www.stat.uiowa.edu



Re: [R-pkg-devel] [External] Re: try() in R CMD check --as-cran

2019-06-07 Thread Tierney, Luke
A simplified version without a package:

Sys.setenv("_R_CHECK_LENGTH_1_LOGIC2_"="abort,verbose")
tryCatch(1:3 || 1, error = identity)

Running this aborts the session since it calls R_Suicide without first
signaling a condition to try any available handlers. Should be easy to
change, but I don't know if there are any downsides for the CRAN
workflow. I'll look into it.
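
In the meantime, package code can sidestep the abort by making sure a condition has length one before it reaches `if`, `&&` or `||`. A minimal sketch, with hypothetical function names:

```r
# Hypothetical function whose input may have length > 1.
negate_if_large <- function(x = 1:3) {
    # any() collapses the element-wise comparison to a single TRUE/FALSE.
    if (any(x > 1)) {
        x <- -x
    }
    x
}

# For && and ||, reduce each operand to length one before combining.
is_nonempty_positive <- function(x) length(x) > 0L && all(x > 0)
```

Whether `any()` or `all()` is the right reduction depends on what the code is meant to test.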

Best,

luke


On Fri, 7 Jun 2019, William Dunlap wrote:

> I've attached a package, ppp_0.1.tar.gz, which probably will not get
> through to R-help, that illustrates this.
> It contains one function which, by default, triggers a condition-length>1
> issue:
>   f <- function(x = 1:3)
>   {
>       if (x > 1) {
>           x <- -x
>       }
>       stop("this function always gives an error")
>   }
> and the help file example is
>   try(f())
>
> Then
>   env _R_CHECK_LENGTH_1_CONDITION_=abort,verbose R-3.6.0 CMD check
> --as-cran ppp_0.1.tar.gz
> results in
> * checking examples ... ERROR
> Running examples in ‘ppp-Ex.R’ failed
> The error most likely occurred in:
>
>> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
>> ### Name: f
>> ### Title: Cause an error
>> ### Aliases: f
>> ### Keywords: error
>>
>> ### ** Examples
>>
>> try(f())
> --- FAILURE REPORT --
> --- failure: the condition has length > 1 ---
> --- srcref ---
> :
> --- package (from environment) ---
> ppp
> --- call from context ---
> f()
> --- call from argument ---
> if (x > 1) {
>x <- -x
> }
> --- R stacktrace ---
> where 1: f()
> where 2: doTryCatch(return(expr), name, parentenv, handler)
> where 3: tryCatchOne(expr, names, parentenv, handlers[[1L]])
> where 4: tryCatchList(expr, classes, parentenv, handlers)
> where 5: tryCatch(expr, error = function(e) {
>call <- conditionCall(e)
>if (!is.null(call)) {
>if (identical(call[[1L]], quote(doTryCatch)))
>call <- sys.call(-4L)
>dcall <- deparse(call)[1L]
>prefix <- paste("Error in", dcall, ": ")
>LONG <- 75L
>sm <- strsplit(conditionMessage(e), "\n")[[1L]]
>w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w")
>if (is.na(w))
>w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L],
>type = "b")
>if (w > LONG)
>prefix <- paste0(prefix, "\n  ")
>}
>else prefix <- "Error : "
>msg <- paste0(prefix, conditionMessage(e), "\n")
>.Internal(seterrmessage(msg[1L]))
>if (!silent && isTRUE(getOption("show.error.messages"))) {
>cat(msg, file = outFile)
>.Internal(printDeferredWarnings())
>}
>invisible(structure(msg, class = "try-error", condition = e))
> })
> where 6: try(f())
>
> --- value of length: 3 type: logical ---
> [1] FALSE  TRUE  TRUE
> --- function from context ---
> function (x = 1:3)
> {
>if (x > 1) {
>x <- -x
>}
>stop("this function always gives an error")
> }
> 
> 
> --- function search by body ---
> Function f in namespace ppp has this body.
> --- END OF FAILURE REPORT --
> Fatal error: the condition has length > 1
> * checking PDF version of manual ... OK
> * DONE
>
> Status: 1 ERROR, 1 NOTE
> See
>  ‘/tmp/bill/ppp.Rcheck/00check.log’
> for details.
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Fri, Jun 7, 2019 at 10:21 AM Duncan Murdoch 
> wrote:
>
>> On 07/06/2019 12:32 p.m., William Dunlap wrote:
>>> The length-condition-not-equal-to-one checks will cause R to shutdown
>>> even if the code is in a tryCatch().
>>
>> That's strange.  I'm unable to reproduce it with my tries, and John's
>> package is no longer online.  Do you have an example I could look at?
>>
>> Duncan Murdoch
>>
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com 
>>>
>>>
>>> On Fri, Jun 7, 2019 at 7:47 AM Duncan Murdoch wrote:
>>>
>>> On 07/06/2019 9:46 a.m., J C Nash wrote:
>>> > Should try() not stop those checks from forcing an error?
>>>
>>> try(stop("msg"))  will print the error message, but won't stop
>>> execution.  Presumably the printed message is what is causing you
>>> problems.  If you want to suppress that, use
>>>
>>> try(stop("msg"), silent = TRUE)
>>>
>>> Duncan Murdoch
>>>
>>> >
>>> > I recognize that this is the failure -- it is indeed the check
>>> I'm trying to
>>> > catch -- but I don't want tests of such checks to fail my package.
>>> >
>>> > JN
>>> >
>>> > On 2019-06-07 9:31 a.m., Sebastian Meyer wrote:
>>> >> The failure stated in the R CMD check failure report is:
>>> >>
>>> >>>   --- failure: length > 1 in coercion to logical ---
>>> >>
>>> >> This comes from --as-cran performing useful extra checks via
>>> setting the
>>> >> environment variable _R_CHECK_LENGTH_1_LOGIC2_, which means:
>>> >>
>>> >>> check if either argument of the binary operators && and || has
>>> length greater than one.
>>> >>
>>> >> (see

Re: [Rd] [External] undefined symbol errors when compiling package using ALTREP API

2019-06-05 Thread Tierney, Luke
For now you can use

R_altrep_inherits(x, R_compact_intseq_class)

The variable R_compact_intseq_class should currently be visible to
packages on all platforms, though that may change if we eventually
provide a string-based lookup mechanism, e.g. something like

R_find_altrep_class("compact_intseq", "base")

Best,

luke


On Tue, 4 Jun 2019, Mark Klik wrote:

> Hi Gabriel,
>
> thanks for your detailed explanation, that definitely clarifies the design
> choices that were made in setting up the ALTREP framework and I can see how
> those choices make sure existing code won't break.
>
> My specific use-case for wanting to check whether a vector is an ALTREP is
> the following: the fst package wraps an external C++ library (fstlib,
> independent from R) that was made for high speed serialization of
> data frames. Sequences are fairly common in data frames and I'm planning to
> add the concept of a sequence to the (R-agnostic) fst format. When I can
> detect, e.g. a 'compact_intseq' ALTREP vector and just retrieve its 3
> integer internal representation, serialization could be very fast.
> Alternatively, as you describe, the vector needs to be expanded first
> before serialization, which will actually be slower than using an already
> expanded vector and can take a lot of RAM for large datasets.
>
> So being able to make use of the internal representation of (a few of the)
> base ALTREP vectors can be very interesting for (non-R) serialization
> schemes.
>
> thanks for your time!
> Mark
>
>
> On Tue, Jun 4, 2019 at 11:50 PM Gabriel Becker 
> wrote:
>
>> Hi Mark,
>>
>> So depending pretty strongly on what you mean by "ALTREP aware", packages
>> aren't necessarily supposed to be ALTREP aware. What I mean by this is that
>> as of right now, ALTREP objects are designed to be interacted with by
>> non-ALTREP-implementing package code, *more-or-less* exactly as standard
>> (non-AR) SEXPs are: via the published C API. The more or less comes from
>> the fact that in some cases, doing things that are good ideas on standard
>> SEXPS will work, but may not be a good idea for ALTREPs.
>>
>> The most "low-hanging-fruit" example of something that was best practice
>> for standard vectors but is not a good idea for ALTREP vectors is grabbing
>> a DATAPTR and iterating over the values without modification in a tight
>> loop.  This will work (absent allocation  failure or, I suppose, the ALTREP
>> being specifically designed to refuse to give you a full DATAPTR), but with
>> ALTREP in place it's no longer what you want to do.
>>
>> That said, you don't want to check whether something is an ALTREP yourself
>> and branch your code, what you want to do is use the ITERATE_BY_REGION
>> macro in R_ext/Itermacros.h for ALL SEXPs, which will be nearly as fast for
>> standard vectors and work safely for ALTREP vectors.
>>
>> Basically any time you find yourself wanting to check if something is an
>> ALTREP and if so, call a specific ALT*_BLAH method, the intention is that
>> there should be a universal API point you can call which will work for both
>> types.
>>
>> This is true, e.g., of INTEGER_IS_SORTED (which will always work and just
>> returns UNKNOWN_SORTEDNESS, ie INT_MIN, ie NA_INTEGER for non-ALTREPs),
>> for REAL_GET_REGION, (which populates a double* with the requested values
>> for both standard and ALTREP REALSXPs), etc.
>>
>> Does the above make sense?
>>
>> If you feel a universal API point is missing, you can raise that here,
>> though I can't promise that will ultimately result in the method being
>> added.
>>
>> Best,
>> ~G
>>
>> On Tue, Jun 4, 2019 at 2:22 PM Mark Klik  wrote:
>>
>>> thanks for clearing that up, so these methods are actually not meant to be
>>> exported on Windows and OSX?
>>> Some of the ALTREP methods that now use 'attribute_hidden' would be very
>>> useful to packages that aim to be ALTREP aware, should the current
>>> (exported) API be considered final?
>>>
>>> thanks for your time & best,
>>> Mark
>>>

Re: [Rd] [External] undefined symbol errors when compiling package using ALTREP API

2019-06-04 Thread Tierney, Luke
On Tue, 4 Jun 2019, Mark Klik wrote:

> Hello,
>
> I'm developing a package (lazyvec) that makes full use of the ALTREP
> framework (R >= 3.6.0).
> One application of the package is to wrap existing ALTREP vectors in a new
> ALTREP vector and pass all calls from R to the contained object. The
> purpose of this is to provide a diagnostic framework for working with
> ALTREP vectors and show information about internal calls.
>
> The package builds on Windows and OSX but fails to build on Linux as can be
> seen from the link to the Travis build:
> https://travis-ci.org/fstpackage/lazyvec/jobs/539442806
>
> The reason for the build failure is that many ALTREP methods generate 'undefined
> symbol' errors upon building the package (on Linux). I've checked the R
> source code and the undefined symbols seem to be related to the
> 'attribute_hidden' before the function definition. For example, the method
> 'ALTVEC_EXTRACT_SUBSET' is defined as:
>
> SEXP attribute_hidden ALTVEC_EXTRACT_SUBSET(SEXP x, SEXP indx, SEXP call)
>
> My question is why these differences between Windows / OSX and Linux exist
> and if they are intentional?

It is intentional that this not be part of the public API. This is
true of almost all functions with an ALTREP prefix. You need a
different approach that avoids using these directly.

Best,

luke

> Do I need special build parameters to make sure my package builds correctly
> on Linux?
>
> thanks for all the hard work!
>
> best,
> Mark
>
> PS: some additional info:
>
> package github repository: https://github.com/fstpackage/lazyvec
> AppVeyor package build logs:
> https://ci.appveyor.com/project/fstpackage/lazyvec
> Travis package build logs: https://travis-ci.org/fstpackage/lazyvec/builds

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu


Re: [Rd] [External] Re: print.() not called when autoprinting

2019-05-22 Thread Tierney, Luke
On Wed, 22 May 2019, Lionel Henry wrote:

> Hi Martin,
>
>> On 22 May 2019, at 03:50, Martin Maechler  wrote:
>>
>> I'm pretty sure that all teaching and documentation about S and R
>> has suggested that  print(f)  and auto-printing should result in
>> the same output, AFAIR also for S4 objects

Looks like we intend auto-printing to do what evaluating the expression

if (base::isS4(base::.Last.value)) methods::show(base::.Last.value) else
    base::print(base::.Last.value)

would do. The simplest approach would be to save this expression and
pass it to eval, or define an equivalent .autoprint function. That
would avoid code duplication and divergence.

I'm guessing the main reason we don't do this is that going through
the print() closure bumps NAMED on the value and forces a duplicate on
a subsequent assignment. This issue would disappear if we could manage
the transition to reference counting. In the meantime it would seem
best to keep any changes consistent with evaluating this expression.

Best,

luke

>
> I agree with the principle that autoprint and print() should be
> equivalent for users. However it also seems that print calls in
> packages should be independent of user customisations. For instance a
> package author might gather tabular data in a matrix or data frame and
> print() it as part of a larger print method. In that case, user
> customisations might cause a mess.
>
> Would it make sense to resort to autoprint customisation when the
> topenv() of the parent.frame() of print() is the global environment,
> and ignore the customisation otherwise? This should ensure consistent
> printing behaviour at the REPL and in scripts. Checking the topenv()
> allows print() calls inserted to debug lapply'd functions to behave the
> same as when called from top level or within a loop.
>
> Lionel
>


Re: [Rd] [External] make running on.exit expr uninterruptible

2019-05-22 Thread Tierney, Luke
suspendInterrupts has dynamic extent, so you need to make sure it
covers the entire computation. Defining your f like this is one option:

f <- function() {
    ff <- function() {
        on.exit(cntr_on.exit <<- cntr_on.exit + 1L)
        cntr_f <<- cntr_f + 1L
        ## allowInterrupts(... interruptible stuff ...)
        TRUE
    }
    suspendInterrupts(ff())
}

You can move the suspendInterrupts higher up in the computation, but
then it becomes more important to use allowInterrupts at appropriate
points.
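
The point about dynamic extent can be sketched with a toy model in plain C. It is not R's implementation: interrupts_suspended, interrupt_pending, request_interrupt, with_suspended and with_allowed are hypothetical names, and "delivering" an interrupt is reduced to bumping a counter (no non-local exit is modelled). The model assumes that a request arriving while interrupts are suspended stays pending and is delivered when the enclosing suspension ends.

```c
#include <assert.h>

static int interrupts_suspended = 0;  /* current state of the toy interpreter */
static int interrupt_pending = 0;     /* a request that has not been delivered */
static int delivered = 0;             /* number of interrupts acted upon */

/* Deliver a pending interrupt, but only if interrupts are currently allowed. */
void check_interrupt(void)
{
    if (interrupt_pending && !interrupts_suspended) {
        interrupt_pending = 0;
        delivered++;
    }
}

/* Simulates the user requesting an interrupt at this moment. */
void request_interrupt(void)
{
    interrupt_pending = 1;
    check_interrupt();
}

/* Run body with interrupts suspended; restore the previous state afterwards
   and only then act on anything that arrived in the meantime. */
void with_suspended(void (*body)(void))
{
    int saved = interrupts_suspended;
    interrupts_suspended = 1;
    body();
    interrupts_suspended = saved;
    check_interrupt();
}

/* Run body with interrupts allowed, e.g. around a long computation that sits
   inside a suspended region. */
void with_allowed(void (*body)(void))
{
    int saved = interrupts_suspended;
    interrupts_suspended = 0;
    check_interrupt();
    body();
    interrupts_suspended = saved;
}

static int work_done = 0, cleanup_done = 0;

/* The work and its cleanup both sit inside the protected extent, which is
   the role ff() plays in the R code above. */
void work_then_cleanup(void)
{
    work_done++;
    request_interrupt();   /* stays pending while interrupts are suspended */
    cleanup_done++;        /* so the cleanup is reached */
}
```

Calling with_suspended(work_then_cleanup) runs both the work and the cleanup before the interrupt is counted as delivered; suspending only part of the body would leave a window between the work and the cleanup.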

It would be possible to have R suspend interrupts around function
calling infrastructure to provide stronger guarantees about
non-interruptible on.exit/finally actions, but there are both upsides
and downsides to doing that.

Best,

luke


On Wed, 22 May 2019, Andreas Kersting wrote:

> Hi,
>
> Is there currently any way to guarantee that on.exit does not fail to execute 
> the recorded expression because of a user interrupt arriving during function 
> exit? Consider:
>
> f <- function() {
>   suspendInterrupts({
>     on.exit(suspendInterrupts(cntr_on.exit <<- cntr_on.exit + 1L))
>     cntr_f <<- cntr_f + 1L
>   })
>   TRUE
> }
>
> It is possible to interrupt this function such that cntr_f is incremented 
> while cntr_on.exit is not (you might need to adjust timeout_upper to trigger 
> the error on your machine):
>
> timeout_upper <- 0.1
> repeat {
>   cntr_f <- 0L
>   cntr_on.exit <- 0L
>
>   # timeout code borrowed from R.utils::withTimeout but with setTimeLimit()
>   # (correctly) placed inside tryCatch (otherwise timeout can occur before it
>   # can be caught) and with time limit reset before going into the error handler
>   res_list <- lapply(seq(0, timeout_upper, length.out = 1000), function(timeout) {
>     on.exit({
>       setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE)
>     })
>     tryCatch({
>       setTimeLimit(cpu = timeout, elapsed = timeout, transient = TRUE)
>       res <- f()
>
>       # avoid timeout while running error handler
>       setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE)
>
>       res
>     }, error = function(ex) {
>       msg <- ex$message
>       pattern <- gettext("reached elapsed time limit", "reached CPU time limit",
>                          domain = "R")
>       pattern <- paste(pattern, collapse = "|")
>       if (regexpr(pattern, msg) != -1L) {
>         FALSE
>       }
>       else {
>         stop(ex)
>       }
>     })
>   })
>   print(sum(unlist(res_list)))  # number of times f completed
>   stopifnot(cntr_on.exit == cntr_f)
> }
>
> Example output:
>
> [1] 1000
> [1] 1000
> [1] 1000
> [1] 1000
> [1] 999
> [1] 1000
> [1] 1000
> [1] 999
> [1] 998
> [1] 1000
> [1] 998
> [1] 1000
> [1] 1000
> [1] 1000
> [1] 1000
> [1] 999
> Error: cntr_on.exit == cntr_f is not TRUE
>
> I was bitten by this because an on.exit expression, which releases a file 
> lock, was interrupted (before it actually executed) such that subsequent 
> calls block indefinitely.
>
> Regards,
> Andreas


Re: [Rd] [External] Patch to replace "his" in Writing R Extensions

2019-05-21 Thread Tierney, Luke
Thanks. Addressed in r76559 (trunk) and r76560 (R-3-6-branch).

Best,

luke

On Tue, 21 May 2019, Maëlle SALMON via R-devel wrote:

> Dear R-devel team,
>
> Many thanks for the great resource that is "Writing R Extensions"!
>
> I noticed two occurrences of "his", one to refer to the R package user, 
> another to refer to the R package author. Folks in these two groups are not 
> all men, so I suggest changing the word to "their" to make it gender-neutral. 
> Attached is a patch for your consideration.
>
> Thanks for your time, best regards,
>
> Maëlle.



Re: [Rd] [External] most robust way to call R API functions from a secondary thread

2019-05-20 Thread Tierney, Luke
Your analysis looks pretty complete to me and your solutions seem
plausible.  That said, I am not yet confident enough that we haven't
missed an important point to want to go down this route.

Losing stack checking is risky; it might be eventually possible to
provide some support for this to be handled via a thread-local
variable. Ensuring that R_ToplevelExec can't jump before entering the
body function would be a good idea; if you want to propose a patch we
can have a look.

Best,

luke

On Sun, 19 May 2019, Andreas Kersting wrote:

> Hi,
>
> As the subject suggests, I am looking for the most robust way to call an 
> (arbitrary) function from the R API from a POSIX thread other than the main 
> one in a package's code.
>
> I know that, "[c]alling any of the R API from threaded code is ‘for experts 
> only’ and strongly discouraged. Many functions in the R API modify internal R 
> data structures and might corrupt these data structures if called 
> simultaneously from multiple threads. Most R API functions can signal errors, 
> which must only happen on the R main thread." 
> (https://cran.r-project.org/doc/manuals/r-release/R-exts.html#OpenMP-support)
>
> Let me start with my understanding of the related issues and possible 
> solutions:
>
> 1) R API functions are generally not thread-safe and hence one must ensure, 
> e.g. by using mutexes, that no two threads use the R API simultaneously
>
> 2) R uses longjmps on error and interrupts as well as for condition handling 
> and it is undefined behaviour to do a longjmp from one thread to another; 
> interrupts can be suspended before creating the threads by setting 
> R_interrupts_suspended = TRUE; by wrapping the calls to functions from the R 
> API with R_ToplevelExec(), longjmps across thread boundaries can be avoided; 
> the only reason for R_ToplevelExec() itself to fail with an R-style error 
> (longjmp) is a pointer protection stack overflow
>
> 3) R_CheckStack() might be executed (indirectly), which will (probably) 
> signal a stack overflow because it only works correctly when called from the 
> main thread (see 
> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Threading-issues);
>  in particular, any function that does allocations, e.g. via allocVector3() 
> might end up calling it via GC -> finalizer -> ... -> eval; the only way 
> around this problem which I could find is to adjust R_CStackLimit, which is 
> outside of the official API; it can be set to -1 to disable the check or be 
> changed to a value appropriate for the current thread
>
> 4) R sets signal handlers for several signals and some of them make use of 
> the R API; hence, issues 1) - 3) apply; signal masks can be used to block 
> delivery of signals to secondary threads in general and to the main thread 
> while other threads are using the R API
>
>
> I basically have the following questions:
>
> a) Is my understanding of the issues accurate?
> b) Are there more things to consider when calling the R API from secondary 
> threads?
> c) Are the solutions proposed appropriate? Are there scenarios in which they 
> will fail to solve the issue? Or might they even cause new problems?
> d) Are there alternative/better solutions?
>
> Any feedback on this is highly appreciated.
>
> Below you can find a template which combines the proposed solutions (and 
> skips all non-illustrative checks of return values). Additionally, 
> R_CheckUserInterrupt() is used in combination with R_UnwindProtect() to 
> regularly check for interrupts from the main thread, while still being able 
> to cleanly cancel the threads before fun_running_in_main_thread() is left via 
> a longjmp. This is e.g. required if the secondary threads use memory which 
> was allocated in fun_running_in_main_thread() using e.g. R_alloc().
>
> Best regards,
> Andreas Kersting
>
>
>
> #include 
> #include 
> #include 
> #include 
>
> extern uintptr_t R_CStackLimit;
> extern int R_PPStackTop;
> extern int R_PPStackSize;
>
> #include 
> LibExtern Rboolean R_interrupts_suspended;
> LibExtern int R_interrupts_pending;
> extern void Rf_onintr(void);
>
> // mutex for exclusive access to the R API:
> static pthread_mutex_t r_api_mutex = PTHREAD_MUTEX_INITIALIZER;
>
> // a wrapper around R_CheckUserInterrupt() which can be passed to
> // R_UnwindProtect():
> SEXP check_interrupt(void *data) {
>   R_CheckUserInterrupt();
>   return R_NilValue;
> }
>
> // a wrapper around Rf_onintr() which can be passed to R_UnwindProtect():
> SEXP my_onintr(void *data) {
>   Rf_onintr();
>   return R_NilValue;
> }
>
> // function called by R_UnwindProtect() to cleanup on interrupt
> void cleanfun(void *data, Rboolean jump) {
>   if (jump) {
>     // terminate threads cleanly ...
>   }
> }
>
> void fun_calling_R_API(void *data) {
>   // call some R API function, e.g. mkCharCE() ...
> }
>
> void *threaded_fun(void *td) {
>
>   // ...
>
>   pthread_mutex_lock(&r_api_mutex);
>
>   // avoid false stack overflow error:

Re: [Rd] [External] ALTREP: Bug reports

2019-05-16 Thread Tierney, Luke
On Thu, 16 May 2019, 介非王 wrote:

> Hello,
>
> I have encountered two bugs when using ALTREP APIs.
>
> 1. STDVEC_DATAPTR
>
> The Rinternals.h file has the following declaration under a comment:
>
> /* ALTREP support */
>> void *(STDVEC_DATAPTR)(SEXP x);
>
>
> However, this comment might not be true, the easiest way to verify it is to
> define a C++ function:
>
> void C_testFunc(SEXP a)
>> {
>> STDVEC_DATAPTR(a);
>> }
>
>
> and call it in R via
>
>> a=1:10
>>> C_testFunc(a)
>> Error in C_testFunc(a) : cannot get STDVEC_DATAPTR from ALTREP object
>
>
> We can inspect the internal type and call ALTREP function to check if it
> is an ALTREP:
>
>> .Internal(inspect(a))
>> @0x1b5a3310 13 INTSXP g0c0 [NAM(7)]  1 : 10 (compact)
>>> #This is a wrapper of ALTREP
>>> is.altrep(a)
>> [1] TRUE
>
>
> I've also defined an ALTREP type and it did not work either. I guess this
> might be a bug? Or did I miss something?

STDVEC_DATAPTR returns the data pointer of a standard (non-ALTREP)
vector.  It should not be necessary to use it in package code; if you
call it on an ALTREP you are likely to get a segfault.

>
> 2. Wrapper objects in ALTREP
>
> If the duplicate function is defined to return the object itself:

Don't do that. Mutable objects don't work. Look at the vignette in
https://github.com/ALTREP-examples/Rpkg-mutable for more on this.

Best,

luke

>
> SEXP vector_dulplicate(SEXP x, Rboolean deep) {
> return(x);
> }
>
> In R an ALTREP object will behave like an environment (pass-by-reference).
> However, if we do something like(pseudo code):
>
> n=100
>> x=runif(n)
>> alt1=createAltrep(x)
>> alt2=alt1
>> alt2[1]=10
>> .Internal(inspect(alt1))
>> .Internal(inspect(alt2))
>
>
> The result would be:
>
>> .Internal(inspect(alt1))
>> @0x156f4d18 14 REALSXP g0c0 [NAM(7)]
>>> .Internal(inspect(alt2 ))
>> @0x156a33e0 14 REALSXP g0c0 [NAM(7)]  wrapper
>> [srt=-2147483648,no_na=0]
>>   @0x156f4d18 14 REALSXP g0c0 [NAM(7)]
>
>
> It seems like the object alt2 automatically gets wrapped by R. Although at
> the R level it seems fine because there are no differences between alt1 and
> alt2, if we define a C function as:
>
> SEXP C_peekSharedMemory(SEXP x) {
>> return(R_altrep_data1(x));
>
> }
>
>
> and call it in R to get the internal data structure of an ALTREP object.
>
> C_peekSharedMemory(alt1)
>> C_peekSharedMemory(alt2)
>
>
> The first one correctly returns its internal data structure, but the second
> one returns the ALTREP object it wraps since the wrapper itself is an
> ALTREP. This behavior is unexpected. Since the duplicate function returns
> the object itself, I would expect alt1 and alt2 to be the same object.
> Even if they are essentially not the same, calling the same function should
> at least return the same result. Other than that, It seems like R does not
> always wrap an ALTREP object. If we change n from 100 to 10 and check the
> internal again, alt2 will not get wrapped. This makes the problem even more
> difficult since we cannot predict when would the wrapper appear.
>
> Here is the source code for the wrapper:
> https://github.com/wch/r-source/blob/trunk/src/main/altclasses.c#L1399
>
> Here is a working example if one can build the sharedObject package from
> https://github.com/Jiefei-Wang/sharedObject
>
> n=100
>> x=runif(n)
>> so1=sharedObject(x,copyOnWrite = FALSE)
>> so2=so1
>> so2[1]=10
>> .Internal(inspect(so1))
>> .Internal(inspect(so2))
>
>
> Here is my session info:
>
> R version 3.6.0 alpha (2019-04-08 r76348)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>> Running under: Windows >= 8 x64 (build 9200)
>> Matrix products: default
>> locale:
>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
>> States.1252
>> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>>
>> [5] LC_TIME=English_United States.1252
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods   base
>> other attached packages:
>> [1] sharedObject_0.0.99
>> loaded via a namespace (and not attached):
>> [1] compiler_3.6.0 tools_3.6.0Rcpp_1.0.1
>
>
> Best,
> Jiefei


Re: [Rd] [External] Re: ALTREP: Design concept of alternative string

2019-05-10 Thread Tierney, Luke
On Fri, 10 May 2019, 介非王 wrote:

> Hi Gabriel,
>
> Thanks for your explanation, I totally understand that it is almost
> impossible to change the data structure of STRSXP. However, what I'm
> proposing is not about changing the internal representation, but rather
> about how we design and use the ALTREP API.
>
> I might not have stated the workarounds clearly, as English is not my first
> language. Please let me explain them again in detail.
>
> 1. Update the existing R functions. When the ALTREP API Dataptr_or_null
> returns NULL, use get_element instead(or as best as we can). I have seen
> this pattern for some R functions, but somehow there are still some
> functions left that do not follow this rule. For example, print function
> will blindly call Dataptr (It even did not call Dataptr_or_null first) and
> forces me to allocate a large chunk of memory in R. Updating these
> functions would not completely solve the problem we are discussing but will
> make it less serious.

Fixing print() is pretty high priority (I thought we had done so for R
3.6.0 but apparently not). Others will come in over time; filing a
request with bugzilla is one way to push up priority for a particular
function or set of functions.

Keep in mind that one option for your implementation is to signal an
error if a data pointer is requested. You could make that dependent on
some sort of option setting or make the error continuable by providing
a restart.

> 2. Update the ALTREP API, return a vector of const char *, and internally
> wrap them as CHARSXP. This can be a way to "hack" the R data structure with
> only a little cost to create the CHARSXP header.

That doesn't seem feasible but I may not be understanding what you mean.

> 3. Provide character ALTREP. Instead of using string ALTREP, we can define
> an alternative CHARSXP. By doing it we will completely solve the problem
> since the return value of the Dataptr of CHARSXP is a const char*. We do
> not have to change any internal representation of characters, it just
> requires a remap of the DATAPTR macro( or function?).

Allowing ALTREP CHARSXP objects might be something to consider in the
future, but the combination of caching and encoding issues make that
very complex. I'm not sure it would be a good idea or even
feasible. In any case it won't happen anytime soon.
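
Two ideas from this exchange, a consumer that falls back to element access when no pointer is available and a vector class that refuses to hand out a full data pointer, can be sketched in plain C. The model is self-contained and does not use R's headers: lazy_vec, dataptr_or_null, get_element, dataptr and sum_elements are hypothetical stand-ins, and signalling an error is reduced to returning NULL and setting a flag. Making the refusal depend on an option, or continuable via a restart, is not modelled.

```c
#include <assert.h>
#include <stddef.h>

typedef struct lazy_vec {
    size_t length;
    const int *materialized;  /* non-NULL only if the data already sit in memory */
    int start;                /* lazy case: element i is start + i */
    int error_raised;         /* set when a data pointer request is refused */
} lazy_vec;

/* Cheap query: returns the data pointer if one exists, never allocates. */
const int *dataptr_or_null(const lazy_vec *v)
{
    return v->materialized;
}

/* Element access always works, whether or not the data are in memory. */
int get_element(const lazy_vec *v, size_t i)
{
    return v->materialized ? v->materialized[i] : v->start + (int) i;
}

/* Full pointer request: this class refuses to materialize and flags an
   error instead of allocating a large chunk of memory. */
const int *dataptr(lazy_vec *v)
{
    if (v->materialized) return v->materialized;
    v->error_raised = 1;
    return NULL;
}

/* Consumer: use the pointer when it is free, otherwise go element-wise. */
long sum_elements(const lazy_vec *v)
{
    const int *p = dataptr_or_null(v);
    long total = 0;
    size_t i;
    for (i = 0; i < v->length; i++)
        total += p ? p[i] : get_element(v, i);
    return total;
}
```

A consumer written like sum_elements never triggers the refusal path, which is why updating functions to follow this pattern reduces the memory problem even though it does not remove it.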

Best,

luke

>
> Again, I sincerely appreciate your time and the detailed explanation you
> provided. I'm looking forward to seeing any method to solve this problem in
> the current and future R release.
>
> Best,
> Jiefei
>
> On Thu, May 9, 2019 at 2:07 PM Gabriel Becker wrote:
>
>> Hi Jiefei,
>>
>> The issue here is that while the memory consequences of what you're
>> describing may be true, this is simply how R handles character vector (what
>> you're calling string) values internally. It doesn't actually have anything
>> to do with ALTREP. Standard character vector SEXPs have an array of CHARSXP
>> pointers in their payload (what is returned by DATAPTR) as well.
>>
>> As far as I know, this is important for string caching  and is actually
>> intended to save memory when the same string value appears many times in an
>> R session (and takes up more bytes than a pointer), though I haven't dug
>> around R's low-level string handling a ton. Either way though, this would
>> be a much much larger change than just changing the ALTREP API (which for
>> things like this explicitly and intentionally matches how the C api behaves
>> for non-ALTREP SEXPs for compatibility).
>>
>> Likewise the reason that get_element is going to return a CHARSXP, is
>> because that is what STRING_ELT(x, i) returns (equivalent to (SEXP)
>> DATAPTR(x)[i] ), so I don't think that can be changed either.
>>
>> One other thing to note, though, is that if you're asking for the dataptr
>> (and it isn't read only) then you're basically stepping out of ALTREP space
>> anyway, so it makes sense that you get a normally laid-out STRSXP (with its
>> CHARSXP payload).
>>
>> Best,
>> ~G
>>
>> On Thu, May 9, 2019 at 8:09 AM 介非王  wrote:
>>
>>> Hello from Bioconductor,
>>>
>>> I'm developing a package to share R objects across clusters using boost
>>> library. The concept is similar to mmap package:
>>> https://cran.r-project.org/web/packages/mmap/index.html . However, I have a
>>> problem when I was trying to write Dataptr_method for the alternative
>>> string.
>>>
>>> Based on my understanding, the return value of the Dataptr_method function
>>> should be a vector of CHARSXP pointers. This design might be problematic in
>>> two ways:
>>>
>>> 1. The behavior of Dataptr_method function is inconsistent for string and
>>> the other ALTREP types. For the other types we return a vector of pure data
>>> in memory allocated outside of R, but for the string, we return a vector of
>>> R objects allocated by R.
>>>
>>> 2. It causes an unnecessary duplication of the data. In order to return
>>> CHARSXPs to R, it forces me to allocate CHARSXPs and copy the entire data
>>> to the R 

Re: [Rd] [External] Re: Background R session on Unix and SIGINT

2019-04-30 Thread Tierney, Luke
As Simon pointed out, the interrupt is recorded but not processed until
a safe point.

When reading from a fifo or pipe R runs non-interactive, which means
it sits in a read() system call and the interrupt isn't seen until
sometime during evaluation when a safe checkpoint is reached.

When reading from a terminal R will use select() to wait for input and
periodically wake and check for interrupts. In that case the interrupt
will probably be seen sooner.

If the interactive behavior is what you want you can add --interactive
to the arguments used to start R.
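
The "recorded but not processed until a safe point" behaviour is a common C pattern and can be shown in a few self-contained lines. This is a generic sketch, not R's source: on_sigint, install_interrupt_flag and safe_point are hypothetical names, and "processing" the interrupt is reduced to a counter.

```c
#include <assert.h>
#include <signal.h>

/* Set by the handler. volatile sig_atomic_t is the type the C standard
   allows a signal handler to write safely. */
static volatile sig_atomic_t interrupt_pending = 0;
static int interrupts_handled = 0;

/* The handler only records the request; it does no real work. */
void on_sigint(int sig)
{
    (void) sig;
    interrupt_pending = 1;
}

void install_interrupt_flag(void)
{
    signal(SIGINT, on_sigint);
}

/* Called by the evaluator at points where acting on an interrupt is safe.
   Returns 1 if an interrupt was processed here, 0 otherwise. */
int safe_point(void)
{
    if (!interrupt_pending) return 0;
    interrupt_pending = 0;
    interrupts_handled++;
    return 1;
}
```

If the process is blocked reading its input when the signal arrives, the flag is set immediately but nothing calls safe_point() until the next piece of work is evaluated, which matches the delayed "Error: interrupt" reported in this thread.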

Best,

luke

On Tue, 30 Apr 2019, Gábor Csárdi wrote:

> OK, I managed to create an example without callr, but it is still
> somewhat cumbersome. Anyway, here it is.
>
> Terminal 1:
> mkfifo fif
> R --no-readline --slave --no-save --no-restore < fif
>
> Terminal 2:
> cat > fif
> Sys.getpid()
>
> This will make Terminal 1 print the pid of the R process, so we can
> send a SIGINT:
>
> Terminal 3:
> kill -INT pid
>
> The R process is of course still running happily.
>
> Terminal 2 again:
> tryCatch(Sys.sleep(10), interrupt = function(e) e)
>
> and then Terminal 1 prints the interrupt condition:
> 
>
> This is macOS and 3.5.3, although I don't think it matters much.
>
> Thanks much!
> G.
>
> On Tue, Apr 30, 2019 at 9:50 PM Simon Urbanek
>  wrote:
>>
>> Can you give an example without callr? The key is how the process is started 
>> and what it is doing, which is entirely opaque in callr.
>>
>> Windows doesn't have signals, so the process there is entirely different. 
>> Most of the WIN32 processing is event-based.
>>
>> Cheers,
>> Simon
>>
>>
>>> On Apr 30, 2019, at 4:17 PM, Gábor Csárdi  wrote:
>>>
>>> Yeah, I get that they are async.
>>>
>>> What happens is that the background process is not doing anything when
>>> the process gets a SIGINT. I.e. the background process is just
>>> listening on its standard input.
>>>
>>> AFAICT for an interactive process such a SIGINT is just swallowed,
>>> with a newline outputted to the terminal.
>>>
>>> But apparently, for this background process, it is not swallowed, and
>>> it is triggered later. FWIW it does not happen on Windows, not very
>>> surprisingly.
>>>
>>> Gabor
>>>
>>> On Tue, Apr 30, 2019 at 9:13 PM Simon Urbanek
>>>  wrote:

>>>> Interrupts are not synchronous in R - the signal only flags the request
>>>> for interruption. Nothing actually happens until R_CheckUserInterrupt() is
>>>> called at an interruptible point. In your case your code is apparently not
>>>> calling R_CheckUserInterrupt() until later as a side-effect of the next
>>>> evaluation.
>>>>
>>>> Cheers,
>>>> Simon


> On Apr 30, 2019, at 3:44 PM, Gábor Csárdi  wrote:
>
> Hi All,
>
> I realize that this is not a really nice reprex, but anyone has an
> idea why a background R session would "remember" an interrupt (SIGINT)
> on Unix?
>
> rs <- callr::r_session$new()
> rs$interrupt() # just sends a SIGINT
> #> [1] TRUE
>
> rs$run(function() 1+1)
> #> Error: interrupt
>
> rs$run(function() 1+1)
> #> [1] 2
>
> It seems that the main loop somehow stores the SIGINT it receives
> while it is waiting on stdin, and then it triggers it when some input
> comes in Maybe. Just speculating
>
> Thanks,
> Gabor
>


Re: [Rd] [External] Status of R_unif_index

2019-04-04 Thread Tierney, Luke
Seems reasonable, done now.

Best,

luke

On Fri, 22 Mar 2019, Ralf Stubner wrote:

> Dear List,
>
> section "6.3 Random number generation" of WRE [1] lists unif_rand(),
> norm_rand() and exp_rand() as the interface to R's RNG. Now
> R_ext/Random.h also has
>
>     double R_unif_index(double);
>
> Can this be also treated as an official API function that may be called
> from a package?
>
> Thanks
> Ralf
>
> [1]
> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Random-numbers
>
>



Re: [Rd] [External] Re: stopifnot -- eval(*) inside for()

2019-04-01 Thread Tierney, Luke
On Mon, 1 Apr 2019, Martin Maechler wrote:

>> Suharto Anggono Suharto Anggono via R-devel
>> on Sun, 31 Mar 2019 15:26:13 + writes:
>
>> Ah, with R 3.5.0 or R 3.4.2, but not with R 3.3.1, 'eval'
>> inside 'for' makes compiled version behave like
>> non-compiled version.
>
> Ah.. ... thank you for detecting that  " eval() inside for()" behaves
> specially  in how error message get a call or not.

Don't count on that remaining true indefinitely. The standard behavior
is better and we'll eventually get the case where 'eval' and a few
others are called to behave the same.

Best,

luke

> Let's focus only on this issue here.
>
> I'm adding a 0-th case to make even clearer what you are saying:
>
>  >  options(error = expression(NULL))
>  >  library(compiler)
>  >  enableJIT(0)
>
>  > f0 <- function(x) { x ; x^2 } ; f0(is.numeric(y))
>  Error in f0(is.numeric(y)) (from #1) : object 'y' not found
>  > (function(x) { x ; x^2 })(is.numeric(y))
>  Error in (function(x) { (from #1) : object 'y' not found
>  > f0c <- cmpfun(f0) ; f0c(is.numeric(y))
>
> so by default, not only the error message but the originating
> call is shown as well.
>
> However, here's your revealing examples:
>
>  > f <- function(x) for (i in 1) {x; eval(expression(i))}
>  > f(is.numeric(y))
>  > # Error: object 'y' not found
>  > fc <- cmpfun(f)
>  > fc(is.numeric(y))
>  > # Error: object 'y' not found
>
> I've tried more examples and did not find any difference
> between simple interpreted and bytecompiled code {apart
> from "keep.source=TRUE" keeping source, sometimes visible}.
> So I don't understand yet why you think the byte compiler plays
> a role.
>
> Rather the crucial difference seems  the error happens inside a
> loop which contains an explicit eval(.), and that eval() may
> even be entirely unrelated to the statement in which the error
> happens [above: The error happens when the promise 'x' is
> evaluated, *before* eval() is called at all].
>
>
>> Is this accidental feature going to be relied upon?
>
>[i.e.  *in  stopifnot() R code (which in R-devel and R 3.5.x has
>had an eval() inside the for()-loop)]
>
> That is a good question.
> What I really like about the R-devel case:  We do get errors
> signalled that do *not* contain the full stopifnot() call.
>
> With the newish introduction of the `exprs = { ... ... }` variant,
> it is even more natural to have large `exprs` in a stopifnot() call,
> and when there's one accidental error in there, it's quite
> unhelpful to see the full stopifnot(..) call {many lines
> of R code} obfuscating the one statement which produced the
> error.
>
> So it seems I am asking for a new feature in R,
> namely to temporarily say: Set the call to errors to NULL "in
> the following".
> In R 3.5.x, I had used withCallingHandlers(...) to achieve that
> and even do something similar for warnings... but needed to do that for every
> expression and hence inside the for loop  and the consequence
> was a relatively large slowdown of stopifnot()..  which
> triggered all the changes since.
>
> Whereas what we see here ["eval() inside for()"] is a cheap
> automatic suppression of 'call' for the "internal errors", i.e.,
> those we don't trigger ourselves via stop(simpleError(...)).
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Bioc-devel] [External] different generator for random integers in R-devel

2019-03-19 Thread Tierney, Luke
Use RNGversion("3.5.0") instead.

Best,

luke


On Tue, 19 Mar 2019, Stadler, Michael wrote:

> Dear maintainers of the BioC build system
>
> I have a shy question regarding R-devel versions on the build machines:
>
> Currently, these use different versions of R-devel: 
> malbec2 | Linux ...   | R Under development (unstable) (2019-01-21 r75999)
> tokay2  | Windows ... | R Under development (unstable) (2019-03-09 r76216)
> merida2 | OS X ...    | R Under development (unstable) (2018-11-27 r75683)
> celaya2 | OS X ...    | R Under development (unstable) (2019-01-22 r76000)
>
> A recent change in R-devel
> (mostly in r76160, see 
> https://github.com/wch/r-source/commit/e4acfd1f240a42b3fc1474a4d97017114e4eb053)
> has changed the behaviour of sample()
> (see also https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17494 and 
> https://stat.ethz.ch/pipermail/r-devel/2018-September/076817.html).
>
> In order to maintain backward compatibility, we are using RNGkind(..., 
> sample.kind = "Rounding")
> to temporarily fall back to the behaviour of R-3.5. However, this will cause 
> an error on R-devel < r76160,
> where RNGkind() does not yet understand the "sample.kind" argument (currently 
> malbec2, merida2 and celaya2).
>
> This will obviously resolve itself once R-devel gets updated on the build 
> machines, at the latest upon release of BioC 3.9.
> I am just curious whether R-devel will be updated on the builders before 
> that, so this issue does not mask other
> potential problems in our package (QuasR).
>
> Thanks,
> Michael
>
>
>
>
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



Re: [Rd] bug: sample( x, size, replace = TRUE, prob= skewed.probs) produces uniform sample

2019-03-06 Thread Tierney, Luke
This is now fixed in R-devel.

Best,

luke

On Sun, 3 Mar 2019, Berry, Charles wrote:

> When  `length( skewed.probs ) > 200' uniform samples are generated in R-devel.
>
> R-3.5.1 behaves as expected.
>
> `epsilon` can be a lot bigger than illustrated and still the uniform 
> distribution is produced.
>
>
> Chuck
>
>> set.seed(123)
>>
>> epsilon <- 1e-10
>>
>> ## uniform to 200 then small
>> p200 <- prop.table( rep( c(1, epsilon), c(200, 999-200)))
>> ## uniform to 201 then small
>> p201 <- prop.table( rep( c(1, epsilon), c(201, 999-201)))
>>
>> brks  <- c(0,99,199,200,201,Inf)
>> tab200 <- sample( length(p200), 10000, prob=p200, replace=TRUE)
>> tab201 <- sample( length(p201), 10000, prob=p201, replace=TRUE)
>>
>> cbind(
> +   s200=table(cut(tab200, brks)),
> +   p200=round(xtabs(p200 ~ cut( seq_along(p200), brks)) * 10000 ,1),
> +   s201=table(cut(tab201, brks )),
> +   p201=round(xtabs(p201 ~ cut( seq_along(p201), brks)) * 10000 ,1))
>           s200 p200 s201   p201
> (0,99]    5017 4950  984 4925.4
> (99,199]  4925 5000  959 4975.1
> (199,200]   58   50    9   49.8
> (200,201]    0    0    6   49.8
> (201,Inf]    0    0 8042    0.0
>>
>>
>>
>>
>> sessionInfo()
> R Under development (unstable) (2019-03-02 r76189)
> Platform: x86_64-apple-darwin18.2.0 (64-bit)
> Running under: macOS Mojave 10.14.3
>
> Matrix products: default
> BLAS: /Users/cberry/projects/R/R-devel/lib/libRblas.dylib
> LAPACK: /Users/cberry/projects/R/R-devel/lib/libRlapack.dylib
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_3.6.0
>>
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



Re: [Rd] bug: sample( x, size, replace = TRUE, prob= skewed.probs) produces uniform sample

2019-03-03 Thread Tierney, Luke
Thanks. We'll need to look into how best to address this.

Best,

luke

On Sun, 3 Mar 2019, Berry, Charles wrote:

> When  `length( skewed.probs ) > 200' uniform samples are generated in R-devel.
>
> R-3.5.1 behaves as expected.
>
> `epsilon` can be a lot bigger than illustrated and still the uniform 
> distribution is produced.
>
>
> Chuck
>
>> set.seed(123)
>>
>> epsilon <- 1e-10
>>
>> ## uniform to 200 then small
>> p200 <- prop.table( rep( c(1, epsilon), c(200, 999-200)))
>> ## uniform to 201 then small
>> p201 <- prop.table( rep( c(1, epsilon), c(201, 999-201)))
>>
>> brks  <- c(0,99,199,200,201,Inf)
>> tab200 <- sample( length(p200), 10000, prob=p200, replace=TRUE)
>> tab201 <- sample( length(p201), 10000, prob=p201, replace=TRUE)
>>
>> cbind(
> +   s200=table(cut(tab200, brks)),
> +   p200=round(xtabs(p200 ~ cut( seq_along(p200), brks)) * 10000 ,1),
> +   s201=table(cut(tab201, brks )),
> +   p201=round(xtabs(p201 ~ cut( seq_along(p201), brks)) * 10000 ,1))
>           s200 p200 s201   p201
> (0,99]    5017 4950  984 4925.4
> (99,199]  4925 5000  959 4975.1
> (199,200]   58   50    9   49.8
> (200,201]    0    0    6   49.8
> (201,Inf]    0    0 8042    0.0
>>
>>
>>
>>
>> sessionInfo()
> R Under development (unstable) (2019-03-02 r76189)
> Platform: x86_64-apple-darwin18.2.0 (64-bit)
> Running under: macOS Mojave 10.14.3
>
> Matrix products: default
> BLAS: /Users/cberry/projects/R/R-devel/lib/libRblas.dylib
> LAPACK: /Users/cberry/projects/R/R-devel/lib/libRlapack.dylib
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_3.6.0
>>
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



Re: [Rd] issue with sample in R 3.6.0.

2019-03-01 Thread Tierney, Luke
Thanks; fixed now in R-devel. Not an issue on Linux/Mac OS but is on
64-bit Windows, where sizeof(long) is 4.
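
For readers following the 1L remark, here is a minimal C sketch of the
portability issue. It is not the committed change, and the names
low_bits_mask and keep_low_bits are hypothetical. The only point is that
the mask is built from a constant that is 64 bits wide on every platform,
so a shift by 32 or more stays well defined even where long has 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the R sources): a mask with the low
 * `bits` bits set. UINT64_C(1) is 64 bits wide everywhere, whereas 1L
 * is only 32 bits wide on platforms where sizeof(long) == 4. */
static uint64_t low_bits_mask(int bits)
{
    assert(bits > 0 && bits < 64);   /* shifting by >= 64 is undefined */
    return (UINT64_C(1) << bits) - 1;
}

/* Hypothetical helper: keep only the low `bits` bits of v. */
static uint64_t keep_low_bits(uint64_t v, int bits)
{
    return v & low_bits_mask(bits);
}
```

With a 32-bit long, the expression (1L << bits) has no defined value once
bits reaches 32, which is consistent with the truncated maxima in the
report below.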

Best,

luke

On Fri, 1 Mar 2019, Joseph Wood wrote:

> Hello,
>
> I think there is an issue in the sampling rejection algorithm in R 3.6.0.
>
> The do_sample2 function in src/main/unique.c still has 4.5e15 as an
> upper limit, implying that numbers greater than INT_MAX are still to
> be supported by sample in base R.
>
> Please review the examples below:
>
> set.seed(123)
> max(sample(2^31, 1e5))
> [1] 2147430096
>
> set.seed(123)
> max(sample(2^31 + 1, 1e5))
> [1] 1
>
> set.seed(123)
> max(sample(2^32, 1e5))
> [1] 1
>
> set.seed(123)
> max(sample(2^35, 1e5))
> [1] 8
>
> set.seed(123)
> max(sample(2^38, 1e5))
> [1] 64
>
> set.seed(123)
> max(sample(2^38, 1e5))
> [1] 64
>
> set.seed(123)
> max(sample(2^42, 1e5))
> [1] 1024
>
> From the above, we see that if N is greater than 2^31, then the result is
> bounded by 2^(ceiling(log2(N)) - 32).
>
> Looking at the source code to src/main/RNG.c, we have the following:
>
> static double rbits(int bits)
> {
>     int_least64_t v = 0;
>     for (int n = 0; n <= bits; n += 16) {
>         int v1 = (int) floor(unif_rand() * 65536);
>         v = 65536 * v + v1;
>     }
>     // mask out the bits in the result that are not needed
>     return (double) (v & ((1L << bits) - 1));
> }
>
> The last line has (v & ((1L << bits) - 1)) where v is declared as
> int_least64_t. If you notice, we are operating on v with the long
> integer literal 1L. I’m pretty sure this is the source of the issue.
> By changing 1L to at least a 64 bit integer, it appears that we
> correct the problem:
>
> double rbits(int bits)
> {
>     int_least64_t v = 0;
>     for (int n = 0; n <= bits; n += 16) {
>         int v1 = (int) floor(unif_rand() * 65536);
>         v = 65536 * v + v1;
>     }
>
>     int_least64_t one64 = 1L;
>     // mask out the bits in the result that are not needed
>     return (double) (v & ((one64 << bits) - 1));
> }
>
> Regards,
> Joseph Wood
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



Re: [Rd] Intermittent crashes with inset `[<-` command

2019-02-27 Thread Tierney, Luke
Fixed in R-devel and R-patched.

Best,

luke

On Wed, 27 Feb 2019, Tierney, Luke wrote:

> Thanks for the report. Should be fixed shortly.
>
> Best,
>
> luke
>
> On Tue, 26 Feb 2019, Brian Montgomery via R-devel wrote:
>
>> The following code crashes after about 300 iterations on my 
>> x86_64-w64-mingw32 machine on R 3.5.2 --vanilla.  
>> Others have duplicated this (see 
>> https://github.com/tidyverse/magrittr/issues/190 if necessary), but I don't 
>> know how machine/OS-dependent it may be.  
>> If it doesn't crash for you, please try increasing the length of the x 
>> vector.
>>
>> Substituting the commented-out line for the one below it works correctly 
>> (prints out 1:1000 and ends normally) every time.
>>
>> x <- 1:20
>> y <- rep(letters[1:5], length(x) / 5L)
>> for (i in 1:1000) {
>>   # x[y == 'a'] <- x[y == 'b']
>>   x <- `[<-`(x, y == 'a', x[y == 'b'])
>>   cat(i, '')
>> }
>> cat('\n')
>>
>> The point of using this syntax is to make it work better with pipes, but the 
>> errors occur without pipes or magrittr.
>>
>> Thank you for your help!
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>



Re: [Rd] Intermittent crashes with inset `[<-` command

2019-02-27 Thread Tierney, Luke
On Tue, 26 Feb 2019, Brian Montgomery via R-devel wrote:

> The following code crashes after about 300 iterations on my 
> x86_64-w64-mingw32 machine on R 3.5.2 --vanilla.  
> Others have duplicated this (see 
> https://github.com/tidyverse/magrittr/issues/190 if necessary), but I don't 
> know how machine/OS-dependent it may be.  
> If it doesn't crash for you, please try increasing the length of the x vector.
>
> Substituting the commented-out line for the one below it works correctly 
> (prints out 1:1000 and ends normally) every time.
>
> x <- 1:20
> y <- rep(letters[1:5], length(x) / 5L)
> for (i in 1:1000) {
>   # x[y == 'a'] <- x[y == 'b']
>   x <- `[<-`(x, y == 'a', x[y == 'b'])
>   cat(i, '')
> }
> cat('\n')
>
> The point of using this syntax is to make it work better with pipes, but the 
> errors occur without pipes or magrittr.

Calling replacement functions this way is a Really Bad Idea. Some
assume they are being called properly and will end up mutating data
they should not when called this way.

Best,

luke

>
> Thank you for your help!
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



Re: [Rd] Intermittent crashes with inset `[<-` command

2019-02-27 Thread Tierney, Luke
Thanks for the report. Should be fixed shortly.

Best,

luke

On Tue, 26 Feb 2019, Brian Montgomery via R-devel wrote:

> The following code crashes after about 300 iterations on my 
> x86_64-w64-mingw32 machine on R 3.5.2 --vanilla.  
> Others have duplicated this (see 
> https://github.com/tidyverse/magrittr/issues/190 if necessary), but I don't 
> know how machine/OS-dependent it may be.  
> If it doesn't crash for you, please try increasing the length of the x vector.
>
> Substituting the commented-out line for the one below it works correctly 
> (prints out 1:1000 and ends normally) every time.
>
> x <- 1:20
> y <- rep(letters[1:5], length(x) / 5L)
> for (i in 1:1000) {
>   # x[y == 'a'] <- x[y == 'b']
>   x <- `[<-`(x, y == 'a', x[y == 'b'])
>   cat(i, '')
> }
> cat('\n')
>
> The point of using this syntax is to make it work better with pipes, but the 
> errors occur without pipes or magrittr.
>
> Thank you for your help!
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



Re: [Rd] bias issue in sample() (PR 17494)

2019-02-26 Thread Tierney, Luke
On Tue, 26 Feb 2019, Kirill Müller wrote:

> Ralf
>
>
> I don't doubt this is expected with the current implementation, I doubt the 
> implementation is desirable. Suggesting to turn this to
>
> pbirthday(1e6, classes = 2^53)
> ## [1] 5.550956e-05

That isn't a small number given simulation sizes people routinely run
these days. Just about right to miss an issue in a pilot run and get
bitten on the real one.
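
The two collision figures in this exchange can be cross-checked with the
exact birthday product. The C sketch below is an illustration only: the
function name collision_probability is made up, and pbirthday() in R uses
its own approximation, so the last digits need not agree.

```c
#include <assert.h>

/* Probability that n independent uniform draws from `classes` equally
 * likely values contain at least one duplicate:
 *   1 - prod_{i=1}^{n-1} (1 - i / classes)
 * Hypothetical helper for illustration; plain loop, no math library. */
static double collision_probability(long n, double classes)
{
    assert(n >= 0 && classes >= 1.0);
    double p_distinct = 1.0;
    for (long i = 1; i < n; i++) {
        p_distinct *= 1.0 - (double) i / classes;
        if (p_distinct <= 0.0)
            return 1.0;              /* more draws than classes */
    }
    return 1.0 - p_distinct;
}
```

Calling collision_probability(1000000, 4294967296.0) gives a value
indistinguishable from 1 (the 2^32 case), while
collision_probability(1000000, 9007199254740992.0) gives roughly 5.55e-05
(the 2^53 case).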

In the inversion generator for normals we already use a higher
resolution uniform produced from two regular ones. I considered
switching to that approach for all uniforms, either in addition to or
instead of changing the uniform integer sampling algorithm used in
sample(). But that would have been even more disruptive:

- all simulation results (except normals) would change;
- there would be a performance penalty;
- the streams would be used up twice as fast;

It would also probably be necessary to rethink things like how to use
the L'Ecuyer generator to produce multiple streams in the `parallel`
package.

We may need to take this route in the future, but it didn't seem like
a good idea at this time.

Best,

luke

>
> (which is still non-zero, but much less likely to cause confusion.)
>
>
> Best regards
>
> Kirill
>
> On 26.02.19 10:18, Ralf Stubner wrote:
>> Kirill,
>> 
>> I think some level of collision is actually expected! R uses a 32bit MT
>> that can produce 2^32 different doubles. The probability for a collision
>> within a million draws is
>> 
>>> pbirthday(1e6, classes = 2^32)
>> [1] 1
>> 
>> Greetings
>> Ralf
>> 
>> 
>> On 26.02.19 07:06, Kirill Müller wrote:
>>> Gabe
>>> 
>>> 
>>> As mentioned on Twitter, I think the following behavior should be fixed
>>> as part of the upcoming changes:
>>> 
>>> R.version.string
>>> ## [1] "R Under development (unstable) (2019-02-25 r76160)"
>>> .Machine$double.digits
>>> ## [1] 53
>>> set.seed(123)
>>> RNGkind()
>>> ## [1] "Mersenne-Twister" "Inversion"    "Rejection"
>>> length(table(runif(1e6)))
>>> ## [1] 999863
>>> 
>>> I don't expect any collisions when using Mersenne-Twister to generate a
>>> million floating point values. I'm not sure what causes this behavior,
>>> but it's documented in ?Random:
>>> 
>>> "Do not rely on randomness of low-order bits from RNGs. Most of the
>>> supplied uniform generators return 32-bit integer values that are
>>> converted to doubles, so they take at most 2^32 distinct values and long
>>> runs will return duplicated values (Wichmann-Hill is the exception, and
>>> all give at least 30 varying bits.)"
>>> 
>>> The "Wichmann-Hill" bit is interesting:
>>> 
>>> RNGkind("Wichmann-Hill")
>>> length(table(runif(1e6)))
>>> ## [1] 1000000
>>> length(table(runif(1e6)))
>>> ## [1] 1000000
>>> 
>>> Mersenne-Twister has a much much larger periodicity than Wichmann-Hill,
>>> it would be great to see the above behavior also for Mersenne-Twister.
>>> Thanks for considering.
>>> 
>>> 
>>> Best regards
>>> 
>>> Kirill
>>> 
>>> 
>>> On 20.02.19 08:01, Gabriel Becker wrote:
>>>> Luke,
>>>> 
>>>> I'm happy to help with this. It's great to see this get tackled (I've
>>>> cc'ed
>>>> Kelli Ottoboni who helped flag this issue).
>>>> 
>>>> I can prepare a patch for the RNGkind related stuff and the doc update.
>>>> 
>>>> As for ???, what are your (and others') thoughts about the possibility of
>>>> a) a reproducibility API which takes either an R version (or maybe
>>>> alternatively a date) and sets the RNGkind to the default for that
>>>> version/date, and/or b) that sessionInfo be modified to capture (and
>>>> display) the RNGkind in effect.
>>>> 
>>>> Best,
>>>> ~G
>>>> 
>>>> 
>>>> On Tue, Feb 19, 2019 at 11:52 AM Tierney, Luke 
>>>> wrote:
>>>> 
>>>>> Before the next release we really should sort out the bias issue in
>>>>> sample() reported by Ottoboni and Stark in
>>>>> https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf and
>>>>> filed as a bug report by Duncan Murdoch at
>>>>> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494.
>>>>> 
>>>>> Her

[Rd] bias issue in sample() (PR 17494)

2019-02-19 Thread Tierney, Luke


Before the next release we really should sort out the bias issue in
sample() reported by Ottoboni and Stark in
https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf and
filed as a bug report by Duncan Murdoch at
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494.

Here are two examples of bad behavior through current R-devel:

 set.seed(123)
 m <- (2/5) * 2^32
 x <- sample(m, 1000000, replace = TRUE)
 table(x %% 2, x > m / 2)
 ##
 ##    FALSE   TRUE
 ## 0 300620 198792
 ## 1 200196 300392

 table(sample(2/7 * 2^32, 1000000, replace = TRUE) %% 2)
 ##
 ##      0      1
 ## 429054 570946

I committed a modification to R_unif_index to address this by
generating random bits (blocks of 16) and rejection sampling, but for
now this is only enabled if the environment variable R_NEW_SAMPLE is
set before the first call.
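
To illustrate why multiplying a fixed-width uniform by m is biased and
why bits plus rejection is not, here is a scaled-down C sketch. It is a
toy model under stated assumptions, not the code in R_unif_index: the
generator is pretended to be 8 bits wide so that every equally likely
output can be enumerated, and all function names are hypothetical.

```c
#include <assert.h>

enum { WORD_BITS = 8, WORD_VALUES = 1 << WORD_BITS };  /* toy 8-bit generator */

/* Old scheme, scaled down: index = floor(m * u) with u = k / 2^8.
 * Tallies how many of the 256 equally likely k land on each index. */
static void tally_multiply(int m, int counts[])
{
    assert(m > 0 && m <= WORD_VALUES);
    for (int j = 0; j < m; j++) counts[j] = 0;
    for (int k = 0; k < WORD_VALUES; k++)
        counts[(k * m) / WORD_VALUES]++;
}

/* New scheme, scaled down: take ceil(log2(m)) random bits and reject
 * values >= m. Tallies accepted values over all equally likely bit
 * patterns and returns the number of rejected patterns. */
static int tally_rejection(int m, int counts[])
{
    assert(m > 0 && m <= WORD_VALUES);
    int bits = 0;
    while ((1 << bits) < m) bits++;
    for (int j = 0; j < m; j++) counts[j] = 0;
    int rejected = 0;
    for (int v = 0; v < (1 << bits); v++) {
        if (v < m) counts[v]++;
        else rejected++;
    }
    return rejected;
}

/* Largest minus smallest tally; 0 means exactly uniform. */
static int spread(const int counts[], int m)
{
    int lo = counts[0], hi = counts[0];
    for (int j = 1; j < m; j++) {
        if (counts[j] < lo) lo = counts[j];
        if (counts[j] > hi) hi = counts[j];
    }
    return hi - lo;
}
```

With m = 102 (about 2/5 of 256, mirroring the first example above), the
multiply scheme gives some indices 3 of the 256 outputs and others only
2, whereas the rejection scheme gives every index exactly one accepted
pattern at the cost of discarding 26 of 128 draws.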

Some things still needed:

- someone to look over the change and see if there are any issues
- adjustment of RNGkind to allow the old behavior to be selected
- make the new behavior the default
- adjust documentation
- ???

Unfortunately I don't have enough free cycles to do this, but I can
help if someone else can take the lead.

There are two other places I found that might suffer from the same
issue, in walker_ProbSampleReplace (pointed out by O & S) and in
src/nmath/wilcox.c.  Both can be addressed by using R_unif_index. I
have done that for walker_ProbSampleReplace, but the wilcox change
might need adjusting to support the standalone math library and I
don't feel confident enough I'd get that right.

Best,

luke




Re: [Rd] Object.size() should not visit every element for alt-rep strings, or there should be an altstring_objectsize_method

2019-01-31 Thread Tierney, Luke
You should really take this up with RStudio. Calling object.size on
every top level assignment as they appear to do is a bad idea, even
without ALTREP. object.size is only a cheap operation for simple
atomic vectors. For anything with recursive structure it needs to walk
the object, so the effort is proportional to object size:

> x <- rep("A", 1e8)
> system.time(object.size(x))
   user  system elapsed
  1.222   0.624   1.850
> x <- rep(list(1), 1e8)
> system.time(object.size(x))
   user  system elapsed
  1.247   0.022   1.273

The current help for object.size says

  Provides an estimate of the memory that is being used to store an
  R object.

If this is interpreted as the current memory use, which could change
in the ALTREP context (or for environments, though there the changes
are ignored), then we could define object.size for ALTREP objects to
avoid any ALTREP-specific computation. I'm not convinced yet that this
is a good idea, but even if we do change this at the R level,
RStudio would still be well-advised to have another look at what they
are doing.

Best,

luke

On Tue, 15 Jan 2019, Travers Ching wrote:

>
> Below is a toy alt-rep string example, that generates N random strings:
>
> https://gist.github.com/traversc/a48a504eb062554f2d6ff8043ca16f9c
>
> example:
> `x <- altrandomStrings(1e8)`
> `head(x)`
> [1] "2PN0bdwPY7CA8M06zVKEkhHgZVgtV1" "5PN2qmWqBlQ9wQj99nsQzldVI5ZuGX" ...
> `object.size(1e8)`
>
> Object.size will call the `set_altstring_Elt_method` for every single
> element, materializing (slowly) every element of the vector.  This is
> a problem mostly in R-studio since object.size is called
> automatically, defeating the purpose of alt-rep entirely.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



Re: [Bioc-devel] Rprintf in a multi-threaded environment

2019-01-29 Thread Tierney, Luke
No functions in the R API are safe to call from any thread other than
the R main thread.

-- Many may need to allocate from the R heap (more as ALTREP evolves)
and that is not thread safe;

-- Many (and with some compilation options nearly all) can signal an
error, and the subsequent jump can only work in the main thread.

So: do not make R API calls from any thread other than the main
thread. Ever.
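
One way to follow this rule while still reporting from workers is to have
the workers only copy text into a buffer and let the main thread do the
printing. The C sketch below is a minimal illustration under several
assumptions: it uses POSIX threads, all names are hypothetical, and
printf() stands in for Rprintf() so that it builds without the R headers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_MSGS 64
#define MSG_LEN  80

/* Fixed-size message buffer shared by workers and the main thread. */
static char msg_buf[MAX_MSGS][MSG_LEN];
static int msg_count = 0;
static pthread_mutex_t msg_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called from worker threads: copies text only, never touches the R API. */
static void post_message(const char *text)
{
    pthread_mutex_lock(&msg_lock);
    if (msg_count < MAX_MSGS)                 /* drop messages when full */
        snprintf(msg_buf[msg_count++], MSG_LEN, "%s", text);
    pthread_mutex_unlock(&msg_lock);
}

/* Called from the main thread only. Messages are copied out first so
 * the lock is not held while printing; in a package the printf() call
 * would be Rprintf(). Returns the number of messages printed. */
static int drain_messages(void)
{
    char local[MAX_MSGS][MSG_LEN];
    pthread_mutex_lock(&msg_lock);
    int n = msg_count;
    memcpy(local, msg_buf, sizeof local);
    msg_count = 0;
    pthread_mutex_unlock(&msg_lock);
    for (int i = 0; i < n; i++)
        printf("%s\n", local[i]);
    return n;
}

static void *worker(void *arg)
{
    char text[MSG_LEN];
    snprintf(text, sizeof text, "worker %d finished", *(int *) arg);
    post_message(text);
    return NULL;
}

/* Start n workers, wait for them, then print from the calling thread. */
static int run_workers(int n)
{
    assert(n > 0 && n <= MAX_MSGS);
    pthread_t tid[MAX_MSGS];
    int id[MAX_MSGS];
    int started = 0;
    for (int i = 0; i < n; i++) {
        id[i] = i;
        if (pthread_create(&tid[started], NULL, worker, &id[i]) == 0)
            started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return drain_messages();
}
```

The design choice here is that workers never block on output and the
main thread decides when to print, for example after joining the workers
or at points where it would check for interrupts anyway.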

[Separate processes as created by approaches in the parallel package
  do not have these issues, though simple forking as in multicore can
  have other issues as pointed out by Martin Morgan.]

Best,

luke

On Tue, 29 Jan 2019, Yang Liao wrote:

> Hi,
>
> I'm not sure if some C developers have gone through this problem: it seems 
> that Rprintf cannot work safely in a multi-threaded environment. In 
> particular, if I call Rprintf() from a then-created thread while the stack 
> size checking is enabled (ie the "R_CStackLimit" pointer isn't set to -1), it 
> is very likely to end up with some fatal error messages like:
>
> Error: C stack usage  847645293284 is too close to the limit
>> Error: C stack usage  847336061668 is too close to the limit
>> Error: C stack usage  847666277092 is too close to the limit
>> Error: C stack usage  847346551524 is too close to the limit
>> Error: C stack usage  847367531236 is too close to the limit
>> Error: C stack usage  847357041380 is too close to the limit
>> Error: C stack usage  847378021092 is too close to the limit
>> Error: C stack usage  847655787236 is too close to the limit
>
> , and the R session terminates in a segfault.
> After I used all means to confirm that there was no memory leakage and the 
> real stack use was minimum, I thought it can only be the Rprintf issue. I 
> then disabled all screen outputs from the then-created threads and the error 
> was gone. It was also reported on stackoverflow:
> https://stackoverflow.com/questions/50092949/why-does-rcout-and-rprintf-cause-stack-limit-error-when-multithreading
> I tried using a semaphore to protect all Rprintf calls but it didn't prevent 
> the error.
>
> Since my program needs to report some messages from the worker threads 
> (created by the main thread), I wonder if there is a solution to safely do 
> so, or I have to pipe the messages to the main thread, which in turn calls 
> Rprintf? I hope not to change "R_CStackLimit" to disable the stack size 
> checks because it generates a "NOTE" in R check.
>
> Cheers,
> Yang
>
> ___
>
> The information in this email is confidential and intend...{{dropped:15}}
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>



Re: [Rd] Objectsize function visiting every element for alt-rep strings

2019-01-22 Thread Tierney, Luke
On Mon, 21 Jan 2019, Martin Maechler wrote:

>> Travers Ching
>> on Tue, 15 Jan 2019 12:50:45 -0800 writes:
>
>> I have a toy alt-rep string package that generates
>> randomly seeded strings.  example: library(altstringisode)
>> x <- altrandomStrings(1e8) head(x) [1]
>> "2PN0bdwPY7CA8M06zVKEkhHgZVgtV1"
>> "5PN2qmWqBlQ9wQj99nsQzldVI5ZuGX" ... etc object.size(1e8)
>
>> Object.size will call the set_altstring_Elt_method for
>> every single element, materializing (slowly) every element
>> of the vector.  This is a problem mostly in R-studio since
>> object.size is called automatically, defeating the purpose
>> of alt-rep.

There is no sensible way in general to figure out how large the
strings would be without computing them. There might be specifically
for a deferred sequence conversion but it would require a fair bit of
effort to figure out that would be better spent elsewhere.

I've never been a big fan of object.size since what it is trying to
compute isn't very well defined in the context of sharing and possible
internal state changes (even before ALTREP, byte code compilation could
change the internals of a function [which object.size sees] and
assigning into environments or evaluating promises can change
environments [which object.size ignores]). The issue is not unlike the
one faced by identical(), which has a bunch of options for the
different ways objects can be identical, and might need even more.

We could in general have object.size for an ALTREP return the
object.size results of the current internal representation, but that
might not always be appropriate. Again, what object.size is trying to
compute isn't very well defined.

RStudio does seem to call object.size on every assignment to
.GlobalEnv. That might be worth revisiting.


Best,

luke

>
> Hmm.  But still, the idea had been that object.size()  *should*
> return the size of the "de-ALTREP'ed" object *but* should not
> de-ALTREP it.
> That's what happens for integers, but indeed fails to happen for
> such as.character(.)ed integers.
>
> From my eRum presentation (which took from the official ALTREP documentation
> https://svn.r-project.org/R/branches/ALTREP/ALTREP.html ) :
>
>  > x <- 1:1e15
>  > object.size(x) # 8000'000'000'000'048 bytes : 8000 TBytes -- ok, not really
>  8000000000000048 bytes
>  > is.unsorted(x) # FALSE : i.e., R *knows* it is sorted
>  [1] FALSE
>  > xs <- sort(x)  #
>  > .Internal(inspect(x))
>  @80255f8 14 REALSXP g0c0 [NAM(7)]  1 : 1000000000000000 (compact)
>  >
>
>  > cx <- as.character(x)
>  > .Internal(inspect(cx))
>  @80485d8 16 STRSXP g0c0 [NAM(1)]
>    @80255f8 14 REALSXP g1c0 [MARK,NAM(7)]  1 : 1000000000000000 (compact)
>  > system.time( print(object.size(x)), gc=FALSE)
>  8000000000000048 bytes
>     user  system elapsed
>    0.000   0.000   0.001
>  > system.time( print(object.size(cx)), gc=FALSE)
>  Error: cannot allocate vector of size 8388608.0 Gb
>  Timing stopped at: 11.43 0 11.46
>  >
>
> One could consider it a bug that object.size(cx) is indeed
> inspecting every string, i.e., accessing cx[i] for all i.
> Note that it is *not*  deALTREPing cx  itself :
>
>> x <- 1:1e6
>> cx <- as.character(x)
>> .Internal(inspect(cx))
>
> @7f5b1a0 16 STRSXP g0c0 [NAM(1)]
>  @7f5adb0 13 INTSXP g0c0 [NAM(7)]  1 : 1000000 (compact)
>> system.time( print(object.size(cx)), gc=FALSE)
> 64000048 bytes
>   user  system elapsed
>  0.369   0.005   0.374
>> .Internal(inspect(cx))
> @7f5b1a0 16 STRSXP g0c0 [NAM(7)]
>  @7f5adb0 13 INTSXP g0c0 [NAM(7)]  1 : 1000000 (compact)
>>
>
>> Is there a way to avoid the problem of forced
>> materialization in rstudio?
>
>> PS: Is there a way to tell if a post has been received by
>> the mailing list?  How long does it take to show up in the
>> archives?
>
> [ that (waiting time) distribution is quite right skewed... I'd
>  guess its median to be less than 10 minutes... but we had
>  artificially delayed it somewhat in the past to fight
>  spammers, and ETH (the hosting institution) and others have
>  increased spam and virus filtering so everything has become
>  quite a bit slower ]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



Re: [Rd] Compiler + stopifnot bug

2019-01-03 Thread Tierney, Luke
Should be fixed in r75946.

Best,

luke

On Fri, 4 Jan 2019, Tierney, Luke wrote:

> Thanks for the reports. Will look into it soon and report back.
>
> Luke
>
> Sent from my iPhone
>
>> On Jan 3, 2019, at 2:15 PM, Martin Morgan  wrote:
>>
>> For what it's worth this also introduced
>>
>>> df = data.frame(v = package_version("1.2"))
>>> rbind(df, df)$v
>> [[1]]
>> [1] 1 2
>>
>> [[2]]
>> [1] 1 2
>>
>> instead of
>>
>>> rbind(df, df)$v
>>[1] '1.2' '1.2'
>>
>> which shows up in Travis builds of Bioconductor packages
>>
>>  https://stat.ethz.ch/pipermail/bioc-devel/2019-January/014506.html
>>
>> and elsewhere
>>
>> Martin Morgan
>>
>> On 1/3/19, 7:05 PM, "R-devel on behalf of Duncan Murdoch" 
>>  wrote:
>>
>>>On 03/01/2019 3:37 p.m., Duncan Murdoch wrote:
>>> I see this too; by bisection, it seems to have first appeared in r72943.
>>
>>Sorry, that was a typo.  I meant r75943.
>>
>>Duncan Murdoch
>>
>>>
>>> Duncan Murdoch
>>>
>>>> On 03/01/2019 2:18 p.m., Iñaki Ucar wrote:
>>>> Hi,
>>>>
>>>> I found the following issue in r-devel (2019-01-02 r75945):
>>>>
>>>> `foo<-` <- function(x, value) {
>>>>bar(x) <- value * x
>>>>x
>>>> }
>>>>
>>>> `bar<-` <- function(x, value) {
>>>>     stopifnot(all(value / x == 1))
>>>>     x + value
>>>> }
>>>>
>>>> `foo<-` <- compiler::cmpfun(`foo<-`)
>>>> `bar<-` <- compiler::cmpfun(`bar<-`)
>>>>
>>>> x <- c(2, 2)
>>>> foo(x) <- 1
>>>> x # should be c(4, 4)
>>>> #> [1] 3 3
>>>>
>>>> If the functions are not compiled or the stopifnot call is removed,
>>>> the snippet works correctly. So it seems that something is messing
>>>> around with the references to "value" when the call to stopifnot gets
>>>> compiled, and the wrong "value" is modified. Note also that if "x <-
>>>> 2", then the result is correct, 4.
>>>>
>>>> Regards,
>>>>
>>>
>>
>>__
>>R-devel@r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics and  Fax:   319-335-3017
Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
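
A regression check along the lines of the report in this thread could look like the sketch below. It assumes only base R and the compiler package; the variable names interpreted and compiled are introduced here for illustration, and the JIT is switched off for the first run so that the comparison really is against uncompiled code.

```r
# Replacement functions from the report, defined uncompiled.
`bar<-` <- function(x, value) {
    stopifnot(all(value / x == 1))
    x + value
}
`foo<-` <- function(x, value) {
    bar(x) <- value * x
    x
}

# Interpreted run, with the JIT disabled so nothing is compiled behind the scenes.
old_jit <- compiler::enableJIT(0)
x <- c(2, 2)
foo(x) <- 1
interpreted <- x
compiler::enableJIT(old_jit)

# Byte-compiled run; on an R without the bug this should agree.
`foo<-` <- compiler::cmpfun(`foo<-`)
`bar<-` <- compiler::cmpfun(`bar<-`)
y <- c(2, 2)
foo(y) <- 1
compiled <- y

identical(interpreted, compiled)
```

On the affected r-devel revisions the compiled run is the one reported to return 3 3 instead of 4 4.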


Re: [Rd] Compiler + stopifnot bug

2019-01-03 Thread Tierney, Luke
Thanks for the reports. Will look into it soon and report back.

Luke

Sent from my iPhone

> On Jan 3, 2019, at 2:15 PM, Martin Morgan  wrote:
> 
> For what it's worth this also introduced
> 
>> df = data.frame(v = package_version("1.2"))
>> rbind(df, df)$v
> [[1]]
> [1] 1 2
> 
> [[2]]
> [1] 1 2
> 
> instead of
> 
>> rbind(df, df)$v
>[1] '1.2' '1.2'
> 
> which shows up in Travis builds of Bioconductor packages
> 
>  https://stat.ethz.ch/pipermail/bioc-devel/2019-January/014506.html
> 
> and elsewhere
> 
> Martin Morgan
> 
> On 1/3/19, 7:05 PM, "R-devel on behalf of Duncan Murdoch" 
>  wrote:
> 
>>On 03/01/2019 3:37 p.m., Duncan Murdoch wrote:
>> I see this too; by bisection, it seems to have first appeared in r72943.
> 
>Sorry, that was a typo.  I meant r75943.
> 
>Duncan Murdoch
> 
>> 
>> Duncan Murdoch
>> 
>>> On 03/01/2019 2:18 p.m., Iñaki Ucar wrote:
>>> Hi,
>>> 
>>> I found the following issue in r-devel (2019-01-02 r75945):
>>> 
>>> `foo<-` <- function(x, value) {
>>>     bar(x) <- value * x
>>>     x
>>> }
>>> 
>>> `bar<-` <- function(x, value) {
>>>     stopifnot(all(value / x == 1))
>>>     x + value
>>> }
>>> 
>>> `foo<-` <- compiler::cmpfun(`foo<-`)
>>> `bar<-` <- compiler::cmpfun(`bar<-`)
>>> 
>>> x <- c(2, 2)
>>> foo(x) <- 1
>>> x # should be c(4, 4)
>>> #> [1] 3 3
>>> 
>>> If the functions are not compiled or the stopifnot call is removed,
>>> the snippet works correctly. So it seems that something is messing
>>> around with the references to "value" when the call to stopifnot gets
>>> compiled, and the wrong "value" is modified. Note also that if "x <-
>>> 2", then the result is correct, 4.
>>> 
>>> Regards,
>>> 
>> 
> 
>__
>R-devel@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Bioc-devel] Aliasing `]` breaks BiocCheck::BiocCheck() version 1.18.0

2018-12-22 Thread Tierney, Luke
Modifying findGlobals to return empty variable lists for non-closure
functions was an easy change so an updated codetools is ready to
submit when the CRAN queue opens again in the new year.

Best,

luke

On Thu, 20 Dec 2018, luke-tier...@uiowa.edu wrote:

> That's where the error is signaled, but the issue is in
>
>> 4: lapply(objs, FUN = function(obj) {
>>   value = env[[obj]]
>>   if (is.function(value))
>>   findGlobals(value)
>>   else character(0)
>>   })
>> 3: findLogicalRdir(pkgname, c("T", "F"))
>
> Change is.function(value) to typeof(value) == "closure" and you should be OK.
>
> Best,
>
> luke
>
> On Thu, 20 Dec 2018, Martin Morgan wrote:
>
>> this comes from `findGlobals()`
>> 
>>> foo <- `[`
>>> findGlobals(foo)
>> Error in makeUsageCollector(fun, ...) : only works for closures
>>> traceback()
>> 4: stop("only works for closures")
>> 3: makeUsageCollector(fun, ...)
>> 2: collectUsage(fun, enterGlobal = enter)
>> 1: findGlobals(foo)
>> 
>> In the bigger context it is in code that looks for poor 'coding practice', 
>> in this particular case looking for use of T / F rather than TRUE / FALSE, 
>> where the logic is to parse each function for use of global variables, and 
>> then to search for T / F amongst those.
>> 
>> The full traceback when run on the package at 
>> https://github.com/mtmorgan/PkgA/tree/BiocCheck-sbs
>> 
>> * Checking coding practice...
>> Error in makeUsageCollector(fun, ...) : only works for closures
>>> traceback()
>> 9: stop("only works for closures")
>> 8: makeUsageCollector(fun, ...)
>> 7: collectUsage(fun, enterGlobal = enter)
>> 6: findGlobals(value)
>> 5: FUN(X[[i]], ...)
>> 4: lapply(objs, FUN = function(obj) {
>>   value = env[[obj]]
>>   if (is.function(value))
>>   findGlobals(value)
>>   else character(0)
>>   })
>> 3: findLogicalRdir(pkgname, c("T", "F"))
>> 2: checkCodingPractice(package_dir, parsedCode, package_name)
>> 1: BiocCheck::BiocCheck(".")
>> 
>> Martin
>> 
>> On 12/19/18, 8:32 AM, "Bioc-devel on behalf of Tierney, Luke" 
>>  
>> wrote:
>>
>>codetools already checks only closures in checkUsageEnv and hence
>>checkUsagePackage, so this is an issue on the Bioc side.
>>
>>Best,
>>
>>luke
>>
>>On Tue, 18 Dec 2018, Tierney, Luke wrote:
>>
>>> Codetools should probably be ignoring those. Will have a look
>>>
>>> Sent from my iPhone
>>>
>>>> On Dec 18, 2018, at 6:54 AM, Shepherd, Lori 
>>  wrote:
>>>>
>>>> Can you please open an issue for this so we don't lose track of it -
>>>>
>>>> https://github.com/Bioconductor/BiocCheck/issues
>>>>
>>>>
>>>>
>>>> Lori Shepherd
>>>>
>>>> Bioconductor Core Team
>>>>
>>>> Roswell Park Cancer Institute
>>>>
>>>> Department of Biostatistics & Bioinformatics
>>>>
>>>> Elm & Carlton Streets
>>>>
>>>> Buffalo, New York 14263
>>>>
>>>> 
>>>> From: Bioc-devel  on behalf of 
>> Shian Su 
>>>> Sent: Monday, December 17, 2018 8:34:10 PM
>>>> To: bioc-devel
>>>> Subject: [Bioc-devel] Aliasing `]` breaks BiocCheck::BiocCheck() 
>> version 1.18.0
>>>>
>>>> Hi all,
>>>>
>>>> If you put
>>>>
>>>> foo <- `[`
>>>>
>>>> Somewhere in a package, it will trigger
>>>>
>>>> Error in makeUsageCollector(fun, ...) : only works for closures
>>>>
>>>> In BiocCheck::BiocCheck() (version 1.18.0). This comes from
>>>>
>>>> if (typeof(fun) != "closure")
>>>>stop("only works for closures")
>>>>
>>>> In codetools::makeUsageCollector(), but
>>>>
>>>>> typeof(`[`)
>>>> ## "special"
>>>>
>>>> Not that it matters for my use-case because I had discovered 
>> magrittr's extract alias, but it might be an edge case worth cov

Re: [Bioc-devel] Aliasing `]` breaks BiocCheck::BiocCheck() version 1.18.0

2018-12-20 Thread Tierney, Luke
That's where the error is signaled, but the issue is in

> 4: lapply(objs, FUN = function(obj) {
>   value = env[[obj]]
>   if (is.function(value))
>   findGlobals(value)
>   else character(0)
>   })
> 3: findLogicalRdir(pkgname, c("T", "F"))

Change is.function(value) to typeof(value) == "closure" and you should be OK.

Best,

luke

On Thu, 20 Dec 2018, Martin Morgan wrote:

> this comes from `findGlobals()`
>
>> foo <- `[`
>> findGlobals(foo)
> Error in makeUsageCollector(fun, ...) : only works for closures
>> traceback()
> 4: stop("only works for closures")
> 3: makeUsageCollector(fun, ...)
> 2: collectUsage(fun, enterGlobal = enter)
> 1: findGlobals(foo)
>
> In the bigger context it is in code that looks for poor 'coding practice', in 
> this particular case looking for use of T / F rather than TRUE / FALSE, where 
> the logic is to parse each function for use of global variables, and then to 
> search for T / F amongst those.
>
> The full traceback when run on the package at 
> https://github.com/mtmorgan/PkgA/tree/BiocCheck-sbs
>
> * Checking coding practice...
> Error in makeUsageCollector(fun, ...) : only works for closures
>> traceback()
> 9: stop("only works for closures")
> 8: makeUsageCollector(fun, ...)
> 7: collectUsage(fun, enterGlobal = enter)
> 6: findGlobals(value)
> 5: FUN(X[[i]], ...)
> 4: lapply(objs, FUN = function(obj) {
>   value = env[[obj]]
>   if (is.function(value))
>   findGlobals(value)
>   else character(0)
>   })
> 3: findLogicalRdir(pkgname, c("T", "F"))
> 2: checkCodingPractice(package_dir, parsedCode, package_name)
> 1: BiocCheck::BiocCheck(".")
>
> Martin
>
> On 12/19/18, 8:32 AM, "Bioc-devel on behalf of Tierney, Luke" 
>  wrote:
>
>codetools already checks only closures in checkUsageEnv and hence
>checkUsagePackage, so this is an issue on the Bioc side.
>
>Best,
>
>luke
>
>On Tue, 18 Dec 2018, Tierney, Luke wrote:
>
>> Codetools should probably be ignoring those. Will have a look
>>
>> Sent from my iPhone
>>
>>> On Dec 18, 2018, at 6:54 AM, Shepherd, Lori 
>  wrote:
>>>
>>> Can you please open an issue for this so we don't lose track of it -
>>>
>>> https://github.com/Bioconductor/BiocCheck/issues
>>>
>>>
>>>
>>> Lori Shepherd
>>>
>>> Bioconductor Core Team
>>>
>>> Roswell Park Cancer Institute
>>>
>>> Department of Biostatistics & Bioinformatics
>>>
>>> Elm & Carlton Streets
>>>
>>> Buffalo, New York 14263
>>>
>>> 
>>> From: Bioc-devel  on behalf of Shian 
> Su 
>>> Sent: Monday, December 17, 2018 8:34:10 PM
>>> To: bioc-devel
>>> Subject: [Bioc-devel] Aliasing `]` breaks BiocCheck::BiocCheck() 
> version 1.18.0
>>>
>>> Hi all,
>>>
>>> If you put
>>>
>>> foo <- `[`
>>>
>>> Somewhere in a package, it will trigger
>>>
>>> Error in makeUsageCollector(fun, ...) : only works for closures
>>>
>>> In BiocCheck::BiocCheck() (version 1.18.0). This comes from
>>>
>>> if (typeof(fun) != "closure")
>>>stop("only works for closures")
>>>
>>> In codetools::makeUsageCollector(), but
>>>
>>>> typeof(`[`)
>>> ## "special"
>>>
>>> Not that it matters for my use-case because I had discovered 
> magrittr's extract alias, but it might be an edge case worth covering, 
> especially since the error message is so cryptic.
>>>
>>> Kind regards,
>>> Shian Su
>>>
>>> ___
>>>
>>> The information in this email is confidential and 
> intend...{{dropped:29}}
>>>
>>> ___
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>--
&g

Re: [Bioc-devel] Aliasing `]` breaks BiocCheck::BiocCheck() version 1.18.0

2018-12-19 Thread Tierney, Luke
codetools already checks only closures in checkUsageEnv and hence
checkUsagePackage, so this is an issue on the Bioc side.

Best,

luke

On Tue, 18 Dec 2018, Tierney, Luke wrote:

> Codetools should probably be ignoring those. Will have a look
>
> Sent from my iPhone
>
>> On Dec 18, 2018, at 6:54 AM, Shepherd, Lori  
>> wrote:
>>
>> Can you please open an issue for this so we don't lose track of it -
>>
>> https://github.com/Bioconductor/BiocCheck/issues
>>
>>
>>
>> Lori Shepherd
>>
>> Bioconductor Core Team
>>
>> Roswell Park Cancer Institute
>>
>> Department of Biostatistics & Bioinformatics
>>
>> Elm & Carlton Streets
>>
>> Buffalo, New York 14263
>>
>> 
>> From: Bioc-devel  on behalf of Shian Su 
>> 
>> Sent: Monday, December 17, 2018 8:34:10 PM
>> To: bioc-devel
>> Subject: [Bioc-devel] Aliasing `]` breaks BiocCheck::BiocCheck() version 
>> 1.18.0
>>
>> Hi all,
>>
>> If you put
>>
>> foo <- `[`
>>
>> Somewhere in a package, it will trigger
>>
>> Error in makeUsageCollector(fun, ...) : only works for closures
>>
>> In BiocCheck::BiocCheck() (version 1.18.0). This comes from
>>
>> if (typeof(fun) != "closure")
>>stop("only works for closures")
>>
>> In codetools::makeUsageCollector(), but
>>
>>> typeof(`[`)
>> ## "special"
>>
>> Not that it matters for my use-case because I had discovered magrittr's 
>> extract alias, but it might be an edge case worth covering, especially since 
>> the error message is so cryptic.
>>
>> Kind regards,
>> Shian Su
>>
>> ___
>>
>> The information in this email is confidential and intend...{{dropped:29}}
>>
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics and  Fax:   319-335-3017
Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
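
The guard suggested in this thread, testing typeof(value) == "closure" instead of is.function(value), can be sketched as follows. The helper name globals_of and the two test objects are hypothetical and only serve the illustration; findGlobals() is the codetools function discussed above.

```r
library(codetools)

# Hypothetical helper: collect globals only for closures, as the thread suggests.
globals_of <- function(value) {
    if (typeof(value) == "closure")
        findGlobals(value)
    else
        character(0)
}

alias <- `[`                             # a "special", not a closure
uses_T <- function(x) if (T) x else y    # refers to the globals T and y

is.function(alias)     # TRUE, which is why the original guard let it through
typeof(alias)          # "special"
globals_of(alias)      # character(0)
"T" %in% globals_of(uses_T)
```

Because the helper never passes a builtin or special to findGlobals(), it should behave the same with codetools versions before and after the change announced in this thread.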


Re: [Bioc-devel] Aliasing `]` breaks BiocCheck::BiocCheck() version 1.18.0

2018-12-18 Thread Tierney, Luke
Codetools should probably be ignoring those. Will have a look

Sent from my iPhone

> On Dec 18, 2018, at 6:54 AM, Shepherd, Lori  
> wrote:
> 
> Can you please open an issue for this so we don't lose track of it -
> 
> https://github.com/Bioconductor/BiocCheck/issues
> 
> 
> 
> Lori Shepherd
> 
> Bioconductor Core Team
> 
> Roswell Park Cancer Institute
> 
> Department of Biostatistics & Bioinformatics
> 
> Elm & Carlton Streets
> 
> Buffalo, New York 14263
> 
> 
> From: Bioc-devel  on behalf of Shian Su 
> 
> Sent: Monday, December 17, 2018 8:34:10 PM
> To: bioc-devel
> Subject: [Bioc-devel] Aliasing `]` breaks BiocCheck::BiocCheck() version 
> 1.18.0
> 
> Hi all,
> 
> If you put
> 
> foo <- `[`
> 
> Somewhere in a package, it will trigger
> 
> Error in makeUsageCollector(fun, ...) : only works for closures
> 
> In BiocCheck::BiocCheck() (version 1.18.0). This comes from
> 
> if (typeof(fun) != "closure")
>stop("only works for closures")
> 
> In codetools::makeUsageCollector(), but
> 
>> typeof(`[`)
> ## "special"
> 
> Not that it matters for my use-case because I had discovered magrittr's 
> extract alias, but it might be an edge case worth covering, especially since 
> the error message is so cryptic.
> 
> Kind regards,
> Shian Su
> 
> ___
> 
> The information in this email is confidential and intend...{{dropped:29}}
> 
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Rd] STRING_IS_SORTED claims as.character(1:100) is sorted

2018-11-16 Thread Tierney, Luke
Thanks. Fixed in R-devel and R-patched. [STRING_IS_SORTED was not yet
used anywhere so this did not affect any computations.]

Best,

luke

On Thu, 15 Nov 2018, Michael Sannella via R-devel wrote:

> If I have loaded the C code:
>SEXP altrep_STRING_IS_SORTED(SEXP x)
>{
>return ScalarInteger(STRING_IS_SORTED(x));
>}
> and defined the function:
>issort <- function(x) .Call("altrep_STRING_IS_SORTED",x)
>
> I am seeing the following results in R 3.5.1/Linux:
>> issort(LETTERS)
>[1] NA
>> issort(as.character(1:100))  ## should return NA
>[1] 1
>> issort(as.character(100:1))  ## should return NA
>[1] -1
>> issort(as.character(1:100+1L))
>[1] NA
>
> issort(as.character(1:100)) should return NA, since the string vector
> "1","2",..."10",... is not sorted.  I suspect that the problem is that
> the Is_sorted method for deferred_string is just calling the Is_sorted
> method for the source object 1:100 (which _is_ a sorted integer
> vector).  It should probably just return NA for any source object.
>
>  ~~ Michael Sannella
>
>   [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics and  Fax:   319-335-3017
Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
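
The point of the report, that sortedness of an integer vector says nothing about sortedness of its character form, can be checked at the R level without any C code. This is only an illustration using base R functions; it does not exercise the ALTREP Is_sorted method itself.

```r
ints <- 1:100
strs <- as.character(ints)

is.unsorted(ints)    # FALSE: the integer sequence is increasing
is.unsorted(strs)    # TRUE: "9" is followed by "10", which collates before it

# Radix sorting uses the C locale, so the result does not depend on collation.
head(sort(as.character(1:12), method = "radix"))
```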


Re: [Rd] v3 serialization of compact_intseq altrep should write modified data

2018-10-22 Thread Tierney, Luke
Try this C code:

SEXP set_intseq_data(SEXP x)
{
    if (MAYBE_SHARED(x))
        error("Oops, not supposed to do this!");
    void *ptr = DATAPTR(x);
    ((int *) ptr)[3] = 1234;
    return R_NilValue;
}

Lots of things will break if you modify objects that have been marked
as immutable (and hence where MAYBE_SHARED returns TRUE).

For now the implementation of compact sequences marks them as
immutable and so assumes the expanded version will not be changed.
That implementation detail might be changed at some point but C code
should not make assumptions.

Best,

luke

On Mon, 22 Oct 2018, Michael Sannella via R-devel wrote:

> Experimenting with altrep objects and v3 serialization, I discovered a
> possible bug.  Calling DATAPTR on a compact_intseq object returns a
> pointer to the expanded integer sequence in memory.  If you modify
> this data, the object values appear to be changed.  However, if the
> compact_intseq object is then serialized (with version=3), only the
> original integer sequence info is written.
>
> For example, suppose I have compiled and loaded the following C code:
>  SEXP set_intseq_data(SEXP x)
>  {
>  void* ptr = DATAPTR(x);
>  ((int*)ptr)[3] = 1234;
>  return R_NilValue;
>  }
>
> I see the following behavior in R 3.5.1:
>  > x <- 1:10
>  > x
>   [1]  1  2  3  4  5  6  7  8  9 10
>  > .Call("set_intseq_data", x)
>  NULL
>  > x
>   [1]    1    2    3 1234    5    6    7    8    9   10
>  > save(x, file="temp.rda", version=3)
>  > load(file="temp.rda")
>  > x
>   [1]  1  2  3  4  5  6  7  8  9 10
>  >
>
> I would have expected the modified vector data to be serialized to the
> file, and be restored when it is loaded.
>
>  ~~ Michael Sannella
>
>   [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics and  Fax:   319-335-3017
Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
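
For contrast with the C code in this thread, the sketch below makes the same change through ordinary R assignment, which does not write into the compact sequence in place. It assumes R 3.5.0 or later (for version = 3 serialization); the variable names are arbitrary.

```r
x <- 1:10          # a compact integer sequence on R >= 3.5.0
x[4] <- 1234L      # R-level assignment works on an ordinary modified copy

# Round trip through version 3 serialization, in memory rather than via a file.
y <- unserialize(serialize(x, NULL, version = 3))
identical(x, y)
```

Here the modified values survive the round trip, because the object being serialized is no longer the original compact sequence.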


Re: [Rd] unlockEnvironment()?

2018-10-10 Thread Tierney, Luke
On Wed, 10 Oct 2018, William Dunlap via R-devel wrote:

> R lets one lock an environment with both an R function,
> base::lockEnvironment, and a C function, R_LockEnvironment, but, as far as
> I can tell, no corresponding function to unlock an environment.  Is this
> omission on principle or just something that has not been done yet?

Absolutely on principle!

Best,

luke

>
> I ask because several packages, including the well-used R6 and rlang
> packages, fiddle with some bits in with SET_ENVFLAGS and ENVFLAGS to unlock
> an environment.  (See grep output below.)
>
> About 5000 (1/3 of CRAN) packages depend on R6 or rlang.  Should R supply a
> more disciplined way of unlocking an environment?
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> $ { find . -type f -print0 | xargs -0 grep -n -C 2 ENVFLAGS ; } 2>/dev/null
> ./R6/tests/manual/encapsulation.R-5-unlockEnvironment <-
> cfunction(signature(env = "environment"), body = '
> ./R6/tests/manual/encapsulation.R-6-  #define FRAME_LOCK_MASK (1<<14)
> ./R6/tests/manual/encapsulation.R:7:  #define FRAME_IS_LOCKED(e)
> (ENVFLAGS(e) & FRAME_LOCK_MASK)
> ./R6/tests/manual/encapsulation.R:8:  #define UNLOCK_FRAME(e)
> SET_ENVFLAGS(e, ENVFLAGS(e) & (~ FRAME_LOCK_MASK))
> ./R6/tests/manual/encapsulation.R-9-
> ./R6/tests/manual/encapsulation.R-10-  if (TYPEOF(env) == NILSXP)
> ./BMA/R/iBMA.glm.R-21-*/
> ./BMA/R/iBMA.glm.R-22-#define FRAME_LOCK_MASK (1<<14)
> ./BMA/R/iBMA.glm.R:23:#define FRAME_IS_LOCKED(e) (ENVFLAGS(e) &
> FRAME_LOCK_MASK)
> ./BMA/R/iBMA.glm.R:24:#define UNLOCK_FRAME(e) SET_ENVFLAGS(e, ENVFLAGS(e) &
> (~ FRAME_LOCK_MASK))
> ./BMA/R/iBMA.glm.R-25-'
> ./BMA/R/iBMA.glm.R-26-
> --
> ./BMA/R/iBMA.surv.R-22-*/
> ./BMA/R/iBMA.surv.R-23-#define FRAME_LOCK_MASK (1<<14)
> ./BMA/R/iBMA.surv.R:24:#define FRAME_IS_LOCKED(e) (ENVFLAGS(e) &
> FRAME_LOCK_MASK)
> ./BMA/R/iBMA.surv.R:25:#define UNLOCK_FRAME(e) SET_ENVFLAGS(e, ENVFLAGS(e)
> & (~ FRAME_LOCK_MASK))
> ./BMA/R/iBMA.surv.R-26-'
> ./BMA/R/iBMA.surv.R-27-
> ./pkgload/src/unlock.c-20-*/
> ./pkgload/src/unlock.c-21-#define FRAME_LOCK_MASK (1 << 14)
> ./pkgload/src/unlock.c:22:#define FRAME_IS_LOCKED(e) (ENVFLAGS(e) &
> FRAME_LOCK_MASK)
> ./pkgload/src/unlock.c:23:#define UNLOCK_FRAME(e) SET_ENVFLAGS(e,
> ENVFLAGS(e) & (~FRAME_LOCK_MASK))
> ./pkgload/src/unlock.c-24-
> ./pkgload/src/unlock.c-25-extern SEXP R_TrueValue;
> ./SOD/src/tmp.cpp-11394-SEXP (ENCLOS)(SEXP x);
> ./SOD/src/tmp.cpp-11395-SEXP (HASHTAB)(SEXP x);
> ./SOD/src/tmp.cpp:11396:int (ENVFLAGS)(SEXP x);
> ./SOD/src/tmp.cpp:11397:void (SET_ENVFLAGS)(SEXP x, int v);
> ./SOD/src/tmp.cpp-11398-void SET_FRAME(SEXP x, SEXP v);
> ./SOD/src/tmp.cpp-11399-void SET_ENCLOS(SEXP x, SEXP v);
> --
> ./SOD/src/tmp.h-11393-SEXP (ENCLOS)(SEXP x);
> ./SOD/src/tmp.h-11394-SEXP (HASHTAB)(SEXP x);
> ./SOD/src/tmp.h:11395:int (ENVFLAGS)(SEXP x);
> ./SOD/src/tmp.h:11396:void (SET_ENVFLAGS)(SEXP x, int v);
> ./SOD/src/tmp.h-11397-void SET_FRAME(SEXP x, SEXP v);
> ./SOD/src/tmp.h-11398-void SET_ENCLOS(SEXP x, SEXP v);
>
>   [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics and  Fax:   319-335-3017
Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
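
Since locking is meant to be one-way, code that needs a modifiable environment can copy the contents into a fresh one rather than clearing the lock bit from C. The following is a minimal sketch using only base R; the names e, e2 and res are illustrative, and whether a copy is an acceptable substitute depends on who else holds a reference to the locked environment.

```r
e <- new.env()
assign("a", 1, envir = e)
lockEnvironment(e, bindings = TRUE)

environmentIsLocked(e)    # TRUE
# Adding a binding to the locked environment signals an error.
res <- tryCatch(assign("b", 2, envir = e), error = function(err) "refused")

# An unlocked copy with the same contents and enclosure; e itself stays locked.
e2 <- list2env(as.list(e, all.names = TRUE),
               envir = new.env(parent = parent.env(e)))
assign("b", 2, envir = e2)
```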


Re: [Rd] R_ext/Altrep.h should be more C++-friendly

2018-10-10 Thread Tierney, Luke
Thanks for the suggestion. Committed in R-devel.

Best,

luke

On Mon, 8 Oct 2018, Michael Sannella wrote:

> I am not able to #include "R_ext/Altrep.h" from a C++ file.  I think
> it needs two changes:
> 
> 1. add the same __cplusplus check as most of the other header files:
>     #ifdef  __cplusplus
>     extern "C" {
>     #endif
>         ...
>     #ifdef  __cplusplus
>     }
>     #endif
> 
> 2. change the line
>     R_new_altrep(R_altrep_class_t class, SEXP data1, SEXP data2);
>  to
>     R_new_altrep(R_altrep_class_t cls, SEXP data1, SEXP data2);
>  since C++ doesn't like an argument named 'class'
> 
>   ~~ Michael Sannella
> 
> 
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics and  Fax:   319-335-3017
Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] bug with OutDec option and deferred_string altrep object

2018-10-09 Thread Tierney, Luke
This is now fixed in R-devel. Will port to R-patched in due course.
R_inspect also now handles pairlists ending with dotted pairs.

Best,

luke

On Tue, 9 Oct 2018, Tierney, Luke wrote:

> Thanks for the report. The approach you outlined below should work --
> I'll look into it.
>
> Best,
>
> luke
>
> On Mon, 8 Oct 2018, Michael Sannella wrote:
>
>> While implementing R's new 'altrep' functionality in the TERR engine,
>> I discovered a bug in R's 'deferred_string' altrep object: it is not
>> using the correct value of the 'OutDec' option when it expands a
>> deferred_string.  See the following example:
>>
>> R 3.5.1: (same results in R 3.6.0 devel engine built 10/5)
>>     > options(scipen=0, OutDec=".")
>>     > as.character(123.456)
>>     [1] "123.456"
>>     > options(scipen=-5, OutDec=",")
>>     > as.character(123.456)
>>     [1] "1,23456e+02"
>>     > xx <- as.character(123.456)
>>     > options(scipen=0, OutDec=".")
>>     > xx
>>     [1] "1.23456e+02"
>>     >
>>
>> In the example above, the variable 'xx' is set to a deferred_string
>> while OutDec is ','.  However, when the string is actually formatted
>> (when xx is printed), it uses the current option value OutDec='.' to
>> format the string.  I think that deferred_string should use the value
>> OutDec=',' from when as.character was called.
>>
>> Note that the behavior is different with the 'scipen' option: The
>> deferred_string object records the scipen=-5 value when as.character
>> is called, and uses this value when xx is printed.  Looking at the
>> deferred_string object, it appears that CDR(R_altrep_data1()) is
>> set to a scalar integer containing the scipen value at the time the
>> deferred_string was created.
>>
>> Ideally, the deferred_string object would save both the scipen and
>> OutDec option values.  I'd suggest saving these values as regular
>> pairlist values, say by setting the data1 field to pairlist(,
>> scipen=-5L, OutDec=',') for the value of xx above.  To save space, you
>> could avoid saving these values in the common case where scipen=0L,
>> OutDec='.'.  It would also be better if the data1 field was a
>> well-formed pairlist; the current value of the data1 field causes
>> R_inspect to segfault.
>>
>> I understand that you probably wouldn't want to change the
>> deferred_string structure.  An alternative fix would be to avoid this
>> case by:
>>   1. Never create a deferred_string if OutDec is not '.'.
>>   2. When expanding an element of a deferred_string, temporarily set
>> OutDec to '.'.
>>
>>   ~~ Michael Sannella
>>
>>
>>
>
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics and  Fax:   319-335-3017
Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
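
The idea in the report, recording the option value when the deferred object is created rather than when it is expanded, can be illustrated with a plain R closure. This is not how deferred_string is implemented; defer_format is a hypothetical helper, and format() with an explicit decimal.mark stands in for the internal conversion.

```r
# Hypothetical helper: remember OutDec now, format later.
defer_format <- function(x) {
    force(x)
    dec <- getOption("OutDec")       # captured at creation time
    function() format(x, decimal.mark = dec)
}

old <- options(OutDec = ",")
xx <- defer_format(123.456)
options(old)                         # back to the previous OutDec

xx()                                 # still formatted with the comma
```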


Re: [Rd] bug with OutDec option and deferred_string altrep object

2018-10-08 Thread Tierney, Luke
Thanks for the report. The approach you outlined below should work --
I'll look into it.

Best,

luke

On Mon, 8 Oct 2018, Michael Sannella wrote:

> While implementing R's new 'altrep' functionality in the TERR engine,
> I discovered a bug in R's 'deferred_string' altrep object: it is not
> using the correct value of the 'OutDec' option when it expands a
> deferred_string.  See the following example:
> 
> R 3.5.1: (same results in R 3.6.0 devel engine built 10/5)
>     > options(scipen=0, OutDec=".")
>     > as.character(123.456)
>     [1] "123.456"
>     > options(scipen=-5, OutDec=",")
>     > as.character(123.456)
>     [1] "1,23456e+02"
>     > xx <- as.character(123.456)
>     > options(scipen=0, OutDec=".")
>     > xx
>     [1] "1.23456e+02"
>     >
> 
> In the example above, the variable 'xx' is set to a deferred_string
> while OutDec is ','.  However, when the string is actually formatted
> (when xx is printed), it uses the current option value OutDec='.' to
> format the string.  I think that deferred_string should use the value
> OutDec=',' from when as.character was called.
> 
> Note that the behavior is different with the 'scipen' option: The
> deferred_string object records the scipen=-5 value when as.character
> is called, and uses this value when xx is printed.  Looking at the
> deferred_string object, it appears that CDR(R_altrep_data1()) is
> set to a scalar integer containing the scipen value at the time the
> deferred_string was created.
> 
> Ideally, the deferred_string object would save both the scipen and
> OutDec option values.  I'd suggest saving these values as regular
> pairlist values, say by setting the data1 field to pairlist(,
> scipen=-5L, OutDec=',') for the value of xx above.  To save space, you
> could avoid saving these values in the common case where scipen=0L,
> OutDec='.'.  It would also be better if the data1 field was a
> well-formed pairlist; the current value of the data1 field causes
> R_inspect to segfault.
> 
> I understand that you probably wouldn't want to change the
> deferred_string structure.  An alternative fix would be to avoid this
> case by:
>   1. Never create a deferred_string if OutDec is not '.'.
>   2. When expanding an element of a deferred_string, temporarily set
> OutDec to '.'.
> 
>   ~~ Michael Sannella
> 
> 
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics and  Fax:   319-335-3017
Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Fwd: Bug report: cbind with numeric and raw gives incorrect result

2018-09-25 Thread Tierney, Luke
Thanks for the report and patch. Fixed in R-devel and R-patched.

Best,

luke

On Tue, 25 Sep 2018, brodie gaslam via R-devel wrote:

>
>
> For what it's worth the following patch fixes that particular problem on my 
> system.  I have not checked very carefully to make sure this does not cause 
> other problems, but at a high level it seems to make sense.  In this 
> particular part of the code I believe `mode` is taken to be the highest type 
> of "column" encountered by `ctype` and based on conditionals it can (I think) 
> be up to REALSXP here.  This leads to a `INTEGER(REALSXP)` call, which 
> presumably messes up the underlying double bit representation.
>
> Again, I looked at this very quickly so I could be completely wrong, but I 
> did at least build R with this patch and then no longer observed the odd 
> behavior reported by mikefc.
>
> Index: src/main/bind.c
> ===
> --- src/main/bind.c    (revision 75340)
> +++ src/main/bind.c    (working copy)
> @@ -1381,11 +1381,16 @@
>              MOD_ITERATE1(idx, k, i, i1, {
>              LOGICAL(result)[n++] = RAW(u)[i1] ? TRUE : FALSE;
>              });
> -        } else {
> +        } else if (mode == INTSXP) {
>              R_xlen_t i, i1;
>              MOD_ITERATE1(idx, k, i, i1, {
>              INTEGER(result)[n++] = (unsigned char) RAW(u)[i1];
>              });
> +        } else {
> +            R_xlen_t i, i1;
> +            MOD_ITERATE1(idx, k, i, i1, {
> +            REAL(result)[n++] = (unsigned char) RAW(u)[i1];
> +            });
>          }
>          }
>      }
>
>
>
>
>
>
> On Tuesday, September 25, 2018, 7:58:31 AM EDT, mikefc 
>  wrote:
>
>
>
>
>
> Hi there,
>
> using cbind with a numeric and raw argument produces an incorrect result.
>
> I've posted some details below,
>
> kind regards,
> Mike.
>
>
>
> e.g.
>> cbind(0, as.raw(0))
>     [,1]          [,2]
> [1,]    0 6.950136e-310
>
>
>
> A longer example shows that the result is not a rounding error, is not
> consistent, and repeated applications get different results.
>
>> cbind(0, as.raw(1:10))
>                [,1]           [,2]
>  [1,]  0.000000e+00   0.000000e+00
>  [2,]  0.000000e+00   0.000000e+00
>  [3,]  0.000000e+00   0.000000e+00
>  [4,]  0.000000e+00   0.000000e+00
>  [5,]  0.000000e+00  6.950135e-310
>  [6,] 4.243992e-314  6.950135e-310
>  [7,] 8.487983e-314  6.324040e-322
>  [8,] 1.273197e-313   0.000000e+00
>  [9,] 1.697597e-313 -4.343725e-311
> [10,] 2.121996e-313  1.812216e-308
>
>
> This bug occurs on
> * mac os (with R 3.5.1)
> * linux (with R 3.4.4)
> * Windows (with R 3.5.0)
>
>
>
>
> My Session Info
> R version 3.5.1 (2018-07-02)
> Platform: x86_64-apple-darwin15.6.0 (64-bit)
> Running under: macOS High Sierra 10.13.6
>
> Matrix products: default
> BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/
> A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
> LAPACK: /Library/Frameworks/R.framework/Versions/3.5/
> Resources/lib/libRlapack.dylib
>
> locale:
> [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
>
> attached base packages:
> [1] stats    graphics  grDevices utils    datasets  methods  base
>
> other attached packages:
> [1] memoise_1.1.0  ggplot2_3.0.0  nonogramp_0.1.0 purrr_0.2.5
> dplyr_0.7.6
>
> loaded via a namespace (and not attached):
> [1] Rcpp_0.12.18    rstudioapi_0.7  bindr_0.1.1      magrittr_1.5
> tidyselect_0.2.4 munsell_0.5.0    colorspace_1.3-2 R6_2.2.2
> rlang_0.2.1.9000 stringr_1.3.1    plyr_1.8.4      tools_3.5.1
> grid_3.5.1
> [14] packrat_0.4.9-3  gtable_0.2.0    withr_2.1.2      digest_0.6.15
> lazyeval_0.2.1  assertthat_0.2.0 tibble_1.4.2    crayon_1.3.4
> bindrcpp_0.2.2  pryr_0.1.4      codetools_0.2-15 glue_1.3.0
> labeling_0.3
> [27] stringi_1.2.4    compiler_3.5.1  pillar_1.3.0    scales_0.5.0
> pkgconfig_2.0.1
>
>     [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:   319-335-3386
Department of Statistics and        Fax:     319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:     http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bias in R's random integers?

2018-09-21 Thread Tierney, Luke
Not sure what should happen theoretically for the code in vseq.c, but
I see the same pattern with the R generators I tried (default,
Super-Duper, and L'Ecuyer) and with bash $RANDOM using

N <- 1
X1 <- replicate(N, as.integer(system("bash -c 'echo $RANDOM'", intern = TRUE)))
X2 <- replicate(N, as.integer(system("bash -c 'echo $RANDOM'", intern = TRUE)))
X <- X1 + 2 ^ 15 * (X2 > 2^14)

and with numbers from random.org

library(random)
X <- randomNumbers(N, 0, 2^16-1, col = 1)

So I'm not convinced there is an issue.

Best,

luke

On Fri, 21 Sep 2018, Steve Grubb wrote:

> Hello,
>
> Top posting. Several people have asked about the code to replicate my
> results. I have cleaned up the code to remove an x/y coordinate bias for
> displaying the results directly on a 640 x 480 VGA adapter. You can find the
> code here:
>
> http://people.redhat.com/sgrubb/files/vseq.c
>
> To collect R samples:
> X <- runif(1, min = 0, max = 65535)
> write.table(X, file = "~/r-rand.txt", sep = "\n", row.names = FALSE)
>
> Then:
> cat ~/r-rand.txt | ./vseq > ~/r-rand.csv
>
> And then to create the chart:
>
> library(ggplot2);
> num.csv <- read.csv("~/r-rand.csv", header=TRUE)
> qplot(X, Y, data=num.csv);
>
> Hope this helps sort this out.
>
> Best Regards,
> -Steve
>
> On Thursday, September 20, 2018 5:09:23 PM EDT Steve Grubb wrote:
>> On Thursday, September 20, 2018 11:15:04 AM EDT Duncan Murdoch wrote:
>>> On 20/09/2018 6:59 AM, Ralf Stubner wrote:
>>>> On 9/20/18 1:43 AM, Carl Boettiger wrote:
>>>>> For a well-tested C algorithm, based on my reading of Lemire, the
>>>>> unbiased "algorithm 3" in https://arxiv.org/abs/1805.10941 is
>>>>> already part of the C standard library in OpenBSD and macOS (as
>>>>> arc4random_uniform), and in the GNU standard library.  Lemire also
>>>>> provides C++ code in the appendix of his piece for both this and the
>>>>> faster "nearly divisionless" algorithm.
>>>>>
>>>>> It would be excellent if any R core members were interested in
>>>>> considering bindings to these algorithms as a patch, or might express
>>>>> expectations for how that patch would have to operate (e.g. re
>>>>> Duncan's comment about non-integer arguments to sample size).
>>>>> Otherwise, an R package binding seems like a good starting point,
>>>>> but I'm not the right volunteer.
>>>>
>>>> It is difficult to do this in a package, since R does not provide
>>>> access to the random bits generated by the RNG. Only a float in (0,1)
>>>> is available via unif_rand().
>>>
>>> I believe it is safe to multiply the unif_rand() value by 2^32, and take
>>> the whole number part as an unsigned 32 bit integer.  Depending on the
>>> RNG in use, that will give at least 25 random bits.  (The low order bits
>>> are the questionable ones.  25 is just a guess, not a guarantee.)
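
The suggestion above can be sketched in C as follows. This is only an illustration under stated assumptions: `unif_fn` is a hypothetical stand-in for `unif_rand()`, `demo_unif` is a toy LCG included so the sketch runs outside R, and the rejection loop is the classic threshold scheme used by `arc4random_uniform`, not Lemire's nearly divisionless variant. The result is unbiased only to the extent that all 32 extracted bits are good, which the caveat above says is not guaranteed.

```c
#include <stdint.h>

/* Hypothetical stand-in for R's unif_rand(): returns a double in (0, 1). */
typedef double (*unif_fn)(void);

/* Toy LCG so the sketch is runnable outside R; not a recommended generator. */
static uint32_t demo_state = 12345u;
static double demo_unif(void) {
    demo_state = demo_state * 1664525u + 1013904223u;
    return (demo_state + 0.5) / 4294967296.0;   /* strictly inside (0, 1) */
}

/* Scale by 2^32 and keep the whole-number part as an unsigned 32-bit value.
 * Assumes u() never returns exactly 1.0. */
static uint32_t bits32_from_unif(unif_fn u) {
    return (uint32_t)(u() * 4294967296.0);
}

/* 2^32 mod m, computed in 32-bit unsigned arithmetic; m must be nonzero. */
static uint32_t rejection_threshold(uint32_t m) {
    return (uint32_t)(0u - m) % m;
}

/* Uniform integer in [0, m): values below the threshold are rejected so
 * that the accepted range has a length that is an exact multiple of m. */
static uint32_t bounded_unbiased(unif_fn u, uint32_t m) {
    uint32_t threshold = rejection_threshold(m);
    for (;;) {
        uint32_t r = bits32_from_unif(u);
        if (r >= threshold)
            return r % m;
    }
}
```

For example, `bounded_unbiased(demo_unif, 6u)` returns a value in 0..5; with m = 6 the threshold is 4, so only 4 of the 2^32 possible inputs are ever rejected.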
>>>
>>> However, if one is willing to use an external
>>>
>>>> RNG, it is of course possible. After reading about Lemire's work [1], I
>>>> had planned to integrate such an unbiased sampling scheme into the
>>>> dqrng package, which I have now started. [2]
>>>>
>>>> Using Duncan's example, the results look much better:
>>>> > library(dqrng)
>>>> > m <- (2/5)*2^32
>>>> > y <- dqsample(m, 1000000, replace = TRUE)
>>>> > table(y %% 2)
>>>>
>>>>      0      1
>>>> 500252 499748
>>>
>>> Another useful diagnostic is
>>>
>>>plot(density(y[y %% 2 == 0]))
>>>
>>> Obviously that should give a more or less uniform density, but for
>>> values near m, the default sample() gives some nice pretty pictures of
>>> quite non-uniform densities.
>>>
>>> By the way, there are actually quite a few examples of very large m
>>> besides m = (2/5)*2^32 where performance of sample() is noticeably bad.
>>> You'll see problems in y %% 2 for any integer a > 1 with m = 2/(1 + 2a)
>>> * 2^32, problems in y %% 3 for m = 3/(1 + 3a)*2^32 or m = 3/(2 +
>>> 3a)*2^32, etc.
>>>
>>> So perhaps I'm starting to be convinced that the default sample() should
>>> be fixed.
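
A scaled-down enumeration shows why y %% 2 goes wrong for m of this form. The sketch below assumes a generator that returns u = k/2^16 with k uniform on 0..2^16-1 (a small stand-in for a generator with 32 bits of resolution) and maps it with floor(u * m) for m = (2/5)*2^16, so that floor(u * m) = floor(2k/5) exactly. The function name is hypothetical.

```c
#include <stdint.h>

/* Enumerate every possible output k of a 16-bit generator and count the
 * parity of y = floor(u * m), where u = k / 2^16 and m = (2/5) * 2^16.
 * Integer division makes floor(2k/5) exact, so no rounding is involved.
 * counts[0] receives the number of even y, counts[1] the number of odd y. */
static void parity_counts_scaled(uint32_t counts[2]) {
    counts[0] = 0;
    counts[1] = 0;
    for (uint32_t k = 0; k < 65536u; k++) {
        uint32_t y = (2u * k) / 5u;
        counts[y & 1u]++;
    }
}
```

Each run of five consecutive k values maps to three even and two odd results, so the enumeration yields 39322 even and 26214 odd values out of 65536: a split of roughly 60/40 instead of 50/50.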
>>
>> I find this discussion fascinating. I normally test random numbers in
>> different languages every now and again using various methods. One simple
>> check that I do is to use Michal Zalewski's method when he studied Strange
>> Attractors and Initial TCP/IP Sequence Numbers:
>>
>> http://lcamtuf.coredump.cx/newtcp/
>> https://pdfs.semanticscholar.org/adb7/069984e3fa48505cd5081ec118ccb95529a3.pdf
>>
>> The technique works by mapping the dynamics of the generated numbers into a
>> three-dimensional phase space. This is then plotted in a graph so that you
>> can visually see if something odd is going on.
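
The mapping can be sketched as follows, assuming the delayed-difference coordinates given in Zalewski's write-up (x = s[n-2] - s[n-3], y = s[n-1] - s[n-2], z = s[n] - s[n-1]). Whether vseq.c uses exactly these coordinates is not stated in this thread, and the type and function names here are hypothetical.

```c
#include <stddef.h>

/* One point in the three-dimensional phase space. */
typedef struct {
    long x;
    long y;
    long z;
} phase_point;

/* Map a sequence s[0..n-1] to phase-space points built from successive
 * differences. Writes n - 3 points to out (which must have room for them)
 * and returns how many were written; returns 0 if n < 4. */
static size_t phase_points(const long *s, size_t n, phase_point *out) {
    if (n < 4)
        return 0;
    for (size_t i = 3; i < n; i++) {
        out[i - 3].x = s[i - 2] - s[i - 3];
        out[i - 3].y = s[i - 1] - s[i - 2];
        out[i - 3].z = s[i] - s[i - 1];
    }
    return n - 3;
}
```

For a good generator the resulting points should fill the space without visible structure; planes, lattices, or clusters in a plot of the points suggest correlation between successive values.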
>>
>> I used   runif(1, min = 0, max = 65535)  to get a set of numbers. This
>> is the resulting plot that was generated from R's numbers using this
>> technique:
>>
>> http://people.redhat.com/sgrubb/files/r-random.jpg
>>
>> And for comparison this was generated by collecting the same number of
>> samples from the bash shell:
>>
>> http://people.redhat.com/sgrubb/files/bash-random.jpg
>>
>> The 

Re: [Rd] cannot destroy connection (?) created by readLines in a tryCatch

2017-12-14 Thread Tierney, Luke
Your guess is wrong. More when I have a sensible keyboard

Sent from my iPhone

On Dec 15, 2017, at 10:21 AM, Gabriel Becker wrote:

On Thu, Dec 14, 2017 at 12:17 PM, Gábor Csárdi wrote:

On Thu, Dec 14, 2017 at 7:56 PM, Gabriel Becker wrote:
Gabor,

You can grab the connection and destroy it via getConnection and then a
standard close call.

Yeah, that's often a possible workaround, but since this connection
was opened by
readLines() internally, I don't necessarily know which one it is. E.g.
I might open multiple
connections to the same file, so I can't choose based on the file name.

Btw. this workaround seems to work for me:

read_lines <- function(con, ...) {
 if (is.character(con)) {
   con <- file(con)
   on.exit(close(con))
 }
 readLines(con, ...)
}

This is basically the same as readLines(), but on.exit() does its job here.
That's another clue that it might be an on.exit() issue. Wild guess:
on.exit() does not run if an internal function errors.


It seems to be the setting of a warning handler in tryCatch that does it
actually; without that, it works as expected, even when errors are caught.

tryCatch(readLines(tempfile(), warn=FALSE), error=function(x) NA)
[1] NA
Warning message:
In file(con, "r") :
  cannot open file
'/var/folders/79/l_n_5qr152d2d9d9xs0591lhgn/T//RtmpzIZ6Qh/file1ed2e57f2ea':
No such file or directory

showConnections(all=TRUE)
  description class      mode text   isopen   can read can write
0 "stdin"     "terminal" "r"  "text" "opened" "yes"    "no"
1 "stdout"    "terminal" "w"  "text" "opened" "no"     "yes"
2 "stderr"    "terminal" "w"  "text" "opened" "no"     "yes"

tryCatch(readLines(tempfile(), warn=FALSE), warning=function(x) NA)
[1] NA

showConnections(all=TRUE)
  description
0 "stdin"
1 "stdout"
2 "stderr"
3 "/var/folders/79/l_n_5qr152d2d9d9xs0591lhgn/T//RtmpzIZ6Qh/file1ed2300ce801"
  class      mode text   isopen   can read can write
0 "terminal" "r"  "text" "opened" "yes"    "no"
1 "terminal" "w"  "text" "opened" "no"     "yes"
2 "terminal" "w"  "text" "opened" "no"     "yes"
3 "file"     "r"  "text" "closed" "yes"    "yes"

~G



(it actually lists that it is "closed" already, but
still in the set of existing connections. I can't speak to that
difference).

It is closed but not destroyed.

G.

tryCatch(
+   readLines(tempfile(), warn = FALSE)[1],
+   error = function(e) NA,
+   warning = function(w) NA
+ )
[1] NA

rm(list=ls(all.names = TRUE))
gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 257895 13.8     592000 31.7   416371 22.3
Vcells 536411  4.1    8388608 64.0  1795667 13.7

showConnections(all = TRUE)
  description
0 "stdin"
1 "stdout"
2 "stderr"
3 "/var/folders/79/l_n_5qr152d2d9d9xs0591lhgn/T//RtmpZRcxmh/file128a13bffc77"
  class      mode text   isopen   can read can write
0 "terminal" "r"  "text" "opened" "yes"    "no"
1 "terminal" "w"  "text" "opened" "no"     "yes"
2 "terminal" "w"  "text" "opened" "no"     "yes"
3 "file"     "r"  "text" "closed" "yes"    "yes"

con = getConnection(3)
con
A connection with
description "/var/folders/79/l_n_5qr152d2d9d9xs0591lhgn/T//RtmpZRcxmh/file128a13bffc77"
class       "file"
mode        "r"
text        "text"
opened      "closed"
can read    "yes"
can write   "yes"

close(con)
showConnections(all=TRUE)
  description class      mode text   isopen   can read can write
0 "stdin"     "terminal" "r"  "text" "opened" "yes"    "no"
1 "stdout"    "terminal" "w"  "text" "opened" "no"     "yes"
2 "stderr"    "terminal" "w"  "text" "opened" "no"     "yes"



HTH,
~G

On Thu, Dec 14, 2017 at 10:02 AM, Gábor Csárdi wrote:

Consider this code. This is R 3.4.2, but based on a quick look at the
NEWS, this has not been fixed.

tryCatch(
 readLines(tempfile(), warn = FALSE)[1],
 error = function(e) NA,
 warning = function(w) NA
)

rm(list=ls(all.names = TRUE))
gc()

showConnections(all = TRUE)

If you run it, you'll get a connection you cannot close(), i.e. the
last showConnections() call prints:

❯ showConnections(all = TRUE)
  description
0 "stdin"
1 "stdout"
2 "stderr"
3 "/var/folders/59/0gkmw1yj2w7bf2dfc3jznv5wgn/T//Rtmpc7JqVS/filecc2044b2ccec"
  class      mode text   isopen   can read can write
0 "terminal" "r"  "text" "opened" "yes"    "no"
1 "terminal" "w"  "text" "opened" "no"     "yes"
2 "terminal" "w"  "text" "opened" "no"     "yes"
3 "file"     "r"  "text" "closed" "yes"    "yes"

AFAICT, readLines should close the connection:

❯ readLines
function (con = stdin(), n = -1L, ok = TRUE, warn = TRUE, encoding = "unknown",
    skipNul = FALSE)
{
    if (is.character(con)) {
        con <- file(con, "r")
        on.exit(close(con))
    }
    .Internal(readLines(con, n, ok, warn, encoding, skipNul))
}


so maybe this just a 

Re: [Rd] Storage of byte code-compiled functions in sysdata.rda

2016-05-01 Thread Tierney, Luke
Can you provide a complete reproducible example?

Sent from my iPhone

> On May 1, 2016, at 6:51 AM, Peter Ruckdeschel wrote:
>
> Hi r-devels,
>
> we are seeing a new problem with our packages RobAStRDA (just new on CRAN,
> thanks to Uwe and Kurt!) and RobExtremes (to be submitted).
>
> It must be something recent with the way you internally treat/store
> byte-code compiled functions, as we have no problems with R-3.1.3, but do
> see an "Error in fct(x) : byte code version mismatch" with R-devel SVNrev
> r70532.
>
> Background:
> Starting from several x-y grids, in the sysdata.rda file of RobAStRDA, we
> store the results of calls to approxfun/splinefun to these grids from
> within a session with pkg RobAStRDA require()d.  From pkg RobExtremes we
> then call these interpolating functions by means of a call (essentially)
> as:
>
> getFromNamespace(".RMXE", ns = "RobAStRDA")[["GEVFamily"]][["fun.N"]][[1]](1.3)
>
> upon which we get the announced "Error in fct(x) : byte code version
> mismatch" while the same code does work for R-3.1.3.
>
> The list element "fun.N" in the above call already accounts for a
> different behaviour for pre R-3.0.0 (would have given "fun.O") and post
> R-3.0.0 ("fun.N") results of approxfun/splinefun, but the interpolating
> functions in branch "fun.N" have been produced in R-devel SVNrev r70532,
> so we would have expected our code getFromNamespace(.) above to work in
> R-devel as well.
>
> Could you give us any hints how to
>
> (a) store the interpolating functions resulting from approxfun/splinefun
>     in pkg RobAStRDA correctly in recent R-versions and
> (b) call these functions in pkg RobExtremes ?
>
> We already did import stats::approxfun and stats::splinefun into the
> NAMESPACEs of pkgs RobAStRDA and RobExtremes.
>
> Thanks for your help already,
> Peter
> 
> 
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Making parent.env<- an error for package namespaces and package imports

2014-10-18 Thread Tierney, Luke
I'll look into it

Sent from my iPhone

> On Oct 17, 2014, at 1:13 AM, Karl Millar kmil...@google.com wrote:
>
> I'd like to propose a change to the R language so that calling
> 'parent.env<-' on a package namespace or package imports is a runtime
> error.
>
> Currently the documentation warns that it's dangerous behaviour and
> might go away:
>     The replacement function 'parent.env<-' is extremely dangerous as
>     it can be used to destructively change environments in ways that
>     violate assumptions made by the internal C code.  It may be
>     removed in the near future.
>
> This change would both eliminate some potentially dangerous behaviours,
> and make it significantly easier for runtime compilation systems to
> optimize symbol lookups for code in packages.
>
> The following patch against current svn implements this functionality.
> It allows calls to 'parent.env<-' only until the namespace is locked,
> allowing the namespace to be built correctly while preventing user
> code from subsequently messing with it.
>
> I'd also like to make calling parent.env<- on an environment on the
> call stack an error, for the same reasons, but it's not so obvious to
> me how to implement that efficiently right now.  Could we at least
> document that as being 'undefined behaviour'?
>
> Thanks,
>
> Karl
>
 
> Index: src/main/builtin.c
> ===
> --- src/main/builtin.c (revision 66783)
> +++ src/main/builtin.c (working copy)
> @@ -356,6 +356,24 @@
>      return( ENCLOS(arg) );
>  }
>
> +static Rboolean R_IsImportsEnv(SEXP env)
> +{
> +    if (isNull(env) || !isEnvironment(env))
> +        return FALSE;
> +    if (ENCLOS(env) != R_BaseNamespace)
> +        return FALSE;
> +    SEXP name = getAttrib(env, R_NameSymbol);
> +    if (!isString(name) || length(name) != 1)
> +        return FALSE;
> +
> +    const char *imports_prefix = "imports:";
> +    const char *name_string = CHAR(STRING_ELT(name, 0));
> +    if (!strncmp(name_string, imports_prefix, strlen(imports_prefix)))
> +        return TRUE;
> +    else
> +        return FALSE;
> +}
> +
>  SEXP attribute_hidden do_parentenvgets(SEXP call, SEXP op, SEXP args, SEXP rho)
>  {
>      SEXP env, parent;
> @@ -371,6 +389,10 @@
>          error(_("argument is not an environment"));
>      if( env == R_EmptyEnv )
>          error(_("can not set parent of the empty environment"));
> +    if (R_EnvironmentIsLocked(env) && R_IsNamespaceEnv(env))
> +        error(_("can not set the parent environment of a namespace"));
> +    if (R_EnvironmentIsLocked(env) && R_IsImportsEnv(env))
> +        error(_("can not set the parent environment of package imports"));
>      parent = CADR(args);
>      if (isNull(parent)) {
>          error(_("use of NULL environment is defunct"));
>
 
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel