[Rd] ALTREP wrappers and factors

2019-07-18 Thread Bemis, Kylie
Hello,

I’m experimenting with ALTREP and was wondering if there is a preferred way to 
create an ALTREP wrapper vector without using .Internal(wrap_meta(…)), which R 
CMD check doesn’t like since it uses an .Internal() function.

I was trying to create a factor that used an ALTREP integer, but attempting to 
set the class and levels attributes always ended up duplicating and 
materializing the integer vector. Using the wrapper avoided this issue.

Here is my initial ALTREP integer vector:

> fc0 <- factor(c("a", "a", "b"))
>
> y <- matter::as.matter(as.integer(fc0))
> y <- matter:::as.altrep(y)
>
> .Internal(inspect(y))
@7fb0ce78c0f0 13 INTSXP g0c0 [NAM(7)] matter vector (mode=3, len=3, mem=0)

Here is what I get without a wrapper:

> fc1 <- structure(y, class="factor", levels=levels(x))
> .Internal(inspect(fc1))
@7fb0cae66408 13 INTSXP g0c2 [OBJ,NAM(2),ATT] (len=3, tl=0) 1,1,2
ATTRIB:
  @7fb0ce771868 02 LISTSXP g0c0 []
TAG: @7fb0c80043d0 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "class" (has value)
@7fb0c9fcbe90 16 STRSXP g0c1 [NAM(7)] (len=1, tl=0)
  @7fb0c80841a0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "factor"
TAG: @7fb0c8004050 01 SYMSXP g1c0 [MARK,NAM(7),LCK,gp=0x4000] "levels" (has value)
@7fb0d1dd58c8 16 STRSXP g0c2 [MARK,NAM(7)] (len=2, tl=0)
  @7fb0c81bf4c0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "a"
  @7fb0c90ba728 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "b"

Here is what I get with a wrapper:

> fc2 <- structure(.Internal(wrap_meta(y, 0, 0)), class="factor", levels=levels(x))
> .Internal(inspect(fc2))
@7fb0ce764630 13 INTSXP g0c0 [OBJ,NAM(2),ATT]  wrapper [srt=0,no_na=0]
  @7fb0ce78c0f0 13 INTSXP g0c0 [NAM(7)] matter vector (mode=3, len=3, mem=0)
ATTRIB:
  @7fb0ce764668 02 LISTSXP g0c0 []
TAG: @7fb0c80043d0 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "class" (has value)
@7fb0c9fcb010 16 STRSXP g0c1 [NAM(7)] (len=1, tl=0)
  @7fb0c80841a0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "factor"
TAG: @7fb0c8004050 01 SYMSXP g1c0 [MARK,NAM(7),LCK,gp=0x4000] "levels" (has value)
@7fb0d1dd58c8 16 STRSXP g0c2 [MARK,NAM(7)] (len=2, tl=0)
  @7fb0c81bf4c0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "a"
  @7fb0c90ba728 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "b"

Is there a way to do this that doesn’t rely on .Internal() and won’t produce R 
CMD check warnings?

~~~
Kylie Ariel Bemis
Khoury College of Computer Sciences
Northeastern University
kuwisdelu.github.io

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Any plans for ALTREP lists (VECSXP)?

2019-07-23 Thread Bemis, Kylie
Hello,

I was wondering if there were any plans for ALTREP lists (VECSXP)?

It seems to me that they could be supported in a similar way to how ALTSTRING 
works, with Elt() and Set_elt() methods, or would there be some problems with 
that I’m not seeing due to lists not being atomic vectors?

I was taking an approach of converting each list element (of a file-based list 
data structure) to an ALTREP representation to build up an “ALTREP list”.

This seems fine for shorter lists with large elements, but I noticed that for 
longer lists with smaller elements, this could be far more time-consuming than 
simply reading the entire list into memory and returning a non-ALTREP list:

> x
<34840 length> matter_list :: out-of-memory list
(1.1 MB real | 543.3 MB virtual)

> system.time(y <- as.list(x))
   user  system elapsed
  1.116   2.175   5.053

> system.time(z <- as.altrep(x))
   user  system elapsed
 36.295   4.717  41.216

> .Internal(inspect(y))
@108255000 19 VECSXP g1c7 [MARK,NAM(7)] (len=34840, tl=0)
  @7f9044d9fc00 14 REALSXP g1c7 [MARK] (len=1129, tl=0) 404.093,404.096,404.099,404.102,404.105,...
  @7f9044d25e00 14 REALSXP g1c7 [MARK] (len=890, tl=0) 409.924,409.927,409.931,409.934,409.937,...
  @7f9044da6000 14 REALSXP g1c7 [MARK] (len=1878, tl=0) 400.3,400.303,400.306,400.309,400.312,...
  @7f9031a6b000 14 REALSXP g1c7 [MARK] (len=2266, tl=0) 402.179,402.182,402.185,402.188,402.191,...
  @7f9031a77a00 14 REALSXP g1c7 [MARK] (len=1981, tl=0) 403.021,403.024,403.027,403.03,403.033,...
  ...

> .Internal(inspect(z))
@10821 19 VECSXP g1c7 [MARK,NAM(7)] (len=34840, tl=0)
  @7f904eea7660 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=1129, mem=0)
  @7f9050347498 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=890, mem=0)
  @7f904d286b20 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=1878, mem=0)
  @7f904fd38820 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=2266, mem=0)
  @7f904c75ce90 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=1981, mem=0)
  ...

In this situation, it would be much faster and simpler for me to return a 
theoretical ALTREP list that serves SEXP elements on-demand, similar to how 
ALTSTRING seems to be implemented.
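
To make the idea concrete, here is a rough sketch in plain C of a list that
serves elements on demand. It is only a model of the behaviour I have in mind,
not R's ALTREP API: lazy_list, elt_loader, lazy_list_elt and demo_loader are
all hypothetical names, and the loader is a stand-in for reading one element
from the backing file using the shared metadata.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a list whose elements are produced on demand.
   None of these names belong to R's ALTREP API. */
typedef struct lazy_list lazy_list;
typedef double *(*elt_loader)(const lazy_list *x, size_t i, size_t *len);

struct lazy_list {
    size_t n;         /* number of list elements */
    elt_loader load;  /* plays the role of an Elt() method */
    void *metadata;   /* shared metadata, left as-is, not split per element */
    double **cache;   /* cache[i] stays NULL until element i is requested */
    size_t *lens;     /* lens[i] is valid once cache[i] is filled */
    size_t loads;     /* number of elements materialized so far */
};

static lazy_list *lazy_list_new(size_t n, elt_loader load, void *metadata)
{
    lazy_list *x = malloc(sizeof *x);
    if (!x) return NULL;
    x->n = n; x->load = load; x->metadata = metadata; x->loads = 0;
    x->cache = calloc(n ? n : 1, sizeof *x->cache);
    x->lens = calloc(n ? n : 1, sizeof *x->lens);
    if (!x->cache || !x->lens) {
        free(x->cache); free(x->lens); free(x);
        return NULL;
    }
    return x;
}

/* Serve element i, materializing it only on first access. */
static const double *lazy_list_elt(lazy_list *x, size_t i, size_t *len)
{
    assert(i < x->n);
    if (!x->cache[i]) {
        x->cache[i] = x->load(x, i, &x->lens[i]);
        x->loads++;
    }
    *len = x->lens[i];
    return x->cache[i];
}

static void lazy_list_free(lazy_list *x)
{
    if (!x) return;
    for (size_t i = 0; i < x->n; i++) free(x->cache[i]);
    free(x->cache); free(x->lens); free(x);
}

/* Stand-in loader: element i has length i + 1 and is filled with i.
   A real loader would read from the file described by x->metadata. */
static double *demo_loader(const lazy_list *x, size_t i, size_t *len)
{
    (void)x;
    double *v = malloc((i + 1) * sizeof *v);
    if (!v) { *len = 0; return NULL; }
    for (size_t k = 0; k <= i; k++) v[k] = (double)i;
    *len = i + 1;
    return v;
}
```

Creating the list does no per-element work; only the elements that are
actually requested are loaded, which is where I would expect the savings over
building 34840 separate ALTREP vectors up front.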

I don’t know how many other people would get a use out of ALTREP lists, but I 
certainly would.

Are there any plans for this?

Thanks!

~~~
Kylie Ariel Bemis
Khoury College of Computer Sciences
Northeastern University
kuwisdelu.github.io


Re: [Rd] Any plans for ALTREP lists (VECSXP)?

2019-08-16 Thread Bemis, Kylie
Thanks for the suggestions, everyone.

It is not a pressing issue requiring alternatives, since the ‘matter_list’ 
object already behaves like a list, and I am just looking for a way to present 
a native R list (VECSXP) when a regular list is required.

In this case (my typical use case), the ‘matter_list’ is homogeneous and I 
use it like a ragged array; however, in general each element could be a 
different atomic vector type (specifically raw, logical, integer, or double).

Here, as.altrep() is an S4 method for converting my custom ‘matter’-class 
out-of-memory objects into their native R representations using ALTREP.

Seems to work well for the ‘matter’ vectors, matrices, and arrays, where it 
just .Call()s my C function for making the corresponding ALTREP object, but the 
lists were giving me trouble because there I use lapply() to extract and 
uncompress the ‘matter_list’ metadata for each list element into a separate S4 
‘matter_vec’ out-of-memory vector, each of which is then used to create an 
ALTREP object for the corresponding list element. So it gets costly...

The cost is mostly in re-creating all of the metadata as regular R objects that 
end up occupying the R_altrep_data1() spot for all of the individual list 
elements. If I could make an ALTREP list, I could leave the metadata as-is and 
avoid all of that.

Anyway, not a pressing issue for me either, just something I noticed where 
having an ALTREP list could be useful, so I was wondering if it was in the 
plans, which Luke answered.

Thanks,

-Kylie

On Jul 23, 2019, at 8:27 PM, Gabriel Becker <gabembec...@gmail.com> wrote:

Hi Kylie,

Is it a list with only numerics in it? (I only see REALSXPs there, but 
obviously inspect isn't showing all of them). If so, you could load it up into 
one big vector and then also keep partitioning information around. Bioconductor 
does this (see ?IRanges::CompressedList ). The potential benefit here being 
that the underlying large vector could then be a big out-of-memory altrep. How 
helpful this would be depends somewhat on what you want to do with it, of 
course, but it is something that comes to mind.
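
A rough sketch of that layout in plain C, in case it helps: one flat buffer
plus partition end points, with each list element served as a view. This only
models the idea; it is not the IRanges::CompressedList implementation, and
ragged_list, ragged_view and ragged_elt are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Ragged list of doubles stored as one flat buffer plus partition ends.
   Hypothetical names; this is a model, not IRanges or R API code. */
typedef struct {
    const double *flat;  /* all list elements concatenated */
    const size_t *ends;  /* ends[i] = one past the last flat index of element i */
    size_t n;            /* number of list elements */
} ragged_list;

/* A view onto one list element: a pointer and a length, no copying. */
typedef struct {
    const double *data;
    size_t len;
} ragged_view;

static ragged_view ragged_elt(const ragged_list *x, size_t i)
{
    assert(i < x->n);
    /* Element i starts where element i - 1 ended. */
    size_t start = (i == 0) ? 0 : x->ends[i - 1];
    ragged_view v = { x->flat + start, x->ends[i] - start };
    return v;
}
```

The only per-element state is one entry in ends, so the flat buffer itself
could be the single big out-of-memory vector.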

Also, I would expect some overhead but that seems like a lot (without having 
done super much in the way of benchmarking). What exactly is as.altrep doing?

Best,
~G

On Tue, Jul 23, 2019 at 9:54 AM Michael Lawrence via R-devel 
<r-devel@r-project.org> wrote:
Hi Kylie,

As an alternative in the short term, you could consider deriving from
the List class in S4Vectors, implementing the getListElement() method to
lazily create the objects.

Michael


Re: [Rd] ALTREP wrappers and factors

2019-08-16 Thread Bemis, Kylie
Using R_tryWrap() at the C-level works perfectly and does what I need. Thanks, 
Gabe!

Yes, my reference count is maxed (I assume) because I am using 
MARK_NOT_MUTABLE().

Which makes me think I may want to return a wrapped matter/ALTREP object by 
default, so the user can set the names() and dim(), etc., without triggering a 
potentially-costly duplication. The data payload is intended to be immutable, 
but the attributes aren’t.

Decoupling the attributes and other metadata from the data payload seems like a 
good thing to have generally.
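
As a toy illustration of what I mean by decoupling (plain C, not the R API and
not how R's wrapper class is actually implemented), a wrapper can hold a
pointer to the shared immutable payload and own only its attributes. The names
payload, wrapper, wrap, set_factor_attrs and level_at are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Immutable data payload; may be shared by any number of wrappers. */
typedef struct {
    const int *data;
    size_t len;
} payload;

/* Wrapper: points at the payload and owns only its own attributes. */
typedef struct {
    const payload *inner;
    const char *klass;          /* NULL when no class is set */
    const char *const *levels;  /* NULL when no levels are set */
    size_t nlevels;
} wrapper;

static wrapper wrap(const payload *p)
{
    wrapper w = { p, NULL, NULL, 0 };
    return w;
}

/* Setting attributes touches the wrapper only; the payload is not copied. */
static void set_factor_attrs(wrapper *w, const char *const *levels,
                             size_t nlevels)
{
    w->klass = "factor";
    w->levels = levels;
    w->nlevels = nlevels;
}

/* Look up the label of element i through the wrapper (codes are 1-based). */
static const char *level_at(const wrapper *w, size_t i)
{
    assert(i < w->inner->len);
    int code = w->inner->data[i];
    assert(code >= 1 && (size_t)code <= w->nlevels);
    return w->levels[code - 1];
}
```

Two wrappers around the same payload can then carry different attributes
while the data itself is never duplicated or modified.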

Are there any potential drawbacks of using R_tryWrap() that I should know 
about, besides an additional method dispatch happening somewhere?

Thanks again!

~~~
Kylie Ariel Bemis
Khoury College of Computer Sciences
Northeastern University
kuwisdelu.github.io<https://kuwisdelu.github.io>

On Jul 19, 2019, at 4:00 AM, Gabriel Becker <gabembec...@gmail.com> wrote:

Hi Jiefei and Kylie,

Great to see people engaging with the ALTREP framework and identifying places 
we may need more tooling. Comments inline.

On Thu, Jul 18, 2019 at 12:22 PM King Jiefei <szwj...@gmail.com> wrote:

If that is the case and you are 100% sure the reference number should be 1
for your variable *y*, my solution is to call *SET_NAMED* in C++ to reset
the reference number. Note that you need to unbind your local variable
before you reset the number. To return an unbound SEXP, the C++ function
should be placed at the end of your *matter:::as.altrep* function. I don't
know if there is any simpler way to do that and I'll be happy to see any
opinion.

So as far as I know, manually setting the NAMED value on any SEXP the garbage 
collector is aware of is a direct violation of C-API contract and not something 
that package code should ever be doing.

It's not at all clear to me that you can ever be 100% sure that the reference 
number should be 1 when it is not currently 1 for an R object that exists at 
the R level (as opposed to only in pure C code). Sure, maybe the object is 
created within the body of your R function instead of being passed in, but what 
if someone is debugging your function and assigns the value to the global 
environment using <<- for later inspection? Now you have an invalidly low 
NAMED value, i.e., you have a segfault coming. I know of no way for you to 
prevent this or even know it has happened.
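
To illustrate the hazard with a toy model (plain C, not R's internals; vec,
vec_bind and vec_set are hypothetical names): copy-on-modify is only safe when
the count is honest. If the count is forced down to 1 while a second binding
still exists, the in-place write is silently visible through that binding.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy reference-counted integer vector; a model, not R's SEXP. */
typedef struct {
    int refs;    /* how many bindings point at this vector */
    size_t len;
    int *data;
} vec;

static vec *vec_new(const int *src, size_t len)
{
    vec *v = malloc(sizeof *v);
    if (!v) return NULL;
    v->data = malloc((len ? len : 1) * sizeof *v->data);
    if (!v->data) { free(v); return NULL; }
    memcpy(v->data, src, len * sizeof *v->data);
    v->refs = 1;
    v->len = len;
    return v;
}

/* A second binding to the same vector, e.g. an assignment made while
   debugging: the count goes up, nothing is copied. */
static vec *vec_bind(vec *v)
{
    v->refs++;
    return v;
}

/* Copy-on-modify: write in place only if the count says nobody else can
   see v; otherwise modify a private copy and return that instead. */
static vec *vec_set(vec *v, size_t i, int value)
{
    assert(i < v->len);
    if (v->refs > 1) {
        vec *copy = vec_new(v->data, v->len);
        if (!copy) return NULL;
        v->refs--;
        v = copy;
    }
    v->data[i] = value;
    return v;
}
```

With an honest count, vec_set() on a shared vector returns a copy and the
other binding is untouched. After overwriting refs with 1 by hand, the same
call writes through to every binding, and nothing in the code can detect that
the count was wrong.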



On Thu, Jul 18, 2019 at 3:28 AM Bemis, Kylie <k.be...@northeastern.edu>
wrote:

> Hello,
>
> I’m experimenting with ALTREP and was wondering if there is a preferred
> way to create an ALTREP wrapper vector without using
> .Internal(wrap_meta(…)), which R CMD check doesn’t like since it uses an
> .Internal() function.

So there is the .doSortWrap function in base (and its currently inexplicably 
identical clone .doWrap), which is an R-level function that calls down to 
.Internal(wrap_meta(...)). You can use it, but it doesn't look general 
enough for what I think you need (it was written for things that have just 
been sorted, thus the name). Specifically, it's not able to indicate that 
things are of unknown sortedness as currently written. If matter vectors are 
guaranteed to be sorted for some reason, though, you can use this. I'll talk to 
Luke about whether we want to generalize this; it would be easy to have it 
support the full space of metadata for wrappers and be a general-purpose 
wrapper-maker, but that isn't what it is right now.

At the C level, it looks like we do make R_tryWrap available (it appears in 
Rinternals.h, and not within a USE_RINTERNALS section), so you can call that 
from your own C(++) code. This creates a wrapper that has no metadata on it (or 
rather it has metadata, but the metadata indicates that no special info is 
known about the vector).


Re: [Rd] edit() doubles backslashes when keep.source=TRUE

2020-05-15 Thread Bemis, Kylie
Nightly binary builds of R-devel for macOS are available: 
http://mac.r-project.org

~~~
Kylie Ariel Bemis (she/her)
Khoury College of Computer Sciences
Northeastern University
kuwisdelu.github.io

On May 15, 2020, at 12:48 PM, brodie gaslam via R-devel 
<r-devel@r-project.org> wrote:


On Friday, May 15, 2020, 12:13:04 PM EDT, Dirk Eddelbuettel 
<e...@debian.org> wrote:
On 15 May 2020 at 15:41, Martin Maechler wrote:
| 
|
|Why does nobody anymore  help R development by working with
|"R-devel", or at least then the alpha, beta and the "RC"
|(Release Candidate) versions that we release daily for about one
|month before the final release?
|
|Notably a highly staffed enterprise such as Rstudio (viz the bug
|report 17800 above), but also others could really help by
|starting to use the "next version" of R on a routine basis ...
|
| 

Seconded. Without testing we can never know. R Core does their part.

I provided weekly Debian binaries: one each for the two alpha releases, for
the beta release, and for the release candidate.  It is easy to use these, for
example in a Docker container.

It is also easy to use this on a normal machine as they are standard (Debian)
packages: install, try some tests, uninstall, revert to previous version by
installing that.

Dirk

This is a very reasonable request, and all useRs who benefit from the
tireless work of R-core should consider doing it.  I have considered
it, but compiling R from sources on OS X has been my stumbling block.
At least last time I tried I got stuck at the Fortran step. It doesn't
help that I have very limited experience compiling software of the
complexity of R.  Really, I've only done it within the warm welcoming
confines of the vagrant image Tomas Kalibera set up for `rchk`.

I also use r-devel on docker, but that isn't very practical for
day-to-day usage, which is what I think we need.

What would it take to generate pre-release binaries for OS X (and Windows)?  I
imagine if such were available the volume of testers would increase
dramatically (at least, I haven't seen them if they exist).
Maybe something the R Consortium would consider funding?

Best,

B.
