jonkeane commented on code in PR #45951:
URL: https://github.com/apache/arrow/pull/45951#discussion_r2029875356
##########
r/src/altrep.cpp:
##########
@@ -1267,23 +1293,23 @@ sexp test_arrow_altrep_copy_by_dataptr(sexp x) {
if (TYPEOF(x) == INTSXP) {
cpp11::writable::integers out(Rf_xlength(x));
- int* ptr = reinterpret_cast<int*>(DATAPTR(x));
+ int* ptr = reinterpret_cast<int*>(INTEGER(x));
for (R_xlen_t i = 0; i < n; i++) {
out[i] = ptr[i];
}
return out;
} else if (TYPEOF(x) == REALSXP) {
cpp11::writable::doubles out(Rf_xlength(x));
- double* ptr = reinterpret_cast<double*>(DATAPTR(x));
+ double* ptr = reinterpret_cast<double*>(REAL(x));
for (R_xlen_t i = 0; i < n; i++) {
out[i] = ptr[i];
}
return out;
} else if (TYPEOF(x) == STRSXP) {
cpp11::writable::strings out(Rf_xlength(x));
- SEXP* ptr = reinterpret_cast<SEXP*>(DATAPTR(x));
for (R_xlen_t i = 0; i < n; i++) {
- out[i] = ptr[i];
+ SEXP str_elt = reinterpret_cast<SEXP>(STRING_ELT(x, i));
+ out[i] = str_elt;
Review Comment:
Ah, got it. Yeah, if it's not helping us, I agree that removing it is the way to
go. I tried the reprexes with `unique()` on my branch here as well as on
released arrow, and the timings are comparable, so I think we're OK.
I'm curious, though: did we have other tests that would have caught
the multiple-materialization issue, or were these the only ones that did?
Released arrow:
```
> library(arrow)
> df1 <- tibble::tibble(x=as.character(floor(runif(1000000) * 20)))
> write_parquet(df1,"./test.parquet")
> df2 <- read_parquet("./test.parquet")
> system.time(unique(df1$x))
   user  system elapsed
  0.014   0.000   0.015
> system.time(unique(df2$x))
   user  system elapsed
  0.113   0.001   0.114
```
This branch:
```
> library(arrow)
> df1 <- tibble::tibble(x=as.character(floor(runif(1000000) * 20)))
> write_parquet(df1,"./test.parquet")
> df2 <- read_parquet("./test.parquet")
> system.time(unique(df1$x))
   user  system elapsed
  0.014   0.001   0.014
> system.time(unique(df2$x))
   user  system elapsed
  0.106   0.001   0.107
```