jonkeane commented on code in PR #45951: URL: https://github.com/apache/arrow/pull/45951#discussion_r2029875356
##########
r/src/altrep.cpp:
##########
@@ -1267,23 +1293,23 @@ sexp test_arrow_altrep_copy_by_dataptr(sexp x) {
   if (TYPEOF(x) == INTSXP) {
     cpp11::writable::integers out(Rf_xlength(x));
-    int* ptr = reinterpret_cast<int*>(DATAPTR(x));
+    int* ptr = reinterpret_cast<int*>(INTEGER(x));
     for (R_xlen_t i = 0; i < n; i++) {
       out[i] = ptr[i];
     }
     return out;
   } else if (TYPEOF(x) == REALSXP) {
     cpp11::writable::doubles out(Rf_xlength(x));
-    double* ptr = reinterpret_cast<double*>(DATAPTR(x));
+    double* ptr = reinterpret_cast<double*>(REAL(x));
     for (R_xlen_t i = 0; i < n; i++) {
       out[i] = ptr[i];
     }
     return out;
   } else if (TYPEOF(x) == STRSXP) {
     cpp11::writable::strings out(Rf_xlength(x));
-    SEXP* ptr = reinterpret_cast<SEXP*>(DATAPTR(x));
     for (R_xlen_t i = 0; i < n; i++) {
-      out[i] = ptr[i];
+      SEXP str_elt = reinterpret_cast<SEXP>(STRING_ELT(x, i));
+      out[i] = str_elt;

Review Comment:
Ah, got it. Yeah, if it's not helping us, I agree removing it is the way to go. I tried the reprexes with `unique()` on my branch here as well as on released arrow, and the results are comparable, so I think we're ok. I'm curious though: were there other tests we had that would have caught the multiple materialization issue? Or were these the ones that did that?
Released arrow:
```
> library(arrow)
> df1 <- tibble::tibble(x=as.character(floor(runif(1000000) * 20)))
> write_parquet(df1,"./test.parquet")
> df2 <- read_parquet("./test.parquet")
> system.time(unique(df1$x))
   user  system elapsed
  0.014   0.000   0.015
> system.time(unique(df2$x))
   user  system elapsed
  0.113   0.001   0.114
```

This branch:
```
> library(arrow)
> df1 <- tibble::tibble(x=as.character(floor(runif(1000000) * 20)))
> write_parquet(df1,"./test.parquet")
> df2 <- read_parquet("./test.parquet")
> system.time(unique(df1$x))
   user  system elapsed
  0.014   0.001   0.014
> system.time(unique(df2$x))
   user  system elapsed
  0.106   0.001   0.107
```
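
On the question of which tests would catch repeated materialization: one possible shape for such a check is to count how often the full buffer gets built. The sketch below is a toy model only. `LazyStrings`, `elt()`, `data()`, `materialize_count()`, `copy_by_elt()` and `copy_by_dataptr()` are hypothetical names invented for illustration; none of them is the R API, cpp11, or arrow's ALTREP code. It assumes the buffer is cached after the first build and does not reproduce the actual issue discussed in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a lazily materialized string vector.
// It only models "per-element access" versus "whole-buffer pointer access".
class LazyStrings {
 public:
  explicit LazyStrings(std::vector<std::string> source)
      : source_(std::move(source)) {}

  std::size_t size() const { return source_.size(); }

  // Similar in spirit to STRING_ELT(x, i): returns one element and never
  // builds the contiguous buffer.
  const std::string& elt(std::size_t i) const { return source_[i]; }

  // Similar in spirit to DATAPTR(x): needs a contiguous buffer, so it builds
  // one on first use and reuses the cached copy afterwards (an assumption of
  // this sketch).
  const std::string* data() {
    if (!materialized_) {
      buffer_ = source_;  // the full copy stands in for materialization
      materialized_ = true;
      ++materialize_count_;
    }
    return buffer_.data();
  }

  // Number of times the full buffer was built; a test can assert on this.
  int materialize_count() const { return materialize_count_; }

 private:
  std::vector<std::string> source_;
  std::vector<std::string> buffer_;
  bool materialized_ = false;
  int materialize_count_ = 0;
};

// Copy using per-element access only.
std::vector<std::string> copy_by_elt(const LazyStrings& x) {
  std::vector<std::string> out;
  out.reserve(x.size());
  for (std::size_t i = 0; i < x.size(); i++) {
    out.push_back(x.elt(i));
  }
  return out;
}

// Copy through the whole-buffer pointer, which forces materialization.
std::vector<std::string> copy_by_dataptr(LazyStrings& x) {
  const std::string* ptr = x.data();
  return std::vector<std::string>(ptr, ptr + x.size());
}
```

In this model, a test would call `copy_by_dataptr()` more than once and assert that `materialize_count()` stays at 1, while `copy_by_elt()` should leave it at 0.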