jonkeane commented on code in PR #45951:
URL: https://github.com/apache/arrow/pull/45951#discussion_r2029875356


##########
r/src/altrep.cpp:
##########
@@ -1267,23 +1293,23 @@ sexp test_arrow_altrep_copy_by_dataptr(sexp x) {
 
   if (TYPEOF(x) == INTSXP) {
     cpp11::writable::integers out(Rf_xlength(x));
-    int* ptr = reinterpret_cast<int*>(DATAPTR(x));
+    int* ptr = reinterpret_cast<int*>(INTEGER(x));
     for (R_xlen_t i = 0; i < n; i++) {
       out[i] = ptr[i];
     }
     return out;
   } else if (TYPEOF(x) == REALSXP) {
     cpp11::writable::doubles out(Rf_xlength(x));
-    double* ptr = reinterpret_cast<double*>(DATAPTR(x));
+    double* ptr = reinterpret_cast<double*>(REAL(x));
     for (R_xlen_t i = 0; i < n; i++) {
       out[i] = ptr[i];
     }
     return out;
   } else if (TYPEOF(x) == STRSXP) {
     cpp11::writable::strings out(Rf_xlength(x));
-    SEXP* ptr = reinterpret_cast<SEXP*>(DATAPTR(x));
     for (R_xlen_t i = 0; i < n; i++) {
-      out[i] = ptr[i];
+      SEXP str_elt = reinterpret_cast<SEXP>(STRING_ELT(x, i));
+      out[i] = str_elt;

Review Comment:
   Ah, got it. Yeah, if it's not helping us, I agree removing it is the way to
   go. I tried the reprexes with `unique()` on my branch here as well as on
   released arrow, and the results are comparable, so I think we're OK.
   
   I'm curious, though: were there other tests we had that would have caught
   the multiple-materialization issue, or were these the ones that did?
   
   Released arrow:
   ```
   > library(arrow)
   > df1 <- tibble::tibble(x=as.character(floor(runif(1000000) * 20)))
   > write_parquet(df1,"./test.parquet")
   > df2 <- read_parquet("./test.parquet")
   > system.time(unique(df1$x))
      user  system elapsed 
     0.014   0.000   0.015 
   > system.time(unique(df2$x))
      user  system elapsed 
     0.113   0.001   0.114 
   ```
   
   This branch:
   ```
   > library(arrow)
   > df1 <- tibble::tibble(x=as.character(floor(runif(1000000) * 20)))
   > write_parquet(df1,"./test.parquet")
   > df2 <- read_parquet("./test.parquet")
   > system.time(unique(df1$x))
      user  system elapsed 
     0.014   0.001   0.014 
   > system.time(unique(df2$x))
      user  system elapsed 
     0.106   0.001   0.107 
   ```
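   One way to make the multiple-materialization question concrete is a test that counts how often the backing data gets built. What follows is a generic, self-contained C++ sketch, not arrow's ALTREP code: `LazyVector`, `materialize_count()` and `materializes_once()` are hypothetical names, and `data()` only loosely mirrors what `DATAPTR()` does for an ALTREP vector (materialize on first access, then reuse the cached buffer).

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <functional>
   #include <optional>
   #include <utility>
   #include <vector>

   // Hypothetical stand-in for an ALTREP-backed vector (not arrow's class):
   // the data is produced lazily by `materialize_fn` on first access and
   // then cached, so later accesses reuse the same buffer.
   class LazyVector {
    public:
     explicit LazyVector(std::function<std::vector<double>()> materialize_fn)
         : materialize_fn_(std::move(materialize_fn)) {}

     // Loosely analogous to DATAPTR(): returns a pointer to contiguous data,
     // materializing only if no cached copy exists yet.
     const double* data() {
       if (!cache_) {
         cache_ = materialize_fn_();
         ++materialize_count_;
       }
       return cache_->data();
     }

     // How many times the backing data has been built; a regression test
     // can assert this stays at 1 no matter how often data() is called.
     std::size_t materialize_count() const { return materialize_count_; }

    private:
     std::function<std::vector<double>()> materialize_fn_;
     std::optional<std::vector<double>> cache_;  // requires C++17
     std::size_t materialize_count_ = 0;
   };

   // Hypothetical regression check: access the data `passes` times and
   // report whether the vector was materialized exactly once.
   bool materializes_once(LazyVector& v, int passes) {
     for (int pass = 0; pass < passes; ++pass) {
       const double* p = v.data();
       (void)p;  // only the access matters here, not the values
     }
     return v.materialize_count() == 1;
   }
   ```

   In a real test the counter would have to sit in the ALTREP class's materialization path, and `unique()` or similar R-level calls would stand in for the repeated `data()` accesses.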



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
