nealrichardson commented on code in PR #43351:
URL: https://github.com/apache/arrow/pull/43351#discussion_r1689716982


##########
r/src/arrow_cpp11.h:
##########
@@ -138,7 +138,12 @@ inline R_xlen_t r_string_size(SEXP s) {
 }  // namespace unsafe
 
 inline SEXP utf8_strings(SEXP x) {
-  return cpp11::unwind_protect([x] {
+  return cpp11::unwind_protect([&] {
+    // ensure that x is not actually altrep first
+    bool was_altrep = ALTREP(x);
+    if (was_altrep) {
+      x = PROTECT(Rf_duplicate(x));

Review Comment:
   Add a comment about why we have to duplicate?



##########
r/src/arrow_cpp11.h:
##########
@@ -152,6 +157,9 @@ inline SEXP utf8_strings(SEXP x) {
         SET_STRING_ELT(x, i, Rf_mkCharCE(Rf_translateCharUTF8(s), CE_UTF8));

Review Comment:
   Did we want to check whether `Rf_translateCharUTF8()` actually modified
   anything? Or do we trust that `SET_STRING_ELT` is a no-op in that case? I
   would imagine that in most cases we already have ASCII/UTF-8 strings, so
   this whole function should be basically free. That should be easy to verify
   with a microbenchmark.
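
   To make the "only write when something changed" idea concrete, here is a
   minimal standalone sketch. It deliberately does not use the R API: the
   names `is_ascii`, `latin1_to_utf8`, and `translate_changed_only` are
   hypothetical stand-ins, a `std::vector<std::string>` stands in for the
   character vector, and Latin-1 is assumed as the only non-UTF-8 input
   encoding. It shows the shape of the check, not what the PR does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an encoding check: true if every byte is 7-bit
// ASCII, in which case the string is already valid UTF-8.
inline bool is_ascii(const std::string& s) {
  for (unsigned char c : s) {
    if (c >= 0x80) return false;
  }
  return true;
}

// Hypothetical stand-in for the translation step. Assumes Latin-1 input,
// where each byte >= 0x80 maps to a two-byte UTF-8 sequence.
inline std::string latin1_to_utf8(const std::string& s) {
  std::string out;
  out.reserve(s.size());
  for (unsigned char c : s) {
    if (c < 0x80) {
      out.push_back(static_cast<char>(c));
    } else {
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
  }
  return out;
}

// Rewrites only the elements that need translation and returns the number
// of writes, so the all-ASCII case performs no writes at all.
inline std::size_t translate_changed_only(std::vector<std::string>& x) {
  std::size_t writes = 0;
  for (auto& s : x) {
    if (is_ascii(s)) continue;  // fast path: nothing to translate
    s = latin1_to_utf8(s);
    ++writes;
  }
  return writes;
}
```

   With an input that is entirely ASCII, `translate_changed_only` returns 0,
   which is the case a microbenchmark would be expected to show as nearly
   free.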



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
