gaborcsardi commented on code in PR #43351:
URL: https://github.com/apache/arrow/pull/43351#discussion_r1686735951


##########
r/src/arrow_cpp11.h:
##########
@@ -148,7 +148,7 @@ inline SEXP utf8_strings(SEXP x) {
 
     for (R_xlen_t i = 0; i < n; i++, ++p_x) {
       SEXP s = *p_x;
-      if (s != NA_STRING) {
+      if (s != NA_STRING && ALTREP(s)) {

Review Comment:
   Yeah, that C++ looks correct, except you don't need to translate twice:
   ```c++
   SEXP new_s = Rf_mkCharCE(Rf_translateCharUTF8(s);
   if (new_s != s) {
       SET_STRING_ELT(x, i, Rf_mkCharCE(new_s));
   }
   ```
   
   However, I think the issue is that if `x` is ALTREP, then it does not matter 
if it is materialized of not, it is still ALTREP, so you still cannot change it 
with `SET_STRING_ELT`. 
   
   To avoid this, you'd need to call `Rf_duplicate()` on `x`. That should 
create a non-altrep copy of it. 
   
   You can always call `Rf_duplicate()`, or you can call it when you encounter 
a non-utf8 string. Note that you need to `PROTECT()` the result of 
`Rf_duplicate()` 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to