jonkeane commented on code in PR #43351: URL: https://github.com/apache/arrow/pull/43351#discussion_r1686703091
########## r/src/arrow_cpp11.h: ########## @@ -148,7 +148,7 @@ inline SEXP utf8_strings(SEXP x) { for (R_xlen_t i = 0; i < n; i++, ++p_x) { SEXP s = *p_x; - if (s != NA_STRING) { + if (s != NA_STRING && ALTREP(s)) { Review Comment: > I also think that STRING_PTR_RO() should materialize an ALTREP character vector, otherwise you can't iterate over the CHARSXP pointers. Hmm, so maybe the actual problem here is that that isn't working on the ALTREP vectors coming from vroom? Here is the error we started seeing when we changed this condition from `s != NA_STRING && !IS_UTF8(s) && !IS_ASCII(s)` to `s != NA_STRING`: ``` Error in Table__from_dots(dots, schema, option_use_threads()): No Set_elt found for ALTSTRING class [class: vroom_chr, pkg: vroom] ``` > Also, ALTREP(s) does not seem to make sense to me, that's a CHARSXP (not a STRSXP), so it cannot be ALTREP, unless I am misremembering something. Yeah, I agree that `ALTREP(s)` here is not right. > Are you trying to catch the non-UTF-8 strings here? Rf_translateCharUTF8() does not do anything (returns the same const char *) if the string is UTF-8, so you could always call it, I will admit I'm not 100% certain what is going on here, but I catching non-UTF-8 strings here is what I thought it was doing. > and only call SET_STRING_ELT() if the returned pointer is different? Oh interesting. Forgive my C++ naïveness, but would this be something like (i.e. does `!=` check if the pointer is different>): ``` SEXP new_s = Rf_mkCharCE(Rf_translateCharUTF8(s); if (new_s != s) { SET_STRING_ELT(x, i, Rf_mkCharCE(Rf_translateCharUTF8(new_s), CE_UTF8)); } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org