On Thu, Dec 11, 2014 at 12:24 PM, Jeroen Ooms <jeroen.o...@stat.ucla.edu> wrote:
> I'm interfacing a c++ library which assumes strings are UTF-8. However
> strings from R can have various encodings. It's not clear to me how I
> need to account for that in Rcpp.

Follow-up on this: from what I have found, there is currently no string type that is unambiguous across platforms and locales (other than the actual STRSXP). If the native locale uses UTF-8 then all is fine, but we cannot assume that in R. Here is a little script that illustrates the various combinations I tried and the results on Windows: https://gist.github.com/jeroenooms/9edf97f873f17a4ce5d3

Assuming that each of these cases is intended behavior, perhaps we could introduce an additional string type, e.g. Rcpp::UTF8String. The mapping from STRSXP to Rcpp::UTF8String would use translateCharUTF8(STRING_ELT(x, 0)), and the mapping from Rcpp::UTF8String back to STRSXP would use SET_STRING_ELT(out, 0, mkCharCE(olds, CE_UTF8)). That would allow defining C++ functions that operate on UTF-8 strings and work as expected across platforms and locales.
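
To make the round trip concrete, here is a rough standalone sketch of the idea. It deliberately does not use R or Rcpp headers, so it compiles on its own: TaggedString, Encoding and latin1_to_utf8 are hypothetical stand-ins for a STRSXP element, its encoding mark, and the work translateCharUTF8() would do. UTF8String here is only a model of the proposed type, not an existing Rcpp class, and the model assumes the non-UTF-8 input is Latin-1, which is not true of every native locale.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for R's per-string encoding mark.
enum class Encoding { Latin1, UTF8 };

// Hypothetical stand-in for one STRSXP element: raw bytes plus an encoding mark.
struct TaggedString {
    std::string bytes;
    Encoding enc;
};

// Re-encode Latin-1 bytes as UTF-8. Each Latin-1 byte equals its Unicode
// code point, so bytes >= 0x80 become a two-byte UTF-8 sequence.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Model of the proposed type: always holds UTF-8, whatever the input was marked as.
class UTF8String {
public:
    // Inbound: the role translateCharUTF8() would play in real Rcpp code.
    explicit UTF8String(const TaggedString& s)
        : data_(s.enc == Encoding::UTF8 ? s.bytes : latin1_to_utf8(s.bytes)) {}

    const std::string& str() const { return data_; }

    // Outbound: the role of mkCharCE(..., CE_UTF8), i.e. mark the result as UTF-8.
    TaggedString to_tagged() const { return TaggedString{data_, Encoding::UTF8}; }

private:
    std::string data_;
};
```

With this model, the Latin-1 bytes "caf\xE9" and the UTF-8 bytes "caf\xC3\xA9" both end up as the same UTF-8 std::string inside UTF8String, which is the property a C++ library expecting UTF-8 would rely on.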