romainfrancois edited a comment on pull request #8365:
URL: https://github.com/apache/arrow/pull/8365#issuecomment-717996007


   if what we want is that the nul is kept, based on @bkietz comment from 
https://github.com/apache/arrow/pull/8536 
`Converter_String::Ingest_some_nulls()` could end with: 
   
   ```cpp
       cpp11::unwind_protect([&] {
         if (array->null_count()) {
           // need to watch for nulls
           arrow::internal::BitmapReader null_reader(array->null_bitmap_data(),
                                                     array->offset(), n);
           for (int i = 0; i < n; i++, null_reader.Next()) {
             if (null_reader.IsSet()) {
               SET_STRING_ELT(data, start + i, 
unsafe_r_string(string_array->GetString(i)));
             } else {
               SET_STRING_ELT(data, start + i, NA_STRING);
             }
           }
   
         } else {
           for (int i = 0; i < n; i++) {
             SET_STRING_ELT(data, start + i, 
unsafe_r_string(string_array->GetString(i)));
           }
         }
       });
   ```
   
   with: 
   
   ```cpp
   private:
     SEXP unsafe_r_string(const std::string& s) const {
       return Rf_mkCharLenCE(s.c_str(), s.size(), CE_UTF8);
     }
   ```
   
   this builds on knowing that `GetString()` gives an utf8 string, and that we 
know its size, as opposed to what `cpp11::r_string(std::string)` does here: 
https://github.com/r-lib/cpp11/blob/master/inst/include/cpp11/r_string.hpp#L18 
and uses unwind protection around the loop instead of the individual 
conversions. 
   
   i.e. it assumes utf-8 but since it does not use the known size, it searches 
for `nul`: 
   
   ```
   r_string(const std::string& data) : data_(safe[Rf_mkCharCE](data.c_str(), 
CE_UTF8)) {}
   ```
   
   cc @jimhester, is this on purpose that this constructor uses `Rf_mkCharCE` 
instead of `Rf_mkCharLenCE` ? 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to