On Thu, Dec 19, 2019 at 1:11 PM Antoine Pitrou <[email protected]> wrote:
>
> > Would the Arrow team welcome a pull request that enhances ValidateFull() to
> > validate that utf8-column values are well-formed UTF-8 byte sequences?
>
> We already have a UTF-8 validation function, but it's not hooked into
> ValidateFull(). So, yes, that seems desirable to me. Can you open a
> JIRA and perhaps a PR?
>
> > Another validation we've added to Workbench is in column *names*. In
> > Arrow's IPC layer, `FieldFromFlatbuffer()` validates that column names are
> > not null. But it doesn't validate that column names are well-formed UTF-8.
> > The Flatbuffers spec says strings should be valid UTF-8. Should
> > `FieldFromFlatbuffer()` check?

Seems like this should happen in `Schema::Validate` or similar. I don't
think erroring on IPC reconstruction would be desirable.

> I have no idea. I'll let others comment.
>
> Regards
>
> Antoine.
