Re: [C++] Validate UTF-8 in ValidateFull?

Wes McKinney Thu, 19 Dec 2019 11:21:12 -0800

On Thu, Dec 19, 2019 at 1:11 PM Antoine Pitrou <anto...@python.org> wrote:
>
>
> > Would the Arrow team welcome a pull request that enhances ValidateFull()
> > to validate that utf8-column values are well-formed UTF-8 byte sequences?
>
> We already have a UTF-8 validation function, but it's not hooked into
> ValidateFull().  So, yes, that seems desirable to me.  Can you open a
> JIRA and perhaps a PR?
>
> > Another validation we've added to Workbench is in column *names*. In
> > Arrow's IPC layer, `FieldFromFlatbuffer()` validates that column names are
> > not null. But it doesn't validate that column names are well-formed UTF-8.
> > The Flatbuffers spec says strings should be valid UTF-8. Should
> > `FieldFromFlatbuffer()` check?


Seems like this should happen in `Schema::Validate` or similar. I
don't think erroring on IPC reconstruction would be desirable.

> I have no idea.  I'll let others comment.
>
> Regards
>
> Antoine.
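
For readers following the thread, here is a rough illustration of what a "well-formed UTF-8" check involves. It is a minimal standalone C++ sketch, not Arrow's existing validation function (the thread only says that one exists), and `IsWellFormedUtf8` is a hypothetical name chosen for this example. It rejects stray or missing continuation bytes, overlong encodings, surrogate code points, and values above U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of Arrow's API: returns true when
// [data, data + len) is a well-formed UTF-8 byte sequence.
bool IsWellFormedUtf8(const uint8_t* data, size_t len) {
  size_t i = 0;
  while (i < len) {
    const uint8_t b0 = data[i];
    if (b0 < 0x80) {  // ASCII: single byte
      ++i;
      continue;
    }
    size_t n = 0;         // number of continuation bytes expected
    uint32_t min_cp = 0;  // smallest code point valid for this length
    uint32_t cp = 0;      // decoded code point
    if ((b0 & 0xE0) == 0xC0) {
      n = 1; min_cp = 0x80; cp = b0 & 0x1F;
    } else if ((b0 & 0xF0) == 0xE0) {
      n = 2; min_cp = 0x800; cp = b0 & 0x0F;
    } else if ((b0 & 0xF8) == 0xF0) {
      n = 3; min_cp = 0x10000; cp = b0 & 0x07;
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }
    if (len - i <= n) return false;  // sequence truncated at end of input
    for (size_t k = 1; k <= n; ++k) {
      const uint8_t b = data[i + k];
      if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min_cp) return false;                   // overlong encoding
    if (cp > 0x10FFFF) return false;                 // beyond Unicode range
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
    i += n + 1;
  }
  return true;
}

// Convenience overload for std::string inputs such as column names.
bool IsWellFormedUtf8(const std::string& s) {
  return IsWellFormedUtf8(reinterpret_cast<const uint8_t*>(s.data()), s.size());
}
```

A check along these lines could in principle be applied both to utf8-column values and to field names, wherever the project decides validation belongs.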
