Hi Antoine,

The conversion of previous blocks is part of the fallback mechanism I'm trying to describe: when type inference fails (even in a different block), conversion of all blocks of the column is attempted with the next type in the fallback graph.
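To make the mechanism concrete, here is a toy sketch in Python. The graph, type names, and function names (`FALLBACK`, `try_parse`, `infer_column`) are illustrative only, not the Arrow CSV reader's actual API, and for brevity the graph is reduced to a linear subset: NULL -> INT64 -> DOUBLE -> STRING.

```python
# Hypothetical sketch of fallback-based type inference over blocks.
# Not Arrow code; names and the reduced graph are assumptions.
FALLBACK = {"NULL": "INT64", "INT64": "DOUBLE", "DOUBLE": "STRING"}

def try_parse(block, typ):
    """Convert one block (a list of raw cell strings) to `typ`,
    raising ValueError if any cell does not fit that type."""
    out = []
    for cell in block:
        if cell == "":
            out.append(None)
        elif typ == "INT64":
            out.append(int(cell))
        elif typ == "DOUBLE":
            out.append(float(cell))
        elif typ == "STRING":
            out.append(cell)
        else:  # NULL admits only empty cells
            raise ValueError(f"{cell!r} is not {typ}")
    return out

def infer_column(blocks):
    """Parse blocks in order; when a block fails under the current
    type, loosen the type and re-parse *all* previous blocks."""
    typ, parsed = "NULL", []
    for i, block in enumerate(blocks):
        while True:
            try:
                parsed.append(try_parse(block, typ))
                break
            except ValueError:
                if typ not in FALLBACK:
                    raise
                typ = FALLBACK[typ]
                # The expensive step you pointed out: earlier chunks
                # must be converted again under the loosened type.
                parsed = [try_parse(b, typ) for b in blocks[:i]]
    return typ, parsed
```

For example, `infer_column([["1", "2"], ["3.5"], ["x", ""]])` walks the column through INT64 and DOUBLE before settling on STRING, re-parsing the earlier blocks at each step.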
If there is no problem with the fallback graph model, the API would probably look like a reusable LoosenType-something which simplifies querying for the loosened type when inference fails.

Unrelated: I forgot to include some edges in the JSON graph:

NULL -> BOOL
NULL -> INT64 -> DOUBLE
NULL -> TIMESTAMP -> STRING -> BINARY
NULL -> STRUCT
NULL -> LIST

On Fri, Nov 30, 2018, 04:52 Antoine Pitrou <anto...@python.org> wrote:

>
> Hi Ben,
>
> On 30/11/2018 at 02:19, Ben Kietzman wrote:
> > Currently, to figure out which types may be inferred and under which
> > circumstances they will be inferred involves digging through code. I
> > think it would be useful to have an API for expressing type inference
> > rules. Ideally this would be provided as utility functions alongside
> > StringConverter and used by anything which does type inference while
> > parsing/unboxing.
>
> It may be a bit more complicated. For example, a CSV file is parsed by
> blocks, and each block produces an array chunk. But when the type of a
> later block changes due to type inference failing on the current type,
> all previous blocks must be parsed again.
>
> So I'm curious what you would make the API look like.
>
> > By contrast, when reading JSON (which is explicit about numbers vs
> > strings), the graph would be:
> >
> > NULL -> BOOL
> > NULL -> INT64 -> DOUBLE
> > NULL -> TIMESTAMP -> STRING -> BINARY
> >
> > Seem reasonable?
>
> Is there a case which isn't covered by a fallback graph as above?
>
> I have no idea. Someone else may be able to answer your question.
>
> Regards
>
> Antoine.