Hi Antoine,

The conversion of previous blocks is part of the fallback mechanism I'm trying to describe: when type inference fails (even in a different block), conversion of all blocks of the column is attempted with the next type in the fallback graph.
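To make the mechanism concrete, here is a toy sketch in Python. The graph, type names, and function names (`FALLBACK`, `try_parse`, `infer_column`) are illustrative only, not the Arrow CSV reader's actual API, and for brevity the graph is reduced to a linear subset: NULL -> INT64 -> DOUBLE -> STRING.

```python
# Hypothetical sketch of fallback-based type inference over blocks.
# Not Arrow code; names and the reduced graph are assumptions.
FALLBACK = {"NULL": "INT64", "INT64": "DOUBLE", "DOUBLE": "STRING"}

def try_parse(block, typ):
    """Convert one block (a list of raw cell strings) to `typ`,
    raising ValueError if any cell does not fit that type."""
    out = []
    for cell in block:
        if cell == "":
            out.append(None)
        elif typ == "INT64":
            out.append(int(cell))
        elif typ == "DOUBLE":
            out.append(float(cell))
        elif typ == "STRING":
            out.append(cell)
        else:  # NULL admits only empty cells
            raise ValueError(f"{cell!r} is not {typ}")
    return out

def infer_column(blocks):
    """Parse blocks in order; when a block fails under the current
    type, loosen the type and re-parse *all* previous blocks."""
    typ, parsed = "NULL", []
    for i, block in enumerate(blocks):
        while True:
            try:
                parsed.append(try_parse(block, typ))
                break
            except ValueError:
                if typ not in FALLBACK:
                    raise
                typ = FALLBACK[typ]
                # The expensive step you pointed out: earlier chunks
                # must be converted again under the loosened type.
                parsed = [try_parse(b, typ) for b in blocks[:i]]
    return typ, parsed
```

For example, `infer_column([["1", "2"], ["3.5"], ["x", ""]])` walks the column through INT64 and DOUBLE before settling on STRING, re-parsing the earlier blocks at each step.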
If there is no problem with the fallback graph model, the API would probably look like a reusable LoosenType-something which simplifies querying for the loosened type when inference fails.

Unrelated: I forgot to include some edges in the JSON graph:

NULL -> BOOL
NULL -> INT64 -> DOUBLE
NULL -> TIMESTAMP -> STRING -> BINARY
NULL -> STRUCT
NULL -> LIST

On Fri, Nov 30, 2018, 04:52 Antoine Pitrou <anto...@python.org> wrote:

>
> Hi Ben,
>
> On 30/11/2018 at 02:19, Ben Kietzman wrote:
> > Currently, to figure out which types may be inferred and under which
> > circumstances they will be inferred involves digging through code. I
> > think it would be useful to have an API for expressing type inference
> > rules. Ideally this would be provided as utility functions alongside
> > StringConverter and used by anything which does type inference while
> > parsing/unboxing.
>
> It may be a bit more complicated. For example, a CSV file is parsed by
> blocks, and each block produces an array chunk. But when the type of a
> later block changes due to type inference failing on the current type,
> all previous blocks must be parsed again.
>
> So I'm curious what you would make the API look like.
>
> > By contrast, when reading JSON (which is explicit about numbers vs
> > strings), the graph would be:
> >
> > NULL -> BOOL
> > NULL -> INT64 -> DOUBLE
> > NULL -> TIMESTAMP -> STRING -> BINARY
> >
> > Seem reasonable?
>
> Is there a case which isn't covered by a fallback graph as above?
>
> I have no idea. Someone else may be able to answer your question.
>
> Regards
>
> Antoine.