Currently, figuring out which types may be inferred, and under which circumstances, involves digging through code. I think it would be useful to have an API for expressing type inference rules. Ideally this would be provided as utility functions alongside StringConverter and used by anything which does type inference while parsing/unboxing. In addition to simplifying implementation, this would simplify documentation by providing a single inference mechanism that covers all of these cases.
For purposes of discussion, type inference rules can be expressed as a directed graph, with vertices representing types and edges indicating fallback on failed conversion. For example, in the case of Arrow's CSV reader, the graph is very simple:

    NULL -> INT64 -> DOUBLE -> TIMESTAMP -> STRING -> BINARY

This indicates that a column containing only values which can be converted to null (NULL, null, N/A, and a few other strings are currently recognized) will be an array of NullType. If the column contains values which can't be converted to null, then conversion to int64 is attempted. If that succeeds, the column is an array of Int64Type; otherwise conversion to double is attempted, and so on.

By contrast, when reading JSON (which is explicit about numbers vs strings), the graph would be:

    NULL -> BOOL
    NULL -> INT64 -> DOUBLE
    NULL -> TIMESTAMP -> STRING -> BINARY

Seem reasonable? Is there a case which isn't covered by a fallback graph as above?
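To make the fallback-graph idea concrete, here is a minimal sketch in Python. It is not Arrow's API: every name in it (`NULL_SPELLINGS`, `CONVERTERS`, `accepts`, `infer_type`, `CSV_GRAPH`, `JSON_GRAPH`) is hypothetical, the null spellings are an assumed subset, and Python's built-in parsers stand in for StringConverter, so they are more lenient than a real reader would be. The graph is a plain adjacency mapping, and inference returns the first type, in depth-first order, that accepts every value in the column.

```python
from datetime import datetime

# Assumed subset of the recognized null spellings; the real list is longer.
NULL_SPELLINGS = {b"", b"NULL", b"null", b"N/A"}


def to_null(raw: bytes) -> None:
    if raw not in NULL_SPELLINGS:
        raise ValueError("not a null spelling")


def to_bool(raw: bytes) -> bool:
    if raw not in (b"true", b"false"):
        raise ValueError("not a boolean")
    return raw == b"true"


def to_int64(raw: bytes) -> int:
    value = int(raw.decode("utf-8"))
    if not -2**63 <= value < 2**63:
        raise ValueError("out of int64 range")
    return value


# Each converter raises ValueError on failure (UnicodeDecodeError is a
# subclass of ValueError, so invalid UTF-8 also counts as a failure).
CONVERTERS = {
    "NULL": to_null,
    "BOOL": to_bool,
    "INT64": to_int64,
    "DOUBLE": lambda raw: float(raw.decode("utf-8")),
    "TIMESTAMP": lambda raw: datetime.fromisoformat(raw.decode("utf-8")),
    "STRING": lambda raw: raw.decode("utf-8"),
    "BINARY": lambda raw: raw,
}

# Vertices are types; each edge is a fallback to try after a failed conversion.
CSV_GRAPH = {
    "NULL": ["INT64"],
    "INT64": ["DOUBLE"],
    "DOUBLE": ["TIMESTAMP"],
    "TIMESTAMP": ["STRING"],
    "STRING": ["BINARY"],
    "BINARY": [],
}

JSON_GRAPH = {
    "NULL": ["BOOL", "INT64", "TIMESTAMP"],
    "BOOL": [],
    "INT64": ["DOUBLE"],
    "DOUBLE": [],
    "TIMESTAMP": ["STRING"],
    "STRING": ["BINARY"],
    "BINARY": [],
}


def accepts(type_name: str, raw: bytes) -> bool:
    """True if raw is a null spelling or converts to type_name."""
    if raw in NULL_SPELLINGS:
        return True
    try:
        CONVERTERS[type_name](raw)
        return True
    except ValueError:
        return False


def infer_type(values, graph, start="NULL"):
    """Return the first type, depth first from start, accepting all values."""
    stack, seen = [start], set()
    while stack:
        type_name = stack.pop()
        if type_name in seen:
            continue
        seen.add(type_name)
        if all(accepts(type_name, raw) for raw in values):
            return type_name
        # Reverse so that the first listed edge is tried first.
        stack.extend(reversed(graph[type_name]))
    raise ValueError("no type in the graph accepts every value")


print(infer_type([b"1", b"NULL", b"3"], CSV_GRAPH))
print(infer_type([b"1", b"2.5"], JSON_GRAPH))
```

One simplification to keep in mind: this sketch picks a branch of the JSON graph by trial, in the order the edges are listed, whereas a real JSON reader would presumably pick the branch from the kind of token it sees. It also rechecks the whole column at each vertex rather than converting incrementally.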