Hey Furkan,

Right now, when Druid auto-detects dimensions (so-called "schemaless" mode,
which is what you get when you leave the dimensions list empty at ingestion
time), it assumes they are all strings. It would definitely be better if it
analyzed the incoming data and chose the most appropriate type. I think the
main consideration here is that Druid has to pick a type as soon as it sees
a new column, but it might not get it right just by looking at the first
record. Imagine some JSON data with a field that is the number 3 in the
first row Druid sees, but the string "foo" in the second. The right type
would be string, but Druid wouldn't know that when it processes the first
row.
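To make the problem concrete, here is a toy sketch (not Druid's actual
code; all names are hypothetical) of per-row type inference that widens a
column to string as soon as rows disagree:

```python
# Toy sketch (not Druid code): inferring a column type row by row.
# A type chosen from the first row alone can be wrong; widening to
# string on conflict yields the correct final type.

def infer_type(value):
    """Return a simple type tag for a single value."""
    if isinstance(value, bool):
        return "string"  # treat booleans as strings for simplicity
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"

def resolve_column_type(values):
    """Widen to 'string' as soon as rows disagree on a type."""
    column_type = None
    for v in values:
        t = infer_type(v)
        if column_type is None:
            column_type = t
        elif column_type != t:
            return "string"  # mixed types: string is the safe supertype
    return column_type

# First row suggests "long", but the second row forces "string".
print(resolve_column_type([3, "foo"]))  # string
```

The catch, of course, is that Druid cannot wait to see all the rows before
choosing: it commits to a type at first sight of the column.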

Maybe it would work to have a mechanism where auto-detected fields are
initially ingested as strings into the IncrementalIndex, and then
potentially converted to a different type when persisted to disk.
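That deferred conversion could look something like this hedged sketch
(names are hypothetical, not Druid's API): buffer the auto-detected column
as strings, then pick a narrower type at persist time only if every
buffered value parses as that type:

```python
# Hedged sketch (hypothetical names, not Druid's API): buffer
# auto-detected columns as strings in memory, then choose a narrower
# type at persist time if every buffered value parses cleanly.

def narrowest_type(values):
    """Decide the on-disk type after seeing all buffered string values."""
    def parses_as(cast):
        for v in values:
            try:
                cast(v)
            except (TypeError, ValueError):
                return False
        return True
    if parses_as(int):
        return "long"
    if parses_as(float):
        return "double"
    return "string"

def persist_column(values):
    """Convert buffered strings to the chosen type before writing."""
    target = narrowest_type(values)
    if target == "long":
        return target, [int(v) for v in values]
    if target == "double":
        return target, [float(v) for v in values]
    return target, list(values)

print(persist_column(["1", "2", "3"]))  # ('long', [1, 2, 3])
print(persist_column(["1.5", "2"]))     # ('double', [1.5, 2.0])
print(persist_column(["123", "foo"]))   # ('string', ['123', 'foo'])
```

Since the conversion happens only when a segment is written out, the
in-memory index never has to guess from the first record alone.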

On Thu, Jan 10, 2019 at 12:43 AM Furkan KAMACI <[email protected]>
wrote:

> Hi All,
>
> I can define auto type detection for timestamp as follows:
>
> "timestampSpec" : {
>      "format" : "auto",
>      "column" : "ts"
> }
>
> In a similar manner, I cannot detect the field type via parseSpec. I mean:
>
>
> {"ts":"2018-01-01T03:35:45Z","app_token":"guid1","eventName":"app-x","properties-key1":"123"}
>
>
> {"ts":"2018-01-01T03:35:45Z","app_token":"guid2","eventName":"app-x","properties-key2":123}
>
> Both properties-key1 and properties-key2 are indexed as String. I expect
> properties-key2 to be indexed as Integer in Druid.
>
> So, is there any mechanism in Druid for automatic field type detection
> for a newly created field? If not, I would like to implement such a
> feature.
>
> Kind Regards,
> Furkan KAMACI
>
