Yeah that's a good point. Maybe we should store some extra information about what the type was in the original input.
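
Something like this, maybe (just a sketch to show the idea; DetectedValue and the names in it are made up, not existing Druid classes):

// Illustrative only, not existing Druid code: keep the value as a String (as
// schemaless ingestion does today) but remember what the raw JSON type looked
// like, so a better column type can be chosen later (e.g. at persist time).
public class DetectedValue
{
  public enum OriginalType { LONG, DOUBLE, STRING, NULL }

  private final String value;
  private final OriginalType originalType;

  public DetectedValue(Object raw)
  {
    if (raw == null) {
      this.value = null;
      this.originalType = OriginalType.NULL;
    } else if (raw instanceof Long || raw instanceof Integer) {
      this.value = raw.toString();
      this.originalType = OriginalType.LONG;
    } else if (raw instanceof Double || raw instanceof Float) {
      this.value = raw.toString();
      this.originalType = OriginalType.DOUBLE;
    } else {
      // JSON strings like "3", booleans, and anything else count as STRING here.
      this.value = raw.toString();
      this.originalType = OriginalType.STRING;
    }
  }

  public String getValue()
  {
    return value;
  }

  public OriginalType getOriginalType()
  {
    return originalType;
  }
}

At persist time we could then look at the original types seen for a column and pick something better than plain String.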

On Sat, Jan 26, 2019 at 4:04 AM Furkan KAMACI <[email protected]> wrote:

> Hi Gian,
>
> The same problem applies to null fields too. When the first record is null,
> it will not be possible to detect such a field's type.
>
> However, the problem is different in my case. You may have an ad-hoc field
> which is not defined at the beginning. Such a field should have a strict
> type, but that type is not known up front. In your example case, we might
> define such a field as Integer and then throw an error or skip an entry
> whose value is "foo", because the field was initialized as Integer. On the
> other hand, sending a datum as:
>
> field: 3
>
> and
>
> field: "3"
>
> could be treated differently. The second one could be String but the first
> one should be Integer.
>
> I think that Solr could be an example for us of such a schemaless mode.
> What do you think?
>
> Kind Regards,
> Furkan KAMACI
>
> On Fri, Jan 25, 2019 at 8:56 PM Gian Merlino <[email protected]> wrote:
>
> > Hey Furkan,
> >
> > Right now when Druid detects dimensions (so-called "schemaless" mode,
> > which is what you get when you have an empty dimensions list at ingestion
> > time), it assumes they are all strings. It would definitely be better if
> > it did some analysis on the incoming data and chose the most appropriate
> > type. I think the main consideration here is that Druid has to pick a
> > type as soon as it sees a new column, but it might not get it right just
> > by looking at the first record. Imagine some JSON data where you have a
> > field that is the number 3 for the first row Druid sees, but the string
> > "foo" in the second. The right type would be string, but Druid wouldn't
> > know that when it gets the first row.
> >
> > Maybe it would work to have some mechanism where auto-detected fields are
> > ingested as strings initially into IncrementalIndex, and then potentially
> > converted to a different type when written to disk.
> >
> > On Thu, Jan 10, 2019 at 12:43 AM Furkan KAMACI <[email protected]>
> > wrote:
> >
> > > Hi All,
> > >
> > > I can define auto type detection for the timestamp as follows:
> > >
> > > "timestampSpec" : {
> > >   "format" : "auto",
> > >   "column" : "ts"
> > > }
> > >
> > > In a similar manner, I cannot detect a field's type via parseSpec. I mean:
> > >
> > > {"ts":"2018-01-01T03:35:45Z","app_token":"guid1","eventName":"app-x","properties-key1":"123"}
> > >
> > > {"ts":"2018-01-01T03:35:45Z","app_token":"guid2","eventName":"app-x","properties-key2":123}
> > >
> > > Both properties-key1 and properties-key2 are indexed as String. I expect
> > > properties-key2 to be indexed as Integer in Druid.
> > >
> > > So, is there any mechanism in Druid for auto field type detection for a
> > > newly created field? If not, I would like to implement such a feature.
> > >
> > > Kind Regards,
> > > Furkan KAMACI
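
For the "3 then foo" case above, a rough sketch of how type promotion across rows could work (purely illustrative; ColumnTypeDetector and its names are made up, not actual Druid code):

// Illustrative only, not actual Druid code: track a running type for a
// schemaless column. The first non-null value decides the initial guess,
// numeric types widen to DOUBLE, and any other conflict falls back to STRING
// (so a column that sees 3 and then "foo" ends up as STRING).
public class ColumnTypeDetector
{
  public enum ColumnType { LONG, DOUBLE, STRING }

  public static ColumnType typeOf(Object value)
  {
    if (value instanceof Long || value instanceof Integer) {
      return ColumnType.LONG;
    } else if (value instanceof Double || value instanceof Float) {
      return ColumnType.DOUBLE;
    } else {
      // JSON strings like "3" stay STRING; only the raw JSON type is trusted.
      return ColumnType.STRING;
    }
  }

  public static ColumnType merge(ColumnType soFar, ColumnType observed)
  {
    if (soFar == null) {
      return observed;            // first value seen decides the initial guess
    }
    if (soFar == observed) {
      return soFar;               // no conflict
    }
    if ((soFar == ColumnType.LONG && observed == ColumnType.DOUBLE)
        || (soFar == ColumnType.DOUBLE && observed == ColumnType.LONG)) {
      return ColumnType.DOUBLE;   // widen across numeric types
    }
    return ColumnType.STRING;     // any other conflict falls back to STRING
  }
}

With something along these lines, IncrementalIndex could keep holding the values as strings while tracking the merged type per column, and the persist step could choose a long, double, or string column from the final result.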
