Yeah that's a good point. Maybe we should store some extra information about what the type was in the original input.
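
Something like this, maybe (just a sketch to show the idea; DetectedValue and the names in it are made up, not existing Druid classes):

// Illustrative only, not existing Druid code: keep the value as a String (as
// schemaless ingestion does today) but remember what the raw JSON type looked
// like, so a better column type can be chosen later (e.g. at persist time).
public class DetectedValue
{
  public enum OriginalType { LONG, DOUBLE, STRING, NULL }

  private final String value;
  private final OriginalType originalType;

  public DetectedValue(Object raw)
  {
    if (raw == null) {
      this.value = null;
      this.originalType = OriginalType.NULL;
    } else if (raw instanceof Long || raw instanceof Integer) {
      this.value = raw.toString();
      this.originalType = OriginalType.LONG;
    } else if (raw instanceof Double || raw instanceof Float) {
      this.value = raw.toString();
      this.originalType = OriginalType.DOUBLE;
    } else {
      // JSON strings like "3", booleans, and anything else count as STRING here.
      this.value = raw.toString();
      this.originalType = OriginalType.STRING;
    }
  }

  public String getValue()
  {
    return value;
  }

  public OriginalType getOriginalType()
  {
    return originalType;
  }
}

At persist time we could then look at the original types seen for a column and pick something better than plain String.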

On Sat, Jan 26, 2019 at 4:04 AM Furkan KAMACI <[email protected]> wrote:

> Hi Gian,
>
> The same problem applies to null fields too. When the first record is null,
> it will not be possible to detect such a field's type.
>
> However, the problem is different in my case. You may have an ad-hoc field
> which is not defined at the beginning. Such a field should have a strict
> type, but that type is not known up front. In your example case, we might
> define such a field as Integer and then throw an error or skip an entry
> whose value is "foo", because the field was initialized as Integer. On the
> other hand, sending a datum as:
>
> field: 3
>
> and
>
> field: "3"
>
> could be treated differently. The second one could be String but the first
> one should be Integer.
>
> I think that Solr could be an example for us of such a schemaless mode.
> What do you think?
>
> Kind Regards,
> Furkan KAMACI
>
> On Fri, Jan 25, 2019 at 8:56 PM Gian Merlino <[email protected]> wrote:
>
> > Hey Furkan,
> >
> > Right now when Druid detects dimensions (so-called "schemaless" mode,
> > which is what you get when you have an empty dimensions list at ingestion
> > time), it assumes they are all strings. It would definitely be better if
> > it did some analysis on the incoming data and chose the most appropriate
> > type. I think the main consideration here is that Druid has to pick a
> > type as soon as it sees a new column, but it might not get it right just
> > by looking at the first record. Imagine some JSON data where you have a
> > field that is the number 3 for the first row Druid sees, but the string
> > "foo" in the second. The right type would be string, but Druid wouldn't
> > know that when it gets the first row.
> >
> > Maybe it would work to have some mechanism where auto-detected fields are
> > ingested as strings initially into IncrementalIndex, and then potentially
> > converted to a different type when written to disk.
> >
> > On Thu, Jan 10, 2019 at 12:43 AM Furkan KAMACI <[email protected]>
> > wrote:
> >
> > > Hi All,
> > >
> > > I can define auto type detection for the timestamp as follows:
> > >
> > > "timestampSpec" : {
> > >   "format" : "auto",
> > >   "column" : "ts"
> > > }
> > >
> > > In a similar manner, I cannot detect a field's type via parseSpec. I mean:
> > >
> > > {"ts":"2018-01-01T03:35:45Z","app_token":"guid1","eventName":"app-x","properties-key1":"123"}
> > >
> > > {"ts":"2018-01-01T03:35:45Z","app_token":"guid2","eventName":"app-x","properties-key2":123}
> > >
> > > Both properties-key1 and properties-key2 are indexed as String. I expect
> > > properties-key2 to be indexed as Integer in Druid.
> > >
> > > So, is there any mechanism in Druid for auto field type detection for a
> > > newly created field? If not, I would like to implement such a feature.
> > >
> > > Kind Regards,
> > > Furkan KAMACI
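
For the "3 then foo" case above, a rough sketch of how type promotion across rows could work (purely illustrative; ColumnTypeDetector and its names are made up, not actual Druid code):

// Illustrative only, not actual Druid code: track a running type for a
// schemaless column. The first non-null value decides the initial guess,
// numeric types widen to DOUBLE, and any other conflict falls back to STRING
// (so a column that sees 3 and then "foo" ends up as STRING).
public class ColumnTypeDetector
{
  public enum ColumnType { LONG, DOUBLE, STRING }

  public static ColumnType typeOf(Object value)
  {
    if (value instanceof Long || value instanceof Integer) {
      return ColumnType.LONG;
    } else if (value instanceof Double || value instanceof Float) {
      return ColumnType.DOUBLE;
    } else {
      // JSON strings like "3" stay STRING; only the raw JSON type is trusted.
      return ColumnType.STRING;
    }
  }

  public static ColumnType merge(ColumnType soFar, ColumnType observed)
  {
    if (soFar == null) {
      return observed;            // first value seen decides the initial guess
    }
    if (soFar == observed) {
      return soFar;               // no conflict
    }
    if ((soFar == ColumnType.LONG && observed == ColumnType.DOUBLE)
        || (soFar == ColumnType.DOUBLE && observed == ColumnType.LONG)) {
      return ColumnType.DOUBLE;   // widen across numeric types
    }
    return ColumnType.STRING;     // any other conflict falls back to STRING
  }
}

With something along these lines, IncrementalIndex could keep holding the values as strings while tracking the merged type per column, and the persist step could choose a long, double, or string column from the final result.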
