Hi Gian,

Thanks for the detailed answer! I'll implement such a feature and let you know about the development process.
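As a starting point, idea (1) from the quoted message below (an "automatic flatten spec") could look roughly like the following. This is only a language-neutral sketch in Python, not Druid code: `auto_flatten` is a hypothetical helper, and the dot-separated naming simply follows the "world.0.hey" / "world.1.tree" example. A real implementation would presumably live in a custom InputRowParser extension, as suggested in the quoted message.

```python
from typing import Any, Dict


def auto_flatten(value: Any, prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts/lists into one level, keyed by the path to each leaf.

    Hypothetical helper for illustration only; not part of Druid.
    List elements are addressed by their index, so key names depend on position.
    Empty dicts and empty lists produce no output fields.
    """
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        # Leaf value: store it under the path accumulated so far.
        flat[prefix] = value
        return flat
    for key, child in items:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(auto_flatten(child, path, sep))
    return flat


# The event from the original question, with arbitrary keys inside "properties".
event = {
    "ts": "2018-01-01T03:35:45Z",
    "appToken": "guid",
    "eventName": "app-open",
    "properties": [{"randomKey1": "randomValue1"}, {"randomKey2": "randomValue2"}],
}
flat_event = auto_flatten(event)
# flat_event now has top-level keys such as "properties.0.randomKey1"
# and "properties.1.randomKey2", alongside "ts", "appToken" and "eventName".
```

Because the keys are derived from the data itself, nothing about "properties" has to be known at indexing time. The trade-off is that the same key can end up in differently named fields depending on its position in the array.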
Kind Regards,
Furkan KAMACI

On Fri, Jan 25, 2019 at 8:43 PM Gian Merlino <[email protected]> wrote:

> Hey Furkan,
>
> There isn't currently an out of the box parser in Druid that can do what
> you are describing. But it is an interesting feature to think about. Today
> you could implement this using a custom parser (instead of using the
> builtin json/avro/etc parsers, write an extension that implements an
> InputRowParser, and you can do anything you want, including automatic
> flattening of nested data).
>
> In terms of how this might be done out of the box in the future I could
> think of a few ideas.
>
> 1) Have some way to define an "automatic flatten spec". Maybe something
> that systematically flattens in a particular way: in your example, perhaps
> it'd automatically create fields like "world.0.hey" and "world.1.tree".
>
> 2) A repetition and definition level scheme similar to Parquet:
> https://blog.twitter.com/engineering/en_us/a/2013/dremel-made-simple-with-parquet.html
> It sounds like this could be more natural and lend itself to better
> compression than (1).
>
> 3) Create a new column type designed to store json-like data, although
> presumably in some more optimized form. Add some query-time functionality
> for extracting values from it. Use this for storing the original input
> data. This would only really make sense if you had rollup disabled. In this
> case, the idea would be that you would store an entire ingested object in
> this new kind of column, and extract some subset fields for faster access
> into traditional dimension and metric columns.
>
> On Wed, Jan 9, 2019 at 8:08 AM Furkan KAMACI <[email protected]>
> wrote:
>
> > Hi Dylan,
> >
> > Indexing such data as flatten works for my case. I've checked that
> > documentation before and this is similar to my need at documentation:
> >
> > "world": [{"hey": "there"}, {"tree": "apple"}]
> >
> > However, I don't know what will be the keys at indexing time. Such
> > configuration is handled via this at documentation:
> >
> > ...
> > {
> >   "type": "path",
> >   "name": "world-hey",
> >   "expr": "$.world[0].hey"
> > },
> > {
> >   "type": "path",
> >   "name": "worldtree",
> >   "expr": "$.world[1].tree"
> > }
> > ...
> >
> > However, I can't define parseSpec for it as far as I know. I need something
> > like working as schemaless or defining a RegEx i.e?
> >
> > Kind Regards,
> > Furkan KAMACI
> >
> > On Wed, Jan 9, 2019 at 5:45 PM Dylan Wylie <[email protected]> wrote:
> >
> > > Hey Furkan,
> > >
> > > Druid can index flat arrays (multi-value dimensions) but not arrays of
> > > objects. There is the ability to flatten objects on ingestion using
> > > JSONPath. See http://druid.io/docs/latest/ingestion/flatten-json
> > >
> > > Best regards,
> > > Dylan
> > >
> > > On Wed, 9 Jan 2019 at 14:23, Furkan KAMACI <[email protected]>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I can index such data with Druid:
> > > >
> > > > {"ts":"2018-01-01T02:35:45Z","appToken":"guid","eventName":"app-open",
> > > > "key1":"value1"}
> > > >
> > > > via this configuration:
> > > >
> > > > "parser" : {
> > > >   "type" : "string",
> > > >   "parseSpec" : {
> > > >     "format" : "json",
> > > >     "timestampSpec" : {
> > > >       "format" : "iso",
> > > >       "column" : "ts"
> > > >     },
> > > >     "dimensionsSpec" : {
> > > >       "dimensions": [
> > > >         "appToken",
> > > >         "eventName",
> > > >         "key1"
> > > >       ]
> > > >     }
> > > >   }
> > > > }
> > > >
> > > > However, I would also want to index such data:
> > > >
> > > > {
> > > >   "ts":"2018-01-01T03:35:45Z",
> > > >   "appToken":"guid",
> > > >   "eventName":"app-open",
> > > >   "properties":[{"randomKey1":"randomValue1"},
> > > >     {"randomKey2":"randomValue2"}]
> > > > }
> > > >
> > > > at which properties is an array and members of that array has some
> > > > arbitrary keys and values.
> > > >
> > > > How can I do that?
> > > >
> > > > Kind Regards,
> > > > Furkan KAMACI
