Hey Furkan,

There isn't currently an out-of-the-box parser in Druid that can do what you are describing, but it is an interesting feature to think about. Today you could implement this with a custom parser: instead of using the built-in json/avro/etc. parsers, write an extension that implements an InputRowParser, and you can do anything you want, including automatic flattening of nested data.
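
To make the "automatic flattening" part a bit more concrete, here is a rough sketch of the kind of logic such a custom parser could apply to each event before handing it on. This is plain Python for illustration only and not Druid code: a real extension would be written in Java against the InputRowParser interface, and the function name `flatten_event` and the dotted, index-based key naming are assumptions of mine rather than anything Druid defines.

```python
from typing import Any, Dict


def flatten_event(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into one level of dotted keys.

    Hypothetical helper: the key scheme (parent.index.child) is an
    assumption for illustration, not a Druid convention.
    """
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten_event(child, child_prefix))
    elif isinstance(value, list):
        # Array positions become part of the key, so keys need not be known up front.
        for index, child in enumerate(value):
            child_prefix = f"{prefix}.{index}" if prefix else str(index)
            flat.update(flatten_event(child, child_prefix))
    else:
        # Leaf value (string, number, bool, None): store it under the accumulated key.
        flat[prefix] = value
    return flat


# The event from the original question, with arbitrary keys inside "properties".
event = {
    "ts": "2018-01-01T03:35:45Z",
    "appToken": "guid",
    "eventName": "app-open",
    "properties": [{"randomKey1": "randomValue1"}, {"randomKey2": "randomValue2"}],
}
flat_event = flatten_event(event)
```

With this scheme the nested entries end up as top-level fields such as `properties.0.randomKey1`, which could then be picked up as ordinary dimensions. Note that empty objects and empty arrays produce no fields at all in this sketch.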

In terms of how this might be done out of the box in the future I could think of a few ideas.

1) Have some way to define an "automatic flatten spec". Maybe something that systematically flattens in a particular way: in your example, perhaps it'd automatically create fields like "world.0.hey" and "world.1.tree".

2) A repetition and definition level scheme similar to Parquet: https://blog.twitter.com/engineering/en_us/a/2013/dremel-made-simple-with-parquet.html. It sounds like this could be more natural and lend itself to better compression than (1).

3) Create a new column type designed to store json-like data, although presumably in some more optimized form. Add some query-time functionality for extracting values from it. Use this for storing the original input data. This would only really make sense if you had rollup disabled. In this case, the idea would be that you would store an entire ingested object in this new kind of column, and extract some subset fields for faster access into traditional dimension and metric columns.

On Wed, Jan 9, 2019 at 8:08 AM Furkan KAMACI <[email protected]> wrote:

> Hi Dylan,
>
> Indexing such data as flatten works for my case. I've checked that
> documentation before and this is similar to my need at documentation:
>
> "world": [{"hey": "there"}, {"tree": "apple"}]
>
> However, I don't know what will be the keys at indexing time. Such
> configuration is handled via this at documentation:
>
> ...
> {
>   "type": "path",
>   "name": "world-hey",
>   "expr": "$.world[0].hey"
> },
> {
>   "type": "path",
>   "name": "worldtree",
>   "expr": "$.world[1].tree"
> }
> ...
>
> However, I can't define parseSpec for it as far as I know. I need something
> like working as schemaless or defining a RegEx i.e?
>
> Kind Regards,
> Furkan KAMACI
>
> On Wed, Jan 9, 2019 at 5:45 PM Dylan Wylie <[email protected]> wrote:
>
> > Hey Furkan,
> >
> > Druid can index flat arrays (multi-value dimensions) but not arrays of
> > objects. There is the ability to flatten objects on ingestion using
> > JSONPath. See http://druid.io/docs/latest/ingestion/flatten-json
> >
> > Best regards,
> > Dylan
> >
> > On Wed, 9 Jan 2019 at 14:23, Furkan KAMACI <[email protected]> wrote:
> >
> > > Hi All,
> > >
> > > I can index such data with Druid:
> > >
> > > {"ts":"2018-01-01T02:35:45Z","appToken":"guid", "eventName":"app-open",
> > > "key1":"value1"}
> > >
> > > via this configuration:
> > >
> > > "parser" : {
> > >   "type" : "string",
> > >   "parseSpec" : {
> > >     "format" : "json",
> > >     "timestampSpec" : {
> > >       "format" : "iso",
> > >       "column" : "ts"
> > >     },
> > >     "dimensionsSpec" : {
> > >       "dimensions": [
> > >         "appToken",
> > >         "eventName",
> > >         "key1"
> > >       ]
> > >     }
> > >   }
> > > }
> > >
> > > However, I would also want to index such data:
> > >
> > > {
> > >   "ts":"2018-01-01T03:35:45Z",
> > >   "appToken":"guid",
> > >   "eventName":"app-open",
> > >   "properties":[{"randomKey1":"randomValue1"},
> > >     {"randomKey2":"randomValue2"}]
> > > }
> > >
> > > at which properties is an array and members of that array has some
> > > arbitrary keys and values.
> > >
> > > How can I do that?
> > >
> > > Kind Regards,
> > > Furkan KAMACI
