Hey Furkan,

There isn't currently an out-of-the-box parser in Druid that can do what
you are describing, but it is an interesting feature to think about. Today
you could implement this with a custom parser: instead of using the
builtin json/avro/etc. parsers, write an extension that implements an
InputRowParser, and you can do anything you want, including automatic
flattening of nested data.

As for how this might be done out of the box in the future, I can think
of a few ideas.

1) Have some way to define an "automatic flatten spec". Maybe something
that systematically flattens in a particular way: in your example, perhaps
it'd automatically create fields like "world.0.hey" and "world.1.tree".
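
To make that naming scheme concrete, here's a minimal sketch of what such
systematic flattening might do (not Druid code, just an illustration of the
dot-separated key scheme):

```python
def auto_flatten(obj, prefix=""):
    """Recursively flatten nested dicts/lists into dot-separated keys."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(auto_flatten(value, prefix + key + "."))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            flat.update(auto_flatten(value, prefix + str(i) + "."))
    else:
        # Leaf value: strip the trailing dot from the accumulated prefix.
        flat[prefix[:-1]] = obj
    return flat

# {"world": [{"hey": "there"}, {"tree": "apple"}]} becomes
# {"world.0.hey": "there", "world.1.tree": "apple"}
```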

2) A repetition and definition level scheme similar to Parquet:
https://blog.twitter.com/engineering/en_us/a/2013/dremel-made-simple-with-parquet.html.
It sounds like this could be more natural and lend itself to better
compression than (1).
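
To show what that scheme looks like on your example, here's a toy sketch
(my own simplification, handling only a single repeated group containing
one optional leaf field, as in the Dremel/Parquet post above):

```python
def levels(record, group, leaf):
    """Emit (repetition, definition, value) triples for the path group.leaf,
    where `group` is a repeated list of objects and `leaf` is optional.
    Max repetition level is 1, max definition level is 2."""
    items = record.get(group)
    if not items:
        # The whole group is absent: one NULL entry at definition level 0.
        return [(0, 0, None)]
    triples = []
    for i, item in enumerate(items):
        rep = 0 if i == 0 else 1  # repeats within the group after the first
        if leaf in item:
            triples.append((rep, 2, item[leaf]))
        else:
            triples.append((rep, 1, None))  # element present, leaf missing
    return triples

# For {"world": [{"hey": "there"}, {"tree": "apple"}]}, the "world.hey"
# column would be [(0, 2, "there"), (1, 1, None)].
```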

3) Create a new column type designed to store json-like data, although
presumably in some more optimized form. Add some query-time functionality
for extracting values from it. Use this for storing the original input
data. This would only really make sense if you had rollup disabled. In
that case, the idea would be that you would store each entire ingested
object in this new kind of column, and extract some subset of the fields
into traditional dimension and metric columns for faster access.
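
Roughly, the row layout I have in mind would look something like this
(a hedged sketch in Python; the "raw" column name and helper are
illustrative, not a real Druid API):

```python
import json

def to_row(event, fast_fields):
    """Keep the entire ingested object in a 'raw' column, and also copy a
    chosen subset of top-level fields out into ordinary columns for fast
    access. Everything else stays reachable via query-time extraction
    from 'raw'."""
    row = {"raw": json.dumps(event)}
    for field in fast_fields:
        if field in event:
            row[field] = event[field]
    return row
```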

On Wed, Jan 9, 2019 at 8:08 AM Furkan KAMACI <furkankam...@gmail.com> wrote:

> Hi Dylan,
>
> Indexing such data as flattened works for my case. I've checked that
> documentation before, and this example there is similar to my need:
>
> "world": [{"hey": "there"}, {"tree": "apple"}]
>
> However, I don't know what the keys will be at indexing time. Such
> configuration is handled in the documentation via this:
>
> ...
> {
>   "type": "path",
>   "name": "world-hey",
>   "expr": "$.world[0].hey"
> },
> {
>   "type": "path",
>   "name": "worldtree",
>   "expr": "$.world[1].tree"
> }
> ...
>
> However, as far as I know I can't define a parseSpec for it. I need
> something that works schemalessly, or lets me define a regex, for example.
>
> Kind Regards,
> Furkan KAMACI
>
> On Wed, Jan 9, 2019 at 5:45 PM Dylan Wylie <dylanwy...@gmail.com> wrote:
>
> > Hey Furkan,
> >
> > Druid can index flat arrays (multi-value dimensions) but not arrays of
> > objects. There is the ability to flatten objects on ingestion using
> > JSONPath. See http://druid.io/docs/latest/ingestion/flatten-json
> >
> > Best regards,
> > Dylan
> >
> > On Wed, 9 Jan 2019 at 14:23, Furkan KAMACI <furkankam...@gmail.com>
> wrote:
> >
> > > Hi All,
> > >
> > > I can index such data with Druid:
> > >
> > > {"ts":"2018-01-01T02:35:45Z","appToken":"guid", "eventName":"app-open",
> > > "key1":"value1"}
> > >
> > > via this configuration:
> > >
> > > "parser" : {
> > >         "type" : "string",
> > >         "parseSpec" : {
> > >           "format" : "json",
> > >           "timestampSpec" : {
> > >             "format" : "iso",
> > >             "column" : "ts"
> > >           },
> > >           "dimensionsSpec" : {
> > >             "dimensions": [
> > >               "appToken",
> > >               "eventName",
> > >               "key1"
> > >             ]
> > >           }
> > >         }
> > >       }
> > >
> > > However, I would also want to index such data:
> > >
> > > {
> > >   "ts":"2018-01-01T03:35:45Z",
> > >   "appToken":"guid",
> > >   "eventName":"app-open",
> > >   "properties":[{"randomKey1":"randomValue1"},
> > > {"randomKey2":"randomValue2"}]
> > > }
> > >
> > > where properties is an array and the members of that array have
> > > some arbitrary keys and values.
> > >
> > > How can I do that?
> > >
> > > Kind Regards,
> > > Furkan KAMACI
> > >
> >
>
