Re: Indexing Arbitrary Key/Value Data

Furkan KAMACI Sat, 26 Jan 2019 03:58:37 -0800

Hi Gian,

Thanks for the detailed answer! I'll implement such a feature and let you
know about the development process.


Kind Regards,
Furkan KAMACI

On Fri, Jan 25, 2019 at 8:43 PM Gian Merlino <[email protected]> wrote:

> Hey Furkan,
>
> There isn't currently an out-of-the-box parser in Druid that can do what
> you are describing. But it is an interesting feature to think about. Today
> you could implement this using a custom parser (instead of using the
> builtin json/avro/etc parsers, write an extension that implements an
> InputRowParser, and you can do anything you want, including automatic
> flattening of nested data).
>
> In terms of how this might be done out of the box in the future, I can
> think of a few ideas.
>
> 1) Have some way to define an "automatic flatten spec". Maybe something
> that systematically flattens in a particular way: in your example, perhaps
> it'd automatically create fields like "world.0.hey" and "world.1.tree".
>
> 2) A repetition and definition level scheme similar to Parquet:
>
> https://blog.twitter.com/engineering/en_us/a/2013/dremel-made-simple-with-parquet.html
> .
> It sounds like this could be more natural and lend itself to better
> compression than (1).
>
> 3) Create a new column type designed to store json-like data, although
> presumably in some more optimized form. Add some query-time functionality
> for extracting values from it. Use this for storing the original input
> data. This would only really make sense if you had rollup disabled. In this
> case, the idea would be that you would store an entire ingested object in
> this new kind of column, and extract some subset fields for faster access
> into traditional dimension and metric columns.
>
> On Wed, Jan 9, 2019 at 8:08 AM Furkan KAMACI <[email protected]>
> wrote:
>
> > Hi Dylan,
> >
> > Indexing such data in flattened form works for my case. I've checked that
> > documentation before, and this example from it is similar to my need:
> >
> > "world": [{"hey": "there"}, {"tree": "apple"}]
> >
> > However, I don't know what the keys will be at indexing time. The
> > documentation handles this kind of configuration as follows:
> >
> > ...
> > {
> >   "type": "path",
> >   "name": "world-hey",
> >   "expr": "$.world[0].hey"
> > },
> > {
> >   "type": "path",
> >   "name": "worldtree",
> >   "expr": "$.world[1].tree"
> > }
> > ...
> >
> > However, I can't define a parseSpec for it as far as I know. I need
> > something that works as schemaless, or perhaps a way to define a RegEx?
> >
> > Kind Regards,
> > Furkan KAMACI
> >
> > On Wed, Jan 9, 2019 at 5:45 PM Dylan Wylie <[email protected]> wrote:
> >
> > > Hey Furkan,
> > >
> > > Druid can index flat arrays (multi-value dimensions) but not arrays of
> > > objects. There is the ability to flatten objects on ingestion using
> > > JSONPath. See http://druid.io/docs/latest/ingestion/flatten-json
> > >
> > > Best regards,
> > > Dylan
> > >
> > > > On Wed, 9 Jan 2019 at 14:23, Furkan KAMACI <[email protected]> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I can index such data with Druid:
> > > >
> > > > {"ts":"2018-01-01T02:35:45Z","appToken":"guid", "eventName":"app-open",
> > > > "key1":"value1"}
> > > >
> > > > via this configuration:
> > > >
> > > > "parser" : {
> > > >         "type" : "string",
> > > >         "parseSpec" : {
> > > >           "format" : "json",
> > > >           "timestampSpec" : {
> > > >             "format" : "iso",
> > > >             "column" : "ts"
> > > >           },
> > > >           "dimensionsSpec" : {
> > > >             "dimensions": [
> > > >               "appToken",
> > > >               "eventName",
> > > >               "key1"
> > > >             ]
> > > >           }
> > > >         }
> > > >       }
> > > >
> > > > However, I would also like to index data like this:
> > > >
> > > > {
> > > >   "ts":"2018-01-01T03:35:45Z",
> > > >   "appToken":"guid",
> > > >   "eventName":"app-open",
> > > >   "properties":[{"randomKey1":"randomValue1"},
> > > > {"randomKey2":"randomValue2"}]
> > > > }
> > > >
> > > > in which properties is an array whose members have arbitrary keys
> > > > and values.
> > > >
> > > > How can I do that?
> > > >
> > > > Kind Regards,
> > > > Furkan KAMACI
> > > >
> > >
> >
>
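
The "automatic flatten" idea from option (1) in Gian's reply can be sketched in a few lines. This is only an illustration in standalone Python, not a Druid InputRowParser extension: the function name `flatten` and the "." separator are assumptions chosen to match the "world.0.hey" style of field name suggested in the thread, and nothing here is a Druid API.

```python
import json
from typing import Any, Dict


def flatten(value: Any, prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts and lists into one level of dotted-path keys.

    Hypothetical helper for illustration; list elements are addressed by
    their index, so {"world": [{"hey": "there"}]} yields "world.0.hey".
    Empty dicts and lists contribute no keys.
    """
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        # Scalar leaf: record it under the path built so far.
        flat[prefix] = value
        return flat
    for key, child in items:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(child, path, sep))
    return flat


# The event from the original question, with keys unknown ahead of time.
event = json.loads("""
{
  "ts": "2018-01-01T03:35:45Z",
  "appToken": "guid",
  "eventName": "app-open",
  "properties": [{"randomKey1": "randomValue1"}, {"randomKey2": "randomValue2"}]
}
""")

flat_event = flatten(event)
```

With this approach the arbitrary keys end up as ordinary top-level fields (for example "properties.0.randomKey1"), which is the shape a schemaless dimensionsSpec could then pick up. A custom parser along the lines Gian describes would presumably apply the same transformation to each row before handing it to Druid.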
