Hey Samarth,

> I think it would be a good contribution to add a "select only certain
> fields" / projection feature for native queries. Not every team, for
> example at my work, has adopted Druid SQL. They are just so used to
> writing JSON queries ;). Besides, a lot of the use cases have multi-valued
> dimensions, which the SQL standard doesn't support in general.

The SQL standard doesn't really have anything like our multi-valued
dimensions, but that doesn't stop us from trying to make them work in SQL
anyway. Clint has been doing a bunch of work here recently. Check out some
of these related PRs:

- https://github.com/apache/incubator-druid/pull/7588
- https://github.com/apache/incubator-druid/pull/7973
- https://github.com/apache/incubator-druid/pull/7974

> On the note of SQL support, do you know of any examples in Druid SQL
> where a SQL aggregation function returns an array of doubles? I looked at
> DoubleSketchSqlAggregator but it seems to be returning a single double
> value.

I don't have an example, and I'm not sure if we've quite made it to arrays
of doubles yet, but Clint may be able to chime in with something
intelligent there.
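
In the meantime, if you need to stay on native queries, one workaround is
to trim the results on the client side. Here is a minimal sketch in Python,
assuming the groupBy "v1" result shape you posted (a list of rows, each with
an "event" map). The helper name project_event_fields is made up for
illustration and is not part of Druid:

```python
from typing import Any, Dict, Iterable, List


def project_event_fields(
    rows: Iterable[Dict[str, Any]], keep: Iterable[str]
) -> List[Dict[str, Any]]:
    """Return copies of groupBy rows whose "event" map holds only `keep` fields.

    Hypothetical client-side helper; Druid itself still computes and
    returns every aggregator and post aggregator in the query.
    """
    wanted = set(keep)
    projected = []
    for row in rows:
        event = row.get("event", {})
        trimmed = dict(row)  # shallow copy so the input rows are not mutated
        trimmed["event"] = {k: v for k, v in event.items() if k in wanted}
        projected.append(trimmed)
    return projected


# Sample row shaped like the result quoted further down in this thread.
rows = [{
    "version": "v1",
    "timestamp": "2019-06-25T00:00:00.000Z",
    "event": {
        "quantiles": [0, 162569.21411280808, 5814934],
        "merged_sketch": "AAAABBAXAS",
    },
}]

result = project_event_fields(rows, keep=["quantiles"])
print(result[0]["event"])
```

This only hides merged_sketch from whatever consumes the result. The sketch
is still serialized and sent over the wire, which is why a real projection
feature in the native query layer would be the better fix.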

On Thu, Jun 27, 2019 at 1:44 PM Samarth Jain <samarth.j...@gmail.com> wrote:

> Thanks for the reply, Gian. I am working on adding SQL support for the
> t-digest module.
>
> I think it would be a good contribution to add a "select only certain
> fields" / projection feature for native queries. Not every team, for
> example at my work, has adopted Druid SQL. They are just so used to
> writing JSON queries ;). Besides, a lot of the use cases have multi-valued
> dimensions, which the SQL standard doesn't support in general.
>
> On the note of SQL support, do you know of any examples in Druid SQL
> where a SQL aggregation function returns an array of doubles? I looked at
> DoubleSketchSqlAggregator but it seems to be returning a single double
> value.
>
>
> On Wed, Jun 26, 2019 at 10:26 PM Gian Merlino <g...@apache.org> wrote:
>
> > Hey Samarth,
> >
> > This kind of thing is doable in Druid SQL, which will only return the stuff
> > you SELECT. Native queries don't have a concept like that, so they always
> > return everything, even if you intended certain things to be 'internal'
> > computations and aren't interested in seeing the results directly. If it
> > makes sense for you to use SQL I would suggest going that route. Otherwise
> > it might be interesting to add a native query feature to select only
> > certain fields.
> >
> > On Wed, Jun 26, 2019 at 3:30 PM Samarth Jain <samarth.j...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I recently contributed TDigest based sketch aggregators to Druid. The
> > > contribution also included a post aggregator that lets you generate
> > > quantiles from the aggregated sketches.
> > >
> > > Example query:
> > >
> > > {
> > >         "queryType": "groupBy",
> > >         "dataSource": "test_datasource",
> > >         "granularity": "ALL",
> > >         "dimensions": [],
> > >         "aggregations": [{
> > >                 "type": "mergeTDigestSketch",
> > >                 "name": "merged_sketch",
> > >                 "fieldName": "ingested_sketch",
> > >                 "compression": 200
> > >         }],
> > >         "postAggregations": [{
> > >                 "type": "quantilesFromTDigestSketch",
> > >                 "name": "quantiles",
> > >                 "fractions": [0, 0.5, 1],
> > >                 "field": {
> > >                         "type": "fieldAccess",
> > >                         "fieldName": "merged_sketch"
> > >                 }
> > >         }],
> > >         "intervals": ["2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z"]
> > > }
> > >
> > > The one limitation I have been running into is that the above query
> > > returns both the merged_sketch that was aggregated and the quantiles
> > > array that was generated by applying the post aggregation to
> > > merged_sketch. What I would rather want in this case is for the query
> > > to just return the quantiles array.
> > >
> > > So instead of
> > >
> > > {
> > >         "version": "v1",
> > >         "timestamp": "2019-06-25T00:00:00.000Z",
> > >         "event": {
> > >                 "quantiles": [
> > >                         0,
> > >                         162569.21411280808,
> > >                         5814934
> > >                 ],
> > >                 "merged_sketch": "AAAABBAXAS"
> > >         }
> > > }
> > >
> > > I would prefer this:
> > > {
> > >         "version": "v1",
> > >         "timestamp": "2019-06-25T00:00:00.000Z",
> > >         "event": {
> > >                 "quantiles": [
> > >                         0,
> > >                         162569.21411280808,
> > >                         5814934
> > >                 ]
> > >         }
> > > }
> > >
> > > Is there a way to achieve this today? I tried changing post aggregation
> > > field access from
> > >
> > > "field": {
> > >         "type": "fieldAccess",
> > >         "fieldName": "merged_sketch"
> > > }
> > >
> > > to
> > >
> > > "field": {
> > >         "type": "finalizingFieldAccess",
> > >         "fieldName": "merged_sketch"
> > > }
> > >
> > > but that didn't help either.
> > >
> > > Thanks,
> > > Samarth
> > >
> >
>
