Thanks for the reply, Gian. I am working on adding SQL support for the
t-digest module.

I think adding a projection (select-only-certain-fields) feature for native
queries would be a good contribution. Not every team has adopted Druid SQL;
at my work, for example, people are just used to writing JSON queries ;).
Besides, a lot of our use cases involve multi-valued dimensions, which
standard SQL generally doesn't support.
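Until something like that exists natively, the effect can be approximated on
the client. Below is a minimal sketch in Python, assuming the groupBy v1
result shape quoted further down (a list of rows, each with "version",
"timestamp" and an "event" map). `project_events` is a hypothetical helper,
not part of Druid or any Druid client library. Note that it only trims the
response after it arrives; the broker still serializes and sends
merged_sketch, which a server-side projection would avoid.

```python
from typing import Any, Dict, Iterable, List


def project_events(rows: List[Dict[str, Any]], keep: Iterable[str]) -> List[Dict[str, Any]]:
    """Hypothetical client-side projection for groupBy v1 result rows.

    Returns shallow copies of the rows in which each "event" map keeps only
    the field names listed in `keep`; "version" and "timestamp" pass through.
    The input rows are not modified.
    """
    wanted = set(keep)
    projected = []
    for row in rows:
        event = row.get("event", {})
        # Drop internal columns such as the aggregated sketch.
        trimmed = {name: value for name, value in event.items() if name in wanted}
        projected.append({**row, "event": trimmed})
    return projected


# Sample row shaped like the groupBy v1 response quoted in this thread.
sample_rows = [{
    "version": "v1",
    "timestamp": "2019-06-25T00:00:00.000Z",
    "event": {
        "quantiles": [0, 162569.21411280808, 5814934],
        "merged_sketch": "AAAABBAXAS",
    },
}]

print(project_events(sample_rows, ["quantiles"]))
```

A native version would presumably take the same kind of input, a list of
output names to keep, and apply it after post-aggregation on the broker.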

On the subject of SQL support, do you know of any examples in Druid SQL
where a SQL aggregation function returns an array of doubles? I looked at
DoubleSketchSqlAggregator, but it seems to return a single double value.


On Wed, Jun 26, 2019 at 10:26 PM Gian Merlino <[email protected]> wrote:

> Hey Samarth,
>
> This kind of thing is doable in Druid SQL, which will only return the stuff
> you SELECT. Native queries don't have a concept like that, so they always
> return everything, even if you intended certain things to be 'internal'
> computations and aren't interested in seeing the results directly. If it
> makes sense for you to use SQL I would suggest going that route. Otherwise
> it might be interesting to add a native query feature to select only
> certain fields.
>
> On Wed, Jun 26, 2019 at 3:30 PM Samarth Jain <[email protected]> wrote:
>
> > Hi,
> >
> > I recently contributed TDigest-based sketch aggregators to Druid. The
> > contribution also included a post-aggregator that generates quantiles
> > from the aggregated sketches.
> >
> > Example query:
> >
> > {
> >         "queryType": "groupBy",
> >         "dataSource": "test_datasource",
> >         "granularity": "ALL",
> >         "dimensions": [],
> >         "aggregations": [{
> >                 "type": "mergeTDigestSketch",
> >                 "name": "merged_sketch",
> >                 "fieldName": "ingested_sketch",
> >                 "compression": 200
> >         }],
> >         "postAggregations": [{
> >                 "type": "quantilesFromTDigestSketch",
> >                 "name": "quantiles",
> >                 "fractions": [0, 0.5, 1],
> >                 "field": {
> >                         "type": "fieldAccess",
> >                         "fieldName": "merged_sketch"
> >                 }
> >         }],
> >         "intervals": ["2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z"]
> > }
> >
> > The one limitation I have been running into is that the above query
> > returns both merged_sketch, the aggregated sketch, and the quantiles
> > array generated by applying the post-aggregator to merged_sketch. What I
> > would rather have in this case is for the query to return just the
> > quantiles array.
> >
> > So instead of
> >
> > {
> >         "version": "v1",
> >         "timestamp": "2019-06-25T00:00:00.000Z",
> >         "event": {
> >                 "quantiles": [0, 162569.21411280808, 5814934],
> >                 "merged_sketch": "AAAABBAXAS"
> >         }
> > }
> >
> > I would prefer this:
> >
> > {
> >         "version": "v1",
> >         "timestamp": "2019-06-25T00:00:00.000Z",
> >         "event": {
> >                 "quantiles": [0, 162569.21411280808, 5814934]
> >         }
> > }
> >
> > Is there a way to achieve this today? I tried changing the
> > post-aggregator's field access from
> >
> > "field": {
> >         "type": "fieldAccess",
> >         "fieldName": "merged_sketch"
> > }
> >
> > to
> >
> > "field": {
> >         "type": "finalizingFieldAccess",
> >         "fieldName": "merged_sketch"
> > }
> >
> > but that didn't help either.
> >
> > Thanks,
> > Samarth
> >
>
