[
https://issues.apache.org/jira/browse/DRILL-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jacques Nadeau updated DRILL-1716:
----------------------------------
Fix Version/s: 0.9.0
> Nested Data : There should be an easy way to apply aggregate functions on
> repeated types
> ----------------------------------------------------------------------------------------
>
> Key: DRILL-1716
> URL: https://issues.apache.org/jira/browse/DRILL-1716
> Project: Apache Drill
> Issue Type: New Feature
> Components: Functions - Drill, Storage - JSON
> Reporter: Rahul Challapalli
> Fix For: 0.9.0
>
>
> {code}
> {
> "company_id": 1,
> "evnts": [
> {
> "evnt_id": 999,
> "evnt_duration": 60
> },
> {
> "evnt_id": 998,
> "evnt_duration": 30
> },
> {
> "evnt_id": 997,
> "evnt_duration": 45
> }
> ]
> }
> {code}
> For the above dataset, if I want to find the longest duration for each
> company id, below is how I would do it now
> {code}
> select sub.company_id , max(sub.evnt.evnt_duration) max_duration
> from (
> select company_id, flatten(evnts) evnt from `nested.json`
> ) sub
> group by sub.company_id;
> +------------+--------------+
> | company_id | max_duration |
> +------------+--------------+
> | 1 | 60 |
> +------------+--------------+
> {code}
> Now if I want the evnt_id associated with the longest duration then we need
> one more join
> {code}
> select a.company_id, b.evnt.evnt_id
> from (
> select sub.company_id company_id, max(sub.evnt.evnt_duration)
> max_duration
> from (
> select company_id, flatten(evnts) evnt from `nested.json`
> ) sub
> group by sub.company_id
> ) a
> join
> (
> select flatten(evnts) evnt from `nested.json`
> ) b
> on a.max_duration = b.evnt.evnt_duration;
> {code}
> The above query currently fails (DRILL-1649). But that is how we have to do
> it currently.
> It would be much simpler if I can do something like the below
> {code}
> select company_id, nested_agg('evnts', 'max', 'evnt_duration','evnt_id') as
> evnt_id from `nested.json`;
> {code}
> Apart from making the query much simpler to write this might enhance drill's
> performance as well.
> Thoughts?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)