Any inputs on this one? On Wed, May 25, 2016 at 7:51 PM, rahul challapalli < challapallira...@gmail.com> wrote:
> Its using hash aggregation. > On May 25, 2016 7:48 PM, "Zelaine Fong" <zf...@maprtech.com> wrote: > >> What does the explain plan show? I.e., is the group by being done via a >> hash agg or a streaming agg? If it's a streaming agg, then you still have >> to sort the entire data set before you reduce it down to a single group. >> That would explain the increase in memory as you add group by keys. >> >> -- Zelaine >> >> On Wed, May 25, 2016 at 5:50 PM, rahul challapalli < >> challapallira...@gmail.com> wrote: >> >> > I am trying to understand the memory usage patterns for hash aggregate. >> The >> > below query completes in 9.163 seconds and uses 24 MB of memory for >> > hash-aggregate (according to profile) >> > >> > select max(d.l_linenumber) from (select l_linenumber, 'asdf' c1, 'kfjhl' >> > c2, 'reyui' c3, 'khdfs' c4, 'vkhj' c5 from mem_heavy1) d group by d.c1, >> > d.c2, d.c3, d.c4, d.c5; >> > >> > Adding one more constant column to the group by, the below query takes >> > 11.638 seconds and uses 29 MB of ram >> > >> > select max(d.l_linenumber) from (select l_linenumber, 'asdf' c1, 'kfjhl' >> > c2, 'reyui' c3, 'khdfs' c4, 'vkhj' c5, 'bmkr' c6 from mem_heavy1) d >> group >> > by d.c1, d.c2, d.c3, d.c4, d.c5, d.c6; >> > >> > The below query with one more constant column added to group by 14.622 >> > seconds and uses 33 MB memory >> > >> > select max(d.l_linenumber) from (select l_linenumber, 'asdf' c1, 'kfjhl' >> > c2, 'reyui' c3, 'khdfs' c4, 'vkhj' c5, 'bmkr' c6, 'ciuh' c7 from >> > mem_heavy1) d group by d.c1, d.c2, d.c3, d.c4, d.c5, d.c6, d.c7; >> > >> > >> > As you can see, there is only one disctinct group in all the above >> cases. >> > It looks like the memory usage is proportional to no of elements in the >> > group by clause. Is this expected? >> > >> > Is the increase in time expected between the above queries? (As we did >> not >> > introduce any new groups) >> > >> > - Rahul >> > >> >