I've had some luck disabling multi-phase aggregations on some queries where
memory was an issue.

https://drill.apache.org/docs/guidelines-for-optimizing-aggregation/
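
For reference, a minimal sketch of how I turn that off for a single session. This assumes the option is named planner.enable_multiphase_agg in your version, so check sys.options before relying on it:

```sql
-- Disable two-phase (multi-phase) aggregation for the current session only.
ALTER SESSION SET `planner.enable_multiphase_agg` = false;

-- Confirm the value that is now in effect.
SELECT * FROM sys.options WHERE name = 'planner.enable_multiphase_agg';
```

Using ALTER SESSION rather than ALTER SYSTEM keeps the change scoped to the connection running the problem query.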

After I try that, I typically look at hash aggregation next, as you have
done:

https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/
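
A sketch of the corresponding session settings, assuming the planner.enable_hashagg and planner.enable_hashjoin options described on that page:

```sql
-- Make the planner use sort-based (streaming) aggregation
-- instead of hash aggregation.
ALTER SESSION SET `planner.enable_hashagg` = false;

-- Likewise, make the planner use merge join instead of hash join.
ALTER SESSION SET `planner.enable_hashjoin` = false;
```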

I've had limited success changing max_query_memory_per_node and max_width
(planner.width.max_per_node); sometimes it's an odd combination of settings
that works.

https://drill.apache.org/docs/troubleshooting/#memory-issues
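
Something along these lines is what I experiment with. The numbers are placeholders for illustration only, not recommendations, and would need tuning for your nodes:

```sql
-- Raise the per-node memory limit for a single query.
-- The value is in bytes; 8589934592 (8 GB) is an illustrative placeholder.
ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 8589934592;

-- Lower the per-node parallelism so fewer fragments compete for memory.
-- 4 is an illustrative placeholder.
ALTER SESSION SET `planner.width.max_per_node` = 4;
```

Fewer fragments per node generally means each one gets a larger share of the query memory, at the cost of a slower query.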

Back to your spill question: if you disable hash aggregation, do you know
whether your spill directories are set up? That may be part of the issue. I
am not sure what Drill's default behavior is for spill directory setup.
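
One way to check what is configured, as a rough sketch. It assumes the sys.boot table exposes the boot-time settings from drill-override.conf and that the external sort spill settings have "spill" in their names:

```sql
-- List boot-time options related to spilling, including the
-- directories and file system the external sort writes to.
SELECT * FROM sys.boot WHERE name LIKE '%spill%';
```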



On Fri, Mar 11, 2016 at 2:17 PM, François Méthot <fmetho...@gmail.com>
wrote:

> Hi,
>
>    Using version 1.5, DirectMemory is currently set at 32GB, heap is at
> 8GB. We have been trying to perform multiple aggregations in one query (see
> below) on 40 billion+ rows stored on 13 nodes. We are using parquet
> format.
>
> We keep getting OutOfMemoryException: Failure allocating buffer..
>
> on a query that looks like this:
>
> create table hdfs.`test1234` as
> (
> select string_field1,
>           string_field2,
>           min ( int_field3 ),
>           max ( int_field4 ),
>           count(1),
>           count ( distinct int_field5 ),
>           count ( distinct int_field6 ),
>           count ( distinct string_field7 )
>     from hdfs.`/data/`
>     group by string_field1, string_field2
> );
>
> The documentation states:
> "Currently, hash-based operations do not spill to disk as needed."
>
> and
>
> "If the hash-based operators run out of memory during execution, the query
> fails. If large hash operations do not fit in memory on your system, you
> can disable these operations. When disabled, Drill creates alternative
> plans that allow spilling to disk."
>
> My understanding is that it will fall back to Streaming aggregation, which
> requires sorting.
>
> but
>
> "As of Drill 1.5, ... the sort operator (in queries that ran successfully
> in previous releases) may not have enough memory, resulting in a failed
> query"
>
> And indeed, disabling hash agg and hash join resulted in a memory leak error.
>
> So it looks like increasing direct memory is our only option.
>
> Is there a plan to have Hash Aggregation spill to disk in the next
> release?
>
>
> Thanks for your feedback
>
