[ 
https://issues.apache.org/jira/browse/DRILL-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174345#comment-15174345
 ] 

Julian Hyde commented on DRILL-4460:
------------------------------------

An "external" algorithm is one that spills to disk in order to complete when 
there is not enough memory. An "adaptive" algorithm is one that can start out 
in memory and switch to external mode during the same run, without losing 
data. A "hybrid" algorithm is one that keeps as much data as possible in 
memory and puts the rest on disk, and therefore tends to degrade gracefully 
as the input grows.

I wanted to point out that there are adaptive, external algorithms based on 
sort as well as on hash. The following paper describes adaptive hybrid hash 
join; adaptive hybrid hash aggregation is similar (and in fact simpler): 
http://www.vldb.org/conf/1990/P186.PDF
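
To make the "hybrid" idea concrete, here is a minimal Python sketch of a 
hybrid hash aggregation (a per-key sum). It is an illustration only: it is 
not the algorithm from the paper and not Drill code. The function names, the 
max_groups limit (a stand-in for a real memory budget), and the use of 
pickle and temporary files for spilling are all assumptions made for the 
example.

```python
import pickle
import tempfile


def _read_spilled(f):
    """Yield (key, value) rows back from a spill file until it is exhausted."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


def hybrid_hash_sum(rows, max_groups, num_partitions=4, _depth=0):
    """Yield (key, total) pairs, holding at most max_groups keys in memory.

    Hypothetical sketch: max_groups stands in for a memory budget.
    """
    if max_groups < 1:
        raise ValueError("max_groups must be at least 1")

    in_memory = {}                         # groups that fit in memory
    spill_files = [None] * num_partitions  # overflow partitions on disk

    for key, value in rows:
        if key in in_memory:
            in_memory[key] += value
        elif len(in_memory) < max_groups:
            in_memory[key] = value
        else:
            # No room for a new group: write the row to a disk partition.
            # Mixing _depth into the hash re-partitions on each recursion.
            p = hash((key, _depth)) % num_partitions
            if spill_files[p] is None:
                spill_files[p] = tempfile.TemporaryFile()
            pickle.dump((key, value), spill_files[p])

    # Groups that stayed in memory are final: their keys were never spilled.
    yield from in_memory.items()

    # Aggregate each spilled partition with the same procedure.
    for f in spill_files:
        if f is None:
            continue
        f.seek(0)
        yield from hybrid_hash_sum(
            _read_spilled(f), max_groups, num_partitions, _depth + 1
        )
        f.close()
```

For example, with rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5), 
("b", 6), ("e", 7)] and max_groups=2, dict(hybrid_hash_sum(rows, 2)) gives 
the totals a=4, b=8, c=4, d=5, e=7. Keys "a" and "b" are aggregated in 
memory and the rest go through disk. When everything fits, nothing is 
spilled, which is the graceful degradation described above.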

To be clear, external hashing is not currently implemented in Drill.

> Provide feature that allows fall back to sort aggregation
> ---------------------------------------------------------
>
>                 Key: DRILL-4460
>                 URL: https://issues.apache.org/jira/browse/DRILL-4460
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>    Affects Versions: 1.5.0
>            Reporter: John Omernik
>
> Currently, Drill defaults to an in-memory hash model for aggregations 
> (planner.enable_hashagg = true by default).  This works well, but it is 
> memory dependent, and an out-of-memory condition will cause the query to 
> fail.  At this point, a user can run alter session set 
> `planner.enable_hashagg` = false and run the query again. If memory is a 
> challenge again, the sort-based approach will spill to disk, allowing the 
> query to complete (more slowly).
> What I am requesting is a feature, off by default (so Drill's default 
> behavior stays the same after this feature is added), that would allow a 
> query that tried hash aggregation and failed due to out of memory to be 
> restarted with sort aggregation.  Basically, the query would try hash 
> first, then fall back to sort, so that it succeeds.  This would make for a 
> better user experience. Perhaps a warning could be shown to the user so 
> that they understand that this occurred and can switch to sort-based 
> aggregation by default in the future. 
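
The requested behavior can be sketched in Python as follows. This is only 
an illustration of the control flow, not Drill code: the function names, the 
HashAggOutOfMemory exception, the max_groups limit (standing in for a memory 
budget), and the fallback_enabled flag are all hypothetical. The in-memory 
sorted() call stands in for a sort that could spill to disk.

```python
from itertools import groupby
from operator import itemgetter


class HashAggOutOfMemory(Exception):
    """Hypothetical error raised when the hash table exceeds its budget."""


def hash_aggregate(rows, max_groups):
    """Sum values per key in a hash table; fail if it outgrows max_groups."""
    totals = {}
    for key, value in rows:
        if key not in totals and len(totals) >= max_groups:
            raise HashAggOutOfMemory("hash table exceeded its memory budget")
        totals[key] = totals.get(key, 0) + value
    return totals


def sort_aggregate(rows):
    """Sum values per key by sorting on the key and scanning adjacent rows.

    sorted() runs in memory here; it stands in for an external sort.
    """
    ordered = sorted(rows, key=itemgetter(0))
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }


def aggregate_with_fallback(rows, max_groups, fallback_enabled=False):
    """Try hash aggregation first; optionally rerun with sort aggregation.

    Returns (totals, warning). warning is None unless the fallback ran.
    rows must be re-iterable (for example a list), since it may be read twice.
    """
    try:
        return hash_aggregate(rows, max_groups), None
    except HashAggOutOfMemory:
        if not fallback_enabled:
            raise  # default behavior is unchanged: the query fails
        warning = (
            "hash aggregation ran out of memory; "
            "the query was rerun with sort aggregation"
        )
        return sort_aggregate(rows), warning
```

With the flag left at its default, an out-of-memory failure still surfaces 
to the caller, which matches the request that existing behavior not change. 
With the flag on, the caller gets the result plus a warning it can show to 
the user.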



