[ 
https://issues.apache.org/jira/browse/IMPALA-10431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karthik updated IMPALA-10431:
-----------------------------
    Comment: was deleted

(was: [~karthikkpm] test)

> Consider partitioning instead of sorting before inserts
> -------------------------------------------------------
>
>                 Key: IMPALA-10431
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10431
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Frontend
>            Reporter: Csaba Ringhofer
>            Priority: Minor
>
> Currently, before a partitioned insert, Impala can add a sort node that sorts
> by the partitioning columns. The benefit is that the insert only has to write
> one file at a time, which reduces memory requirements if the number of
> partitions is large. The drawback is the extra O(n*log n) CPU cost and the
> potential spilling.
> As the order in which we write the partitions doesn't matter, partitioning the
> rows in a spilling hash table would give the same benefit but reduce the
> O(n*log n) CPU cost to O(n). It may even be possible to avoid spilling in
> some cases, e.g. if only a single partition is written and the partitioner
> passes the first partition it sees through to the file sink.
> The drawback is that, unlike sort, we don't have a partitioner node in Impala
> yet, but the algorithm is very similar to what happens in grouping
> aggregation or hash join builders, so this could probably be done with a
> medium amount of code.
> Note that we can also sort by columns other than the partitioning columns
> during insert to make min-max stat filtering efficient. In this case we
> should probably stick to sorting before the insert.
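
To make the proposal in the description concrete, here is a minimal Python sketch of the partition-then-write idea. It is not Impala code: the function name partition_then_emit and the key_of parameter are hypothetical, and a plain in-memory dict stands in for the spilling hash table. It only illustrates that grouping rows by partition key in a single O(n) pass still hands the sink one partition at a time, and that the first partition seen can be passed straight through without buffering.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Iterator, Tuple, TypeVar

Row = TypeVar("Row")


def partition_then_emit(
    rows: Iterable[Row], key_of: Callable[[Row], Hashable]
) -> Iterator[Tuple[Hashable, Row]]:
    """Yield (partition_key, row) pairs so each partition's rows are contiguous.

    Hypothetical illustration only. Rows of the first partition key seen are
    passed through immediately; rows of every other partition are buffered in
    a dict (a stand-in for a spilling hash table) and emitted afterwards.
    """
    buffered = defaultdict(list)
    first_key = None
    seen_first = False

    for row in rows:
        key = key_of(row)
        if not seen_first:
            # Remember the first partition so its rows can bypass the buffer.
            first_key, seen_first = key, True
        if key == first_key:
            yield key, row
        else:
            buffered[key].append(row)

    # The output order of partitions does not matter, only that each one is
    # emitted as a single contiguous run.
    for key, bucket in buffered.items():
        for row in bucket:
            yield key, row
```

If the input contains a single partition, nothing is ever buffered in this sketch, which corresponds to the case in the description where spilling could be avoided. It makes no attempt to model memory limits or spilling itself.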



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
