[ 
https://issues.apache.org/jira/browse/CARBONDATA-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuchuanyin resolved CARBONDATA-2238.
------------------------------------
    Resolution: Fixed

> Optimization in unsafe sort during data loading
> -----------------------------------------------
>
>                 Key: CARBONDATA-2238
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2238
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: data-load
>            Reporter: xuchuanyin
>            Assignee: xuchuanyin
>            Priority: Major
>          Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Inspired by batch_sort: in local_sort with the unsafe property enabled, if 
> enough memory is available, we can hold all the row pages in memory and 
> spill pages to disk as sort temp files only when memory runs out.
> Before spilling the pages, we can merge-sort them in memory.
> Each time we request an unsafe row page and memory is unavailable, we can 
> trigger a merge sort of the in-memory pages and spill the result to disk as 
> a single sort temp file. The incoming page is then held in memory instead 
> of being spilled to disk directly.
> With this change, each spill writes more data than before, which should 
> benefit disk IO.
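
The strategy described above can be sketched roughly as follows. This is an illustrative Python sketch, not CarbonData's implementation: the class SpillingSorter, its methods, and the page-count budget max_pages_in_memory (standing in for the unsafe memory limit) are all hypothetical names, and rows are serialized with pickle only for simplicity.

```python
import heapq
import os
import pickle
import tempfile


class SpillingSorter:
    """Hypothetical sketch: hold sorted row pages in memory and
    merge-and-spill them only when the memory budget is exhausted."""

    def __init__(self, max_pages_in_memory, temp_dir):
        # Assumption: a simple page count stands in for the unsafe memory budget.
        self.max_pages_in_memory = max_pages_in_memory
        self.temp_dir = temp_dir
        self.in_memory_pages = []  # each page is a sorted list of rows
        self.spill_files = []      # paths of the sort temp files written so far

    def add_page(self, rows):
        # Requesting a new page: if no memory is left, merge the in-memory
        # pages into one sorted run and spill it before accepting the page.
        if len(self.in_memory_pages) >= self.max_pages_in_memory:
            self._merge_and_spill()
        self.in_memory_pages.append(sorted(rows))

    def _merge_and_spill(self):
        # One sort temp file per spill, holding the merged content of
        # every page that was in memory at that moment.
        fd, path = tempfile.mkstemp(suffix=".sorttemp", dir=self.temp_dir)
        with os.fdopen(fd, "wb") as out:
            for row in heapq.merge(*self.in_memory_pages):
                pickle.dump(row, out)
        self.spill_files.append(path)
        self.in_memory_pages.clear()

    @staticmethod
    def _read_spill(path):
        # Stream rows back from a sort temp file in their stored order.
        with open(path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return

    def sorted_rows(self):
        # Final merge over the spilled runs and the pages still in memory.
        runs = [self._read_spill(p) for p in self.spill_files]
        runs.extend(self.in_memory_pages)
        return heapq.merge(*runs)
```

With a budget of two pages, feeding five pages into this sketch produces two sort temp files of two pages each and leaves the last page in memory, whereas spilling every page directly would produce five smaller files.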



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
