[ https://issues.apache.org/jira/browse/CARBONDATA-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-470.
---------------------------------
       Resolution: Fixed
         Assignee: Ravindra Pesala
    Fix Version/s: 1.0.0-incubating

> Add unsafe off-heap and on-heap sort in CarbonData loading
> ----------------------------------------------------------
>
>                 Key: CARBONDATA-470
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-470
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: Ravindra Pesala
>            Assignee: Ravindra Pesala
>             Fix For: 1.0.0-incubating
>
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> In the current CarbonData system, loading performance is not encouraging 
> because data must be sorted at the executor level during loading. CarbonData 
> collects a batch of data and sorts it before dumping it to temporary files, 
> and finally merge-sorts those temporary files to finish sorting. This 
> approach has two major issues: disk IO and GC overhead. Even though the data 
> is dumped to files, CarbonData still suffers from heavy GC because each 
> batch is sorted in memory before it is written to the temporary files.
> To solve these problems, we can introduce Unsafe Storage and Unsafe Sort.
> Unsafe Storage: Users can configure a memory limit that controls how much 
> data is kept in memory. This data is kept in contiguous memory, either 
> off-heap or on-heap, using Unsafe. Once the configured limit is exceeded, 
> the remaining data is spilled to disk.
> Unsafe Sort: Data stored in memory using Unsafe is sorted using Unsafe 
> Sort.
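
The existing load-time flow described in the issue (sort each batch in memory, dump it to a temporary file, then merge-sort the temporary files) is a classic external merge sort. Below is a minimal sketch in Java, assuming single-int records and a hypothetical class name `ExternalSortSketch`. It illustrates the general technique only and is not CarbonData's actual sort-step code. Note that each in-memory batch and every merged value are ordinary Java heap data, which is where the GC pressure described above comes from.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

/** Hypothetical sketch: batch sort + spill to temp files + k-way merge. */
final class ExternalSortSketch {

    /** Sorts `input`, holding at most `batchSize` records in memory at a time. */
    static List<Integer> sort(Iterator<Integer> input, int batchSize) {
        List<Path> spillFiles = new ArrayList<>();
        int[] batch = new int[batchSize];
        int n = 0;
        try {
            while (input.hasNext()) {
                batch[n++] = input.next();
                if (n == batchSize) {              // batch full: sort it and dump it
                    spillFiles.add(spill(batch, n));
                    n = 0;
                }
            }
            if (n > 0) spillFiles.add(spill(batch, n));
            return merge(spillFiles);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sorts the first n records in memory and writes them to a temp file. */
    private static Path spill(int[] batch, int n) throws IOException {
        Arrays.sort(batch, 0, n);
        Path file = Files.createTempFile("sort-spill-", ".tmp");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(n);                       // record count header
            for (int i = 0; i < n; i++) out.writeInt(batch[i]);
        }
        return file;
    }

    /** Merges sorted temp files with a min-heap keyed on each file's next value. */
    private static List<Integer> merge(List<Path> files) throws IOException {
        List<DataInputStream> readers = new ArrayList<>();
        int[] remaining = new int[files.size()];
        // heap entry: {value, fileIndex}
        PriorityQueue<int[]> heap =
                new PriorityQueue<int[]>((a, b) -> Integer.compare(a[0], b[0]));
        List<Integer> result = new ArrayList<>();
        try {
            for (int i = 0; i < files.size(); i++) {
                DataInputStream in = new DataInputStream(
                        new BufferedInputStream(Files.newInputStream(files.get(i))));
                readers.add(in);
                remaining[i] = in.readInt();
                if (remaining[i]-- > 0) heap.add(new int[] {in.readInt(), i});
            }
            while (!heap.isEmpty()) {
                int[] top = heap.poll();
                result.add(top[0]);
                int i = top[1];
                if (remaining[i]-- > 0) heap.add(new int[] {readers.get(i).readInt(), i});
            }
        } finally {
            for (DataInputStream in : readers) in.close();
            for (Path f : files) Files.deleteIfExists(f);
        }
        return result;
    }
}
```

For example, `ExternalSortSketch.sort(List.of(5, 3, 9, 1, 7, 2, 8).iterator(), 3)` spills three sorted runs and merges them into one ascending list.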

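The Unsafe Storage idea (keep records in one contiguous region, off-heap or on-heap, up to a configured limit, and spill the rest to disk) can be sketched as follows. This is a hypothetical illustration, not CarbonData's implementation: it uses `java.nio.ByteBuffer` (`allocateDirect` for off-heap, `allocate` for on-heap) in place of `sun.misc.Unsafe`, stores only fixed-size `long` records, and the class name `MemoryLimitedStore` and the `limitBytes`/`offHeap` parameters are assumptions standing in for the user-configured limit.

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.file.*;

/**
 * Hypothetical sketch: fixed-size long records kept in one contiguous buffer,
 * off-heap or on-heap; records beyond the memory limit are spilled to disk.
 * java.nio.ByteBuffer stands in for sun.misc.Unsafe here.
 */
final class MemoryLimitedStore implements Closeable {
    private static final int RECORD_BYTES = Long.BYTES;

    private final ByteBuffer memory;      // contiguous region for in-memory records
    private Path spillFile;               // created lazily on first overflow
    private DataOutputStream spillOut;
    private long spilled;

    MemoryLimitedStore(int limitBytes, boolean offHeap) {
        int capacity = (limitBytes / RECORD_BYTES) * RECORD_BYTES; // whole records only
        this.memory = offHeap ? ByteBuffer.allocateDirect(capacity)
                              : ByteBuffer.allocate(capacity);
    }

    void add(long record) {
        if (memory.remaining() >= RECORD_BYTES) {
            memory.putLong(record);       // append within the memory limit
            return;
        }
        try {                             // limit exceeded: spill this record to disk
            if (spillOut == null) {
                spillFile = Files.createTempFile("store-spill-", ".tmp");
                spillOut = new DataOutputStream(
                        new BufferedOutputStream(Files.newOutputStream(spillFile)));
            }
            spillOut.writeLong(record);
            spilled++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int inMemoryCount() { return memory.position() / RECORD_BYTES; }
    long spilledCount() { return spilled; }
    boolean isOffHeap() { return memory.isDirect(); }

    /** Absolute read of the index-th in-memory record. */
    long get(int index) { return memory.getLong(index * RECORD_BYTES); }

    @Override
    public void close() {
        try {
            if (spillOut != null) spillOut.close();
            if (spillFile != null) Files.deleteIfExists(spillFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a 32-byte limit, only four `long` records fit in memory; every later `add` goes to the spill file. Because all in-memory records share a single buffer, the GC sees one object rather than one per record.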

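Unsafe Sort, as described in the issue, sorts the data where it already sits in contiguous memory. A minimal sketch of that idea, under the same assumptions as above (`java.nio.ByteBuffer` instead of `sun.misc.Unsafe`, fixed-size `long` records, hypothetical class name `InPlaceBufferSort`), is an in-place quicksort that reads and swaps records through absolute `getLong`/`putLong`, so no per-record Java objects are allocated while sorting.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: sort fixed-size long records in place inside a
 * contiguous ByteBuffer (direct or heap) without creating per-record objects.
 */
final class InPlaceBufferSort {
    private static final int RECORD_BYTES = Long.BYTES;

    /** Sorts the first `count` records of `buf` in ascending order. */
    static void sort(ByteBuffer buf, int count) {
        quicksort(buf, 0, count - 1);
    }

    private static void quicksort(ByteBuffer buf, int lo, int hi) {
        while (lo < hi) {
            int p = partition(buf, lo, hi);
            // Recurse into the smaller side and loop on the larger one,
            // which keeps the recursion depth logarithmic.
            if (p - lo < hi - p) {
                quicksort(buf, lo, p - 1);
                lo = p + 1;
            } else {
                quicksort(buf, p + 1, hi);
                hi = p - 1;
            }
        }
    }

    /** Lomuto partition using the middle record as the pivot. */
    private static int partition(ByteBuffer buf, int lo, int hi) {
        swap(buf, lo + (hi - lo) / 2, hi);      // move pivot to the end
        long pivot = get(buf, hi);
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (get(buf, i) < pivot) {
                swap(buf, i, store++);
            }
        }
        swap(buf, store, hi);                   // place pivot in its final slot
        return store;
    }

    private static long get(ByteBuffer buf, int i) {
        return buf.getLong(i * RECORD_BYTES);   // absolute read, position unchanged
    }

    private static void swap(ByteBuffer buf, int i, int j) {
        long a = get(buf, i);
        long b = get(buf, j);
        buf.putLong(i * RECORD_BYTES, b);
        buf.putLong(j * RECORD_BYTES, a);
    }
}
```

Real rows are usually variable-length, in which case a common design is to sort an array of record offsets (pointers into the buffer) and compare the rows they point to, rather than moving the row bytes themselves; that variant is not shown here.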

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
