[ https://issues.apache.org/jira/browse/CARBONDATA-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacky Li resolved CARBONDATA-470.
---------------------------------
       Resolution: Fixed
         Assignee: Ravindra Pesala
    Fix Version/s: 1.0.0-incubating

> Add unsafe offheap and on-heap sort in carbodata loading
> --------------------------------------------------------
>
>                 Key: CARBONDATA-470
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-470
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: Ravindra Pesala
>            Assignee: Ravindra Pesala
>             Fix For: 1.0.0-incubating
>
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> In the current CarbonData system, data loading performance is poor because
> the data must be sorted at the executor level during loading. CarbonData
> collects a batch of data, sorts it, and dumps it to temporary files; it then
> runs a merge sort over those temporary files to finish sorting. This causes
> two major issues: disk I/O and GC pressure. Even though batches are dumped to
> files, CarbonData still faces significant GC pressure because each batch is
> sorted in memory before it is written to the temporary files.
> To solve these problems, we can introduce Unsafe storage and Unsafe sort.
> Unsafe storage: the user can configure a memory limit on the amount of data
> kept in memory. All of that data is kept in a contiguous memory region,
> either off-heap or on-heap, using Unsafe. Once the configured limit is
> exceeded, the remaining data is spilled to disk.
> Unsafe sort: the data stored in memory using Unsafe can be sorted with an
> Unsafe sort.
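
The storage and sort scheme described in the issue can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the CarbonData implementation: the class name OffHeapSortSketch, the fixed-width long keys standing in for rows, the quicksort, and the temporary-file format are all hypothetical. It also uses ByteBuffer.allocateDirect as a stand-in for sun.misc.Unsafe so that it runs on a stock JDK. Rows go into one contiguous off-heap block, are sorted in place without creating per-row heap objects, are spilled to disk as a sorted run once the configured limit is reached, and are merged at the end.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not CarbonData code: fixed-width long keys stand in for rows.
final class OffHeapSortSketch {
  private final ByteBuffer page;                           // one contiguous off-heap block
  private final List<Path> runFiles = new ArrayList<>();   // sorted runs spilled to disk
  private final List<Integer> runSizes = new ArrayList<>();

  OffHeapSortSketch(int memoryLimitBytes) {
    if (memoryLimitBytes < Long.BYTES) throw new IllegalArgumentException("limit too small");
    page = ByteBuffer.allocateDirect(memoryLimitBytes - memoryLimitBytes % Long.BYTES);
  }

  void add(long key) {
    if (page.remaining() < Long.BYTES) spill();            // configured limit reached
    page.putLong(key);
  }

  private int rowCount() { return page.position() / Long.BYTES; }
  private long get(int i) { return page.getLong(i * Long.BYTES); }
  private void swap(int i, int j) {
    long t = get(i);
    page.putLong(i * Long.BYTES, get(j));
    page.putLong(j * Long.BYTES, t);
  }

  // In-place quicksort over the off-heap slots; recurses into the smaller half only.
  private void sort(int lo, int hi) {
    while (lo < hi) {
      long pivot = get(lo + (hi - lo) / 2);
      int i = lo, j = hi;
      while (i <= j) {
        while (get(i) < pivot) i++;
        while (get(j) > pivot) j--;
        if (i <= j) swap(i++, j--);
      }
      if (j - lo < hi - i) { sort(lo, j); lo = i; } else { sort(i, hi); hi = j; }
    }
  }

  // Sorts the full page, writes it out as one sorted run, and frees the page for reuse.
  private void spill() {
    int n = rowCount();
    sort(0, n - 1);
    try {
      Path file = Files.createTempFile("sort-run-", ".bin");
      try (DataOutputStream out =
          new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(file)))) {
        for (int i = 0; i < n; i++) out.writeLong(get(i));
      }
      runFiles.add(file);
      runSizes.add(n);
      page.clear();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // K-way merge of the spilled runs (sources 0..k-1) and the in-memory page (source k).
  List<Long> finish() {
    int inMemory = rowCount();
    sort(0, inMemory - 1);
    int k = runFiles.size();
    DataInputStream[] in = new DataInputStream[k];
    int[] left = new int[k];                               // keys not yet read per run
    PriorityQueue<long[]> heap = new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
    List<Long> sorted = new ArrayList<>();
    try {
      for (int s = 0; s < k; s++) {
        in[s] = new DataInputStream(
            new BufferedInputStream(Files.newInputStream(runFiles.get(s))));
        left[s] = runSizes.get(s) - 1;
        heap.add(new long[] {in[s].readLong(), s});
      }
      int memPos = 0;
      if (memPos < inMemory) heap.add(new long[] {get(memPos++), k});
      while (!heap.isEmpty()) {
        long[] head = heap.poll();
        sorted.add(head[0]);
        int s = (int) head[1];
        if (s == k) {
          if (memPos < inMemory) heap.add(new long[] {get(memPos++), k});
        } else if (left[s]-- > 0) {
          heap.add(new long[] {in[s].readLong(), s});
        }
      }
      for (int s = 0; s < k; s++) { in[s].close(); Files.delete(runFiles.get(s)); }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sorted;
  }

  public static void main(String[] args) {
    OffHeapSortSketch sorter = new OffHeapSortSketch(32);  // room for 4 keys
    for (long key : new long[] {9, 3, 7, 1, 8, 2, 6, 5, 4, 0}) sorter.add(key);
    System.out.println(sorter.finish());
  }
}
```

With a 32-byte limit the page holds four keys, so the ten keys in main produce two spilled runs plus two keys left in memory before the final merge. A real loader would sort variable-length rows through an offset or pointer array rather than swapping fixed-width slots; that detail is left out here.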