[ https://issues.apache.org/jira/browse/CARBONDATA-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ravindra Pesala reassigned CARBONDATA-742:
------------------------------------------

    Assignee: Ravindra Pesala

> Add batch sort to improve the loading performance
> -------------------------------------------------
>
>                 Key: CARBONDATA-742
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-742
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: Ravindra Pesala
>            Assignee: Ravindra Pesala
>
> Current Problem:
> The sort step is a major bottleneck because it is a blocking step. It must
> receive all the data and write the sort temp files to disk; only after that
> can the data writer step start.
>
> Solution:
> Make the sort step non-blocking so that the data writer step does not have
> to wait for it. Process the data in the sort step in batches sized to the
> in-memory capacity of the machine. For example, if the machine can allocate
> 4 GB for in-memory processing, the sort step can sort the data in batches of
> 2 GB and hand each batch to the data writer step. While the data writer step
> consumes one batch, the sort step receives and sorts the next. All steps
> therefore work continuously, and there is no disk IO in the sort step.
> The data writer step never waits for the whole sort to finish: as soon as
> the sort step has sorted a batch in memory, the data writer can start
> writing it.
> This can significantly improve loading performance.
>
> Advantages:
> Increases the loading performance, as there is no intermediate IO and no
> blocking in the sort step.
> No extra effort is needed for compaction; the current flow can handle it.
>
> Disadvantages:
> The number of driver-side btrees will increase, so memory usage might
> increase, but it can be controlled by the current LRU cache implementation.
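
The batch flow described in the issue can be sketched roughly as follows. This is only an illustration, not CarbonData code: the class BatchSortSketch, the method load, the use of Integer rows, and a batch size counted in rows rather than bytes are all assumptions made for the example. A bounded queue of capacity 1 stands in for the hand-off between the sort step and the data writer step, so the sorter can prepare one batch while the writer consumes the previous one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of a non-blocking batch sort; not a CarbonData API.
class BatchSortSketch {
    // Sentinel batch (compared by identity) that tells the writer no more data is coming.
    private static final List<Integer> END = new ArrayList<>();

    /** Sorts rows in fixed-size in-memory batches and hands each batch to a writer thread. */
    static List<List<Integer>> load(Iterable<Integer> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // Capacity 1: one sorted batch can wait while the writer handles the previous one.
        BlockingQueue<List<Integer>> queue = new ArrayBlockingQueue<>(1);
        List<List<Integer>> written = new ArrayList<>();

        // Stand-in for the data writer step: records each sorted batch it receives.
        Thread writer = new Thread(() -> {
            try {
                for (List<Integer> b = queue.take(); b != END; b = queue.take()) {
                    written.add(b);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        try {
            // Stand-in for the sort step: fill a batch, sort it in memory, hand it over.
            List<Integer> batch = new ArrayList<>(batchSize);
            for (Integer row : rows) {
                batch.add(row);
                if (batch.size() == batchSize) {
                    Collections.sort(batch);
                    queue.put(batch);
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                Collections.sort(batch); // final, possibly smaller, batch
                queue.put(batch);
            }
            queue.put(END);
            writer.join(); // join makes the writer's updates to 'written' visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load interrupted", e);
        }
        return written;
    }
}
```

Each batch is sorted independently, so the result is a series of sorted runs rather than one globally sorted output. That matches the disadvantage noted in the issue: more sorted units mean more driver-side btrees.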