[ https://issues.apache.org/jira/browse/CARBONDATA-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yadong Qi updated CARBONDATA-1109:
----------------------------------
    Description: 
First, the write step uses a producer-consumer model with n producers (the 
default is 2, and n is configurable) and one consumer. The task that 
generates the last page (fewer than 32000 rows) is submitted to the thread 
pool last, but because n tasks run concurrently, it is not guaranteed to 
finish last or to be added to BlockletDataHolder last.
Second, `writeDataToFile` is invoked in two cases: when the size of 
`DataWriterHolder` reaches the blocklet size, and when the page is the last 
page.
So if the last page is not consumed last, the pages consumed after it are 
lost.

  was:
First, we use Producer-Consumer model in the write step, we have n(default 
value is 2 and can be configured) producers and one consumer. The task of 
generate last page(less than 32000) is added to thread pool at the end, but 
can't be guaranteed to be finished and add to BlockletDataHolder at the end. 
Because we have n tasks running concurrently.
Second, we have 2 ways to invoke `writeDataToFile`, one is the size of 
`DataWriterHolder` reach the size of blocklet and two is page is the last page.
So if the last page is not be consumed at the end, we lost the page which be 
consumed after last page.


> Page lost in load process when last page is not be consumed at the end
> ----------------------------------------------------------------------
>
>                 Key: CARBONDATA-1109
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1109
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-load
>    Affects Versions: 1.2.0
>            Reporter: Yadong Qi
>
> First, the write step uses a producer-consumer model with n producers (the 
> default is 2, and n is configurable) and one consumer. The task that 
> generates the last page (fewer than 32000 rows) is submitted to the thread 
> pool last, but because n tasks run concurrently, it is not guaranteed to 
> finish last or to be added to BlockletDataHolder last.
> Second, `writeDataToFile` is invoked in two cases: when the size of 
> `DataWriterHolder` reaches the blocklet size, and when the page is the last 
> page.
> So if the last page is not consumed last, the pages consumed after it are 
> lost.
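
The failure described above can be reproduced with a small, deterministic model of the consumer. This is only a sketch under stated assumptions, not CarbonData's actual code: the class `LastPageRaceSketch`, the method `writtenPages`, and the parameter `pagesPerBlocklet` are hypothetical names, pages are reduced to integer ids, and the concurrency is replaced by an explicit arrival order so the result is repeatable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the consumer side; not CarbonData's actual code.
final class LastPageRaceSketch {

    /**
     * Replays pages in the order the consumer receives them and returns the
     * ids of the pages that end up written to the file.
     */
    static List<Integer> writtenPages(List<Integer> arrivalOrder,
                                      int lastPageId,
                                      int pagesPerBlocklet) {
        List<Integer> holder = new ArrayList<>();  // stands in for DataWriterHolder
        List<Integer> written = new ArrayList<>(); // stands in for the data file
        for (int pageId : arrivalOrder) {
            holder.add(pageId);
            boolean blockletFull = holder.size() == pagesPerBlocklet; // trigger 1
            boolean isLastPage = pageId == lastPageId;                // trigger 2
            if (blockletFull || isLastPage) {
                written.addAll(holder); // stands in for writeDataToFile
                holder.clear();
            }
        }
        // Pages still in the holder here are never flushed, so they are lost.
        return written;
    }

    public static void main(String[] args) {
        // Pages 0..3 are submitted in order; page 3 is the short last page.
        // The last page is consumed last: every page is written.
        System.out.println(writtenPages(Arrays.asList(0, 1, 2, 3), 3, 10)); // [0, 1, 2, 3]
        // Page 3 finishes before page 2: page 2 arrives after the final flush.
        System.out.println(writtenPages(Arrays.asList(0, 1, 3, 2), 3, 10)); // [0, 1, 3]
    }
}
```

In the second call the last page triggers the final flush while page 2 is still being produced, so page 2 reaches the holder afterwards and nothing ever writes it out.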



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
