[GitHub] carbondata issue #2850: [WIP] Added concurrent reading through SDK

xuchuanyin Wed, 24 Oct 2018 01:01:44 -0700

Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2850
  
    @NamanRastogi I think we can further optimize this function.
    
    1. We can enable parallel reading and set the parallelism while 
creating a CarbonReader;
    2. Inside CarbonReader, we handle the concurrent processing;
    3. The interfaces for CarbonReader should be kept the same as before; there 
is no need to add more interfaces. By calling hasNext or next, the user gets 
the next record without caring which RecordReader that record belongs to.
    
    The user interface looks like below:
    ```
    CarbonReader reader = CarbonReader.builder(dataDir).parallelism(3).build();
    while (reader.hasNext()) {
      reader.next();
    }
    reader.close();
    ```
    To keep it simple, the parallelism can default to 1, which means we 
process the RecordReaders one by one. Setting the parallelism to a higher 
value means we process the RecordReaders in a thread pool of that size (3 in 
the example above).
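
    Point 2 could be sketched roughly as below. This is only an illustration, 
not CarbonData code: ParallelRecordReader is a hypothetical class, plain 
Iterators stand in for the RecordReaders, records are assumed to be non-null, 
and propagating errors from the worker threads is left out.
    ```java
    import java.util.Iterator;
    import java.util.List;
    import java.util.NoSuchElementException;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;

    // Hypothetical sketch: merges several record sources behind one
    // hasNext()/next() interface, so callers never see the sources.
    class ParallelRecordReader<T> implements Iterator<T>, AutoCloseable {
      // Sentinel a worker enqueues once its source is exhausted.
      private static final Object DONE = new Object();

      // Bounded queue, so workers block instead of buffering everything.
      private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>(1024);
      private final ExecutorService pool;
      private int remainingSources;  // only touched by the consuming thread
      private Object nextRecord;     // look-ahead slot filled by hasNext()

      ParallelRecordReader(List<Iterator<T>> sources, int parallelism) {
        this.remainingSources = sources.size();
        // With parallelism 1 the single worker drains the sources in order.
        this.pool = Executors.newFixedThreadPool(Math.max(1, parallelism));
        for (Iterator<T> source : sources) {
          pool.submit(() -> {
            try {
              while (source.hasNext()) {
                queue.put(source.next());
              }
              queue.put(DONE);
            } catch (InterruptedException e) {
              // close() interrupted this worker; stop quietly.
              Thread.currentThread().interrupt();
            }
          });
        }
      }

      @Override
      public boolean hasNext() {
        // Block until a record arrives or every source has reported DONE.
        while (nextRecord == null && remainingSources > 0) {
          Object item;
          try {
            item = queue.take();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
          }
          if (item == DONE) {
            remainingSources--;
          } else {
            nextRecord = item;
          }
        }
        return nextRecord != null;
      }

      @Override
      @SuppressWarnings("unchecked")
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        T record = (T) nextRecord;
        nextRecord = null;
        return record;
      }

      @Override
      public void close() {
        // Interrupts workers that are still producing and releases the threads.
        pool.shutdownNow();
      }
    }
    ```
    In this sketch a parallelism of 1 returns the records source by source, 
the same as sequential reading, while a higher value may interleave records 
from different sources. The caller only uses hasNext, next and close in both 
cases.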

---
