GitHub user QiangCai opened a pull request:

    https://github.com/apache/carbondata/pull/1440

    [WIP][CARBONDATA-1581][CARBONDATA-1582] Implement StreamSinkProvider and stream file writer

    1. Change the default hadoop.version to 2.7.2
    This is required in order to use the filesystem's truncate operation.
    
    2. CarbonSource extends StreamSinkProvider
    It provides a stream sink to support streaming ingest.
    
    3. Implement CarbonStreamOutputFormat and CarbonStreamRecordWriter
    CarbonStreamRecordWriter writes input data to a CarbonData stream file.
    
    4. Avoid the small file issue
    New blocklets are appended to the existing file instead of creating a new file, which avoids the small file issue.
    
    5. Support fault tolerance
    Each stream segment has a CarbonIndex file, which records the information of the CarbonData files.
    With it, we can recover the data to the last successful commit.
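
    Items 4 and 5 can be illustrated with a minimal, self-contained sketch. It is not the code in this pull request: the class `StreamFileSketch`, its methods, and the index format (a single committed length stored as text) are hypothetical, and it uses `java.nio` on a local file rather than the Hadoop filesystem API. It only shows the general idea of appending batches to one file, recording the committed length in an index file, and truncating back to that length on recovery.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch; not CarbonData's actual classes or file layout. */
public class StreamFileSketch {

    /** Appends one batch ("blocklet") to the data file and returns the new file length. */
    public static long appendBlocklet(Path dataFile, byte[] blocklet) throws IOException {
        try (FileChannel ch = FileChannel.open(dataFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(blocklet);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // flush data before the length is recorded as committed
            return ch.size();
        }
    }

    /** Records the committed length in the index file (assumed format: one number as text). */
    public static void commit(Path indexFile, long committedLength) throws IOException {
        Path tmp = indexFile.resolveSibling(indexFile.getFileName() + ".tmp");
        Files.write(tmp, Long.toString(committedLength).getBytes(StandardCharsets.UTF_8));
        // Write to a temporary file first so a half-written index never replaces a good one.
        Files.move(tmp, indexFile, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Truncates the data file back to the last committed length and returns that length. */
    public static long recover(Path dataFile, Path indexFile) throws IOException {
        long committed = 0L;
        if (Files.exists(indexFile)) {
            String text = new String(Files.readAllBytes(indexFile), StandardCharsets.UTF_8);
            committed = Long.parseLong(text.trim());
        }
        if (Files.exists(dataFile)) {
            try (FileChannel ch = FileChannel.open(dataFile, StandardOpenOption.WRITE)) {
                if (ch.size() > committed) {
                    ch.truncate(committed); // drop bytes written after the last commit
                }
            }
        }
        return committed;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("stream-sketch");
        Path data = dir.resolve("part-0.data");
        Path index = dir.resolve("part-0.index");

        long len = appendBlocklet(data, "batch-1;".getBytes(StandardCharsets.UTF_8));
        commit(index, len);
        // Simulate a failure: the second batch is written but never committed.
        appendBlocklet(data, "batch-2;".getBytes(StandardCharsets.UTF_8));

        recover(data, index);
        // Only the committed batch remains in the data file.
        System.out.println(new String(Files.readAllBytes(data), StandardCharsets.UTF_8));
    }
}
```

    On HDFS the corresponding truncate call would presumably be `FileSystem.truncate(Path, long)`, which is available in the Hadoop 2.7 line; that would be consistent with item 1, although the description does not say which call is used.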
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/QiangCai/carbondata streaming

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1440.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1440
    
----
commit 6c94c9311ea1b260e75bf576eec75aea17ce8984
Author: QiangCai <qiang...@qq.com>
Date:   2017-10-18T03:13:00Z

    support streaming ingest

----

