GitHub user QiangCai opened a pull request: https://github.com/apache/carbondata/pull/1440

[WIP][CARBONDATA-1581][CARBONDATA-1582] Implement StreamSinkProvider and stream file writer

1. Change hadoop.version to 2.7.2 as default
   Required in order to use the truncate operation of the filesystem.

2. CarbonSource extends StreamSinkProvider
   Provides a stream sink to support streaming ingest.

3. Implement CarbonStreamOutputFormat and CarbonStreamRecordWriter
   CarbonStreamRecordWriter writes input data to a CarbonData stream file.

4. Avoid the small file issue
   New blocklets are appended to the existing file instead of being written to new small files.

5. Support fault tolerance
   Each stream segment has a CarbonIndex file, which records information about the CarbonData files. Data can be recovered to the last successful commit.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/QiangCai/carbondata streaming

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1440.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1440

----
commit 6c94c9311ea1b260e75bf576eec75aea17ce8984
Author: QiangCai <qiang...@qq.com>
Date: 2017-10-18T03:13:00Z

    support streaming ingest

----
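
As a rough illustration of items 4 and 5 (append to an existing file, then recover to the last successful commit by truncating), here is a minimal sketch in Python. It is not the code in this pull request: the JSON index layout, the file names, and the functions `append_blocklet`, `recover`, and `demo` are all hypothetical, and the actual CarbonIndex format is not described in this message. It only assumes that the index records the committed length of each data file and that the filesystem supports truncate.

```python
import json
import os
import tempfile


def append_blocklet(data_path, index_path, payload):
    """Append payload to the data file, then commit by recording its new length.

    The index is a hypothetical JSON map {file name: committed length};
    it stands in for the CarbonIndex file, whose real format is not shown here.
    """
    with open(data_path, "ab") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    committed = os.path.getsize(data_path)
    # Write the index to a temp file and rename it, so a crash never
    # leaves a half-written index behind.
    tmp_path = index_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({os.path.basename(data_path): committed}, f)
    os.replace(tmp_path, index_path)
    return committed


def recover(data_path, index_path):
    """Truncate the data file back to the length recorded by the last commit."""
    committed = 0
    if os.path.exists(index_path):
        with open(index_path) as f:
            committed = json.load(f).get(os.path.basename(data_path), 0)
    if os.path.exists(data_path) and os.path.getsize(data_path) > committed:
        os.truncate(data_path, committed)  # drop uncommitted trailing bytes
    return committed


def demo():
    """Commit one blocklet, simulate a crash mid-append, then recover."""
    with tempfile.TemporaryDirectory() as d:
        data = os.path.join(d, "part-0.carbondata")  # hypothetical file name
        index = os.path.join(d, "segment.index")     # hypothetical file name
        append_blocklet(data, index, b"blocklet-1")
        # Simulated crash: bytes reach the data file but the index is not updated.
        with open(data, "ab") as f:
            f.write(b"partial")
        recover(data, index)
        return os.path.getsize(data)
```

Calling `demo()` returns the size of the data file after recovery, which equals the length of the one committed blocklet. Updating the index only after the data is flushed is the design choice that makes the truncate safe: anything past the recorded length was never committed.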