[GitHub] [carbondata] niuge01 commented on a change in pull request #3752: [CARBONDATA-3804] Provide end-to-end flink integration guide

GitBox Thu, 07 May 2020 18:48:08 -0700


niuge01 commented on a change in pull request #3752:
URL: https://github.com/apache/carbondata/pull/3752#discussion_r421890531




##########
File path: docs/flink-integration-guide.md
##########
@@ -0,0 +1,193 @@
+##Usage scenarios
+
+  A typical scenario is that data is cleaned and preprocessed by Flink and then written to Carbon
+  for subsequent analysis and queries.
+
+  The CarbonData Flink integration module is used to connect Flink and Carbon in this scenario.
+
+  The CarbonData Flink integration module provides a set of Flink BulkWriter implementations
+  (CarbonLocalWriter and CarbonS3Writer). Data is processed by Flink and finally written into
+  the stage directory of the target table by the CarbonXXXWriter.
+
+  By default, data in the table's stage directory cannot be queried immediately. It becomes
+  queryable only after the "INSERT INTO $tableName STAGE" command is executed.
+
+  Because Flink writes to Carbon continuously (the data stream is unbounded), the user should run
+  this command regularly. Doing so keeps new data visible in time and keeps the amount of data
+  processed by each execution of the command under control.
+
+  The execution interval of the "INSERT INTO $tableName STAGE" command should take into account both
+  the data visibility requirements of the actual business and the Flink data traffic. When the
+  visibility requirements are strict or the traffic is large, the interval should be shortened accordingly.
+
+##Usage description

Review comment:
       OK
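The guide's advice to run "INSERT INTO $tableName STAGE" regularly can be automated. Below is a minimal Python sketch. It assumes that the caller supplies a function able to execute SQL against a CarbonData-enabled engine. `StageCommitScheduler`, `stage_commit_sql` and the table name used in the usage note are hypothetical names and are not part of the CarbonData API.

```python
import logging
import threading
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def stage_commit_sql(table_name: str) -> str:
    # The statement named in the guide; the table name is caller-supplied.
    return f"INSERT INTO {table_name} STAGE"


class StageCommitScheduler:
    """Hypothetical helper that periodically runs INSERT INTO <table> STAGE.

    `execute_sql` is supplied by the caller. How it reaches CarbonData
    (a Spark session, a JDBC connection, ...) is an assumption left to
    the deployment.
    """

    def __init__(self, table_name: str, interval_seconds: float,
                 execute_sql: Callable[[str], object]) -> None:
        if interval_seconds <= 0:
            raise ValueError("interval_seconds must be positive")
        self.table_name = table_name
        self.interval_seconds = interval_seconds
        self.execute_sql = execute_sql
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def run_once(self) -> None:
        # Per the guide, staged data becomes queryable after this command runs.
        self.execute_sql(stage_commit_sql(self.table_name))

    def _loop(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.interval_seconds):
            try:
                self.run_once()
            except Exception:
                # Keep the schedule alive; the command is issued again next interval.
                logger.exception("INSERT INTO %s STAGE failed", self.table_name)

    def start(self) -> None:
        if self._thread is not None:
            raise RuntimeError("scheduler already started")
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

For example, with a PySpark session configured for CarbonData (an assumption about your deployment), `StageCommitScheduler("sales", 300, spark.sql).start()` would issue the command every five minutes. Passing the executor in as a callable keeps the scheduling logic independent of how SQL reaches CarbonData.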

##########
File path: docs/flink-integration-guide.md
##########
@@ -0,0 +1,193 @@
+##Usage description
+
+###Writing process

Review comment:
       OK
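The guide identifies two factors for the execution interval of "INSERT INTO $tableName STAGE": the business's data visibility requirements and the Flink data traffic. One hypothetical way to turn these factors into a number is to take the tighter of two bounds. The first bound is the acceptable visibility delay. The second is the time it takes traffic to accumulate a chosen maximum amount of data per run. The function name, its parameters and the `min_interval_s` floor are illustrative assumptions and are not CarbonData settings.

```python
def choose_stage_commit_interval(visibility_sla_s: float,
                                 ingest_bytes_per_s: float,
                                 max_bytes_per_commit: float,
                                 min_interval_s: float = 1.0) -> float:
    """Hypothetical heuristic for the INSERT INTO ... STAGE execution interval.

    visibility_sla_s:     longest acceptable delay before new data is queryable.
    ingest_bytes_per_s:   observed Flink write throughput into the stage directory.
    max_bytes_per_commit: most staged data one command run should handle.
    min_interval_s:       floor so the command is not issued back to back.
    """
    if visibility_sla_s <= 0 or max_bytes_per_commit <= 0 or min_interval_s <= 0:
        raise ValueError("limits must be positive")
    # Stricter visibility requirement -> shorter interval.
    by_visibility = visibility_sla_s
    # Larger traffic -> staged data piles up faster -> shorter interval.
    by_volume = (max_bytes_per_commit / ingest_bytes_per_s
                 if ingest_bytes_per_s > 0 else float("inf"))
    # Use the tighter bound, but never go below the floor.
    return max(min_interval_s, min(by_visibility, by_volume))
```

Consider a 5-minute visibility requirement (300 s) and 10 MB/s of traffic. If each run is capped at about 1 GB, the volume bound is 1,000,000,000 / 10,000,000 = 100 s, so the interval becomes 100 s. The sketch ignores the time the writer needs to flush files into the stage directory and the runtime of the command itself. Both of these add to the real visibility delay.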




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
