This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new bca261d  Fix DeltaStreamer args and layout in docker demo page
bca261d is described below

commit bca261d27930577bfb2ec74dc6a09a5d21d1fde6
Author: Balaji Varadarajan <[email protected]>
AuthorDate: Wed Apr 17 17:28:06 2019 -0700

    Fix DeltaStreamer args and layout in docker demo page
---
 docs/docker_demo.md | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/docs/docker_demo.md b/docs/docker_demo.md
index 23a5a4f..c8fc1b3 100644
--- a/docs/docker_demo.md
+++ b/docs/docker_demo.md
@@ -168,8 +168,11 @@ spark-submit --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer
 ....
 2018-09-24 22:20:00 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
 2018-09-24 22:20:00 INFO  SparkContext:54 - Successfully stopped SparkContext
+
+
+
 # Run the following spark-submit command to execute the delta-streamer and ingest to stock_ticks_mor dataset in HDFS
-spark-submit --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --storage-type MERGE_ON_READ --source-class com.uber.hoodie.utilities.sources.JsonKafkaSource --source-ordering-field ts  --target-base-path /user/hive/warehouse/stock_ticks_mor --target-table stock_ticks_mor --props /var/demo/config/kafka-source.properties
+spark-submit --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --storage-type MERGE_ON_READ --source-class com.uber.hoodie.utilities.sources.JsonKafkaSource --source-ordering-field ts  --target-base-path /user/hive/warehouse/stock_ticks_mor --target-table stock_ticks_mor --props /var/demo/config/kafka-source.properties --schemaprovider-class com.uber.hoodie.utilities.schema.FilebasedSchemaProvider
 ....
 2018-09-24 22:22:01 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
 2018-09-24 22:22:01 INFO  SparkContext:54 - Successfully stopped SparkContext
@@ -437,13 +440,15 @@ cat docker/demo/data/batch_2.json | kafkacat -b kafkabroker -t stock_ticks -P
 docker exec -it adhoc-2 /bin/bash
 
 # Run the following spark-submit command to execute the delta-streamer and ingest to stock_ticks_cow dataset in HDFS
-spark-submit --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --storage-type COPY_ON_WRITE --source-class com.uber.hoodie.utilities.sources.JsonKafkaSource --source-ordering-field ts  --target-base-path /user/hive/warehouse/stock_ticks_cow --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties
+spark-submit --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --storage-type COPY_ON_WRITE --source-class com.uber.hoodie.utilities.sources.JsonKafkaSource --source-ordering-field ts  --target-base-path /user/hive/warehouse/stock_ticks_cow --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties --schemaprovider-class com.uber.hoodie.utilities.schema.FilebasedSchemaProvider
+
 
 # Run the following spark-submit command to execute the delta-streamer and ingest to stock_ticks_mor dataset in HDFS
-spark-submit --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --storage-type MERGE_ON_READ --source-class com.uber.hoodie.utilities.sources.JsonKafkaSource --source-ordering-field ts  --target-base-path /user/hive/warehouse/stock_ticks_mor --target-table stock_ticks_mor --props /var/demo/config/kafka-source.properties
+spark-submit --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --storage-type MERGE_ON_READ --source-class com.uber.hoodie.utilities.sources.JsonKafkaSource --source-ordering-field ts  --target-base-path /user/hive/warehouse/stock_ticks_mor --target-table stock_ticks_mor --props /var/demo/config/kafka-source.properties --schemaprovider-class com.uber.hoodie.utilities.schema.FilebasedSchemaProvider
 
 exit
 ```
+
 With Copy-On-Write table, the second ingestion by DeltaStreamer resulted in a new version of Parquet file getting created.
 See `http://namenode:50070/explorer.html#/user/hive/warehouse/stock_ticks_cow/2018/08/31`
 
@@ -600,6 +605,7 @@ exit
 With 2 batches of data ingested, lets showcase the support for incremental queries in Hudi Copy-On-Write datasets
 
 Lets take the same projection query example
+
 ```
 docker exec -it adhoc-2 /bin/bash
 beeline -u jdbc:hive2://hiveserver:10000 --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat --hiveconf hive.stats.autogather=false
@@ -611,7 +617,6 @@ beeline -u jdbc:hive2://hiveserver:10000 --hiveconf hive.input.format=org.apache
 | 20180924064621       | GOOG    | 2018-08-31 09:59:00  | 6330    | 1230.5     | 1230.02   |
 | 20180924065039       | GOOG    | 2018-08-31 10:59:00  | 9021    | 1227.1993  | 1227.215  |
 
+----------------------+---------+----------------------+---------+------------+-----------+--+
-
 ```
 
 As you notice from the above queries, there are 2 commits - 20180924064621 and 20180924065039 in timeline order.
@@ -622,7 +627,7 @@ To show the effects of incremental-query, let us assume that a reader has alread
 ingesting first batch. Now, for the reader to see effect of the second batch, he/she has to keep the start timestamp to
 the commit time of the first batch (20180924064621) and run incremental query
 
-`Hudi incremental mode` provides efficient scanning for incremental queries by filtering out files that do not have any
+Hudi incremental mode provides efficient scanning for incremental queries by filtering out files that do not have any
 candidate rows using hudi-managed metadata.
 
 ```
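
For readers skimming the diff: the net effect of this patch is that every DeltaStreamer command in the demo page now passes an explicit schema provider via `--schemaprovider-class`. As a sketch only (assuming the demo's `$HUDI_UTILITIES_BUNDLE` environment variable is set inside the adhoc-2 container, as the docs describe), the corrected MERGE_ON_READ invocation can be captured as an argument array so it is easy to inspect before running:

```shell
#!/usr/bin/env bash
# Sketch of the corrected DeltaStreamer invocation from this patch.
# Assumes $HUDI_UTILITIES_BUNDLE points at the hudi-utilities bundle jar
# inside the demo container; no Spark cluster is contacted here.
DELTASTREAMER_ARGS=(
  --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer
  "$HUDI_UTILITIES_BUNDLE"
  --storage-type MERGE_ON_READ
  --source-class com.uber.hoodie.utilities.sources.JsonKafkaSource
  --source-ordering-field ts
  --target-base-path /user/hive/warehouse/stock_ticks_mor
  --target-table stock_ticks_mor
  --props /var/demo/config/kafka-source.properties
  --schemaprovider-class com.uber.hoodie.utilities.schema.FilebasedSchemaProvider
)
# Print the assembled command line for inspection.
echo "spark-submit ${DELTASTREAMER_ARGS[*]}"
# To actually run it inside adhoc-2:
# spark-submit "${DELTASTREAMER_ARGS[@]}"
```

Using an array avoids the long wrapped one-liner and makes it harder to drop a flag (which is exactly the mistake this commit fixes).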
