codope commented on issue #3724: URL: https://github.com/apache/hudi/issues/3724#issuecomment-931404620
Generally, for incremental queries you need to set the following configs:

```
"hoodie.datasource.query.type" : "incremental",
"hoodie.datasource.read.begin.instanttime" : "commit_time_to_read_from"
```

Did you try using these configs? You can also take a look at [TestStructuredStreaming](https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestStructuredStreaming.scala#L101) for an example usage.

> What I'm trying to do is to obtain changes that are happening in one hudi dataset to then create incremental pipeline in spark and process them further.

For this, I would also suggest taking a look at `HoodieIncrSource` and setting up a DeltaStreamer job using that source. For an example, take a look at [TestHoodieDeltaStreamer](https://github.com/apache/hudi/blob/47ed91799943271f219419cf209793a98b3f09b5/hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java#L1225).
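To make the config usage above concrete, here is a minimal Scala sketch of an incremental read. It assumes a Hudi table already exists at `basePath` (a hypothetical path), that `spark` is an active `SparkSession`, and that the Hudi Spark bundle is on the classpath; the begin instant time shown is a placeholder you would replace with a real commit time from your table's timeline.

```scala
// Sketch: read only the records committed after a given instant time.
// `spark` (SparkSession) and `basePath` (table location) are assumed
// to exist in the surrounding code.
val basePath = "/tmp/hudi_trips_table" // hypothetical table location

val incrementalDF = spark.read
  .format("hudi")
  // Switch the query type from the default snapshot to incremental.
  .option("hoodie.datasource.query.type", "incremental")
  // Commits strictly after this instant time are returned; use a real
  // commit time from your table here (placeholder shown).
  .option("hoodie.datasource.read.begin.instanttime", "20210901000000")
  .load(basePath)

// The resulting DataFrame can now feed the downstream incremental pipeline.
incrementalDF.show()
```

This is essentially the same pattern used in the `TestStructuredStreaming` example linked above; the only required inputs are the two configs and the table path.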