codope commented on issue #3724: URL: https://github.com/apache/hudi/issues/3724#issuecomment-931404620
Generally, for incremental queries you need to set the following configs:

```
"hoodie.datasource.query.type" : "incremental",
"hoodie.datasource.read.begin.instanttime" : "commit_time_to_read_from"
```

Did you try using these configs? You can also take a look at [TestStructuredStreaming](https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestStructuredStreaming.scala#L101) for an example usage.

> What I'm trying to do is to obtain changes that are happening in one hudi dataset to then create incremental pipeline in spark and process them further.

For this, I would also suggest taking a look at `HoodieIncrSource` and setting up a DeltaStreamer job using that source. For an example, take a look at [TestHoodieDeltaStreamer](https://github.com/apache/hudi/blob/47ed91799943271f219419cf209793a98b3f09b5/hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java#L1225).
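To make the config usage above concrete, here is a minimal Scala sketch of an incremental read. It assumes a Hudi table already exists at `basePath` (a hypothetical path), that `spark` is an active `SparkSession`, and that the Hudi Spark bundle is on the classpath; the begin instant time shown is a placeholder you would replace with a real commit time from your table's timeline.

```scala
// Sketch: read only the records committed after a given instant time.
// `spark` (SparkSession) and `basePath` (table location) are assumed
// to exist in the surrounding code.
val basePath = "/tmp/hudi_trips_table" // hypothetical table location

val incrementalDF = spark.read
  .format("hudi")
  // Switch the query type from the default snapshot to incremental.
  .option("hoodie.datasource.query.type", "incremental")
  // Commits strictly after this instant time are returned; use a real
  // commit time from your table here (placeholder shown).
  .option("hoodie.datasource.read.begin.instanttime", "20210901000000")
  .load(basePath)

// The resulting DataFrame can now feed the downstream incremental pipeline.
incrementalDF.show()
```

This is essentially the same pattern used in the `TestStructuredStreaming` example linked above; the only required inputs are the two configs and the table path.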