Last commit id/ts checkpoint for incremental pull

Roshan Nair (Data Platform) Mon, 06 May 2019 23:52:54 -0700

Hi,

We are trying to work out how to use Hudi for incremental pulls. In our
scenario, we would like to read from a Hudi table incrementally, so that
each subsequent read returns only data written since the previous read.


In the incremental HiveQL example in the quickstart (
http://hudi.incubator.apache.org/quickstart.html#incremental-hiveql), it
appears that I can filter on _hoodie_commit_time to select only those
records that have not been processed yet. Hudi will ensure snapshot
isolation, so no partially written data is visible to this reader.

The next time I want an incremental set, what value should I use for the
_hoodie_commit_time filter in the query?

Is the expectation that the user will identify the max _hoodie_commit_time
in the result of the query and then use this to set the _hoodie_commit_time
filter for the next incremental query?
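
To make the question concrete, here is a minimal sketch of the bookkeeping
I have in mind. The helper names (next_checkpoint, incremental_query) and
the table name are hypothetical and not part of Hudi. It also assumes
commit times are fixed-width timestamp strings, so that plain string
comparison orders them correctly.

```python
from typing import Iterable, Mapping, Optional

# Metadata column named in the quickstart example.
COMMIT_COL = "_hoodie_commit_time"


def next_checkpoint(rows: Iterable[Mapping[str, str]],
                    previous: Optional[str]) -> Optional[str]:
    """Return the largest commit time seen so far (hypothetical helper).

    Assumes commit times are fixed-width strings, so string comparison
    matches chronological order. If the pull returned no rows, the
    previous checkpoint is kept unchanged.
    """
    latest = previous
    for row in rows:
        commit_time = row[COMMIT_COL]
        if latest is None or commit_time > latest:
            latest = commit_time
    return latest


def incremental_query(table: str, checkpoint: Optional[str]) -> str:
    """Build the next query text (hypothetical helper).

    With no checkpoint yet, the first pull reads the whole table.
    """
    query = f"SELECT * FROM {table}"
    if checkpoint is not None:
        query += f" WHERE `{COMMIT_COL}` > '{checkpoint}'"
    return query


# Example: rows returned by one incremental pull (made-up values).
rows = [
    {COMMIT_COL: "20190506101500", "id": "a"},
    {COMMIT_COL: "20190506235254", "id": "b"},
]
checkpoint = next_checkpoint(rows, previous="20190505000000")
print(incremental_query("my_hudi_table", checkpoint))
```

The reader would have to persist the checkpoint somewhere between runs,
which is the part I am unsure Hudi expects the user to handle.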

Roshan
