Vinoth Govindarajan created HUDI-1790:
-----------------------------------------

             Summary: Add SqlSource for DeltaStreamer to support backfill use cases
                 Key: HUDI-1790
                 URL: https://issues.apache.org/jira/browse/HUDI-1790
             Project: Apache Hudi
          Issue Type: New Feature
          Components: DeltaStreamer
            Reporter: Vinoth Govindarajan
            Assignee: Vinoth Govindarajan


Delta Streamer is great for incremental workloads, but we also need to support 
backfills. Two example use cases: adding a new column and backfilling only that 
column for the last 6 months, or reprocessing a couple of older partitions 
after a bug in our transformation logic.

 

If we have a SqlSource as one of the input sources for the Delta Streamer, then 
I can pass any custom Spark SQL query that selects specific partitions and 
backfill only those.
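
As a rough illustration of the kind of query such a source could be given, 
here is a minimal sketch that builds a partition-scoped Spark SQL statement. 
The helper name, the table name source_db.trips, the column names, and the 
datestr partition column are all hypothetical and not part of Hudi; how the 
query would actually be handed to a SqlSource is not specified in this issue.

```python
def build_backfill_query(table, columns, partition_column, partitions):
    """Build a Spark SQL query that reads only the given partitions.

    Hypothetical helper for illustration; not a Hudi API.
    """
    if not partitions:
        raise ValueError("at least one partition is required")
    column_list = ", ".join(columns)
    # Quote each partition value as a SQL string literal, escaping single quotes.
    quoted = ", ".join("'" + p.replace("'", "''") + "'" for p in partitions)
    return (
        f"SELECT {column_list} FROM {table} "
        f"WHERE {partition_column} IN ({quoted})"
    )


# Example: re-read two older partitions (all names are made up).
query = build_backfill_query(
    table="source_db.trips",
    columns=["trip_id", "ts", "new_column"],
    partition_column="datestr",
    partitions=["2021-01-01", "2021-01-02"],
)
print(query)
```

Restricting the WHERE clause to the partition column keeps the backfill 
limited to the partitions that need reprocessing.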

 

When we do the backfill, we don't need to advance the last processed commit 
checkpoint. Instead, the backfill has to take the last processed checkpoint 
from before the backfill and copy it over to the backfill commit.
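
The checkpoint carry-over described above could look roughly like the 
following sketch. It models commits as plain dictionaries; the function name, 
the timeline layout, and the metadata key are assumptions made for 
illustration, not the actual Delta Streamer implementation.

```python
# Assumed metadata key name, used here only for illustration.
CHECKPOINT_KEY = "deltastreamer.checkpoint.key"


def carry_over_checkpoint(timeline, backfill_metadata):
    """Copy the most recent checkpoint on the timeline into the backfill commit.

    `timeline` is a list of commit-metadata dicts, oldest first (hypothetical
    layout). The input dicts are not modified.
    """
    result = dict(backfill_metadata)
    # Walk backwards to find the latest commit that recorded a checkpoint.
    for commit in reversed(timeline):
        checkpoint = commit.get(CHECKPOINT_KEY)
        if checkpoint is not None:
            result[CHECKPOINT_KEY] = checkpoint
            break
    return result


timeline = [
    {"commit_time": "20210401", CHECKPOINT_KEY: "offset-100"},
    {"commit_time": "20210402", CHECKPOINT_KEY: "offset-250"},
]
backfill_commit = carry_over_checkpoint(
    timeline, {"commit_time": "20210403", "note": "backfill"}
)
print(backfill_commit[CHECKPOINT_KEY])
```

Because the backfill commit carries the same checkpoint as the commit before 
it, the next incremental run would resume from where it left off rather than 
from the backfill.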

 

cc [~nishith29]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
