[ https://issues.apache.org/jira/browse/HUDI-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-1790:
---------------------------------
    Labels: pull-request-available  (was: )

> Add SqlSource for DeltaStreamer to support backfill use cases
> -------------------------------------------------------------
>
>                 Key: HUDI-1790
>                 URL: https://issues.apache.org/jira/browse/HUDI-1790
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: DeltaStreamer
>            Reporter: Vinoth Govindarajan
>            Assignee: Vinoth Govindarajan
>            Priority: Major
>              Labels: pull-request-available
>
> Delta Streamer is great for incremental workloads, but we also need to
> support backfills. Two examples: adding a new column and backfilling only
> that column for the last 6 months, or reprocessing a couple of older
> partitions after a bug in our transformation logic.
>
> If we have a SqlSource as one of the input sources to the Delta Streamer,
> then I can pass any custom Spark SQL query that selects specific partitions
> and backfill them.
>
> A backfill should not advance the last processed commit checkpoint. Instead,
> the checkpoint recorded before the backfill has to be copied over to the
> backfill commit.
>
> cc [~nishith29]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
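
The checkpoint handling described in the issue can be illustrated with a minimal sketch. This is not Hudi code: the timeline is modelled as a plain list of dicts, and the names CHECKPOINT_KEY, last_checkpoint, backfill_commit, the "backfill_sql" field, the table name src, the column datestr, and the checkpoint strings are all hypothetical, chosen only to show the idea that a backfill commit inherits the previous checkpoint instead of advancing it.

```python
from typing import Dict, List, Optional

# Hypothetical metadata key used only in this sketch (not Hudi's actual key).
CHECKPOINT_KEY = "checkpoint"


def last_checkpoint(commits: List[Dict]) -> Optional[str]:
    """Return the checkpoint of the most recent commit that carries one."""
    for commit in reversed(commits):
        value = commit.get("extra_metadata", {}).get(CHECKPOINT_KEY)
        if value is not None:
            return value
    return None


def backfill_commit(commits: List[Dict], instant: str, sql: str) -> Dict:
    """Build a backfill commit that copies the previous checkpoint unchanged."""
    metadata = {"backfill_sql": sql}
    previous = last_checkpoint(commits)
    if previous is not None:
        # Carry the checkpoint over so incremental ingestion resumes where it left off.
        metadata[CHECKPOINT_KEY] = previous
    return {"instant": instant, "extra_metadata": metadata}


# Two incremental commits, each recording how far the source was consumed.
timeline = [
    {"instant": "001", "extra_metadata": {CHECKPOINT_KEY: "topic,0:100"}},
    {"instant": "002", "extra_metadata": {CHECKPOINT_KEY: "topic,0:250"}},
]

# A backfill restricted to specific partitions through a custom SQL query.
query = "SELECT * FROM src WHERE datestr BETWEEN '2021-01-01' AND '2021-01-07'"
timeline.append(backfill_commit(timeline, "003", query))
```

After the backfill commit is appended, looking up the latest checkpoint still yields the value written by the last incremental commit, so the next incremental run would continue from the same position.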