I am creating a number of small ETL tasks to pull data into a Redshift instance. For each task, I want a one-time bootstrapping step that creates the target table (and possibly the schema) and performs the initial load.
Preferably, I would like the DAG itself to perform this initial load, instead of doing it manually. This would be particularly useful when testing the DAG in different environments (e.g., local vs. production). I see two ways of doing this:

1. Have a bootstrap operator in the DAG that runs `CREATE TABLE IF NOT EXISTS`, and rely on the DAG start date to backfill the data for the initial load, e.g. `bootstrap` -> `fetch` -> `load`.
2. Implement an operator that does the initial load if the table does not exist, or an incremental load if it does. The start date for the initial load is a parameter of the operator, similar to what is described under "parameterize workflow" in this post: <https://medium.com/@rchang/a-beginners-guide-to-data-engineering-part-ii-47c4e7cbda71#a102>.

Rough sketches of both options follow below.
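To make the comparison concrete, here is roughly what I have in mind for option 1. It is only a sketch: the connection id, schema, and table names are made up, and the import paths are for Airflow 1.x. The bootstrap task is idempotent, and the DAG `start_date` drives the backfill that performs the initial load:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.postgres_operator import PostgresOperator

with DAG(
    dag_id="events_etl",
    start_date=datetime(2018, 1, 1),  # scheduler backfills from here
    schedule_interval="@daily",
) as dag:
    # Safe to run on every DAG run: both statements are no-ops once the
    # objects exist.
    bootstrap = PostgresOperator(
        task_id="bootstrap",
        postgres_conn_id="redshift_default",  # Redshift is Postgres-compatible
        sql=[
            "CREATE SCHEMA IF NOT EXISTS analytics",
            """
            CREATE TABLE IF NOT EXISTS analytics.events (
                event_id BIGINT,
                event_ts TIMESTAMP,
                payload  VARCHAR(MAX)
            )
            """,
        ],
    )

    fetch = DummyOperator(task_id="fetch")  # stand-in for the real extract
    load = DummyOperator(task_id="load")    # stand-in for the real COPY/INSERT

    bootstrap >> fetch >> load
```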
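And a sketch of option 2, where a single load task checks whether the target table exists and picks the load mode accordingly. The two load helpers at the bottom are placeholders for the real logic, and this reuses the `dag` object from the sketch above:

```python
from airflow.hooks.postgres_hook import PostgresHook
from airflow.operators.python_operator import PythonOperator

INITIAL_LOAD_START = "2018-01-01"  # operator parameter: where history begins

def smart_load(ds, **kwargs):
    hook = PostgresHook(postgres_conn_id="redshift_default")
    exists = hook.get_first(
        "SELECT 1 FROM information_schema.tables "
        "WHERE table_schema = 'analytics' AND table_name = 'events'"
    )
    if exists:
        incremental_load(hook, start=ds, end=ds)  # just this run's window
    else:
        initial_load(hook, start=INITIAL_LOAD_START, end=ds)  # full history

def initial_load(hook, start, end):
    pass  # placeholder: create the table and COPY everything in [start, end]

def incremental_load(hook, start, end):
    pass  # placeholder: load only the window for this execution date

load = PythonOperator(
    task_id="load",
    python_callable=smart_load,
    provide_context=True,  # Airflow 1.x: injects ds and friends
    dag=dag,
)
```

How do you typically handle this situation? What are the best practices for an initial load?

Best,
Rafael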