akocukcu commented on issue #10992:
URL: https://github.com/apache/airflow/issues/10992#issuecomment-721562330


   @shizidushu for your case again I kindly suggest using "Control Table" 
pattern with a Staging table. 
   You can insert data directly (maybe bulk load) to **empty** Staging table. 
   After that since staging table and datawarehouse table are in the same 
server, upsert operation will complete at lightning speed. 
   
   BTW, I think in the same sense there are `preoperator` parameters inside a 
lot of transfer operators. 
   For example if you look at the definition of `mysql_preoperator` parameter 
of 
[PrestoToMySqlOperator](https://github.com/apache/airflow/blob/fdd9b6f65b608c516b8a062b058972d9a45ec9e3/airflow/providers/mysql/transfers/presto_to_mysql.py),
 it says: 
   
   > ["... typically use to truncate of delete in place of the data coming in 
...".](https://github.com/apache/airflow/blob/fdd9b6f65b608c516b8a062b058972d9a45ec9e3/airflow/providers/mysql/transfers/presto_to_mysql.py#L42)
   
   
   So basically, first you can truncate the staging table, second insert the 
(only new or updated) oltp data to staging table, finally you can upsert 
staging data to datawarehouse table.
   
   Just like in the blog post, you can test that approach in your environment 
and data. 
   
   Independent from your case I can add upsert functionality by adding `MERGE` 
to `MsSqlHook`. But I have concerns, if folks use it for upserting large data.
   Maybe I can add a comment just like they did in `GenericTransfer`:
   
   > ["... This is meant to be used on small-ish datasets 
..."](https://github.com/apache/airflow/blob/4e8f9cc8d02b29c325b8a5a76b4837671bdf5f68/airflow/operators/generic_transfer.py#L32)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to