Hello,
Here I am again with another proposal, which shouldn't be hard to evaluate.
It is related to the work I did on dialects and performance enhancements in
the common sql provider, and to the PR for deferred pagination in the
GenericTransfer operator, which I'm finishing as we speak and which I also
mentioned in the Medium article
<https://medium.com/apache-airflow/transfering-data-from-sap-hana-to-mssql-using-the-airflow-generictransfer-d29f147a9f1f>
I wrote about it.
At our company we use a custom SQLInsertRowsOperator, which allows us to
persist XComs directly without writing custom Python code, so it is again a
kind of facilitator on top of the DbApiHook.
That is why the work on the dialects and the other related PRs was so
important: it lets us implement the operator correctly, meaning with as little
logic as possible in the operator, so that all logic is handled within the
hook and both options can be used the same way.
So my question is whether that operator would be accepted. The code would be
minimal, and it could be added beside the other SQL operators within the
common sql provider.
It's similar to the GenericTransfer operator, except that it doesn't read data
from another database: it uses an XCom as input for the rows to be persisted by
the insert_rows method of the DbApiHook.
It also offers some handy callback parameters to process the rows that have to
be persisted.
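To make the idea concrete, here is a minimal sketch of the delegation described
above. It is not our actual implementation: RecordingHook is a stand-in I made
up so the snippet runs without Airflow or a database, and the class name
InsertRowsSketch and the rows_processor callback name are assumptions for
illustration only. In the real operator the rows would come from an XCom
resolved by Airflow rather than from a plain list.

```python
class RecordingHook:
    """Hypothetical stand-in for a DbApiHook: records calls instead of hitting a database."""

    def __init__(self):
        self.calls = []

    def insert_rows(self, table, rows, **kwargs):
        # A real hook would build and execute the INSERT statements here.
        self.calls.append({"table": table, "rows": list(rows), "kwargs": kwargs})


class InsertRowsSketch:
    """Hypothetical, simplified shape of the proposed operator."""

    def __init__(self, hook, table_name, rows, schema=None,
                 insert_args=None, rows_processor=None):
        self.hook = hook
        self.table_name = table_name
        self.rows = rows
        self.schema = schema
        self.insert_args = insert_args or {}
        # Assumed callback name: lets the caller transform rows before insertion.
        self.rows_processor = rows_processor

    def execute(self):
        rows = self.rows
        if self.rows_processor is not None:
            rows = self.rows_processor(rows)
        table = f"{self.schema}.{self.table_name}" if self.schema else self.table_name
        # The operator holds no SQL logic; everything is delegated to the hook.
        self.hook.insert_rows(table=table, rows=rows, **self.insert_args)


hook = RecordingHook()
InsertRowsSketch(
    hook=hook,
    schema="schema",
    table_name="table_name",
    rows=[("a", 1), ("b", None)],
    insert_args={"commit_every": 5000, "replace": True},
    # Drop rows whose second column is missing before they are persisted.
    rows_processor=lambda rows: [row for row in rows if row[1] is not None],
).execute()
```

The point of the sketch is that the operator only resolves the rows, applies
the optional callback and forwards the insert arguments; dialect-specific
behaviour stays in the hook.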
As I explained before, at our company we strive to have as little custom
Python code as possible within our DAGs and to use existing Airflow operators
as much as possible, which makes maintenance of DAGs easier.
This one would allow Airflow users to easily persist XComs without the need to
write Python code.
Below is an example of how it could be used:
persist_records_task = SQLInsertRowsOperator(
    task_id="persist_records",
    conn_id="conn_id",
    schema="schema",
    table_name="table_name",
    insert_args={
        "commit_every": 5000,
        "replace": True,
        "executemany": True,
        "fast_executemany": True,
    },
    rows=csv_to_records_tasks.output,
)
What do you think about this proposal?
David Blain
Data Engineer at ICT-514 - BI End User Reporting