kacpermuda opened a new pull request, #61535:
URL: https://github.com/apache/airflow/pull/61535

   ## TLDR:
   Add hook-level lineage (HLL) reporting to SQL hooks via `send_sql_hook_lineage`.
   This PR introduces a standardized mechanism for SQL hooks to report execution metadata (SQL text, query parameters, job IDs, row counts, default database/schema) to the hook lineage collector using `add_extra`.
   
   I also bumped the required common-sql version for all modified providers, so that HLL is actually emitted.
   
   I've also added tests for most hooks that use `DbApiHook` as their base class, so that HLL is still sent even if some methods are overridden in the future. For now this mostly tests the `DbApiHook` implementation multiple times, but if a database hook later overrides `run()`, the test should fail unless the new implementation also calls the HLL collector.
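
The guard-test idea above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the tests added in this PR: `BaseSqlHook`, `CompliantHook`, `ForgetfulHook`, `reports_lineage`, and the `"sql"` key are hypothetical stand-ins, and the `add_extra` arguments are assumed for the example.

```python
from unittest import mock


class BaseSqlHook:
    """Hypothetical stand-in for a DbApiHook-like base class."""

    def __init__(self, collector):
        self.collector = collector

    def run(self, sql):
        result = self._execute(sql)
        # Report the executed SQL to the collector (argument names are assumed).
        self.collector.add_extra(context=self, key="sql", value=sql)
        return result

    def _execute(self, sql):
        # A real hook would talk to the database here.
        return None


class CompliantHook(BaseSqlHook):
    """Inherits run(), so lineage reporting is preserved."""


class ForgetfulHook(BaseSqlHook):
    """Overrides run() and forgets to report lineage."""

    def run(self, sql):
        return self._execute(sql)


def reports_lineage(hook_cls) -> bool:
    """Return True if calling run() on the hook reaches the collector."""
    collector = mock.MagicMock()
    hook_cls(collector).run("SELECT 1")
    return collector.add_extra.called
```

A per-hook test would then assert `reports_lineage(SomeHook)`; it passes for `CompliantHook` and fails for `ForgetfulHook`, which is the regression this kind of test is meant to catch.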
   
   ### Important context
   Hook lineage collection is a no-op unless a collector is registered (e.g. by the OpenLineage provider). This means no runtime overhead for users who don't use lineage collection.
   
   ### Motivation
   Black-box operators (e.g. `PythonOperator` calling `PostgresHook.run(sql)`) currently produce no lineage. With this change, any registered collector can capture the SQL being executed, parse it for input/output datasets, and attach query IDs to lineage events. This dramatically improves lineage quality without requiring operator-level changes.
   
   ### Follow-up PRs
   
   - OpenLineage consumer: modify the OL provider to consume these extras, parse SQL for datasets, and attach `query_id` to OL events.
   - BigQueryHook `insert_job`: mixes SQL and non-SQL lineage, so it will be handled in a separate PR.
   - Additional non-SQL hooks: extend HLL to hooks beyond SQL.
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes (please specify the tool below)
   
   Co-authored by: Cursor following [the guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
   