BentsiLeviav opened a new pull request, #67080:
URL: https://github.com/apache/airflow/pull/67080
### Description
Adds a new `apache-airflow-providers-clickhouse` provider that integrates
Airflow with ClickHouse over the HTTP interface using the `clickhouse-connect`
library.
### Scope of this implementation
- `ClickHouseHook` - the core integration, extending `DbApiHook` so all
standard `SQLExecuteQueryOperator` features work out of the box (templating,
`handler`, `split_statements`, etc.)
- Connection form UI with dedicated fields for TLS, timeouts, compression,
session settings, and client kwargs
- `bulk_insert_rows()` for more performant inserts using
clickhouse-connect's native insert path
- `get_uri()` for SQLAlchemy-compatible connection strings
(`clickhousedb://` / `clickhousedbs://`)
- Connection type docs, operator how-to guide, and integration logo
- 95 unit tests
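
To illustrate the `get_uri()` item above, here is a minimal sketch of how a SQLAlchemy-style connection string with the two schemes could be assembled. The helper name `build_clickhouse_uri`, its parameters, and its defaults are hypothetical and are not taken from the provider code; the sketch also assumes that `clickhousedbs://` is the TLS variant of `clickhousedb://`.

```python
from urllib.parse import quote


def build_clickhouse_uri(host, port=8123, username="default", password="",
                         database="default", secure=False):
    """Assemble a SQLAlchemy-style URI (hypothetical helper, not the provider's code)."""
    # Assumption: the trailing "s" selects the TLS (HTTPS) variant of the scheme.
    scheme = "clickhousedbs" if secure else "clickhousedb"
    # Percent-encode credentials so characters such as "@" or ":" cannot break the URI.
    auth = quote(username, safe="")
    if password:
        auth += ":" + quote(password, safe="")
    return f"{scheme}://{auth}@{host}:{port}/{quote(database, safe='')}"


if __name__ == "__main__":
    print(build_clickhouse_uri("ch.example.com"))
    print(build_clickhouse_uri("ch.example.com", 8443, "etl", "p@ss", "analytics", secure=True))
```

Encoding the credentials before interpolation matters mostly for passwords, which often contain URI-reserved characters.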
### Implementation decisions
- `DB-API 2.0` adapter (`ClickHouseConnection`): clickhouse-connect
doesn't expose a DB-API connection natively - we wrap its Client in a thin
adapter so `DbApiHook.run()` works unmodified. `commit()`
and `rollback()` are intentional no-ops since ClickHouse has no
transactions.
- Two-level settings merge: both `session_settings` and `client_kwargs`
can be set at the connection level (via the extra JSON field) and overridden at
the task level (via hook constructor arguments), with the constructor taking
precedence on conflicts.
- Hook-managed kwargs protection: keys that the hook owns (`host`, `port`,
`username`, `password`, `database`, `secure`, `verify`, `client_name`, `settings`) are
stripped from any user-supplied `client_kwargs` so hook-managed values always win.
- Client name: every query is tagged with `apache-airflow/<version>
apache-airflow-providers-clickhouse/<version>` in the HTTP User-Agent, which
ClickHouse records in `system.query_log`, making queries traceable back to their
Airflow source. Users can append a custom label via the `client_name` extra field.
- No dedicated operators are added - `SQLExecuteQueryOperator` from
`common.sql` covers all standard SQL use cases.
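
The two-level merge and the hook-managed key protection described above can be sketched in a few lines. This is an illustration of the stated behaviour only: the function names `merge_two_level` and `build_client_kwargs` are hypothetical, and the actual hook may structure this differently.

```python
# Keys the description lists as hook-managed; user-supplied values for them are dropped.
HOOK_MANAGED_KEYS = frozenset({
    "host", "port", "username", "password", "database",
    "secure", "verify", "client_name", "settings",
})


def merge_two_level(connection_level, task_level):
    """Merge connection-level values with task-level overrides (task level wins)."""
    merged = dict(connection_level or {})
    merged.update(task_level or {})
    return merged


def build_client_kwargs(hook_values, conn_client_kwargs=None, task_client_kwargs=None):
    """Combine user client_kwargs with hook-owned values (hypothetical helper)."""
    user_kwargs = merge_two_level(conn_client_kwargs, task_client_kwargs)
    # Strip anything the hook owns so a user cannot silently override it.
    safe = {k: v for k, v in user_kwargs.items() if k not in HOOK_MANAGED_KEYS}
    return {**safe, **hook_values}


if __name__ == "__main__":
    # session_settings: the constructor argument overrides the connection extra.
    print(merge_two_level({"max_threads": 4, "readonly": 1}, {"max_threads": 8}))
    # client_kwargs: "host" from the user is discarded in favour of the hook's value.
    print(build_client_kwargs(
        {"host": "ch.internal", "port": 8123},
        {"host": "elsewhere", "send_receive_timeout": 30},
        {"send_receive_timeout": 60},
    ))
```

Filtering before the final merge, rather than relying on merge order alone, keeps the protection explicit even if the order of the final dictionary construction changes later.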
### File structure (generated with Claude)
| File(s) | Purpose |
|---|---|
| `provider.yaml` | Provider metadata: name, version, integrations, connection types, UI field behaviour, and the `conn-fields` schema used to generate the connection form |
| `pyproject.toml` | Package build config and dependencies (`clickhouse-connect >=0.7.0`, `common-sql >=1.32.0`); auto-generated from the Breeze template |
| `src/.../hooks/clickhouse.py` | Core implementation: `ClickHouseHook` (extends `DbApiHook`) and `ClickHouseConnection` (thin DB-API 2.0 adapter wrapping the `clickhouse-connect` client) |
| `src/.../get_provider_info.py` | Auto-generated from `provider.yaml` by the Breeze release tooling; do not edit manually |
| `src/airflow/__init__.py`, `src/airflow/providers/__init__.py` | Namespace package declarations required for the `airflow.providers` implicit namespace |
| `src/.../clickhouse/__init__.py` | Version file (`__version__ = "1.0.0"`) with minimum Airflow version guard; auto-generated |
| `docs/connections/clickhouse.rst` | Connection configuration reference: all fields, their types, defaults, and JSON/URI examples |
| `docs/operators/clickhouse.rst` | How-to guide: using `SQLExecuteQueryOperator` and `ClickHouseHook` directly, including `session_settings` and `bulk_insert_rows` examples |
| `docs/index.rst`, `docs/conf.py`, `docs/changelog.rst`, `docs/security.rst` | Standard provider docs scaffold; mostly auto-generated |
| `docs/integration-logos/ClickHouse.png` | Official ClickHouse logo used by the Apache Airflow website |
| `tests/unit/clickhouse/hooks/test_clickhouse.py` | 95 unit tests covering connection building, settings/kwargs merge logic, database override, URI generation, bulk insert, UI widgets, and autocommit semantics |
| `tests/system/clickhouse/example_clickhouse.py` | System test / example DAG: create table → bulk insert → read rows → drop table |
| `.github/boring-cyborg.yml` | Adds `provider:clickhouse` label rule for automatic PR labelling |
| `scripts/ci/docker-compose/remove-sources.yml`, `tests-sources.yml` | Auto-updated by prek to mount the clickhouse provider sources/tests into the CI Docker environment |
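
The thin DB-API 2.0 adapter listed for `hooks/clickhouse.py` can be pictured roughly as follows. This is a simplified sketch, not the PR's `ClickHouseConnection`: the class names `SketchConnection` and `SketchCursor` are hypothetical, every statement is routed through the client's `query()` call for brevity, and a stub stands in for the real `clickhouse-connect` client so the example runs without a server.

```python
class SketchCursor:
    """Minimal DB-API-style cursor over a clickhouse-connect-like client."""

    def __init__(self, client):
        self._client = client
        self._rows = []
        self.description = None
        self.rowcount = -1

    def execute(self, sql, parameters=None):
        # Simplification: a real adapter would likely treat DDL/DML differently.
        result = self._client.query(sql, parameters=parameters)
        self._rows = list(result.result_rows)
        # DB-API description entries are 7-item sequences; only the name is filled in.
        self.description = [(name, None, None, None, None, None, None)
                            for name in result.column_names]
        self.rowcount = len(self._rows)

    def fetchall(self):
        rows, self._rows = self._rows, []
        return rows

    def close(self):
        self._rows = []


class SketchConnection:
    """Wraps the client so code expecting a DB-API connection can use it."""

    def __init__(self, client):
        self._client = client

    def cursor(self):
        return SketchCursor(self._client)

    def commit(self):
        pass  # intentional no-op, as described in the PR

    def rollback(self):
        pass  # intentional no-op, as described in the PR

    def close(self):
        self._client.close()


class StubResult:
    """Stand-in for a query result (stub for illustration only)."""
    result_rows = [(1, "a"), (2, "b")]
    column_names = ("id", "name")


class StubClient:
    """Stand-in for the real client (stub for illustration only)."""
    closed = False

    def query(self, sql, parameters=None):
        return StubResult()

    def close(self):
        self.closed = True


if __name__ == "__main__":
    conn = SketchConnection(StubClient())
    cur = conn.cursor()
    cur.execute("SELECT id, name FROM t")
    print(cur.fetchall())
    conn.commit()  # does nothing, by design
    conn.close()
```

Keeping `commit()` and `rollback()` as no-ops lets generic code paths that call them unconditionally run against the adapter without special-casing.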