GitHub user depowell edited a discussion: dbt Cloud Provider - Job Management
Enhancement Proposal
Hey there, I have been working on an Airflow module that extends the dbt Cloud
provider, and I'd like to propose these changes as an enhancement to the
provider so that it supports full lifecycle management of dbt Cloud jobs
directly from Airflow. Currently, the provider only supports listing and
triggering existing jobs and monitoring them with Sensors. The proposed
enhancement would add capabilities to create, update, and delete dbt Cloud
jobs, enabling complete workflow ownership from Airflow.
Does this sound like a good project to enhance the features of the current
provider?
We are currently using Airflow 2.9.2, and I am curious whether you allow
checking out and patching old versions/tags. If this is accepted I can happily
do the work; it would be my first attempt at contributing here.
# Details: dbt Cloud Provider - Job Management Enhancement Proposal
## Current Limitations
As of Airflow 2.9.2 through to 4.4.0, the dbt Cloud provider offers the
following functionality through the dbt Cloud v2 API:
- Listing jobs
- Triggering jobs
- Monitoring job execution status with Sensors
- Retrieving job artifacts post run
However, it lacks the ability to:
- Create jobs
- Update job configurations
- Delete jobs that are no longer needed
- Perform comprehensive job management using the above operations, with Airflow
as the master tool for dbt Cloud jobs
This limitation forces users to create and configure dbt Cloud jobs manually
through the UI or with separate scripts against the dbt Cloud API, which splits
ownership of job orchestration between Airflow and other tools.
## Proposed Solution
Extend the dbt Cloud provider with new operators and hook methods that cover
the full job management lifecycle:
### Enhanced Hook Functionality
Extend `DbtCloudHook` with new methods to enable this:
- `create_job`: Creates a new job with specified payload
- `update_job`: Updates an existing job with new payload
- `destroy_job`: Deletes a job by ID
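A minimal sketch of how these three methods might look. The class name `JobManagementSketch`, the injected `send` callable, and the keyword arguments passed to it are assumptions made for illustration; `send` stands in for the hook's existing request helper, whose real signature may differ, and the endpoints are assumed to be relative to the accounts path.

```python
from typing import Any, Callable

# Stand-in for the hook's request helper. Hypothetical: the real helper's
# signature and payload encoding may differ.
Sender = Callable[..., Any]


class JobManagementSketch:
    """Hypothetical shape of the proposed hook methods, not provider code."""

    def __init__(self, send: Sender) -> None:
        self._send = send

    def create_job(self, account_id: int, payload: dict) -> Any:
        # POST to the jobs collection creates a new job.
        return self._send(
            method="POST", endpoint=f"{account_id}/jobs/", payload=payload
        )

    def update_job(self, account_id: int, job_id: int, payload: dict) -> Any:
        # The v2 API also uses POST for updates, addressed to the job itself.
        return self._send(
            method="POST", endpoint=f"{account_id}/jobs/{job_id}/", payload=payload
        )

    def destroy_job(self, account_id: int, job_id: int) -> Any:
        # DELETE on the job URL removes it; no payload is needed.
        return self._send(
            method="DELETE", endpoint=f"{account_id}/jobs/{job_id}/", payload=None
        )


# Usage with a recording stub instead of a real HTTP call.
calls = []


def record(**kwargs: Any) -> dict:
    calls.append(kwargs)
    return {"status": "ok"}


hook = JobManagementSketch(record)
hook.create_job(1, {"name": "nightly"})
hook.update_job(1, 42, {"name": "nightly-v2"})
hook.destroy_job(1, 42)
```

Injecting the sender keeps the sketch free of network access, which is also how the methods could be unit tested in isolation.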
### Introduce New Operators
1. `DbtCloudCreateJobOperator`: Create jobs
2. `DbtCloudUpdateJobOperator`: Update jobs
3. `DbtCloudDestroyJobOperator`: Delete jobs
4. `DbtCloudJobManagementOperator`: A higher-level operator that can create,
update or ensure a job exists with the right configuration
For operator 4, a logical pattern can be established (I have done something
similar in my module):
- Step 1: List existing jobs with the specified name and collect their IDs
- Step 2: Create the job if it doesn't exist (no IDs are returned)
- Step 3: Update the job if it exists but its config/payload has changed
- Step 4: Destroy duplicate jobs if multiple exist with the same name
(keeping the job with the lowest ID)
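The four steps can be sketched as a pure planning function that decides what to do without calling the API. The function name `plan_job_actions`, the action tuples, and the field-by-field comparison used to detect a changed config are all assumptions for illustration, not the module's actual logic.

```python
from __future__ import annotations

from typing import Any


def plan_job_actions(
    existing: list[dict[str, Any]], desired: dict[str, Any]
) -> list[tuple[str, Any]]:
    """Return the actions needed so that exactly one job matches `desired`."""
    # Step 1: find jobs with the desired name, ordered by ID.
    matches = sorted(
        (job for job in existing if job.get("name") == desired["name"]),
        key=lambda job: job["id"],
    )
    # Step 2: no match means the job has to be created.
    if not matches:
        return [("create", desired)]

    keeper, duplicates = matches[0], matches[1:]
    actions: list[tuple[str, Any]] = []
    # Step 3: update the kept job if any desired field differs from it.
    if any(keeper.get(key) != value for key, value in desired.items()):
        actions.append(("update", keeper["id"]))
    # Step 4: destroy every duplicate, keeping the job with the lowest ID.
    actions.extend(("destroy", job["id"]) for job in duplicates)
    return actions


jobs = [
    {"id": 7, "name": "nightly", "execute_steps": ["dbt run"]},
    {"id": 3, "name": "nightly", "execute_steps": ["dbt build"]},
    {"id": 9, "name": "hourly", "execute_steps": ["dbt run"]},
]
wanted = {"name": "nightly", "execute_steps": ["dbt run"]}
plan = plan_job_actions(jobs, wanted)
print(plan)  # [('update', 3), ('destroy', 7)]
```

Separating the plan from its execution would let an operator log or dry-run the intended changes before touching dbt Cloud.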
## API Integration Details
The dbt Cloud API v2 provides endpoints for managing jobs:
- `POST /api/v2/accounts/{account_id}/jobs/`: Create job
- `POST /api/v2/accounts/{account_id}/jobs/{id}/`: Update job
- `DELETE /api/v2/accounts/{account_id}/jobs/{id}/`: Delete job
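Since create and update share the `POST` verb and differ only in whether a job ID is in the path, a small routing helper can make the distinction explicit. This is only a sketch; the helper name `job_route` and its action strings are hypothetical.

```python
from __future__ import annotations

from typing import Optional

JOBS_PATH = "/api/v2/accounts/{account_id}/jobs/"
METHODS = {"create": "POST", "update": "POST", "delete": "DELETE"}


def job_route(
    action: str, account_id: int, job_id: Optional[int] = None
) -> tuple[str, str]:
    """Map a job management action to its (HTTP method, URL path) pair."""
    if action not in METHODS:
        raise ValueError(f"unknown action: {action!r}")
    path = JOBS_PATH.format(account_id=account_id)
    if action == "create":
        # Creation targets the collection, so a job ID makes no sense here.
        if job_id is not None:
            raise ValueError("create does not take a job_id")
        return METHODS[action], path
    # Update and delete both address a single job.
    if job_id is None:
        raise ValueError(f"{action} requires a job_id")
    return METHODS[action], f"{path}{job_id}/"


print(job_route("create", 1))      # ('POST', '/api/v2/accounts/1/jobs/')
print(job_route("update", 1, 42))  # ('POST', '/api/v2/accounts/1/jobs/42/')
print(job_route("delete", 1, 42))  # ('DELETE', '/api/v2/accounts/1/jobs/42/')
```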
## Testing Strategy
1. **Unit Tests**
   - Mock the dbt Cloud API responses for different scenarios:
     - Test hook methods in isolation
     - Test operators with mocked payloads, e.g. the payload cannot be empty
       and must include the required params
2. **Integration Tests**
   - None to begin with
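As a rough illustration of the unit tests, the sketch below mocks the API call with `unittest.mock` and checks the payload rules. The `create_job` function, its `send` parameter, and the `REQUIRED_FIELDS` tuple are stand-ins invented for this example; the fields the real API requires should be taken from the dbt Cloud documentation.

```python
import unittest
from unittest import mock

# Illustrative assumption: not the authoritative list of required fields.
REQUIRED_FIELDS = ("name", "project_id", "environment_id")


def create_job(send, account_id, payload):
    """Hypothetical create method: validate the payload, then delegate."""
    if not payload:
        raise ValueError("payload cannot be empty")
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        raise ValueError(f"payload is missing required fields: {missing}")
    return send(method="POST", endpoint=f"{account_id}/jobs/", payload=payload)


class CreateJobTest(unittest.TestCase):
    def setUp(self):
        # The mock replaces the HTTP layer, so no request leaves the test.
        self.send = mock.Mock(return_value={"data": {"id": 101}})
        self.payload = {"name": "nightly", "project_id": 5, "environment_id": 8}

    def test_empty_payload_is_rejected(self):
        with self.assertRaises(ValueError):
            create_job(self.send, 1, {})
        self.send.assert_not_called()

    def test_missing_required_field_is_rejected(self):
        del self.payload["project_id"]
        with self.assertRaises(ValueError):
            create_job(self.send, 1, self.payload)
        self.send.assert_not_called()

    def test_valid_payload_is_posted(self):
        result = create_job(self.send, 1, self.payload)
        self.send.assert_called_once_with(
            method="POST", endpoint="1/jobs/", payload=self.payload
        )
        self.assertEqual(result["data"]["id"], 101)
```

Saved as a module, the tests can be run with `python -m unittest`.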
## Backwards Compatibility
All new functionality will be added in a backward-compatible way:
- No changes to existing operator signatures
- Existing functionality will remain unchanged
- New functionality will be available through new operators
- New hook methods will reuse the existing `_run_and_get_response` method for
their API calls
GitHub link: https://github.com/apache/airflow/discussions/50679