GitHub user depowell edited a discussion: dbt Cloud Provider - Job Management Enhancement Proposal

Hey there, I have been working on an Airflow module that extends the 
functionality of the dbt Cloud provider, and I'd like to propose these changes 
as an enhancement to the dbt Cloud provider to support full lifecycle 
management of dbt Cloud jobs directly from Airflow. Currently, the provider 
only supports triggering, listing, and monitoring existing jobs (via operators 
and sensors). The proposed enhancement would add capabilities to create, 
update, and delete dbt Cloud jobs, enabling complete workflow ownership from 
Airflow.

Does this sound like a good project to enhance the features of the current 
provider?

We are currently using Airflow 2.9.2, and I am curious whether you allow 
checking out and patching old versions/tags. If this is accepted I would be 
happy to do the work; it would be my first contribution here.

# Details: dbt Cloud Provider - Job Management Enhancement Proposal

## Current Limitations

As of Airflow 2.9.2 and dbt Cloud provider versions up to 4.4.0, the dbt Cloud 
provider offers the following functionality through the dbt Cloud v2 API:
- Listing jobs 
- Triggering jobs
- Monitoring job execution status with Sensors
- Retrieving job artifacts post run

However, it lacks the ability to:
- Create jobs
- Update job configurations
- Delete jobs that are no longer needed
- Perform comprehensive job management built on the above, using Airflow as 
the mastering tool for dbt Cloud jobs

This limitation forces users to create and configure dbt Cloud jobs manually 
through the UI, or to maintain separate scripts against the dbt Cloud API, 
splitting ownership of jobs between other tools and the Airflow orchestration 
layer.

## Proposed Solution

Extend the dbt Cloud provider with new operators and hook methods that cover 
the full job management lifecycle:


### Enhanced Hook Functionality
Extend `DbtCloudHook` with new methods to enable this (a rough sketch follows the list):
- `create_job`: Creates a new job with specified payload
- `update_job`: Updates an existing job with new payload
- `destroy_job`: Deletes a job by ID
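
To make the shape of these methods concrete, here is a minimal sketch, assuming the hook's private `_run_and_get_response` helper accepts `method`, `endpoint`, and `payload` arguments the way the existing `trigger_job_run` call does; in the actual change these methods would live on `DbtCloudHook` itself and would likely reuse the provider's `fallback_to_default_account` decorator for the account ID.

```python
# Minimal sketch only: method names follow the proposal, and the call into the
# private _run_and_get_response helper mirrors how trigger_job_run uses it.
from __future__ import annotations

import json
from typing import Any

from airflow.providers.dbt.cloud.hooks.dbt import DbtCloudHook


class DbtCloudJobManagementHook(DbtCloudHook):
    """Illustrative subclass; the real methods would be added to DbtCloudHook."""

    def create_job(self, account_id: int, payload: dict[str, Any]) -> Any:
        # POST /api/v2/accounts/{account_id}/jobs/
        return self._run_and_get_response(
            method="POST",
            endpoint=f"{account_id}/jobs/",
            payload=json.dumps(payload),
        )

    def update_job(self, account_id: int, job_id: int, payload: dict[str, Any]) -> Any:
        # POST /api/v2/accounts/{account_id}/jobs/{job_id}/
        return self._run_and_get_response(
            method="POST",
            endpoint=f"{account_id}/jobs/{job_id}/",
            payload=json.dumps(payload),
        )

    def destroy_job(self, account_id: int, job_id: int) -> Any:
        # DELETE /api/v2/accounts/{account_id}/jobs/{job_id}/
        return self._run_and_get_response(
            method="DELETE",
            endpoint=f"{account_id}/jobs/{job_id}/",
        )
```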

### Introduce New Operators
1. `DbtCloudCreateJobOperator`: Create jobs
2. `DbtCloudUpdateJobOperator`: Update jobs
3. `DbtCloudDestroyJobOperator`: Delete jobs
4. `DbtCloudJobManagementOperator`: A higher-level operator that ensures a job 
exists with the desired configuration, creating or updating it as needed

For operator 4, a logical reconciliation pattern can be established (I have 
done something similar in my module; a sketch follows this list):
  - Step 1: List existing jobs with the specified name and collect their IDs
  - Step 2: Create the job if it doesn't exist (no IDs are returned)
  - Step 3: Update the job if it exists but the config/payload has changed
  - Step 4: Destroy duplicate jobs if multiple exist with the same name 
(keeping the job with the lowest ID)
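
As a concrete illustration of steps 1-4, here is a small, self-contained sketch of the decision logic the `DbtCloudJobManagementOperator` could run in `execute` (the function name and return shape are illustrative; only the `id` and `name` fields mirror the dbt Cloud job schema):

```python
# Illustrative reconcile logic for the proposed DbtCloudJobManagementOperator.
from typing import Any


def plan_job_actions(
    existing_jobs: list[dict[str, Any]], desired: dict[str, Any]
) -> list[tuple[str, Any]]:
    """Return an ordered list of (action, detail) tuples for one job name."""
    matches = sorted(
        (job for job in existing_jobs if job.get("name") == desired["name"]),
        key=lambda job: job["id"],
    )
    if not matches:
        # Step 2: no job with this name exists yet -> create it.
        return [("create", desired)]

    keeper, duplicates = matches[0], matches[1:]
    actions: list[tuple[str, Any]] = []

    # Step 3: update the kept job only if its configuration actually differs.
    if any(keeper.get(key) != value for key, value in desired.items()):
        actions.append(("update", keeper["id"]))

    # Step 4: destroy duplicates, keeping the job with the lowest id.
    actions.extend(("destroy", job["id"]) for job in duplicates)
    return actions
```

The operator would then map each planned action onto the corresponding hook method (`create_job`, `update_job`, `destroy_job`).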
    
## API Integration Details

The dbt Cloud API v2 provides endpoints for managing jobs:

- `POST /api/v2/accounts/{account_id}/jobs/`: Create job
- `POST /api/v2/accounts/{account_id}/jobs/{id}/`: Update job
- `DELETE /api/v2/accounts/{account_id}/jobs/{id}/`: Delete job
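
For reference, a create/update payload might look roughly like the following; the exact required fields are defined by the dbt Cloud v2 job schema, so the field names and values below are illustrative and should be validated against the API documentation.

```python
# Illustrative payload for POST /api/v2/accounts/{account_id}/jobs/.
# Field names follow the dbt Cloud v2 job schema as I understand it;
# confirm the required fields against the dbt Cloud API reference.
job_payload = {
    "account_id": 12345,  # example IDs only
    "project_id": 67890,
    "environment_id": 111,
    "name": "daily_build",
    "execute_steps": ["dbt build"],
    "state": 1,  # 1 = active
    "triggers": {"schedule": False, "github_webhook": False},
    "settings": {"threads": 4, "target_name": "prod"},
}
```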

## Testing Strategy

1. **Unit Tests**
   - Mock the dbt Cloud API responses for different scenarios:
     - Test hook methods in isolation (a mock-based sketch follows this list)
     - Test operators with a mocked payload, e.g. asserting that the payload 
is not empty and includes the required params

2. **Integration Tests**
   - None to begin with
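
For the hook-level tests, a minimal pytest-style sketch could look like the following, assuming the proposed `create_job` validates its payload before delegating to `_run_and_get_response` (the method name and behaviour are part of the proposal, not the existing provider API):

```python
# Sketch of a unit test for the proposed create_job hook method; it patches the
# hook's private _run_and_get_response helper so no real API call is made.
from unittest import mock

import pytest

from airflow.providers.dbt.cloud.hooks.dbt import DbtCloudHook


@mock.patch.object(DbtCloudHook, "_run_and_get_response")
def test_create_job_rejects_empty_payload(mock_run):
    hook = DbtCloudHook(dbt_cloud_conn_id="dbt_cloud_default")

    # The proposed create_job should reject an empty payload before hitting the API.
    with pytest.raises(ValueError):
        hook.create_job(account_id=12345, payload={})  # hypothetical new method

    mock_run.assert_not_called()
```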

## Backwards Compatibility

All new functionality will be added in a backward-compatible way:
- No changes to existing operator signatures
- Existing functionality will remain unchanged
- New functionality will be available through new operators
- New hook methods will reuse the existing `_run_and_get_response` helper for 
API calls


GitHub link: https://github.com/apache/airflow/discussions/50679
