GitHub user depowell edited a discussion: dbt Cloud Provider - Job Management Enhancement Proposal
Hey there,

I have been working on an Airflow module that extends the dbt Cloud provider, and I'd like to propose these changes as an enhancement to the provider so that it supports full lifecycle management of dbt Cloud jobs directly from Airflow. Currently, the provider only supports triggering, listing, and monitoring existing jobs (including via sensors). The proposed enhancement would add the ability to create, update, and delete dbt Cloud jobs, enabling complete workflow ownership from Airflow.

Does this sound like a good project to enhance the features of the current provider? We are currently using version 2.9.2 of Airflow, and I am curious whether you allow checking out and patching old versions/tags. If this is accepted, I can happily do this work; it would be my first attempt at contributing here.

# Details: dbt Cloud Provider - Job Management Enhancement Proposal

## Current Limitations

From Airflow 2.9.2 through to version 4.4.0 of the dbt Cloud provider, the provider offers the following functionality through the dbt Cloud v2 API:

- Listing jobs
- Triggering jobs
- Monitoring job execution status with sensors
- Retrieving job artifacts after a run

However, it lacks the ability to:

- Create jobs
- Update job configurations
- Delete jobs that are no longer needed
- Perform comprehensive job management using the above operations, with Airflow as the source of truth for dbt Cloud jobs

This limitation forces users to create and configure dbt Cloud jobs manually through the UI, or to use separate scripts against the dbt Cloud API, which splits job ownership between Airflow and other tools.

## Proposed Solution

Extend the dbt Cloud provider with new operators and hook methods that cover the full job management lifecycle.

### Enhanced Hook Functionality

Extend `DbtCloudHook` with new methods:

- `create_job`: Creates a new job with the specified payload
- `update_job`: Updates an existing job with a new payload
- `destroy_job`: Deletes a job by ID

### Introduce New Operators

1. `DbtCloudCreateJobOperator`: Create jobs
2. `DbtCloudUpdateJobOperator`: Update jobs
3. `DbtCloudDestroyJobOperator`: Delete jobs
4. `DbtCloudJobManagementOperator`: A higher-level operator that can create or update a job, or ensure a job exists with the right configuration

For operator 4, a logical pattern can be established (I have done something similar in my module):

- Step 1: List existing jobs with the specified name and find their IDs
- Step 2: Create the job if it doesn't exist (no IDs are returned)
- Step 3: Update the job if it exists but its config/payload has changed
- Step 4: Destroy duplicate jobs if multiple exist with the same name (keeping the job with the lowest ID)

## API Integration Details

The dbt Cloud API v2 provides endpoints for managing jobs:

- `POST /api/v2/accounts/{account_id}/jobs/`: Create job
- `POST /api/v2/accounts/{account_id}/jobs/{id}/`: Update job
- `DELETE /api/v2/accounts/{account_id}/jobs/{id}/`: Delete job

## Testing Strategy

1. **Unit Tests**
   - Mock the dbt Cloud API responses for different scenarios:
     - Test hook methods in isolation
     - Test operators with mocked payloads, e.g. the payload cannot be empty and must include the required params
2. **Integration Tests**
   - None to begin with

## Backwards Compatibility

All new functionality will be added in a backward-compatible way:

- No changes to existing operator signatures
- Existing functionality will remain unchanged
- New functionality will be available through new operators
- New hook methods will reuse the existing `_run_and_get_response` method for their API calls

GitHub link: https://github.com/apache/airflow/discussions/50679

----
This is an automatically sent email for commits@airflow.apache.org.
To unsubscribe, please send an email to: commits-unsubscr...@airflow.apache.org
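
To make the four-step pattern proposed for `DbtCloudJobManagementOperator` more concrete, here is a minimal sketch of the reconcile logic in plain Python. Everything in it is an assumption for illustration: `InMemoryJobClient` is a hypothetical in-memory stand-in for the proposed hook methods (it makes no API calls), `ensure_job` is a made-up helper name, and the payload fields are examples only. The real hook signatures and dbt Cloud payload schema may well differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class InMemoryJobClient:
    """Hypothetical stand-in for the proposed hook methods; no real API calls."""

    jobs: dict[int, dict[str, Any]] = field(default_factory=dict)
    next_id: int = 1

    def list_jobs(self, name: str) -> list[dict[str, Any]]:
        return [job for job in self.jobs.values() if job["name"] == name]

    def create_job(self, payload: dict[str, Any]) -> dict[str, Any]:
        job = {**payload, "id": self.next_id}
        self.jobs[job["id"]] = job
        self.next_id += 1
        return job

    def update_job(self, job_id: int, payload: dict[str, Any]) -> dict[str, Any]:
        self.jobs[job_id] = {**payload, "id": job_id}
        return self.jobs[job_id]

    def destroy_job(self, job_id: int) -> None:
        del self.jobs[job_id]


def ensure_job(client: InMemoryJobClient, payload: dict[str, Any]) -> tuple[int, str]:
    """Reconcile one job by name; returns (job_id, action taken)."""
    if not payload or "name" not in payload:
        raise ValueError("payload must be non-empty and include 'name'")

    # Step 1: list existing jobs with this name, lowest ID first.
    existing = sorted(client.list_jobs(payload["name"]), key=lambda job: job["id"])

    # Step 2: create the job if none exists.
    if not existing:
        return client.create_job(payload)["id"], "created"

    keeper, duplicates = existing[0], existing[1:]

    # Step 4: destroy duplicates, keeping the job with the lowest ID.
    for job in duplicates:
        client.destroy_job(job["id"])

    # Step 3: update only if the stored config differs from the desired payload.
    current = {key: value for key, value in keeper.items() if key != "id"}
    if current != payload:
        client.update_job(keeper["id"], payload)
        return keeper["id"], "updated"
    return keeper["id"], "unchanged"
```

In this sketch duplicates are removed before the update so that a failed update never leaves extra jobs behind; the opposite order would be equally reasonable. Running `ensure_job` repeatedly with the same payload is meant to be idempotent, which is the property a DAG task calling it on every run would rely on.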