o-nikolas commented on code in PR #34381: URL: https://github.com/apache/airflow/pull/34381#discussion_r1367523179
##########
airflow/providers/amazon/aws/executors/ecs/README.md:
##########
@@ -0,0 +1,196 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+# AWS ECS Executor
+
+This is an Airflow executor powered by Amazon Elastic Container Service (ECS). Each task that Airflow schedules for execution is run within its own ECS container. Some benefits of an executor like this include:
+
+1. Task isolation: No task can be a noisy neighbor for another. Resources like CPU, memory and disk are isolated to each individual task. Any actions or failures which affect networking or fail the entire container only affect the single task running in it. No single user can overload the environment by triggering too many tasks, because there are no shared workers.
+2. Customized environments: You can build different container images which incorporate specific dependencies (such as system level dependencies), binaries, or data required for a task to run.
+3. Cost effective: Compute resources only exist for the lifetime of the Airflow task itself. This saves costs by not requiring persistent/long lived workers ready at all times, which also need maintenance and patching.
+
+For a quick start guide, see [here](Setup_guide.md); it will get you up and running with a basic configuration.
+
+The sections below provide more general details about configuration, the provided example Dockerfile, and logging.
+
+## Config Options
+
+There are a number of configuration options available, which can either be set directly in the airflow.cfg
+file under an "aws_ecs_executor" section or via environment variables using the `AIRFLOW__AWS_ECS_EXECUTOR__<OPTION_NAME>`
+format, for example `AIRFLOW__AWS_ECS_EXECUTOR__CONTAINER_NAME = "myEcsContainer"`. For more information
+on how to set these options, see [Setting Configuration Options](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html).
+
+In the case of conflicts, the order of precedence is:
+
+1. Load default values for options which have defaults.
+2. Load any values provided in the RUN_TASK_KWARGS option, if one is provided.
+3. Load any values explicitly provided through airflow.cfg or environment variables. These are checked with Airflow's config precedence.
+
+### Required config options:
+
+- CLUSTER - Name of the Amazon ECS cluster. Required.
+- CONTAINER_NAME - Name of the container that will be used to execute Airflow tasks via the ECS executor.
+The container should be specified in the ECS task definition. Required.
+- REGION - The name of the AWS Region where Amazon ECS is configured. Required.
+
+### Optional config options:
+
+- ASSIGN_PUBLIC_IP - Whether to assign a public IP address to the containers launched by the ECS executor. Defaults to "False".
+- CONN_ID - The Airflow connection (i.e. credentials) used by the ECS executor to make API calls to AWS ECS. Defaults to "aws_default".
+- LAUNCH_TYPE - Launch type can either be 'FARGATE' or 'EC2'. Defaults to "FARGATE".
+- PLATFORM_VERSION - The platform version the ECS task uses if the FARGATE launch type is used. Defaults to "LATEST".
+- RUN_TASK_KWARGS - A JSON string containing arguments to provide to the ECS `run_task` API.
+- SECURITY_GROUPS - Up to 5 comma-separated security group IDs associated with the ECS task. Defaults to the VPC default.
+- SUBNETS - Up to 16 comma-separated subnet IDs associated with the ECS task or service. Defaults to the VPC default.
+- TASK_DEFINITION - The family and revision (family:revision) or full ARN of the ECS task definition to run. Defaults to the latest ACTIVE revision.
+- MAX_RUN_TASK_ATTEMPTS - The maximum number of times the ECS executor should attempt to run a task.
+
+For a more detailed description of available options, including type hints and examples, see the `config_templates` folder in the Amazon provider package.
+
+## Dockerfile for ECS Executor
+
+An example Dockerfile can be found [here](Dockerfile); it creates an image that can be used in an ECS container to run Airflow tasks using the AWS ECS Executor in Apache Airflow. The image
+supports AWS CLI/API integration, allowing you to interact with AWS services within your Airflow environment. It also includes options to load DAGs (Directed Acyclic Graphs) from either an S3 bucket or a local folder.
+
+### Base Image

Review Comment:
   This change has been made and the PR is updated, resolving.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
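As a side note on the configuration mechanism the README diff documents, the environment-variable form can be sketched as below. The option names follow the `AIRFLOW__AWS_ECS_EXECUTOR__<OPTION_NAME>` pattern described in the diff; the cluster, container, region, and kwargs values are illustrative placeholders, not values from the PR:

```shell
# Sketch only: values are placeholder assumptions, not taken from the PR.
# Required options for the ECS executor:
export AIRFLOW__AWS_ECS_EXECUTOR__CLUSTER="my-ecs-cluster"
export AIRFLOW__AWS_ECS_EXECUTOR__CONTAINER_NAME="myEcsContainer"
export AIRFLOW__AWS_ECS_EXECUTOR__REGION="us-east-1"

# Optional: a JSON string of extra arguments passed through to the
# ECS run_task API (merged per the precedence order described above).
export AIRFLOW__AWS_ECS_EXECUTOR__RUN_TASK_KWARGS='{"startedBy": "airflow"}'
```

The same options could equally go in airflow.cfg under an `[aws_ecs_executor]` section; per the precedence list in the diff, explicitly set values override anything supplied via RUN_TASK_KWARGS or the defaults.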