This is an automated email from the ASF dual-hosted git repository.

ferruzzi pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git

The following commit(s) were added to refs/heads/main by this push:
     new 667b842632 metrics tagging documentation (#36627)
667b842632 is described below

commit 667b842632fbee984b940a3b8b5a1f0bb3749a0f
Author: Gopal Dirisala <39794726+dir...@users.noreply.github.com>
AuthorDate: Fri Jan 12 07:14:05 2024 +0530

    metrics tagging documentation (#36627)

    * metrics tagging documentation
---
 .../logging-monitoring/metrics.rst | 62 +++++++++++++++++-----
 1 file changed, 50 insertions(+), 12 deletions(-)

diff --git a/docs/apache-airflow/administration-and-deployment/logging-monitoring/metrics.rst b/docs/apache-airflow/administration-and-deployment/logging-monitoring/metrics.rst
index aff3ec3e9a..4c91f37b71 100644
--- a/docs/apache-airflow/administration-and-deployment/logging-monitoring/metrics.rst
+++ b/docs/apache-airflow/administration-and-deployment/logging-monitoring/metrics.rst
@@ -147,41 +147,59 @@ Name Descripti
     ``LocalTaskJob``
 ``local_task_job.task_exit.<job_id>.<dag_id>.<task_id>.<return_code>``  Number of ``LocalTaskJob`` terminations with a ``<return_code>``
     while running a task ``<task_id>`` of a DAG ``<dag_id>``.
+``local_task_job.task_exit``  Number of ``LocalTaskJob`` terminations with a ``<return_code>``
+    while running a task ``<task_id>`` of a DAG ``<dag_id>``.
+    Metric with job_id, dag_id, task_id and return_code tagging.
 ``operator_failures_<operator_name>``  Operator ``<operator_name>`` failures
+``operator_failures``  Operator ``<operator_name>`` failures. Metric with operator_name tagging.
 ``operator_successes_<operator_name>``  Operator ``<operator_name>`` successes
-``ti_failures``  Overall task instances failures
-``ti_successes``  Overall task instances successes
-``previously_succeeded``  Number of previously succeeded task instances
-``zombies_killed``  Zombie tasks killed
+``operator_successes``  Operator ``<operator_name>`` successes. Metric with operator_name tagging.
+``ti_failures``  Overall task instances failures. Metric with dag_id and task_id tagging.
+``ti_successes``  Overall task instances successes. Metric with dag_id and task_id tagging.
+``previously_succeeded``  Number of previously succeeded task instances. Metric with dag_id and task_id tagging.
+``zombies_killed``  Zombie tasks killed. Metric with dag_id and task_id tagging.
 ``scheduler_heartbeat``  Scheduler heartbeats
 ``dag_processing.processes``  Relative number of currently running DAG parsing processes (ie this delta
-    is negative when, since the last metric was sent, processes have completed)
-``dag_processing.processor_timeouts``  Number of file processors that have been killed due to taking too long
+    is negative when, since the last metric was sent, processes have completed).
+    Metric with file_path and action tagging.
+``dag_processing.processor_timeouts``  Number of file processors that have been killed due to taking too long.
+    Metric with file_path tagging.
 ``dag_processing.sla_callback_count``  Number of SLA callbacks received
 ``dag_processing.other_callback_count``  Number of non-SLA callbacks received
 ``dag_processing.file_path_queue_update_count``  Number of times we've scanned the filesystem and queued all existing dags
 ``dag_file_processor_timeouts``  (DEPRECATED) same behavior as ``dag_processing.processor_timeouts``
 ``dag_processing.manager_stalls``  Number of stalled ``DagFileProcessorManager``
 ``dag_file_refresh_error``  Number of failures loading any DAG files
-``scheduler.tasks.killed_externally``  Number of tasks killed externally
+``scheduler.tasks.killed_externally``  Number of tasks killed externally. Metric with dag_id and task_id tagging.
 ``scheduler.orphaned_tasks.cleared``  Number of Orphaned tasks cleared by the Scheduler
 ``scheduler.orphaned_tasks.adopted``  Number of Orphaned tasks adopted by the Scheduler
 ``scheduler.critical_section_busy``  Count of times a scheduler process tried to get a lock on the critical
     section (needed to send tasks to the executor) and found it locked by
     another process.
-``sla_missed``  Number of SLA misses
-``sla_callback_notification_failure``  Number of failed SLA miss callback notification attempts
-``sla_email_notification_failure``  Number of failed SLA miss email notification attempts
+``sla_missed``  Number of SLA misses. Metric with dag_id and task_id tagging.
+``sla_callback_notification_failure``  Number of failed SLA miss callback notification attempts. Metric with dag_id and func_name tagging.
+``sla_email_notification_failure``  Number of failed SLA miss email notification attempts. Metric with dag_id tagging.
 ``ti.start.<dag_id>.<task_id>``  Number of started task in a given dag. Similar to <job_name>_start but for task
+``ti.start``  Number of started task in a given dag. Similar to <job_name>_start but for task.
+    Metric with dag_id and task_id tagging.
 ``ti.finish.<dag_id>.<task_id>.<state>``  Number of completed task in a given dag. Similar to <job_name>_end but for task
+``ti.finish``  Number of completed task in a given dag. Similar to <job_name>_end but for task
+    Metric with dag_id and task_id tagging.
 ``dag.callback_exceptions``  Number of exceptions raised from DAG callbacks. When this happens, it
-    means DAG callback is not working.
+    means DAG callback is not working. Metric with dag_id tagging
 ``celery.task_timeout_error``  Number of ``AirflowTaskTimeout`` errors raised when publishing Task to Celery Broker.
 ``celery.execute_command.failure``  Number of non-zero exit code from Celery task.
-``task_removed_from_dag.<dag_id>``  Number of tasks removed for a given dag (i.e. task no longer exists in DAG)
+``task_removed_from_dag.<dag_id>``  Number of tasks removed for a given dag (i.e. task no longer exists in DAG).
+``task_removed_from_dag``  Number of tasks removed for a given dag (i.e. task no longer exists in DAG).
+    Metric with dag_id and run_type tagging.
 ``task_restored_to_dag.<dag_id>``  Number of tasks restored for a given dag (i.e. task instance which was
     previously in REMOVED state in the DB is added to DAG file)
+``task_restored_to_dag.<dag_id>``  Number of tasks restored for a given dag (i.e. task instance which was
+    previously in REMOVED state in the DB is added to DAG file).
+    Metric with dag_id and run_type tagging.
 ``task_instance_created_<operator_name>``  Number of tasks instances created for a given Operator
+``task_instance_created``  Number of tasks instances created for a given Operator.
+    Metric with dag_id and run_type tagging.
 ``triggerer_heartbeat``  Triggerer heartbeats
 ``triggers.blocked_main_thread``  Number of triggers that blocked the main thread (likely due to not being
     fully asynchronous)
@@ -213,11 +231,18 @@ Name Description
 ``executor.queued_tasks``  Number of queued tasks on executor
 ``executor.running_tasks``  Number of running tasks on executor
 ``pool.open_slots.<pool_name>``  Number of open slots in the pool
+``pool.open_slots``  Number of open slots in the pool. Metric with pool_name tagging.
 ``pool.queued_slots.<pool_name>``  Number of queued slots in the pool
+``pool.queued_slots``  Number of queued slots in the pool. Metric with pool_name tagging.
 ``pool.running_slots.<pool_name>``  Number of running slots in the pool
+``pool.running_slots``  Number of running slots in the pool. Metric with pool_name tagging.
 ``pool.deferred_slots.<pool_name>``  Number of deferred slots in the pool
+``pool.deferred_slots``  Number of deferred slots in the pool. Metric with pool_name tagging.
 ``pool.starving_tasks.<pool_name>``  Number of starving tasks in the pool
+``pool.starving_tasks``  Number of starving tasks in the pool. Metric with pool_name tagging.
 ``triggers.running.<hostname>``  Number of triggers currently running for a triggerer (described by hostname)
+``triggers.running``  Number of triggers currently running for a triggerer (described by hostname).
+    Metric with hostname tagging.
 =================================================== ========================================================================
 
 Timers
@@ -231,17 +256,30 @@ Name Description
 ``dag.<dag_id>.<task_id>.duration``  Seconds taken to run a task
 ``task.duration``  Seconds taken to run a task. Metric with dag_id and task-id tagging.
 ``dag.<dag_id>.<task_id>.scheduled_duration``  Seconds a task spends in the Scheduled state, before being Queued
+``task.scheduled_duration``  Seconds a task spends in the Scheduled state, before being Queued.
+    Metric with dag_id and task_id tagging.
 ``dag.<dag_id>.<task_id>.queued_duration``  Seconds a task spends in the Queued state, before being Running
+``task.queued_duration``  Seconds a task spends in the Queued state, before being Running.
+    Metric with dag_id and task_id tagging.
 ``dag_processing.last_duration.<dag_file>``  Seconds taken to load the given DAG file
+``dag_processing.last_duration``  Seconds taken to load the given DAG file. Metric with file_name tagging.
 ``dagrun.duration.success.<dag_id>``  Seconds taken for a DagRun to reach success state
+``dagrun.duration.success``  Seconds taken for a DagRun to reach success state.
+    Metric with dag_id and run_type tagging.
 ``dagrun.duration.failed.<dag_id>``  Seconds taken for a DagRun to reach failed state
+``dagrun.duration.failed``  Seconds taken for a DagRun to reach failed state.
+    Metric with dag_id and run_type tagging.
 ``dagrun.schedule_delay.<dag_id>``  Milliseconds of delay between the scheduled DagRun
     start date and the actual DagRun start date
+``dagrun.schedule_delay``  Milliseconds of delay between the scheduled DagRun
+    start date and the actual DagRun start date. Metric with dag_id tagging.
 ``scheduler.critical_section_duration``  Milliseconds spent in the critical section of scheduler loop --
     only a single scheduler can enter this loop at a time
 ``scheduler.critical_section_query_duration``  Milliseconds spent running the critical section task instance query
 ``scheduler.scheduler_loop_duration``  Milliseconds spent running one scheduler loop
 ``dagrun.<dag_id>.first_task_scheduling_delay``  Seconds elapsed between first task start_date and dagrun expected start
+``dagrun.first_task_scheduling_delay``  Seconds elapsed between first task start_date and dagrun expected start.
+    Metric with dag_id and run_type tagging.
 ``collect_db_dags``  Milliseconds taken for fetching all Serialized Dags from DB
 ``kubernetes_executor.clear_not_launched_queued_tasks.duration``  Milliseconds taken for clearing not launched queued tasks in Kubernetes Executor
 ``kubernetes_executor.adopt_task_instances.duration``  Milliseconds taken to adopt the task instances in Kubernetes Executor
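
The rows added by this commit pair each legacy metric, whose name embeds identifiers (for example ``ti.start.<dag_id>.<task_id>``), with a fixed-name variant that carries the same identifiers as tags. The sketch below illustrates the difference between the two schemes in plain Python. The helper names (legacy_name, tagged_metric, render_line), the example dag and task ids, and the DogStatsD-style output line are assumptions made for illustration only; they are not Airflow's API and not a statement of what Airflow emits on the wire.

```python
from typing import Dict, Tuple


def legacy_name(base: str, *parts: str) -> str:
    """Hypothetical helper: embed identifiers in the metric name itself."""
    return ".".join((base, *parts))


def tagged_metric(base: str, **tags: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: keep the name fixed and carry identifiers as tags."""
    return base, dict(tags)


def render_line(name: str, tags: Dict[str, str], value: int = 1) -> str:
    """Render a counter in a DogStatsD-style line (illustrative format only)."""
    tag_str = ",".join(f"{key}:{val}" for key, val in tags.items())
    return f"{name}:{value}|c|#{tag_str}"


# Legacy scheme: one distinct metric name per (dag_id, task_id) pair.
legacy = legacy_name("ti.start", "example_dag", "extract")

# Tagged scheme: a single metric name, with the identifiers as dimensions.
name, tags = tagged_metric("ti.start", dag_id="example_dag", task_id="extract")
line = render_line(name, tags)

print(legacy)
print(line)
```

With the tagged form, a backend that understands tags can aggregate ``ti.start`` across all DAGs or filter by ``dag_id`` without pattern-matching on metric names.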