tirkarthi commented on issue #61212: URL: https://github.com/apache/airflow/issues/61212#issuecomment-3818821057
The sql executed on the db took around 1 minute to complete. When a filter like `state = "deferred"` that returns fewer task instances then it was instant. Another thing I noticed is that start_date is sorted by default and is using filesort but state is using an index as per explain. https://stackoverflow.com/questions/12148943/mysql-performance-slow-using-filesort ``` mysql> explain select id from task_instance order by start_date, id desc limit 1; +----+-------------+---------------+------------+------+---------------+------+---------+------+------+----------+----------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------------+------------+------+---------------+------+---------+------+------+----------+----------------+ | 1 | SIMPLE | task_instance | NULL | ALL | NULL | NULL | NULL | NULL | 29 | 100.00 | Using filesort | +----+-------------+---------------+------------+------+---------------+------+---------+------+------+----------+----------------+ 1 row in set, 1 warning (0.00 sec) mysql> explain select id from task_instance order by state desc limit 1; +----+-------------+---------------+------------+-------+---------------+----------+---------+------+------+----------+----------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------------+------------+-------+---------------+----------+---------+------+------+----------+----------------------------------+ | 1 | SIMPLE | task_instance | NULL | index | NULL | ti_state | 83 | NULL | 1 | 100.00 | Backward index scan; Using index | +----+-------------+---------------+------------+-------+---------------+----------+---------+------+------+----------+----------------------------------+ 1 row in set, 1 warning (0.00 sec) ``` ``` mysql> show create table task_instance \G; *************************** 1. row *************************** Table: task_instance Create Table: CREATE TABLE `task_instance` ( `id` varchar(36) COLLATE utf8mb4_unicode_ci NOT NULL, `task_id` varchar(250) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin NOT NULL, `dag_id` varchar(250) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin NOT NULL, `run_id` varchar(250) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin NOT NULL, `map_index` int NOT NULL DEFAULT '-1', `start_date` timestamp(6) NULL DEFAULT NULL, `end_date` timestamp(6) NULL DEFAULT NULL, `duration` float DEFAULT NULL, `state` varchar(20) COLLATE utf8mb4_unicode_ci DEFAULT NULL, `try_number` int NOT NULL, `max_tries` int NOT NULL DEFAULT '-1', `hostname` varchar(1000) COLLATE utf8mb4_unicode_ci NOT NULL, `unixname` varchar(1000) COLLATE utf8mb4_unicode_ci NOT NULL, `pool` varchar(256) COLLATE utf8mb4_unicode_ci NOT NULL, `pool_slots` int NOT NULL, `queue` varchar(256) COLLATE utf8mb4_unicode_ci NOT NULL, `priority_weight` int NOT NULL, `operator` varchar(1000) COLLATE utf8mb4_unicode_ci DEFAULT NULL, `custom_operator_name` varchar(1000) COLLATE utf8mb4_unicode_ci NOT NULL, `queued_dttm` timestamp(6) NULL DEFAULT NULL, `scheduled_dttm` timestamp(6) NULL DEFAULT NULL, `queued_by_job_id` int DEFAULT NULL, `last_heartbeat_at` timestamp(6) NULL DEFAULT NULL, `pid` int DEFAULT NULL, `executor` varchar(1000) COLLATE utf8mb4_unicode_ci DEFAULT NULL, `executor_config` blob NOT NULL, `updated_at` timestamp(6) NULL DEFAULT NULL, `rendered_map_index` varchar(250) COLLATE utf8mb4_unicode_ci DEFAULT NULL, `context_carrier` json DEFAULT NULL, `span_status` varchar(250) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'not_started', `external_executor_id` varchar(250) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin DEFAULT NULL, `trigger_id` int DEFAULT NULL, `trigger_timeout` timestamp(6) NULL DEFAULT NULL, `next_method` varchar(1000) COLLATE utf8mb4_unicode_ci DEFAULT NULL, `next_kwargs` json DEFAULT NULL, `task_display_name` varchar(2000) COLLATE utf8mb4_unicode_ci DEFAULT NULL, `dag_version_id` char(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL, PRIMARY KEY (`id`), UNIQUE KEY `task_instance_composite_key` (`dag_id`,`task_id`,`run_id`,`map_index`), KEY `task_instance_dag_version_id_fkey` (`dag_version_id`), KEY `ti_state_lkp` (`dag_id`,`task_id`,`run_id`,`state`), KEY `ti_trigger_id` (`trigger_id`), KEY `ti_pool` (`pool`,`state`,`priority_weight`), KEY `ti_dag_run` (`dag_id`,`run_id`), KEY `ti_dag_state` (`dag_id`,`state`), KEY `ti_heartbeat` (`last_heartbeat_at`), KEY `ti_state` (`state`), CONSTRAINT `task_instance_dag_run_fkey` FOREIGN KEY (`dag_id`, `run_id`) REFERENCES `dag_run` (`dag_id`, `run_id`) ON DELETE CASCADE, CONSTRAINT `task_instance_dag_version_id_fkey` FOREIGN KEY (`dag_version_id`) REFERENCES `dag_version` (`id`) ON DELETE RESTRICT, CONSTRAINT `task_instance_trigger_id_fkey` FOREIGN KEY (`trigger_id`) REFERENCES `trigger` (`id`) ON DELETE CASCADE ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci 1 row in set (0.00 sec) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
