[ https://issues.apache.org/jira/browse/AIRFLOW-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733814#comment-16733814 ]
ASF GitHub Bot commented on AIRFLOW-3627: ----------------------------------------- XD-DENG commented on pull request #4433: [AIRFLOW-3627] Improve queries performance in /task_stats in views.py by ~20% URL: https://github.com/apache/incubator-airflow/pull/4433 ### Jira - https://issues.apache.org/jira/browse/AIRFLOW-3627 ### Description `/task_stats` is used quite heavily. Every time the main page is loaded, it will be called. But actually the performance of it can be improved significantly by minor changes. 1. In the sqlalchemy `.filter()` statement, no need to explicitly add `==True`. The value itself is already Boolean. Doing another explicit comparison is adding heavy overhead. This change helps improve query performance significantly 2. We can merge the multiple `.filter()`s (https://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.filter). **After these changes, the query time can be reduced by ~20%** #### Benchmarking ```python from airflow.settings import Session from airflow import models from datetime import datetime def test0(): s.query(DM).filter(DM.is_active == True).filter(DM.dag_id == 'tutorial') def test1(): s.query(DM).filter(DM.is_active).filter(DM.dag_id == 'tutorial') def test2(): s.query(DM).filter(DM.is_active, DM.dag_id == 'tutorial') if __name__ == '__main__': import timeit s = Session() DM = models.DagModel n = 300000 print(s.query(DM).filter(DM.is_paused == True).count() == s.query(DM).filter(DM.is_paused).count()) print(timeit.timeit("test0()", number=n, setup="from __main__ import test0")) print(timeit.timeit("test1()", number=n, setup="from __main__ import test1")) print(timeit.timeit("test2()", number=n, setup="from __main__ import test2")) print("\n --- Run Tests in Reverse Order --- \n") print(timeit.timeit("test2()", number=n, setup="from __main__ import test2")) print(timeit.timeit("test1()", number=n, setup="from __main__ import test1")) print(timeit.timeit("test0()", number=n, setup="from __main__ import test0")) ``` Result: ``` True 94.96372179000173 65.610312083998 65.09665720800695 --- Run Tests in Reverse Order --- 61.59604939300334 65.70635982099338 83.37481481899158 ``` ### Code Quality - [x] Passes `flake8` ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Queries in /task_stats of both /www & /www_rbac can be refined significantly > by minor changes > --------------------------------------------------------------------------------------------- > > Key: AIRFLOW-3627 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3627 > Project: Apache Airflow > Issue Type: Improvement > Components: webserver > Reporter: Xiaodong DENG > Assignee: Xiaodong DENG > Priority: Critical > > /task_stats is used heavily in UI. Every time the main page is loaded, it > will be called. > But actually the performance of it can be improved significantly by minor > changes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)