wolfier opened a new issue, #32996:
URL: https://github.com/apache/airflow/issues/32996
### Apache Airflow version

2.6.3

### What happened

A task instance's [log_url](https://github.com/apache/airflow/blob/2.6.3/airflow/models/taskinstance.py#L726) does not contain the full URL defined in [base_url](https://github.com/apache/airflow/blob/2.6.3/airflow/models/taskinstance.py#L729C9-L729C69): any path configured in `base_url` is missing from the generated link.

### What you think should happen instead

`base_url` may contain a path, and that path should be preserved when building `log_url`. Currently `log_url` is built with [urljoin](https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urljoin). `urljoin` resolves its second argument as a relative reference against the base URL, so the last segment of the base path (everything after the final `/`) is replaced. A `base_url` such as `https://my.astronomer.run/path` therefore loses its `/path` component, and the resulting URL is wrong.

### How to reproduce

This snippet shows how `urljoin` drops the existing path when building the URL:

```
>>> from urllib.parse import urljoin
>>>
>>> urljoin(
...     "https://my.astronomer.run/path",
...     f"log?execution_date=test"
...     f"&task_id=wow"
...     f"&dag_id=super"
...     f"&map_index=-1",
... )
'https://my.astronomer.run/log?execution_date=test&task_id=wow&dag_id=super&map_index=-1'
```

### Operating System

n/a

### Versions of Apache Airflow Providers

_No response_

### Deployment

Astronomer

### Deployment details

_No response_

### Anything else

One way to fix this is to use [urlsplit](https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlsplit) and [urlunsplit](https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlunsplit) so that the existing path is kept:

```
from urllib.parse import urlsplit, urlunsplit

parts = urlsplit("https://my.astronomer.run/paths")
urlunsplit((
    parts.scheme,
    parts.netloc,
    f"{parts.path}/log",
    f"execution_date=test"
    f"&task_id=wow"
    f"&dag_id=super"
    f"&map_index=-1",
    "",
))
```

Here is the fix in action:

```
>>> parts = urlsplit("https://my.astronomer.run/paths")
>>> urlunsplit((
...     parts.scheme,
...     parts.netloc,
...     f"{parts.path}/log",
...     f"execution_date=test"
...     f"&task_id=wow"
...     f"&dag_id=super"
...     f"&map_index=-1",
...     ''))
'https://my.astronomer.run/paths/log?execution_date=test&task_id=wow&dag_id=super&map_index=-1'
>>>
>>> parts = urlsplit("https://my.astronomer.run/paths/test")
>>> urlunsplit((
...     parts.scheme,
...     parts.netloc,
...     f"{parts.path}/log",
...     f"execution_date=test"
...     f"&task_id=wow"
...     f"&dag_id=super"
...     f"&map_index=-1",
...     ''))
'https://my.astronomer.run/paths/test/log?execution_date=test&task_id=wow&dag_id=super&map_index=-1'
```

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
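As a follow-up to the `urlsplit`/`urlunsplit` fix proposed under "Anything else", here is a minimal sketch of a helper that also copes with a `base_url` ending in `/` (where `f"{parts.path}/log"` would produce `//log`) and percent-encodes the query values with `urlencode`. `build_log_url` is a hypothetical name used only for illustration; it is not Airflow's API, and the parameter names simply mirror the reproduction above.

```python
from __future__ import annotations

from collections.abc import Mapping
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_log_url(base_url: str, params: Mapping[str, str]) -> str:
    """Append a `log` segment to base_url's path, keeping any existing path.

    Hypothetical helper for illustration; not Airflow's actual implementation.
    """
    parts = urlsplit(base_url)
    # Strip a trailing slash so "https://host/path/" does not yield "/path//log".
    path = f"{parts.path.rstrip('/')}/log"
    # urlencode percent-escapes values, e.g. ":" and "+" in an ISO timestamp.
    query = urlencode(params)
    # Any query string or fragment already on base_url is intentionally dropped.
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


if __name__ == "__main__":
    params = {
        "execution_date": "test",
        "task_id": "wow",
        "dag_id": "super",
        "map_index": "-1",
    }
    for base in (
        "https://my.astronomer.run",
        "https://my.astronomer.run/path",
        "https://my.astronomer.run/path/",
        "https://my.astronomer.run/paths/test",
    ):
        # Each result keeps the base path and appends "/log?...".
        print(build_log_url(base, params))
```

Encoding the parameters with `urlencode` rather than f-string concatenation is a design choice: a real `execution_date` contains characters such as `:` and `+`, and an unescaped `+` would be decoded as a space on the server side.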
