kobethuwis commented on issue #29393:
URL: https://github.com/apache/airflow/issues/29393#issuecomment-1438402879

   Due to this issue, I tried to understand how Airflow handles (remote) task logging. I am using S3 for remote log storage, but I would like to access live logs for running tasks without having to use a persistent volume.
   
   > In the Airflow UI, remote logs take precedence over local logs when remote 
logging is enabled. If remote logs can not be found or accessed, local logs 
will be displayed. Note that logs are only sent to remote storage once a task 
is complete (including failure). In other words, remote logs for running tasks 
are unavailable (but local logs are available).
   
   However, the error log tells us that log retrieval at the pod level is not possible: `*** Log file does not exist: /opt/airflow/logs/.../attempt=1.log ***`
   
   This is because I am using Celery workers in my deployment; the behaviour is elaborated in the documentation [here](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/logging-tasks.html#serving-logs-from-workers).
   
   > Airflow automatically starts an HTTP server to serve the logs ... The 
server is running on the port specified by worker_log_server_port option in 
[logging] section. By default, it is 8793 ... 
   
   This behaviour is confirmed by the second part of the error, where the webserver tries to fetch the log from the worker's HTTP server: `Fetching from: http://airflow-worker-46838b34-rq7r8:8793/log/.../attempt=1.log`. I can thus reduce my issue to not being able to fetch the logs from that server.
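   
   To make the failure mode concrete, here is a minimal sketch of the same lookup-then-fetch the webserver performs. The hostname and log path are placeholders lifted from the error above (the `...` is truncated in the original error), and the real webserver also attaches a signed Authorization header, so the point here is the name-resolution step:
   
   ```python
   import socket
   import urllib.request
   
   # Placeholder values from the error message above; the "..." in the log
   # path is truncated in the original error and must be filled in.
   worker_host = "airflow-worker-46838b34-rq7r8"
   log_url = f"http://{worker_host}:8793/log/.../attempt=1.log"
   
   try:
       # Step 1: the webserver must resolve the worker's hostname.
       print("resolved to", socket.gethostbyname(worker_host))
   except socket.gaierror as exc:
       # The failure seen here: bare pod names are not in cluster DNS,
       # so the fetch never reaches the worker at all.
       print("DNS resolution failed:", exc)
   else:
       # Step 2: fetch the log from the worker's HTTP log server
       # (an unauthenticated GET may still be rejected; see note above).
       with urllib.request.urlopen(log_url, timeout=5) as resp:
           print(resp.read().decode())
   ```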
   
   Using a shell inside the webserver I tried resolving/pinging `airflow-worker-46838b34-rq7r8`, without result. We're actually looking at a 'simple' DNS resolution error, which led me to this [solution](https://stackoverflow.com/questions/62905221/dns-for-kubernetes-pods-and-airflow-worker-logs).
 
   
   The suggested resolution works! This way I'm able to fall back to live local 
logs served by the worker for running tasks.
   
   Specify the following in your Helm chart values:
   ```yaml
   extraEnv: |
     - name: AIRFLOW__CORE__HOSTNAME_CALLABLE
       # Airflow 2 expects a dotted path; 'airflow.utils.net:get_host_ip_address'
       # (with a colon) is the Airflow 1.x spelling of the same callable.
       value: 'airflow.utils.net.get_host_ip_address'
   ```
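   
   For context on why this works: `hostname_callable` determines the hostname a task records in the metadata database, and the webserver later uses that value to build the log-fetch URL. Roughly (this is a sketch, not the exact Airflow implementation), `airflow.utils.net.get_host_ip_address` does the following, so the webserver ends up with a routable pod IP instead of an unresolvable pod name:
   
   ```python
   import socket
   
   def get_host_ip_address() -> str:
       # Rough sketch of airflow.utils.net.get_host_ip_address: return this
       # host's IP address instead of its hostname, so the webserver builds
       # http://<pod-ip>:8793/... and never has to resolve the pod name.
       return socket.gethostbyname(socket.getfqdn())
   
   print(get_host_ip_address())  # e.g. "10.42.3.17" (a pod IP)
   ```
   
   Pod IPs are routable inside the cluster even when bare pod names are not resolvable via DNS, which is why the fetch then succeeds.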
   
   To be clear, this does not resolve @pdebelak's original issue, but it did enable me to view live local logs for running tasks while still storing the logs of finished tasks remotely.

