potiuk commented on issue #36963:
URL: https://github.com/apache/airflow/issues/36963#issuecomment-1985742361

   To explain how logging works when REMOTE logging uses GCS, because you
might not be aware of it: GCS is an object store, and streaming to it is
impossible, as mentioned. So the way the Airflow "GCS logging handler" works
is:
   
   1) while a task is running, logs are produced locally on the worker where
the task is running
   2) the Airflow UI knows the host name of that worker, and the worker exposes
an API from which the UI can stream the logs directly (while the task is
running, this is what the Airflow UI does - it does not use GCS at all)
   
   3) after the task completes, the complete log is uploaded from the worker to
the GCS bucket (this is the first time there is an interaction with GCS for the 
task). This is (again) because streaming to GCS is not possible, you can only 
upload a complete object to GCS and once you upload it, you can't append to it, 
you can only replace it with a new complete object.
   
   4) then the Airflow UI, knowing that the task is completed, will attempt to
download the log from GCS. Here it does use partial retrieval (to efficiently
read only the parts of the log that it displays). This is possible with object
storage, but it is not live streaming: it is merely retrieving parts of an
object that is already there and whose complete size is known. It is impossible
to read parts of the object until it is fully uploaded to GCS.
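
   The four steps above can be sketched roughly as follows. This is only an
in-memory illustration, not Airflow's actual handler code: `ObjectStore`,
`Worker` and `ui_read_log` are hypothetical names, and the "object store" is a
plain dict that mimics the two properties described above (whole-object
uploads only, range reads only on complete objects).

```python
class ObjectStore:
    """In-memory stand-in for an object store such as GCS (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def upload(self, key, data):
        # Whole-object write: replaces any existing object, there is no append.
        self._objects[key] = bytes(data)

    def exists(self, key):
        return key in self._objects

    def size(self, key):
        return len(self._objects[key])

    def read_range(self, key, start, end):
        # Partial retrieval of an object that is already fully uploaded.
        return self._objects[key][start:end]


class Worker:
    """Hypothetical worker that keeps task logs locally while tasks run."""

    def __init__(self, store):
        self.store = store
        self.local_logs = {}  # task_id -> bytearray with the local log
        self.running = set()

    def start_task(self, task_id):
        self.running.add(task_id)
        self.local_logs[task_id] = bytearray()

    def log(self, task_id, line):
        # Step 1: logs are produced locally on the worker.
        self.local_logs[task_id] += (line + "\n").encode()

    def serve_log(self, task_id, offset=0):
        # Step 2: the worker exposes the local log to the UI.
        return bytes(self.local_logs[task_id][offset:])

    def finish_task(self, task_id):
        # Step 3: the complete log is uploaded once, after the task ends.
        self.running.discard(task_id)
        self.store.upload(task_id, self.local_logs[task_id])


def ui_read_log(worker, store, task_id, offset=0, length=None):
    """Return (source, data) the way the UI would pick its log source."""
    if task_id in worker.running:
        return "worker", worker.serve_log(task_id, offset)
    # Step 4: ranged read from the completed object.
    total = store.size(task_id)
    end = total if length is None else min(offset + length, total)
    return "object_store", store.read_range(task_id, offset, end)
```

   While the task is in `worker.running`, the object store is never touched;
only after `finish_task` does a ranged read against it become possible.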
   
   BTW. This is not jargon; you need to understand how object storage works
and how Airflow works in all those different cases when object storage is used
as the logging backend.
   
   If we want Cloud Run logs to be available "live", the only good way to make
this work "properly" for all the different remote logging configurations is to
stream the Cloud Run logs to the worker and let the worker stream them back to
the UI.
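
   A minimal sketch of that relay idea, under stated assumptions: the
callables `fetch_new_lines` and `is_finished` are hypothetical placeholders
for whatever would actually read the Cloud Run logs and job state, and
`local_log` stands in for the worker's local task log. Nothing here is an
existing Airflow or Google API.

```python
import time


def relay_remote_logs(fetch_new_lines, is_finished, local_log,
                      poll_interval=0.0, max_polls=1000):
    """Copy remote log lines into the worker-local log until the job ends.

    fetch_new_lines: hypothetical callable returning lines not seen before.
    is_finished:     hypothetical callable, True once the remote job is done.
    local_log:       list standing in for the local log the UI streams from.
    Returns True if the job finished within max_polls, False otherwise.
    """
    for _ in range(max_polls):
        local_log.extend(fetch_new_lines())
        if is_finished():
            # Final drain in case lines arrived between the fetch and the check.
            local_log.extend(fetch_new_lines())
            return True
        if poll_interval > 0:
            time.sleep(poll_interval)
    return False
```

   Because the lines end up in the worker-local log, the existing "UI reads
from the worker while the task runs" path would apply regardless of which
remote logging backend is configured.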


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.