potiuk commented on issue #35474: URL: https://github.com/apache/airflow/issues/35474#issuecomment-1801908221
Yep, @Taragolis. That would be my idea. It comes from the assumption that in order to REALLY be able to handle all timeouts you need to do it from a separate process - because, as you rightfully explained, trying to handle things "in-process" is not always applicable. My idea is to add an extra layer of "what to do if the actual task process is not responding" - and I think utilising the parent process (which is already there) to apply such a hard timeout is the simplest option - without modifying states, adding yet another layer of monitoring processes, or overloading the scheduler. I think that, other than the occasional "whole machine stops working" case, this would handle most cases where the task is not timing out but still continues to do stuff because of a badly written low-level C implementation in the library that is used. And the "whole machine hangs" case should be handled at the deployment level anyhow (for example, K8S should kill it; also, in this case we will stop receiving heartbeats and ultimately the Scheduler should handle it, even today).

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
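
To make the "hard timeout applied from the parent process" idea concrete, here is a minimal sketch. It is not Airflow's implementation: `run_with_hard_timeout`, `hard_timeout` and `grace` are hypothetical names chosen for illustration, and a plain Python subprocess stands in for the task process. The only point it shows is that the parent can wait, ask the child to stop, and then force-kill it without relying on the child's cooperation.

```python
import subprocess
import sys


def run_with_hard_timeout(cmd, hard_timeout, grace=1.0):
    """Run cmd as a child process and enforce a hard timeout from the parent.

    Hypothetical helper for illustration only.
    Returns (returncode, timed_out).
    """
    proc = subprocess.Popen(cmd)
    try:
        # Normal path: the child finishes before the hard timeout.
        return proc.wait(timeout=hard_timeout), False
    except subprocess.TimeoutExpired:
        pass

    # The child overran: ask it to stop first (SIGTERM on POSIX).
    proc.terminate()
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        # Still alive after the grace period: force-kill it.
        # On POSIX this is SIGKILL, which the child cannot catch or ignore.
        proc.kill()
        proc.wait()
    return proc.returncode, True


if __name__ == "__main__":
    # A child that sleeps far longer than the timeout stands in for a stuck task.
    stuck = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_hard_timeout(stuck, hard_timeout=2.0))
```

Because the wait and the kill both happen in the parent, this works even when the child is blocked somewhere that never returns control to Python-level timeout handling; it does not help when the whole machine hangs, which is the case left to the deployment and the scheduler's heartbeat checks.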
