lboudard edited a comment on issue #17490:
URL: https://github.com/apache/airflow/issues/17490#issuecomment-898448207


   I agree on this subject: the pod operator is currently missing some very handy features that the [kubernetes job controller](https://kubernetes.io/docs/concepts/workloads/controllers/job/) implements, such as a time to live after success/failure.
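   For reference, the Job controller feature mentioned above is the `ttlSecondsAfterFinished` field. A minimal manifest sketch (the job name and image are illustrative, not from the comment):
   ```yaml
   apiVersion: batch/v1
   kind: Job
   metadata:
     name: parse-batch                # illustrative name
   spec:
     ttlSecondsAfterFinished: 300     # garbage-collect the Job 5 minutes after it finishes
     template:
       spec:
         containers:
         - name: parser
           image: example/parser:latest   # illustrative image
         restartPolicy: Never
   ```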
   I also agree that the distinction between the kubernetes executor and the kubernetes pod operator is not very clear yet.
   In our use case, since we have very different dag types living in the same airflow instance, we use multiple images that are scheduled through pod operators (which we adopted before the kubernetes executor and the taskflow api appeared). For instance, one image parses new batches of data, and another trains models on them in a separate dag.
   That is not ideal, since the workflow dependencies are not properly bound in code but rather to expected data checkpoints. Instead of having
   ```
   read_file | parse | feature_engineering | train_model
   read_file | archive
   ```
   which describes direct data dependencies in code (the airflow taskflow way, or equivalently in spark or apache beam), we rather have
   ```
   schedule_parse_file_and_store(raw_data_batch_location)
   schedule_feature_engineer(raw_data_batch_location)
   schedule_train_model(feature_engineered_batch_location)
   ```
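
   A sketch of what the first, taskflow-style pipeline above could look like, assuming Airflow 2's taskflow api; the task bodies and return values are placeholders, not the actual workloads:
   ```python
   from datetime import datetime
   from airflow.decorators import dag, task

   @dag(schedule_interval=None, start_date=datetime(2021, 8, 1), catchup=False)
   def batch_pipeline():  # illustrative dag id
       @task
       def read_file():
           return "raw_data_batch_location"  # placeholder path

       @task
       def parse(path):
           return f"parsed:{path}"

       @task
       def feature_engineering(parsed):
           return f"features:{parsed}"

       @task
       def train_model(features):
           print(f"training on {features}")

       @task
       def archive(path):
           print(f"archiving {path}")

       raw = read_file()
       # read_file | parse | feature_engineering | train_model
       train_model(feature_engineering(parse(raw)))
       # read_file | archive
       archive(raw)

   batch_pipeline()
   ```
   Here the dependencies are expressed by passing return values between tasks, so the dag structure mirrors the data flow directly instead of relying on checkpoint locations.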

