lboudard commented on issue #17490: URL: https://github.com/apache/airflow/issues/17490#issuecomment-898448207
I agree on this subject. The pod operator is currently missing some very handy features that the [kubernetes job controller](https://kubernetes.io/docs/concepts/workloads/controllers/job/) implements, such as a time to live after success/failure. I also agree that the distinction between the Kubernetes executor and the KubernetesPodOperator is not very clear yet.

In our use case, since we have very different DAG types living in the same Airflow instance, we run multiple images through pod operators (which we used before the Kubernetes executor and the TaskFlow API): for instance, one image to parse new batches of data and another to train models on them in a separate DAG. But that is not ideal, since the workflow dependencies are not properly bound in code. Instead of pipelines like

```
read_file | parse | feature_engineering | train_model
read_file | archive
```

which describe direct data dependencies in code (the Airflow TaskFlow way, or equivalently in Spark or Apache Beam), we rather have

```
schedule_parse_file_and_store(raw_data_batch_location)
schedule_feature_engineer(raw_data_batch_location)
schedule_train_model(feature_engineered_batch_location)
```
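For reference, the Job controller feature mentioned at the top (time to live after success/failure) is the `ttlSecondsAfterFinished` field on the Job spec, which the pod operator has no equivalent for. A minimal Job manifest sketch (the name and image are illustrative, not from this project):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: parse-batch                  # illustrative name
spec:
  ttlSecondsAfterFinished: 300       # garbage-collect the Job 5 minutes after it finishes
  template:
    spec:
      containers:
        - name: parser
          image: example.com/parser:latest   # illustrative image
      restartPolicy: Never
```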
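To make the contrast concrete, here is a minimal sketch of the data-dependency style in plain Python (no Airflow dependency; all function names and the toy data are illustrative, mirroring the pipeline names above). Each step's output is explicitly the next step's input, so the dependency graph is visible in the code itself rather than implied by scheduling:

```python
def read_file(path):
    # Stand-in for loading a raw batch from storage.
    return {"path": path, "rows": ["a,1", "b,2"]}

def parse(raw):
    # Split each CSV-ish row into fields.
    return [row.split(",") for row in raw["rows"]]

def feature_engineering(parsed):
    # Derive a trivial numeric feature per record.
    return [int(fields[1]) * 2 for fields in parsed]

def train_model(features):
    # Stand-in for training: summarize the features.
    return {"n_samples": len(features), "mean": sum(features) / len(features)}

def archive(raw):
    # Stand-in for moving the raw batch to cold storage.
    return raw["path"] + ".archived"

# Dependencies are bound in code:
#   read_file | parse | feature_engineering | train_model
#   read_file | archive
raw = read_file("batch_001.csv")
model = train_model(feature_engineering(parse(raw)))
archived = archive(raw)
```

In the scheduling style, by contrast, each `schedule_*` call only shares a storage location with the others, so the dependency between steps lives outside the code.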