[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor URL: https://github.com/apache/airflow/pull/7163#issuecomment-595422689 > CI fails with the following: > > ``` > /opt/airflow/docs/_api/airflow/providers/cncf/kubernetes/hooks/kubernetes/index.rst:38: WARNING: Field list ends without a blank line; unexpected unindent. > ``` fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor URL: https://github.com/apache/airflow/pull/7163#issuecomment-595347845 > Good job! > I am working with Spark on Kubernetes as well, this will allow us to adopt Airflow for scheduling our Spark apps, because the current way is not so great. > Also, the idea of generalizing this to any CRD is indeed the next step and will be an amazing plus to embrace Airflow as scheduler for all Kubernetes workloads. > Hope to see this PR merged soon! > Simon thank you Simon stay tuned :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor URL: https://github.com/apache/airflow/pull/7163#issuecomment-595244604 thank you @kaxil I fixed all the stuff according to your last review :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor URL: https://github.com/apache/airflow/pull/7163#issuecomment-595108351 > Hello, > > your pull request seems very interesting. Currently, I am using Airflow and Spark Operator on Kubernetes separately and this PR seems a very good solution to use them together. > > Dealing with the code I would like to understand some points: > > 1. Why are two different tasks used for creating and checking the status of Spark Application (**SparkKubernetesOperator** and **SparkKubernetesSensor**) ? >Is it possible to merge them in a single Airflow operator ? In this way you can use only a single task in the DAG to manage a Spark Application. > 2. when a Custom resource is created/watched, I noticed the following part: > > ``` > group="sparkoperator.k8s.io", > version="v1beta2", > plural="sparkapplications", > ``` > > Why are these variables hard-coded here instead of retrieving them from the spark application definition (yaml) ? > > Having a generic airflow operator that is able to manage all kinds of custom resources and not only the spark applications would be easier to re-use. > > Thanks, > Alessio Thanks for your interest in my PR!!! 1. I've seen the division to separate Operator and sensor also in the AWS EMR operator, and this is the way that we are currently working in my company, also the sensor has the built-in poke methods that check the condition and sleep. I know there is a way to call the sensor from the operator directly so let's say given a flag in the operator include_sensor=True after the creation done i'll call the sensor class. @mik-laj what is your opinion on this? 2. I want to make this operator/sensor specific to sparkOperator so that's why I used hardcoded values so if you feeding the operator with the wrong YAML it won't pass. as for general CRD operator: I made a proposal on the mailing list: https://lists.apache.org/thread.html/r36fddadf0791718716829875e9a60674ef61a0ca598ed4e952d115fc%40%3Cdev.airflow.apache.org%3E and it seems that the right way is to make a more specific operator. but after this PR will pass I'll make another one with general CRD operator/sensor(the Kubernetes hook thanks to @mik-laj already has general CRD methods) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor URL: https://github.com/apache/airflow/pull/7163#issuecomment-592021693 @kaxil and @ashb I added test and fixed the things from your last review and you check again? Thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor URL: https://github.com/apache/airflow/pull/7163#issuecomment-592021378 @kaxil and @ashb I added test and fixed the things from your last review and you check again? Thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor URL: https://github.com/apache/airflow/pull/7163#issuecomment-579006913 @mik-laj can you make another review? :) I fixed the connection according to your recommendations This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor URL: https://github.com/apache/airflow/pull/7163#issuecomment-578556340 @mik-laj can you review again, please? I added now new Kubernetes connection. and thank you for your review and comments I learn a lot from them :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor URL: https://github.com/apache/airflow/pull/7163#issuecomment-578542183 > > the spark-on-k8s-operator is being developed by GCP but it's not GCP service(can run on all Kubernetes flavors, not just GKE) so I don't know if it should follow GCP guidelines, or if it should and can be included in GCP system tests > > The requirements for GCP are very restrictive and these are the best we have. Other providers and developers can also learn from them and after changes, introduce to their operators. We also want to have tests for other services, but there were no people willing to develop system tests. Here, I think it is worth developing them, but this is not necessary if you do not have time. However, we currently have a cluster in the CI environment, so writing tests should not be problematic. https://github.com/apache/airflow/blob/master/TESTING.rst#running-tests-with-kubernetes in order to do it, I need the k8s cluster to have [https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/](url) installed on the cluster I don't know if it's possible I will take a look and if it's possible I'll definitely do it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor URL: https://github.com/apache/airflow/pull/7163#issuecomment-577146596 ping @mik-laj This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor URL: https://github.com/apache/airflow/pull/7163#issuecomment-574622298 > @roitvt I am busy on the client's project, but I will try to come back this week. great thanks!! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor URL: https://github.com/apache/airflow/pull/7163#issuecomment-574569602 @mik-laj can you review again? and share your thoughts This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor URL: https://github.com/apache/airflow/pull/7163#issuecomment-574569284 > Have you tested it with GKE? Should we ensure integration with IAM? KubernetesPodOperator have support for IAM via GKEStartPodOperator > https://github.com/apache/airflow/blob/6b1986ec58f10420dd9e397f91a388767a89b325/airflow/gcp/operators/kubernetes_engine.py#L185 i tested with amazon EKS and with local kubernetes cluster(docker for mac) but you can specify kube_config so it should work with all Kubernetes flavors. if the airflow is running inside a kubernetes cluster so in_cluster: True flag will take the kube_config from within the cluster. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor URL: https://github.com/apache/airflow/pull/7163#issuecomment-574327663 > The `spark-on-k8s-operator` project is maintained by the Google Cloud Platform and should therefore comply with the GCP guidelines. I am not sure if it complies with these recommendations. Can you check it? https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit > > In particular, system tests and guides are very much awaited. We have system tests for all operators for GCP. There are several services missing on the list of guides, but we are constantly developing. the spark-on-k8s-operator is being developed by GCP but it's not GCP service(can run on all Kubernetes flavors, not just GKE) so I don't know if it should follow GCP guidelines, or if it should and can be included in GCP system tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services