[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor

2020-03-05 Thread GitBox
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s 
operator/hook/sensor
URL: https://github.com/apache/airflow/pull/7163#issuecomment-595422689
 
 
   > CI fails with the following:
   > 
   > ```
   > 
/opt/airflow/docs/_api/airflow/providers/cncf/kubernetes/hooks/kubernetes/index.rst:38:
 WARNING: Field list ends without a blank line; unexpected unindent.
   > ```
   
   fixed


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor

2020-03-05 Thread GitBox
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s 
operator/hook/sensor
URL: https://github.com/apache/airflow/pull/7163#issuecomment-595347845
 
 
   > Good job!
   > I am working with Spark on Kubernetes as well, this will allow us to adopt 
Airflow for scheduling our Spark apps, because the current way is not so great.
   > Also, the idea of generalizing this to any CRD is indeed the next step and 
will be an amazing plus to embrace Airflow as scheduler for all Kubernetes 
workloads.
   > Hope to see this PR merged soon!
   > Simon
   
   thank you Simon stay tuned :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor

2020-03-05 Thread GitBox
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s 
operator/hook/sensor
URL: https://github.com/apache/airflow/pull/7163#issuecomment-595244604
 
 
   thank you @kaxil I fixed all the stuff according to your last review :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor

2020-03-05 Thread GitBox
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s 
operator/hook/sensor
URL: https://github.com/apache/airflow/pull/7163#issuecomment-595108351
 
 
   > Hello,
   > 
   > your pull request seems very interesting. Currently, I am using Airflow 
and Spark Operator on Kubernetes separately and this PR seems a very good 
solution to use them together.
   > 
   > Dealing with the code I would like to understand some points:
   > 
   > 1. Why are two different tasks used for creating and checking the status 
of Spark Application (**SparkKubernetesOperator** and 
**SparkKubernetesSensor**) ?
   >Is it possible to merge them in a single Airflow operator ? In this way 
you can use only a single task in the DAG to manage a Spark Application.
   > 2. when a Custom resource is created/watched, I noticed the following part:
   > 
   > ```
   > group="sparkoperator.k8s.io",
   > version="v1beta2",
   > plural="sparkapplications",
   > ```
   > 
   > Why are these variables hard-coded here instead of retrieving them from 
the spark application definition (yaml) ?
   > 
   > Having a generic airflow operator that is able to manage all kinds of 
custom resources and not only the spark applications would be easier to re-use.
   > 
   > Thanks,
   > Alessio
   
   Thanks for your interest in my PR!!!
   1. I've seen the division to separate Operator and sensor also in the AWS 
EMR operator, and this is the way that we are currently working in my company, 
also the sensor has the built-in poke methods that check the condition and 
sleep. I know there is a way to call the sensor from the operator directly so  
let's say given a flag in the operator include_sensor=True after the creation 
done i'll call the sensor class. @mik-laj what is your opinion on this?
   2. I want to make this operator/sensor specific to sparkOperator so that's 
why I used hardcoded values so if you feeding the operator with the wrong YAML 
it won't pass. as for general CRD operator:
   I made a proposal on the mailing list: 
https://lists.apache.org/thread.html/r36fddadf0791718716829875e9a60674ef61a0ca598ed4e952d115fc%40%3Cdev.airflow.apache.org%3E
 and it seems that the right way is to make a more specific operator. but after 
this PR will pass I'll make another one with general CRD operator/sensor(the 
Kubernetes hook thanks to @mik-laj already has general CRD methods) 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor

2020-02-27 Thread GitBox
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s 
operator/hook/sensor
URL: https://github.com/apache/airflow/pull/7163#issuecomment-592021693
 
 
   @kaxil and @ashb I added test and fixed the things from your last review and 
you check again?
   Thanks 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor

2020-02-27 Thread GitBox
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s 
operator/hook/sensor
URL: https://github.com/apache/airflow/pull/7163#issuecomment-592021378
 
 
   @kaxil and @ashb I added test and fixed the things from your last review and 
you check again?
   Thanks 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor

2020-01-27 Thread GitBox
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s 
operator/hook/sensor
URL: https://github.com/apache/airflow/pull/7163#issuecomment-579006913
 
 
   @mik-laj can you make another review? :) I fixed the connection according to 
your recommendations 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor

2020-01-26 Thread GitBox
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s 
operator/hook/sensor
URL: https://github.com/apache/airflow/pull/7163#issuecomment-578556340
 
 
   @mik-laj can you review again, please? I added now new Kubernetes connection.
   and thank you for your review and comments I learn a lot from them :) 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor

2020-01-26 Thread GitBox
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s 
operator/hook/sensor
URL: https://github.com/apache/airflow/pull/7163#issuecomment-578542183
 
 
   > > the spark-on-k8s-operator is being developed by GCP but it's not GCP 
service(can run on all Kubernetes flavors, not just GKE) so I don't know if it 
should follow GCP guidelines, or if it should and can be included in GCP system 
tests
   > 
   > The requirements for GCP are very restrictive and these are the best we 
have. Other providers and developers can also learn from them and after 
changes, introduce to their operators. We also want to have tests for other 
services, but there were no people willing to develop system tests. Here, I 
think it is worth developing them, but this is not necessary if you do not have 
time. However, we currently have a cluster in the CI environment, so writing 
tests should not be problematic. 
https://github.com/apache/airflow/blob/master/TESTING.rst#running-tests-with-kubernetes
   
   in order to do it, I need the k8s cluster to have 
[https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/](url) installed 
on the cluster I don't know if it's possible I will take a look and if it's 
possible I'll definitely do it


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor

2020-01-22 Thread GitBox
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s 
operator/hook/sensor
URL: https://github.com/apache/airflow/pull/7163#issuecomment-577146596
 
 
   ping @mik-laj 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor

2020-01-15 Thread GitBox
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s 
operator/hook/sensor
URL: https://github.com/apache/airflow/pull/7163#issuecomment-574622298
 
 
   > @roitvt I am busy on the client's project, but I will try to come back 
this week.
   
   great thanks!!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor

2020-01-15 Thread GitBox
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s 
operator/hook/sensor
URL: https://github.com/apache/airflow/pull/7163#issuecomment-574569602
 
 
   @mik-laj can you review again? and share your thoughts 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor

2020-01-15 Thread GitBox
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s 
operator/hook/sensor
URL: https://github.com/apache/airflow/pull/7163#issuecomment-574569284
 
 
   > Have you tested it with GKE? Should we ensure integration with IAM? 
KubernetesPodOperator have support for IAM via GKEStartPodOperator
   > 
https://github.com/apache/airflow/blob/6b1986ec58f10420dd9e397f91a388767a89b325/airflow/gcp/operators/kubernetes_engine.py#L185
   
   i tested with amazon EKS and with local kubernetes cluster(docker for mac) 
but you can specify kube_config so it should work with all Kubernetes flavors. 
if the airflow is running inside a kubernetes cluster so in_cluster: True flag 
will take the kube_config from within the cluster.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor

2020-01-14 Thread GitBox
roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s 
operator/hook/sensor
URL: https://github.com/apache/airflow/pull/7163#issuecomment-574327663
 
 
   > The `spark-on-k8s-operator` project is maintained by the Google Cloud 
Platform and should therefore comply with the GCP guidelines. I am not sure if 
it complies with these recommendations. Can you check it? 
https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit
   > 
   > In particular, system tests and guides are very much awaited. We have 
system tests for all operators for GCP. There are several services missing on 
the list of guides, but we are constantly developing.
   
   the spark-on-k8s-operator is being developed by GCP but it's not GCP 
service(can run on all Kubernetes flavors, not just GKE) so I don't know if it 
should follow GCP guidelines, or if it should and can be included in GCP system 
tests


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services