Andrew Chen created AIRFLOW-1028:
------------------------------------

             Summary: Databricks Operator for Airflow
                 Key: AIRFLOW-1028
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1028
             Project: Apache Airflow
          Issue Type: New Feature
            Reporter: Andrew Chen
            Assignee: Andrew Chen


It would be nice to have a Databricks Operator/Hook in Airflow so users of 
Databricks can more easily integrate with Airflow.

The operator would submit a Spark job to our new /jobs/runs/submit endpoint. 
This endpoint is similar to 
https://docs.databricks.com/api/latest/jobs.html#jobscreatejob but does not 
include the email_notifications, max_retries, min_retry_interval_millis, 
retry_on_timeout, schedule, and max_concurrent_runs fields. (The submit docs 
are not out yet because it is still a private endpoint.)

Our proposed design for the operator is to mirror this REST API endpoint. 
Each parameter of the operator is named after one of the fields of the REST 
API request, and the value of each argument matches the type the REST API 
expects for that field. Any extra keys in kwargs that are not meant for the 
BaseOperator will also be merged into the API call, so that the operator stays 
usable when the API gains new fields.
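
As a rough sketch of the argument handling described above (not the final 
implementation): the function name, the BASE_OPERATOR_KEYS set, and the 
specific field names below are assumptions for illustration only, since the 
submit endpoint is not documented yet. Named parameters map one-to-one onto 
request fields, and leftover kwargs that are not BaseOperator arguments are 
merged into the request body.

```python
# Hypothetical subset of BaseOperator keyword arguments; the real list is longer.
BASE_OPERATOR_KEYS = {"task_id", "owner", "retries", "retry_delay", "dag"}


def build_submit_payload(new_cluster=None, existing_cluster_id=None,
                         notebook_task=None, spark_jar_task=None,
                         timeout_seconds=None, **kwargs):
    """Split arguments into (api_payload, base_operator_kwargs).

    The named parameters are assumed field names of the submit request;
    each value is passed through unchanged, in the shape the API expects.
    """
    named = {
        "new_cluster": new_cluster,
        "existing_cluster_id": existing_cluster_id,
        "notebook_task": notebook_task,
        "spark_jar_task": spark_jar_task,
        "timeout_seconds": timeout_seconds,
    }
    # Only send fields the caller actually set.
    payload = {key: value for key, value in named.items() if value is not None}

    base_kwargs = {}
    for key, value in kwargs.items():
        if key in BASE_OPERATOR_KEYS:
            # Reserved for the BaseOperator constructor.
            base_kwargs[key] = value
        else:
            # Unknown key: forward it to the API so new fields work unchanged.
            payload[key] = value
    return payload, base_kwargs


payload, base_kwargs = build_submit_payload(
    existing_cluster_id="1234",
    notebook_task={"notebook_path": "/Users/example/notebook"},
    task_id="submit_run",      # goes to BaseOperator
    run_name="nightly",        # not a named parameter: merged into the payload
)
```

With this split, a field added to the API later can be passed as a plain 
keyword argument without changing the operator's signature.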

If this interface turns out not to be very user friendly, we can later add 
more operators that extend this one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
