[ https://issues.apache.org/jira/browse/AIRFLOW-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15998681#comment-15998681 ]

Al Johri edited comment on AIRFLOW-247 at 5/7/17 11:08 PM:
-----------------------------------------------------------

I'm searching for documentation related to how Airflow works with EMR. I'm 
struggling to find anything here: 
https://airflow.incubator.apache.org/integration.html#aws

My main question is, can Airflow create an EMR cluster and bring it back down 
like AWS Data Pipeline?

Thanks!

EDIT: Found some information here: 

Spark, EMR:
- (uses EMR hooks and operators) 
https://docs.google.com/presentation/d/1NG1P86HRlX43qTVucCTOsFqIbCvYdOhq_np90VlbVRc/edit#slide=id.gd40eeee67_1_0
- (uses shell scripts to launch and terminate EMR clusters) 
https://www.agari.com/automated-model-building-emr-spark-airflow/
- (uses a shell script to run spark-submit on a local Spark installation) 
https://blog.insightdatascience.com/scheduling-spark-jobs-with-airflow-4c66f3144660

EMR: 
- https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/emr_hook.py
- https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_create_job_flow_operator.py
- https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_add_steps_operator.py
- https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_terminate_job_flow_operator.py
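
As a rough illustration of how the create / add-steps / terminate operators fit together, here is a minimal sketch of the two payloads they hand to EMR. It assumes the linked operators accept a RunJobFlow-style dict as `job_flow_overrides` (EmrCreateJobFlowOperator) and a list of AddJobFlowSteps entries as `steps` (EmrAddStepsOperator). The cluster name, release label, instance sizes, S3 path and the `spark_step` helper are all hypothetical, and the IAM roles (JobFlowRole/ServiceRole) are assumed to be supplied elsewhere, e.g. by the EMR connection's extra config.

```python
"""Sketch of EMR request payloads; every concrete value here is hypothetical."""
import json

# Hypothetical S3 location of the Spark application.
SCRIPT_URI = "s3://my-bucket/jobs/wordcount.py"

# Overrides for the RunJobFlow request that creates the cluster.
# Release label, instance types and counts are illustrative assumptions.
JOB_FLOW_OVERRIDES = {
    "Name": "airflow-spark-cluster",
    "ReleaseLabel": "emr-5.5.0",
    "Applications": [{"Name": "Spark"}],
    "Instances": {
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER",
             "InstanceType": "m4.large", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m4.large", "InstanceCount": 2},
        ],
        # Keep the cluster alive between steps; a separate terminate task
        # is then responsible for shutting it down.
        "KeepJobFlowAliveWhenNoSteps": True,
        "TerminationProtected": False,
    },
}


def spark_step(name, script_uri, args=(), action_on_failure="CONTINUE"):
    """Build one AddJobFlowSteps entry that runs spark-submit via command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri, *args],
        },
    }


# Steps to submit once the cluster is up ("--date" is a made-up app argument).
SPARK_STEPS = [spark_step("wordcount", SCRIPT_URI, ["--date", "2017-05-07"])]

if __name__ == "__main__":
    print(json.dumps(SPARK_STEPS, indent=2))
```

Keeping `KeepJobFlowAliveWhenNoSteps` true and leaving shutdown to an EmrTerminateJobFlowOperator task gives the Data Pipeline-style create, run, tear-down cycle asked about above; setting it to false instead lets EMR shut the cluster down by itself once no steps remain.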

Spark:
- https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/spark_submit_hook.py
- https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/spark_submit_operator.py
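
The SparkSubmitHook/SparkSubmitOperator route instead shells out to `spark-submit` on a host with Spark installed. To show roughly what such a task runs, here is a hypothetical helper that assembles a `spark-submit` argument list from the standard `--master`, `--deploy-mode`, `--name` and `--conf` flags. It is not the hook's actual implementation, and the application path and settings are made up.

```python
import shlex


def build_spark_submit_cmd(application, master="yarn", deploy_mode="client",
                           name=None, conf=None, app_args=()):
    """Assemble a spark-submit argv list (hypothetical helper, not Airflow's code)."""
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if name:
        cmd += ["--name", name]
    # Each Spark property becomes its own "--conf key=value" pair; sorted for stable output.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", "{}={}".format(key, value)]
    # The application comes last, followed by its own arguments.
    cmd.append(application)
    cmd.extend(app_args)
    return cmd


if __name__ == "__main__":
    cmd = build_spark_submit_cmd(
        "/opt/jobs/wordcount.py",
        name="wordcount",
        conf={"spark.executor.memory": "2g"},
        app_args=["--date", "2017-05-07"],
    )
    # Quote each part so the printed line is safe to paste into a shell.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Returning a list rather than a single string lets the caller pass it straight to `subprocess` without shell quoting issues.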



> EMR Hook, Operators, Sensor
> ---------------------------
>
>                 Key: AIRFLOW-247
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-247
>             Project: Apache Airflow
>          Issue Type: New Feature
>            Reporter: Rob Froetscher
>            Assignee: Rob Froetscher
>            Priority: Minor
>
> Substory of https://issues.apache.org/jira/browse/AIRFLOW-115. It would be 
> nice to have an EMR hook and operators.
> A hook to interact with EMR in general.
> Operators to:
> * set up and start a job flow
> * add steps to an existing job flow
> A sensor to:
> * monitor the completion and status of EMR jobs
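
The sensor in the last bullet boils down to polling a step's state until it reaches a terminal value. Here is a minimal, AWS-free sketch of that loop. `get_state` is a hypothetical callable standing in for a DescribeStep call, and the state names are the ones EMR reports for steps (PENDING, CANCEL_PENDING, RUNNING, COMPLETED, CANCELLED, FAILED, INTERRUPTED); the sensor that actually landed in contrib may classify these states differently.

```python
import time

# Step states as reported by EMR's DescribeStep API.
NON_TERMINAL_STATES = {"PENDING", "CANCEL_PENDING", "RUNNING"}
FAILED_STATES = {"CANCELLED", "FAILED", "INTERRUPTED"}


class StepFailed(Exception):
    """Raised when the watched step ends in a failure state."""


def wait_for_step(get_state, poke_interval=30.0, timeout=3600.0, sleep=time.sleep):
    """Poll get_state() until the step completes; raise if it fails or times out.

    get_state is a hypothetical zero-argument callable returning the current
    step state string (e.g. a wrapper around a DescribeStep request).
    """
    waited = 0.0
    while True:
        state = get_state()
        if state == "COMPLETED":
            return state
        if state in FAILED_STATES:
            raise StepFailed("EMR step ended in state " + state)
        if state not in NON_TERMINAL_STATES:
            raise ValueError("unexpected EMR step state: " + state)
        if waited >= timeout:
            raise TimeoutError("EMR step still {} after {}s".format(state, waited))
        # Still running: wait one poke interval and check again.
        sleep(poke_interval)
        waited += poke_interval


if __name__ == "__main__":
    # Simulated state sequence instead of a real cluster.
    states = iter(["PENDING", "RUNNING", "RUNNING", "COMPLETED"])
    print(wait_for_step(lambda: next(states), poke_interval=0.01))
```

Injecting `sleep` keeps the loop testable without real delays; in Airflow the scheduler's sensor machinery would normally own the waiting.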



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)