Sure.

While most EMR clusters are ephemeral, some of our use cases require
persistent EMR clusters: the apps they run are short and run at short
intervals, so the overhead of creating a new EMR cluster each time is too
high.

In these cases I want to make sure that if the cluster dies and is replaced
by another one, nothing needs to change in the DAG.

So if I search by cluster name (in our use case only one cluster is alive
under any given name), I can always find the correct cluster ID.

Perhaps, instead of a whole operator, the lookup can be added to EmrHook as
you suggested, together with an option to pass either a cluster name or a
cluster id to EmrAddStepsOperator (which today only accepts a cluster id,
via the job_flow_id param).
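
As a rough sketch of the lookup (not the actual operator code), something
like the following could work. It assumes emr_client is a boto3 EMR client,
e.g. boto3.client("emr"); the names find_live_cluster_id, resolve_job_flow_id
and LIVE_STATES are hypothetical and are not part of EmrHook today.

```python
# Cluster states treated as "live" (an assumption for this sketch).
LIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]


def find_live_cluster_id(emr_client, cluster_name):
    """Return the id of the first live cluster with this name, or None."""
    # list_clusters is paginated, so walk every page of live clusters.
    paginator = emr_client.get_paginator("list_clusters")
    for page in paginator.paginate(ClusterStates=LIVE_STATES):
        for cluster in page.get("Clusters", []):
            if cluster["Name"] == cluster_name:
                return cluster["Id"]
    return None


def resolve_job_flow_id(emr_client, job_flow_id=None, cluster_name=None):
    """Hypothetical helper: accept either an explicit id or a cluster name."""
    if job_flow_id:
        # An explicit id wins, which keeps today's behaviour unchanged.
        return job_flow_id
    if not cluster_name:
        raise ValueError("Pass either job_flow_id or cluster_name")
    cluster_id = find_live_cluster_id(emr_client, cluster_name)
    if cluster_id is None:
        raise ValueError("No live EMR cluster named %r" % cluster_name)
    return cluster_id
```

Resolving the name when the task runs, rather than when the DAG is defined,
is what would let a replacement cluster be picked up without touching the DAG.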

On Wed, Nov 13, 2019 at 5:59 PM Ash Berlin-Taylor <a...@apache.org> wrote:

> My initial thought is that doesn't quite sound like a whole operator, but
> a useful function to add to the EmrHook.
>
> Could you describe in a little bit more detail how you use it?
>
> -a
>
> > On 13 Nov 2019, at 15:40, Aviem Zur <aviem...@gmail.com> wrote:
> >
> > Hi,
> >
> > I've created a new operator and want to check viability to contribute it
> > to airflow/contrib
> >
> > The operator is called: emr_cluster_name_to_id
> >
> > Given an EMR cluster name will return id of the first live cluster found
> > with a matching name.
> > This is useful for users with persistent EMR clusters they wish to add
> > steps to via airflow.
> > If the cluster dies and is replaced by a new cluster with the same name
> > no code or configuration needs to be changed since the operator will
> > pick up the correct id when the DAG is run.
> >
> > Is this a viable operator for airflow/contrib?
> > If so I'll create a JIRA task and a PR on GitHub.
> >
> > Thanks,
> > Aviem
>
>
