Sure. Most EMR clusters are ephemeral, but some of our use cases require persistent EMR clusters: the apps they run are short and run at short intervals, so the overhead of creating a new EMR cluster for each run is too high.
In these cases I want to make sure that if the cluster dies and is replaced by another one, nothing needs to change in the DAG. If I search by cluster name (in our use case we only have one cluster alive for any given name), I can always find the correct cluster ID.

Perhaps instead of a whole operator it can be added to EmrHook as you suggested, with an option to pass either a cluster name or a cluster ID to EmrAddStepsOperator (which today only accepts a cluster ID, via the job_flow_id param).

On Wed, Nov 13, 2019 at 5:59 PM Ash Berlin-Taylor <a...@apache.org> wrote:

> My initial thought is that doesn't quite sound like a whole operator, but
> a useful function to add to the EmrHook.
>
> Could you describe in a little bit more detail how you use it?
>
> -a
>
> > On 13 Nov 2019, at 15:40, Aviem Zur <aviem...@gmail.com> wrote:
> >
> > Hi,
> >
> > I've created a new operator and want to check viability to contribute it to
> > airflow/contrib
> >
> > The operator is called: emr_cluster_name_to_id
> >
> > Given an EMR cluster name will return id of the first live cluster found
> > with a matching name.
> > This is useful for users with persistent EMR clusters they wish to add
> > steps to via airflow.
> > If the cluster dies and is replaced by a new cluster with the same name no
> > code or configuration needs to be changed since the operator will pick up
> > the correct id when the DAG is run.
> >
> > Is this a viable operator for airflow/contrib?
> > If so I'll create a JIRA task and a PR on GitHub.
> >
> > Thanks,
> > Aviem
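
A minimal sketch of the name-to-ID lookup discussed in this thread, not the actual operator or any existing EmrHook method. It assumes a boto3 EMR client (or anything exposing the same `list_clusters` paginator) is passed in; the function name `find_cluster_id_by_name` and the choice of which states count as "live" are hypothetical.

```python
# Assumption: a "live" cluster is one in any of these non-terminated states.
LIVE_STATES = ("STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING")


def find_cluster_id_by_name(emr_client, cluster_name, states=LIVE_STATES):
    """Return the ID of the first live cluster named `cluster_name`, or None.

    `emr_client` is expected to behave like boto3.client("emr").
    """
    paginator = emr_client.get_paginator("list_clusters")
    # Let EMR filter by state, then match the name on the client side.
    for page in paginator.paginate(ClusterStates=list(states)):
        for cluster in page.get("Clusters", []):
            if cluster["Name"] == cluster_name:
                return cluster["Id"]
    return None


# Possible usage (requires AWS credentials; "my-persistent-cluster" is a placeholder):
#   import boto3
#   cluster_id = find_cluster_id_by_name(boto3.client("emr"), "my-persistent-cluster")
```

The returned ID could then be passed as job_flow_id to EmrAddStepsOperator. Returning None when nothing matches is one possible choice; raising an error may suit a DAG better.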