Geoff-  Thanks.  Comments addressed in #1361 along with a major addition to
support variables -- inputs/outputs/etc.

All-  One of the points Geoff makes concerns how steps are defined.  I
think that, along with other comments, tips the balance in favour of
revisiting how steps are defined.

I propose we switch from the OLD proposed approach -- the map of ordered
IDs -- to a NEW LIST-BASED approach.  There's a lot of detail below, but
in short it's a shift from:

steps:
  1-say-hi:  log hi
  2-step-two:  log step 2

To:

steps:
  - log hi
  - log step 2


Specifically, based on feedback and more hands-on experience, I propose:

* steps now be supplied as a list (not a map)
* users are no longer required to supply an ID for each step (in the old
approach, the ID was required as the key for every step)
* users can, if they wish, supply an ID for any step (now as an explicit
`id: <ID>` key)
* the default order, if no `next: <ID>` instruction is supplied, is the
order of the list (in the old approach the order was based on the ID) --
see the sketch below this list
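
For example, a sketch combining these points (the `log` steps reuse the
example above; the IDs and the jump are hypothetical):

steps:
  - log hi                # no ID needed; runs first, in list order
  - s: log step 2
    id: step-two          # optional explicit ID
    next: wrap-up         # explicit jump overrides the list order
  - log never reached     # skipped because of the jump above
  - s: log done
    id: wrap-up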

Also, the shorthand idea has evolved a little bit; instead of a "<type>:
<type-specific-shorthand-template>" single-key map, we've suggested:

* it be a string "<type> <type-specific-shorthand-template>"
* shorthand can also be supplied in a map using the key "s" or the key
"shorthand" (to allow shorthand along with other step key values -- see
the sketch after this list)
* custom steps can define custom shorthand templates (e.g. ${key} "=" ${value})
* (there is also some evolution in how custom steps are defined)
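
Putting the two shorthand forms side by side (an illustrative sketch; the
`timeout` key is one of the flow-logic keys mentioned further down this
thread):

    # plain string: first word is the type, remainder per its template
    - set-sensor spark-output = ${1.stdout}

    # "s" (or "shorthand") key, mixed with other step keys
    - s: set-sensor spark-output = ${1.stdout}
      timeout: 5m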


To illustrate:

The OLD EXAMPLE:

steps:
    1:
      type: container
      image: my/google-cloud
      command: gcloud dataproc jobs submit spark --BUCKET=gs://${BUCKET}
      env:
        BUCKET: $brooklyn:config("bucket")
      on-error: retry
    2:
      set-sensor: spark-output=${1.stdout}

Would become in the NEW proposal:

steps:
    - type: container
      image: my/google-cloud
      command: gcloud dataproc jobs submit spark --BUCKET=gs://${BUCKET}
      env:
        BUCKET: $brooklyn:config("bucket")
      on-error: retry
    - set-sensor spark-output = ${1.stdout}

If we wanted to attach an `id` to the second step (e.g. for use with
"next") we could write it either as:

    # full long-hand map
    - type: set-sensor
      input:
        sensor: spark-output
        value: ${1.stdout}
      id: set-spark-output

    # mixed "s" shorthand key and other fields
    - s: set-sensor spark-output = ${1.stdout}
      id: set-spark-output

To explain the reasoning:

The advantages of the NEW list-based approach:

* Slightly less verbose when no ID is needed on a step
* Easier to read and understand the flow
* Avoids the hassle of renumbering when introducing a step
* Avoids the risk of error where the same key is defined multiple times

The advantages of the OLD map-based scheme (implied disadvantages of the
new list-based approach):

* Easier user-facing correlation of steps (e.g. in the UI), since every
step always has an explicit ID
* Easier to extend a workflow by inserting or overriding explicit steps

After some initial usage of the workflow, it seems these advantages of the
old approach are outweighed by the advantages of the list approach.  In
particular, the "correlation" can be done in other ways, and extending a
workflow is probably not so useful, whereas supplying and maintaining an ID
is a hassle, error-prone, and makes the workflow harder to understand.

Finally, to explain the custom steps idea: it works out nicely, in the
code and we think for users too, to add a "compound-step" to the catalog,
e.g. as follows for the workflow shown above:

  id: retryable-gcloud-dataproc-with-bucket-and-sensor
  item:
    type: custom-workflow-step
    parameters:
      bucket:
        type: string
      sensor_name:
        type: string
        default: spark-output
    shorthand_definition: [ " bucket " ${bucket} ] [ " sensor " ${sensor_name} ]
    steps:
    - type: container
      image: my/google-cloud
      command: gcloud dataproc jobs submit spark --BUCKET=gs://${BUCKET}
      env:
        BUCKET: ${bucket}
      on-error: retry
    - set-sensor ${sensor_name} = ${1.stdout}

A user could then write a step:

- retryable-gcloud-dataproc-with-bucket-and-sensor

And optionally use the shorthand per the shorthand_definition, matching the
quoted string literals and inferring the indicated parameters, e.g.:

- retryable-gcloud-dataproc-with-bucket-and-sensor bucket my-bucket sensor my-spark-output

They could of course also use the longhand:

- type: retryable-gcloud-dataproc-with-bucket-and-sensor
  input:
    bucket: my-bucket
    sensor_name: my-spark-output


Best
Alex



On Sat, 17 Sept 2022 at 21:13, Geoff Macartney <geom...@apache.org> wrote:

> Hi Alex,
>
> Belatedly reviewed the PR. It's looking good! And surprisingly simple
> in the end. Made a couple of minor comments on it.
>
> Cheers
> Geoff
>
> On Thu, 8 Sept 2022 at 09:35, Alex Heneveld <a...@cloudsoft.io> wrote:
> >
> > Hi team,
> >
> > An initial PR with a few types and the ability to define an effector is
> > available [1].
> >
> > This is enough for the next steps to be parallelized, e.g. new steps
> > added.  The proposal has been updated with a work plan / list of tasks
> > [2].  Any volunteers to help with some of the upcoming tasks let me know.
> >
> > Finally I've been thinking about the "shorthand syntax" and how to bring
> us
> > closer to Peter's proposal of a DSL.  The original proposal allowed
> instead
> > of a map e.g.
> >
> > step_sleep:
> >   type: sleep
> >   duration: 5s
> >
> > or
> >
> > step_update_service_up:
> >   type: set-sensor
> >   sensor:
> >     name: service.isUp
> >     type: boolean
> >   value: true
> >
> > being able to use a shorthand _map_ with a single key being the type, and
> > value interpreted by that type, so in the OLD SHORTHAND PROPOSAL the
> above
> > could be written:
> >
> > step_sleep:
> >   sleep: 5s
> >
> > step_update_service_up:
> >   set-sensor: service.isUp = true
> >
> > Having played with syntaxes a bit I wonder if we should instead say the
> > shorthand DSL kicks in when the step _body_ is a string (instead of a
> > single-key map), and the first word of the string being the type, and the
> > remainder interpreted by the type, and we allow it to be a bit more
> > ambitious.
> >
> > Concretely this NEW SHORTHAND PROPOSAL would look something like:
> >
> > step_sleep: sleep 5s
> > step_update_service_up: set-sensor service.isUp = true
> > # also supporting a type, ie `set-sensor [TYPE] NAME = VALUE`, eg
> > step_update_service_up: set-sensor boolean service.isUp = true
> >
> > You would still need the full map syntax whenever defining flow logic --
> eg
> > condition, next, retry, or timeout -- or any property not supported by
> the
> > shorthand syntax.  But for the (majority?) simple cases the expression
> > would be very concise.  In most cases I think it would feel like a DSL
> but
> > has the virtue of a very clear translation to the actual workflow model
> and
> > the underlying (YAML) model needed for resumption and UI.
> >
> > As a final example, the example used at the start of the proposal
> > (simplified a little -- removing on-error retry and env map as those
> > wouldn't be supported by shorthand):
> >
> > brooklyn.initializers:
> > - type: workflow-effector
> >  name: run-spark-on-gcp
> >  steps:
> >    1:
> >       type: container
> >       image: my/google-cloud
> >       command: gcloud dataproc jobs submit spark
> > --BUCKET=gs://$brooklyn:config("bucket")
> >     2:
> >       type: set-sensor
> >       sensor: spark-output
> >       value: ${1.stdout}
> >
> > Could be written in this shorthand as follows:
> >
> >  steps:
> >    1: container my/google-cloud command "gcloud dataproc jobs submit
> spark
> > --BUCKET=gs://${entity.config.bucket}"
> >    2: set-sensor spark-output ${1.stdout}
> >
> > Thoughts?
> >
> > Best
> > Alex
> >
> >
> > [1] https://github.com/apache/brooklyn-server/pull/1358
> > [2]
> >
> https://docs.google.com/document/d/1u02Bi6sS8Fkf1s7UzRRMnvLhA477bqcyxGa0nJesqkI/edit#heading=h.gbadaqa2yql6
> >
> >
> > On Wed, 7 Sept 2022 at 09:58, Alex Heneveld <a...@cloudsoft.io> wrote:
> >
> > > Hi Peter,
> > >
> > > Yes - thanks for the extra details.  I did take your suggestion to be a
> > > procedural DSL not YAML, per the illustration at [1] (second code
> block).
> > > Probably where I was confusing was in saying that unlike DSLs which
> just
> > > run (and where the execution can be delegated to eg java/groovy/ruby),
> here
> > > we need to understand and display, store and resume the workflow
> progress.
> > > So I think it needs to be compiled to some representation that is well
> > > described and that new Apache Brooklyn code can reason about, both in
> the
> > > UI (JS) and backend (Java).  Parsing a DSL is much harder than using
> YAML
> > > for this "reasonable" representation (as in we can reason _about_ it
> :) ),
> > > because we already have good backend processing, persistence,
> > > serialization; and frontend processing and visualization support for
> > > YAML-based models.  So I think we almost definitely want a
> well-described
> > > declarative YAML model of the workflow.
> > >
> > > We might *also* want a Workflow DSL because I agree with you a DSL
> would
> > > be nicer for a user to write (if writing by hand; although if composing
> > > visually a drag-and-drop to YAML is probably easier).  However it
> should
> > > probably get "compiled" into a Workflow YAML.  So I'm suggesting we do
> the
> > > workflow YAML at this stage, and a DSL that compiles into that YAML
> can be
> > > designed later.  (Designing a good DSL and parser and reason-about-able
> > > representation is a big task, so being able to separate it feels good
> too!)
> > >
> > > Best
> > > Alex
> > >
> > > [1]
> > >
> https://docs.google.com/document/d/1u02Bi6sS8Fkf1s7UzRRMnvLhA477bqcyxGa0nJesqkI/edit#heading=h.75wm48pjvx0h
> > >
> > >
> > > On Fri, 2 Sept 2022 at 20:17, Geoff Macartney <
> geoff.macart...@gmail.com>
> > > wrote:
> > >
> > >> Hi Peter,
> > >>
> > >> Thanks for such a detailed writeup of how you see this working. I fear
> > >> I've too little experience with this sort of thing to be able to say
> > >> anything very useful about it. My thought on the matter would be,
> > >> let's get started with the yaml based approach and see how it goes. I
> > >> think that experience would then give us a much better feel for what a
> > >> really nice and usable DSL for workflows would look like (probably to
> > >> address all the pain points of the yaml approach! :-)   The outline
> > >> above will then be a good starting point, I'm sure.
> > >>
> > >> Cheers
> > >> Geoff
> > >>
> > >> On Thu, 1 Sept 2022 at 21:26, Peter Abramowitsch
> > >> <pabramowit...@gmail.com> wrote:
> > >> >
> > >> > Hi All
> > >> > I just wanted to clarify something in my comment the other day about
> > >> DSLs
> > >> > since I see that the acronym was also used in Alex's original
> document.
> > >> > Unless I misunderstood, Alex was proposing to create a DSL for
> Brooklyn
> > >> > using yaml as syntax and writing a code layer to translate between
> that
> > >> > syntax and underlying APIs which are presumably all in Java.
> > >> >
> > >> > What I was suggesting was a DSL written directly in  Java (I guess)
> > >> whose
> > >> > syntax would be that language, but whose grammar would be keywords
> that
> > >> > were also Java functions.  Some of these functions would be
> pre-defined
> > >> in
> > >> > the DSL, while others could be  defined by the user and could use
> other
> > >> > functions of the DSL.    The result would be turned into a JAR file
> (or
> > >> > equivalent in another platform)   But during the compile phase, it
> > >> would be
> > >> > checked for errors, and it could be debugged line by line either
> > >> invoking
> > >> > live functionality or using a library of mock versions of the
> Brooklyn
> > >> API.
> > >> >
> > >> > In this 'native' DSL one could provide different types of workflow
> > >> > constructs as functions (In the BaseClass), taking function names as
> > >> method
> > >> > pointers, or using Lambdas.  It would be a lot easier in Ruby or
> Python
> > >> >
> > >> > // linear
> > >> > brooklynRun(NamedTaskMethod, NamedTaskMethod)
> > >> >
> > >> > // chained
> > >> > TaskMethodA()TaskMethodB().
> > >> >
> > >> > // asynchronous
> > >> > brooklynJoin(NamedTaskMethod, NamedTaskMethod,...)
> > >> >
> > >> > // conditional
> > >> > brooklynRunIf(NamedTaskMethod, NamedConditionMethod,...)
> > >> >
> > >> > // iterative
> > >> > brooklynRunWhile(NamedTaskMethod, NamedConditionMethod,...)
> > >> > brooklynRunUntil(NamedTaskMethod, NamedConditionMethod,...)
> > >> >
> > >> > // there could even be a utility to implement legacy syntax (this of
> > >> course
> > >> > would require the extra code layer I was trying to avoid)
> > >> > runYaml(Path)
> > >> >
> > >> > A basic class structure might be
> > >> >
> > >> > // where BrooklynRecipeBase implements the utility functions
> including,
> > >> > among others  Join, Run, If, While, Until mentioned above
> > >> > // and the BrooklynWorkflowInterface would dictate the functional
> > >> > requirements for the mandatory aspects of the Recipe.
> > >> > class MyRecipe extends BrooklynRecipeBase implements,
> > >> > BrooklynWorkflowInterface
> > >> > {
> > >> > Initialize()
> > >> > createContext()   - spin up resources
> > >> > workflow() - the main launch sequence using aspects of the DSL
> > >> > monitoring() - an asynchronous workflow used to manage sensor
> output or
> > >> for
> > >> > whatever needs to be done while the "orchestra" is playing
> > >> > shutdownHook() - called whenever shutdown is happening
> > >> > }
> > >> >
> > >> > For those who don't like the smell of Java, the source file could
> just
> > >> be
> > >> > the contents, which would then be injected into the class framing
> code
> > >> > before compilation.
> > >> >
> > >> > These are just ideas.  I'm not familiar enough with Brooklyn in its
> > >> current
> > >> > implementation to be able to create realistic pseudocode.
> > >> >
> > >> > Peter
> > >> >
> > >> > On Thu, Sep 1, 2022 at 9:24 AM Geoff Macartney <
> > >> geoff.macart...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Hi Alex,
> > >> > >
> > >> > > That's great, I'll be excited to hear all about it.  7th September
> > >> > > suits me fine; I would probably prefer 4.00 p.m. over 11.00.
> > >> > >
> > >> > > Cheers
> > >> > > Geoff
> > >> > >
> > >> > > On Thu, 1 Sept 2022 at 12:41, Alex Heneveld <a...@cloudsoft.io>
> > >> wrote:
> > >> > > >
> > >> > > > Thanks for the excellent feedback Geoff and yes there are some
> very
> > >> cool
> > >> > > and exciting things added recently -- containers, conditions, and
> > >> terraform
> > >> > > and kubernetes support, all of which make writing complex
> blueprints
> > >> much
> > >> > > easier.
> > >> > > >
> > >> > > > I'd love to host a session to showcase these.
> > >> > > >
> > >> > > > How does Wed 7 Sept sound?  I could do 11am UK or 4pm UK --
> > >> depending
> > >> > > what time suits for people who are interested.  Please RSVP and
> > >> indicate
> > >> > > your time preference!
> > >> > > >
> > >> > > > Best
> > >> > > > Alex
> > >> > > >
> > >> > > >
> > >> > > > On Wed, 31 Aug 2022 at 22:17, Geoff Macartney <
> > >> geoff.macart...@gmail.com>
> > >> > > wrote:
> > >> > > >>
> > >> > > >> Hi Alex,
> > >> > > >>
> > >> > > >> Another thought occurred to me when reading that workflow
> > >> proposal. You
> > >> > > wrote
> > >> > > >>
> > >> > > >> "and with the recent support for container-based tasks and
> > >> declarative
> > >> > > >> conditions, we have taken big steps towards enabling YAML
> > >> authorship"
> > >> > > >>
> > >> > > >> Unfortunately over the past while I haven't been able to keep
> up as
> > >> > > >> closely as I would like with developments in Brooklyn. I'm just
> > >> > > >> wondering if it might be possible to get together some time, on
> > >> Google
> > >> > > >> Meet or Zoom or whatnot, if you or a colleague could spare
> half an
> > >> > > >> hour to demo some of these recent developments? But don't worry
> > >> about
> > >> > > >> it if you're too busy at present.
> > >> > > >>
> > >> > > >> Adding dev@ to this in CC for the sake of Openness. Others
> might
> > >> also
> > >> > > >> be interested!
> > >> > > >>
> > >> > > >> Cheers
> > >> > > >> Geoff
> > >> > >
> > >>
> > >
>
