This is a really simple proposal to add an extension with transforms
that package the Java Scripting API (JSR-223) [1] to allow users to
specialize some transforms via a scripting language. This work was
initially created by Romain [2]; with his authorization I took it over
and refined it so that it passes all the Beam validations and style
checks. I also added ValueProviders so that users can now template
scripts in Dataflow as well.
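
For readers who have not used JSR-223 before, here is a minimal sketch
of the plain javax.script calls that such a transform would wrap. It is
not the API of the proposed extension: the class name Jsr223Sketch, the
helper applyScript and the binding name "element" are hypothetical and
chosen only for illustration. It also assumes that a script engine for
the requested language is on the classpath (recent JDKs no longer
bundle a JavaScript engine), so the helper returns an empty Optional
when none is found.

```java
import java.util.Optional;
import javax.script.Bindings;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class Jsr223Sketch {

  /**
   * Hypothetical helper: evaluates the script with the input bound to the
   * name "element". Returns an empty Optional if no engine is registered
   * for the given language, or if the script evaluates to null.
   */
  public static Optional<Object> applyScript(String language, String script, Object element) {
    // Look up an engine by its registered short name, e.g. "javascript".
    ScriptEngine engine = new ScriptEngineManager().getEngineByName(language);
    if (engine == null) {
      return Optional.empty();
    }
    // Expose the input to the script through a fresh set of bindings.
    Bindings bindings = engine.createBindings();
    bindings.put("element", element);
    try {
      return Optional.ofNullable(engine.eval(script, bindings));
    } catch (ScriptException e) {
      throw new IllegalStateException("Script evaluation failed", e);
    }
  }

  public static void main(String[] args) {
    // Assumption: an engine named "javascript" may or may not be installed.
    Optional<Object> result = applyScript("javascript", "element.toUpperCase()", "beam");
    System.out.println(
        result.map(Object::toString).orElse("no 'javascript' engine on the classpath"));
  }
}
```

A real transform would presumably create the engine once per worker
rather than once per element; the sketch skips that for brevity.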

Note that Dataflow recently added something similar to create really
simple data movement pipelines [3], so maybe the rest of the community
can benefit from a similar extension (and eventually Dataflow may
converge on this implementation).

I hope there is interest in this extension. So far we have a
ScriptingParDo transform to show the idea; hopefully we can expand
this to other transforms.

For those interested in more details you can check the Jira issue [4]
and the PR [5].

[1] https://www.jcp.org/en/jsr/detail?id=223
[2] https://github.com/rmannibucau/beam-jsr223
[3] https://cloud.google.com/blog/big-data/2018/03/pre-built-cloud-dataflow-templates-kiss-for-data-movement
[4] https://issues.apache.org/jira/browse/BEAM-3921
[5] https://github.com/apache/beam/pull/4944
