Hi guys,

PipelineOptionsFactory has a nice strict mode validating the options you
pass.

Concretely, if you pass --sudoMakeItWork, you will likely see:

java.lang.IllegalArgumentException: Class interface org.apache.beam.sdk.options.PipelineOptions missing a property named 'sudoMakeItWork'.
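For readers less familiar with the mechanism, the strict check boils down to something like the toy model below (plain Java, no Beam dependency): property names are derived from the JavaBean-style getters of an options interface, and any --name=value argument with no matching property is rejected. ToyStrictParser and ToyOptions are hypothetical names for this sketch; Beam's real implementation is more involved.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of a strict options parser; not Beam's implementation.
final class ToyStrictParser {

    // Hypothetical options interface used only for this sketch.
    interface ToyOptions {
        String getRunner();
        void setRunner(String runner);
    }

    // Collects property names from JavaBean-style getters, e.g. getRunner() -> "runner".
    static Set<String> propertiesOf(Class<?> iface) {
        Set<String> names = new TreeSet<>();
        for (Method m : iface.getMethods()) {
            String n = m.getName();
            if (n.startsWith("get") && n.length() > 3 && m.getParameterCount() == 0) {
                names.add(Character.toLowerCase(n.charAt(3)) + n.substring(4));
            }
        }
        return names;
    }

    // Strict mode: every --name[=value] argument must match a known property.
    static Map<String, String> parse(Class<?> iface, String... args) {
        Set<String> known = propertiesOf(iface);
        Map<String, String> values = new LinkedHashMap<>();
        for (String arg : args) {
            if (!arg.startsWith("--")) {
                throw new IllegalArgumentException("Expected --name=value but got: " + arg);
            }
            String body = arg.substring(2);
            int eq = body.indexOf('=');
            String name = eq < 0 ? body : body.substring(0, eq);
            // A bare flag such as --foo is treated as "true" in this sketch.
            String value = eq < 0 ? "true" : body.substring(eq + 1);
            if (!known.contains(name)) {
                throw new IllegalArgumentException(
                        "Class " + iface + " missing a property named '" + name + "'.");
            }
            values.put(name, value);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse(ToyOptions.class, "--runner=DirectRunner"));
        try {
            parse(ToyOptions.class, "--sudoMakeItWork");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints {runner=DirectRunner}, then an error message of the same shape as the one above, naming the toy interface instead of PipelineOptions.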

This is not bad; however, the way it is implemented just doesn't work, and its
design is wrong:

1. The pipeline options factory relies on a cache, so only the options
visible when the class is loaded can be used. For example, in a
container (web container, OSGi or any other non-flat classpath) you will not
be able to load the IO options if the IOs are not in the same classloader
as the SDK core, which is quite a pitfall.
2. The validation leaks between options. It does not check that "the options
are valid" but only that "some registered option matches your parameter".
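Point 1 can be reproduced with a toy model, assuming the registry is filled once and never refreshed: a lazily initialized static holder snapshots whatever options interfaces are visible at first use, and anything that becomes visible later (here IoOptions, standing in for an IO deployed in another classloader) is never seen. All names are hypothetical and this is not Beam's actual cache code.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of an eagerly cached options registry; not Beam's code.
final class EagerRegistryDemo {

    // Hypothetical interfaces: one from the "core", one from an IO module
    // that only becomes visible later.
    interface CoreOptions { String getRunner(); }
    interface IoOptions { String getBucket(); }

    // Stand-in for what discovery could see at a given point in time.
    static final List<Class<?>> VISIBLE = new ArrayList<>();

    // Holder idiom: KNOWN is computed once, the first time Registry is
    // touched, and never refreshed afterwards.
    static final class Registry {
        static final Set<String> KNOWN = snapshot();

        private static Set<String> snapshot() {
            Set<String> names = new HashSet<>();
            for (Class<?> iface : VISIBLE) {
                for (Method m : iface.getMethods()) {
                    String n = m.getName();
                    if (n.startsWith("get") && n.length() > 3 && m.getParameterCount() == 0) {
                        names.add(Character.toLowerCase(n.charAt(3)) + n.substring(4));
                    }
                }
            }
            return names;
        }
    }

    static boolean accepts(String property) {
        return Registry.KNOWN.contains(property);
    }

    public static void main(String[] args) {
        VISIBLE.add(CoreOptions.class);
        System.out.println(accepts("runner")); // true: the snapshot is taken here
        VISIBLE.add(IoOptions.class);          // the IO options show up later...
        System.out.println(accepts("bucket")); // false: ...but the snapshot is stale
    }
}
```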

A case that is broken but "green" today is (not using the exact option names
in this example):

--runner=DirectRunner --sparkMaster=spark://localhost

This will pass if I have both runners on the classpath, but it should
actually fail, because an option of one runner is invalid for the other.
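A toy model of why this passes, assuming validation only asks "does any registered interface declare this property?". ToyDirectOptions and ToySparkOptions are hypothetical stand-ins, not Beam's real interfaces; knownTo shows the narrower per-interface check that would catch the mistake.

```java
import java.lang.reflect.Method;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of "one registry for everything" validation; not Beam's code.
final class LeakingValidationDemo {

    // Hypothetical stand-ins for two runners' options interfaces.
    interface ToyDirectOptions { String getRunner(); }
    interface ToySparkOptions { String getRunner(); String getSparkMaster(); }

    static Set<String> propertiesOf(Class<?> iface) {
        Set<String> names = new HashSet<>();
        for (Method m : iface.getMethods()) {
            String n = m.getName();
            if (n.startsWith("get") && n.length() > 3 && m.getParameterCount() == 0) {
                names.add(Character.toLowerCase(n.charAt(3)) + n.substring(4));
            }
        }
        return names;
    }

    // Behavior described above: accepted if ANY registered interface declares it.
    static boolean knownToSomeInterface(String property, List<Class<?>> registry) {
        for (Class<?> iface : registry) {
            if (propertiesOf(iface).contains(property)) {
                return true;
            }
        }
        return false;
    }

    // Narrower check: only the interface the selected runner actually reads.
    static boolean knownTo(String property, Class<?> iface) {
        return propertiesOf(iface).contains(property);
    }

    public static void main(String[] args) {
        // Both "runners" are on the classpath, so both interfaces are registered.
        List<Class<?>> registry = List.of(ToyDirectOptions.class, ToySparkOptions.class);
        // --runner=DirectRunner --sparkMaster=spark://localhost
        System.out.println(knownToSomeInterface("sparkMaster", registry)); // true
        System.out.println(knownTo("sparkMaster", ToyDirectOptions.class)); // false
    }
}
```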

To fix that, I see three options:

1. Relax strict mode so that it is false by default, and evaluate the
options lazily.
2. Don't allow casting the options lazily; instead, require it when the
instance is created. This means that a user must know, at creation time, all
the PipelineOptions sub-interfaces they rely on (something like
PipelineOptionsFactory.fromArgs(args).create(DirectOptions.class,
S3Options.class, MyIOOptions.class)).
3. Use a namespace/prefix for nested options; this way, the previous example
would become:

--runner=DirectRunner --spark.master=spark://localhost

In this case, the PipelineOptions instantiation would validate the empty
prefix "", and if SparkOptions are requested, they would validate the spark.*
prefix.
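A minimal sketch of how option 3 could behave, assuming the last dot separates the namespace from the property name (so spark.master maps to getMaster() in the Spark options). Each interface validates only its own namespace and ignores the others, which means the unprefixed --sparkMaster from the earlier example is rejected by the root options. The interfaces and the validate helper are hypothetical; nothing here is an existing Beam API.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of prefix-scoped validation (option 3); hypothetical, not a Beam API.
final class PrefixedOptionsDemo {

    // Hypothetical interfaces: root options own prefix "", Spark options own "spark".
    interface ToyRootOptions { String getRunner(); }
    interface ToySparkOptions { String getMaster(); }

    static Set<String> propertiesOf(Class<?> iface) {
        Set<String> names = new TreeSet<>();
        for (Method m : iface.getMethods()) {
            String n = m.getName();
            if (n.startsWith("get") && n.length() > 3 && m.getParameterCount() == 0) {
                names.add(Character.toLowerCase(n.charAt(3)) + n.substring(4));
            }
        }
        return names;
    }

    // Validates only the arguments in the given namespace; arguments in other
    // namespaces are skipped and left to the interface that owns them.
    static Map<String, String> validate(Class<?> iface, String prefix, String... args) {
        Set<String> known = propertiesOf(iface);
        Map<String, String> values = new LinkedHashMap<>();
        for (String arg : args) {
            String body = arg.startsWith("--") ? arg.substring(2) : arg;
            int eq = body.indexOf('=');
            String name = eq < 0 ? body : body.substring(0, eq);
            String value = eq < 0 ? "true" : body.substring(eq + 1);
            int dot = name.lastIndexOf('.');
            String namespace = dot < 0 ? "" : name.substring(0, dot);
            if (!namespace.equals(prefix)) {
                continue; // belongs to another namespace
            }
            String local = dot < 0 ? name : name.substring(dot + 1);
            if (!known.contains(local)) {
                throw new IllegalArgumentException(
                        "Unknown option '" + name + "' for " + iface.getSimpleName());
            }
            values.put(local, value);
        }
        return values;
    }

    public static void main(String[] args) {
        String[] cli = {"--runner=DirectRunner", "--spark.master=spark://localhost"};
        System.out.println(validate(ToyRootOptions.class, "", cli));       // {runner=DirectRunner}
        System.out.println(validate(ToySparkOptions.class, "spark", cli)); // {master=spark://localhost}
        try {
            validate(ToyRootOptions.class, "", "--runner=DirectRunner", "--sparkMaster=spark://localhost");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unknown option 'sparkMaster' for ToyRootOptions
        }
    }
}
```

The design choice here is that no single component needs to know every interface up front: each requested interface checks its own slice of the command line, so a late-loaded module only has to validate its own prefix.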

This is not perfect, but the current state, with a single eager registry for
validation, is barely usable as soon as you try to industrialize Beam in
anything other than a main :(.

Does anyone have an idea that doesn't break the API?


Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> |  Blog
<https://rmannibucau.metawerx.net/> | Old Blog
<http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book
<https://www.packtpub.com/application-development/java-ee-8-high-performance>
