The only current validator is the @Required validator, there were some ideas to integrate another system to perform validation on options like >=0 for numbers. I'm not sure how much use this has gotten from users, I would be for leaving it as is (if users get value out of it) or removing it and mark it deprecated for removal in the next major version.
Have you taken a look at PipelineOptionsFactory.register(...)? It allows you to register all PipelineOptions interfaces for parsing/validation reasons similar to your suggestion for PipelineOptionsFactory.fromArgs(args).create(DirectOptions.class, S3Options.class, MyIOOptions.class). Note that the user can also have their PipelineOptions interface extend all interfaces that they want to ensure are visible. Note that PipelineOptions was created to be a flat bag of string to type safe accessor/serializer/deserializer and nothing more. There has been little discussion about making it hierarchical and what that would mean. Also, wouldn't making PipelineOptionsFactory not a static singleton (like how I suggested on your PR) address most of the classloader issues that you speak of? On Tue, Mar 20, 2018 at 9:26 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote: > Hi guys, > > PipelineOptionsFactory has a nice strict mode validating the options you > pass. > > Concretely if you pass --sudoMakeItWork you will ikely see: > > java.lang.IllegalArgumentException: Class interface > org.apache.beam.sdk.options.PipelineOptions missing a property named ' > sudoMakeItWork'. > > This is not bad however the way it is implemented just doesn't work and > its design is wrong: > > 1. the pipeline options factory relies on a cache so only the options > available when the class is instantiated are available. For example in a > container (web container, OSGi or other = not flat classpath) you will not > be able to load the IO options if the IO are not in the same classloader > than the sdk core which is quite a pitfall. > 2. the validation leads between options. The validation is not "the > options are valid" but "there is some option matching your parameter" > > A case which is broken but "green" today is (not using exact names for the > example): > > --runner=DirectRunner --sparkMaster= spark://localhost > > this will work if I have both runner in the classpath but it should > actually fail cause one is invalid for the other. > > To fix that I see 3 options: > > 1. relax the strict mode to be false by default and lazily evaluate the > options > 2. don't allow to lazily cast the options but enforce to do it when the > instance is created, this means that a user must know all PipelineOptions > children it relies on at creation time > (PipelineOptionsFactory.fromArgs(args).create(DirectOptions.class, > S3Options.class, MyIOOptions.class) > 3. use a namespace/prefix for nested options, this way the previous > example would become: > > --runner=DirectRunner --spark.master=spark://localhost > > in this case the PipelineOptions instantiation would validate the prefix > "" and if SparkOptions are requested they would validate the prefix spark.*. > > This is not perfect but current state with a single eager registry for the > validation is very hardly usable as soon as you try to industrialize beam > in something else than a main :(. > > Any one has an idea to not break the API? > > > Romain Manni-Bucau > @rmannibucau <https://twitter.com/rmannibucau> | Blog > <https://rmannibucau.metawerx.net/> | Old Blog > <http://rmannibucau.wordpress.com> | Github > <https://github.com/rmannibucau> | LinkedIn > <https://www.linkedin.com/in/rmannibucau> | Book > <https://www.packtpub.com/application-development/java-ee-8-high-performance> >