Hi devs,

Has there been any discussion around changing the DataSource parameters
argument to be something more sophisticated than Map[String, String]? As
you write more complex DataSources, there are likely to be a variety of
parameters of varying formats which are needed, and having to coerce them
all to strings becomes suboptimal pretty fast.
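
To make the coercion problem concrete, here is a minimal sketch in plain
Scala with no Spark dependency. ReadOptions, toStringMap and fromStringMap
are hypothetical names I made up for illustration; they are not part of any
Spark API.

```scala
import scala.util.Try

// Hypothetical options for an imaginary data source (assumption, not Spark API).
final case class ReadOptions(hosts: Seq[String], port: Int, batchSize: Int)

// Flatten the typed options into the Map[String, String] shape.
def toStringMap(o: ReadOptions): Map[String, String] = Map(
  "hosts"     -> o.hosts.mkString(","),
  "port"      -> o.port.toString,
  "batchSize" -> o.batchSize.toString
)

// Parse the strings back; a missing key or a malformed number only
// surfaces at runtime, wrapped here in a Try.
def fromStringMap(m: Map[String, String]): Try[ReadOptions] = Try {
  ReadOptions(
    hosts     = m("hosts").split(",").toSeq,
    port      = m("port").toInt,
    batchSize = m("batchSize").toInt
  )
}

// Round trip through the string map succeeds.
val good = fromStringMap(toStringMap(ReadOptions(Seq("a", "b"), 9042, 500)))

// A misspelled key ("batchsize") compiles fine and fails only when parsed.
val typo = fromStringMap(Map("hosts" -> "a,b", "port" -> "9042", "batchsize" -> "500"))
```

Nothing stops the misspelled key from compiling, which is the kind of
mistake a typed parameter object would rule out.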

Quite often I see this combated by people specifying parameters which take
in JSON strings and then parsing them into the parameter objects that they
actually need. Unfortunately, having people write JSON strings by hand is a
really error-prone process, so to ensure compile-time safety people write
convenience functions which take in actual POJOs as parameters, serialize
them to JSON so they can be passed into the data source API, and then
deserialize them in the constructors of their data sources. There's also no
real story around discoverability of options with the current
Map[String, String] setup other than looking at the source code of the
data source and hoping that its authors specified constants somewhere.

Rather than doing all of the above, we could adapt the DataSource API to
have RelationProviders be parameterized on a parameter class, an instance
of which would be provided to the createRelation call. On the user's side,
they could just create the appropriate configuration object and provide
that object to a DataFrameReader.parameters call, and it would be possible
to guarantee that enough parameters were provided to construct a DataFrame
in that case.
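
A rough sketch of the shape I have in mind, again in plain Scala so it runs
without Spark. Everything here is a hypothetical stand-in: Relation,
TypedRelationProvider, TypedReader, JdbcParams and JdbcProvider are names
invented for this example, and the real API would have to deal with
SQLContext, schemas and so on.

```scala
// Stand-in for a relation; the real BaseRelation is much richer (assumption).
final case class Relation(description: String)

// Hypothetical provider trait, parameterized on its parameter class P.
trait TypedRelationProvider[P] {
  def createRelation(parameters: P): Relation
}

// Parameter class for an imaginary JDBC-like source. Required fields have
// no default, so leaving one out is a compile error instead of a runtime one.
final case class JdbcParams(url: String, table: String, fetchSize: Int = 1000)

object JdbcProvider extends TypedRelationProvider[JdbcParams] {
  def createRelation(p: JdbcParams): Relation =
    Relation(s"${p.table} from ${p.url} (fetchSize=${p.fetchSize})")
}

// Hypothetical reader whose parameters call only accepts the provider's P.
final class TypedReader[P](provider: TypedRelationProvider[P]) {
  def parameters(p: P): Relation = provider.createRelation(p)
}

val relation =
  new TypedReader(JdbcProvider).parameters(JdbcParams("jdbc:h2:mem:test", "users"))
```

The parameter class also doubles as documentation: its fields and defaults
are the list of options the data source supports.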

The key challenge I see with this approach is that I'm not sure how to make
the above changes in a backwards compatible way that doesn't involve
duplicating a bunch of methods.

Do people have thoughts regarding this approach? I'm happy to file a JIRA
and have the discussion there if it makes sense.

Best,
Hamel
