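Hi all,

I posted an improvement ticket in JIRA, and Hyukjin Kwon requested that I send an email to the dev list for discussion.

As the DSv2 API evolves, breaking changes are occasionally made to the API. It's possible to split a plugin into a "common" part and multiple version-specific parts, and this works OK for shipping a single artifact to users, as long as they write out the fully qualified class name as the DataFrame format(). The one part that can't currently be worked around is the DataSourceRegister trait. Since classes that implement DataSourceRegister must also implement DataSourceV2 (and its mixins), changes to those interfaces cause the ServiceLoader to fail when it attempts to load the "wrong" DataSourceV2 class. (There's also an additional prohibition against multiple implementations having the same shortName in org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource.) This means users need to update their notebooks/code/tutorials if they run at a different site whose cluster runs a different Spark version.

To solve this, I proposed in SPARK-31363 a new trait that would function the same as the existing DataSourceRegister trait, but adds an additional method:

    public Class<? extends DataSourceV2> getImplementation();

...which would allow DSv2 plugins to dynamically choose the appropriate DataSourceV2 class based on the runtime environment. This would let us release a single artifact for different Spark versions, and users could use the same artifactId and format regardless of where they were executing their code. If no services are registered with this new trait, the functionality would remain the same as before.

I think this functionality will be useful as DSv2 continues to evolve. Please let me know your thoughts.

Thanks,
Andrew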
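
P.S. To make the proposal a bit more concrete, here is a rough, self-contained sketch of how such a trait might be used. It is not the patch attached to the ticket. The names VersionedDataSourceRegister, ExampleRegister, ExampleSourceV24, ExampleSourceV30 and lookup are all hypothetical, the DataSourceV2 interface below is a local stand-in so the sketch compiles without Spark on the classpath, and the Spark version is injected as a string instead of being read from the running cluster.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class VersionedRegisterSketch {

    /** Local stand-in for Spark's DataSourceV2 marker interface (assumption: not the real type). */
    interface DataSourceV2 {}

    /** Hypothetical trait: same shortName() as DataSourceRegister, plus getImplementation(). */
    interface VersionedDataSourceRegister {
        String shortName();
        Class<? extends DataSourceV2> getImplementation();
    }

    // Hypothetical version-specific implementations. In a real plugin these would live in
    // separate modules, each compiled against a different Spark release.
    static class ExampleSourceV24 implements DataSourceV2 {}
    static class ExampleSourceV30 implements DataSourceV2 {}

    /**
     * The register itself does not implement DataSourceV2, so loading it does not depend
     * on which version of the DSv2 interfaces is present at runtime.
     */
    static class ExampleRegister implements VersionedDataSourceRegister {
        private final String sparkVersion;

        // Assumption: a real register would need a no-arg constructor for ServiceLoader
        // and would look up the Spark version itself; it is injected here for simplicity.
        ExampleRegister(String sparkVersion) {
            this.sparkVersion = sparkVersion;
        }

        @Override
        public String shortName() {
            return "example";
        }

        @Override
        public Class<? extends DataSourceV2> getImplementation() {
            // A real plugin might resolve the class by name (Class.forName) instead of
            // using class literals, to avoid linking against the "wrong" implementation.
            if (sparkVersion.startsWith("2.")) {
                return ExampleSourceV24.class;
            }
            return ExampleSourceV30.class;
        }
    }

    /**
     * Sketch of the lookup side: match the format against shortName(), ignoring case.
     * An empty result means the caller falls back to the existing behavior.
     */
    static Optional<Class<? extends DataSourceV2>> lookup(
            String format, List<VersionedDataSourceRegister> registers) {
        for (VersionedDataSourceRegister register : registers) {
            if (register.shortName().equalsIgnoreCase(format)) {
                Class<? extends DataSourceV2> impl = register.getImplementation();
                return Optional.of(impl);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<VersionedDataSourceRegister> onSpark2 =
                Collections.singletonList(new ExampleRegister("2.4.5"));
        List<VersionedDataSourceRegister> onSpark3 =
                Collections.singletonList(new ExampleRegister("3.0.0"));

        // The same short name resolves to a different class depending on the runtime version.
        System.out.println(lookup("example", onSpark2).get().getSimpleName());
        System.out.println(lookup("example", onSpark3).get().getSimpleName());
    }
}
```

The point of the indirection is that users keep writing format("example") everywhere, while the register decides which version-specific class actually gets loaded.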
As the DSv2 API evolves, some breaking changes are occasionally made to the API. It's possible to split a plugin into a "common" part and multiple version-specific parts and this works OK to have a single artifact for users, as long as they write out the fully qualified classname as the DataFrame format(). The one part that can't be currently worked around is the DataSourceRegister trait. Since classes which implement DataSourceRegister must also implement DataSourceV2 (and its mixins), changes to those interfaces cause the ServiceLoader to fail when it attempts to load the "wrong" DataSourceV2 class. (there's also an additional prohibition against multiple implementations having the same ShortName in org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource). This means users will need to update their notebooks/code/tutorials if they run @ a different site whose cluster is a different version. To solve this, I proposed in SPARK-31363 a new trait who would function the same as the existing DataSourceRegister trait, but adds an additional method: public Class<? implements DataSourceV2> getImplementation(); ...which will allow DSv2 plugins to dynamically choose the appropriate DataSourceV2 class based on the runtime environment. This would let us release a single artifact for different Spark versions and users could use the same artifactID & format regardless of where they were executing their code. If there was no services registered with this new trait, the functionality would remain the same as before. I think this functionality will be useful as DSv2 continues to evolve, please let me know your thoughts. Thanks Andrew --------------------------------------------------------------------- To unsubscribe e-mail: dev-unsubscr...@spark.apache.org