[jira] [Updated] (SPARK-31363) Improve DataSourceRegister interface
[ https://issues.apache.org/jira/browse/SPARK-31363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Malone Melo updated SPARK-31363:
---
Description:

As the DSv2 API evolves, breaking changes are occasionally made to the API. It's possible to split a plugin into a "common" part and multiple version-specific parts, which works well for shipping a single artifact to users. The one part that currently can't be worked around is the DataSourceRegister trait. This is an issue because users cargo-cult configuration values, and choosing the wrong plugin version produces a particularly baroque error message that bubbles up through the ServiceLoader.

Currently, the class implementing DataSourceRegister must also be the class implementing the "top-level" DataSourceV2 interface (and its mixins), and these interfaces occasionally change as the API evolves. As a practical matter, this means there is no opportunity to decide at runtime which class to pass along to Spark. Attempting to register multiple DataSourceV2 implementations in META-INF/services causes an exception when the ServiceLoader tries to load a DataSourceRegister that implements the "wrong" DataSourceV2.

I would like to propose a new DataSourceRegister interface that adds a level of indirection between what the ServiceLoader loads and the DataSourceV2 implementation. E.g. (strawman):

{code:java}
interface DataSourceRegisterV2 {
    public String shortName();
    public Class getImplementation();
}
{code}
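For illustration, here is how a dual-version plugin might implement the proposed interface. This is only a sketch: everything under com.example is a made-up name, and probing for a 3.x-only class is just one possible way to detect the running Spark version.

{code:java}
// Hypothetical registrar for a plugin that ships Spark 2.4 and Spark 3.0
// implementations in a single artifact. All com.example names are invented.
public class MyFormatRegister implements DataSourceRegisterV2 {

    @Override
    public String shortName() {
        return "myformat";
    }

    @Override
    public Class getImplementation() {
        // Load the version-specific entry point by name, so the class built
        // against the "other" Spark version is never touched.
        String impl = isSpark3()
                ? "com.example.myformat.spark30.MyFormatSource"
                : "com.example.myformat.spark24.MyFormatSource";
        try {
            return Class.forName(impl);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("myformat: cannot load " + impl, e);
        }
    }

    // Detect the running Spark version via a class that exists only in 3.x.
    private static boolean isSpark3() {
        try {
            Class.forName("org.apache.spark.sql.connector.catalog.TableProvider");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
{code}

The registrar would then be the only entry in META-INF/services (listed under the new interface's fully-qualified name), so the ServiceLoader never has to instantiate a class whose DataSourceV2 parent doesn't exist in the running Spark.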
Then org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource would have its search algorithm extended to look for DataSourceRegisterV2 objects and, if one is located for the given shortName, return the class object from getImplementation(). At that point, the plugin could decide, based on the current runtime environment, which class to present to Spark.
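As a rough sketch (this is not Spark's actual Scala code; the class and method names below are assumptions, and it only illustrates the proposed extra step before the existing DataSourceRegister search):

{code:java}
import java.util.ServiceLoader;

// Hypothetical extra step for the provider lookup: consult the proposed
// DataSourceRegisterV2 services first, then fall back to the current path.
public final class DataSourceLookupSketch {

    static Class<?> lookupViaRegisterV2(String provider, ClassLoader loader) {
        ServiceLoader<DataSourceRegisterV2> services =
                ServiceLoader.load(DataSourceRegisterV2.class, loader);
        for (DataSourceRegisterV2 register : services) {
            if (register.shortName().equalsIgnoreCase(provider)) {
                // The plugin, not the ServiceLoader, picks the concrete class.
                return register.getImplementation();
            }
        }
        return null; // caller falls back to the existing lookup path
    }
}
{code}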
There wouldn't be any changes for plugins that don't implement this API. If this is an acceptable idea, I can put together a PR for further comment.

Thanks
Andrew

> Improve DataSourceRegister interface
>
> Key: SPARK-31363
> URL: https://issues.apache.org/jira/browse/SPARK-31363
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.5, 3.0.0
> Reporter: Andrew Malone Melo
> Priority: Minor