Ryan Blue created SPARK-23418:
---------------------------------

             Summary: DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema
                 Key: SPARK-23418
                 URL: https://issues.apache.org/jira/browse/SPARK-23418
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Ryan Blue
DataSourceV2 currently does not reject a user-specified schema when a source does not implement ReadSupportWithSchema. This is confusing behavior. Here's a quote from a discussion on SPARK-23203:

{quote}
I think this will cause confusion when source schemas change. Also, I can't think of a situation where it is a good idea to pass a schema that is ignored.

Here's an example of how this will be confusing: think of a job that supplies a schema identical to the table's schema and runs fine, so it goes into production. What happens when the table's schema changes? If someone adds a column to the table, then the job will start failing and report that the source doesn't support user-supplied schemas, even though it had previously worked just fine with a user-supplied schema. In addition, the change to the table is actually compatible with the old job because the new column would be removed by a projection.

To fix this situation, it may be tempting to use the user-supplied schema as an initial projection, but that doesn't make sense because we don't need two projection mechanisms. If we used this as a second way to project, it would be confusing that you can't actually leave out columns (at least for CSV), and it would be odd that this path lets you coerce types, which should usually be done by Spark.

I think it is best not to allow a user-supplied schema when it isn't supported by a source.
{quote}
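For reference, a minimal sketch of the check this issue proposes, written against the Spark 2.3 DataSourceV2 read interfaces (ReadSupport and ReadSupportWithSchema). The helper object and method name are illustrative, not the actual patch, and UnsupportedOperationException stands in for the AnalysisException Spark would throw internally:

{code:scala}
import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, ReadSupportWithSchema}
import org.apache.spark.sql.sources.v2.reader.DataSourceReader
import org.apache.spark.sql.types.StructType

object DataSourceV2SchemaCheck {
  // Hypothetical helper: create a reader, failing fast when a user-specified
  // schema is passed to a source that cannot accept one.
  def createReader(
      source: DataSourceV2,
      options: DataSourceOptions,
      userSpecifiedSchema: Option[StructType]): DataSourceReader = {
    (source, userSpecifiedSchema) match {
      case (s: ReadSupportWithSchema, Some(schema)) =>
        // The source accepts a schema, so pass it through.
        s.createReader(schema, options)
      case (s: ReadSupportWithSchema, None) =>
        // A source that requires a schema cannot infer one on its own.
        throw new UnsupportedOperationException(
          s"A schema must be specified when using ${s.getClass.getName}")
      case (s: ReadSupport, Some(_)) =>
        // The proposed change: reject the schema outright instead of silently
        // ignoring it (or only failing when it happens to mismatch).
        throw new UnsupportedOperationException(
          s"${s.getClass.getName} does not support user-specified schemas " +
            "(it does not implement ReadSupportWithSchema)")
      case (s: ReadSupport, None) =>
        s.createReader(options)
      case _ =>
        throw new UnsupportedOperationException(
          s"${source.getClass.getName} is not readable")
    }
  }
}
{code}

Failing in the (ReadSupport, Some(_)) case regardless of whether the supplied schema matches is what keeps the production job described above from breaking mysteriously later: the job fails on its first run, when the schema argument is added, rather than months later when the table changes.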