GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/20208
[SPARK-23007][SQL][TEST] Add schema evolution test suite for file-based data sources

## What changes were proposed in this pull request?

A schema can evolve in several ways, and the following are already supported in file-based data sources.

1. Add a column
2. Remove a column
3. Change a column position
4. Change a column type

This issue aims to guarantee users backward-compatible schema evolution coverage on file-based data sources and to prevent future regressions by *adding schema evolution test suites explicitly*.

Here, we consider only safe evolution without data loss. For example, data type evolution should go from smaller types to larger types, like `int`-to-`long`, not vice versa.

As of today, in the master branch, file-based data sources have the following schema evolution coverage. The numbers refer to the list above.

File Format | Coverage   | Note
----------- | ---------- | ------------------------------------------------
TEXT        | N/A        | Schema consists of a single string column.
CSV         | 1, 2, 4    |
JSON        | 1, 2, 3, 4 |
ORC         | 1, 2, 3, 4 | Native vectorized ORC reader has the widest coverage.
PARQUET     | 1, 2, 3    |

## How was this patch tested?

Pass Jenkins with the newly added test suites.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-SCHEMA-EVOLUTION

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20208.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20208

----
commit 499801e7fdd545ac5918dd5f7a9294db2d5373be
Author: Dongjoon Hyun <dongjoon@...>
Date:   2018-01-07T00:02:09Z

    [SPARK-23007][SQL][TEST] Add schema evolution test suite for file-based data sources

----
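
As a rough illustration of the "safe evolution" idea and the coverage table in the description, here is a minimal Python sketch. It is not code from this pull request and does not use Spark. The names `WIDENING_CHAINS`, `COVERAGE`, `is_safe_type_change`, and `supports` are hypothetical, and the widening chains are an assumption made for the example; the description itself only names `int`-to-`long`.

```python
# Illustrative sketch only; not part of the pull request.
# ASSUMPTION: these widening chains are chosen for the example. The PR
# description only mentions int-to-long as a safe type change.
WIDENING_CHAINS = [
    ["byte", "short", "int", "long"],
    ["float", "double"],
]

# Coverage per file format, transcribed from the table in the description.
# 1 = add a column, 2 = remove a column,
# 3 = change a column position, 4 = change a column type.
COVERAGE = {
    "TEXT": set(),  # N/A: schema consists of a single string column
    "CSV": {1, 2, 4},
    "JSON": {1, 2, 3, 4},
    "ORC": {1, 2, 3, 4},
    "PARQUET": {1, 2, 3},
}


def is_safe_type_change(old, new):
    """Return True if the type is unchanged or widened within one chain."""
    for chain in WIDENING_CHAINS:
        if old in chain and new in chain:
            return chain.index(old) <= chain.index(new)
    # Types outside the assumed chains are only safe if unchanged.
    return old == new


def supports(file_format, evolution):
    """Return True if the table lists this evolution kind for the format."""
    return evolution in COVERAGE[file_format.upper()]


if __name__ == "__main__":
    print(is_safe_type_change("int", "long"))   # widening
    print(is_safe_type_change("long", "int"))   # narrowing
    print(supports("parquet", 4))               # type change on Parquet
```

Under these assumptions, `int`-to-`long` is accepted while `long`-to-`int` is rejected, and a column type change is reported as not covered for Parquet, matching the table.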