This is about Spark-layer test cases on **read-only** CSV, JSON, Parquet, and ORC files. You can find more details and comparisons of Spark's support coverage in the PR.
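
To make the read-only point concrete, here is a minimal sketch (the local session setup, paths, and column names are illustrative assumptions, not taken from the PR): the file is written once and never rewritten; only the schema supplied at read time evolves. JSON is used because it is parsed against the read-time schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Write a file once with the "old" schema: (id: Int, name: String).
// From here on, the file is read-only; evolution happens at read time.
Seq((1, "a"), (2, "b")).toDF("id", "name")
  .write.mode("overwrite").json("/tmp/evolution/v1")

// A user-given "final" schema exercising three of the evolution cases:
// `id` widened from int to long (safe), a new `score` column (reads as
// null), and a changed column position.
val evolved = StructType(Seq(
  StructField("name", StringType),
  StructField("id", LongType),
  StructField("score", DoubleType)))

spark.read.schema(evolved).json("/tmp/evolution/v1").show()
// +----+---+-----+
// |name| id|score|
// +----+---+-----+
// |   a|  1| null|
// |   b|  2| null|
// +----+---+-----+
```
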
Bests,
Dongjoon.

On Thu, Jan 11, 2018 at 22:19 Georg Heiler <georg.kf.hei...@gmail.com> wrote:

> Isn't this related to the data format used, i.e. Parquet, Avro, ...,
> which already support changing schemas?
>
> Dongjoon Hyun <dongjoon.h...@gmail.com> wrote on Fri, 12 Jan 2018 at
> 02:30:
>
>> Hi, All.
>>
>> A data schema can evolve in several ways, and Apache Spark 2.3 already
>> supports the following for file-based data sources like
>> CSV/JSON/ORC/Parquet:
>>
>> 1. Add a column
>> 2. Remove a column
>> 3. Change a column position
>> 4. Change a column type
>>
>> Can we guarantee users some schema evolution coverage on file-based
>> data sources by adding schema evolution test suites explicitly? So far,
>> there are only some scattered test cases.
>>
>> For simplicity, I have several assumptions on schema evolution:
>>
>> 1. A safe evolution without data loss,
>>    e.g. from small types to larger types like int-to-long, not vice versa.
>> 2. The final schema is given by users (or Hive).
>> 3. Simple Spark data types supported by Spark vectorized execution.
>>
>> I made a test case PR to receive your opinions on this:
>>
>> [SPARK-23007][SQL][TEST] Add schema evolution test suite for file-based
>> data sources
>> - https://github.com/apache/spark/pull/20208
>>
>> Could you take a look and give some opinions?
>>
>> Bests,
>> Dongjoon.
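
For reference, a hypothetical shape such an explicit test could take, covering the safe int-to-long widening described above (this is not the actual PR code; the helper name, format list, and paths are assumptions):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Write with the old schema, read back with the evolved one, and assert
// the safe int-to-long widening. Which formats and readers actually pass
// each case is exactly what an explicit suite would pin down; JSON and CSV
// are used here because both are parsed against the read-time schema.
def checkIntToLongWidening(format: String, path: String): Unit = {
  Seq(1, 2, 3).toDF("id").write.mode("overwrite").format(format).save(path)
  val evolved = StructType(Seq(StructField("id", LongType)))
  val df = spark.read.schema(evolved).format(format).load(path)
  assert(df.schema("id").dataType == LongType)
  assert(df.as[Long].collect().sorted.sameElements(Array(1L, 2L, 3L)))
}

Seq("json", "csv").foreach { fmt =>
  checkIntToLongWidening(fmt, s"/tmp/evolution/widening-$fmt")
}
```

Asserting both the reconciled schema and the round-tripped values matters: a reader can report the requested schema while silently mis-assigning data, which is exactly the kind of case an explicit suite would document per format.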