Isn't this related to the data format used, e.g. Parquet, Avro, ..., which already support schema changes?
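To make the point concrete, here is a minimal sketch of how Parquet already tolerates an added column through Spark's schema merging. The paths under /tmp are placeholder directories, not anything from the PR:

import org.apache.spark.sql.SparkSession

object MergeSchemaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("merge-schema-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Two writes with different schemas: the second adds a "score" column.
    Seq((1, "a")).toDF("id", "name").write.parquet("/tmp/evolve/part1")
    Seq((2, "b", 3.0)).toDF("id", "name", "score").write.parquet("/tmp/evolve/part2")

    // mergeSchema reconciles the two file schemas; rows from part1
    // simply get score = null.
    val df = spark.read
      .option("mergeSchema", "true")
      .parquet("/tmp/evolve/part1", "/tmp/evolve/part2")
    df.printSchema()
    df.show()

    spark.stop()
  }
}

This covers the "add a column" case at the format level; the question is how much of the rest Spark itself should guarantee and test.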
Dongjoon Hyun <[email protected]> wrote on Fri, Jan 12, 2018 at 02:30:

> Hi, All.
>
> A data schema can evolve in several ways, and Apache Spark 2.3 already
> supports the following for file-based data sources like
> CSV/JSON/ORC/Parquet.
>
> 1. Add a column
> 2. Remove a column
> 3. Change a column position
> 4. Change a column type
>
> Can we guarantee users some schema evolution coverage on file-based data
> sources by adding schema evolution test suites explicitly? So far, there
> are some test cases.
>
> For simplicity, I make several assumptions about schema evolution.
>
> 1. A safe evolution without data loss.
>    - e.g. from small types to larger types like int-to-long, not vice
>      versa.
> 2. The final schema is given by users (or Hive).
> 3. Simple Spark data types supported by Spark vectorized execution.
>
> I made a test case PR to collect your opinions on this.
>
> [SPARK-23007][SQL][TEST] Add schema evolution test suite for file-based
> data sources
> - https://github.com/apache/spark/pull/20208
>
> Could you take a look and give some opinions?
>
> Bests,
> Dongjoon.
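For assumptions 1 and 2 above (safe widening, final schema supplied by the user), a minimal sketch of the pattern might look like the following. JSON is used here because text sources parse values into whatever type the user-specified schema requests, so int-to-long widening is unambiguously safe; the /tmp path and column names are illustrative only:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object UserSchemaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("user-schema-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Old data written with an int "id" and no "tag" column.
    Seq((1, "a"), (2, "b")).toDF("id", "name").write.json("/tmp/evolve-json")

    // Final schema given by the user: "id" widened to long, "tag" added.
    val finalSchema = StructType(Seq(
      StructField("id", LongType),     // int -> long: safe, no data loss
      StructField("name", StringType),
      StructField("tag", StringType)   // added column, null for old rows
    ))

    spark.read.schema(finalSchema).json("/tmp/evolve-json").show()

    spark.stop()
  }
}

The proposed test suite would presumably pin down exactly which of these combinations each file-based source must support, so the behavior becomes a documented guarantee rather than an accident of the reader implementation.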
