Hi, I have a couple of datasets whose schemas keep changing over time, and I store them as Parquet files. I currently use the mergeSchema option when loading these differently-schemaed Parquet files into a DataFrame, and that works fine. Now I have a requirement to track the differences between schemas over time, basically maintaining a list of which columns are the latest. Please guide me if anybody has done similar work, or share general best practices for tracking column changes over time. Thanks in advance.
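One way to track this (a minimal sketch, not Spark-specific: it assumes each schema snapshot is represented as a plain {column: type} dict, which you could build from a DataFrame's `df.schema` fields or from a Parquet footer) is to persist a snapshot per load and diff consecutive snapshots:

```python
def diff_schemas(old, new):
    """Compare two schema snapshots ({column_name: type_string} dicts).

    Returns columns that were added, removed, or had their type changed,
    so you can log the evolution between any two versions.
    """
    added = {c: t for c, t in new.items() if c not in old}
    removed = {c: t for c, t in old.items() if c not in new}
    changed = {c: (old[c], new[c])
               for c in old.keys() & new.keys()
               if old[c] != new[c]}
    return added, removed, changed


# Hypothetical snapshots of the same dataset at two points in time.
# In Spark you might build these from
#   {f.name: f.dataType.simpleString() for f in df.schema.fields}
# after reading with .option("mergeSchema", "true").
v1 = {"id": "bigint", "name": "string", "age": "int"}
v2 = {"id": "bigint", "name": "string", "email": "string", "age": "bigint"}

added, removed, changed = diff_schemas(v1, v2)
```

Storing each snapshot (e.g. as JSON keyed by load date) gives you a full history, and the most recent snapshot is your list of "latest" columns.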
-- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/