Hi I have a couple of datasets where schema keep on changing and I store it
as parquet files. Now I use mergeSchema option while loading these different
schema parquet files in a DataFrame and it works all fine. Now I have a
requirement of maintaining difference between schema over time basically
maintaining list of columns which are latest. Please guide if anybody has
done similar work or in general best practices to maintain changes of
columns over time. Thanks in advance.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Reply via email to