Hi, I have a couple of datasets whose schemas keep changing over time, and I store them as Parquet files. I currently use the mergeSchema option when loading these differently-schemaed Parquet files into a DataFrame, and that works fine. Now I have a requirement to track the differences between schemas over time, basically maintaining a list of which columns are the latest. Please guide me if anybody has done similar work, or share general best practices for tracking column changes over time. Thanks in advance.
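One way to track this (a minimal sketch, not Spark-specific: it assumes each schema snapshot is represented as a plain {column: type} dict, which you could build from a DataFrame's `df.schema` fields or from a Parquet footer) is to persist a snapshot per load and diff consecutive snapshots:

```python
def diff_schemas(old, new):
    """Compare two schema snapshots ({column_name: type_string} dicts).

    Returns columns that were added, removed, or had their type changed,
    so you can log the evolution between any two versions.
    """
    added = {c: t for c, t in new.items() if c not in old}
    removed = {c: t for c, t in old.items() if c not in new}
    changed = {c: (old[c], new[c])
               for c in old.keys() & new.keys()
               if old[c] != new[c]}
    return added, removed, changed


# Hypothetical snapshots of the same dataset at two points in time.
# In Spark you might build these from
#   {f.name: f.dataType.simpleString() for f in df.schema.fields}
# after reading with .option("mergeSchema", "true").
v1 = {"id": "bigint", "name": "string", "age": "int"}
v2 = {"id": "bigint", "name": "string", "email": "string", "age": "bigint"}

added, removed, changed = diff_schemas(v1, v2)
```

Storing each snapshot (e.g. as JSON keyed by load date) gives you a full history, and the most recent snapshot is your list of "latest" columns.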
-- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/