Thanks for any help!

On Mon, Apr 23, 2018 at 11:46 AM, Lian Jiang <jiangok2...@gmail.com> wrote:
> Hi,
>
> I am using Spark Structured Streaming to read jsonl files and write
> parquet files. I am wondering what the process is when the jsonl files'
> schema changes.
>
> Suppose the jsonl files are generated in the \jsonl folder and the old
> schema is {"field1": String}. My proposal is:
>
> 1. Write the jsonl files with the new schema (e.g. {"field1": String,
>    "field2": Int}) into another folder, \jsonl2.
> 2. Let the Spark job finish handling all the data in \jsonl, then stop the
>    streaming job.
> 3. Use a Spark script to convert the parquet files from the old schema to
>    the new schema (e.g. add a new column with some default value for
>    "field2").
> 4. Upgrade and restart the Spark streaming job to handle the new-schema
>    jsonl files and parquet files.
>
> Is this process correct (best)? Thanks for any clue.
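For what it's worth, the default-fill in step 3 can be sketched independently of Spark. The snippet below is only an illustration of the per-record conversion, using plain Python on jsonl lines; the default value of 0 for "field2" is an assumption, not something from the original mail:

```python
import json

def migrate_record(line, default_field2=0):
    """Upgrade one old-schema jsonl record ({"field1": String}) to the
    new schema ({"field1": String, "field2": Int}) by filling in the
    new column with a default value when it is absent."""
    record = json.loads(line)
    # setdefault leaves records that already carry "field2" untouched.
    record.setdefault("field2", default_field2)
    return record

# Old-schema record gains the new column with its default:
migrate_record('{"field1": "a"}')            # {'field1': 'a', 'field2': 0}
# New-schema record passes through unchanged:
migrate_record('{"field1": "b", "field2": 7}')
```

In Spark itself the same step would presumably be a batch job along the lines of `spark.read.parquet(...).withColumn("field2", lit(0))` followed by rewriting the parquet files, but the exact paths and default are up to your pipeline.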