Thanks for any help!

On Mon, Apr 23, 2018 at 11:46 AM, Lian Jiang <jiangok2...@gmail.com> wrote:

> Hi,
>
> I am using Spark Structured Streaming to read JSONL files and write
> Parquet files. I am wondering what the process is when the schema of the
> JSONL files changes.
>
> Suppose the JSONL files are generated in the \jsonl folder and the old
> schema is {"field1": String}. My proposal is:
>
> 1. Write the JSONL files with the new schema (e.g. {"field1": String,
> "field2": Int}) into another folder, \jsonl2.
> 2. Let the Spark job finish handling all data in \jsonl, then stop the
> streaming job.
> 3. Use a Spark batch script to convert the existing Parquet files from the
> old schema to the new one (e.g. add a new column with some default value
> for "field2").
> 4. Upgrade and restart the Spark streaming job to handle the new-schema
> JSONL files and Parquet files.
>
> Is this the right (best) process? Thanks for any clue.
>