Re: structured streaming handling validation and json flattening

2019-02-12 Thread Phillip Henry
Hi,

I'm in a somewhat similar situation. Here's what I do (it seems to be working so far):

1. Stream in the JSON as a plain string.
2. Feed this string into a JSON library to validate it (I use Circe).
3. Using the same library, parse the JSON and extract fields X, Y and Z.
4. Create a dataset
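Steps 1 to 3 above can be sketched roughly as follows. This is only an illustration in plain Python, using the standard `json` module in place of Circe. The field names X, Y and Z come from the message; `Record` and `parse_record` are hypothetical names, and treating a missing field as invalid is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Record:
    # Hypothetical container for the three extracted fields.
    x: Any
    y: Any
    z: Any


def parse_record(raw: str) -> Optional[Record]:
    """Validate one JSON string and extract X, Y and Z, or return None."""
    try:
        doc = json.loads(raw)  # step 2: syntactic validation
    except json.JSONDecodeError:
        return None
    # Assumption: a valid record is a JSON object carrying all three fields.
    if not isinstance(doc, dict) or not all(k in doc for k in ("X", "Y", "Z")):
        return None
    return Record(doc["X"], doc["Y"], doc["Z"])  # step 3: field extraction
```

Records for which `parse_record` returns None can be logged and dropped before anything is written out.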

Re: structured streaming handling validation and json flattening

2019-02-11 Thread Jacek Laskowski
Hi Lian,

"What have you tried?" would be a good starting point.

> Any help on this?

How do you read the JSONs? readStream.json? You could use readStream.text followed by filter to keep the good JSON records and exclude the bad ones.

Regards,
Jacek Laskowski
https://about.me/JacekLaskowski
Mastering Spark SQL
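The readStream.text plus filter idea could look roughly like the sketch below, written for PySpark. The names `is_valid_json` and `split_stream` and the `path` argument are hypothetical, and "valid" here only means "parses as a JSON object"; any schema checks would have to be added on top.

```python
import json


def is_valid_json(line):
    """Return True if the line parses as a JSON object."""
    if line is None:
        return False
    try:
        return isinstance(json.loads(line), dict)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


def split_stream(spark, path):
    """Read raw lines from `path` and split them into good and bad streams."""
    # Imported here so the predicate above stays usable without Spark.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import BooleanType

    valid = udf(is_valid_json, BooleanType())
    # readStream.text yields a single string column named "value".
    lines = spark.readStream.text(path).withColumn("is_valid", valid(col("value")))
    good = lines.filter(col("is_valid")).drop("is_valid")
    bad = lines.filter(~col("is_valid")).drop("is_valid")
    return good, bad
```

The good stream can then be parsed against a known schema and written to Parquet, while the bad stream goes to a log or a separate sink.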

structured streaming handling validation and json flattening

2019-02-09 Thread Lian Jiang
Hi, We have a structured streaming job that converts JSON into Parquet files. We want to validate the JSON records. If a record is not valid, we want to log a message and refuse to write it to Parquet. The JSON also contains nested objects, and we want to flatten them into other
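For the flattening part, one possible reading is that nested objects should become top-level columns. A minimal Python sketch of that idea follows, assuming underscore-joined column names (dots in column names need backtick quoting in Spark SQL) and leaving arrays untouched; `flatten` is a hypothetical helper, not a Spark API.

```python
def flatten(doc, prefix="", sep="_"):
    """Flatten nested dicts into a single-level dict with joined keys."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path as a prefix.
            flat.update(flatten(value, name, sep))
        else:
            # Scalars and lists are kept as-is.
            flat[name] = value
    return flat
```

For example, `{"a": 1, "b": {"c": 2}}` becomes a record with the keys `a` and `b_c`.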