Hi,
I'm in a somewhat similar situation. Here's what I do (it seems to be
working so far):
1. Stream in the JSON as a plain string.
2. Feed this string into a JSON library to validate it (I use Circe).
3. Using the same library, parse the JSON and extract fields X, Y and Z.
4. Create a dataset
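
A minimal sketch of steps 2 and 3 with Circe might look like the following. The field names x, y and z, their types, and the Record and JsonValidation names are assumptions made for illustration; they are not taken from the actual job.

```scala
import io.circe.Decoder
import io.circe.parser.parse

// Hypothetical target type: the field names and types are assumptions.
final case class Record(x: String, y: Long, z: Double)

object JsonValidation {
  // Decoder that reads the three fields from a JSON object by name.
  implicit val recordDecoder: Decoder[Record] =
    Decoder.forProduct3("x", "y", "z")(Record.apply)

  // Left carries a message suitable for logging; Right carries the extracted record.
  def validate(raw: String): Either[String, Record] =
    parse(raw) match {
      // Step 2: the string is not syntactically valid JSON.
      case Left(failure) =>
        Left(s"Not valid JSON: ${failure.getMessage}")
      // Step 3: valid JSON, but a field may be missing or have the wrong type.
      case Right(json) =>
        json.as[Record].left.map(f => s"Cannot extract fields: ${f.getMessage}")
    }
}
```

Records that come back as Left can be logged and dropped, while the Right values are what would go into the dataset.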
Hi Lian,
"What have you tried?" would be a good starting point. Any help on this?
How do you read the JSON records? With readStream.json? You could use
readStream.text followed by a filter to separate the good JSON records from
the bad ones.
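
As a rough sketch of the text-then-filter idea, assuming Circe is used for the validity check (the JsonFilter object, the goodAndBad helper and the inputPath parameter are hypothetical names introduced here):

```scala
import io.circe.parser.parse
import org.apache.spark.sql.{Dataset, SparkSession}

object JsonFilter {
  // True when the line parses as JSON. This checks syntax only, not the schema.
  def isValidJson(line: String): Boolean = parse(line).isRight

  // inputPath is a hypothetical directory of line-delimited JSON files.
  // Returns the well-formed lines and the malformed lines as two streaming Datasets.
  def goodAndBad(spark: SparkSession, inputPath: String): (Dataset[String], Dataset[String]) = {
    import spark.implicits._
    // readStream.text yields a single string column named "value".
    val lines = spark.readStream.text(inputPath).as[String]
    val good = lines.filter(line => isValidJson(line))
    val bad = lines.filter(line => !isValidJson(line))
    (good, bad)
  }
}
```

The good stream can then be parsed and written to Parquet, and the bad stream sent to a separate sink for logging. Each stream is started as its own query, so the source is read once per query.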
Regards,
Jacek Laskowski
https://about.me/JacekLaskowski
Mastering Spark SQL
Hi,
We have a Structured Streaming job that converts JSON into Parquet files. We
want to validate the JSON records. If a JSON record is not valid, we want
to log a message and refuse to write it to Parquet. Also, the JSON contains
nested JSON objects, and we want to flatten the nested objects into other