Re: Corrupt record handling in spark structured streaming and from_json function

2018-12-26 Thread Colin Williams
https://stackoverflow.com/questions/53938967/writing-corrupt-data-from-kafka-json-datasource-in-spark-structured-streaming On Wed, Dec 26, 2018 at 2:42 PM Colin Williams wrote: > From my initial impression it looks like I'd need to create my own `from_json` using `jsonToStructs` as a

Re: Corrupt record handling in spark structured streaming and from_json function

2018-12-26 Thread Colin Williams
From my initial impression it looks like I'd need to create my own `from_json` using `jsonToStructs` as a reference, but try to handle `case _: BadRecordException => null` or similar to write the non-matching string to a corrupt records column On Wed, Dec 26, 2018 at 1:55 PM Colin
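[Editor's note] Short of reimplementing `jsonToStructs`, a simpler sketch of the same idea relies on `from_json` returning null for an unparseable string, so the raw text can be kept in a corrupt-records column. Column names and the schema below are illustrative, not from the thread:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, from_json, when}
import org.apache.spark.sql.types.StructType

// Hypothetical read schema; substitute the real one.
val schema = new StructType()
  .add("id", "long")
  .add("name", "string")

// `df` is assumed to hold the raw Kafka payload as a string column
// named `value`. from_json yields null for an unparseable string, so
// the original text can be preserved alongside the parsed struct.
def withCorruptColumn(df: DataFrame): DataFrame =
  df.withColumn("data", from_json(col("value"), schema))
    .withColumn("_corrupt_record", when(col("data").isNull, col("value")))
```

Rows where `data` is null can then be routed to a separate sink (e.g. a parquet location for corrupt records) while the rest continue through the normal pipeline.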

Corrupt record handling in spark structured streaming and from_json function

2018-12-26 Thread Colin Williams
Hi, I'm trying to figure out how I can write records that don't match a JSON read schema via Spark structured streaming to an output sink / parquet location. Previously I did this in batch via the corrupt-column feature of the batch JSON reader. But in Spark structured streaming I'm reading from Kafka a string
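[Editor's note] For context, a sketch contrasting the batch corrupt-column feature mentioned above with the streaming Kafka read; path, broker address, and topic name are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

val spark = SparkSession.builder.getOrCreate()

// Hypothetical read schema; substitute the real one.
val schema = new StructType().add("id", "long").add("name", "string")

// Batch: the JSON datasource can divert unparseable lines into a
// corrupt-record column, provided the column is present in the schema.
val batch = spark.read
  .option("mode", "PERMISSIVE")
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .schema(schema.add("_corrupt_record", "string"))
  .json("/path/to/input")

// Structured streaming: Kafka delivers the payload as bytes, so the
// value is cast to a string first; from_json must then be applied by
// hand, and it has no equivalent of columnNameOfCorruptRecord.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")
  .option("subscribe", "some-topic")
  .load()
  .selectExpr("CAST(value AS STRING) AS value")
```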

Re: How to clean up logs-dirs and local-dirs of running spark streaming in yarn cluster mode

2018-12-26 Thread shyla deshpande
Hi Fawze, Thank you for the link. But that is exactly what I am doing. I think this is related to yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage setting. When the disk utilization exceeds this setting, the node is marked unhealthy. Other than increasing the default
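[Editor's note] For reference, a yarn-site.xml sketch of the setting named above. Hadoop's default is 90.0 percent; the raised value below is only illustrative, and raising it merely delays the unhealthy mark rather than freeing disk:

```xml
<!-- Node is marked unhealthy once disk utilization on a local or log
     dir crosses this percentage (Hadoop default: 90.0). -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>95.0</value>
</property>
```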

Re: Packaging kafka certificates in uber jar

2018-12-26 Thread Colin Williams
Hi, thanks. This is part of the solution I found after writing the question. The other part is that I needed to write the input stream to a temporary file. I would prefer not to write any temporary file, but the ssl.keystore.location property seems to expect a file path. On Tue, Dec 25,
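[Editor's note] A minimal sketch of that workaround, assuming a keystore bundled at the root of the uber jar and Spark's Kafka source; the resource name, broker address, and password handling are illustrative:

```scala
import java.nio.file.{Files, StandardCopyOption}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()

// Copy the keystore packaged inside the uber jar out to a temp file,
// since ssl.keystore.location expects a filesystem path, not a
// classpath resource or stream.
val in = getClass.getResourceAsStream("/kafka.client.keystore.jks")
val tmp = Files.createTempFile("keystore", ".jks")
tmp.toFile.deleteOnExit()
Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)
in.close()

// Spark's Kafka source forwards options prefixed with "kafka." to the
// underlying Kafka consumer.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9093")
  .option("kafka.security.protocol", "SSL")
  .option("kafka.ssl.keystore.location", tmp.toString)
  .option("kafka.ssl.keystore.password", sys.env("KEYSTORE_PASSWORD"))
  .option("subscribe", "some-topic")
  .load()
```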