[pyspark] structured streaming deployment & monitoring recommendation

2018-02-12 Thread Bram
Hi, I have questions regarding deployment recommendations for Spark Structured Streaming. I have roughly 100 Kafka topics that can be processed with a similar code block. I am using PySpark 2.2.1. Here is the idea:

    TOPIC_LIST = ["topic_a", "topic_b", "topic_c"]
    stream = {}
    for t in TOPIC_LIST:
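A minimal sketch of how one streaming query per topic could be started and monitored from a single application is given here. It is not the poster's code: the broker address, the checkpoint root, the console sink, and the helper names `kafka_options`, `checkpoint_path`, `start_topic_streams`, and `report_status` are all assumptions made for illustration.

```python
# Sketch only: one Structured Streaming query per Kafka topic.
# Broker address, checkpoint root, and the console sink are assumptions.


def kafka_options(topic, bootstrap_servers="localhost:9092"):
    """Build the Kafka source options for a single topic."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def checkpoint_path(topic, root="/tmp/checkpoints"):
    """Give every query its own checkpoint directory."""
    return "{}/{}".format(root.rstrip("/"), topic)


def start_topic_streams(spark, topics,
                        bootstrap_servers="localhost:9092",
                        checkpoint_root="/tmp/checkpoints"):
    """Start one query per topic and return them keyed by topic name."""
    queries = {}
    for t in topics:
        df = (spark.readStream
              .format("kafka")
              .options(**kafka_options(t, bootstrap_servers))
              .load()
              .selectExpr("CAST(key AS STRING) AS key",
                          "CAST(value AS STRING) AS value"))
        queries[t] = (df.writeStream
                      .queryName(t)  # the name makes the query easy to find later
                      .format("console")  # placeholder sink for the sketch
                      .option("checkpointLocation",
                              checkpoint_path(t, checkpoint_root))
                      .start())
    return queries


def report_status(queries):
    """Collect the current status dict of every running query."""
    return {name: q.status for name, q in queries.items()}
```

With a `SparkSession` named `spark` and the Kafka source package on the classpath, this could be driven by `queries = start_topic_streams(spark, TOPIC_LIST)` followed by `spark.streams.awaitAnyTermination()`. Each query's `status` and `lastProgress` attributes can then be polled for monitoring. Whether roughly 100 concurrent queries fit in one application depends on cluster resources, so grouping several topics into one `subscribe` list may be worth testing as an alternative.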

[Structured Streaming] Deserializing avro messages from kafka source using schema registry

2018-02-09 Thread Bram
Hi, I couldn't find any documentation about Avro message deserialization with PySpark Structured Streaming. My aim is to use the Confluent Schema Registry to get the schema for each topic and then parse the Avro messages with it. I found one, but it used the DirectStream approach
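As a rough illustration of the parsing step, messages produced by the Confluent Avro serializer carry a 5-byte header (a zero magic byte followed by a 4-byte big-endian schema ID) ahead of the Avro payload. The sketch below splits that header and then decodes the payload. It assumes the `fastavro` package is installed, and `schema_for_id` is a hypothetical callable that returns the schema (as a dict) for a given ID, for example by querying the registry and caching the result. The function names are illustrative and do not come from the thread.

```python
import io
import struct

MAGIC_BYTE = 0  # first byte of the Confluent wire format


def split_confluent_message(message):
    """Return (schema_id, avro_payload) from a Confluent-framed message."""
    message = bytes(message)  # Spark may hand a UDF a bytearray
    if len(message) < 5:
        raise ValueError("message too short for Confluent framing")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte: {}".format(magic))
    return schema_id, message[5:]


def decode_confluent_avro(message, schema_for_id):
    """Decode one message; schema_for_id is a hypothetical lookup callable."""
    import fastavro  # assumption: fastavro is available on the executors

    schema_id, payload = split_confluent_message(message)
    schema = fastavro.parse_schema(schema_for_id(schema_id))
    return fastavro.schemaless_reader(io.BytesIO(payload), schema)
```

In a Structured Streaming job, a function like `decode_confluent_avro` could be wrapped in a `udf` and applied to the binary `value` column of the Kafka source. The return type of that UDF would have to be declared to match the topic's schema, which is left out of this sketch.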