Hi all, I have 10 million records in Kafka, and I am just running spark.sql("select count(*) from kafka_view"). I am reading from Kafka and writing the result back to Kafka.
My writeStream uses "update" output mode with a one-second trigger interval (Trigger.ProcessingTime(1000)). I expected the count to be printed every second, but it looks like it only prints after going through all 10M records. Why? Also, it seems to take forever, whereas a Linux wc on 10M rows takes about 30 seconds. Thanks!
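For reference, here is a minimal sketch of the setup described above. The topic names, bootstrap servers, checkpoint path, and startingOffsets setting are assumptions on my part, not taken from the actual job:

```scala
// Sketch of the streaming count job described above.
// Broker address, topic names, and checkpoint path are placeholders.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder.appName("kafka-count").getOrCreate()

// Read the source topic as a stream.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")  // assumed
  .option("subscribe", "input_topic")                   // assumed topic
  .option("startingOffsets", "earliest")                // assumed; picks up all existing records
  .load()

df.createOrReplaceTempView("kafka_view")

// The Kafka sink expects a string/binary "value" column,
// so cast the aggregate before writing it back out.
val counts = spark.sql(
  "select cast(count(*) as string) as value from kafka_view")

val query = counts.writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")  // assumed
  .option("topic", "output_topic")                      // assumed topic
  .option("checkpointLocation", "/tmp/count-checkpoint")
  .outputMode("update")
  .trigger(Trigger.ProcessingTime(1000))
  .start()

query.awaitTermination()
```

Note that the trigger interval only controls how often a new micro-batch is *started*; it does not cap how much data a batch reads, so the first batch can still pull in everything already sitting in the topic.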