Re: select count(*) doesn't seem to respect update mode in Kafka Structured Streaming?

2018-03-20 Thread kant kodali
Thanks Michael! That works! On Tue, Mar 20, 2018 at 5:00 PM, Michael Armbrust wrote: > Those options will not affect structured streaming. You are looking for > .option("maxOffsetsPerTrigger", "1000"). > We are working on improving this by building a generic

Re: select count(*) doesn't seem to respect update mode in Kafka Structured Streaming?

2018-03-20 Thread Michael Armbrust
Those options will not affect Structured Streaming. You are looking for .option("maxOffsetsPerTrigger", "1000"). We are working on improving this by building a generic mechanism into the Streaming DataSource V2 so that the engine can do admission control on the amount of data returned in a
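A minimal sketch of where that option goes on a Kafka source, assuming a local broker at localhost:9092 and a topic named "events" (both hypothetical):

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: broker address and topic name are assumptions.
val spark = SparkSession.builder().appName("rate-limited-read").getOrCreate()

// maxOffsetsPerTrigger caps the number of Kafka offsets consumed per
// micro-batch, so a large backlog is processed in bounded chunks
// instead of one enormous first batch.
val kafkaDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .option("startingOffsets", "earliest")
  .option("maxOffsetsPerTrigger", "1000")
  .load()
```

Note that this is a per-query source option, not a cluster-wide configuration setting.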

Re: select count(*) doesn't seem to respect update mode in Kafka Structured Streaming?

2018-03-20 Thread kant kodali
I am using Spark 2.3.0 and Kafka 0.10.2.0, so I assume Structured Streaming uses the Direct APIs, although I am not sure. If it is the Direct APIs, the only parameters that are relevant are below, according to this

Re: select count(*) doesn't seem to respect update mode in Kafka Structured Streaming?

2018-03-20 Thread Geoff Von Allmen
The following settings may be what you’re looking for:
- spark.streaming.backpressure.enabled
- spark.streaming.backpressure.initialRate
- spark.streaming.receiver.maxRate
-
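As noted elsewhere in the thread, these are configuration properties for the older DStream (spark-streaming) API rather than Structured Streaming query options. A sketch of where they would be set, with illustrative values:

```scala
import org.apache.spark.SparkConf

// DStream-era backpressure settings; they go on the SparkConf, not on a
// Structured Streaming query. The numeric values here are illustrative.
val conf = new SparkConf()
  .setAppName("dstream-backpressure-example")
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.backpressure.initialRate", "1000")
  .set("spark.streaming.receiver.maxRate", "10000")
```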

Re: select count(*) doesn't seem to respect update mode in Kafka Structured Streaming?

2018-03-19 Thread kant kodali
Yes, it indeed makes sense! Is there a way to get incremental counts when I start from 0 and go through 10M records? Perhaps a count for every micro-batch or something? On Mon, Mar 19, 2018 at 1:57 PM, Geoff Von Allmen wrote: > Trigger does not mean report the current
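One way to surface a running count per micro-batch, sketched under the assumption that kafkaDf is a streaming DataFrame already read from Kafka: a global streaming aggregation in "update" output mode emits the updated count on each trigger.

```scala
import org.apache.spark.sql.streaming.Trigger

// Sketch: kafkaDf is assumed to be a streaming DataFrame from Kafka.
// groupBy() with no columns yields a single global count; in "update"
// output mode the changed aggregate row is emitted every micro-batch,
// so the count grows incrementally as the backlog is consumed.
val query = kafkaDf
  .groupBy()
  .count()
  .writeStream
  .outputMode("update")
  .format("console")
  .trigger(Trigger.ProcessingTime(1000))
  .start()
```

For this to produce more than one batch over a large backlog, the source must be rate-limited (e.g. via maxOffsetsPerTrigger); otherwise the entire backlog may arrive in the first micro-batch.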

Re: select count(*) doesn't seem to respect update mode in Kafka Structured Streaming?

2018-03-19 Thread Geoff Von Allmen
Trigger does not mean report the current solution every 'trigger seconds'. It means it will attempt to fetch new data and process it no faster than the trigger interval. If you're reading from the beginning and you've got 10M entries in Kafka, it's likely pulling everything down, then

select count(*) doesn't seem to respect update mode in Kafka Structured Streaming?

2018-03-19 Thread kant kodali
Hi All, I have 10 million records in my Kafka and I am just trying to run spark.sql("select count(*) from kafka_view"). I am reading from Kafka and writing to Kafka. My writeStream is set to "update" mode with a trigger interval of one second (Trigger.ProcessingTime(1000)). I expect the counts to be
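A sketch of the setup described above, with console output standing in for the Kafka sink (writing back to Kafka would additionally require casting the count to a string "value" column); the broker address and topic name are assumptions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("kafka-count").getOrCreate()

// Register the Kafka stream as a temp view so it can be queried with SQL.
spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "input-topic")
  .option("startingOffsets", "earliest")
  .load()
  .createOrReplaceTempView("kafka_view")

// Streaming aggregation; "update" mode emits the changed count on each
// trigger rather than waiting for all input to be consumed.
val counts = spark.sql("select count(*) from kafka_view")

val query = counts.writeStream
  .outputMode("update")
  .format("console")
  .trigger(Trigger.ProcessingTime(1000))
  .start()

query.awaitTermination()
```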