Re: Structured Streaming Kafka - Weird behavior with performance and logs

2019-05-14 Thread Suket Arora
// Continuous trigger with one-second checkpointing interval
df.writeStream
  .format("console")
  .trigger(Trigger.Continuous("1 second"))
  .start()

On Tue, 14 May 2019 at 22:10, suket arora wrote:
> Hey Austin,
>
> If you truly want to process as a stream, use continuous streaming in >

Re: Structured Streaming Kafka - Weird behavior with performance and logs

2019-05-13 Thread Gabor Somogyi
> Where exactly would I see the start/end offset values per batch, is that
> in the spark logs?

Yes, it's in the Spark logs. Please see this:
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#reading-metrics-interactively

On Mon, May 13, 2019 at 10:53 AM Austin
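As a sketch of what the linked section of the programming guide describes: beyond the logs, the per-batch Kafka start/end offsets are also available programmatically from the query's progress reports. This assumes a `SparkSession` and an already-started StreamingQuery bound to a `query` value (names here are illustrative, not from the thread):

```scala
// Assumes `query` is a running org.apache.spark.sql.streaming.StreamingQuery.
// Each progress report carries per-source startOffset/endOffset information.

// Most recent completed micro-batch, as JSON (includes startOffset/endOffset):
println(query.lastProgress.json)

// Or walk the last few batches and print the offset ranges per source:
query.recentProgress.foreach { p =>
  p.sources.foreach { s =>
    println(s"${s.description}: ${s.startOffset} -> ${s.endOffset}")
  }
}
```

`lastProgress` and `recentProgress` are the interactive counterparts of the log lines; the same offset ranges can also be pushed to a sink via a `StreamingQueryListener`.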

Re: Structured Streaming Kafka - Weird behavior with performance and logs

2019-05-13 Thread Austin Weaver
Hi Akshay, Thanks very much for the reply! 1) The topics have 12 partitions (both input and output). 2-3) I read that "trigger" is used for micro-batching, but if you would like the stream to truly process as a "stream" as quickly as possible, should you leave this out? In any case, I am using

Re: Structured Streaming Kafka - Weird behavior with performance and logs

2019-05-08 Thread Akshay Bhardwaj
Hi Austin, A few questions:
1. What is the partition count of the Kafka topics used for the input and output data?
2. In the write stream, I recommend using "trigger" with a defined interval if you prefer a micro-batching strategy,
3. along with defining "maxOffsetsPerTrigger" in
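A minimal sketch of what points 2-3 suggest, using standard Structured Streaming Kafka options (broker, topic, and checkpoint names below are placeholders, not from the thread):

```scala
import org.apache.spark.sql.streaming.Trigger

// "maxOffsetsPerTrigger" caps how many Kafka records each micro-batch reads,
// which smooths throughput instead of pulling the whole backlog in one batch.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")   // placeholder
  .option("subscribe", "input-topic")                 // placeholder
  .option("maxOffsetsPerTrigger", "10000")            // rate limit per micro-batch
  .load()

// A fixed ProcessingTime trigger makes the micro-batch cadence explicit.
df.writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")   // placeholder
  .option("topic", "output-topic")                    // placeholder
  .option("checkpointLocation", "/tmp/checkpoint")    // placeholder
  .trigger(Trigger.ProcessingTime("10 seconds"))
  .start()
```

Without an explicit trigger, Spark still micro-batches, just as fast as the previous batch completes; the trigger only fixes the interval, it does not switch the execution model.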

Structured Streaming Kafka - Weird behavior with performance and logs

2019-05-07 Thread Austin Weaver
Hey Spark Experts, After listening to some of you, and the presentations at Spark Summit in SF, I am transitioning from DStreams to Structured Streaming; however, I am seeing some weird results. My use case is as follows: I am reading in a stream from a Kafka topic, transforming a message, and