Re: [structured-streaming] How to reset Kafka offset in readStream and read from beginning

2018-05-22 Thread Bowden, Chris
You can delete the write-ahead log directory you provided to the sink via the "checkpointLocation" option.
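A minimal sketch of the suggestion above, assuming a Spark session `spark`; the checkpoint path, broker address, topic, and output path are all hypothetical. Deleting the checkpoint directory discards the offset write-ahead log, so the next run falls back to the `startingOffsets` option:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical checkpoint location; deleting it resets the stored offsets.
val checkpoint = new Path("/tmp/my-checkpoint")
FileSystem.get(spark.sparkContext.hadoopConfiguration).delete(checkpoint, true)

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // hypothetical brokers
  .option("subscribe", "events")                    // hypothetical topic
  .option("startingOffsets", "earliest")            // only consulted when no checkpoint exists
  .load()

df.writeStream
  .format("parquet")
  .option("path", "/tmp/out")                       // hypothetical output path
  .option("checkpointLocation", checkpoint.toString)
  .start()
```

Note that once a checkpoint exists, the recorded offsets take precedence over `startingOffsets`; that is why resetting requires removing the checkpoint directory rather than just changing the option.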

Re: [Structured Streaming] [Kafka] How to repartition the data and distribute the processing among worker nodes

2018-04-20 Thread Bowden, Chris
The primary role of a sink is storing output tuples. Consider groupByKey and map/flatMapGroupsWithState instead. -Chris
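A sketch of the suggested approach, assuming a case class `Event(key, value)` and a streaming `Dataset[Event]` named `events` (all names are illustrative). `groupByKey` redistributes the work by key across the cluster, and `mapGroupsWithState` keeps per-key state between micro-batches, so the processing happens on the workers rather than in the sink:

```scala
import org.apache.spark.sql.streaming.GroupState
import spark.implicits._

case class Event(key: String, value: Long)
case class RunningTotal(key: String, total: Long)

val totals = events
  .groupByKey(_.key)  // shuffles rows so each key's group is processed on one worker
  .mapGroupsWithState[Long, RunningTotal] {
    (key: String, rows: Iterator[Event], state: GroupState[Long]) =>
      // Accumulate this micro-batch's values into the persisted per-key state.
      val sum = state.getOption.getOrElse(0L) + rows.map(_.value).sum
      state.update(sum)
      RunningTotal(key, sum)
  }
```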

Re: Writing record once after the watermarking interval in Spark Structured Streaming

2018-03-29 Thread Bowden, Chris
The watermark is just a user-provided indicator to Spark that it is OK to drop internal state after some period of time. The watermark "interval" doesn't directly dictate whether hot rows are sent to a sink; think of a hot row as data younger than the watermark. However, the watermark will …
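A short sketch of the interaction described above, assuming a streaming DataFrame `df` with columns `eventTime` and `word` (hypothetical names). The watermark bounds how long aggregation state is retained; in "append" output mode, a window's result is emitted once the watermark passes the end of the window:

```scala
import org.apache.spark.sql.functions.window
import spark.implicits._

val counts = df
  .withWatermark("eventTime", "10 minutes")                // state older than this may be dropped
  .groupBy(window($"eventTime", "5 minutes"), $"word")
  .count()

counts.writeStream
  .outputMode("append")   // each window is emitted once, after the watermark closes it
  .format("console")
  .start()
```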

Re: Structured Streaming Spark 2.3 Query

2018-03-23 Thread Bowden, Chris
Use a streaming query listener that tracks repetitive progress events for the same batch id. If X amount of time has elapsed given repetitive progress events for the same batch id, the source is not providing new offsets and stream execution is not scheduling new micro-batches. See also: …
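A sketch of such a listener; the class name, threshold parameter, and the `println` reporting are assumptions, not part of the original reply. If the same `batchId` keeps reappearing in progress events beyond the threshold, the source is providing no new offsets:

```scala
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

class IdleSourceListener(idleThresholdMs: Long) extends StreamingQueryListener {
  @volatile private var lastBatchId = -1L
  @volatile private var firstSeenAt = 0L

  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    val batchId = event.progress.batchId
    val now = System.currentTimeMillis()
    if (batchId != lastBatchId) {
      // New batch id: the source produced new offsets, reset the timer.
      lastBatchId = batchId
      firstSeenAt = now
    } else if (now - firstSeenAt > idleThresholdMs) {
      // Same batch id repeated past the threshold: source appears idle.
      println(s"Batch $batchId repeated for ${now - firstSeenAt} ms; no new offsets from source")
    }
  }
}

// Hypothetical registration: spark.streams.addListener(new IdleSourceListener(60000L))
```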

Re: Structured Streaming - Kafka

2017-03-07 Thread Bowden, Chris
https://issues.apache.org/jira/browse/SPARK-19853; PR by end of week. (In reply to Shixiong (Ryan) Zhu: "Good catch. Cou…")

Structured Streaming - Kafka

2017-03-07 Thread Bowden, Chris
Potential bug when using startingOffsets = SpecificOffsets with Kafka topics containing uppercase characters? In KafkaSourceProvider (L80-86), the option value is lowercased before matching: `val startingOffsets = caseInsensitiveParams.get(STARTING_OFFSETS_OPTION_KEY).map(_.trim.toLowerCase) match { case Some("latest") => LatestOffsets …`
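A minimal illustration of the suspected bug (the topic name and offsets are hypothetical). Lowercasing the whole option value lets `"latest"`/`"earliest"` match case-insensitively, but it also mangles topic names embedded in a SpecificOffsets JSON payload:

```scala
// Hypothetical startingOffsets value naming an uppercase topic.
val specified = """{"MyTopic":{"0":100}}"""

// The same transformation KafkaSourceProvider applies before matching.
val mangled = specified.trim.toLowerCase
// mangled is now {"mytopic":{"0":100}} -- "MyTopic" no longer matches the
// topic registered in Kafka, so the specified offsets are silently unusable.
```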

Catalyst Expression(s) - Cleanup

2017-01-25 Thread Bowden, Chris
Is there currently any way to receive a signal when an Expression will no longer receive any rows, so that internal resources can be cleaned up? I have seen that Generators are allowed to terminate(), but my Expression(s) do not need to emit 0..N rows.

FunctionRegistry

2017-01-20 Thread Bowden, Chris
Thoughts on exposing FunctionRegistry via ExperimentalMethods? I have functionality which cannot be expressed efficiently via UDFs, so I implement my own Expressions. Currently I have to lift access to FunctionRegistry in my project(s) within org.apache.spark.sql.*. I also have to …
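A sketch of the workaround the message alludes to, under the assumption that this targets Spark 2.x internals; `MyFunctions` and `register` are hypothetical names, and because `sessionState` and `FunctionRegistry` are `private[sql]`, the file must live inside the `org.apache.spark.sql` package to compile:

```scala
package org.apache.spark.sql

import org.apache.spark.sql.catalyst.expressions.Expression

object MyFunctions {
  // Registers a builder for a custom Expression under the given SQL name.
  // The builder maps the call-site argument expressions to our Expression.
  def register(
      spark: SparkSession,
      name: String,
      builder: Seq[Expression] => Expression): Unit =
    spark.sessionState.functionRegistry.registerFunction(name, builder)
}
```

Exposing the registry through ExperimentalMethods would remove the need for this package-placement trick.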