You can delete the write-ahead log directory you provided to the sink via the
"checkpointLocation" option.
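To make the "reset" concrete, here is a minimal sketch in Python (not Spark API). The directory layout and file contents are hypothetical stand-ins for a real checkpoint; the point is simply that removing the directory discards the stored offsets, so the next run falls back to the "startingOffsets" option.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical stand-in for the directory you passed to the sink via
# .option("checkpointLocation", ...). The file names below are illustrative.
checkpoint = Path(tempfile.mkdtemp(prefix="checkpoint-demo-"))
(checkpoint / "offsets").mkdir()
(checkpoint / "offsets" / "0").write_text('{"batchWatermarkMs":0}')

# Deleting the directory discards the stored Kafka offsets.
shutil.rmtree(checkpoint)
assert not checkpoint.exists()
```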
From: karthikjay
Sent: Tuesday, May 22, 2018 7:24:45 AM
To: user@spark.apache.org
Subject: [structured-streaming]How to reset Kafka
The primary role of a sink is storing output tuples. Consider groupByKey and
map/flatMapGroupsWithState instead.
-Chris
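The shape of the per-key update function you would hand to mapGroupsWithState on a groupByKey'd stream can be sketched like this (plain Python for illustration; Spark's GroupState handle is modeled as an optional value, and all names are hypothetical):

```python
from typing import Iterable, Optional

def update_click_count(user: str,
                       new_clicks: Iterable[int],
                       old_total: Optional[int]) -> int:
    """Fold this micro-batch's rows for `user` into the running total."""
    return (old_total or 0) + sum(new_clicks)

# One key, one micro-batch: prior state 1, new rows [2, 3] -> 6.
total = update_click_count("alice", [2, 3], 1)
assert total == 6
```

The stateful operator, rather than the sink, is what carries running aggregates across batches; the sink only stores the tuples the operator emits.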
From: karthikjay
Sent: Friday, April 20, 2018 4:49:49 PM
To: user@spark.apache.org
Subject: [Structured Streaming]
The watermark is just a user-provided indicator to Spark that it is OK to drop
internal state after some period of time. The watermark "interval" doesn't
directly dictate whether hot rows are sent to a sink. Think of a hot row as
data younger than the watermark. However, the watermark will
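The "younger than the watermark" rule can be illustrated with plain Python (dates and the 10-minute delay are made up; in Spark the watermark is the maximum observed event time minus the user-supplied delay):

```python
from datetime import datetime, timedelta

delay = timedelta(minutes=10)
max_event_time = datetime(2018, 4, 20, 10, 0, 0)
watermark = max_event_time - delay  # 09:50

def is_late(event_time: datetime) -> bool:
    # "Hot" rows are younger than the watermark; older rows are late
    # and their state is eligible to be dropped.
    return event_time < watermark

assert is_late(datetime(2018, 4, 20, 9, 40))       # older than 09:50
assert not is_late(datetime(2018, 4, 20, 9, 55))   # still "hot"
```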
Use a streaming query listener that tracks repeated progress events for the
same batch id. If a given amount of time has elapsed while progress events
keep reporting the same batch id, the source is not providing new offsets and
stream execution is not scheduling new micro-batches. See also:
https://issues.apache.org/jira/browse/SPARK-19853, pr by eow
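The heuristic above can be sketched as a small state machine (plain Python; class and method names are illustrative, not Spark API — a real implementation would feed it from StreamingQueryListener.onQueryProgress):

```python
class StallDetector:
    """Flags a stream as stalled when the same batch id repeats too long."""

    def __init__(self, timeout_ms: int) -> None:
        self.timeout_ms = timeout_ms
        self.last_batch_id = -1
        self.first_seen_ms = 0

    def on_progress(self, batch_id: int, now_ms: int) -> bool:
        """Return True when batch_id has repeated for >= timeout_ms."""
        if batch_id != self.last_batch_id:
            self.last_batch_id = batch_id
            self.first_seen_ms = now_ms
            return False
        return now_ms - self.first_seen_ms >= self.timeout_ms

d = StallDetector(timeout_ms=1000)
assert not d.on_progress(5, 0)      # first report of batch 5
assert not d.on_progress(5, 500)    # repeating, but under the timeout
assert d.on_progress(5, 1500)       # stalled: batch 5 for >= 1s
assert not d.on_progress(6, 1600)   # new batch id: progress resumed
```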
From: Shixiong(Ryan) Zhu <shixi...@databricks.com>
Sent: Tuesday, March 7, 2017 2:04:45 PM
To: Bowden, Chris
Cc: user; Gudenkauf, Jack
Subject: Re: Structured Streaming - Kafka
Good catch. Cou
Potential bug when using startingOffsets = SpecificOffsets with Kafka topics
containing uppercase characters?
KafkaSourceProvider#L80/86:

    val startingOffsets =
      caseInsensitiveParams.get(STARTING_OFFSETS_OPTION_KEY).map(_.trim.toLowerCase) match {
        case Some("latest") => LatestOffsets
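The problem is that lowercasing the whole option value (to match "latest"/"earliest") also lowercases topic names inside a SpecificOffsets JSON payload. A quick demonstration in Python, with an assumed JSON shape for the offsets:

```python
# The option value as a user might pass it, with an uppercase topic name.
raw = '{"MyTopic":{"0":23}}'

# What the code above effectively does before pattern matching:
lowered = raw.strip().lower()

assert lowered == '{"mytopic":{"0":23}}'
# The key "mytopic" no longer matches the real Kafka topic "MyTopic",
# so the requested starting offsets are silently lost.
```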
Is there currently any way to receive a signal when an Expression will no
longer receive any rows so internal resources can be cleaned up?
I have seen that Generators are allowed to terminate(), but my Expression(s) do
not need to emit 0..N rows.
Thoughts on exposing FunctionRegistry via ExperimentalMethods?
I have functionality which cannot be expressed efficiently via UDFs, so I
implement my own Expressions. Currently I have to lift access to
FunctionRegistry in my project(s) within org.apache.spark.sql.*. I also have to