I read through the Spark Structured Streaming documentation, and I wonder:
how does Spark Structured Streaming determine that an event has arrived late?
Does it compare the event time with the processing time?

[image: enter image description here] <https://i.stack.imgur.com/CXH4i.png>

Taking the above picture as an example, does the bold right-arrow line "Time"
represent processing time? If so:

1) Where does this processing time come from? Since it is streaming, is Spark
assuming that the upstream source puts a processing timestamp in each
message, or does Spark add a processing timestamp field itself? For example,
when reading messages from Kafka we do something like

Dataset<Row> kafkadf = spark.readStream().format("kafka").load();

This DataFrame has a timestamp column by default, which I am assuming is the
processing time. Is that correct? If so, does Kafka or Spark add this timestamp?

2) I can see there is a time comparison between the bold right-arrow line and
the time in the message. Is that how Spark determines that an event is late?
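
To make question 2 concrete, here is the kind of comparison I am imagining.
It is a plain-Java sketch, not Spark's actual logic: the class name, the
isLate helper, the "reference" clock, and the 10-minute allowance are all
assumptions of mine for illustration. Whether the reference clock is
processing time or something else is exactly what I am asking.

```java
import java.time.Duration;
import java.time.Instant;

public class LatenessSketch {
    // Hypothetical helper (not a Spark API): an event counts as "late" when
    // its event time is more than `allowed` behind the reference clock.
    public static boolean isLate(Instant eventTime, Instant reference, Duration allowed) {
        return eventTime.isBefore(reference.minus(allowed));
    }

    public static void main(String[] args) {
        // Assumed values, chosen only to show both outcomes.
        Instant reference = Instant.parse("2018-01-01T12:15:00Z");
        Duration allowed = Duration.ofMinutes(10);

        // Event stamped 12:04 is more than 10 minutes behind 12:15.
        System.out.println(isLate(Instant.parse("2018-01-01T12:04:00Z"), reference, allowed));
        // Event stamped 12:07 is within the 10-minute allowance.
        System.out.println(isLate(Instant.parse("2018-01-01T12:07:00Z"), reference, allowed));
    }
}
```

With these assumed values the first event would be treated as late and the
second would not.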
