I read through the Spark Structured Streaming documentation, and I wonder how Spark Structured Streaming determines that an event has arrived late. Does it compare the event time with the processing time?
[image] <https://i.stack.imgur.com/CXH4i.png>

Taking the picture above as an example, does the bold right-pointing arrow labeled "Time" represent processing time? If so:

1) Where does this processing time come from? Since this is streaming, does Spark assume that the upstream source already carries a processing timestamp, or does Spark add a processing-timestamp field itself? For example, when reading messages from Kafka we do something like

Dataset<Row> kafkadf = spark.readStream().format("kafka").load();

This DataFrame has a timestamp column by default, which I assume is the processing time. Is that correct? If so, is it Kafka or Spark that adds this timestamp?

2) I can see a time comparison between the bold arrow line and the time in the message. Is that how Spark determines that an event is late?
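To make the two possibilities concrete, here is a minimal plain-Java sketch (no Spark dependency) of the two lateness checks I am trying to tell apart. Hypothesis A compares event time against wall-clock processing time. Hypothesis B compares event time against a watermark, which the guide's watermarking section describes as the maximum event time seen so far minus a threshold. The class name, method names, timestamps, and the 10-minute threshold are all hypothetical and are not Spark API; whether B matches Spark's internals exactly is an assumption.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch, not Spark code: two candidate definitions of "late".
class LatenessSketch {

    // Hypothesis A: an event is late if it is processed more than
    // `threshold` after its own event time (needs a processing-time clock).
    static boolean isLateByProcessingTime(Instant eventTime,
                                          Instant processingTime,
                                          Duration threshold) {
        return Duration.between(eventTime, processingTime).compareTo(threshold) > 0;
    }

    // Hypothesis B: the watermark trails the largest event time seen so far
    // by `threshold`; no processing-time clock is involved.
    static Instant watermark(Instant maxEventTimeSeen, Duration threshold) {
        return maxEventTimeSeen.minus(threshold);
    }

    // Under hypothesis B an event is late if its event time is older
    // than the current watermark.
    static boolean isLateByWatermark(Instant eventTime,
                                     Instant maxEventTimeSeen,
                                     Duration threshold) {
        return eventTime.isBefore(watermark(maxEventTimeSeen, threshold));
    }

    public static void main(String[] args) {
        Duration threshold = Duration.ofMinutes(10);                     // assumed threshold
        Instant maxEventTimeSeen = Instant.parse("2024-01-01T12:21:00Z"); // assumed data
        Instant processingTime = Instant.parse("2024-01-01T13:00:00Z");   // assumed wall clock
        Instant event = Instant.parse("2024-01-01T12:17:00Z");            // assumed event time

        System.out.println("late by processing time: "
                + isLateByProcessingTime(event, processingTime, threshold));
        System.out.println("late by watermark:       "
                + isLateByWatermark(event, maxEventTimeSeen, threshold));
    }
}
```

With these made-up values the two checks disagree: the event is 43 minutes behind the wall clock but only 4 minutes behind the newest event time, so which of the two comparisons Spark actually performs is exactly what I am asking.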