Hi Kant,

> 1. Can we use Spark Structured Streaming for stateless transformations
> just like we would do with DStreams, or is Spark Structured Streaming only
> meant for stateful computations?
Of course you can do stateless transformations. Any map, filter, or select type of transformation is stateless. Aggregations are generally stateful. You can also perform arbitrary stateless aggregations with "flatMapGroups <https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala#L145>" or make them stateful with "flatMapGroupsWithState <https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala#L376>".

> 2. When we use groupBy and window operations for event-time processing and
> specify a watermark, does this mean the timestamp field in each message is
> compared to the processing time of that machine/node, and events that are
> later than the specified threshold are discarded? If we don't specify a
> watermark, I am assuming the processing time won't come into the picture.
> Is that right? Just trying to understand the interplay between processing
> time and event time when we do event-time processing.

Watermarks are tracked with respect to the event time of your data, not the processing time of the machine. Please take a look at the blog below for more details:
https://databricks.com/blog/2017/05/08/event-time-aggregation-watermarking-apache-sparks-structured-streaming.html

Best,
Burak
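To make both points concrete, here is a rough sketch (the socket source, host/port, and column names `word`/`timestamp` are illustrative assumptions, not from the thread — it assumes a running Spark session):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("watermark-sketch").getOrCreate()
import spark.implicits._

// Stateless: select/filter carry no state across micro-batches.
val words = spark.readStream
  .format("socket").option("host", "localhost").option("port", 9999).load()
  .select($"value".cast("string").as("word"),
          current_timestamp().as("timestamp"))  // stand-in event-time column
  .filter($"word" =!= "")                       // stateless filter

// Stateful: windowed aggregation over event time, bounded by a watermark.
// The watermark trails the max *event time* seen so far by 10 minutes;
// the machine's wall clock is not involved.
val counts = words
  .withWatermark("timestamp", "10 minutes")
  .groupBy(window($"timestamp", "5 minutes"), $"word")
  .count()
```

Without the `withWatermark` call, the aggregation still uses event time, but its state is never dropped, since the engine has no bound on how late data may arrive.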