[Spark SQL] Is it possible to do stream to stream inner join without event time?

Becket Qin Fri, 01 Jun 2018 03:11:10 -0700

Hi,

I am new to Spark and I'm trying to run a few queries from TPC-H using
Spark SQL.


According to the documentation here
<https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#inner-joins-with-optional-watermarking>,
defining a watermark is OPTIONAL for an inner join between two streams.
However, I keep getting the following exception:

org.apache.spark.sql.AnalysisException: Append output mode not supported
when there are streaming aggregations on streaming DataFrames/DataSets
without watermark

So it looks like a watermark is mandatory. Because there is no timestamp
in the TPC-H records, I am not able to specify a watermark with event time.
Is there a recommended workaround, e.g. using processing time instead of
event time?

Thanks,

Jiangjie (Becket) Qin
