Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/20598#discussion_r168110951

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala ---
@@ -62,7 +64,7 @@ case class StreamingRelation(dataSource: DataSource, sourceName: String, output:
 case class StreamingExecutionRelation(
--- End diff --

They need to extend MultiInstanceRelation because Dataset.join() forces an analysis to disambiguate the left and right sides in self-joins ([here](https://github.com/apache/spark/blob/357babde5a8eb9710de7016d7ae82dee21fa4ef3/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L914)). When there is a self-join between two streaming Datasets (i.e. they contain StreamingRelation/StreamingRelationV2) and those relations do not extend MultiInstanceRelation, that analysis throws the error (see PR description).

Regarding StreamingExecutionRelation: while the other sources convert StreamingRelation to StreamingExecutionRelation, MemoryStream directly injects StreamingExecutionRelation at the time of the Dataset operations. Hence it's good that StreamingExecutionRelation also extends MultiInstanceRelation.
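
To illustrate the reasoning, here is a minimal toy sketch of why a relation needs a `newInstance()`-style hook for self-joins. It does not use Spark's classes: `Attr`, `Relation`, `MultiInstance`, `PlainStream`, `DedupableStream`, `freshAttr`, and `resolveJoin` are all hypothetical names invented for this example, and the resolution logic is a heavy simplification of what the analyzer does.

```scala
import java.util.concurrent.atomic.AtomicLong

// Toy stand-ins, NOT Spark's real classes: every name here is hypothetical.
object SelfJoinSketch {
  private val nextId = new AtomicLong(0)

  // An output attribute identified by a unique id (playing the role of an expression id).
  final case class Attr(name: String, exprId: Long) {
    def withFreshId(): Attr = copy(exprId = nextId.getAndIncrement())
  }

  def freshAttr(name: String): Attr = Attr(name, nextId.getAndIncrement())

  sealed trait Relation { def output: Seq[Attr] }

  // Plays the role of MultiInstanceRelation: can produce a copy with fresh attribute ids.
  trait MultiInstance extends Relation { def newInstance(): Relation }

  // A streaming leaf that cannot be re-instantiated.
  final case class PlainStream(output: Seq[Attr]) extends Relation

  // A streaming leaf that can be re-instantiated with fresh ids.
  final case class DedupableStream(output: Seq[Attr]) extends MultiInstance {
    def newInstance(): Relation = DedupableStream(output.map(_.withFreshId()))
  }

  // Simplified self-join resolution: if both sides expose the same attribute ids,
  // the right side must be re-instantiated; if it cannot be, the join is ambiguous.
  def resolveJoin(left: Relation, right: Relation): Either[String, (Relation, Relation)] = {
    val leftIds = left.output.map(_.exprId).toSet
    val conflicting = leftIds.intersect(right.output.map(_.exprId).toSet)
    if (conflicting.isEmpty) Right((left, right))
    else right match {
      case m: MultiInstance => Right((left, m.newInstance()))
      case _ => Left(s"conflicting attribute ids: ${conflicting.mkString(",")}")
    }
  }

  def main(args: Array[String]): Unit = {
    val plain = PlainStream(Seq(freshAttr("value")))
    val dedupable = DedupableStream(Seq(freshAttr("value")))
    println(resolveJoin(plain, plain))         // a Left: the two sides cannot be told apart
    println(resolveJoin(dedupable, dedupable)) // a Right: the right side received fresh ids
  }
}
```

In this sketch the self-join of `PlainStream` fails because both sides carry identical attribute ids, while `DedupableStream` succeeds because the right side can be copied with new ids. That is the property the comment argues both streaming relations need.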