7 at 9:50 AM
To: "Shixiong(Ryan) Zhu" <shixi...@databricks.com>
Cc: Rick Moritz <rah...@gmail.com>, user <user@spark.apache.org>
Subject: Re: [StructuredStreaming] multiple queries of the socket source: only
one query works.
Hi Shixiong,
Thanks for the explanation.
In my view, this is different from the intuitive understanding of the
Structured Streaming model [1], where incoming data is appended to an
'unbounded table' and queries are run on that. I had expected that all
queries would run on that 'unbounded table'.
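That intuitive model can be sketched in a few lines of plain Python (a toy illustration, not Spark code): an ever-growing list plays the unbounded table, and each "query" is just a computation over it, so one would expect every query to see every appended row.

```python
# Toy version of the Structured Streaming model: incoming data is
# appended to an unbounded table, and every query runs over that table.
unbounded_table = []

def append_batch(rows):
    unbounded_table.extend(rows)

# Two independent "queries" over the same unbounded table.
def query_evens():
    return [r for r in unbounded_table if r % 2 == 0]

def query_total():
    return sum(unbounded_table)

append_batch([1, 2, 3])
append_batch([4, 5])

evens = query_evens()   # [2, 4] -- both queries see all appended rows
total = query_total()   # 15
```

In this mental model the table is shared, so starting a second query should never starve the first; the socket source breaks that expectation because the data never reaches a shared table in the first place.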
Spark creates one connection for each query. The behavior you observed is
because of how "nc -lk" works. If you use `netstat` to check the TCP
connections, you will see that there are two connections when you start two
queries. However, "nc" forwards the input to only one connection.
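The effect can be reproduced without Spark or nc at all. Below is a small Python sketch (hypothetical helper names) of a server that behaves like `nc -lk`: it accepts every incoming connection but writes its input to only one of them, so the second "query" reads nothing.

```python
import socket
import threading

def nc_like_server(listener, payload):
    # Like `nc -lk`: accept every incoming connection, but forward the
    # piped-in data to only ONE of them.
    conns = [listener.accept()[0] for _ in range(2)]
    conns[0].sendall(payload)        # the second connection never sees the data
    for c in conns:
        c.close()
    listener.close()

def read_all(sock):
    # Read until the peer closes the connection.
    chunks = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)

listener = socket.socket()
listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
listener.listen(2)
port = listener.getsockname()[1]

server = threading.Thread(target=nc_like_server, args=(listener, b"hello\n"))
server.start()

query1 = socket.create_connection(("127.0.0.1", port))  # first "query"
query2 = socket.create_connection(("127.0.0.1", port))  # second "query"

data1 = read_all(query1)   # b"hello\n"
data2 = read_all(query2)   # b"" -- nothing, like the second streaming query
server.join()
query1.close()
query2.close()
```

Since each Spark query opens its own TCP connection to the source, whichever connection nc happens to serve is the only query that ever receives data.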
On Fri, Aug 11, 2017
Hi Gerard, hi List,
I think what this would entail is for Source.commit to change its
functionality. You would need to track all streams' offsets there.
Especially in the socket source, you already have a cache (I haven't looked
at Kafka's implementation too closely yet), so that shouldn't be the
Hi,
I've been investigating this SO question:
https://stackoverflow.com/questions/45618489/executing-separate-streaming-queries-in-spark-structured-streaming
TL;DR: when using the socket source, creating multiple queries does not
work properly; only the first query in the start order receives any data.