Hi all
it took me some time to get the issues extracted into a piece of standalone
code. I created the following gist
https://gist.github.com/jammann/b58bfbe0f4374b89ecea63c1e32c8f17
It has messages for 4 topics A/B/C/D and a simple Python program which shows 6
use cases, together with my expectations
a same
> case, but should be good to know once you're dealing with
> streaming-streaming join.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> 1. https://issues.apache.org/jira/browse/SPARK-26154
>
> On Tue, Jun 4, 2019 at 9:31 PM Joe Ammann wrote:
>
> > Hi all
>
Hi all
sorry, tl;dr
I'm on my first Python Spark structured streaming app, in the end joining
messages from ~10 different Kafka topics. I've recently upgraded to Spark
2.4.3, which has resolved all my issues with the time handling (watermarks,
join windows) I had before with Spark 2.3.2.
My
Hi all
I'm currently developing a Spark structured streaming application which
joins/aggregates messages from ~7 Kafka topics and produces messages onto
another Kafka topic.
Quite often in my development cycle, I want to "reprocess from scratch": I stop
the program, delete the target topic
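A minimal sketch of that "reprocess from scratch" reset, with hypothetical topic names and checkpoint path (not the author's actual ones). The key point is that the checkpoint directory must be cleared along with the target topic, because Spark resumes from committed offsets in the checkpoint and only honors `startingOffsets` on a fresh start:

```python
import shutil

from pyspark.sql import SparkSession

CHECKPOINT = "/tmp/checkpoints/my-app"  # hypothetical path

# Forget the committed Kafka offsets; without this, startingOffsets below
# is ignored and the query resumes where it left off.
shutil.rmtree(CHECKPOINT, ignore_errors=True)

spark = SparkSession.builder.appName("reprocess-sketch").getOrCreate()

source = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")  # assumed
          .option("subscribe", "input_topic")                   # assumed
          # Re-read the topic from the beginning on a fresh start.
          .option("startingOffsets", "earliest")
          .load())

query = (source.writeStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("topic", "output_topic")                       # assumed
         .option("checkpointLocation", CHECKPOINT)
         .start())
```

This needs a running Kafka broker, so it is a sketch of the flow rather than something to paste verbatim.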
"sharedId", window("20 seconds", "10 seconds")
>
> // ProcessingTime trigger with two-second micro-batch interval
>
> df.writeStream
>   .format("console")
>   .trigger(Trigger.ProcessingTime("2 seconds"))
>   .start()
>
>
ress, if
no new data is coming in. This is what led me to my suspicion about how it works
internally.
I understand that it is quite uncommon to have such "slowly moving topics", but
unfortunately in my use case I have them.
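The watermark behavior described here can be simulated outside Spark. The sketch below is not Spark code, just a minimal model of the documented rule: the watermark is derived from the maximum event time seen so far minus the allowed delay, so on a "slowly moving topic" with no new input it cannot advance:

```python
from datetime import datetime, timedelta

class WatermarkTracker:
    """Toy model of Spark's event-time watermark rule."""

    def __init__(self, delay: timedelta):
        self.delay = delay
        self.max_event_time = None

    def observe(self, event_time: datetime):
        # The watermark only moves when a *newer* event time arrives.
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time

    @property
    def watermark(self):
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.delay

tracker = WatermarkTracker(timedelta(seconds=10))
tracker.observe(datetime(2019, 5, 14, 12, 0, 0))
print(tracker.watermark)  # 2019-05-14 11:59:50

# An empty micro-batch (no observe() calls) leaves the watermark unchanged,
# which is why stateful results on a quiet topic appear to stall.
before = tracker.watermark
assert tracker.watermark == before
```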
> On Tue, May 14, 2019 at 3:49 PM Joe Ammann mailto:j...@
Hi Suket
Sorry, this was a typo in the pseudo-code I sent. Of course, what you suggested
(using the same event-time attribute for the watermark and the window) is what
my code does in reality. Sorry to have confused people.
On 5/14/19 4:14 PM, suket arora wrote:
> Hi Joe,
> As per the spark
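For reference, a minimal PySpark sketch of "the same event-time attribute for the watermark and the window". A rate source stands in for the real Kafka topic, and the column names ("eventtime", "sharedId") are taken from the fragments in this thread, not from the author's actual code:

```python
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .master("local[2]")
         .appName("watermark-window-sketch")
         .getOrCreate())

# Rate source as a stand-in for the Kafka input; derive an assumed key.
events = (spark.readStream.format("rate").option("rowsPerSecond", 5).load()
          .withColumnRenamed("timestamp", "eventtime")
          .withColumn("sharedId", F.col("value") % 3))

# Watermark and window are defined on the SAME column, "eventtime".
counts = (events
          .withWatermark("eventtime", "10 seconds")
          .groupBy(F.window("eventtime", "20 seconds", "10 seconds"),
                   "sharedId")
          .count())

query = (counts.writeStream
         .outputMode("append")
         .format("console")
         .start())
```

Using a different column for the watermark than for the window is the mismatch the reply warns about; Spark cannot then decide when a window is closed.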
Hi all
I'm fairly new to Spark structured streaming and only starting to develop an
understanding of the watermark handling.
Our application reads data from a Kafka input topic and as one of the first
steps, it has to group incoming messages. Those messages come in bulks, e.g. 5
messages
Hi Yuanjian
On 5/7/19 4:55 AM, Yuanjian Li wrote:
> Hi Joe
>
> I think you met this issue: https://issues.apache.org/jira/browse/SPARK-27340
> You can check the description in the Jira and the PR. We also hit this in our
> production env and fixed it with the provided PR.
>
> The PR is still in review. cc
hen I nest the attributes and use "entityX.LAST_MODIFICATION" (notice
the dot for the nesting) the joins fail.
I have a feeling that the Spark execution plan gets somewhat confused, because
in the latter case, there are multiple fields called "LAST_MODIFICATION" with
differing nes
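One way around that ambiguity is to flatten the nested field under a unique alias before joining, so the plan never contains two columns named `LAST_MODIFICATION`. A sketch with hypothetical schemas (only the `entityX.LAST_MODIFICATION` naming is taken from the mail above):

```python
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .master("local[1]")
         .appName("nested-alias-sketch")
         .getOrCreate())

# Hypothetical schemas: entity_x carries LAST_MODIFICATION inside a
# struct, entity_y carries a top-level column of the same name.
entity_x = spark.createDataFrame(
    [(1, ("2019-05-01",))],
    "id INT, entityX STRUCT<LAST_MODIFICATION: STRING>",
)
entity_y = spark.createDataFrame(
    [(1, "2019-05-02")],
    "id INT, LAST_MODIFICATION STRING",
)

# Flatten the nested field under a distinct name BEFORE the join.
entity_x_flat = entity_x.select(
    "id",
    F.col("entityX.LAST_MODIFICATION").alias("X_LAST_MODIFICATION"),
)
joined = entity_x_flat.join(entity_y, "id", "left_outer")
joined.show()
```

Whether this resolves the specific plan confusion described above depends on the real query, but distinct column names generally remove the ambiguity.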
Hi all
I'm pretty new to Spark and implementing my first non-trivial structured
streaming job with outer joins. My environment is a Hortonworks HDP 3.1 cluster
with Spark 2.3.2, working with Python.
I understood that I need to provide watermarks and join conditions for left
outer joins to
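The watermark-plus-join-condition requirement for stream-stream left outer joins can be sketched as below. The ad-impression/click naming follows the Structured Streaming programming guide, not the author's topics, and rate sources stand in for Kafka:

```python
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .master("local[2]")
         .appName("outer-join-sketch")
         .getOrCreate())

# Rate sources as stand-ins for the real Kafka topics.
impressions = (spark.readStream.format("rate").load()
               .withColumnRenamed("timestamp", "impressionTime")
               .withColumnRenamed("value", "impressionAdId"))
clicks = (spark.readStream.format("rate").load()
          .withColumnRenamed("timestamp", "clickTime")
          .withColumnRenamed("value", "clickAdId"))

# Left outer stream-stream joins need watermarks on BOTH sides plus a
# time-range condition, so Spark knows when an unmatched row is final
# and can emit it with NULLs for the right side.
joined = (impressions.withWatermark("impressionTime", "2 hours")
          .join(clicks.withWatermark("clickTime", "3 hours"),
                F.expr("""
                    clickAdId = impressionAdId AND
                    clickTime >= impressionTime AND
                    clickTime <= impressionTime + interval 1 hour
                """),
                "leftOuter"))
```

Note that unmatched left rows are only emitted once the watermark passes the end of the join window, which ties back to the "slowly moving topics" problem discussed earlier in the thread.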