.
Thanks,
Cheng Su
From: Jungtaek Lim
Date: Tuesday, September 15, 2020 at 5:04 PM
To: Joseph Torres
Cc: Sean Owen, dev
Subject: Re: [DISCUSS] Time to evaluate "continuous mode" in SS?
Yeah I realized there's a proposal for push-based shuffle, and I agree that
may unblock the architectural issue on true-streaming. (The root concern of
the continuous mode has been that it doesn't fit with the architecture of
Spark, and probably push-based shuffle could persuade me.)
I guess
Hi Joseph,
Would be interested in discussing your thoughts for how push-based shuffle
could help with continuous mode in SS.
We have discussed internally at LinkedIn with our Samza peers, as well as
with the Alibaba Flink team, the applicability of push-based shuffle to streaming
engines, especially
It's worth noting that the push-based shuffle SPIP currently in progress
addresses a substantial blocker in the area. If you remember when we
removed the half-finished stateful query support, the lack of that
functionality and the challenge of implementing it is basically why it was
half-finished.
I think we certainly can't remove it without deprecation and a few
releases. If there were big problems with it that weren't getting
fixed, sure maybe, but lack of interest in reviewing minor changes
isn't necessarily a bad sign. By the same logic you'd have deleted GraphX
long ago.
Anecdotally, yes
It probably depends on the meaning of "experimental". My understanding
of "experimental" is closer to "incubation": the feature may eventually
graduate, or it may be retired.
To be clear, I'm evaluating the continuous mode as "candidate to retire",
unless there are actual use cases in production
If you're suggesting making it un-Experimental, probably yes, as it is
de facto not going to change much I expect.
If you're saying remove it, probably not? I don't see that it's
anywhere near deprecated, and not sure it's unmaintained - obviously
tests etc still have to keep passing.
On Mon, Sep
Hi Jungtaek,
All I see at the moment is that most users choose Flink over Spark
when continuous processing is needed.
Unless there is a revolution in this area, there is no point in continuing to
maintain it. 2.5 years is a lot in the big data industry.
If there will be efforts in this area then happy to
Hi devs,
It was Spark 2.3 in Feb 2018 which introduced continuous mode in Structured
Streaming as "experimental".
Now, 2.5 years after its release, I feel it would be a good
time to evaluate the mode, whether the mode has been widely used or not,
and the mode has been making