Hi Jungtaek,

Given that the goal of the SPIP is to reduce latency for stateless apps, which should reasonably fit the design goals of continuous mode, it feels odd not to support it in the proposal.
I know you have raised concerns about continuous mode in the past on the dev@ list as well, and we are further ignoring it in this proposal (and possibly in other enhancements over the past few releases).

Do you want to revisit the discussion on supporting it and propose a vote on that? And move it to deprecated?
I would be much more comfortable with this SPIP not supporting CM if CM were deprecated.

Thoughts?

Regards,
Mridul

On Wed, Nov 23, 2022 at 1:16 AM Jerry Peng <jerry.boyang.p...@gmail.com> wrote:

> Jungtaek,
>
> Thanks for taking up the role to shepard this SPIP! Thank you for also
> chiming in on your thoughts concerning the continuous mode!
>
> Best,
>
> Jerry
>
> On Tue, Nov 22, 2022 at 5:57 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> Just FYI, I'm shepherding this SPIP project.
>>
>> I think the major meta question would be, "why don't we spend effort on
>> continuous mode rather than initiating another feature aiming for the
>> same workload?". Jerry already updated the doc to answer the question, but
>> I can also share my thoughts about it.
>>
>> I feel like the current "continuous mode" is a niche solution. (It's not
>> to blame. If you have to deal with such workload but can't rewrite the
>> underlying engine from scratch, then there are really few options.)
>> Since the implementation went with a workaround to implement which the
>> architecture does not support natively e.g. distributed snapshot, it gets
>> quite tricky on maintaining and expanding the project. It also requires 3rd
>> parties to implement a separate source and sink implementation, which I'm
>> not sure how many 3rd parties actually followed so far.
>>
>> Eventually, "continuous mode" becomes an area no one in the active
>> community knows the details and has willingness to maintain. I wouldn't say
>> we are confident to remove the tag on "experimental", although the feature
>> has been shipped for years. It was introduced in Spark 2.3, surprising
>> enough?
>>
>> We went back and thought about the approach from scratch. Jerry came up
>> with the idea which leverages existing microbatch execution, hence
>> relatively stable and no need to require 3rd parties to support another
>> mode. It adds complexity against microbatch execution but it's a lot less
>> complicated compared to the existing continuous mode. Definitely quite less
>> than creating a new record-to-record engine from scratch.
>>
>> That said, we want to propose and move forward with the new approach.
>>
>> ps. Eventually we could probably discuss retiring continuous mode if the
>> new approach gets accepted and eventually considered as a stable one after
>> several minor releases. That's just me.
>>
>> On Wed, Nov 23, 2022 at 5:16 AM Jerry Peng <jerry.boyang.p...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I would like to start the discussion for a SPIP, Asynchronous Offset
>>> Management in Structured Streaming. The high level summary of the SPIP is
>>> that currently in Structured Streaming we perform a couple of offset
>>> management operations for progress tracking purposes synchronously on the
>>> critical path which can contribute significantly to processing latency. If
>>> we were to make these operations asynchronous and less frequent we can
>>> dramatically improve latency for certain types of workloads.
>>>
>>> I have put together a SPIP to implement such a mechanism. Please take a
>>> look!
>>>
>>> SPIP Jira: https://issues.apache.org/jira/browse/SPARK-39591
>>>
>>> SPIP doc:
>>> https://docs.google.com/document/d/1iPiI4YoGCM0i61pBjkxcggU57gHKf2jVwD7HWMHgH-Y/edit?usp=sharing
>>>
>>>
>>> Best,
>>>
>>> Jerry
>>>
>>