Hi Jungtaek,

Given that the goal of the SPIP is to reduce latency for stateless apps,
which should reasonably fit continuous mode's design goals, it feels odd not
to support it in the proposal.

I know you have raised concerns about continuous mode in the past on the
dev@ list as well, and we are further ignoring it in this proposal (and
possibly in other enhancements over the past few releases).

Do you want to revisit the discussion on whether to support it, propose a
vote on that, and move it to deprecated?

I would be much more comfortable with this SPIP not supporting CM if CM were
deprecated.

Thoughts?

Regards,
Mridul




On Wed, Nov 23, 2022 at 1:16 AM Jerry Peng <jerry.boyang.p...@gmail.com>
wrote:

> Jungtaek,
>
> Thanks for taking up the role to shepherd this SPIP! Thank you also for
> chiming in with your thoughts on continuous mode!
>
> Best,
>
> Jerry
>
> On Tue, Nov 22, 2022 at 5:57 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> Just FYI, I'm shepherding this SPIP project.
>>
>> I think the major meta question would be, "why don't we spend effort on
>> continuous mode rather than initiating another feature aiming for the
>> same workload?". Jerry already updated the doc to answer the question, but
>> I can also share my thoughts about it.
>>
>> I feel like the current "continuous mode" is a niche solution. (That's not
>> a criticism. If you have to deal with such a workload but can't rewrite the
>> underlying engine from scratch, there are really few options.)
>> Since the implementation relies on workarounds for things the architecture
>> does not support natively, e.g. distributed snapshots, it is quite tricky
>> to maintain and extend. It also requires 3rd parties to provide separate
>> source and sink implementations, and I'm not sure how many 3rd parties
>> have actually done so.
>>
>> Eventually, "continuous mode" has become an area whose details no one in
>> the active community knows or is willing to maintain. I wouldn't say we are
>> confident enough to remove the "experimental" tag, although the feature
>> has been shipped for years. It was introduced in Spark 2.3; surprising,
>> isn't it?
>>
>> We went back and thought about the approach from scratch. Jerry came up
>> with an idea that leverages the existing microbatch execution, so it is
>> relatively stable and does not require 3rd parties to support another
>> mode. It adds complexity to microbatch execution, but it's a lot less
>> complicated than the existing continuous mode, and certainly far less
>> than creating a new record-to-record engine from scratch.
>>
>> That said, we want to propose and move forward with the new approach.
>>
>> ps. Eventually we could probably discuss retiring continuous mode if the
>> new approach gets accepted and is considered stable after several minor
>> releases. That's just my view.
>>
>> On Wed, Nov 23, 2022 at 5:16 AM Jerry Peng <jerry.boyang.p...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I would like to start the discussion for a SPIP, Asynchronous Offset
>>> Management in Structured Streaming. The high-level summary of the SPIP is
>>> that Structured Streaming currently performs a couple of offset
>>> management operations for progress tracking synchronously on the
>>> critical path, which can contribute significantly to processing latency.
>>> If we were to make these operations asynchronous and less frequent, we
>>> could dramatically improve latency for certain types of workloads.
>>>
>>> I have put together a SPIP to implement such a mechanism.  Please take a
>>> look!
>>>
>>> SPIP Jira: https://issues.apache.org/jira/browse/SPARK-39591
>>>
>>> SPIP doc:
>>> https://docs.google.com/document/d/1iPiI4YoGCM0i61pBjkxcggU57gHKf2jVwD7HWMHgH-Y/edit?usp=sharing
>>>
>>>
>>> Best,
>>>
>>> Jerry
>>>
>>
