+1 to improve the widely used micro-batch mode first.

On Thu, Dec 1, 2022 at 8:49 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:

> +1
>
> On Thu, 1 Dec 2022 at 08:10, Shixiong Zhu <zsxw...@gmail.com> wrote:
>
>> +1
>>
>> This is exciting. I agree with Jerry that this SPIP and continuous
>> processing are orthogonal. This SPIP itself would be a great improvement
>> and impact most Structured Streaming users.
>>
>> Best Regards,
>> Shixiong
>>
>>
>> On Wed, Nov 30, 2022 at 6:57 AM Mridul Muralidharan <mri...@gmail.com>
>> wrote:
>>
>>>
>>> Thanks for all the clarifications and details Jerry, Jungtaek :-)
>>> This looks like an exciting improvement to Structured Streaming -
>>> looking forward to it becoming part of Apache Spark !
>>>
>>> Regards,
>>> Mridul
>>>
>>>
>>> On Mon, Nov 28, 2022 at 8:40 PM Jerry Peng <jerry.boyang.p...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I will add my two cents. Improving the micro-batch execution engine
>>>> does not prevent us from working on or improving the continuous
>>>> execution engine in the future. These are orthogonal issues. The new
>>>> mode I am proposing for the micro-batch execution engine is intended to
>>>> lower the latency of the engine that most people use today. We can view
>>>> it as an incremental improvement on the existing engine. I see the
>>>> continuous execution engine as a partially completed rewrite of Spark
>>>> Streaming that may serve as the "future" engine powering Spark
>>>> Streaming. Improving the "current" engine does not mean we cannot work
>>>> on a "future" engine. The two are not mutually exclusive. I would like
>>>> to focus the discussion on the merits of this feature with regard to the
>>>> current micro-batch execution engine, not on the future of the
>>>> continuous execution engine.
>>>>
>>>> Best,
>>>>
>>>> Jerry
>>>>
>>>>
>>>> On Wed, Nov 23, 2022 at 3:17 AM Jungtaek Lim <
>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>
>>>>> Hi Mridul,
>>>>>
>>>>> I'd like to be clear to avoid any misunderstanding: the decision was
>>>>> not led by me. (I'm just one of the engineers on the team. Not even TL.)
>>>>> As you can see from the direction, there was an internal consensus not to
>>>>> revisit the continuous mode. There are various reasons, which I think we
>>>>> already know. You seem to remember that I raised concerns about
>>>>> continuous mode, but did you notice that was more than 2 years ago? I
>>>>> still see no traction around the project. The main reason I abandoned
>>>>> the discussion was a promising effort to integrate push-based shuffle
>>>>> into continuous mode to support shuffle, but no effort has been made so
>>>>> far.
>>>>>
>>>>> The goal of this SPIP is to offer an alternative approach for the same
>>>>> workload, given that we no longer have confidence in the success of
>>>>> continuous mode. But I also want to make clear that deprecating and
>>>>> eventually retiring continuous mode is not a goal of this project. If that
>>>>> happens eventually, it would be a side effect. Someone may be concerned
>>>>> that we have two different projects aiming for a similar thing, but I'd
>>>>> rather see the two projects compete. Anyone willing to improve
>>>>> continuous mode can start making the effort right now. This SPIP does not
>>>>> block it.
>>>>>
>>>>>
>>>>> On Wed, Nov 23, 2022 at 5:29 PM Mridul Muralidharan <mri...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Hi Jungtaek,
>>>>>>
>>>>>>   Given that the goal of the SPIP is reducing latency for stateless apps,
>>>>>> which should reasonably fit continuous mode's design goals, it feels odd
>>>>>> not to support it in the proposal.
>>>>>>
>>>>>> I know you have raised concerns about continuous mode in the past as well
>>>>>> on the dev@ list, and we are further ignoring it in this proposal (and
>>>>>> possibly in other enhancements in the past few releases).
>>>>>>
>>>>>> Do you want to revisit the discussion to support it and propose a
>>>>>> vote on that ? And move it to deprecated ?
>>>>>>
>>>>>> I am much more comfortable not supporting this SPIP for CM if it was
>>>>>> deprecated.
>>>>>>
>>>>>> Thoughts ?
>>>>>>
>>>>>> Regards,
>>>>>> Mridul
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 23, 2022 at 1:16 AM Jerry Peng <
>>>>>> jerry.boyang.p...@gmail.com> wrote:
>>>>>>
>>>>>>> Jungtaek,
>>>>>>>
>>>>>>> Thanks for taking up the role of shepherding this SPIP!  Thank you also
>>>>>>> for chiming in with your thoughts on the continuous mode!
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Jerry
>>>>>>>
>>>>>>> On Tue, Nov 22, 2022 at 5:57 PM Jungtaek Lim <
>>>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Just FYI, I'm shepherding this SPIP project.
>>>>>>>>
>>>>>>>> I think the major meta question would be, "why don't we spend
>>>>>>>> effort on continuous mode rather than initiating another feature
>>>>>>>> aiming for the same workload?". Jerry already updated the doc to
>>>>>>>> answer the question, but I can also share my thoughts about it.
>>>>>>>>
>>>>>>>> I feel like the current "continuous mode" is a niche solution.
>>>>>>>> (It's not to blame. If you have to deal with such workload but can't
>>>>>>>> rewrite the underlying engine from scratch, then there are really few
>>>>>>>> options.)
>>>>>>>> Since the implementation relied on workarounds to implement what the
>>>>>>>> architecture does not support natively, e.g. distributed snapshots,
>>>>>>>> maintaining and expanding the project gets quite tricky. It also
>>>>>>>> requires 3rd parties to implement separate sources and sinks, and I'm
>>>>>>>> not sure how many 3rd parties have actually done so.
>>>>>>>>
>>>>>>>> Eventually, "continuous mode" became an area whose details no one in
>>>>>>>> the active community knows or is willing to maintain. I wouldn't say
>>>>>>>> we are confident enough to remove the "experimental" tag, although the
>>>>>>>> feature has been shipped for years. It was introduced in Spark 2.3;
>>>>>>>> surprising enough?
>>>>>>>>
>>>>>>>> We went back and thought about the approach from scratch. Jerry
>>>>>>>> came up with an idea that leverages the existing micro-batch execution,
>>>>>>>> so it is relatively stable and does not require 3rd parties to support
>>>>>>>> another mode. It adds complexity to micro-batch execution, but it is a
>>>>>>>> lot less complicated than the existing continuous mode, and definitely
>>>>>>>> far less than creating a new record-to-record engine from scratch.
>>>>>>>>
>>>>>>>> That said, we want to propose and move forward with the new
>>>>>>>> approach.
>>>>>>>>
>>>>>>>> ps. Eventually we could probably discuss retiring continuous mode
>>>>>>>> if the new approach gets accepted and is considered stable after
>>>>>>>> several minor releases. That's just me.
>>>>>>>>
>>>>>>>> On Wed, Nov 23, 2022 at 5:16 AM Jerry Peng <
>>>>>>>> jerry.boyang.p...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I would like to start the discussion for a SPIP, Asynchronous
>>>>>>>>> Offset Management in Structured Streaming.  The high-level summary of
>>>>>>>>> the SPIP is that Structured Streaming currently performs a couple of
>>>>>>>>> offset management operations for progress tracking synchronously on
>>>>>>>>> the critical path, which can contribute significantly to processing
>>>>>>>>> latency.  If we make these operations asynchronous and less frequent,
>>>>>>>>> we can dramatically improve latency for certain types of workloads.
>>>>>>>>>
>>>>>>>>> I have put together a SPIP to implement such a mechanism.  Please
>>>>>>>>> take a look!
>>>>>>>>>
>>>>>>>>> SPIP Jira: https://issues.apache.org/jira/browse/SPARK-39591
>>>>>>>>>
>>>>>>>>> SPIP doc:
>>>>>>>>> https://docs.google.com/document/d/1iPiI4YoGCM0i61pBjkxcggU57gHKf2jVwD7HWMHgH-Y/edit?usp=sharing
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Jerry
>>>>>>>>>
>>>>>>>>
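
For readers skimming the thread, the idea summarized in the original
message (offset bookkeeping done synchronously on the critical path versus
asynchronously and less often) might be sketched roughly as follows. This
is only an illustration in plain Python, not Spark code: OffsetLog,
run_sync, run_async and commit_every are all hypothetical names made up
here, and the actual design is whatever the SPIP doc describes.

```python
import queue
import threading


class OffsetLog:
    """Hypothetical stand-in for a durable offset log (not Spark's API)."""

    def __init__(self):
        self.committed = []

    def write(self, batch_id, offset):
        # A real log would do slow, durable I/O here.
        self.committed.append((batch_id, offset))


def run_sync(batches, log):
    """Baseline: each micro-batch records its offset before processing."""
    results, offset = [], 0
    for batch_id, batch in enumerate(batches):
        offset += len(batch)
        log.write(batch_id, offset)   # blocks the processing loop
        results.append(sum(batch))    # placeholder for real processing
    return results


def run_async(batches, log, commit_every=3):
    """Sketch: offsets go to a background writer, and only every N batches."""
    pending = queue.Queue()

    def writer():
        # Drain queued offsets until the None sentinel arrives.
        while True:
            item = pending.get()
            if item is None:
                break
            log.write(*item)

    thread = threading.Thread(target=writer)
    thread.start()

    results, offset = [], 0
    last = len(batches) - 1
    for batch_id, batch in enumerate(batches):
        offset += len(batch)
        # Hand off the offset without waiting; skip most batches entirely.
        if (batch_id + 1) % commit_every == 0 or batch_id == last:
            pending.put((batch_id, offset))
        results.append(sum(batch))

    pending.put(None)   # tell the writer to stop
    thread.join()       # wait so the log is complete before returning
    return results
```

In this toy version the processing loop never waits on the log, and with
commit_every=3 most batches skip the write altogether. Anything about
recovery after a failure is outside this sketch.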
