Re: [DISCUSS] remove the incomplete code path on aggregation for continuous mode

2020-07-12 Thread Jungtaek Lim
Just submitted the patch: https://github.com/apache/spark/pull/29077

On Tue, Jun 16, 2020 at 3:40 PM Jungtaek Lim 
wrote:

> Bump this again. I filed SPARK-31985 [1] and plan to submit a PR in a
> couple of days if there's no voice on the reason we should keep it.
>
> 1. https://issues.apache.org/jira/browse/SPARK-31985
>
> On Thu, May 21, 2020 at 8:54 AM Jungtaek Lim 
> wrote:
>
>> Let me share the effect on removing the incomplete and undocumented code
>> path. I manually tried out removing the code path and here's the change.
>>
>>
>> https://github.com/HeartSaVioR/spark/commit/aa53e9b1b33c0b8aec37704ad290b42ffb2962d8
>>
>> 1,120 lines deleted without hurting any existing streaming tests, except
>> a suite I removed as well since it tests such code path. No need to update
>> documentation as it was never publicized. This also removes some rules
>> which only apply for continuous mode, which gave exceptions on the fact
>> that "continuous mode is only available for map-like operations". Removing
>> them would be back to simplify the usage of continuous mode.
>>
>> Also worth noting that I had to manually remove the code path instead of
>> revert, because the code path has been changed to reflect DSv2 change. What
>> this means? We have to "update" the code path and concern
>> about compatibility, etc. while it never be used in production.
>>
>> On Tue, May 19, 2020 at 1:14 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Hi devs,
>>>
>>> during experiment on complete mode I realized we left some incomplete
>>> code parts on supporting aggregation for continuous mode. (shuffle &
>>> coalesce)
>>>
>>> The work had been occurred around first half of 2018 and stopped, and no
>>> work has been done for around 2 years (so I don't expect anyone is working
>>> on this). The functionality is undocumented (as the work was only done
>>> partially) and continuous mode is experimental so I don't feel risks to get
>>> rid of the part.
>>>
>>> What do you think? If it makes sense then I'll raise a PR to get rid of
>>> the incomplete codes.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> ps. I'm kind of feeling the same with continuous mode itself - it has
>>> been still experimental over the 2 years, and while it lacks lots of
>>> essential things (backpressure, epoch synchronization, query progress,
>>> etc.) there's no additional major work over 1 year. We eventually have been
>>> excluded continuous mode for the new streaming feature like observable
>>> metric, streaming UI, etc. because it is on top of the feature
>>> which continuous mode lacks.
>>>
>>> Unlike incomplete code path, I'm not strongly against this, as it
>>> has been documented and someone might use the feature in production. I
>>> still think it's ideal to retire the feature smoothly.
>>> (Please chime in with the use case if someone has the production cases.)
>>>
>>> ps2. It feels like "feature branch" looks to be a thing to consider for
>>> efforts on one big feature.
>>>
>>


Re: [DISCUSS] remove the incomplete code path on aggregation for continuous mode

2020-06-15 Thread Jungtaek Lim
Bump this again. I filed SPARK-31985 [1] and plan to submit a PR in a
couple of days if there's no voice on the reason we should keep it.

1. https://issues.apache.org/jira/browse/SPARK-31985

On Thu, May 21, 2020 at 8:54 AM Jungtaek Lim 
wrote:

> Let me share the effect on removing the incomplete and undocumented code
> path. I manually tried out removing the code path and here's the change.
>
>
> https://github.com/HeartSaVioR/spark/commit/aa53e9b1b33c0b8aec37704ad290b42ffb2962d8
>
> 1,120 lines deleted without hurting any existing streaming tests, except a
> suite I removed as well since it tests such code path. No need to update
> documentation as it was never publicized. This also removes some rules
> which only apply for continuous mode, which gave exceptions on the fact
> that "continuous mode is only available for map-like operations". Removing
> them would be back to simplify the usage of continuous mode.
>
> Also worth noting that I had to manually remove the code path instead of
> revert, because the code path has been changed to reflect DSv2 change. What
> this means? We have to "update" the code path and concern
> about compatibility, etc. while it never be used in production.
>
> On Tue, May 19, 2020 at 1:14 PM Jungtaek Lim 
> wrote:
>
>> Hi devs,
>>
>> during experiment on complete mode I realized we left some incomplete
>> code parts on supporting aggregation for continuous mode. (shuffle &
>> coalesce)
>>
>> The work had been occurred around first half of 2018 and stopped, and no
>> work has been done for around 2 years (so I don't expect anyone is working
>> on this). The functionality is undocumented (as the work was only done
>> partially) and continuous mode is experimental so I don't feel risks to get
>> rid of the part.
>>
>> What do you think? If it makes sense then I'll raise a PR to get rid of
>> the incomplete codes.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> ps. I'm kind of feeling the same with continuous mode itself - it has
>> been still experimental over the 2 years, and while it lacks lots of
>> essential things (backpressure, epoch synchronization, query progress,
>> etc.) there's no additional major work over 1 year. We eventually have been
>> excluded continuous mode for the new streaming feature like observable
>> metric, streaming UI, etc. because it is on top of the feature
>> which continuous mode lacks.
>>
>> Unlike incomplete code path, I'm not strongly against this, as it
>> has been documented and someone might use the feature in production. I
>> still think it's ideal to retire the feature smoothly.
>> (Please chime in with the use case if someone has the production cases.)
>>
>> ps2. It feels like "feature branch" looks to be a thing to consider for
>> efforts on one big feature.
>>
>


Re: [DISCUSS] remove the incomplete code path on aggregation for continuous mode

2020-05-20 Thread Jungtaek Lim
Let me share the effect on removing the incomplete and undocumented code
path. I manually tried out removing the code path and here's the change.

https://github.com/HeartSaVioR/spark/commit/aa53e9b1b33c0b8aec37704ad290b42ffb2962d8

1,120 lines deleted without hurting any existing streaming tests, except a
suite I removed as well since it tests such code path. No need to update
documentation as it was never publicized. This also removes some rules
which only apply for continuous mode, which gave exceptions on the fact
that "continuous mode is only available for map-like operations". Removing
them would be back to simplify the usage of continuous mode.

Also worth noting that I had to manually remove the code path instead of
revert, because the code path has been changed to reflect DSv2 change. What
this means? We have to "update" the code path and concern
about compatibility, etc. while it never be used in production.

On Tue, May 19, 2020 at 1:14 PM Jungtaek Lim 
wrote:

> Hi devs,
>
> during experiment on complete mode I realized we left some incomplete code
> parts on supporting aggregation for continuous mode. (shuffle & coalesce)
>
> The work had been occurred around first half of 2018 and stopped, and no
> work has been done for around 2 years (so I don't expect anyone is working
> on this). The functionality is undocumented (as the work was only done
> partially) and continuous mode is experimental so I don't feel risks to get
> rid of the part.
>
> What do you think? If it makes sense then I'll raise a PR to get rid of
> the incomplete codes.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> ps. I'm kind of feeling the same with continuous mode itself - it has been
> still experimental over the 2 years, and while it lacks lots of essential
> things (backpressure, epoch synchronization, query progress, etc.) there's
> no additional major work over 1 year. We eventually have been
> excluded continuous mode for the new streaming feature like observable
> metric, streaming UI, etc. because it is on top of the feature
> which continuous mode lacks.
>
> Unlike incomplete code path, I'm not strongly against this, as it has been
> documented and someone might use the feature in production. I still think
> it's ideal to retire the feature smoothly.
> (Please chime in with the use case if someone has the production cases.)
>
> ps2. It feels like "feature branch" looks to be a thing to consider for
> efforts on one big feature.
>