Hi Baodi,

Thanks for the reply and update of the PIP.

1. Pulsar Functions currently isn't integrated with the Transaction feature
yet, so there's no EXACTLY_ONCE support.

2. And Yes, "EFFECTIVELY_ONCE = ATLEAST_ONCE + Message Deduplication"



On Tue, May 31, 2022 at 9:16 AM 石宝迪 <wudixiaolong...@icloud.com.invalid>
wrote:

> >> If you fail to start the function, you immediately break people's
> > functions when they upgrade to this version. How about notifying them
> once
> > via logger (WARN)?
>
>
> I tend to fail. Although this breaks the current logic. but the current
> implementation can be considered is a bug.
>
> > It will flood their logs if they used it wrong. Maybe write to log once?
>
>
> Agree, I changed PIP.
>
> Thanks,
> Baodi Shi
>
> > 2022年5月31日 23:5720,Asaf Mesika <asaf.mes...@gmail.com> 写道:
> >
> > Hi Baodi,
> >
> > Regarding
> >
> >>
> >>   1. When the delivery semantic is ATMOST_ONCE, add verify autoAck must
> >>   be true. If the validation fails, let the function fail to start (This
> >>   temporarily resolves the configuration ambiguity). When autoAck is
> >>   subsequently removed, the message will be acked immediately after
> receiving
> >>   the message.
> >>
> >>
> >> If you fail to start the function, you immediately break people's
> > functions when they upgrade to this version. How about notifying them
> once
> > via logger (WARN)?
> >
> > Regarding
> >
> >>
> >>   1.
> >>
> >>
> >>   When user call record.ack() in function, just ProcessingGuarantees ==
> >>   MANUAL can be work. In turn, when ProcessingGuarantees != MANUAL, user
> >>   call record.ack() is invalid(print warn log).
> >>
> >> It will flood their logs if they used it wrong. Maybe write to log once?
> >
> > On Tue, May 31, 2022 at 12:24 PM Baozi <wudixiaolong...@icloud.com
> .invalid>
> > wrote:
> >
> >> Hi, Asaf.
> >>
> >> Thanks review.
> >>
> >>>> I'm not entirely sure that is accurate. The Effectively-Once as I
> >>> understand it is achieved using transactions, thus the consumption of
> >> that
> >>> message and production of any messages, as a result, are considered one
> >>> atomic unit - either message acknowledged and messages produced or
> >> neither.
> >>
> >>
> >> Not using transactions now, I understand: EFFECTIVELY_ONCE =
> ATLEAST_ONCE
> >> + Message Deduplication.
> >>
> >> @Neng Lu @Rui Fu Can help make sure?
> >>
> >>>> I would issue a WARN when reading configuring the function (thus
> emitted
> >>> once) when the user actively configured autoAck=false and warn them
> that
> >>> this configuration is deprecated and they should switch to the MANUAL
> >>> ProcessingGuarantee configuration option.
> >>
> >>
> >> Added to API Change(2)
> >>
> >>>> suggest you clarify what existing behavior remains for backward
> >>> compatibility with the appropriate comment that this is deprecated and
> >> will
> >>> be removed.
> >>
> >> Yes, I have rewritten it, please see Implementation(1)
> >>
> >>> 5. Regarding Test Plan
> >>> * I would add: Validate the test of autoAck=false still works (you
> >> haven't
> >>> broken anything)
> >>> * I would add: Validate existing ProcessingGuarantee test for
> AtMostOnce,
> >>> AtLeastOnce, ExactlyOnce still works (when autoAck=true)
> >>
> >>
> >> Nice, I added to PIP.
> >>
> >>
> >> Thanks,
> >> Baodi Shi
> >>
> >>> 2022年5月30日 22:0011,Asaf Mesika <asaf.mes...@gmail.com> 写道:
> >>>
> >>> Thanks for applying the fixes.
> >>>
> >>> 1. Regarding
> >>>
> >>>>
> >>>>  - EFFECTIVELY_ONCE: The message is acknowledged *after* the function
> >>>>  finished execution. Depends on pulsar deduplication, and provides
> >>>>  end-to-end effectively once processing.
> >>>>
> >>>> I'm not entirely sure that is accurate. The Effectively-Once as I
> >>> understand it is achieved using transactions, thus the consumption of
> >> that
> >>> message and production of any messages, as a result, are considered one
> >>> atomic unit - either message acknowledged and messages produced or
> >> neither.
> >>>
> >>> 2. Regarding
> >>>
> >>>>
> >>>>  1. Indication of autoAck is deprecated, and not use it in the code.
> >>>>  (and also Function.proto)
> >>>>
> >>>> * I would issue a WARN when reading configuring the function (thus
> >> emitted
> >>> once) when the user actively configured autoAck=false and warn them
> that
> >>> this configuration is deprecated and they should switch to the MANUAL
> >>> ProcessingGuarantee configuration option.
> >>>
> >>> 3. Regarding
> >>>
> >>>>
> >>>>  1. When the delivery semantic is ATMOST_ONCE, the message will be
> >>>>  acked immediately after receiving the message, no longer affected by
> >> the
> >>>>  autoAck configuration.
> >>>>
> >>>> I suggest you clarify what existing behavior remains for backward
> >>> compatibility with the appropriate comment that this is deprecated and
> >> will
> >>> be removed.
> >>>
> >>> 4. Regarding
> >>>
> >>>>
> >>>>  1.
> >>>>
> >>>>  When user call record.ack() in function, just ProcessingGuarantees ==
> >>>>  MANUAL can be work. In turn, when ProcessingGuarantees != MANUAL,
> user
> >>>>  call record.ack() is invalid(print warn log).
> >>>>
> >>>> That might blast WARN messages to the user. Perhaps save the fact that
> >> you
> >>> have printed a WARN message once and only print when the variable is
> not
> >>> set?
> >>>
> >>> 5. Regarding Test Plan
> >>> * I would add: Validate the test of autoAck=false still works (you
> >> haven't
> >>> broken anything)
> >>> * I would add: Validate existing ProcessingGuarantee test for
> AtMostOnce,
> >>> AtLeastOnce, ExactlyOnce still works (when autoAck=true)
> >>>
> >>>
> >>>
> >>> On Mon, May 30, 2022 at 4:09 PM Baozi <wudixiaolong...@icloud.com
> >> .invalid>
> >>> wrote:
> >>>
> >>>> Hi, Mesika.
> >>>>
> >>>> Thanks review.
> >>>>
> >>>>>> 2. I suggest calling it `MANUAL` `ProcessingGuarantee` instead of
> >> NONE.
> >>>> As
> >>>>>> you carefully explained, ProcessingGuarantee comes does to the fact
> >> that
> >>>>>> the function executor calls acknowledge, in specific timing:
> >>>>
> >>>>
> >>>> Added, Refer to the latest pip.
> >>>> https://github.com/apache/pulsar/issues/15560
> >>>>
> >>>>>> 3. Removing autoAck from Function Config breaks backward
> >> compatibility,
> >>>>>> thus shouldn't this be strongly reflected in the PIP - how does
> Pulsar
> >>>>>> release handle breaking change?
> >>>>
> >>>> As suggested by @neng, They will be marked as deprecated first and
> >> clearly
> >>>> stated in the documentation. Remove it after 2~3 release.
> >>>>
> >>>>>> 4. Regarding Implementation (1), isn't the class itself
> >>>>>> PulsarSinkAtLeastOnceProcessor encodes what to do? I'm not sure I
> >>>>>> understand how you use that enum value *inside* the class/
> >>>>
> >>>> I changed PIP, add new PulsarSinkManualProcessor. Refer to the latest
> >> PIP
> >>>> API Changes(3)
> >>>>
> >>>> Thanks,
> >>>> Baodi Shi
> >>>>
> >>>>> 2022年5月30日 12:5128,Rui Fu <r...@apache.org> 写道:
> >>>>>
> >>>>> Hi Baodi,
> >>>>>
> >>>>> Nice work. Put some suggestions below, ptal.
> >>>>>
> >>>>> 1. API changes should also contain the changes of `Function.proto`,
> >>>> including new `ProcessingGuarantees` option and `autoAck`.
> >>>>> 2. Please be sure the other language runtimes (like Python, Golang)
> do
> >>>> support similar `record.ack()` function from the context, if no, it
> >> might
> >>>> have some API changes for different runtime we well.
> >>>>>
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Rui Fu
> >>>>> 在 2022年5月29日 +0800 22:18,Asaf Mesika <asaf.mes...@gmail.com>,写道:
> >>>>>> 1. "Added NONE delivery semantics and delete autoAck config."
> >>>>>> - Added --> add
> >>>>>>
> >>>>>> 2. I suggest calling it `MANUAL` `ProcessingGuarantee` instead of
> >> NONE.
> >>>> As
> >>>>>> you carefully explained, ProcessingGuarantee comes does to the fact
> >> that
> >>>>>> the function executor calls acknowledge, in specific timing:
> >>>>>> - `AT_MOST_ONCE` - When the message is read by the client, it is
> >>>>>> immediately acknowledged and only then the function is executed,
> thus
> >>>>>> guaranteeing it will not run more than once
> >>>>>> - `AT_LEAST_ONCE` - Message is acknowledged *after* the function
> >>>> finished
> >>>>>> execution, thus it will be run at least once.
> >>>>>> - `MANUAL` - Signals to the user that it is up to them to
> acknowledge
> >>>> the
> >>>>>> message, inside the function.
> >>>>>>
> >>>>>> I think if you couple that change with adding the explanation I
> wrote
> >>>>>> above to the documentation it will become crystal clear (hopefully)
> >>>> what is
> >>>>>> a Processing Guarantee exactly and what each value signifies.
> >>>>>>
> >>>>>> 3. Removing autoAck from Function Config breaks backward
> >> compatibility,
> >>>>>> thus shouldn't this be strongly reflected in the PIP - how does
> Pulsar
> >>>>>> release handle breaking change?
> >>>>>>
> >>>>>> 4. Regarding Implementation (1), isn't the class itself
> >>>>>> PulsarSinkAtLeastOnceProcessor encodes what to do? I'm not sure I
> >>>>>> understand how you use that enum value *inside* the class/
> >>>>>>
> >>>>>>
> >>>>>> On Fri, May 27, 2022 at 1:08 AM Neng Lu <freen...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Some suggestions:
> >>>>>>>
> >>>>>>> 1. Instead of deleting the `autoAck`, keep it but not use it in the
> >>>> code.
> >>>>>>> And documented clearly it's deprecated for the following 2~3
> release.
> >>>> And
> >>>>>>> then delete it.
> >>>>>>> 2. For `PulsarSinkAtLeastOnceProcessor` and
> >>>>>>> `PulsarSinkEffectivelyOnceProcessor`, if `NONE` is configured, it
> >>>> defaults
> >>>>>>> to ATLEAST_ONCE.
> >>>>>>> 3. Need to let users know the behavior when they call
> `record.ack()`
> >>>> inside
> >>>>>>> the function implementation.
> >>>>>>>
> >>>>>>> On Thu, May 12, 2022 at 1:52 AM Baozi <wudixiaolong...@icloud.com
> >>>> .invalid>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi Pulsar community,
> >>>>>>>>
> >>>>>>>> I open a https://github.com/apache/pulsar/issues/15560 for
> Function
> >>>> add
> >>>>>>>> NONE delivery semantics
> >>>>>>>>
> >>>>>>>> Let me know what you think.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Baodi Shi
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ## Motivation
> >>>>>>>>
> >>>>>>>> Currently Function supports three delivery semantics, and also
> >>>> provides
> >>>>>>>> autoAck to control whether to automatically ack.
> >>>>>>>> Because autoAck affects the delivery semantics of Function, it can
> >> be
> >>>>>>>> confusing for users to understand the relationship between these
> two
> >>>>>>>> parameters.
> >>>>>>>>
> >>>>>>>> For example, when the user configures `Guarantees == ATMOST_ONCE`
> >> and
> >>>>>>>> `autoAck == false`, then the framework will not help the user to
> ack
> >>>>>>>> messages, and the processing semantics may become `ATLEAST_ONCE`.
> >>>>>>>>
> >>>>>>>> The delivery semantics provided by Function should be clear. When
> >> the
> >>>>>>> user
> >>>>>>>> sets the guarantees, the framework should ensure point-to-point
> >>>> semantic
> >>>>>>>> processing and cannot be affected by other parameters.
> >>>>>>>>
> >>>>>>>> ## Goal
> >>>>>>>>
> >>>>>>>> Added `NONE` delivery semantics and delete `autoAck` config.
> >>>>>>>>
> >>>>>>>> The original intention of `autoAck` semantics is that users want
> to
> >>>>>>>> control the timing of ack by themselves. When autoAck == false,
> the
> >>>>>>>> processing semantics provided by the framework should be invalid.
> >>>> Then we
> >>>>>>>> can add `NONE` processing semantics to replace the autoAck ==
> false
> >>>>>>>> scenario.
> >>>>>>>>
> >>>>>>>> When the user configuration `ProcessingGuarantees == NONE`, the
> >>>> framework
> >>>>>>>> does not help the user to do any ack operations, and the ack is
> left
> >>>> to
> >>>>>>> the
> >>>>>>>> user to handle. In other cases, the framework guarantees
> processing
> >>>>>>>> semantics.
> >>>>>>>>
> >>>>>>>> ## API Changes
> >>>>>>>> 1. Add `NONE` type to ProcessingGuarantees
> >>>>>>>> ``` java
> >>>>>>>> public enum ProcessingGuarantees {
> >>>>>>>> ATLEAST_ONCE,
> >>>>>>>> ATMOST_ONCE,
> >>>>>>>> EFFECTIVELY_ONCE,
> >>>>>>>> NONE
> >>>>>>>> }
> >>>>>>>> ```
> >>>>>>>>
> >>>>>>>> 2. Delete autoAck config in FunctionConfig
> >>>>>>>> ``` java
> >>>>>>>> public class FunctionConfig {
> >>>>>>>> - private Boolean autoAck;
> >>>>>>>> }
> >>>>>>>> ```
> >>>>>>>>
> >>>>>>>> ## Implementation
> >>>>>>>>
> >>>>>>>> 1. In `PulsarSinkAtLeastOnceProcessor` and
> >>>>>>>> `PulsarSinkEffectivelyOnceProcessor`, when `ProcessingGuarantees
> !=
> >>>> NONE`
> >>>>>>>> can be ack.
> >>>>>>>>
> >>>>>>>> <
> >>>>>>>>
> >>>>>>>
> >>>>
> >>
> https://github.com/apache/pulsar/blob/c49a977de4b0b525ec80e5070bc90eddcc7cddad/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/sink/PulsarSink.java#L274-L276
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> 2. When the delivery semantic is `ATMOST_ONCE`, the message will
> be
> >>>> acked
> >>>>>>>> immediately after receiving the message, no longer affected by the
> >>>>>>> autoAck
> >>>>>>>> configuration.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>
> >>
> https://github.com/apache/pulsar/blob/c49a977de4b0b525ec80e5070bc90eddcc7cddad/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/instance/JavaInstanceRunnable.java#L271-L276
> >>>>>>>>
> >>>>>>>> 3. When user call `record.ack()` in function, just
> >>>> `ProcessingGuarantees
> >>>>>>>> == NONE` can be work.
> >>>>>>>>
> >>>>>>>> ## Plan test
> >>>>>>>> The main test and assert is that when ProcessingGuarantees ==
> NONE,
> >>>> the
> >>>>>>>> function framework will not do any ack operations for the user.
> >>>>>>>>
> >>>>>>>> ## Compatibility
> >>>>>>>> 1. This change will invalidate the user's setting of autoAck,
> which
> >>>>>>> should
> >>>>>>>> be explained in the documentation and provide parameter
> verification
> >>>> to
> >>>>>>>> remind the user.
> >>>>>>>> 2. Runtimes of other languages need to maintain consistent
> >> processing
> >>>>>>>> logic (python, go).
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Best Regards,
> >>>>>>> Neng
> >>>>>>>
> >>>>
> >>>>
> >>
> >>
>
>

-- 
Best Regards,
Neng

Reply via email to