Jira for Java: https://jira.apache.org/jira/browse/BEAM-7250
Jira for Python........ no : / I just jumped right in to make the change:
https://github.com/apache/beam/pull/8430

On Thu, Jul 18, 2019 at 1:59 AM Ismaël Mejía <ieme...@gmail.com> wrote:

> Is there a JIRA already to track this?
>
> On Fri, Jun 14, 2019 at 11:52 PM Ismaël Mejía <ieme...@gmail.com> wrote:
> >
> > +1 for the change for Java too both for consistency with Python and
> > with the way State/Timers work too.
> >
> > It would be really nice if possible to see a concrete proposed example
> > (or even better design doc).
> >
> > Thanks for bringing this idea Pablo and sorry for delayed answer.
> >
> > On Wed, Jun 5, 2019 at 8:44 PM Pablo Estrada <pabl...@google.com> wrote:
> > >
> > > I have no objections.
> > >
> > > +Ismaël Mejía who has familiarity and interest in Java SDF.
> > >
> > > On Wed, Jun 5, 2019 at 11:31 AM Brian Hulette <bhule...@google.com>
> wrote:
> > >>
> > >> Just wanted to resurrect this to say that it seems appropriate to
> make the same change in Java. All the same arguments apply there, and now
> there's the additional argument for maintaining symmetry with Python.
> > >>
> > >> I think BEAM-7250 should be changed to a ticket to actually implement
> this in Java unless someone has an objection.
> > >>
> > >> Brian
> > >>
> > >> On Wed, May 8, 2019 at 2:20 PM Pablo Estrada <pabl...@google.com>
> wrote:
> > >>>
> > >>> Hello all,
> > >>> The API has been updated for Python (See
> https://github.com/apache/beam/pull/8430). Please, if you catch any
> documentation that needs updating, flag to me or just propose the change : )
> > >>>
> > >>> As for Java - we didn't end up determining whether it makes sense to
> update the API as well. Thoughts from others?
> > >>>
> > >>> In any case, I've filed
> https://jira.apache.org/jira/browse/BEAM-7250 to track this for Java.
> > >>>
> > >>> Best
> > >>> -P.
> > >>>
> > >>> On Mon, Apr 29, 2019 at 2:41 PM Lukasz Cwik <lc...@google.com>
> wrote:
> > >>>>
> > >>>> Pablo, all the SplittableDoFn stuff is marked as @Experimental so
> one is able to change it. There really is only one complicated one to
> change in Watch.java, the rest are quite straightforward.
> > >>>>
> > >>>> On Mon, Apr 29, 2019 at 2:23 PM Pablo Estrada <pabl...@google.com>
> wrote:
> > >>>>>
> > >>>>> Thanks all,
> > >>>>>  @Luke - I imagine that would be an improvement to the API, but
> this may be harder as this is already available to users, and there are
> those who have implemented SDFs under the current API. Would it be possible
> to make a backwards-compatible change to the API here?
> > >>>>>
> > >>>>> For the Python changes, I've proposed a pull request:
> https://github.com/apache/beam/pull/8430 - it was smaller than I thought
> : ) - All comments welcome please.
> > >>>>>
> > >>>>> +Boyuan Zhang I am happy to wait for your SyntheticSource PR to be
> merged and make the appropriate changes if you'd like.
> > >>>>> Best
> > >>>>> -P.
> > >>>>>
> > >>>>> On Mon, Apr 29, 2019 at 8:23 AM Lukasz Cwik <lc...@google.com>
> wrote:
> > >>>>>>
> > >>>>>> Would it make sense to also do this in the Java SDK?
> > >>>>>>
> > >>>>>> The would make the restriction provider also mirror the TimerSpec
> and StateSpec which use annotations similar to how its done in Python.
> > >>>>>>
> > >>>>>> On Mon, Apr 29, 2019 at 3:42 AM Robert Bradshaw <
> rober...@google.com> wrote:
> > >>>>>>>
> > >>>>>>> +1 to introducing this Param for consistency (and making the
> > >>>>>>> substitution more obvious), and I think SDF is still
> new/experimental
> > >>>>>>> enough we can do this. I don't know if we need Spec in addition
> to
> > >>>>>>> Param and Provider.
> > >>>>>>>
> > >>>>>>> On Sat, Apr 27, 2019 at 1:07 AM Chamikara Jayalath <
> chamik...@google.com> wrote:
> > >>>>>>> >
> > >>>>>>> >
> > >>>>>>> >
> > >>>>>>> > On Fri, Apr 26, 2019 at 3:43 PM Pablo Estrada <
> pabl...@google.com> wrote:
> > >>>>>>> >>
> > >>>>>>> >> Hi all,
> > >>>>>>> >> Sorry about the wall of text.
> > >>>>>>> >> So, first of all, I thought about this while reviewing a PR
> by Boyuan with an example of an SDF[1]. This is very exciting btw : ).
> > >>>>>>> >>
> > >>>>>>> >> Anyway... I certainly have a limited view of the whole SDF
> effort, but I think it's worth discussing this particular point about the
> API before finalizing SDF and making it widely available. So here I go:
> > >>>>>>> >>
> > >>>>>>> >> The Python API for SDF asks users to provide a restriction
> provider in their process function signature. More or less the following:
> > >>>>>>> >>
> > >>>>>>> >> class MyOwnLittleSDF(beam.DoFn):
> > >>>>>>> >>   def process(self, element,
> > >>>>>>> >>
>  restriction_tracker=MyOwnLittleRestrictionProvider()):
> > >>>>>>> >>     # My DoFn logic...
> > >>>>>>> >>
> > >>>>>>> >> This is all fine, but something that I found a little odd is
> that the restriction provider gets replaced at runtime with a restriction
> tracker:
> > >>>>>>> >>
> > >>>>>>> >> class MyOwnLittleSDF(beam.DoFn):
> > >>>>>>> >>   def process(self, element,
> > >>>>>>> >>
>  restriction_tracker=MyOwnLittleRestrictionProvider()):
> > >>>>>>> >>     # This assert succeeds : )
> > >>>>>>> >>     assert not isinstance(restriction_tracker,
> > >>>>>>> >>                           MyOwnLittleRestrictionProvider)
> > >>>>>>> >>
> > >>>>>>> >> After thinking a little bit about it, I realized that the
> default argument simply allows us to inform the runner where to find the
> restriction provider; but that the thing that we need at runtime is NOT the
> restriction provider - but rather, the restriction tracker.
> > >>>>>>> >>
> > >>>>>>> >> A similar pattern occurs with state and timers, where the
> runner needs to know the sort of state, the coder for the values in that
> state (or the time domain for timers); but the runtime parameter is
> different[2]. For state and timers (and window, timestamp, pane, etc.) we
> provide a pattern where users give a default value that is clearly a
> placeholder: beam.DoFn.TimerParam, or beam.DoFn.StateParam.
> > >>>>>>> >
> > >>>>>>> >
> > >>>>>>> > This is the way (new) DoFn work for Python SDK. SDK (harness)
> identifies meanings of different (potential) arguments to a DoFn using
> pre-defined default values.
> > >>>>>>> >
> > >>>>>>> >>
> > >>>>>>> >>
> > >>>>>>> >> In this case, the API is fairly similar, but (at least in my
> imagination), it is much more clear about how the DoFnParam will be
> replaced with something else at runtime. A similar change could be done for
> SDF:
> > >>>>>>> >>
> > >>>>>>> >> class MyOwnLittleSDF(beam.DoFn):
> > >>>>>>> >>   MY_RESTRICTION = \
> > >>>>>>> >>
>  RestrictionSpec(provider=MyOwnLittleRestrictionProvider())
> > >>>>>>> >>
> > >>>>>>> >>   def process(
> > >>>>>>> >>       self, element,
> > >>>>>>> >>
>  restriction_tracker=beam.DoFn.RestrictionParam(MY_RESTRICTION)):
> > >>>>>>> >>     # My DoFn logic..
> > >>>>>>> >
> > >>>>>>> >
> > >>>>>>> >
> > >>>>>>> > If I understood correctly, what you propose is similar to the
> existing solution but we add a XXXParam parameter for consistency ?
> > >>>>>>> > I think this is fine and should be a relatively small change.
> Main point is, SDK should be able to find out the RestrictionProvider class
> to utilize it's methods before passing elements to DoFn.process() method:
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L241
> > >>>>>>> >
> > >>>>>>> >
> > >>>>>>> >>
> > >>>>>>> >>
> > >>>>>>> >> Perhaps it is a good opportunity to consider this, since SDF
> is still in progress.
> > >>>>>>> >>
> > >>>>>>> >> Some pros:
> > >>>>>>> >> - Consistent with other parameters that we pass to DoFn
> methods
> > >>>>>>> >> - A bit more clear about what will happen at runtime
> > >>>>>>> >>
> > >>>>>>> >> Some cons:
> > >>>>>>> >> - SDF developers are "power users", and will have gone
> through the SDF documentation. This point will be clear to them.
> > >>>>>>> >> - This may create unnecessary work, and perhaps unintended
> consequences.
> > >>>>>>> >> - I bet there's more
> > >>>>>>> >>
> > >>>>>>> >> Thoughts?
> > >>>>>>> >>
> > >>>>>>> >> -P.
> > >>>>>>> >>
> > >>>>>>> >> [1] https://github.com/apache/beam/pull/8338
> > >>>>>>> >> [2]
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/userstate_test.py#L560-L586
> .
> > >>>>>>> >>
> > >>>>>>> >>
> > >>>>>>> >>
>

Reply via email to