Is there a JIRA already to track this?

On Fri, Jun 14, 2019 at 11:52 PM Ismaël Mejía <ieme...@gmail.com> wrote:
>
> +1 for the change for Java too both for consistency with Python and
> with the way State/Timers work too.
>
> It would be really nice if possible to see a concrete proposed example
> (or even better design doc).
>
> Thanks for bringing this idea Pablo and sorry for delayed answer.
>
> On Wed, Jun 5, 2019 at 8:44 PM Pablo Estrada <pabl...@google.com> wrote:
> >
> > I have no objections.
> >
> > +Ismaël Mejía who has familiarity and interest in Java SDF.
> >
> > On Wed, Jun 5, 2019 at 11:31 AM Brian Hulette <bhule...@google.com> wrote:
> >>
> >> Just wanted to resurrect this to say that it seems appropriate to make the 
> >> same change in Java. All the same arguments apply there, and now there's 
> >> the additional argument for maintaining symmetry with Python.
> >>
> >> I think BEAM-7250 should be changed to a ticket to actually implement this 
> >> in Java unless someone has an objection.
> >>
> >> Brian
> >>
> >> On Wed, May 8, 2019 at 2:20 PM Pablo Estrada <pabl...@google.com> wrote:
> >>>
> >>> Hello all,
> >>> The API has been updated for Python (See 
> >>> https://github.com/apache/beam/pull/8430). Please, if you catch any 
> >>> documentation that needs updating, flag to me or just propose the change 
> >>> : )
> >>>
> >>> As for Java - we didn't end up determining whether it makes sense to 
> >>> update the API as well. Thoughts from others?
> >>>
> >>> In any case, I've filed https://jira.apache.org/jira/browse/BEAM-7250 to 
> >>> track this for Java.
> >>>
> >>> Best
> >>> -P.
> >>>
> >>> On Mon, Apr 29, 2019 at 2:41 PM Lukasz Cwik <lc...@google.com> wrote:
> >>>>
> >>>> Pablo, all the SplittableDoFn stuff is marked as @Experimental so one is 
> >>>> able to change it. There really is only one complicated one to change in 
> >>>> Watch.java, the rest are quite straightforward.
> >>>>
> >>>> On Mon, Apr 29, 2019 at 2:23 PM Pablo Estrada <pabl...@google.com> wrote:
> >>>>>
> >>>>> Thanks all,
> >>>>>  @Luke - I imagine that would be an improvement to the API, but this 
> >>>>> may be harder as this is already available to users, and there are 
> >>>>> those who have implemented SDFs under the current API. Would it be 
> >>>>> possible to make a backwards-compatible change to the API here?
> >>>>>
> >>>>> For the Python changes, I've proposed a pull request: 
> >>>>> https://github.com/apache/beam/pull/8430 - it was smaller than I 
> >>>>> thought : ) - All comments welcome please.
> >>>>>
> >>>>> +Boyuan Zhang I am happy to wait for your SyntheticSource PR to be 
> >>>>> merged and make the appropriate changes if you'd like.
> >>>>> Best
> >>>>> -P.
> >>>>>
> >>>>> On Mon, Apr 29, 2019 at 8:23 AM Lukasz Cwik <lc...@google.com> wrote:
> >>>>>>
> >>>>>> Would it make sense to also do this in the Java SDK?
> >>>>>>
> >>>>>> The would make the restriction provider also mirror the TimerSpec and 
> >>>>>> StateSpec which use annotations similar to how its done in Python.
> >>>>>>
> >>>>>> On Mon, Apr 29, 2019 at 3:42 AM Robert Bradshaw <rober...@google.com> 
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> +1 to introducing this Param for consistency (and making the
> >>>>>>> substitution more obvious), and I think SDF is still new/experimental
> >>>>>>> enough we can do this. I don't know if we need Spec in addition to
> >>>>>>> Param and Provider.
> >>>>>>>
> >>>>>>> On Sat, Apr 27, 2019 at 1:07 AM Chamikara Jayalath 
> >>>>>>> <chamik...@google.com> wrote:
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >
> >>>>>>> > On Fri, Apr 26, 2019 at 3:43 PM Pablo Estrada <pabl...@google.com> 
> >>>>>>> > wrote:
> >>>>>>> >>
> >>>>>>> >> Hi all,
> >>>>>>> >> Sorry about the wall of text.
> >>>>>>> >> So, first of all, I thought about this while reviewing a PR by 
> >>>>>>> >> Boyuan with an example of an SDF[1]. This is very exciting btw : ).
> >>>>>>> >>
> >>>>>>> >> Anyway... I certainly have a limited view of the whole SDF effort, 
> >>>>>>> >> but I think it's worth discussing this particular point about the 
> >>>>>>> >> API before finalizing SDF and making it widely available. So here 
> >>>>>>> >> I go:
> >>>>>>> >>
> >>>>>>> >> The Python API for SDF asks users to provide a restriction 
> >>>>>>> >> provider in their process function signature. More or less the 
> >>>>>>> >> following:
> >>>>>>> >>
> >>>>>>> >> class MyOwnLittleSDF(beam.DoFn):
> >>>>>>> >>   def process(self, element,
> >>>>>>> >>               
> >>>>>>> >> restriction_tracker=MyOwnLittleRestrictionProvider()):
> >>>>>>> >>     # My DoFn logic...
> >>>>>>> >>
> >>>>>>> >> This is all fine, but something that I found a little odd is that 
> >>>>>>> >> the restriction provider gets replaced at runtime with a 
> >>>>>>> >> restriction tracker:
> >>>>>>> >>
> >>>>>>> >> class MyOwnLittleSDF(beam.DoFn):
> >>>>>>> >>   def process(self, element,
> >>>>>>> >>               
> >>>>>>> >> restriction_tracker=MyOwnLittleRestrictionProvider()):
> >>>>>>> >>     # This assert succeeds : )
> >>>>>>> >>     assert not isinstance(restriction_tracker,
> >>>>>>> >>                           MyOwnLittleRestrictionProvider)
> >>>>>>> >>
> >>>>>>> >> After thinking a little bit about it, I realized that the default 
> >>>>>>> >> argument simply allows us to inform the runner where to find the 
> >>>>>>> >> restriction provider; but that the thing that we need at runtime 
> >>>>>>> >> is NOT the restriction provider - but rather, the restriction 
> >>>>>>> >> tracker.
> >>>>>>> >>
> >>>>>>> >> A similar pattern occurs with state and timers, where the runner 
> >>>>>>> >> needs to know the sort of state, the coder for the values in that 
> >>>>>>> >> state (or the time domain for timers); but the runtime parameter 
> >>>>>>> >> is different[2]. For state and timers (and window, timestamp, 
> >>>>>>> >> pane, etc.) we provide a pattern where users give a default value 
> >>>>>>> >> that is clearly a placeholder: beam.DoFn.TimerParam, or 
> >>>>>>> >> beam.DoFn.StateParam.
> >>>>>>> >
> >>>>>>> >
> >>>>>>> > This is the way (new) DoFn work for Python SDK. SDK (harness) 
> >>>>>>> > identifies meanings of different (potential) arguments to a DoFn 
> >>>>>>> > using pre-defined default values.
> >>>>>>> >
> >>>>>>> >>
> >>>>>>> >>
> >>>>>>> >> In this case, the API is fairly similar, but (at least in my 
> >>>>>>> >> imagination), it is much more clear about how the DoFnParam will 
> >>>>>>> >> be replaced with something else at runtime. A similar change could 
> >>>>>>> >> be done for SDF:
> >>>>>>> >>
> >>>>>>> >> class MyOwnLittleSDF(beam.DoFn):
> >>>>>>> >>   MY_RESTRICTION = \
> >>>>>>> >>       RestrictionSpec(provider=MyOwnLittleRestrictionProvider())
> >>>>>>> >>
> >>>>>>> >>   def process(
> >>>>>>> >>       self, element,
> >>>>>>> >>       
> >>>>>>> >> restriction_tracker=beam.DoFn.RestrictionParam(MY_RESTRICTION)):
> >>>>>>> >>     # My DoFn logic..
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >
> >>>>>>> > If I understood correctly, what you propose is similar to the 
> >>>>>>> > existing solution but we add a XXXParam parameter for consistency ?
> >>>>>>> > I think this is fine and should be a relatively small change. Main 
> >>>>>>> > point is, SDK should be able to find out the RestrictionProvider 
> >>>>>>> > class to utilize it's methods before passing elements to 
> >>>>>>> > DoFn.process() method: 
> >>>>>>> > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L241
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >>
> >>>>>>> >>
> >>>>>>> >> Perhaps it is a good opportunity to consider this, since SDF is 
> >>>>>>> >> still in progress.
> >>>>>>> >>
> >>>>>>> >> Some pros:
> >>>>>>> >> - Consistent with other parameters that we pass to DoFn methods
> >>>>>>> >> - A bit more clear about what will happen at runtime
> >>>>>>> >>
> >>>>>>> >> Some cons:
> >>>>>>> >> - SDF developers are "power users", and will have gone through the 
> >>>>>>> >> SDF documentation. This point will be clear to them.
> >>>>>>> >> - This may create unnecessary work, and perhaps unintended 
> >>>>>>> >> consequences.
> >>>>>>> >> - I bet there's more
> >>>>>>> >>
> >>>>>>> >> Thoughts?
> >>>>>>> >>
> >>>>>>> >> -P.
> >>>>>>> >>
> >>>>>>> >> [1] https://github.com/apache/beam/pull/8338
> >>>>>>> >> [2] 
> >>>>>>> >> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/userstate_test.py#L560-L586
> >>>>>>> >>  .
> >>>>>>> >>
> >>>>>>> >>
> >>>>>>> >>

Reply via email to