Jira for Java: https://jira.apache.org/jira/browse/BEAM-7250 Jira for Python........ no : / I just jumped right in to make the change: https://github.com/apache/beam/pull/8430
On Thu, Jul 18, 2019 at 1:59 AM Ismaël Mejía <ieme...@gmail.com> wrote: > Is there a JIRA already to track this? > > On Fri, Jun 14, 2019 at 11:52 PM Ismaël Mejía <ieme...@gmail.com> wrote: > > > > +1 for the change for Java too both for consistency with Python and > > with the way State/Timers work too. > > > > It would be really nice if possible to see a concrete proposed example > > (or even better design doc). > > > > Thanks for bringing this idea Pablo and sorry for delayed answer. > > > > On Wed, Jun 5, 2019 at 8:44 PM Pablo Estrada <pabl...@google.com> wrote: > > > > > > I have no objections. > > > > > > +Ismaël Mejía who has familiarity and interest in Java SDF. > > > > > > On Wed, Jun 5, 2019 at 11:31 AM Brian Hulette <bhule...@google.com> > wrote: > > >> > > >> Just wanted to resurrect this to say that it seems appropriate to > make the same change in Java. All the same arguments apply there, and now > there's the additional argument for maintaining symmetry with Python. > > >> > > >> I think BEAM-7250 should be changed to a ticket to actually implement > this in Java unless someone has an objection. > > >> > > >> Brian > > >> > > >> On Wed, May 8, 2019 at 2:20 PM Pablo Estrada <pabl...@google.com> > wrote: > > >>> > > >>> Hello all, > > >>> The API has been updated for Python (See > https://github.com/apache/beam/pull/8430). Please, if you catch any > documentation that needs updating, flag to me or just propose the change : ) > > >>> > > >>> As for Java - we didn't end up determining whether it makes sense to > update the API as well. Thoughts from others? > > >>> > > >>> In any case, I've filed > https://jira.apache.org/jira/browse/BEAM-7250 to track this for Java. > > >>> > > >>> Best > > >>> -P. > > >>> > > >>> On Mon, Apr 29, 2019 at 2:41 PM Lukasz Cwik <lc...@google.com> > wrote: > > >>>> > > >>>> Pablo, all the SplittableDoFn stuff is marked as @Experimental so > one is able to change it. There really is only one complicated one to > change in Watch.java, the rest are quite straightforward. > > >>>> > > >>>> On Mon, Apr 29, 2019 at 2:23 PM Pablo Estrada <pabl...@google.com> > wrote: > > >>>>> > > >>>>> Thanks all, > > >>>>> @Luke - I imagine that would be an improvement to the API, but > this may be harder as this is already available to users, and there are > those who have implemented SDFs under the current API. Would it be possible > to make a backwards-compatible change to the API here? > > >>>>> > > >>>>> For the Python changes, I've proposed a pull request: > https://github.com/apache/beam/pull/8430 - it was smaller than I thought > : ) - All comments welcome please. > > >>>>> > > >>>>> +Boyuan Zhang I am happy to wait for your SyntheticSource PR to be > merged and make the appropriate changes if you'd like. > > >>>>> Best > > >>>>> -P. > > >>>>> > > >>>>> On Mon, Apr 29, 2019 at 8:23 AM Lukasz Cwik <lc...@google.com> > wrote: > > >>>>>> > > >>>>>> Would it make sense to also do this in the Java SDK? > > >>>>>> > > >>>>>> The would make the restriction provider also mirror the TimerSpec > and StateSpec which use annotations similar to how its done in Python. > > >>>>>> > > >>>>>> On Mon, Apr 29, 2019 at 3:42 AM Robert Bradshaw < > rober...@google.com> wrote: > > >>>>>>> > > >>>>>>> +1 to introducing this Param for consistency (and making the > > >>>>>>> substitution more obvious), and I think SDF is still > new/experimental > > >>>>>>> enough we can do this. I don't know if we need Spec in addition > to > > >>>>>>> Param and Provider. > > >>>>>>> > > >>>>>>> On Sat, Apr 27, 2019 at 1:07 AM Chamikara Jayalath < > chamik...@google.com> wrote: > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > On Fri, Apr 26, 2019 at 3:43 PM Pablo Estrada < > pabl...@google.com> wrote: > > >>>>>>> >> > > >>>>>>> >> Hi all, > > >>>>>>> >> Sorry about the wall of text. > > >>>>>>> >> So, first of all, I thought about this while reviewing a PR > by Boyuan with an example of an SDF[1]. This is very exciting btw : ). > > >>>>>>> >> > > >>>>>>> >> Anyway... I certainly have a limited view of the whole SDF > effort, but I think it's worth discussing this particular point about the > API before finalizing SDF and making it widely available. So here I go: > > >>>>>>> >> > > >>>>>>> >> The Python API for SDF asks users to provide a restriction > provider in their process function signature. More or less the following: > > >>>>>>> >> > > >>>>>>> >> class MyOwnLittleSDF(beam.DoFn): > > >>>>>>> >> def process(self, element, > > >>>>>>> >> > restriction_tracker=MyOwnLittleRestrictionProvider()): > > >>>>>>> >> # My DoFn logic... > > >>>>>>> >> > > >>>>>>> >> This is all fine, but something that I found a little odd is > that the restriction provider gets replaced at runtime with a restriction > tracker: > > >>>>>>> >> > > >>>>>>> >> class MyOwnLittleSDF(beam.DoFn): > > >>>>>>> >> def process(self, element, > > >>>>>>> >> > restriction_tracker=MyOwnLittleRestrictionProvider()): > > >>>>>>> >> # This assert succeeds : ) > > >>>>>>> >> assert not isinstance(restriction_tracker, > > >>>>>>> >> MyOwnLittleRestrictionProvider) > > >>>>>>> >> > > >>>>>>> >> After thinking a little bit about it, I realized that the > default argument simply allows us to inform the runner where to find the > restriction provider; but that the thing that we need at runtime is NOT the > restriction provider - but rather, the restriction tracker. > > >>>>>>> >> > > >>>>>>> >> A similar pattern occurs with state and timers, where the > runner needs to know the sort of state, the coder for the values in that > state (or the time domain for timers); but the runtime parameter is > different[2]. For state and timers (and window, timestamp, pane, etc.) we > provide a pattern where users give a default value that is clearly a > placeholder: beam.DoFn.TimerParam, or beam.DoFn.StateParam. > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > This is the way (new) DoFn work for Python SDK. SDK (harness) > identifies meanings of different (potential) arguments to a DoFn using > pre-defined default values. > > >>>>>>> > > > >>>>>>> >> > > >>>>>>> >> > > >>>>>>> >> In this case, the API is fairly similar, but (at least in my > imagination), it is much more clear about how the DoFnParam will be > replaced with something else at runtime. A similar change could be done for > SDF: > > >>>>>>> >> > > >>>>>>> >> class MyOwnLittleSDF(beam.DoFn): > > >>>>>>> >> MY_RESTRICTION = \ > > >>>>>>> >> > RestrictionSpec(provider=MyOwnLittleRestrictionProvider()) > > >>>>>>> >> > > >>>>>>> >> def process( > > >>>>>>> >> self, element, > > >>>>>>> >> > restriction_tracker=beam.DoFn.RestrictionParam(MY_RESTRICTION)): > > >>>>>>> >> # My DoFn logic.. > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > If I understood correctly, what you propose is similar to the > existing solution but we add a XXXParam parameter for consistency ? > > >>>>>>> > I think this is fine and should be a relatively small change. > Main point is, SDK should be able to find out the RestrictionProvider class > to utilize it's methods before passing elements to DoFn.process() method: > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L241 > > >>>>>>> > > > >>>>>>> > > > >>>>>>> >> > > >>>>>>> >> > > >>>>>>> >> Perhaps it is a good opportunity to consider this, since SDF > is still in progress. > > >>>>>>> >> > > >>>>>>> >> Some pros: > > >>>>>>> >> - Consistent with other parameters that we pass to DoFn > methods > > >>>>>>> >> - A bit more clear about what will happen at runtime > > >>>>>>> >> > > >>>>>>> >> Some cons: > > >>>>>>> >> - SDF developers are "power users", and will have gone > through the SDF documentation. This point will be clear to them. > > >>>>>>> >> - This may create unnecessary work, and perhaps unintended > consequences. > > >>>>>>> >> - I bet there's more > > >>>>>>> >> > > >>>>>>> >> Thoughts? > > >>>>>>> >> > > >>>>>>> >> -P. > > >>>>>>> >> > > >>>>>>> >> [1] https://github.com/apache/beam/pull/8338 > > >>>>>>> >> [2] > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/userstate_test.py#L560-L586 > . > > >>>>>>> >> > > >>>>>>> >> > > >>>>>>> >> >