Maybe also add Aggregation/GroupBy as utilities?
-Rui

On Sun, Jul 7, 2019 at 1:46 PM Shannon Duncan <[email protected]> wrote:

> Thanks Valentyn,
>
> I'll outline the utilities and accept any suggestions to add / modify.
> These are really just shortcut PTransforms that I am working on to simplify
> creating pipelines.
>
> Currently the utilities contain the following PTransforms:
>
> - Inner Join
> - Left Outer Join
> - Right Outer Join
> - Full Outer Join
> - PrepareKey (for selecting items in a dictionary to act as a key for the
>   joins)
> - Select (very simple filter that returns only the items you want from the
>   dictionary; allows for defining a default nullValue)
>
> Currently these operations only work with dictionaries, but I'd be
> interested to see how they would work for <K,V> tuples.
>
> I'm new to Python, so they may not be optimized or the best way, but from
> my understanding these seem to be the best way to do these types of
> operations. Essentially I created a pipeline that converts a simple SQL
> query into a flow of these utilities. Using PrepareKey to define your
> joining key, joining, and then selecting from the join allows you to do a
> lot of powerful manipulation in a simple / familiar way.
>
> If this is something that we'd like to add to the Beam SDK, I don't mind
> looking at the contributor license agreement and conversing more on how to
> get them in.
>
> Thanks,
> Shannon
>
> On Wed, Jul 3, 2019 at 5:16 PM Valentyn Tymofieiev <[email protected]>
> wrote:
>
>> Hi Shannon,
>>
>> Thanks for considering a contribution to the Beam Python SDK. With a
>> direct contribution to the Beam SDK, your change will reach a larger
>> audience of users, and you will not have to maintain a separate project
>> and keep it up to date with new releases of Beam.
>>
>> I encourage you to take a look at https://beam.apache.org/contribute/ for
>> general advice on how to get started. To echo some points mentioned in the
>> guide:
>>
>> - If your change is large or it is your first change, it is a good idea
>>   to discuss it on the dev@ mailing list.
>> - For large changes, create a design doc (template, examples) and email it
>>   to the dev@ mailing list.
>>
>> Thanks,
>> Valentyn
>>
>> On Wed, Jul 3, 2019 at 3:04 PM Shannon Duncan <[email protected]>
>> wrote:
>>
>>> I have been writing a bunch of utilities for the Python SDK, such as
>>> joins, selections, composite transforms, etc.
>>>
>>> I am working with my company to see if I can open source the utilities.
>>> Would it be best to post them as a separate PyPI project, or to PR them
>>> into the Beam SDK? I assume if they let me open source it, they will want
>>> some attribution or something like that.
>>>
>>> Thanks,
>>> Shannon
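The PrepareKey, join, then Select flow described in the thread can be illustrated with a plain-Python sketch of the semantics over lists of dicts. This is not Shannon's code and not a Beam API: the names `prepare_key`, `join`, and `select` and the `how` and `null_value` parameters are hypothetical. In an actual Beam pipeline the grouping step would typically be done with `beam.CoGroupByKey`; here a `defaultdict` stands in for it so the example runs without Beam.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, Any]
Keyed = Tuple[Tuple[Any, ...], Row]


def prepare_key(rows: Iterable[Row], fields: Sequence[str]) -> List[Keyed]:
    """Hypothetical PrepareKey: pair each dict with a tuple of its key fields."""
    return [(tuple(row.get(f) for f in fields), row) for row in rows]


def join(left: Iterable[Keyed], right: Iterable[Keyed], how: str = "inner") -> List[Row]:
    """Hypothetical join over keyed dicts; `how` is inner, left, right or full.

    Both sides are grouped by key (the role CoGroupByKey plays in Beam), then
    every left/right pair sharing a key is merged into one dict. On a field
    name clash, the right-hand value wins.
    """
    if how not in ("inner", "left", "right", "full"):
        raise ValueError(f"unknown join type: {how}")
    groups: Dict[Any, Tuple[List[Row], List[Row]]] = defaultdict(lambda: ([], []))
    for key, row in left:
        groups[key][0].append(row)
    for key, row in right:
        groups[key][1].append(row)

    out: List[Row] = []
    for lrows, rrows in groups.values():
        # Drop keys missing from a side that this join type requires.
        if not lrows and how in ("inner", "left"):
            continue
        if not rrows and how in ("inner", "right"):
            continue
        # An empty side contributes a single empty dict (outer-join padding).
        for lrow in lrows or [{}]:
            for rrow in rrows or [{}]:
                out.append({**lrow, **rrow})
    return out


def select(rows: Iterable[Row], fields: Sequence[str], null_value: Any = None) -> List[Row]:
    """Hypothetical Select: keep only `fields`, filling absent ones with null_value."""
    return [{f: row.get(f, null_value) for f in fields} for row in rows]
```

For example, a left join of users to orders followed by a Select that fills missing totals with 0:

```python
users = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}]
orders = [{"user_id": 1, "total": 30}, {"user_id": 3, "total": 5}]
rows = join(prepare_key(users, ["id"]), prepare_key(orders, ["user_id"]), "left")
print(select(rows, ["name", "total"], null_value=0))
# [{'name': 'Ann', 'total': 30}, {'name': 'Bo', 'total': 0}]
```

Padding the missing side with an empty dict is what lets Select's default value stand in for SQL's NULL in the outer joins.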
