I'm sure I could use some of the existing aggregations as a guide for writing the ones that are missing, such as Sum/Max/Min.
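For illustration, here is a minimal sketch of what a Max aggregation could look like. It is written in plain Python and deliberately does not import apache_beam, so it runs standalone. It only mirrors the four-method accumulator shape that Beam's CombineFn defines (create_accumulator, add_input, merge_accumulators, extract_output). The names MaxFn and run_combine are hypothetical and are not part of the utilities discussed in this thread.

```python
class MaxFn:
    """Plain-Python stand-in with the four methods Beam's CombineFn defines."""

    def create_accumulator(self):
        # None means "no element seen yet", so empty input has a defined result.
        return None

    def add_input(self, accumulator, element):
        # Fold one element into the running maximum.
        if accumulator is None:
            return element
        return max(accumulator, element)

    def merge_accumulators(self, accumulators):
        # Combine partial maxima computed independently (e.g. per bundle).
        seen = [a for a in accumulators if a is not None]
        return max(seen) if seen else None

    def extract_output(self, accumulator):
        # The accumulator already is the final answer.
        return accumulator


def run_combine(fn, bundles):
    """Hypothetical local driver: combine each bundle, then merge the partials."""
    partials = []
    for bundle in bundles:
        acc = fn.create_accumulator()
        for element in bundle:
            acc = fn.add_input(acc, element)
        partials.append(acc)
    return fn.extract_output(fn.merge_accumulators(partials))


largest = run_combine(MaxFn(), [[3, 9], [], [4]])
```

In an actual pipeline, a class like this would presumably subclass beam.CombineFn and be passed to beam.CombineGlobally or beam.CombinePerKey. Sum and Min would follow the same pattern with a different fold.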
GroupBy is really already handled with GroupByKey and CoGroupByKey unless you are thinking of a different type of GroupBy?

- Shannon

On Sun, Jul 7, 2019 at 10:47 PM Rui Wang <[email protected]> wrote:

> Maybe also adding Aggregation/GroupBy as utilities?
>
>
> -Rui
>
> On Sun, Jul 7, 2019 at 1:46 PM Shannon Duncan <[email protected]>
> wrote:
>
>> Thanks Valentyn,
>>
>> I'll outline the utilities and accept any suggestions to add / modify.
>> These are really just shortcut PTransforms that I am working on to simplify
>> creating pipelines.
>>
>> Currently the utilities contain the following PTransforms:
>>
>> - Inner Join
>> - Left Outer Join
>> - Right Outer Join
>> - Full Outer Join
>> - PrepareKey (For selecting items in a dictionary to act as a key for the
>> joins)
>> - Select (very simple filter that returns only items you want from the
>> dictionary) (allows for defining a default nullValue)
>>
>> Currently these operations only work with dictionaries, but I'd be
>> interested to see how it would work for <K,V> tuples.
>>
>> I'm new to python so they may not be optimized or the best way, but from
>> my understanding these seem to be the best way to do these types of
>> operations. Essentially I created a pipeline to be able to convert a simple
>> sql query into a flow of these utilities. Using prepareKey to define your
>> joining key, joining, and then selecting from the join allows you to do a
>> lot of powerful manipulation in a simple / familiar way.
>>
>> If this is something that we'd like to add to the Beam SDK I don't mind
>> looking at the contributor license agreement, and conversing more on how to
>> get them in.
>>
>> Thanks,
>> Shannon
>>
>>
>>
>> On Wed, Jul 3, 2019 at 5:16 PM Valentyn Tymofieiev <[email protected]>
>> wrote:
>>
>>> Hi Shannon,
>>>
>>> Thanks for considering a contribution to Beam Python SDK. With a direct
>>> contribution to Beam SDK, your change will reach larger audience of users,
>>> and you will not have to maintain a separate project and keep it up to date
>>> with new releases of Beam.
>>>
>>> I encourage you to take a look at https://beam.apache.org/contribute/ for
>>> general advice on how to get started. To echo some points mentioned in the
>>> guide:
>>>
>>> - If your change is large or it is your first change, it is a good idea
>>> to discuss it on the dev@ mailing list
>>> - For large changes create a design doc (template, examples) and email
>>> it to the dev@ mailing list.
>>>
>>> Thanks,
>>> Valentyn
>>>
>>> On Wed, Jul 3, 2019 at 3:04 PM Shannon Duncan <
>>> [email protected]> wrote:
>>>
>>>> I have been writing a bunch of utilities for the python SDK such as
>>>> joins, selections, composite transforms, etc...
>>>>
>>>> I am working with my company to see if I can open source the utilities.
>>>> Would it be best to post them on a separate PyPi project, or to PR them
>>>> into the beam SDK? I assume if they let me open source it they will want
>>>> some attribution or something like that.
>>>>
>>>> Thanks,
>>>> Shannon
>>>>
>>>
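To make the "prepare key, join, then select" flow described in the thread concrete, here is a rough sketch in plain Python. It is not the code of the utilities discussed above, which were not posted. Every function name here (prepare_key, select, cogroup, inner_join, left_outer_join) is a hypothetical stand-in, and cogroup only imitates locally the grouped shape that CoGroupByKey produces for tagged inputs: (key, {'left': [...], 'right': [...]}).

```python
from collections import defaultdict


def prepare_key(row, fields):
    """Pair a dictionary row with a key built from the chosen fields."""
    return tuple(row[f] for f in fields), row


def select(row, fields, null_value=None):
    """Keep only the requested fields, filling absent ones with null_value."""
    return {f: row.get(f, null_value) for f in fields}


def cogroup(left, right):
    """Local stand-in for grouping two keyed collections by key."""
    grouped = defaultdict(lambda: {"left": [], "right": []})
    for key, row in left:
        grouped[key]["left"].append(row)
    for key, row in right:
        grouped[key]["right"].append(row)
    return list(grouped.items())


def inner_join(element):
    """Emit one merged row per (left, right) pair that shares the key."""
    _key, sides = element
    for l_row in sides["left"]:
        for r_row in sides["right"]:
            yield {**l_row, **r_row}


def left_outer_join(element):
    """Like inner_join, but keep left rows that have no match on the right."""
    _key, sides = element
    rights = list(sides["right"]) or [{}]
    for l_row in sides["left"]:
        for r_row in rights:
            yield {**l_row, **r_row}


# Roughly: SELECT name, total FROM users LEFT JOIN orders USING (id)
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
orders = [{"id": 1, "total": 30}]

left = [prepare_key(r, ["id"]) for r in users]
right = [prepare_key(r, ["id"]) for r in orders]

joined = [row for group in cogroup(left, right) for row in left_outer_join(group)]
result = [select(r, ["name", "total"], null_value=0) for r in joined]
```

In a Beam pipeline the same per-key functions could plausibly be applied with beam.FlatMap after passing a dict of keyed PCollections through beam.CoGroupByKey(), with prepare_key and select applied through beam.Map. Right and full outer joins would follow the same pattern by padding the other side.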
