As a follow-up, here is the repo that contains the utilities for now:
https://github.com/shadowcodex/apache-beam-utilities

I will put together a proper PR as the code gets closer to production quality.

- Shannon

On Mon, Jul 8, 2019 at 9:20 AM Shannon Duncan wrote:

> Thanks Frederik,
>
> That's exactly where I was looking. I did get permission to open source
> the utilities module. So I'm going to throw them up on my personal github
> soon and share with the email group for a look over.
>
> I'm going to work on the utilities there because it's a quick dev
> environment and then once they are ready for proper PR I'll begin working
> them into the actual SDK for a PR.
>
> I also joined the slack #beam and #beam-python channels, I was unsure of
> where most collaborators discussed items.
>
> - Shannon
>
> On Mon, Jul 8, 2019 at 9:09 AM Frederik Bode wrote:
>
>> Hi Shannon,
>>
>> This is probably a good starting point:
>> https://github.com/apache/beam/blob/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5/sdks/python/apache_beam/transforms/combiners.py#L68
>>
>> Frederik
>>
>> Frederik Bode
>> ML6 Ghent
>>
>> On Mon, 8 Jul 2019 at 15:40, Shannon Duncan wrote:
>>
>>> I'm sure I could use some of the existing aggregations as a guide on how
>>> to make aggregations to fill the gap of missing ones. Such as creating
>>> Sum/Max/Min.
>>>
>>> GroupBy is really already handled with GroupByKey and CoGroupByKey
>>> unless you are thinking of a different type of GroupBy?
>>>
>>> - Shannon
>>>
>>> On Sun, Jul 7, 2019 at 10:47 PM Rui Wang wrote:
>>>
>>>> Maybe also adding Aggregation/GroupBy as utilities?
>>>>
>>>> -Rui
>>>>
>>>> On Sun, Jul 7, 2019 at 1:46 PM Shannon Duncan wrote:
>>>>
>>>>> Thanks Valentyn,
>>>>>
>>>>> I'll outline the utilities and accept any suggestions to add / modify.
>>>>> These are really just shortcut PTransforms that I am working on to
>>>>> simplify creating pipelines.
>>>>>
>>>>> Currently the utilities contain the following PTransforms:
>>>>>
>>>>> - Inner Join
>>>>> - Left Outer Join
>>>>> - Right Outer Join
>>>>> - Full Outer Join
>>>>> - PrepareKey (For selecting items in a dictionary to act as a key for
>>>>>   the joins)
>>>>> - Select (very simple filter that returns only items you want from the
>>>>>   dictionary) (allows for defining a default nullValue)
>>>>>
>>>>> Currently these operations only work with dictionaries, but I'd be
>>>>> interested to see how it would work for <K,V> tuples.
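
To make the PrepareKey / join / Select flow described above concrete, here is a rough pure-Python sketch of the per-row logic on dictionaries. It is not the code from the repo: the names `prepare_key`, `join_rows`, and `select` and their parameters are hypothetical, and the in-memory grouping only stands in for what `CoGroupByKey` would do in a real pipeline, where the per-row functions would be applied with something like `beam.Map`.

```python
from collections import defaultdict


def prepare_key(row, key_fields):
    """Turn a dict row into a (key, row) pair; the key is a tuple of the chosen fields."""
    return tuple(row.get(field) for field in key_fields), row


def join_rows(left, right, how="inner"):
    """Join two lists of (key, dict) pairs; how is 'inner', 'left', 'right' or 'full'."""
    grouped = defaultdict(lambda: ([], []))
    for key, row in left:
        grouped[key][0].append(row)
    for key, row in right:
        grouped[key][1].append(row)

    joined = []
    for lefts, rights in grouped.values():
        # For outer joins, pad the missing side with one empty row.
        if not lefts and how in ("right", "full"):
            lefts = [{}]
        if not rights and how in ("left", "full"):
            rights = [{}]
        for left_row in lefts:
            for right_row in rights:
                # On a field-name clash the right-hand value wins.
                joined.append({**left_row, **right_row})
    return joined


def select(row, fields, null_value=None):
    """Keep only the requested fields, filling missing ones with null_value."""
    return {field: row.get(field, null_value) for field in fields}


users = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bo"}]
orders = [{"id": 1, "total": 10}]

left = [prepare_key(user, ["id"]) for user in users]
right = [prepare_key(order, ["id"]) for order in orders]

joined = join_rows(left, right, how="left")
result = [select(row, ["name", "total"], null_value=0) for row in joined]
print(result)  # [{'name': 'Ada', 'total': 10}, {'name': 'Bo', 'total': 0}]
```

This roughly mirrors `SELECT name, total FROM users LEFT JOIN orders USING (id)`, with `null_value` playing the role of the default for unmatched rows.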
>>>>>
>>>>> I'm new to python so they may not be optimized or the best way, but
>>>>> from my understanding these seem to be the best way to do these types of
>>>>> operations. Essentially I created a pipeline to be able to convert a
>>>>> simple sql query into a flow of these utilities. Using prepareKey to
>>>>> define your joining key, joining, and then selecting from the join
>>>>> allows you to do a lot of powerful manipulation in a simple / familiar
>>>>> way.
>>>>>
>>>>> If this is something that we'd like to add to the Beam SDK I don't
>>>>> mind looking at the contributor license agreement, and conversing more on
>>>>> how to get them in.
>>>>>
>>>>> Thanks,
>>>>> Shannon
>>>>>
>>>>> On Wed, Jul 3, 2019 at 5:16 PM Valentyn Tymofieiev wrote:
>>>>>
>>>>>> Hi Shannon,
>>>>>>
>>>>>> Thanks for considering a contribution to Beam Python SDK. With a
>>>>>> direct contribution to Beam SDK, your change will reach larger audience
>>>>>> of users, and you will not have to maintain a separate project and keep
>>>>>> it up to date with new releases of Beam.
>>>>>>
>>>>>> I encourage you to take a look at https://beam.apache.org/contribute/ for
>>>>>> general advice on how to get started. To echo some points mentioned in
>>>>>> the guide:
>>>>>>
>>>>>> - If your change is large or it is your first change, it is a good
>>>>>>   idea to discuss it on the dev@ mailing list
>>>>>> - For large changes create a design doc (template, examples) and
>>>>>>   email it to the dev@ mailing list.
>>>>>>
>>>>>> Thanks,
>>>>>> Valentyn
>>>>>>
>>>>>> On Wed, Jul 3, 2019 at 3:04 PM Shannon Duncan wrote:
>>>>>>
>>>>>>> I have been writing a bunch of utilities for the python SDK such as
>>>>>>> joins, selections, composite transforms, etc...
>>>>>>>
>>>>>>> I am working with my company to see if I can open source the
>>>>>>> utilities.
>>>>>>> Would it be best to post them on a separate PyPi project, or to
>>>>>>> PR them into the beam SDK? I assume if they let me open source it they
>>>>>>> will want some attribution or something like that.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Shannon
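
On the Sum/Max/Min aggregations raised earlier in the thread, a combiner along these lines might be one way to fill the gap. This is only a sketch under assumptions: the class `SumMinMaxFn` and the helper `run_locally` are hypothetical names, the class is plain Python shaped like Beam's `CombineFn` interface (`create_accumulator`, `add_input`, `merge_accumulators`, `extract_output`) rather than a subclass of it, and the helper only imitates how a runner might combine per-bundle accumulators.

```python
class SumMinMaxFn:
    """Plain-Python class following the CombineFn method names; not a Beam subclass."""

    def create_accumulator(self):
        # (running sum, smallest value seen, largest value seen)
        return (0, None, None)

    def add_input(self, accumulator, value):
        total, low, high = accumulator
        return (
            total + value,
            value if low is None else min(low, value),
            value if high is None else max(high, value),
        )

    def merge_accumulators(self, accumulators):
        merged = self.create_accumulator()
        for total, low, high in accumulators:
            if low is None:
                continue  # accumulator that never saw an input
            merged_total, merged_low, merged_high = merged
            merged = (
                merged_total + total,
                low if merged_low is None else min(merged_low, low),
                high if merged_high is None else max(merged_high, high),
            )
        return merged

    def extract_output(self, accumulator):
        total, low, high = accumulator
        return {"sum": total, "min": low, "max": high}


def run_locally(combine_fn, bundles):
    """Hypothetical stand-in for a runner: one accumulator per bundle, then a merge."""
    accumulators = []
    for bundle in bundles:
        accumulator = combine_fn.create_accumulator()
        for value in bundle:
            accumulator = combine_fn.add_input(accumulator, value)
        accumulators.append(accumulator)
    return combine_fn.extract_output(combine_fn.merge_accumulators(accumulators))


stats = run_locally(SumMinMaxFn(), [[3, 1], [], [7, 5]])
print(stats)  # {'sum': 16, 'min': 1, 'max': 7}
```

Keeping the accumulator as a small tuple means partial results from different workers can be merged in any order, which is the property a combiner needs.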
