Hi Shannon,

This is probably a good starting point:
https://github.com/apache/beam/blob/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5/sdks/python/apache_beam/transforms/combiners.py#L68
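
As a rough sketch of the shape a missing aggregation such as Min could take: the class below follows the four-method protocol of apache_beam.CombineFn (create_accumulator, add_input, merge_accumulators, extract_output). It is not taken from the Beam source. MinFn and run_locally are made-up names for illustration, and Beam is deliberately not imported so the snippet runs on its own. In a real pipeline the class would subclass apache_beam.CombineFn instead.

```python
# Plain-Python sketch of the four-method combiner protocol used by
# apache_beam.CombineFn. Assumption: in a real pipeline this class would
# subclass apache_beam.CombineFn; Beam is not imported here.


class MinFn:
    """Hypothetical Min combiner: the accumulator is the smallest value seen, or None."""

    def create_accumulator(self):
        # None means "no element seen yet".
        return None

    def add_input(self, accumulator, element):
        # Fold one element into the running minimum.
        if accumulator is None or element < accumulator:
            return element
        return accumulator

    def merge_accumulators(self, accumulators):
        # Combine partial minima computed on separate bundles, skipping empty ones.
        result = None
        for acc in accumulators:
            if acc is not None and (result is None or acc < result):
                result = acc
        return result

    def extract_output(self, accumulator):
        # The accumulator already is the final answer.
        return accumulator


def run_locally(combine_fn, bundles):
    """Hypothetical helper: accumulate each bundle separately, then merge the partials."""
    partials = []
    for bundle in bundles:
        acc = combine_fn.create_accumulator()
        for element in bundle:
            acc = combine_fn.add_input(acc, element)
        partials.append(acc)
    return combine_fn.extract_output(combine_fn.merge_accumulators(partials))
```

With the Beam base class in place, an instance of such a class would typically be passed to beam.CombineGlobally or beam.CombinePerKey. Sum and Max would follow the same pattern with a different accumulator update.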

Frederik



Frederik Bode

ML6 Ghent
+32 4 92 78 96 18




On Mon, 8 Jul 2019 at 15:40, Shannon Duncan <[email protected]>
wrote:

> I'm sure I could use some of the existing aggregations as a guide for
> writing the missing ones, such as Sum/Max/Min.
>
> GroupBy is really already handled with GroupByKey and CoGroupByKey unless
> you are thinking of a different type of GroupBy?
>
> - Shannon
>
> On Sun, Jul 7, 2019 at 10:47 PM Rui Wang <[email protected]> wrote:
>
>> Maybe also adding Aggregation/GroupBy as utilities?
>>
>>
>> -Rui
>>
>> On Sun, Jul 7, 2019 at 1:46 PM Shannon Duncan <[email protected]>
>> wrote:
>>
>>> Thanks Valentyn,
>>>
>>> I'll outline the utilities and accept any suggestions to add / modify.
>>> These are really just shortcut PTransforms that I am working on to simplify
>>> creating pipelines.
>>>
>>> Currently the utilities contain the following PTransforms:
>>>
>>> - Inner Join
>>> - Left Outer Join
>>> - Right Outer Join
>>> - Full Outer Join
>>> - PrepareKey (selects items in a dictionary to act as the key for
>>> the joins)
>>> - Select (a very simple filter that returns only the items you want from
>>> the dictionary; allows defining a default nullValue)
>>>
>>> Currently these operations only work with dictionaries, but I'd be
>>> interested to see how it would work for <K,V> tuples.
>>>
>>> I'm new to Python, so these may not be optimized or idiomatic, but from
>>> my understanding this seems to be the best way to do these types of
>>> operations. Essentially, I created a pipeline that converts a simple
>>> SQL query into a flow of these utilities. Using PrepareKey to define your
>>> joining key, joining, and then selecting from the join allows you to do a
>>> lot of powerful manipulation in a simple, familiar way.
>>>
>>> If this is something that we'd like to add to the Beam SDK I don't mind
>>> looking at the contributor license agreement, and conversing more on how to
>>> get them in.
>>>
>>> Thanks,
>>> Shannon
>>>
>>>
>>>
>>> On Wed, Jul 3, 2019 at 5:16 PM Valentyn Tymofieiev <[email protected]>
>>> wrote:
>>>
>>>> Hi Shannon,
>>>>
>>>> Thanks for considering a contribution to the Beam Python SDK. With a direct
>>>> contribution to the Beam SDK, your change will reach a larger audience of users,
>>>> and you will not have to maintain a separate project and keep it up to date
>>>> with new releases of Beam.
>>>>
>>>> I encourage you to take a look at https://beam.apache.org/contribute/ for
>>>> general advice on how to get started. To echo some points mentioned in the
>>>> guide:
>>>>
>>>> - If your change is large or it is your first change, it is a good idea
>>>> to discuss it on the dev@ mailing list
>>>> - For large changes create a design doc (template, examples) and email
>>>> it to the dev@ mailing list.
>>>>
>>>> Thanks,
>>>> Valentyn
>>>>
>>>> On Wed, Jul 3, 2019 at 3:04 PM Shannon Duncan <
>>>> [email protected]> wrote:
>>>>
>>>>> I have been writing a bunch of utilities for the Python SDK, such as
>>>>> joins, selections, composite transforms, etc.
>>>>>
>>>>> I am working with my company to see if I can open source the
>>>>> utilities. Would it be best to post them as a separate PyPI project, or to
>>>>> PR them into the Beam SDK? I assume that if they let me open source it, they
>>>>> will want some attribution or something like that.
>>>>>
>>>>> Thanks,
>>>>> Shannon
>>>>>
>>>>
