Thanks Frederik,

That's exactly where I was looking. I did get permission to open source the
utilities module, so I'm going to put it up on my personal GitHub soon
and share it with the email group for a look over.

I'm going to work on the utilities there because it's a quick dev
environment, and once they are ready for a proper PR I'll begin working
them into the actual SDK.

I also joined the Slack #beam and #beam-python channels; I was unsure
where most collaborators discussed things.

- Shannon

On Mon, Jul 8, 2019 at 9:09 AM Frederik Bode <frederik.b...@ml6.eu> wrote:

> Hi Shannon,
>
> This is probably a good starting point:
> https://github.com/apache/beam/blob/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5/sdks/python/apache_beam/transforms/combiners.py#L68
> .
>
> Frederik
>
> Frederik Bode
>
> ML6 Ghent
> +32 4 92 78 96 18
>
> On Mon, 8 Jul 2019 at 15:40, Shannon Duncan <joseph.dun...@liveramp.com>
> wrote:
>
>> I'm sure I could use some of the existing aggregations as a guide for
>> writing the ones that are missing, such as Sum/Max/Min.
>>
>> GroupBy is really already handled with GroupByKey and CoGroupByKey unless
>> you are thinking of a different type of GroupBy?
>>
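To make the Sum/Max/Min idea concrete, here is a minimal sketch of what such an aggregation could look like. It is plain Python so it runs without Beam installed. The class only mirrors the four method names of Beam's `CombineFn` (`create_accumulator`, `add_input`, `merge_accumulators`, `extract_output`); the names `MinMaxSumFn` and `combine_per_key` are hypothetical and are not taken from the utilities discussed in this thread.

```python
class MinMaxSumFn:
    """Plain-Python stand-in that mirrors the four CombineFn method names."""

    def create_accumulator(self):
        # (min, max, sum); None marks "no input seen yet".
        return (None, None, 0)

    def add_input(self, acc, value):
        lo, hi, total = acc
        lo = value if lo is None else min(lo, value)
        hi = value if hi is None else max(hi, value)
        return (lo, hi, total + value)

    def merge_accumulators(self, accumulators):
        merged = self.create_accumulator()
        for lo, hi, total in accumulators:
            if lo is None:
                continue  # Empty accumulator, nothing to merge.
            m_lo, m_hi, m_total = merged
            merged = (
                lo if m_lo is None else min(m_lo, lo),
                hi if m_hi is None else max(m_hi, hi),
                m_total + total,
            )
        return merged

    def extract_output(self, acc):
        lo, hi, total = acc
        return {"min": lo, "max": hi, "sum": total}


def combine_per_key(pairs, fn):
    """Hypothetical local driver: fold (key, value) pairs with one accumulator per key."""
    accs = {}
    for key, value in pairs:
        accs[key] = fn.add_input(accs.get(key, fn.create_accumulator()), value)
    return {key: fn.extract_output(acc) for key, acc in accs.items()}
```

In a real pipeline the same four methods would live on a subclass of `apache_beam.CombineFn`, and the runner would call `merge_accumulators` to combine partial results from different workers. The local driver above never needs to merge, which is why that method is exercised separately.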
>> - Shannon
>>
>> On Sun, Jul 7, 2019 at 10:47 PM Rui Wang <ruw...@google.com> wrote:
>>
>>> Maybe also add Aggregation/GroupBy as utilities?
>>>
>>>
>>> -Rui
>>>
>>> On Sun, Jul 7, 2019 at 1:46 PM Shannon Duncan <
>>> joseph.dun...@liveramp.com> wrote:
>>>
>>>> Thanks Valentyn,
>>>>
>>>> I'll outline the utilities and accept any suggestions to add / modify.
>>>> These are really just shortcut PTransforms that I am working on to simplify
>>>> creating pipelines.
>>>>
>>>> Currently the utilities contain the following PTransforms:
>>>>
>>>> - Inner Join
>>>> - Left Outer Join
>>>> - Right Outer Join
>>>> - Full Outer Join
>>>> - PrepareKey (for selecting items in a dictionary to act as a key for
>>>> the joins)
>>>> - Select (a very simple filter that returns only the items you want from
>>>> the dictionary and allows for defining a default nullValue)
>>>>
>>>> Currently these operations only work with dictionaries, but I'd be
>>>> interested to see how it would work for <K,V> tuples.
>>>>
>>>> I'm new to Python, so they may not be optimized, but from my
>>>> understanding this seems to be the best way to do these types of
>>>> operations. Essentially, I created a pipeline that converts a simple
>>>> SQL query into a flow of these utilities. Using PrepareKey to define your
>>>> joining key, joining, and then selecting from the join lets you do a
>>>> lot of powerful manipulation in a simple, familiar way.
>>>>
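The PrepareKey, join, Select flow described above can be sketched roughly as follows. This is an illustration only, written in plain Python over lists of dictionaries so it runs without Beam. The actual utilities are not shown in this thread, so the function names (`prepare_key`, `join`, `select`), the `how` argument, and the `null_value` parameter are all assumptions.

```python
from collections import defaultdict


def prepare_key(records, key_fields):
    """Pair each dict with a tuple key built from the chosen fields."""
    return [(tuple(rec.get(f) for f in key_fields), rec) for rec in records]


def join(left, right, how="inner"):
    """Join two lists of (key, dict) pairs; how is inner, left, right, or full."""
    # Group both sides by key, similar in shape to a CoGroupByKey result.
    grouped = defaultdict(lambda: ([], []))
    for key, rec in left:
        grouped[key][0].append(rec)
    for key, rec in right:
        grouped[key][1].append(rec)

    joined = []
    for lefts, rights in grouped.values():
        if not lefts and how in ("inner", "left"):
            continue
        if not rights and how in ("inner", "right"):
            continue
        # An unmatched side contributes one empty record so the other side survives.
        for l_rec in lefts or [{}]:
            for r_rec in rights or [{}]:
                joined.append({**l_rec, **r_rec})
    return joined


def select(records, fields, null_value=None):
    """Keep only the requested fields, filling missing ones with null_value."""
    return [{f: rec.get(f, null_value) for f in fields} for rec in records]
```

A query such as `SELECT name, total FROM users LEFT JOIN orders USING (id)` would then map to `select(join(prepare_key(users, ["id"]), prepare_key(orders, ["id"]), how="left"), ["name", "total"])`. In Beam itself, each step would be a PTransform and the grouping would typically be done with CoGroupByKey.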
>>>> If this is something that we'd like to add to the Beam SDK I don't mind
>>>> looking at the contributor license agreement, and conversing more on how to
>>>> get them in.
>>>>
>>>> Thanks,
>>>> Shannon
>>>>
>>>>
>>>>
>>>> On Wed, Jul 3, 2019 at 5:16 PM Valentyn Tymofieiev <valen...@google.com>
>>>> wrote:
>>>>
>>>>> Hi Shannon,
>>>>>
>>>>> Thanks for considering a contribution to the Beam Python SDK. With a
>>>>> direct contribution to the Beam SDK, your change will reach a larger
>>>>> audience of users, and you will not have to maintain a separate project
>>>>> and keep it up to date with new releases of Beam.
>>>>>
>>>>> I encourage you to take a look at https://beam.apache.org/contribute/ for
>>>>> general advice on how to get started. To echo some points mentioned in the
>>>>> guide:
>>>>>
>>>>> - If your change is large or it is your first change, it is a good
>>>>> idea to discuss it on the dev@ mailing list.
>>>>> - For large changes, create a design doc (template, examples) and email
>>>>> it to the dev@ mailing list.
>>>>>
>>>>> Thanks,
>>>>> Valentyn
>>>>>
>>>>> On Wed, Jul 3, 2019 at 3:04 PM Shannon Duncan <
>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>
>>>>>> I have been writing a bunch of utilities for the Python SDK, such as
>>>>>> joins, selections, composite transforms, etc.
>>>>>>
>>>>>> I am working with my company to see if I can open source the
>>>>>> utilities. Would it be best to post them as a separate PyPI project, or
>>>>>> to PR them into the Beam SDK? I assume if they let me open source it
>>>>>> they will want some attribution or something like that.
>>>>>>
>>>>>> Thanks,
>>>>>> Shannon
>>>>>>
>>>>>
