I'm sure I could use some of the existing aggregations as a guide for
writing the ones that are missing, such as Sum/Max/Min.

GroupBy is already handled by GroupByKey and CoGroupByKey, unless you are
thinking of a different type of GroupBy?
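
To illustrate what I mean, here is a minimal plain-Python sketch of the
grouping semantics. It does not use Beam at all; it only mimics the shape of
the results that GroupByKey and CoGroupByKey produce. The helper names
(group_by_key, aggregate_per_key, co_group_by_key) are made up for this
example and are not Beam APIs. In an actual pipeline, per-key Sum/Max/Min
would more likely go through something like beam.CombinePerKey(sum).

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect (key, value) pairs into {key: [values]}.

    Hypothetical helper: models the output shape of GroupByKey only.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def aggregate_per_key(pairs, fn):
    """Apply an aggregation such as sum, max, or min to each key's values."""
    return {key: fn(values) for key, values in group_by_key(pairs).items()}


def co_group_by_key(**tagged_pairs):
    """Group several tagged collections by key: {key: {tag: [values]}}.

    Hypothetical helper: models the output shape of CoGroupByKey when it is
    given a dict of tagged inputs. Keys missing from one input get an empty
    list for that tag, which is what a join can be built on.
    """
    grouped = {tag: group_by_key(pairs) for tag, pairs in tagged_pairs.items()}
    keys = set()
    for per_tag in grouped.values():
        keys.update(per_tag)
    return {
        key: {tag: grouped[tag].get(key, []) for tag in tagged_pairs}
        for key in keys
    }


if __name__ == "__main__":
    sales = [("a", 3), ("b", 5), ("a", 7)]
    print(aggregate_per_key(sales, sum))
    print(aggregate_per_key(sales, max))
    print(co_group_by_key(orders=[("a", 1)], users=[("a", "x"), ("b", "y")]))
```

An inner join keeps only the keys where every tag has at least one value,
while the outer joins also keep keys where one side is empty, so both can be
derived from the co-grouped result.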

- Shannon

On Sun, Jul 7, 2019 at 10:47 PM Rui Wang <[email protected]> wrote:

> Maybe also adding Aggregation/GroupBy as utilities?
>
>
> -Rui
>
> On Sun, Jul 7, 2019 at 1:46 PM Shannon Duncan <[email protected]>
> wrote:
>
>> Thanks Valentyn,
>>
>> I'll outline the utilities and accept any suggestions to add / modify.
>> These are really just shortcut PTransforms that I am working on to simplify
>> creating pipelines.
>>
>> Currently the utilities contain the following PTransforms:
>>
>> - Inner Join
>> - Left Outer Join
>> - Right Outer Join
>> - Full Outer Join
>> - PrepareKey (For selecting items in a dictionary to act as a key for the
>> joins)
>> - Select (a very simple filter that returns only the items you want from
>> the dictionary; allows defining a default nullValue)
>>
>> Currently these operations only work with dictionaries, but I'd be
>> interested to see how it would work for <K,V> tuples.
>>
>> I'm new to Python, so they may not be optimized or idiomatic, but from
>> my understanding this seems to be the best way to do these types of
>> operations. Essentially I created a pipeline that can convert a simple
>> SQL query into a flow of these utilities. Using PrepareKey to define your
>> joining key, joining, and then selecting from the join allows you to do a
>> lot of powerful manipulation in a simple / familiar way.
>>
>> If this is something that we'd like to add to the Beam SDK, I don't mind
>> looking at the contributor license agreement and discussing further how to
>> get them in.
>>
>> Thanks,
>> Shannon
>>
>>
>>
>> On Wed, Jul 3, 2019 at 5:16 PM Valentyn Tymofieiev <[email protected]>
>> wrote:
>>
>>> Hi Shannon,
>>>
>>> Thanks for considering a contribution to the Beam Python SDK. With a direct
>>> contribution to the Beam SDK, your change will reach a larger audience of
>>> users, and you will not have to maintain a separate project and keep it up
>>> to date with new releases of Beam.
>>>
>>> I encourage you to take a look at https://beam.apache.org/contribute/ for
>>> general advice on how to get started. To echo some points mentioned in the
>>> guide:
>>>
>>> - If your change is large or it is your first change, it is a good idea
>>> to discuss it on the dev@ mailing list
>>> - For large changes create a design doc (template, examples) and email
>>> it to the dev@ mailing list.
>>>
>>> Thanks,
>>> Valentyn
>>>
>>> On Wed, Jul 3, 2019 at 3:04 PM Shannon Duncan <
>>> [email protected]> wrote:
>>>
>>>> I have been writing a bunch of utilities for the Python SDK such as
>>>> joins, selections, composite transforms, etc.
>>>>
>>>> I am working with my company to see if I can open source the utilities.
>>>> Would it be best to post them as a separate PyPI project, or to PR them
>>>> into the Beam SDK? I assume if they let me open source it they will want
>>>> some attribution or something like that.
>>>>
>>>> Thanks,
>>>> Shannon
>>>>
>>>
