Maybe also add Aggregation/GroupBy as utilities?
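Below is a rough sketch of what an aggregation helper in this style might look like. It assumes rows are dicts that have already been keyed as (key, row) pairs. `summarize_group` and the "count"/"sum" output fields are hypothetical names, not an existing Beam API or one of the proposed utilities. The helper would run after `beam.GroupByKey()`, for example `keyed_rows | beam.GroupByKey() | beam.Map(summarize_group, "amount")`.

```python
from typing import Any, Dict, Iterable, Tuple

Row = Dict[str, Any]


def summarize_group(
    element: Tuple[Any, Iterable[Row]], value_field: str
) -> Tuple[Any, Dict[str, Any]]:
    """Hypothetical helper: aggregate one GroupByKey result.

    Maps (key, rows) -> (key, {"count": n, "sum": total}). Rows whose
    `value_field` is missing or None are counted but skipped in the sum.
    """
    key, rows = element
    count = 0
    total = 0
    for row in rows:
        count += 1
        value = row.get(value_field)
        if value is not None:
            total += value
    return key, {"count": count, "sum": total}
```

For large groups, `beam.CombinePerKey` with a `CombineFn` is usually the better choice. Beam can combine partial results before the shuffle instead of materializing every row for a key.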

-Rui

On Sun, Jul 7, 2019 at 1:46 PM Shannon Duncan <[email protected]>
wrote:

> Thanks Valentyn,
>
> I'll outline the utilities and welcome any suggestions on what to add or modify.
> These are really just shortcut PTransforms that I am working on to simplify
> creating pipelines.
>
> Currently the utilities contain the following PTransforms:
>
> - Inner Join
> - Left Outer Join
> - Right Outer Join
> - Full Outer Join
> - PrepareKey (for selecting items in a dictionary to act as the key for the
> joins)
> - Select (a very simple filter that returns only the items you want from the
> dictionary; allows defining a default nullValue)
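The four joins can be sketched on top of `beam.CoGroupByKey`. Passing `{"left": a_keyed, "right": b_keyed}` to it yields `(key, {"left": [...], "right": [...]})` per key. The sketch below is plain Python, under the assumption that both inputs are (key, dict) pairs. `join_grouped` and the "left"/"right" tags are hypothetical names, not Shannon's implementation. On a column-name clash, the right row's value wins, which is a simplification of SQL semantics.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
_JOIN_TYPES = ("inner", "left", "right", "full")


def join_grouped(
    element: Tuple[Any, Dict[str, Iterable[Row]]], how: str = "inner"
) -> List[Row]:
    """Hypothetical helper: emit joined rows for one CoGroupByKey result."""
    if how not in _JOIN_TYPES:
        raise ValueError(f"unsupported join type: {how!r}")
    _key, groups = element
    left = list(groups.get("left", []))
    right = list(groups.get("right", []))
    if left and right:
        # Key present on both sides: cross product of matching rows.
        return [{**l_row, **r_row} for l_row in left for r_row in right]
    if left and how in ("left", "full"):
        # Unmatched left rows survive left and full outer joins.
        return left
    if right and how in ("right", "full"):
        # Unmatched right rows survive right and full outer joins.
        return right
    return []
```

In a pipeline this would be applied as `{"left": a_keyed, "right": b_keyed} | beam.CoGroupByKey() | beam.FlatMap(join_grouped, how="left")`.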
>
> Currently these operations only work with dictionaries, but I'd be
> interested to see how they would work for <K,V> tuples.
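One way to bridge dictionaries and <K,V> tuples is to have PrepareKey emit `(key, row)` pairs. After that step, the data is an ordinary keyed PCollection that `GroupByKey` and `CoGroupByKey` accept. This is a minimal sketch, assuming key fields hold simple values such as ints and strings, so that tuples of them make usable keys. `prepare_key` is a hypothetical name.

```python
from typing import Any, Dict, Sequence, Tuple

Row = Dict[str, Any]


def prepare_key(row: Row, key_fields: Sequence[str]) -> Tuple[Any, Row]:
    """Hypothetical helper: turn a dict row into a (key, row) pair.

    One key field gives a scalar key. Several fields give a tuple, so
    composite join keys compare by value. The full row is kept as the value.
    """
    if len(key_fields) == 1:
        key = row[key_fields[0]]
    else:
        key = tuple(row[f] for f in key_fields)
    return key, row
```

Usage would be `rows | beam.Map(prepare_key, ["id"])`.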
>
> I'm new to Python, so they may not be optimized or idiomatic, but from
> my understanding this seems to be the best way to do these types of
> operations. Essentially, I created a pipeline that converts a simple
> SQL query into a flow of these utilities. Using PrepareKey to define your
> joining key, joining, and then selecting from the join lets you do a
> lot of powerful manipulation in a simple, familiar way.
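As a rough illustration of the SQL-to-utilities mapping, `SELECT a.id, b.name FROM a LEFT JOIN b ON a.id = b.id` would become PrepareKey on `id` for both inputs, a left outer join, then a Select of `id` and `name`. The sketch below covers only the Select step, including the default nullValue. `select` and its `null_value` parameter are hypothetical names, assumed to behave like the utility described above.

```python
from typing import Any, Dict, Sequence

Row = Dict[str, Any]


def select(row: Row, fields: Sequence[str], null_value: Any = None) -> Row:
    """Hypothetical helper: project a dict row onto `fields`.

    Fields missing from the row (for example, the unmatched side of an
    outer join) are filled with `null_value` instead of raising KeyError.
    """
    return {f: row.get(f, null_value) for f in fields}
```

In a pipeline it would follow the join, for example `joined | beam.Map(select, ["id", "name"], null_value="")`.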
>
> If this is something we'd like to add to the Beam SDK, I don't mind
> looking at the contributor license agreement and discussing further how to
> get them in.
>
> Thanks,
> Shannon
>
>
>
> On Wed, Jul 3, 2019 at 5:16 PM Valentyn Tymofieiev <[email protected]>
> wrote:
>
>> Hi Shannon,
>>
>> Thanks for considering a contribution to the Beam Python SDK. With a direct
>> contribution to the Beam SDK, your change will reach a larger audience of
>> users, and you will not have to maintain a separate project and keep it up
>> to date with new releases of Beam.
>>
>> I encourage you to take a look at https://beam.apache.org/contribute/ for
>> general advice on how to get started. To echo some points mentioned in the
>> guide:
>>
>> - If your change is large or it is your first change, it is a good idea
>> to discuss it on the dev@ mailing list
>> - For large changes create a design doc (template, examples) and email it
>> to the dev@ mailing list.
>>
>> Thanks,
>> Valentyn
>>
>> On Wed, Jul 3, 2019 at 3:04 PM Shannon Duncan <[email protected]>
>> wrote:
>>
>>> I have been writing a number of utilities for the Python SDK, such as
>>> joins, selections, composite transforms, etc.
>>>
>>> I am working with my company to see if I can open source the utilities.
>>> Would it be best to publish them as a separate PyPI project, or to open a
>>> PR against the Beam SDK? I assume that if they let me open source them,
>>> they will want some form of attribution.
>>>
>>> Thanks,
>>> Shannon
>>>
>>
