Thanks Valentyn,

I'll outline the utilities below; suggestions for additions or changes are
welcome. These are really just shortcut PTransforms that I am working on to
simplify creating pipelines.

Currently the utilities contain the following PTransforms:

- Inner Join
- Left Outer Join
- Right Outer Join
- Full Outer Join
- PrepareKey (selects items in a dictionary to act as the key for the
joins)
- Select (a very simple filter that returns only the items you want from
the dictionary, and allows defining a default nullValue)
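
To make the last two items concrete, here is a rough plain-Python sketch of
the per-element logic. It is not the actual utility code: the function names,
the argument names, and the choice of a tuple as the key are assumptions made
only for illustration.

```python
# Illustrative sketch only; prepare_key and select are hypothetical names,
# not the real PTransforms described in this email.

def prepare_key(row, key_fields):
    """Return (key, row), where key is a tuple of the chosen dictionary items."""
    return tuple(row[field] for field in key_fields), row


def select(row, fields, null_value=None):
    """Keep only the requested items, using null_value for any missing one."""
    return {field: row.get(field, null_value) for field in fields}


row = {"id": 7, "name": "Ada", "city": "London"}

keyed = prepare_key(row, ["id"])                        # key on the "id" item
picked = select(row, ["name", "email"], null_value="")  # "email" is missing

print(keyed)
print(picked)
```

In a pipeline, functions like these would presumably be applied to each
element, for example through beam.Map.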

Currently these operations only work with dictionaries, but I'd be
interested to see how it would work for <K,V> tuples.

I'm new to Python, so they may not be optimized or written in the best way,
but from my understanding this seems to be the best way to do these types of
operations. Essentially, I created a pipeline to convert a simple SQL query
into a flow of these utilities. Using PrepareKey to define your joining key,
joining, and then selecting from the join lets you do a lot of powerful
manipulation in a simple / familiar way.
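
As a sketch of that PrepareKey / join / Select flow, the snippet below works
on plain Python lists instead of PCollections. Everything in it is an
assumption for illustration (the function names, the `how` argument, the
sample data); in Beam the grouping step would more likely be built on
CoGroupByKey.

```python
from collections import defaultdict

# Hypothetical stand-ins for the utilities, operating on lists of dicts.


def prepare_key(rows, key_fields):
    """Pair every row with a tuple key built from the chosen items."""
    return [(tuple(r[f] for f in key_fields), r) for r in rows]


def join(left, right, how="inner"):
    """Join two keyed lists; how is 'inner', 'left', 'right' or 'full'."""
    groups = defaultdict(lambda: ([], []))
    for key, row in left:
        groups[key][0].append(row)
    for key, row in right:
        groups[key][1].append(row)

    joined = []
    for lefts, rights in groups.values():
        # For outer joins, pad the missing side with one empty row.
        if not lefts and how in ("right", "full"):
            lefts = [{}]
        if not rights and how in ("left", "full"):
            rights = [{}]
        for l in lefts:
            for r in rights:
                # On a name clash the right-hand value wins.
                joined.append({**l, **r})
    return joined


def select(rows, fields, null_value=None):
    """Keep only the requested items, filling gaps with null_value."""
    return [{f: r.get(f, null_value) for f in fields} for r in rows]


# Roughly: SELECT name, total FROM users
#          LEFT JOIN orders ON users.user_id = orders.user_id
users = [{"user_id": 1, "name": "Ada"}, {"user_id": 2, "name": "Bob"}]
orders = [{"user_id": 1, "total": 30}]

joined = join(prepare_key(users, ["user_id"]),
              prepare_key(orders, ["user_id"]),
              how="left")
result = select(joined, ["name", "total"], null_value=0)
print(result)
```

Switching `how` to "inner" would drop Bob, since he has no matching order.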

If this is something that we'd like to add to the Beam SDK I don't mind
looking at the contributor license agreement, and conversing more on how to
get them in.

Thanks,
Shannon



On Wed, Jul 3, 2019 at 5:16 PM Valentyn Tymofieiev <[email protected]>
wrote:

> Hi Shannon,
>
> Thanks for considering a contribution to the Beam Python SDK. With a direct
> contribution to the Beam SDK, your change will reach a larger audience of
> users, and you will not have to maintain a separate project and keep it up
> to date with new releases of Beam.
>
> I encourage you to take a look at https://beam.apache.org/contribute/ for
> general advice on how to get started. To echo some points mentioned in the
> guide:
>
> - If your change is large or it is your first change, it is a good idea to
> discuss it on the dev@ mailing list
> - For large changes create a design doc (template, examples) and email it
> to the dev@ mailing list.
>
> Thanks,
> Valentyn
>
> On Wed, Jul 3, 2019 at 3:04 PM Shannon Duncan <[email protected]>
> wrote:
>
>> I have been writing a bunch of utilities for the Python SDK, such as
>> joins, selections, composite transforms, etc.
>>
>> I am working with my company to see if I can open source the utilities.
>> Would it be best to post them as a separate PyPI project, or to PR them
>> into the Beam SDK? I assume if they let me open source it they will want
>> some attribution or something like that.
>>
>> Thanks,
>> Shannon
>>
>
