Yeah, these are for local testing right now. I was hoping to get some input
on better naming.

I was thinking of creating an "extras" module.

On Mon, Jul 8, 2019, 12:28 PM Robin Qiu <robi...@google.com> wrote:

> Hi Shannon,
>
> Thanks for sharing the repo! I took a quick look and I have a concern with
> the naming of the transforms.
>
> Beam Java already has "Select" and "Join" transforms. However, they
> operate on schemas, a feature that is not yet implemented in Beam Python.
> (See
> https://github.com/apache/beam/tree/77b295b1c2b0a206099b8f50c4d3180c248e252c/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms
> )
>
> To maintain consistency between SDKs, I think it is best to avoid having
> two transforms with the same name but different behavior. Could you
> consider renaming the transforms and/or putting them in an extension
> Python module instead of the main ones?
>
> Best,
> Robin
>
> On Mon, Jul 8, 2019 at 9:19 AM Shannon Duncan <joseph.dun...@liveramp.com>
> wrote:
>
>> As a follow-up, here is the repo that contains the utilities for now:
>> https://github.com/shadowcodex/apache-beam-utilities. I will put together
>> a proper PR as the code gets closer to production quality.
>>
>> - Shannon
>>
>> On Mon, Jul 8, 2019 at 9:20 AM Shannon Duncan <joseph.dun...@liveramp.com>
>> wrote:
>>
>>> Thanks Frederik,
>>>
>>> That's exactly where I was looking. I did get permission to open-source
>>> the utilities module, so I'm going to put them up on my personal GitHub
>>> soon and share them with the mailing list for review.
>>>
>>> I'll develop the utilities there because it's a quick dev environment;
>>> once they are ready, I'll start working them into the actual SDK for a
>>> PR.
>>>
>>> I also joined the #beam and #beam-python Slack channels, since I was
>>> unsure where most collaborators discuss things.
>>>
>>> - Shannon
>>>
>>> On Mon, Jul 8, 2019 at 9:09 AM Frederik Bode <frederik.b...@ml6.eu>
>>> wrote:
>>>
>>>> Hi Shannon,
>>>>
>>>> This is probably a good starting point:
>>>> https://github.com/apache/beam/blob/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5/sdks/python/apache_beam/transforms/combiners.py#L68
>>>> .
>>>>
>>>> Frederik
>>>>
>>>> On Mon, 8 Jul 2019 at 15:40, Shannon Duncan <joseph.dun...@liveramp.com>
>>>> wrote:
>>>>
>>>>> I'm sure I could use some of the existing aggregations as a guide for
>>>>> writing the missing ones, such as Sum/Max/Min.
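Beam Python combiners (`beam.CombineFn`) are built from four methods: `create_accumulator`, `add_input`, `merge_accumulators` and `extract_output`. A minimal sketch of a one-pass Sum/Min/Max aggregation written against those four methods follows. `SumMinMaxFn` is a hypothetical name, and it is a plain class rather than a `beam.CombineFn` subclass only so it can be exercised without Beam installed; in a pipeline you would subclass `apache_beam.CombineFn` instead.

```python
class SumMinMaxFn:
    """Hypothetical combiner computing sum, min and max in a single pass.

    Follows the four-method CombineFn protocol; subclass
    apache_beam.CombineFn to use it in a real pipeline.
    """

    def create_accumulator(self):
        # (running sum, running min, running max); None means "no input yet".
        return (0, None, None)

    def add_input(self, accumulator, element):
        total, low, high = accumulator
        low = element if low is None else min(low, element)
        high = element if high is None else max(high, element)
        return (total + element, low, high)

    def merge_accumulators(self, accumulators):
        # Combine partial results, e.g. from different bundles or workers.
        total, low, high = self.create_accumulator()
        for acc_total, acc_low, acc_high in accumulators:
            total += acc_total
            if acc_low is not None:
                low = acc_low if low is None else min(low, acc_low)
            if acc_high is not None:
                high = acc_high if high is None else max(high, acc_high)
        return (total, low, high)

    def extract_output(self, accumulator):
        total, low, high = accumulator
        return {'sum': total, 'min': low, 'max': high}


# Simulate two bundles being accumulated separately and then merged.
fn = SumMinMaxFn()
acc_a = fn.create_accumulator()
for x in [3, 9, 4]:
    acc_a = fn.add_input(acc_a, x)
acc_b = fn.create_accumulator()
for x in [7, 1]:
    acc_b = fn.add_input(acc_b, x)
print(fn.extract_output(fn.merge_accumulators([acc_a, acc_b])))
# {'sum': 24, 'min': 1, 'max': 9}
```

For a single statistic, Beam Python also accepts plain callables, e.g. `beam.CombinePerKey(sum)` or `beam.CombinePerKey(max)`, so a custom combiner mainly pays off when several statistics should be computed in one pass.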
>>>>>
>>>>> GroupBy is already handled by GroupByKey and CoGroupByKey, unless you
>>>>> are thinking of a different type of GroupBy?
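To make the GroupByKey/CoGroupByKey distinction concrete, here is a small in-memory model (pure Python, no Beam) of the output shapes: `GroupByKey` turns `(key, value)` pairs into `(key, [values])`, while `CoGroupByKey`, given a dict of tagged inputs, emits `(key, {tag: [values]})` with an entry, possibly empty, for every tag. The function names are hypothetical, and the sorting exists only to make the output deterministic; Beam itself guarantees no ordering.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Model of GroupByKey: (key, value) pairs -> (key, [values])."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Sorted by key only for deterministic output; Beam guarantees no order.
    return sorted(grouped.items(), key=lambda kv: kv[0])


def co_group_by_key(tagged_inputs):
    """Model of CoGroupByKey over a dict {tag: [(key, value), ...]}.

    Every output value has an entry for every tag, which is an empty
    list when that input had no values for the key.
    """
    grouped = defaultdict(lambda: {tag: [] for tag in tagged_inputs})
    for tag, pairs in tagged_inputs.items():
        for key, value in pairs:
            grouped[key][tag].append(value)
    return sorted(grouped.items(), key=lambda kv: kv[0])


users = [('u1', {'name': 'Ann'}), ('u2', {'name': 'Bo'})]
orders = [('u1', {'total': 5}), ('u3', {'total': 2})]
for key, value in co_group_by_key({'users': users, 'orders': orders}):
    print(key, value)
```

The empty lists are what make outer joins possible: a key that appears in only one input still shows up, with nothing under the other tag.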
>>>>>
>>>>> - Shannon
>>>>>
>>>>> On Sun, Jul 7, 2019 at 10:47 PM Rui Wang <ruw...@google.com> wrote:
>>>>>
>>>>>> Maybe also add Aggregation/GroupBy as utilities?
>>>>>>
>>>>>>
>>>>>> -Rui
>>>>>>
>>>>>> On Sun, Jul 7, 2019 at 1:46 PM Shannon Duncan <
>>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>>
>>>>>>> Thanks Valentyn,
>>>>>>>
>>>>>>> I'll outline the utilities and welcome any suggestions for additions or
>>>>>>> changes. These are really just shortcut PTransforms that I am working
>>>>>>> on to simplify creating pipelines.
>>>>>>>
>>>>>>> Currently the utilities contain the following PTransforms:
>>>>>>>
>>>>>>> - Inner Join
>>>>>>> - Left Outer Join
>>>>>>> - Right Outer Join
>>>>>>> - Full Outer Join
>>>>>>> - PrepareKey (selects fields in a dictionary to act as the key for
>>>>>>> the joins)
>>>>>>> - Select (a very simple filter that returns only the fields you want
>>>>>>> from the dictionary; allows defining a default nullValue)
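A minimal sketch of the Select behavior described above, assuming rows are plain dicts and that the default nullValue is a fill-in for fields missing from a row. `select` and `null_value` are hypothetical names, not the repo's actual API.

```python
def select(row, fields, null_value=None):
    """Hypothetical Select: keep only `fields` from a dict row.

    Fields absent from the row are filled with `null_value`, so every
    output row has the same set of keys.
    """
    return {field: row.get(field, null_value) for field in fields}


row = {'id': 1, 'name': 'Ann', 'email': 'ann@example.com'}
print(select(row, ['id', 'name']))
print(select(row, ['id', 'phone'], null_value=''))
```

In a pipeline it could be applied as `beam.Map(select, ['id', 'name'], null_value='')`, since `beam.Map` forwards extra arguments to the function.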
>>>>>>>
>>>>>>> Currently these operations only work with dictionaries, but I'd be
>>>>>>> interested to see how they would work with <K,V> tuples.
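One way to bridge dictionaries and `<K,V>` tuples is to have PrepareKey emit `(key, row)` pairs, which is the shape GroupByKey and CoGroupByKey expect. A minimal sketch, assuming dict rows; `prepare_key` and `key_fields` are hypothetical names, and multi-field keys are represented as tuples.

```python
def prepare_key(row, key_fields):
    """Hypothetical PrepareKey: turn a dict row into a (key, row) pair.

    A single key field gives a scalar key; several fields give a tuple,
    which can still be used as a grouping key.
    """
    if len(key_fields) == 1:
        key = row[key_fields[0]]
    else:
        key = tuple(row[field] for field in key_fields)
    return key, row


row = {'user_id': 7, 'region': 'eu', 'total': 3}
print(prepare_key(row, ['user_id']))
print(prepare_key(row, ['user_id', 'region']))
```

`rows | beam.Map(prepare_key, ['user_id'])` would then produce keyed rows ready for a join, while downstream steps still see the full dict as the value.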
>>>>>>>
>>>>>>> I'm new to Python, so they may not be optimized, but from my
>>>>>>> understanding this is the best way to do these types of operations.
>>>>>>> Essentially, I created a pipeline that lets me convert a simple SQL
>>>>>>> query into a flow of these utilities. Using prepareKey to define your
>>>>>>> joining key, joining, and then selecting from the join lets you do a
>>>>>>> lot of powerful manipulation in a simple, familiar way.
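The prepareKey → join → select flow maps naturally onto `CoGroupByKey` followed by a `FlatMap`. A minimal sketch of the join step, assuming both inputs are keyed dict rows tagged `'left'` and `'right'`; `join_rows` and its `how` parameter are hypothetical names, and on a field-name clash the right-hand value wins (an arbitrary choice). It covers all four join types from the list above.

```python
def join_rows(element, how='inner', left_tag='left', right_tag='right'):
    """Hypothetical join step for one CoGroupByKey output element.

    `element` is (key, {left_tag: [rows], right_tag: [rows]}), where each
    row is a dict. Yields one merged dict per matching pair. For outer
    joins, a missing side is stood in for by an empty dict.
    """
    if how not in ('inner', 'left', 'right', 'full'):
        raise ValueError('unknown join type: %r' % how)
    _, grouped = element
    lefts = list(grouped[left_tag])
    rights = list(grouped[right_tag])
    # Keep unmatched rows from the preserved side(s) of an outer join.
    if not lefts and how in ('right', 'full'):
        lefts = [{}]
    if not rights and how in ('left', 'full'):
        rights = [{}]
    for left in lefts:
        for right in rights:
            merged = dict(left)
            merged.update(right)  # right-hand fields win on name clashes
            yield merged


element = ('u2', {'left': [{'id': 'u2', 'name': 'Bo'}], 'right': []})
print(list(join_rows(element, how='inner')))
print(list(join_rows(element, how='left')))
```

Wired into a pipeline, this would look like `{'left': left_kv, 'right': right_kv} | beam.CoGroupByKey() | beam.FlatMap(join_rows, how='left')`. Emitting an empty dict for the missing side keeps outer-join rows narrow; a Select step with a default null value can then fill in the absent columns.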
>>>>>>>
>>>>>>> If this is something we'd like to add to the Beam SDK, I don't mind
>>>>>>> looking at the contributor license agreement and discussing further
>>>>>>> how to get them in.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Shannon
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 3, 2019 at 5:16 PM Valentyn Tymofieiev <
>>>>>>> valen...@google.com> wrote:
>>>>>>>
>>>>>>>> Hi Shannon,
>>>>>>>>
>>>>>>>> Thanks for considering a contribution to the Beam Python SDK. With a
>>>>>>>> direct contribution to the Beam SDK, your change will reach a larger
>>>>>>>> audience of users, and you will not have to maintain a separate
>>>>>>>> project and keep it up to date with new releases of Beam.
>>>>>>>>
>>>>>>>> I encourage you to take a look at
>>>>>>>> https://beam.apache.org/contribute/ for general advice on how to
>>>>>>>> get started. To echo some points mentioned in the guide:
>>>>>>>>
>>>>>>>> - If your change is large or it is your first change, it is a good
>>>>>>>> idea to discuss it on the dev@ mailing list
>>>>>>>> - For large changes, create a design doc (template, examples) and
>>>>>>>> email it to the dev@ mailing list.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Valentyn
>>>>>>>>
>>>>>>>> On Wed, Jul 3, 2019 at 3:04 PM Shannon Duncan <
>>>>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>>>>
>>>>>>>>> I have been writing a bunch of utilities for the Python SDK, such
>>>>>>>>> as joins, selections, composite transforms, etc.
>>>>>>>>>
>>>>>>>>> I am working with my company to see if I can open-source the
>>>>>>>>> utilities. Would it be best to publish them as a separate PyPI
>>>>>>>>> project, or to PR them into the Beam SDK? I assume that if they let
>>>>>>>>> me open-source them, they will want some attribution or something
>>>>>>>>> like that.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Shannon
>>>>>>>>>
>>>>>>>>
