+1

On Thu, Mar 21, 2024 at 6:30 PM Robert Bradshaw via dev <dev@beam.apache.org>
wrote:

> I would be more comfortable with a default for FlatMap than overloading
> Flatten in this way. Distinguishing between
>
>     (pcoll,) | beam.Flatten()
>
> and
>
>     (pcoll) | beam.Flatten()
>
> seems a bit error prone.
>
>
> On Thu, Mar 21, 2024 at 2:23 PM Joey Tran <joey.t...@schrodinger.com>
> wrote:
>
>> Ah, I misunderstood your original suggestion then. That makes sense then.
>> I have already seen someone get a little confused about the names and
>> surprised that Flatten doesn't do what FlatMap does.
>>
>> On Thu, Mar 21, 2024 at 5:20 PM Valentyn Tymofieiev <valen...@google.com>
>> wrote:
>>
>>> Beam throws an error at submission time in Python if you pass a single
>>> PCollection  to Flatten. The scenario you describe concerns a one-element
>>> list.
>>>
>>> On Thu, Mar 21, 2024, 13:43 Joey Tran <joey.t...@schrodinger.com> wrote:
>>>
>>>> I think it'd be quite surprising if beam.Flatten would become
>>>> equivalent to FlatMap if passed only a single pcollection. One use case
>>>> that would be broken from that is cases where someone might be flattening a
>>>> variable number of pcollections, including possibly only one pcollection.
>>>> In that case, that single pcollection suddenly get FlatMapped.
>>>>
>>>>
>>>>
>>>> On Thu, Mar 21, 2024 at 4:36 PM Valentyn Tymofieiev via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> One possible alternative is to define beam.Flatten for a single
>>>>> collection to be functionally equivalent to beam.FlatMap(lambda x: x), but
>>>>> that would be a larger change and such behavior might need to be
>>>>> consistent across SDKs and documented. Adding a default value is a simpler
>>>>> change.
>>>>>
>>>>> I can also confirm that the usage
>>>>>
>>>>>     |  'Flatten' >> beam.FlatMap(lambda x: x)
>>>>>
>>>>> is fairly common by inspecting uses of Beam internally.
>>>>> On Thu, Mar 21, 2024 at 1:30 PM Robert Bradshaw via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>> IIRC, Java has Flatten.iterables() and Flatten.collections(), the
>>>>>> first of which does what you want.
>>>>>>
>>>>>> Giving FlatMap a default arg of lambda x: x is an interesting idea.
>>>>>> The only downside I see is a less clear error if one forgets to provide
>>>>>> this (now mandatory) parameter, but maybe that's low enough to be worth 
>>>>>> the
>>>>>> convenience?
>>>>>>
>>>>>> On Thu, Mar 21, 2024 at 12:02 PM Joey Tran <joey.t...@schrodinger.com>
>>>>>> wrote:
>>>>>>
>>>>>>> That's not really the same thing, is it? `beam.Flatten` combines two
>>>>>>> or more pcollections into a single pcollection while beam.FlatMap 
>>>>>>> unpacks
>>>>>>> iterables of elements (i.e. PCollection<Iterable<T>> -> PCollection<T>)
>>>>>>>
>>>>>>> On Thu, Mar 21, 2024 at 2:57 PM Valentyn Tymofieiev via dev <
>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>
>>>>>>>> Hi, you can use beam.Flatten() instead.
>>>>>>>>
>>>>>>>> On Thu, Mar 21, 2024 at 10:55 AM Joey Tran <
>>>>>>>> joey.t...@schrodinger.com> wrote:
>>>>>>>>
>>>>>>>>> Hey all,
>>>>>>>>>
>>>>>>>>> Using an identity function for FlatMap comes up more often than
>>>>>>>>> using FlatMap without an identity function. Would it make sense to 
>>>>>>>>> use the
>>>>>>>>> identity function as a default?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>

Reply via email to