+1

On Thu, Mar 21, 2024 at 6:30 PM Robert Bradshaw via dev <dev@beam.apache.org> wrote:
> I would be more comfortable with a default for FlatMap than overloading
> Flatten in this way. Distinguishing between
>
>     (pcoll,) | beam.Flatten()
>
> and
>
>     (pcoll) | beam.Flatten()
>
> seems a bit error prone.
>
> On Thu, Mar 21, 2024 at 2:23 PM Joey Tran <joey.t...@schrodinger.com> wrote:
>
>> Ah, I misunderstood your original suggestion then. That makes sense then.
>> I have already seen someone get a little confused about the names and
>> surprised that Flatten doesn't do what FlatMap does.
>>
>> On Thu, Mar 21, 2024 at 5:20 PM Valentyn Tymofieiev <valen...@google.com> wrote:
>>
>>> Beam throws an error at submission time in Python if you pass a single
>>> PCollection to Flatten. The scenario you describe concerns a one-element
>>> list.
>>>
>>> On Thu, Mar 21, 2024, 13:43 Joey Tran <joey.t...@schrodinger.com> wrote:
>>>
>>>> I think it'd be quite surprising if beam.Flatten would become
>>>> equivalent to FlatMap if passed only a single pcollection. One use case
>>>> that would be broken from that is cases where someone might be flattening a
>>>> variable number of pcollections, including possibly only one pcollection.
>>>> In that case, that single pcollection suddenly get FlatMapped.
>>>>
>>>> On Thu, Mar 21, 2024 at 4:36 PM Valentyn Tymofieiev via dev <dev@beam.apache.org> wrote:
>>>>
>>>>> One possible alternative is to define beam.Flatten for a single
>>>>> collection to be functionally equivalent to beam.FlatMap(lambda x: x), but
>>>>> that would be a larger change and such behavior might need to be
>>>>> consistent across SDKs and documented. Adding a default value is a simpler
>>>>> change.
>>>>>
>>>>> I can also confirm that the usage
>>>>>
>>>>>     | 'Flatten' >> beam.FlatMap(lambda x: x)
>>>>>
>>>>> is fairly common by inspecting uses of Beam internally.
>>>>>
>>>>> On Thu, Mar 21, 2024 at 1:30 PM Robert Bradshaw via dev <dev@beam.apache.org> wrote:
>>>>>
>>>>>> IIRC, Java has Flatten.iterables() and Flatten.collections(), the
>>>>>> first of which does what you want.
>>>>>>
>>>>>> Giving FlatMap a default arg of lambda x: x is an interesting idea.
>>>>>> The only downside I see is a less clear error if one forgets to provide
>>>>>> this (now mandatory) parameter, but maybe that's low enough to be worth
>>>>>> the convenience?
>>>>>>
>>>>>> On Thu, Mar 21, 2024 at 12:02 PM Joey Tran <joey.t...@schrodinger.com> wrote:
>>>>>>
>>>>>>> That's not really the same thing, is it? `beam.Flatten` combines two
>>>>>>> or more pcollections into a single pcollection while beam.FlatMap
>>>>>>> unpacks iterables of elements (i.e. PCollection<Iterable<T>> -> PCollection<T>)
>>>>>>>
>>>>>>> On Thu, Mar 21, 2024 at 2:57 PM Valentyn Tymofieiev via dev <dev@beam.apache.org> wrote:
>>>>>>>
>>>>>>>> Hi, you can use beam.Flatten() instead.
>>>>>>>>
>>>>>>>> On Thu, Mar 21, 2024 at 10:55 AM Joey Tran <joey.t...@schrodinger.com> wrote:
>>>>>>>>
>>>>>>>>> Hey all,
>>>>>>>>>
>>>>>>>>> Using an identity function for FlatMap comes up more often than
>>>>>>>>> using FlatMap without an identity function. Would it make sense to
>>>>>>>>> use the identity function as a default?
>>>>>>>>>
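
For anyone skimming the thread, here is a rough plain-Python sketch of the distinction being discussed, with lists standing in for PCollections. The helpers `flatten`, `flat_map`, and `identity` are hypothetical stand-ins written for illustration and are not Beam APIs; the identity default on `flat_map` only models the proposal and does not describe current Beam behavior.

```python
from itertools import chain
from typing import Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def identity(x):
    """Return the argument unchanged."""
    return x


def flatten(pcolls: Sequence[List[T]]) -> List[T]:
    """Toy model of Flatten: merge several collections into one.

    The elements themselves are left untouched.
    """
    return list(chain.from_iterable(pcolls))


def flat_map(pcoll: Iterable[T],
             fn: Callable[[T], Iterable[U]] = identity) -> List[U]:
    """Toy model of FlatMap with the proposed identity default.

    Each element is passed to fn and the resulting iterable is unpacked.
    """
    return [out for element in pcoll for out in fn(element)]


nested = [[1, 2], [3], []]  # stands in for PCollection<Iterable<int>>

# Merging two collections, as Flatten does today.
merged = flatten(([1, 2], [3, 4]))   # [1, 2, 3, 4]

# Unpacking iterables of elements, no explicit lambda needed.
unpacked = flat_map(nested)          # [1, 2, 3]

# Flattening a one-element tuple keeps each element intact, which is
# why overloading Flatten to unpack would change this result.
single = flatten((nested,))          # [[1, 2], [3], []]
```

The last case is the variable-number-of-inputs scenario raised earlier in the thread: the result differs from `flat_map(nested)`, so keeping the two operations separate avoids the ambiguity.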