I think for now it's fine to just require Singleton partitioning for this. In the future we could add a couple of optimizations:

- Recognize elementwise np.ufunc implementations. I think we can do this by looking at the signature [1].
- Allow the user to indicate that their function is elementwise with a Beam-specific argument (as Robert suggested).

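
As a rough sketch of how the two ideas might fit together: NumPy documents ufunc.signature as None for ordinary elementwise ufuncs and as a string describing the core dimensions for generalized ufuncs, so a check along these lines seems plausible. The helper names is_elementwise_ufunc and required_partitioning are made up for illustration and are not existing Beam APIs, and the returned strings are only stand-ins for the real partitioning objects.

```python
import numpy as np


def is_elementwise_ufunc(func):
    """Return True if func is a NumPy ufunc that works element by element.

    Hypothetical helper, not part of Beam. It relies on ufunc.signature
    being None for plain elementwise ufuncs and a string for generalized
    ufuncs, which operate on whole sub-arrays.
    """
    return isinstance(func, np.ufunc) and func.signature is None


def required_partitioning(func, elementwise=False):
    """Pick the partitioning a combine-like operation would need.

    Hypothetical sketch: "Index" and "Singleton" are placeholder strings.
    The elementwise flag models explicit user input; without it, only
    recognized elementwise ufuncs avoid the Singleton requirement.
    """
    if elementwise or is_elementwise_ufunc(func):
        return "Index"
    return "Singleton"


def take_larger_column(a, b):
    # Acts on full columns, so it cannot be treated as elementwise.
    return a if a.sum() > b.sum() else b


print(required_partitioning(np.maximum))          # recognized elementwise ufunc
print(required_partitioning(np.matmul))           # generalized ufunc
print(required_partitioning(take_larger_column))  # arbitrary callable
print(required_partitioning(lambda a, b: a + b, elementwise=True))  # user opt-in
```

Anything that is not a recognized ufunc falls back to Singleton unless the user opts in, which keeps the default safe for callables that look at a whole column.
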
[1] https://numpy.org/doc/stable/reference/generated/numpy.ufunc.signature.html#numpy.ufunc.signature

On Fri, Apr 30, 2021 at 11:52 AM Robert Bradshaw <[email protected]> wrote:

> On Fri, Apr 30, 2021 at 7:04 AM Irwin Alejandro Rodriguez Ramirez
> <[email protected]> wrote:
> >
> > Awesome, thanks! It helps me a lot.
>
> You're welcome. Looking forward to a PR :).
>
> > Now I don't know how to tell whether the callable would act on a full
> > column or be purely elementwise. Are there any examples of this?
>
> I don't think it's possible to figure this out in general, which is
> why we'd have to take it as explicit user input or use the Singleton
> partitioning (which brings everything to the same machine, where it
> doesn't matter because the full columns would then be available).
>
> > On Wed, Apr 28, 2021 at 7:57 PM Robert Bradshaw <[email protected]> wrote:
> >>
> >> Hi Irwin,
> >>
> >> Looking forward to your first contribution!
> >>
> >> For combine_first, reading the documentation, it is completely elementwise.
> >> One could implement it as
> >>
> >> https://github.com/apache/beam/blob/release-2.28.0/sdks/python/apache_beam/dataframe/frames.py#L182
> >>
> >> and then update the tests to allow this
> >>
> >> https://github.com/apache/beam/blob/release-2.28.0/sdks/python/apache_beam/dataframe/pandas_doctests_test.py#L98
> >>
> >> The plain old combine has the unfortunate property that the passed
> >> callable may act on a full column, but in practice is often
> >> elementwise. It could be implemented similarly to the non-Pearson
> >> variant of corr:
> >>
> >> https://github.com/apache/beam/blob/release-2.29.0/sdks/python/apache_beam/dataframe/frames.py#L636
> >>
> >> requiring Singleton partitioning. One could consider adding an extra
> >> flag "elementwise" which would allow one to only require Index
> >> partitioning.
> >>
> >> On Wed, Apr 28, 2021 at 5:00 PM Irwin Alejandro Rodriguez Ramirez
> >> <[email protected]> wrote:
> >> >
> >> > Hi team,
> >> >
> >> > I'm a new contributor to Beam, and I'm trying to implement the
> >> > methods combine and combine_first from BEAM-12017. I haven't been
> >> > able to solve it yet, so I'm looking for suggestions on how to
> >> > implement these methods.
> >> > I would appreciate any help you can provide.
> >> >
> >> > --
> >> > Irwin Alejandro Rodríguez Ramírez | WIZELINE
> >> > Software Engineer
> >> > [email protected] | +52 1(55) 6694 6649
> >> > Paseo de la Reforma #296, Piso 32, Col. Juárez, Del. Cuauhtémoc,
> >> > 06600 CDMX.
> >> >
> >> > This email and its contents (including any attachments) are being
> >> > sent to you on the condition of confidentiality and may be protected
> >> > by legal privilege. Access to this email by anyone other than the
> >> > intended recipient is unauthorized. If you are not the intended
> >> > recipient, please immediately notify the sender by replying to this
> >> > message and delete the material immediately from your system. Any
> >> > further use, dissemination, distribution or reproduction of this
> >> > email is strictly prohibited. Further, no representation is made
> >> > with respect to any content contained in this email.
