On Wed, Dec 28, 2022 at 4:56 AM Danny McCormick via dev <dev@beam.apache.org> wrote:
>
> > Given the increasing importance of multi language pipelines, it does seem
> > that we should expand the capabilities of the DirectRunner or just go all
> > in on FlinkRunner for testing and local / small scale development
>
> +1 - anecdotally I've found local testing of multi-language pipelines to be
> tricky, and have had multiple conversations with others who have run into
> similar challenges in multiple contexts (both users and people working on the
> project).

I generally do all my testing against the Python runner, which works well.
This is, of course, more natural for Python pipelines that use other
languages, but when I was working on TypeScript, which uses cross-language
even more heavily, I just made it auto-start the Python runner, the same way
the expansion services are auto-started, and that works quite well. (The
auto-started runner is just a plain-old portable runner speaking the runner
API, so no additional support is required on the source side once it's
started. And if you're already trying to use dataframes and/or ML, you need
to have Python available anyway.) We could consider bundling it as a Docker
image to reduce the required dependency set, but we'd have to solve the
docker-in-docker issue to do that.

I really think it's important to make cross-language a first-class citizen:
most of the time, the end user should not care whether the pipelines they use
are native or not.

> On Wed, Dec 28, 2022 at 7:50 AM Sachin Agarwal via dev <dev@beam.apache.org>
> wrote:
>>
>> Given the increasing importance of multi language pipelines, it does seem
>> that we should expand the capabilities of the DirectRunner or just go all in
>> on FlinkRunner for testing and local / small scale development
>>
>> On Wed, Dec 28, 2022 at 12:47 AM Robert Burke <rob...@frantil.com> wrote:
>>>
>>> Probably either on Flink, or the Python Portable runner at this juncture.
>>>
>>> On Tue, Dec 27, 2022, 8:40 PM Byron Ellis via dev <dev@beam.apache.org>
>>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I spent some more time adding things to my dbt-for-Beam clone
>>>> (https://github.com/apache/beam/pull/24670) and actually made a fair
>>>> amount of progress, including starting to add in the profile support so I
>>>> can start to run it against real workloads (though at the moment only the
>>>> "test" connector is properly configured). More interesting, though, is
>>>> adding in support for Python Dataframe external transforms... which
>>>> expands properly, but then (unsurprisingly) hangs if you try to actually
>>>> run the pipeline with Java's TestPipeline.
>>>>
>>>> I was wondering how people go about testing Java/Python hybrid pipelines
>>>> locally? The Java<->Python tests don't seem to actually execute a
>>>> pipeline, but I was hoping that maybe the direct runner could be set up
>>>> properly to do that?
>>>>
>>>> Best,
>>>> B
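
For reference, the "auto-start the Python runner" approach described in the reply could look roughly like the sketch below. It is not the TypeScript SDK's actual code. The helper names and the port 8099 are made up for illustration, and it assumes `apache_beam` is installed in the Python environment used to launch the job service. Worker environment settings are deliberately left out, since they depend on the setup.

```python
import json
import subprocess
import sys

# Module that serves Beam's local Python job service (a portable runner).
JOB_SERVICE_MODULE = "apache_beam.runners.portability.local_job_service_main"


def job_service_command(port: int = 8099) -> list[str]:
    """Build the command line that starts the Python job service on `port`."""
    return [sys.executable, "-m", JOB_SERVICE_MODULE, "--port", str(port)]


def portable_pipeline_args(port: int = 8099) -> list[str]:
    """Pipeline options that point any SDK at the job service started above."""
    return ["--runner=PortableRunner", f"--jobEndpoint=localhost:{port}"]


def test_pipeline_property(port: int = 8099) -> str:
    """Format the options as the JSON array that Java's TestPipeline reads
    from the `beamTestPipelineOptions` system property."""
    return "-DbeamTestPipelineOptions=" + json.dumps(portable_pipeline_args(port))


def start_job_service(port: int = 8099) -> subprocess.Popen:
    """Launch the job service as a child process (hypothetical helper).

    The caller is responsible for calling terminate() when the tests finish.
    """
    return subprocess.Popen(job_service_command(port))


if __name__ == "__main__":
    print(" ".join(job_service_command()))
    print(test_pipeline_property())
```

Once the service is running, a Java test run would pass the printed `-DbeamTestPipelineOptions=...` property to the JVM, so that TestPipeline submits to the portable runner instead of the direct runner.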