On Wed, Dec 28, 2022 at 9:49 AM Robert Bradshaw <rober...@google.com> wrote:

> On Wed, Dec 28, 2022 at 4:56 AM Danny McCormick via dev
> <dev@beam.apache.org> wrote:
> >
> > > Given the increasing importance of multi language pipelines, it does
> > > seem that we should expand the capabilities of the DirectRunner or just
> > > go all in on FlinkRunner for testing and local / small scale development
> >
> > +1 - anecdotally I've found local testing of multi-language pipelines
> > to be tricky, and have had multiple conversations with others who have
> > run into similar challenges in multiple contexts (both users and people
> > working on the project).
>
> I generally do all my testing against the Python runner, which works
> well. This is, of course, more natural for Python pipelines using
> other languages, but when I was working on TypeScript, which uses
> cross-language even more heavily, I just made it auto-start the Python
> runner, just like the expansion services are auto-started, and that
> works quite well. (The auto-started runner is just a plain-old portable
> runner speaking the runner API, so no additional support is required
> on the source side once it's started. And if you're already trying to
> use dataframes and/or ML, you need to have Python available anyway.)
>
> We could consider bundling it as a docker image to reduce the required
> dependency set, but we'd have to solve the docker-in-docker issue to
> do that.
>
> I really think it's important to make cross-language a first-class
> citizen: the end user should not care most of the time whether the
> pipelines they use are native or not.
>

Thanks! That's helpful. In this case, getting the Python runner to
auto-start sounds like the most straightforward option for testing. After
all, it exists explicitly to provide Python initiated from Java, so Python
is already going to be around and running (in fact, the test already
auto-starts the Python expansion service to get the graph in the first
place), and the deps are already going to be there. I'm personally on the
fence about Docker in these sorts of situations. Yes, it makes life easier
for the most part, but it gets complicated quickly. It's also not an option
for everyone. I'll give things a shot and report back. (If you have an
example of auto-starting the Python runner, that'd be cool too. If I get
inspired, I might try to add that to the Python extensions in Java, since
right now they don't actually appear to be exercising the runner itself,
based on the TODOs.)
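
For illustration only, here is a rough sketch of what auto-starting the
Python runner for a Java test might look like. It is not taken from the
TypeScript SDK or from any existing Beam test. It assumes apache_beam is
installed in the Python environment that gets launched, that the Java side
has the portable runner on its classpath, and that port 8099 is free (the
port and the helper names are made up for this example). How the Python
transforms' own environment gets configured depends on the expansion
service and is not shown.

```python
import subprocess
import sys

# Beam module that serves the job API from a local Python process.
JOB_SERVER_MODULE = "apache_beam.runners.portability.local_job_service_main"


def job_server_command(port):
    """Command line that starts the Python portable job server on `port`."""
    return [sys.executable, "-m", JOB_SERVER_MODULE, "--port", str(port)]


def java_pipeline_args(port):
    """Pipeline options a Java test could pass to target that job server."""
    return [
        "--runner=PortableRunner",
        f"--jobEndpoint=localhost:{port}",
        # Assumption: LOOPBACK runs the Java SDK harness inside the
        # submitting JVM, so the Java side does not need Docker.
        "--defaultEnvironmentType=LOOPBACK",
    ]


def start_job_server(port):
    """Launch the job server; the caller must terminate() it afterwards."""
    return subprocess.Popen(job_server_command(port))


if __name__ == "__main__":
    port = 8099  # hypothetical port, chosen for illustration
    # Print what would be run rather than starting anything.
    print(" ".join(job_server_command(port)))
    print(" ".join(java_pipeline_args(port)))
```

A test fixture would call start_job_server() before building the Java
pipeline with the arguments from java_pipeline_args(), and terminate the
process once the pipeline finishes.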

Best,
B




>
> > On Wed, Dec 28, 2022 at 7:50 AM Sachin Agarwal via dev
> > <dev@beam.apache.org> wrote:
> >>
> >> Given the increasing importance of multi language pipelines, it does
> >> seem that we should expand the capabilities of the DirectRunner or just
> >> go all in on FlinkRunner for testing and local / small scale development
> >>
> >> On Wed, Dec 28, 2022 at 12:47 AM Robert Burke <rob...@frantil.com>
> >> wrote:
> >>>
> >>> Probably either on Flink, or the Python Portable runner at this
> >>> juncture.
> >>>
> >>> On Tue, Dec 27, 2022, 8:40 PM Byron Ellis via dev
> >>> <dev@beam.apache.org> wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I spent some more time adding things to my dbt-for-Beam clone
> >>>> (https://github.com/apache/beam/pull/24670) and actually made a fair
> >>>> amount of progress, including starting to add profile support so I
> >>>> can start to run it against real workloads (though at the moment only
> >>>> the "test" connector is properly configured). More interesting,
> >>>> though, was adding support for Python Dataframe external transforms:
> >>>> the pipeline expands properly, but then (unsurprisingly) hangs if you
> >>>> try to actually run it with Java's TestPipeline.
> >>>>
> >>>> I was wondering how people go about testing Java/Python hybrid
> >>>> pipelines locally? The Java<->Python tests don't seem to actually
> >>>> execute a pipeline, but I was hoping that maybe the direct runner
> >>>> could be set up properly to do that?
> >>>>
> >>>> Best,
> >>>> B
>
