Re: Testing Multilanguage Pipelines?

Robert Bradshaw via dev Wed, 28 Dec 2022 09:50:33 -0800

On Wed, Dec 28, 2022 at 4:56 AM Danny McCormick via dev
<dev@beam.apache.org> wrote:
>
> > Given the increasing importance of multi language pipelines, it does seem 
> > that we should expand the capabilities of the DirectRunner or just go all 
> > in on FlinkRunner for testing and local / small scale development
>
> +1 - anecdotally I've found local testing of multi-language pipelines to be 
> tricky, and have had multiple conversations with others who have run into 
> similar challenges in multiple contexts (both users and people working on the 
> project).

I generally do all my testing against the Python runner, which works
well. This is, of course, more natural for Python pipelines that use
other languages, but when I was working on TypeScript, which uses
cross-language even more heavily, I just made it auto-start the Python
runner, the same way the expansion services are auto-started, and that
works quite well. (The auto-started runner is just a plain-old
portable runner speaking the Runner API, so no additional support is
required on the source side once it's started. And if you're already
trying to use dataframes and/or ML, you need to have Python available
anyway.)
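
As a rough sketch of that setup (not code from the thread or from any
SDK): the helpers below build the command that starts Beam's local
Python job service and the options a pipeline would pass to reach it.
The helper names and port 8099 are hypothetical, and the camelCase
--jobEndpoint spelling assumes a Java-style options parser.

```python
import subprocess
import sys


def job_server_command(port):
    """Command line that starts Beam's local Python job service on `port`."""
    return [
        sys.executable, "-m",
        "apache_beam.runners.portability.local_job_service_main",
        "--port", str(port),
    ]


def portable_pipeline_args(port, host="localhost"):
    """Options that point a pipeline at an already-running job service.

    Assumption: Java-style flag names; the Python SDK spells the second
    one --job_endpoint instead.
    """
    return ["--runner=PortableRunner", f"--jobEndpoint={host}:{port}"]


def start_job_server(port):
    """Launch the job service in the background.

    Requires apache_beam to be installed for the current interpreter.
    The caller is responsible for calling terminate() on the result.
    """
    return subprocess.Popen(job_server_command(port))
```

A test harness could call start_job_server(8099) before the run, hand
portable_pipeline_args(8099) to the pipeline under test, and terminate
the process afterwards. Whether a given test fixture (for example
Java's TestPipeline) picks these options up depends on how it is
configured and on the portable runner being on the classpath.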

We could consider bundling it as a docker image to reduce the required
dependency set, but we'd have to solve the docker-in-docker issue to
do that.

I really think it's important to make cross-language a first-class
citizen--the end user should not care most of the time whether the
pipelines they use are native or not.

> On Wed, Dec 28, 2022 at 7:50 AM Sachin Agarwal via dev <dev@beam.apache.org> 
> wrote:
>>
>> Given the increasing importance of multi language pipelines, it does seem 
>> that we should expand the capabilities of the DirectRunner or just go all in 
>> on FlinkRunner for testing and local / small scale development
>>
>> On Wed, Dec 28, 2022 at 12:47 AM Robert Burke <rob...@frantil.com> wrote:
>>>
>>> Probably either on Flink, or the Python Portable runner at this juncture.
>>>
>>> On Tue, Dec 27, 2022, 8:40 PM Byron Ellis via dev <dev@beam.apache.org> 
>>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I spent some more time adding things to my dbt-for-Beam clone 
>>>> (https://github.com/apache/beam/pull/24670) and actually made a fair 
>>>> amount of progress, including starting to add in the profile support so I 
>>>> can start to run it against real workloads (though at the moment only the 
>>>> "test" connector is properly configured). More interestingly, though, is 
>>>> adding in support for Python Dataframe external transforms... which 
>>>> expands properly, but then (unsurprisingly) hangs if you try to actually 
>>>> run the pipeline with Java's TestPipeline.
>>>>
>>>> I was wondering how people go about testing Java/Python hybrid pipelines 
>>>> locally? The Java<->Python tests don't seem to actually execute a 
>>>> pipeline, but I was hoping that maybe the direct runner could be set up 
>>>> properly to do that?
>>>>
>>>> Best,
>>>> B
