On Wed, Jan 6, 2021 at 12:28 PM Robert Burke <rob...@frantil.com> wrote:

> +1 on consolidating and being consistent with our terms.
>
> I've always considered them (Runner/Engine) synonymous. From a user
> perspective, an engine without a runner isn't any good for their beam
> pipeline. That there's an adapter is an implementation detail in some
> instances. I do appreciate not using Adapter as a term, avoiding confusing
> descriptions.
>
> However, if we make the change and there's a clear glossary of terms
> somewhere then
>
> That puts the lifecycle of a pipeline to be (loosely) something like...
>
> A Beam User authors Pipelines by writing DoFns, adding them as PTransforms
> connected by PCollections into a Pipeline using a Beam SDK. An SDK converts
> the pipeline into a portable representation and submits it to the Job
> Management Service of a Beam Runner. A Beam Runner translates the portable
> pipeline representation into terms an underlying Engine understands for
> Execution. The Beam Runner also reverses this translation when the Engine
> delegates tasks to workers, so that the Beam SDKs can execute the user's
> DoFns in keeping with the Beam Semantics.
>

An explicit glossary is a great idea to combine with standardizing
terminology across the site. I think the important context is that most of
the engines already existed before Beam and many of them are more
well-known. In fact, a pretty good way for a user to understand the essence
of what Beam is about is by taking a look at all the engines for which
there are Beam runners :-)

Engine: a system/product for doing [big] data processing
Pipeline: user authors this logic that says what they want to compute (I
think the fact that it is a DAG of PTransforms is relevant but we can get
away with omitting it for the high-level view and to avoid introducing the
term PTransform too early)
Runner: executes a Beam pipeline on an engine (agree that "adapter" is too
generic)
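
To make the three terms above concrete, here is a toy sketch in plain
Python. It is not Beam's API: Pipeline, ToyEngine, and ToyRunner are
hypothetical names made up for illustration, and the "translation" is
reduced to wrapping the user's steps in a single callable. The only point
is the division of labor: the engine knows nothing about pipelines, and the
runner is what executes a pipeline on it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Pipeline:
    """What the user authors: the logic they want computed (toy: a chain of functions)."""
    steps: List[Callable[[int], int]] = field(default_factory=list)

    def apply(self, fn: Callable[[int], int]) -> "Pipeline":
        self.steps.append(fn)
        return self


class ToyEngine:
    """Hypothetical stand-in for an engine: a data processing system unaware of pipelines."""

    def run_job(self, job: Callable[[int], int], data: List[int]) -> List[int]:
        return [job(x) for x in data]


class ToyRunner:
    """Hypothetical stand-in for a runner: executes a pipeline on one specific engine."""

    def __init__(self, engine: ToyEngine) -> None:
        self.engine = engine

    def run(self, pipeline: Pipeline, data: List[int]) -> List[int]:
        # Translate the pipeline into the only thing the toy engine
        # understands: a single callable job.
        def job(x: int) -> int:
            for step in pipeline.steps:
                x = step(x)
            return x

        return self.engine.run_job(job, data)


pipeline = Pipeline().apply(lambda x: x + 1).apply(lambda x: x * 2)
result = ToyRunner(ToyEngine()).run(pipeline, [1, 2, 3])
print(result)  # [4, 6, 8]
```

Swapping in a different engine would mean writing a different runner, while
the pipeline itself stays unchanged.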

I'd say below that level of granularity is getting into things that you
need to know only after you have started writing pipelines. Possibly you
need to introduce SDK harness to make clear that Beam pipelines are
inherently multi-language/multi-runtime, even if the engine isn't (my
personal opinion is that "UDF server" is the best understood terminology
for this, and so much better that it is never too late to abandon the
cryptic term "SDK harness").

Kenn


> (Not covered, bundles etc, but you get the idea...)
>
> On Wed, Jan 6, 2021, 11:16 AM Robert Bradshaw <rober...@google.com> wrote:
>
>> +1 to keeping the distinction between Runner and Engine as Kenn
>> described, and cleaning up the site with these in mind (I don't think the
>> term engine is widely used yet).
>>
>> On Wed, Jan 6, 2021 at 11:15 AM Yichi Zhang <zyi...@google.com> wrote:
>>
>>> I agree with what Kenn said. In most cases I would use the term
>>> runner for the adapter that translates the user's pipeline code into a job
>>> representation and submits it to the execution engine. Though in some
>>> cases they may still be used interchangeably, such as with the direct runner?
>>>
>>> On Wed, Jan 6, 2021 at 11:02 AM Kenneth Knowles <k...@apache.org> wrote:
>>>
>>>> I personally try to always distinguish two concepts: the thing doing
>>>> the computing (like Spark or Flink), and the adapter for running a Beam
>>>> pipeline (like SparkRunner or FlinkRunner). I use the term "runner" to mean
>>>> the adapter, and have been trying to use the term "engine" to refer to the
>>>> thing doing the computing. Do you think that users will use these two
>>>> interchangeably? Do you have recommendations about whether these terms make
>>>> sense to users?
>>>>
>>>> Kenn
>>>>
>>>> On Wed, Jan 6, 2021 at 10:23 AM Griselda Cuevas <g...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi dev@ community, Happy New Year!
>>>>>
>>>>> I'm working on updating the copy of a few website pages, and something
>>>>> that I want to do is standardize how we refer to runners across the
>>>>> site. So far I've identified these terms:
>>>>>
>>>>>    - Back-end
>>>>>    - Backend systems
>>>>>    - Execution environments
>>>>>    - Runtime
>>>>>    - Runtime system
>>>>>    - Runner
>>>>>
>>>>> Even when the majority of users will understand these concepts
>>>>> interchangeably, it's a good idea to be consistent so new users get
>>>>> familiar with how Beam works and its components.
>>>>>
>>>>> I'm going to start using the word "Runner" as I update the copy and
>>>>> will ask the team working on the UI revamp to do the same. Let me know if
>>>>> you have any questions/concerns.
>>>>>
>>>>
