I think it makes a lot of sense to move it to the Beam web site. There's already a good landing point: https://beam.apache.org/contribute/runner-guide/
That page is a collection of advice for legacy-style runners on how to use runners-core, etc, and just general stuff about how to write one, gathered from lots of emails I've written in response to questions from runner authors.

The places I think need updating specifically for impulse are:

- conceptual list of primitives: https://beam.apache.org/contribute/runner-guide/#ptransforms (it also does not say CoGroupByKey because AFAIK the work to switch to it is not done)
- how to implement the primitives: https://beam.apache.org/contribute/runner-guide/#implementing-the-beam-primitives

The how-to-implement part needs a rewrite to just talk about how to use the new utilities to do fusion and implement the runner's side of the Fn API. It is deliberately more of a tutorial and code walk than a reference doc, so it is not redundant with the existing docs.

As for SDF, I think that ParDo is useful to talk about as "elementwise" processing only at the high level, but it needs to immediately be split into "vanilla", "stateful" and "SDF", which are really different primitive modes of computation.

And bigger picture, the last bit on https://beam.apache.org/contribute/runner-guide/#writing-an-sdk-independent-runner should be pulled up to be the main topic.

Kenn

On Fri, May 18, 2018 at 8:14 AM Lukasz Cwik <lc...@google.com> wrote:

> The Beam Runner API doc needs a lot of updating to discuss impulse and
> SDF, (and deprecate / remove Read): https://s.apache.org/beam-runner-api
> It could also use examples from Go/Python
> code base.
>
> Alternatively we could start to codify this information on the Apache Beam
> website as the definitions/contracts are less in flux.
>
> On Fri, May 18, 2018 at 7:49 AM Eugene Kirpichov <kirpic...@google.com>
> wrote:
>
>> Hi Ismael,
>> Impulse is a primitive necessary for the Portability world, where sources
>> do not exist. Impulse is the only possible root of the pipeline, it emits a
>> single empty byte array, and it's all DoFn's and SDF's from there. E.g.
>> when using Fn API, Read.from(BoundedSource) is translated into: Impulse +
>> ParDo(emit source) + ParDo(call .split()) + reshuffle + ParDo(call
>> .createReader() and read from it).
>> Agree that it makes sense to document it somewhere on the portability
>> page.
>>
>> On Fri, May 18, 2018 at 7:21 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> Fully agree.
>>>
>>> I already started to take a look.
>>>
>>> Regards
>>> JB
>>>
>>> On 18/05/2018 16:12, Ismaël Mejía wrote:
>>> > I have seen multiple mentions of 'Impulse' in JIRAs and some on other
>>> > discussions, but have not seen any document or concrete explanation on
>>> > what's Impulse and why we need it. This seems like an internal
>>> > implementation detail but it is probably a good idea to explain it
>>> > somewhere (my excuses if this is in some document and I missed it).
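
For readers who want to see the shape of the translation Eugene describes (Impulse + ParDo(emit source) + ParDo(call .split()) + reshuffle + ParDo(call .createReader() and read from it)), here is a rough sketch in plain Python. It does not use the Beam SDK or the Fn API at all: `RangeSource`, `impulse`, `par_do`, `reshuffle` and `read_via_impulse` are hypothetical stand-ins invented for illustration, and the in-memory lists only mimic what a real runner would do across workers.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class RangeSource:
    """Hypothetical bounded source yielding the integers in [start, stop)."""
    start: int
    stop: int

    def split(self, desired_size: int) -> List["RangeSource"]:
        # Cut the range into sub-sources of at most desired_size elements.
        # Assumes desired_size is a positive integer.
        return [RangeSource(lo, min(lo + desired_size, self.stop))
                for lo in range(self.start, self.stop, desired_size)]

    def create_reader(self) -> Iterator[int]:
        return iter(range(self.start, self.stop))


def impulse() -> List[bytes]:
    # The pipeline root: emits a single empty byte array.
    return [b""]


def par_do(fn: Callable[[object], Iterable], elements: Iterable) -> list:
    # Elementwise processing: each input may produce zero or more outputs.
    return [out for element in elements for out in fn(element)]


def reshuffle(elements: Iterable) -> list:
    # Stand-in only: a real runner would redistribute elements across
    # workers here; this sketch just materializes them.
    return list(elements)


def read_via_impulse(source: RangeSource, desired_size: int) -> list:
    seed = impulse()                                           # Impulse
    sources = par_do(lambda _: [source], seed)                 # ParDo(emit source)
    splits = par_do(lambda s: s.split(desired_size), sources)  # ParDo(call .split())
    spread = reshuffle(splits)                                 # reshuffle
    return par_do(lambda s: s.create_reader(), spread)         # ParDo(read)


if __name__ == "__main__":
    print(read_via_impulse(RangeSource(0, 10), desired_size=4))
```

The point of the sketch is only that nothing upstream of the first ParDo knows about sources: the single empty element from the impulse step exists purely to trigger the DoFn that emits the source description.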