+1, having a NeedsRam(x) annotation would be incredibly helpful.

On Fri, 13 Nov 2020 at 05:57, Robert Burke wrote:
> (Disclaimer: Mirac and their team did approach me about this beforehand, as
> their interest is in the Go SDK.)
>
> +1, I think it's a good idea. As you've pointed out, there are many
> opportunities for optional pipeline analysis here as well.
>
> A strawman counterpoint would be to reuse the static DisplayData for this
> kind of thing, but I don't think that's necessarily the same thing. It's
> very hard to make something intended purely for human consumption also
> suitable for machine consumption without various adapters and such, and it
> would be an awful hack. Having something specifically for machines to
> understand is valuable in and of itself.
>
> I appreciate the versatility of simply using known URNs and their defined
> formats, and especially limiting the proposal to optional annotations that
> don't affect correctness. This will work well with most DoFns that need
> specialized hardware. They can usually be emulated on ordinary CPUs, which
> is good for testing, but can perform much better if the hardware is
> available. This also allows runners to move execution of specific DoFns
> to the machines with the specialized hardware, for better resource
> scheduling.
>
> I look forward to the PR and, before then, to the community's discussion
> of this new field in the model proto.
>
> On Thu, 12 Nov 2020 at 09:41, Mirac Vuslat Basaran
> wrote:
>
>> Hi all,
>>
>> We would like to propose adding support for annotations on Beam
>> transforms. These annotations would be readable by the runner, which
>> could then act on this information, for example by doing special resource
>> allocation. There have been discussions around annotations (or “hints”,
>> as they are sometimes called) in the past (
>> https://lists.apache.org/thread.html/rdf247cfa3a509f80578f03b2454ea1e50474ee3576a059486d58fdf4%40%3Cdev.beam.apache.org%3E,
>>
>> https://lists.apache.org/thread.html/fc090d8acd96c4cf2d23071b5d99f538165d3ff7fbe6f65297655309%40%3Cdev.beam.apache.org%3E).
>> This proposal aims to arrive at an agreed-upon lightweight solution, with
>> a follow-up pull request implementing it in Go.
>>
>> By annotations, we mean optional information/hints provided to the
>> runner. This proposal explicitly excludes “required” annotations that could
>> cause incorrect output. A runner that does not understand the annotations
>> and ignores them must still produce correct output, possibly with a
>> degradation in performance or in other nonfunctional requirements.
>> Supporting only “optional” annotations keeps pipelines compatible with
>> runners that do not recognize those annotations.
>>
>> A good example of an optional annotation is marking a transform as needing
>> to run on a GPU or TPU, or as needing a certain amount of RAM. If the
>> runner knows about this annotation, it can allocate the requested
>> resources to that transform only, improving performance while avoiding
>> spending these scarce resources on other transforms.
>>
>> Another example of an optional annotation is marking a transform to run
>> on secure hardware, or giving hints to profiling/dynamic analysis tools.
>>
>> In all these cases, the runner can run the pipeline with or without the
>> annotation, and the same output would be produced in both cases. Only the
>> nonfunctional requirements (performance, security, ease of profiling)
>> would differ, hence “optional”.
>>
>> A counter-example that this proposal explicitly excludes is marking a
>> transform as requiring sorted input: for example, a transform that expects
>> time-sorted input in order to produce correct output. If the runner
>> ignored this requirement, it would risk producing incorrect output. To
>> avoid this, we exclude such required annotations.
>>
>> Implementation-wise, we propose adding the field
>> - map<string, bytes> annotations = 8;
>> to the PTransform proto (
>> https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L127).
>> The key would be a URN that uniquely identifies the type of annotation.
>> The value is an opaque byte array (e.g., a serialized protocol buffer),
>> giving the implementation of each specific annotation type maximum
>> flexibility.
>>
>> We have a specific interest in adding this to the Go SDK. In Go, the user
>> would specify the annotations for a ParDo by defining the field
>> - Annotations map[string][]byte
>> on its structural DoFn and filling it in. For simplicity, we will only
>> support structural DoFns in Go for the time being.
>>
>> Runners could then read the annotations from the PTransform proto and
>> support whichever annotations they choose, in whatever way they see fit.
>>
>> Please let me know what you think and what the best way to proceed would
>> be. For example, we could share a small design doc or, if there are no
>> major objections, directly create a pull request for Go where we can
>> discuss the implementation details.
>>
>> Best,
>> Mirac and team
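To make the proposed contract concrete, here is a minimal Go sketch of both sides: a structural DoFn carrying the proposed `Annotations map[string][]byte` field, and a toy runner that honors the annotations it knows and skips the rest. It deliberately does not use the Beam Go SDK. The `PTransform` struct is only a stand-in for the proto message, and the URNs (`beam:annotation:min_ram_bytes:v1`, `beam:annotation:accelerator:v1`), the 8-byte big-endian RAM encoding, and the names `scoreFn`, `translate`, and `plan` are all hypothetical, chosen for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sort"
)

// Hypothetical annotation URNs, invented for this sketch; the proposal does
// not define any concrete URNs or payload formats.
const (
	urnMinRAMBytes = "beam:annotation:min_ram_bytes:v1"
	urnAccelerator = "beam:annotation:accelerator:v1"
)

// PTransform is a toy stand-in for the proto message, reduced to a name plus
// the proposed `map<string, bytes> annotations = 8;` field.
type PTransform struct {
	UniqueName  string
	Annotations map[string][]byte
}

// scoreFn is a hypothetical structural DoFn carrying the proposed exported
// Annotations field alongside its ordinary processing method.
type scoreFn struct {
	Annotations map[string][]byte
}

func (f *scoreFn) ProcessElement(x float64) float64 { return x * 2 }

// translate loosely mimics the SDK copying a DoFn's annotations into the
// PTransform it emits (hypothetical; not a Beam API).
func translate(name string, fn *scoreFn) PTransform {
	return PTransform{UniqueName: name, Annotations: fn.Annotations}
}

// encodeMinRAM encodes a byte count as 8 big-endian bytes (an assumed format).
func encodeMinRAM(n uint64) []byte {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, n)
	return buf
}

// Placement records what the toy runner decides for one transform.
type Placement struct {
	MinRAMBytes uint64
	Accelerator string
	Ignored     []string // URNs skipped as unknown or malformed
}

// plan acts on the annotations it understands and skips everything else.
// Because annotations are optional hints, skipping one may change where or
// how fast the transform runs, but never what it outputs.
func plan(t PTransform) Placement {
	var p Placement
	for urn, payload := range t.Annotations {
		switch {
		case urn == urnMinRAMBytes && len(payload) == 8:
			p.MinRAMBytes = binary.BigEndian.Uint64(payload)
		case urn == urnAccelerator:
			p.Accelerator = string(payload)
		default: // unknown URN, or a payload this runner cannot decode
			p.Ignored = append(p.Ignored, urn)
		}
	}
	sort.Strings(p.Ignored) // map iteration order is random; keep output stable
	return p
}

func main() {
	fn := &scoreFn{Annotations: map[string][]byte{
		urnMinRAMBytes:                  encodeMinRAM(16 << 30), // 16 GiB
		urnAccelerator:                  []byte("gpu"),
		"example.com:annotation:foo:v1": {0x01}, // unknown to this runner
	}}
	t := translate("ParDo(scoreFn)", fn)
	p := plan(t)
	fmt.Printf("%s: ram>=%d accel=%q ignored=%v\n",
		t.UniqueName, p.MinRAMBytes, p.Accelerator, p.Ignored)
}
```

Run as-is, it should print `ParDo(scoreFn): ram>=17179869184 accel="gpu" ignored=[example.com:annotation:foo:v1]`. The `default` branch in `plan` carries the key property of the proposal: a runner that does not recognize an annotation, or cannot decode its payload, simply drops it, so correctness never depends on the annotation being understood.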
