Can the prefix still be generated programmatically at graph creation time?

On Wed, Mar 27, 2024 at 9:40 AM Robert Bradshaw <rober...@google.com> wrote:
> On Wed, Mar 27, 2024 at 9:12 AM Reuven Lax <re...@google.com> wrote:
>
>> This does seem like the best compromise, though I think there will still
>> end up being performance issues. A common pattern I've seen is that there
>> is a long common prefix to the dynamic destination followed by the dynamic
>> component. e.g. the destination might be
>> long/common/path/to/destination/files/<per-user-file>. In this case, the
>> prefix is often much larger than the messages themselves and is what gets
>> effectively encoded in the lambda.
>>
>
> The idea here is that the destination would be given as a format string,
> say, "long/common/path/to/destination/files/{dest_info.user}". Another way
> to put this is that we support (only) "lambdas" that are represented as
> string substitutions. (The fact that dest_info does not have to be part of
> the record, and can be the output of an arbitrary map if need be, makes
> this restriction not so bad.)
>
> As well as solving the performance issues, I think this is actually a
> pretty convenient and natural way for the user to name their destination
> (for the common use case, even easier than providing a lambda), and has the
> benefit of being much more transparent than an arbitrary callable as well
> for introspection (for both machine and human that may look at the
> resulting pipeline).
>
>
>> I'm not entirely sure how to address this in a portable context. We might
>> simply have to accept the extra overhead when going cross language.
>>
>> Reuven
>>
>> On Wed, Mar 27, 2024 at 8:51 AM Robert Bradshaw via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Thanks for putting this together, it will be a really useful feature to
>>> have.
>>>
>>> I am in favor of the string-pattern approaches. I think we need to
>>> support both the {record=..., dest_info=...} and the elide-fields
>>> approaches, as the former is nicer when one has a fixed representation for
>>> the output record (e.g. a proto or avro schema) and the flattened form for
>>> ease of use in more free-form contexts (e.g. when producing records from
>>> YAML and SQL).
>>>
>>> Also left some comments on the doc.
>>>
>>>
>>> On Wed, Mar 27, 2024 at 6:51 AM Ahmed Abualsaud via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Hey all,
>>>>
>>>> There have been some conversations lately about how best to enable
>>>> dynamic destinations in a portable context. Usually, this comes up for
>>>> cross-language transforms and more recently for Beam YAML.
>>>>
>>>> I've started a short doc outlining some routes we could take. The
>>>> purpose is to establish a good standard for supporting dynamic destinations
>>>> with portability, one that can be applied to most use cases and IOs. Please
>>>> take a look and add any thoughts!
>>>>
>>>> https://s.apache.org/portable-dynamic-destinations
>>>>
>>>> Best,
>>>> Ahmed
>>>>
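
For reference, here is a rough sketch of the format-string idea discussed in the thread, in plain Python rather than any actual Beam API. The names resolve_destination and constant_prefix, and the dict layout of the element, are assumptions made up for illustration. The template is an ordinary string, so in this sketch nothing prevents it from being assembled in code before the pipeline is built; only the placeholder is filled in per element.

```python
import re
from typing import Any, Mapping

# Matches placeholders such as {dest_info.user}: a dotted path of identifiers.
_FIELD = re.compile(r"\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def resolve_destination(template: str, element: Mapping[str, Any]) -> str:
    """Fill each {a.b.c} placeholder by walking nested mappings in `element`.

    Hypothetical helper: only string substitution is supported, which is
    the restriction described in the thread (no arbitrary callables).
    """
    def lookup(match: "re.Match[str]") -> str:
        value: Any = element
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)

    return _FIELD.sub(lookup, template)


def constant_prefix(template: str) -> str:
    """Return the part of the template before the first placeholder.

    This part is known once, at construction time, and does not need to
    travel with every element.
    """
    idx = template.find("{")
    return template if idx < 0 else template[:idx]


# The prefix is built in code here (an assumption for illustration).
base = "/".join(["long", "common", "path", "to", "destination", "files"])
template = base + "/{dest_info.user}"

# Assumed element shape: the {record=..., dest_info=...} form from the thread.
element = {"record": {"msg": "hi"}, "dest_info": {"user": "alice"}}

print(constant_prefix(template))
print(resolve_destination(template, element))
```

With this shape, the long common prefix appears once in the template, and only the small dest_info value varies per element.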