Re: Supporting Dynamic Destinations in a portable context

Reuven Lax via dev Wed, 27 Mar 2024 10:21:08 -0700

Can the prefix still be generated programmatically at graph creation time?

On Wed, Mar 27, 2024 at 9:40 AM Robert Bradshaw <[email protected]> wrote:


> On Wed, Mar 27, 2024 at 9:12 AM Reuven Lax <[email protected]> wrote:
>
>> This does seem like the best compromise, though I think there will still
>> end up being performance issues. A common pattern I've seen is that there
>> is a long common prefix to the dynamic destination followed by the dynamic
>> component, e.g. the destination might be
>> long/common/path/to/destination/files/<per-user-file>. In this case, the
>> prefix is often much larger than the messages themselves and is what gets
>> effectively encoded in the lambda.
>>
>
> The idea here is that the destination would be given as a format string,
> say, "long/common/path/to/destination/files/{dest_info.user}". Another way
> to put this is that we support (only) "lambdas" that are represented as
> string substitutions. (The fact that dest_info does not have to be part of
> the record, and can be the output of an arbitrary map if need be, makes
> this restriction not so bad.)
>
> As well as solving the performance issues, I think this is actually a
> pretty convenient and natural way for the user to name their destination
> (for the common use case, even easier than providing a lambda), and it has
> the benefit of being much more transparent than an arbitrary callable for
> introspection (by both machines and humans that may look at the resulting
> pipeline).
>
>
>> I'm not entirely sure how to address this in a portable context. We might
>> simply have to accept the extra overhead when going cross language.
>>
>> Reuven
>>
>> On Wed, Mar 27, 2024 at 8:51 AM Robert Bradshaw via dev <
>> [email protected]> wrote:
>>
>>> Thanks for putting this together, it will be a really useful feature to
>>> have.
>>>
>>> I am in favor of the string-pattern approaches. I think we need to
>>> support both the {record=..., dest_info=...} and the elide-fields
>>> approaches: the former is nicer when one has a fixed representation for
>>> the output record (e.g. a proto or Avro schema), and the flattened form is
>>> easier to use in more free-form contexts (e.g. when producing records from
>>> YAML and SQL).
>>>
>>> Also left some comments on the doc.
>>>
>>>
>>> On Wed, Mar 27, 2024 at 6:51 AM Ahmed Abualsaud via dev <
>>> [email protected]> wrote:
>>>
>>>> Hey all,
>>>>
>>>> There have been some conversations lately about how best to enable
>>>> dynamic destinations in a portable context. Usually, this comes up for
>>>> cross-language transforms and more recently for Beam YAML.
>>>>
>>>> I've started a short doc outlining some routes we could take. The
>>>> purpose is to establish a good standard for supporting dynamic destinations
>>>> with portability, one that can be applied to most use cases and IOs. Please
>>>> take a look and add any thoughts!
>>>>
>>>> https://s.apache.org/portable-dynamic-destinations
>>>>
>>>> Best,
>>>> Ahmed
>>>>
>>>
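
For illustration, here is a minimal Python sketch of the format-string idea for both element shapes discussed above:

- the nested {record=..., dest_info=...} form;
- the flattened form, whose destination fields are elided before writing.

This is not Beam API. DestInfo, resolve_nested, resolve_flattened and the `elide` parameter are hypothetical names. Listing the fields to elide explicitly is an assumption about how the elide-fields approach might be configured. The long common prefix appears once, in the template string, so only the short per-user value varies from element to element.

```python
from typing import Any, Dict, NamedTuple, Tuple


class DestInfo(NamedTuple):
    """Hypothetical per-element destination metadata."""
    user: str


# The long common prefix is stored once, in the template string.
NESTED_TEMPLATE = "long/common/path/to/destination/files/{dest_info.user}"
FLAT_TEMPLATE = "long/common/path/to/destination/files/{user}"


def resolve_nested(element: Dict[str, Any], template: str) -> Tuple[str, Any]:
    """Nested shape {record=..., dest_info=...}: the destination is rendered
    from dest_info, and only the record itself is returned for writing."""
    destination = template.format(dest_info=element["dest_info"])
    return destination, element["record"]


def resolve_flattened(
    record: Dict[str, Any], template: str, elide: Tuple[str, ...]
) -> Tuple[str, Dict[str, Any]]:
    """Flattened shape: destination fields live in the record itself and are
    dropped (elided) before writing. `elide` is assumed to be user-supplied."""
    destination = template.format(**record)
    written = {k: v for k, v in record.items() if k not in elide}
    return destination, written


if __name__ == "__main__":
    nested = {"record": {"id": 1, "value": "a"}, "dest_info": DestInfo(user="alice")}
    print(resolve_nested(nested, NESTED_TEMPLATE))
    flat = {"id": 2, "value": "b", "user": "bob"}
    print(resolve_flattened(flat, FLAT_TEMPLATE, elide=("user",)))
```

dest_info only needs to be present on the element, not inside the record. It can therefore be produced by an arbitrary upstream map.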

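The introspection benefit of templates over callables can be sketched the same way. Assume Python-style format strings; this is an assumption, since the thread does not fix a template syntax. A tool can then list which fields a template depends on and recover its constant prefix using only the standard library's string.Formatter().parse. An opaque lambda does not expose either. referenced_fields and static_prefix are hypothetical helper names.

```python
import string
from typing import List

TEMPLATE = "long/common/path/to/destination/files/{dest_info.user}"


def referenced_fields(template: str) -> List[str]:
    """Return the field names a format-string template substitutes, in order."""
    fields = []
    for _literal, field_name, _spec, _conversion in string.Formatter().parse(template):
        # field_name is None for a trailing literal chunk with no substitution.
        if field_name is not None:
            fields.append(field_name)
    return fields


def static_prefix(template: str) -> str:
    """Return the literal text that precedes the first substitution."""
    first = next(iter(string.Formatter().parse(template)), None)
    return first[0] if first is not None else ""


if __name__ == "__main__":
    print(referenced_fields(TEMPLATE))
    print(static_prefix(TEMPLATE))
```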