Per-key ordered delivery makes a ton of sense. I'd guess CDC has the same
needs as retractions, so that the changelog can be applied in order as it
arrives. And since it is per-key you still get parallelism.
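
To make the per-key point concrete, here is a minimal sketch in plain
Python. It is not Beam API; the (key, seq, op, value) event layout and the
apply_changelog name are made up for illustration. It assumes each change
carries a per-key sequence number, and shows that ordering only needs to
hold within a key:

```python
from collections import defaultdict


def apply_changelog(events):
    """Apply hypothetical (key, seq, op, value) change events to a dict.

    Order matters only among events that share a key.
    """
    by_key = defaultdict(list)
    for key, seq, op, value in events:
        by_key[key].append((seq, op, value))

    state = {}
    # Each key's group is independent of the others, so in a real runner
    # the groups could be handled by different workers in parallel.
    for key, changes in by_key.items():
        for seq, op, value in sorted(changes, key=lambda c: c[0]):
            if op == "upsert":
                state[key] = value
            elif op == "delete":
                state.pop(key, None)
    return state


# Events arrive interleaved and out of order across keys.
events = [
    ("a", 2, "delete", None),
    ("b", 1, "upsert", 10),
    ("a", 1, "upsert", 1),
    ("b", 2, "upsert", 20),
]
result = apply_changelog(events)
```

Here key "a" is upserted and then deleted, while key "b" ends at its
latest value, regardless of how the two keys were interleaved on arrival.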

Global ordering is quite different. I know that SQL and DataFrames have
global sorting operations. The question has always been how
"embarrassingly parallel" processing interacts with sorting/ordering. I
imagine some other systems have these features, so maybe we can look at
how they are used there?

Kenn

On Mon, May 10, 2021 at 4:39 PM Sam Rohde <sro...@google.com> wrote:

> Awesome, thanks Pablo!
>
> On Mon, May 10, 2021 at 4:05 PM Pablo Estrada <pabl...@google.com> wrote:
>
>> CDC would also benefit. I am working on a proposal for this that is
>> concerned with streaming pipelines, and per-key ordered delivery. I will
>> share with you as soon as I have a draft.
>> Best
>> -P.
>>
>> On Mon, May 10, 2021 at 2:56 PM Reuven Lax <re...@google.com> wrote:
>>
>>> There has been talk, but nothing concrete.
>>>
>>> On Mon, May 10, 2021 at 1:42 PM Sam Rohde <sro...@google.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I was wondering if there had been any plans for creating ordered
>>>> PCollections in the Beam model? Or if there might be plans for them in the
>>>> future?
>>>>
>>>> I know that Beam SQL and Beam DataFrames would directly benefit from an
>>>> ordered PCollection.
>>>>
>>>> Regards,
>>>> Sam
>>>>
>>>
