Hi,

I actually thought that the proposal refers to Dataflow only. If this is supposed to be general, can we remove the Dataflow/Windmill specific parts and replace them with generic ones?

I'd have two more questions:

 a) The proposal is named "Slowly changing". Why is the rate of change essential to the proposal? Once we are running on event time, that should not matter, or what am I missing?

 b) The description says: 'User wants to solve a stream enrichment problem. In brief request sounds like: "I want to enrich each event in this stream by corresponding data from given table."' That is understandable, but wouldn't it be better to let the user express this intent directly (via a Join operation)? The actual implementation might be runner (and input!) specific. The analogy is that when doing a group-by-key operation, the runner can choose hash grouping or sort-merge grouping, but that choice is not (directly) expressed in user code. I'm not saying that we should not have low-level transforms, just asking whether it would be better to leave this decision to the runner (at least in some cases). It might be that we want to make the core SDK as low level as possible (and as reasonable); I just want to make sure that this really is the intent.
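
To make the analogy concrete, here is a minimal sketch in plain Python. It is not Beam API: every name in it (enrich_join, hash_join, sort_merge_join, max_in_memory_rows) is hypothetical, and it assumes the lookup table has unique keys. The user only states the join; the "runner" picks the physical strategy.

```python
def hash_join(events, table_rows):
    """Build an in-memory index of the table and probe it per event."""
    index = dict(table_rows)
    return [(key, value, index.get(key)) for key, value in events]


def sort_merge_join(events, table_rows):
    """Sort both inputs by key and merge them in one pass.

    Assumes table keys are unique; unmatched events get None.
    """
    left = sorted(events, key=lambda kv: kv[0])
    right = sorted(table_rows, key=lambda kv: kv[0])
    out, j = [], 0
    for key, value in left:
        # Advance the table cursor until it reaches or passes this key.
        while j < len(right) and right[j][0] < key:
            j += 1
        found = j < len(right) and right[j][0] == key
        out.append((key, value, right[j][1] if found else None))
    return out


def enrich_join(events, table_rows, max_in_memory_rows=1000):
    """User-facing intent: 'enrich each event from the table'.

    The strategy choice below stands in for a runner decision; the
    threshold is an arbitrary assumption for illustration.
    """
    if len(table_rows) <= max_in_memory_rows:
        return hash_join(events, table_rows)
    return sort_merge_join(events, table_rows)


events = [("b", 1), ("a", 2), ("c", 3)]
table_rows = [("a", "x"), ("b", "y")]

via_hash = enrich_join(events, table_rows, max_in_memory_rows=1000)
via_merge = enrich_join(events, table_rows, max_in_memory_rows=0)
```

Both calls yield the same enriched records (in a different order), which is the point: user code does not change when the strategy does.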

Thanks for the proposal!

Jan

On 12/17/19 12:01 AM, Kenneth Knowles wrote:
I want to highlight that this design definitely works for more runners than just Dataflow. I see two pieces of it that I want to bring onto the thread:

1. A new kind of "unbounded source" which is a periodic refresh of a bounded source, used as a side input. Each main input element has a window that maps to a specific refresh of the side input.

2. Distributed map side inputs: supporting very large lookup tables, but with consistency challenges. Even the part about the "windmill API" probably applies to other runners.
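
As a rough illustration of the first piece, the mapping from a main input element to "its" refresh can be sketched in plain Python. This is not Beam API; the names (refresh_index, enrich, REFRESH_INTERVAL_S) and the 5-minute interval are assumptions made for the example.

```python
REFRESH_INTERVAL_S = 300  # assumed: the bounded source is re-read every 5 minutes


def refresh_index(event_time_s, interval_s=REFRESH_INTERVAL_S):
    """Map an event timestamp (seconds) to the refresh window it falls into."""
    return int(event_time_s // interval_s)


def enrich(events, snapshots, interval_s=REFRESH_INTERVAL_S):
    """Attach to each (event_time_s, key) the value from its window's snapshot.

    snapshots maps a refresh index to one full read of the lookup table.
    The lookup depends only on event time, not on when the element arrives.
    """
    out = []
    for event_time_s, key in events:
        table = snapshots[refresh_index(event_time_s, interval_s)]
        out.append((event_time_s, key, table.get(key)))
    return out


# Two refreshes of the same table; "u1" changes between them.
snapshots = {
    0: {"u1": "bronze"},
    1: {"u1": "gold"},
}
events = [(10, "u1"), (299, "u1"), (300, "u1"), (450, "u2")]
enriched = enrich(events, snapshots)
```

Events at 10 s and 299 s see the first snapshot, the event at 300 s sees the second, and a key missing from the table is enriched with None.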

So I hope the title and "Objective" section do not cause people to stop reading.

Kenn

On Mon, Dec 16, 2019 at 11:36 AM Mikhail Gryzykhin <mig...@google.com> wrote:

    +some people explicitly

    Can you please check on the doc and comment if it looks fine?

    Thank you,
    --Mikhail

    On Tue, Dec 10, 2019 at 1:43 PM Mikhail Gryzykhin
    <mig...@google.com> wrote:

        "Good news, everyone-"
        ―Farnsworth

        Hi everyone,

        Recently, I was looking into relaxing limitations on side
        inputs in the Dataflow runner. As part of that, I came up with
        a design proposal for standardizing the slowly changing
        dimensions use case in Beam, along with the changes needed to
        support distributed map side inputs.

        Please review and comment on the design doc [1].

        Thank you,
        Mikhail.

        -----

        [1] https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg
