I'm generally in favor of viewing these as seekable reads rather than
an entirely new concept. I'm not sure how it would fit into the SDF
architecture.
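
As a rough illustration of the seekable-read idea, here is a minimal sketch
in plain Python. It does not use any real Beam API: `SeekableSource`, `seek`,
`FullScanSource`, and `plan_side_input` are hypothetical names, and the
"window" is just a (start, end) tuple standing in for a Beam window. It only
shows how a runner might check for a seek capability and fall back to
loading everything otherwise.

```python
from typing import Any, Callable, Hashable, Iterable, Optional, Protocol, Tuple, runtime_checkable

# Hypothetical stand-in for a window: [start, end) timestamps.
Window = Tuple[int, int]
Row = Tuple[Hashable, Window, Any]  # (key, window, value)


@runtime_checkable
class SeekableSource(Protocol):
    """Hypothetical capability: point lookup by key + window."""

    def seek(self, key: Hashable, window: Window) -> Optional[Any]: ...


class FullScanSource:
    """A source that can only be read in full (no seek support)."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows = list(rows)

    def read_all(self) -> Iterable[Row]:
        return iter(self._rows)


class InMemorySeekableSource(FullScanSource):
    """Same data, but also supports lookups by key + window."""

    def __init__(self, rows: Iterable[Row]) -> None:
        super().__init__(rows)
        self._index = {(k, w): v for k, w, v in self._rows}
        self.seeks = 0  # counts lookups, to make the deferred path observable

    def seek(self, key: Hashable, window: Window) -> Optional[Any]:
        self.seeks += 1
        return self._index.get((key, window))


def plan_side_input(source: Any) -> Callable[[Hashable, Window], Optional[Any]]:
    """Stand-in for a runner choosing how to serve a side input."""
    if isinstance(source, SeekableSource):
        # Deferred lookup: nothing is loaded up front.
        return source.seek
    # Fallback: materialize the whole source, then look up locally.
    materialized = {(k, w): v for k, w, v in source.read_all()}
    return lambda key, window: materialized.get((key, window))
```

The pipeline code that calls the returned lookup is the same in both cases;
only the planning step differs, which is the appeal of treating this as a
property of the IO rather than a new user-facing concept.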

On Wed, Jul 5, 2017 at 10:27 AM, Lukasz Cwik <lc...@google.com.invalid> wrote:
> Yes, I was thinking the same thing about side inputs. Our current IOs don't
> support "seeking" and we could make HBaseIO/JdbcIO/... become seekable by
> key+window which would allow a Runner to optimize the Read + SideInput into
> any kind of deferred lookup when it's accessed as a side input instead of
> loading it all into state. The Runner could interrogate the properties of
> the "seekable" IO to see if it's compatible with what the user is doing
> before performing the optimization. Granted, I believe it will be difficult
> to express when something becomes available, how to handle updates to the
> external store, etc...
>
> What I like about modelling it as seekable IOs + Runner optimization is
> that users don't need to change their pipeline to get benefits when they
> upgrade to newer versions of Apache Beam.
>
> On Tue, Jul 4, 2017 at 9:48 AM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
>> Hi,
>>
>> This is a very interesting proposal! I read your comment about side inputs
>> and I tend to agree, though I think that side inputs don’t have to be
>> strictly streams. It’s easily possible to imagine a Beam where a side input
>> can be based on an external system and accessing side input simply goes
>> through to the external system. In this world, it would be somewhat hard to
>> reason about side input availability and to make sure that main input is
>> only processed when the side input is available. Though it’s not
>> unsolvable, I think.
>>
>> What I like about your solution is that it is implementable as a DoFn,
>> without any special support by the Runners. However, I think that in the
>> Flink Runner it should be possible to execute this with the Async I/O
>> operator and therefore get asynchronous accesses to the external system. I
>> also think that this is not always better than batching, though.
>>
>> Best,
>> Aljoscha
>> > On 3. Jul 2017, at 04:36, JingsongLee <lzljs3620...@aliyun.com> wrote:
>> >
>> > Hi all:
>> > In some scenarios, the user needs to query some information from an
>> > external KV store in the pipeline. I think we can have a good abstraction
>> > that lets users deal as little as possible with the underlying
>> > details. Here is a doc for this proposal; I would like to receive your
>> > feedback.
>> > https://docs.google.com/document/d/1B-XnUwXh64lbswRieckU0BxtygSV58hysqZbpZmk03A/edit?usp=sharing
>> > Best, Jingsong Lee
>> >
>>
>>
