I'd like to revive this thread, as we're looking for a similar feature in SQL. Please point me to it if something already exists; I couldn't find any JIRA task.

Currently, the streaming+batch and batch+batch joins are implemented with side inputs. That is not a one-size-fits-all solution: as Jingsong mentioned, the batch data may be too large, and it may change periodically. A userland PTransform sounds like a more straightforward option, as it doesn't require support at the runner level.

Mingmin

On Mon, Jul 17, 2017 at 8:56 PM, JingsongLee <lzljs3620...@aliyun.com> wrote:

> Sorry for taking so long to reply.
>
> Hi Aljoscha, I think the Async I/O operator and batching are the same, and
> Async is a better interface. All IO-related operations may be more
> appropriate for asynchronous use. Just as you said, at the beginning this
> would have no special support from the runners.
>
> I really like Luke's idea: let the user see a SeekableRead + side input
> interface, and have the runner layer optimize it into direct access to the
> external store. This requires a suitable SeekableRead interface and more
> efficient compiler optimization.
>
> Kenn's idea is exciting. If we can have an interface similar to FileSystem
> (maybe like SeekableRead), abstracting and unifying an interface for
> multiple KV stores, we can let users see only the Beam concept rather than
> the specific KV store.
>
> Best, Jingsong Lee
>
> ------------------------------------------------------------------
> From: Kenneth Knowles <k...@google.com.INVALID>
> Time: 2017 Jul 7 (Fri) 11:43
> To: dev <dev@beam.apache.org>
> Cc: JingsongLee <lzljs3620...@aliyun.com>
> Subject: Re: [PROPOSAL] External Join with KV Stores
>
> In the streams/tables way of talking, side inputs are tables. External KV
> stores are basically also [globally windowed] tables. Both are
> time-varying.
>
> I think it makes perfect sense to access an external KV store in userland
> directly rather than listen to its changelog and reproduce the same table
> as a multimap side input. I'm sure many users are already doing this. I'm
> sure users will always do this. Providing a common interface (simpler than
> Filesystem) and helpful transform(s) in an extension module seems nice.
> Does it require any support in the core SDK?
>
> If I understand, Luke & Robert, you favor adding metadata to Read/SDF so
> that a user _does_ write it as a changelog listener that is observed as a
> multimap side input, and each runner optimizes it if they can to just
> directly access the KV store? A runner is free to use any kind of storage
> they like to materialize a side input anyhow, so this is surely possible,
> but it is a "sufficiently smart compiler" issue. As for semantics, I'm not
> worried about availability - it is globally windowed and always available.
> But I think this requires retractions to be correctly equivalent to direct
> access.
>
> I think we can have a userland PTransform in much less time than a model
> concept, so I favor it.
>
> Kenn

--
----
Mingmin
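
To make the userland option a bit more concrete, here is a rough sketch of the lookup-join logic such a transform might run. It is plain Python with no Beam dependency, and every name in it (`KVStore`, `InMemoryKVStore`, `lookup_join`, `batch_size`) is hypothetical; none of this is an existing Beam API or an agreed design. It assumes the "common interface" could be as small as a single batched get, and it batches elements so that the external store is called once per batch rather than once per element.

```python
from abc import ABC, abstractmethod
from typing import (Dict, Generic, Iterable, Iterator, List, Optional, Tuple,
                    TypeVar)

K = TypeVar("K")  # join key
V = TypeVar("V")  # value carried by the main input
R = TypeVar("R")  # value stored in the external KV store


class KVStore(ABC, Generic[K, R]):
    """Hypothetical minimal interface for an external KV store (not a Beam API)."""

    @abstractmethod
    def multi_get(self, keys: List[K]) -> Dict[K, R]:
        """Return the values found for `keys`; missing keys are omitted."""


class InMemoryKVStore(KVStore[K, R]):
    """Stand-in for a real store, used here only for illustration."""

    def __init__(self, data: Dict[K, R]) -> None:
        self._data = dict(data)
        self.calls = 0  # counts round trips, to show the effect of batching

    def multi_get(self, keys: List[K]) -> Dict[K, R]:
        self.calls += 1
        return {k: self._data[k] for k in keys if k in self._data}


def _flush(batch: List[Tuple[K, V]],
           store: KVStore[K, R]) -> Iterator[Tuple[K, V, Optional[R]]]:
    # One round trip for all distinct keys in the batch.
    found = store.multi_get(list({key for key, _ in batch}))
    for key, value in batch:
        # Left-join semantics: unmatched keys are emitted with None.
        yield key, value, found.get(key)


def lookup_join(elements: Iterable[Tuple[K, V]],
                store: KVStore[K, R],
                batch_size: int = 2) -> Iterator[Tuple[K, V, Optional[R]]]:
    """Left-join (key, value) elements against `store`, one lookup per batch."""
    batch: List[Tuple[K, V]] = []
    for element in elements:
        batch.append(element)
        if len(batch) >= batch_size:
            yield from _flush(batch, store)
            batch = []
    if batch:  # flush the final, possibly partial, batch
        yield from _flush(batch, store)


if __name__ == "__main__":
    store = InMemoryKVStore({"a": 1, "b": 2})
    for row in lookup_join([("a", "x"), ("c", "y"), ("b", "z")], store):
        print(row)
```

In a real pipeline this logic would presumably live inside a DoFn that buffers elements and flushes them at the end of each bundle, wrapped in a PTransform. Each lookup sees whatever the store holds at that moment, which matches the point above that the external table is time-varying; the sketch makes no attempt at retractions.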