Jingsong, what do you mean by "the batch data is too large"?

To my knowledge, nothing requires an SDK/runner to hold the entire side
input in memory. Lists, maps, iterables, ... can all be broken up into
smaller segments which can be loaded, cached and discarded separately.
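
As a rough illustration of the point above, here is a minimal sketch in plain
Python. It is not Beam's API: `SegmentedLookup`, `make_loader`, the modulo
partitioning, and the LRU eviction policy are all hypothetical choices made up
for this example. It only shows that a keyed view can be split into segments
that are loaded on demand, cached, and discarded separately.

```python
from collections import OrderedDict


class SegmentedLookup:
    """Hypothetical keyed view that never holds more than max_cached segments."""

    def __init__(self, load_segment, num_segments, max_cached=2):
        self._load_segment = load_segment  # callable: segment id -> dict
        self._num_segments = num_segments
        self._max_cached = max_cached
        self._cache = OrderedDict()        # segment id -> dict, in LRU order
        self.loads = 0                     # number of segment loads so far

    def _segment_of(self, key):
        # Assumption: keys are partitioned by hash modulo the segment count.
        return hash(key) % self._num_segments

    def cached_segments(self):
        return list(self._cache)

    def get(self, key, default=None):
        seg_id = self._segment_of(key)
        if seg_id in self._cache:
            # Cache hit: mark the segment as most recently used.
            self._cache.move_to_end(seg_id)
        else:
            # Cache miss: load only this segment, then evict the oldest one
            # if the cache has grown past its bound.
            self._cache[seg_id] = self._load_segment(seg_id)
            self.loads += 1
            if len(self._cache) > self._max_cached:
                self._cache.popitem(last=False)
        return self._cache[seg_id].get(key, default)


def make_loader(full_data, num_segments):
    """Stand-in for reading one segment from wherever the data really lives."""
    def load_segment(seg_id):
        return {k: v for k, v in full_data.items()
                if hash(k) % num_segments == seg_id}
    return load_segment


if __name__ == "__main__":
    data = {i: "v%d" % i for i in range(100)}
    lookup = SegmentedLookup(make_loader(data, 4), num_segments=4, max_cached=2)
    print(lookup.get(5), lookup.loads, lookup.cached_segments())
```

A real runner would pick its own partitioning and eviction strategy; the only
point here is that memory use is bounded by the cache size, not by the size of
the whole data set.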

On Thu, Aug 24, 2017 at 5:10 PM, Mingmin Xu <mingm...@gmail.com> wrote:

> I want to bring up this thread again, as we're looking for a similar
> feature in SQL. (Please point me to it if something already exists; I
> couldn't find a JIRA task.)
> Right now the streaming+batch and batch+batch joins are implemented with
> sideInput. That is not a one-size-fits-all solution: as Jingsong mentioned,
> the batch data may be too large, and it may change periodically. A userland
> PTransform sounds like a more straightforward option, as it doesn't require
> support at the runner level.
> Mingmin
> On Mon, Jul 17, 2017 at 8:56 PM, JingsongLee <lzljs3620...@aliyun.com>
> wrote:
> > Sorry for taking so long to reply.
> > Hi Aljoscha, I think the Async I/O operator and Batch are the same, and
> > Async is the better interface. All IO-related operations may be better
> > suited to asynchronous use. As you said, to begin with this needs no
> > special support from the runners.
> > I really like Luke's idea: let the user see a SeekableRead + SideInput
> > interface, and let the runner layer optimize it into direct access to the
> > external store. This requires a suitable SeekableRead interface and more
> > efficient compiler optimization.
> > Kenn's idea is exciting. If we can have an interface similar to FileSystem
> > (maybe like SeekableRead) that abstracts and unifies access to multiple
> > KV stores, we can let users see only the Beam concept rather than the
> > specific KV store.
> > Best, Jingsong Lee
> > ------------------------------------------------------------------
> > From: Kenneth Knowles <k...@google.com.INVALID>
> > Time: 2017 Jul 7 (Fri) 11:43
> > To: dev <dev@beam.apache.org>
> > Cc: JingsongLee <lzljs3620...@aliyun.com>
> > Subject: Re: [PROPOSAL] External Join with KV Stores
> > In the streams/tables way of talking, side inputs are tables. External KV
> > stores are basically also [globally windowed] tables. Both
> > are time-varying.
> >
> > I think it makes perfect sense to access an external KV store in userland
> > directly rather than listen to its changelog and reproduce the same table
> > as a multimap side input. I'm sure many users are already doing this. I'm
> > sure users will always do this. Providing a common interface (simpler than
> > Filesystem) and helpful transform(s) in an extension module seems nice.
> > Does it require any support in the core SDK?
> >
> > If I understand, Luke & Robert, you favor adding metadata to Read/SDF so
> > that a user _does_ write it as a changelog listener that is observed as a
> > multimap side input, and each runner optimizes it if they can to just
> > directly access the KV store? A runner is free to use any kind of storage
> > they like to materialize a side input anyhow, so this is surely possible,
> > but it is a "sufficiently smart compiler" issue. As for semantics, I'm not
> > worried about availability - it is globally windowed and always available.
> > But I think this requires retractions to be correctly equivalent to direct
> > access.
> >
> > I think we can have a userland PTransform in much less time than a model
> > concept, so I favor it.
> >
> > Kenn
> >
> >
> --
> ----
> Mingmin
