Hi,

Note that there's a JIRA about this -
https://issues.apache.org/jira/browse/BEAM-1197 - but no solution yet :)

On Fri, Apr 14, 2017 at 10:58 AM Dan Halperin <[email protected]>
wrote:

> Hi Jingsong,
>
> This seems like a fantastic, reusable pattern to add, and indeed it's a
> fairly common one. There are probably some interesting API issues too --
> such as how you make a nice clean interface that works for many backends
> (Bigtable? HBase? Redis? Memcache? etc.), and how you let users supply a
> caching policy.
>
> It sounds like you may have already worked through these -- would you like
> to write down what you've learned and send out a short proposal?
>
> Thanks!
>
> On Thu, Apr 13, 2017 at 8:40 AM, JingsongLee <[email protected]>
> wrote:
>
> > Hi all,
> >
> >
> > I've seen repeatedly the following pattern:
> > Consider a sql (Joining stream to table, from Calcite):
> > SELECT STREAM o.rowtime, o.productId, o.orderId, o.units,
> >   p.name, p.unitPrice
> > FROM Orders AS o
> > JOIN Products AS p
> >   ON o.productId = p.productId;
> > A stream-to-table join is straightforward if the contents of the table
> are
> > not
> > changing(or slowly changing). This query enriches a stream of orders with
> > each product’s list price.
> >
> > This table is mostly in HBase or Mysql or Redis. Most of our users think
> > that
> > we should use SideInputs to implement it. But there are some difficulties
> > here:
> > 1.Maybe this table is very large! AFAIK, SideInputs will load all data to
> > internal.
> > We can not load all, but we can do some caching work.
> > 2.This table may be updated periodically. As mentioned in
> > https://issues.apache.org/jira/browse/BEAM-1197
> > 3.Sometimes users want to update this table, in some scene which doesn’t
> > need high accuracy. (The read and write to the external storage can’t
> > guarantee
> > Exacly-Once)
> >
> > So we developed a component called DimState(Maybe the name is not right).
> > Use cache(It is LoadingCache now) or load all.  They all have
> Time-To-Live
> > mechanism. An abstract interface is called ExternalState. There are
> > HBaseState, JDBCState, RedisState. It is accessed by key and namespace.
> > Provides bulk access to the external table for performance.
> >
> > Is there a better way to implement it? Can we make some abstracts in Beam
> > Model?
> >
> > What do you think?
> >
> > Best,
> > JingsongLee
> >
>

Reply via email to