Hi Jingsong, This seems like a fantastic, reusable pattern to add, and indeed it's a fairly common one. There are probably some interesting API issues too -- such as how you make a nice clean interface that works for many backends (Bigtable? HBase? Redis? Memcache? etc.), and how you let users supply a caching policy.
It sounds like you may have already worked through these -- would you like to write down what you've learned and send out a short proposal? Thanks! On Thu, Apr 13, 2017 at 8:40 AM, JingsongLee <[email protected]> wrote: > Hi all, > > > I've seen repeatedly the following pattern: > Consider a sql (Joining stream to table, from Calcite): > SELECT STREAM o.rowtime, o.productId, o.orderId, o.units, > p.name, p.unitPrice > FROM Orders AS o > JOIN Products AS p > ON o.productId = p.productId; > A stream-to-table join is straightforward if the contents of the table are > not > changing(or slowly changing). This query enriches a stream of orders with > each product’s list price. > > This table is mostly in HBase or Mysql or Redis. Most of our users think > that > we should use SideInputs to implement it. But there are some difficulties > here: > 1.Maybe this table is very large! AFAIK, SideInputs will load all data to > internal. > We can not load all, but we can do some caching work. > 2.This table may be updated periodically. As mentioned in > https://issues.apache.org/jira/browse/BEAM-1197 > 3.Sometimes users want to update this table, in some scene which doesn’t > need high accuracy. (The read and write to the external storage can’t > guarantee > Exacly-Once) > > So we developed a component called DimState(Maybe the name is not right). > Use cache(It is LoadingCache now) or load all. They all have Time-To-Live > mechanism. An abstract interface is called ExternalState. There are > HBaseState, JDBCState, RedisState. It is accessed by key and namespace. > Provides bulk access to the external table for performance. > > Is there a better way to implement it? Can we make some abstracts in Beam > Model? > > What do you think? > > Best, > JingsongLee >
