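This sounds similar to the use case for tf.Transform, a library that depends on Beam: https://github.com/tensorflow/transform
You define the preprocessing once; the full-pass analysis (the aggregations) runs as a Beam pipeline, and the output is a TensorFlow graph that applies the same transformations at serving time.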
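
To make the idea concrete, here is a rough sketch of the analyze-then-transform split in plain Python. It is not the tf.Transform API; the names `analyze` and `transform` and the fields `amount` and `country` are made up for illustration. The point is only that the aggregations are computed once in batch and stored, so the row-wise step can run on a single record at scoring time.

```python
from statistics import mean, pstdev

def analyze(rows):
    """Batch phase: one full pass over the training rows to compute aggregates."""
    amounts = [r["amount"] for r in rows]
    countries = sorted({r["country"] for r in rows})
    return {
        "amount_mean": mean(amounts),
        # Fall back to 1.0 so a constant column does not divide by zero.
        "amount_std": pstdev(amounts) or 1.0,
        "country_index": {c: i for i, c in enumerate(countries)},
    }

def transform(row, stats):
    """Row-wise phase: needs only the stored aggregates, so it works on one record."""
    return {
        "amount_z": (row["amount"] - stats["amount_mean"]) / stats["amount_std"],
        # Values never seen during training map to -1 (an assumption of this sketch).
        "country_id": stats["country_index"].get(row["country"], -1),
    }

# Hypothetical training data.
training_rows = [
    {"amount": 10.0, "country": "US"},
    {"amount": 20.0, "country": "DE"},
    {"amount": 30.0, "country": "US"},
]

stats = analyze(training_rows)                          # step 1: batch aggregations
train_features = [transform(r, stats) for r in training_rows]

# Step 3: the same transform applied to a single event at scoring time.
scored = transform({"amount": 20.0, "country": "FR"}, stats)
```

In this sketch the stored `stats` dictionary plays the role of the history you mention: whatever the aggregations need has to be materialized at training time and shipped with the model.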

On Tue, Jan 16, 2018 at 5:51 PM Ron Gonzalez <zlgonza...@yahoo.com> wrote:

> Hi,
> I was wondering if anyone has encountered or used Beam in the following
> manner:
>
> 1. During machine learning training, use Beam to create the event table.
> The flow may consist of some joins, aggregations, row-based
> transformations, etc...
> 2. Once the model is created, deploy the model to some scoring service
> via PMML (or some other scoring service).
> 3. Enable the SAME transformations used in #1 by using a separate engine
> but thereby guaranteeing that it will transform the data identically as the
> engine used in #1.
>
> I think this is a pretty interesting use case where Beam is used to
> guarantee portability across engines and deployment (batch to true
> streaming, not micro-batch). What's not clear to me is with respect to how
> batch joins would translate during one-by-one scoring (probably lookups) or
> how aggregations given that some kind of history would need to be stored
> (and how much is kept is configurable too).
>
> Thoughts?
>
> Thanks,
> Ron