The Beam portability framework will enable this in Java too; not sure we can do much sooner than that!
On Fri, Feb 24, 2017 at 3:33 PM, Amit Sela <amitsel...@gmail.com> wrote:

> That's great! many people have asked me about that and I'm glad to see this
> happening.
> Anyone know if there's something at work for the Java SDK (assuming I don't
> want to wait for Fn API support) ?
>
> On Fri, Feb 24, 2017 at 8:44 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
> > Fantastic !
> >
> > That's a great addition and awesome to see that with Beam !
> >
> > Regards
> > JB
> >
> > On 02/24/2017 02:51 AM, Robert Bradshaw wrote:
> > > One thing I'm really excited about this library is that it allows one to
> > > more easily express transforms on columnar data (which is useful beyond
> > > just ML). For example, if your input elements have two fields "x" and "y"
> > > then you can write functions like
> > >
> > > def preprocessing_fn(inputs):
> > >     x_centered = tft.map(lambda x, mean: x - mean, inputs['x'],
> > >                          tft.mean(inputs['x']))
> > >     y_normalized = tft.scale_to_0_1(inputs['y'])
> > >     return {
> > >         'x_centered': x_centered,
> > >         'y_normalized': y_normalized,
> > >         'x_centered_times_y_normalized': tft.map(operations.mul,
> > >                                                  x_centered, y_normalized)
> > >     }
> > >
> > > # Read PCollection of dicts with 'x' and 'y' keys and numeric values
> > > input = p | Read(...)
> > >
> > > # output will contain dicts with 'x_centered', 'y_normalized', and
> > > # 'x_centered_times_y_normalized' keys with the expected values, and fn
> > > # can be used to transform other data using the statistics (mean, mins,
> > > # and maxes) without re-analysis.
> > > output, fn = (input, schema) | beam_impl.AnalyzeAndTransformDataset(preprocessing_fn)
> > >
> > > This automatically injects the relevant global aggregations (which can be
> > > interleaved) and builds up tensorflow graphs to apply the transformations
> > > very efficiently.
> > >
> > > On Thu, Feb 23, 2017 at 4:55 PM, Davor Bonaci <da...@apache.org> wrote:
> > >
> > >> Beam and TensorFlow coming together -- a big deal for us!
> > >>
> > >> On Thu, Feb 23, 2017 at 3:49 PM, Ahmet Altay <al...@google.com.invalid>
> > >> wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> Yesterday, there was an announcement from TensorFlow community about the
> > >>> new tf.Transform library [1]. It is a library that allows users to define
> > >>> pre-processing pipelines and run using large scale data processing
> > >>> frameworks. It is a library specifically designed to work with Apache Beam.
> > >>> It is great to see Python SDK getting a larger ecosystem and increased
> > >>> usage.
> > >>>
> > >>> Also worth mentioning is, PMC member Robert Bradshaw was one of the
> > >>> contributors to this new library.
> > >>>
> > >>> Thank you,
> > >>> Ahmet
> > >>>
> > >>> [1] https://research.googleblog.com/2017/02/preprocessing-for-machine-learning-with.html
> > >>>
> > >>
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>
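
For readers following the analyze-then-transform idea in Robert's quoted example, here is a minimal sketch in plain Python. It is not the tf.Transform or Beam API: the names `analyze`, `make_transform_fn`, and `analyze_and_transform` are hypothetical, the data is a small in-memory list rather than a PCollection, and mapping a constant "y" column to 0.0 is an assumption made only to keep the sketch total. It only illustrates the two phases: one pass to compute global statistics (mean, min, max), then a per-row function that reuses them.

```python
import operator


def analyze(rows):
    """Analysis phase: one full pass to collect the global statistics."""
    xs = [row["x"] for row in rows]
    ys = [row["y"] for row in rows]
    return {"x_mean": sum(xs) / len(xs), "y_min": min(ys), "y_max": max(ys)}


def make_transform_fn(stats):
    """Transform phase: build a per-row function that closes over the stats."""
    y_range = stats["y_max"] - stats["y_min"]

    def transform(row):
        x_centered = row["x"] - stats["x_mean"]
        # Assumption: a constant "y" column (zero range) is mapped to 0.0.
        y_normalized = (row["y"] - stats["y_min"]) / y_range if y_range else 0.0
        return {
            "x_centered": x_centered,
            "y_normalized": y_normalized,
            "x_centered_times_y_normalized": operator.mul(x_centered, y_normalized),
        }

    return transform


def analyze_and_transform(rows):
    """Return the transformed rows plus the reusable transform function."""
    fn = make_transform_fn(analyze(rows))
    return [fn(row) for row in rows], fn


training = [{"x": 1.0, "y": 10.0}, {"x": 2.0, "y": 20.0}, {"x": 3.0, "y": 30.0}]
output, fn = analyze_and_transform(training)

# New data is transformed with the training statistics, with no re-analysis.
serving_row = fn({"x": 4.0, "y": 15.0})
```

The point of returning `fn` alongside `output` is the same as in the quoted snippet: the statistics are computed once over the training data and then applied unchanged to any later data.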