Fantastic!

That's a great addition, and it's awesome to see this with Beam!

Regards
JB

On 02/24/2017 02:51 AM, Robert Bradshaw wrote:
One thing I'm really excited about with this library is that it allows one to
more easily express transforms on columnar data (which is useful beyond
just ML). For example, if your input elements have two fields "x" and "y"
then you can write functions like

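import operator

import tensorflow_transform as tft

def preprocessing_fn(inputs):
    # Center x on its global mean, computed over the whole dataset.
    x_centered = tft.map(lambda x, mean: x - mean,
                         inputs['x'], tft.mean(inputs['x']))
    # Rescale y into [0, 1] using its global min and max.
    y_normalized = tft.scale_to_0_1(inputs['y'])
    return {
        'x_centered': x_centered,
        'y_normalized': y_normalized,
        'x_centered_times_y_normalized': tft.map(operator.mul,
                                                 x_centered, y_normalized)
    }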

# Read a PCollection of dicts with 'x' and 'y' keys and numeric values.
input = p | Read(...)

# output will contain dicts with 'x_centered', 'y_normalized', and
# 'x_centered_times_y_normalized' keys with the expected values, and fn
# can be used to transform other data using the statistics (mean, mins,
# and maxes) without re-analysis.
output, fn = ((input, schema)
              | beam_impl.AnalyzeAndTransformDataset(preprocessing_fn))

This automatically injects the relevant global aggregations (which can be
interleaved) and builds up TensorFlow graphs to apply the transformations
very efficiently.
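
To make the analyze-then-transform split described above concrete, here is a
minimal pure-Python sketch. It does not use tf.Transform or Beam, and it is not
how the library is implemented: the helper names analyze, make_transform_fn,
and analyze_and_transform are hypothetical, and the data is a plain list of
dicts rather than a PCollection. It only illustrates the idea that the global
statistics are computed once and the resulting function can be reused on other
data without re-analysis.

```python
def analyze(rows):
    """Full pass over the data: compute the global statistics once."""
    xs = [row['x'] for row in rows]
    ys = [row['y'] for row in rows]
    return {
        'x_mean': sum(xs) / len(xs),
        'y_min': min(ys),
        'y_max': max(ys),
    }


def make_transform_fn(stats):
    """Return a per-row function that closes over the precomputed statistics."""
    y_range = stats['y_max'] - stats['y_min']

    def transform(row):
        x_centered = row['x'] - stats['x_mean']
        # Assumption for this sketch: a constant 'y' column maps to 0.0.
        y_normalized = (row['y'] - stats['y_min']) / y_range if y_range else 0.0
        return {
            'x_centered': x_centered,
            'y_normalized': y_normalized,
            'x_centered_times_y_normalized': x_centered * y_normalized,
        }

    return transform


def analyze_and_transform(rows):
    """Analyze the dataset, then apply the resulting function to every row."""
    rows = list(rows)
    fn = make_transform_fn(analyze(rows))
    return [fn(row) for row in rows], fn


data = [{'x': 1.0, 'y': 10.0}, {'x': 2.0, 'y': 20.0}, {'x': 3.0, 'y': 30.0}]
output, fn = analyze_and_transform(data)

# fn reuses the statistics from `data`; no second analysis pass is needed.
new_row = fn({'x': 4.0, 'y': 15.0})
```

In this sketch the statistics live in a closure; in the real library they are,
per the description above, baked into the generated TensorFlow graph.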


On Thu, Feb 23, 2017 at 4:55 PM, Davor Bonaci <da...@apache.org> wrote:

Beam and TensorFlow coming together -- a big deal for us!

On Thu, Feb 23, 2017 at 3:49 PM, Ahmet Altay <al...@google.com.invalid>
wrote:

Hi all,

Yesterday, the TensorFlow community announced the new tf.Transform
library [1]. It lets users define pre-processing pipelines and run them
using large-scale data processing frameworks, and it is specifically
designed to work with Apache Beam. It is great to see the Python SDK
getting a larger ecosystem and increased usage.

Also worth mentioning: PMC member Robert Bradshaw was one of the
contributors to this new library.

Thank you,
Ahmet

[1] https://research.googleblog.com/2017/02/preprocessing-for-machine-learning-with.html




--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
