Hello all,

I would like to bring your attention to TensorFrames, a small project that integrates TensorFlow with Apache Spark. With this library, you can map, reduce, or aggregate numerical data stored in Spark DataFrames using TensorFlow computation graphs. It is published as a Spark package and is available in this GitHub repository:
https://github.com/tjhunter/tensorframes

More detailed examples can be found in the user guide:
https://github.com/tjhunter/tensorframes/wiki/TensorFrames-user-guide

This is a technical preview at this point. I am looking forward to feedback on the current Python API if some adventurous users want to try it out. Of course, contributions are most welcome, for example to fix bugs or to add support for platforms other than linux-x86_64. The library should support the most common inputs in DataFrames (dense tensors of rank 0, 1, or 2 of ints, longs, floats, and doubles).

Please note that this is not an endorsement by Databricks of TensorFlow, or of any other deep learning framework for that matter. If users want to use deep learning in production, some other more robust solutions are available: SparkNet, CaffeOnSpark, DeepLearning4J.

Best regards,

Tim Hunter