All of these tools are reasonable choices. I don't think the Spark project
itself has a view on which works best. They also do different things: for
example, Petastorm is not a training framework, but a way to feed data to a
distributed DL training process on Spark. For what it's worth, Databricks
ships Horovod and Petastorm, but that doesn't mean the other projects are
second-class.
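
To make the "data feeding" role concrete, here is a rough sketch of handing a
Spark DataFrame to a Keras model through Petastorm's Spark converter. It is an
illustration, not a recommendation. The function name, the "features" and
"label" column names, and the cache directory are all assumptions made up for
this example, and the model is assumed to be compiled already.

```python
def train_on_spark_dataframe(spark, df, model, batch_size=32, epochs=1):
    """Feed a Spark DataFrame to a compiled tf.keras model via Petastorm.

    Assumptions (hypothetical, not from this thread): `df` has numeric
    columns named "features" and "label".
    """
    # Imported lazily so the module loads without the heavy dependencies.
    from petastorm.spark import SparkDatasetConverter, make_spark_converter

    # Petastorm materializes the DataFrame as Parquet under this directory.
    spark.conf.set(
        SparkDatasetConverter.PARENT_CACHE_DIR_URL_CONF,
        "file:///tmp/petastorm_cache",  # hypothetical cache location
    )
    converter = make_spark_converter(df)
    try:
        # The dataset repeats indefinitely by default, so the number of
        # steps per epoch is derived from the converter's row count.
        with converter.make_tf_dataset(batch_size=batch_size) as dataset:
            dataset = dataset.map(lambda batch: (batch.features, batch.label))
            steps = max(1, len(converter) // batch_size)
            model.fit(dataset, steps_per_epoch=steps, epochs=epochs)
    finally:
        # Remove the cached Parquet files once training is done.
        converter.delete()
    return model
```

Note that Petastorm only moves the data here. The training itself is still
plain Keras, and distributing it is the job of something like Horovod or
spark_tensorflow_distributor.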

On Tue, Jun 1, 2021 at 4:59 PM Gourav Sengupta <
gourav.sengupta.develo...@gmail.com> wrote:

> Dear TD, Matei, Michael, Reynold,
>
> I hope all of you and your loved ones are staying safe and doing well.
>
> As a member of the community, I find the direction from the SPARK mentors
> is getting to be a bit confusing, and I was wondering if I can seek your
> help.
>
> We have to make long-term decisions that are aligned with open source
> SPARK compatibility and direction, and it would be wonderful to know what
> the most dependable route is to get data from SPARK to tensorflow. Is it:
> 1. petastorm
> 2. horovod
> 3. tensorflowonspark
> 4. spark_tensorflow_distributor
> or something else.
>
>
> Any comments from you will be super useful.
>
> If I am not wrong, seamless integration between SPARK and tensorflow/
> pytorch was one of the most exciting visions of SPARK 3.x.
>
> While using SPARK ML has its own favourite space, I think that tensorflow
> and pytorch will see a lot of focused development as well.
>
>
> Regards,
> Gourav Sengupta
>
