Hi Bitfox,

Yes, distributed training with PyTorch and TensorFlow works very well, and
you are spot on. There is no need for solutions like Ray, Petastorm, etc.

But if I want to preprocess data in Spark and push the results to these
deep learning libraries, what do we do? Creating professional-quality
data loaders is a very big job, so these solutions try to occupy that
space as an entry point.
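
As a rough illustration of the hand-off described above, the sketch below
assumes the Spark job has already exported its preprocessed rows in a form
each training worker can read back as plain Python records. The function
names `shard_rows` and `batches` and the round-robin sharding scheme are
assumptions made for illustration; they are not the API of Petastorm, Ray,
or any other library.

```python
from typing import Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def shard_rows(rows: Sequence[T], worker_index: int, num_workers: int) -> List[T]:
    """Return the rows owned by one worker (hypothetical round-robin split)."""
    if not 0 <= worker_index < num_workers:
        raise ValueError("worker_index must be in [0, num_workers)")
    return list(rows[worker_index::num_workers])


def batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group rows into lists of at most batch_size; the last batch may be short."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


# Stand-in for rows that a Spark job has already preprocessed and exported.
preprocessed = [{"id": i, "feature": i * 0.5} for i in range(10)]

# Worker 1 of 2 takes every second row, then feeds them to training in batches.
worker_rows = shard_rows(preprocessed, worker_index=1, num_workers=2)
worker_batches = list(batches(worker_rows, batch_size=2))
```

A production loader would also have to deal with shuffling, file formats,
and fault tolerance, which is the "very big job" mentioned above.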


Regards,
Gourav Sengupta



On Thu, Feb 24, 2022 at 1:21 PM Bitfox <bit...@bitfox.top> wrote:

> I have been using TensorFlow for a long time; it's not hard to implement a
> distributed training job at all, either by model parallelism or data
> parallelism. I don't think there is much need to extend Spark to
> support TensorFlow jobs. Just my thoughts...
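
For readers unfamiliar with the data-parallel approach mentioned above, here
is a minimal sketch in plain Python rather than TensorFlow. It assumes a
one-parameter linear model and equally sized shards; the helper names
`gradient` and `data_parallel_step` are hypothetical, and the averaging step
only stands in for the all-reduce a real framework would perform.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def gradient(w: float, shard: List[Sample]) -> float:
    """Mean gradient of the squared error (w*x - y)**2 with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w: float, shards: List[List[Sample]], lr: float) -> float:
    """One synchronous step: local gradients per shard, then an average."""
    # In a real job each shard's gradient is computed on a separate worker.
    local_grads = [gradient(w, shard) for shard in shards]
    # Averaging stands in for the all-reduce between workers.
    avg_grad = sum(local_grads) / len(local_grads)
    return w - lr * avg_grad


# Toy data following y = 3x, split evenly across two hypothetical workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[:4], data[4:]]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
```

With equally sized shards, the averaged gradient equals the gradient over
the full data set, so the workers together take the same step a single
machine would.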
>
>
> On Thu, Feb 24, 2022 at 4:36 PM Gourav Sengupta <gourav.sengu...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I do not think that there is any reason for using over-engineered
>> platforms like Petastorm and Ray, except for certain use cases.
>>
>> What Ray is doing, except for certain use cases, could easily have been
>> done by Spark, I think, had the open source community got that steer. But
>> maybe I am wrong, and someone can explain why the Spark open source
>> community cannot develop capabilities that are so natural to almost all
>> Spark data processing use cases where the data is consumed by deep
>> learning frameworks, so that we are instead asked to use Ray or
>> Petastorm.
>>
>> For those of us who are asking what native integration means, please
>> try to compare delta between releases 2.x and 3.x, and Koalas before and
>> after 3.2.
>>
>> I am sure that the Spark community can push for extending Spark
>> DataFrames to deep learning and other frameworks by natively integrating
>> them.
>>
>>
>> Regards,
>> Gourav Sengupta
>>
>>
>> On Wed, Feb 23, 2022 at 4:42 PM Dennis Suhari <d.suh...@icloud.com.invalid>
>> wrote:
>>
>>> Currently we are trying AnalyticsZoo and Ray
>>>
>>>
>>> Sent from my iPhone
>>>
>>> On 23.02.2022 at 04:53, Bitfox <bit...@bitfox.top> wrote:
>>>
>>> 
>>> TensorFlow itself can implement distributed computing via a
>>> parameter server. Why do you want Spark here?
>>>
>>> regards.
>>>
>>> On Wed, Feb 23, 2022 at 11:27 AM Vijayant Kumar
>>> <vijayant.ku...@mavenir.com.invalid> wrote:
>>>
>>>> Thanks, Sean, for your response!
>>>>
>>>>
>>>>
>>>> I want to add some more background here.
>>>>
>>>>
>>>>
>>>> I am using Spark 3.0+ with TensorFlow 2.0+.
>>>>
>>>> My use case is not image data but time-series data, where I am using
>>>> LSTMs and transformers for forecasting.
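
For the time-series case described above, one common preprocessing step is
to slice the series into fixed-length input windows paired with a future
target value. The following is only a sketch in plain Python; the function
name `make_windows` and the `window` and `horizon` parameters are assumptions
for illustration, not part of Spark, TensorFlow, or Horovod.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window: int, horizon: int = 1
) -> List[Tuple[List[float], float]]:
    """Pair each run of `window` values with the value `horizon` steps after it."""
    if window <= 0 or horizon <= 0:
        raise ValueError("window and horizon must be positive")
    samples = []
    # Last start index that still leaves room for the window and its target.
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        inputs = list(series[start:start + window])
        target = series[start + window + horizon - 1]
        samples.append((inputs, target))
    return samples


# Toy series; each sample uses three past values to predict the next one.
series = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
samples = make_windows(series, window=3)
```

Each `(inputs, target)` pair could then be batched and fed to a sequence
model; how the windows are distributed across executors is left open here.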
>>>>
>>>>
>>>>
>>>> I evaluated the *SparkFlow* and *spark_tensorflow_distributor* libraries,
>>>> but there has been no major development on them recently. I ran into
>>>> version dependency issues with them and had a hard time fixing the
>>>> library compatibility problems. Hence a couple of questions:
>>>>
>>>>
>>>>
>>>>    - Does *Horovod* have any dependencies?
>>>>    - Is any other library suitable for my use case?
>>>>    - Any example code would be a great help in understanding this.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Vijayant
>>>>
>>>>
>>>>
>>>> *From:* Sean Owen <sro...@gmail.com>
>>>> *Sent:* Wednesday, February 23, 2022 8:40 AM
>>>> *To:* Vijayant Kumar <vijayant.ku...@mavenir.com.invalid>
>>>> *Cc:* user @spark <user@spark.apache.org>
>>>> *Subject:* [E] COMMERCIAL BULK: Re: TensorFlow on Spark
>>>>
>>>>
>>>>
>>>> Sure, Horovod is commonly used on Spark for this:
>>>>
>>>> https://horovod.readthedocs.io/en/stable/spark_include.html
>>>>
>>>>
>>>>
>>>> On Tue, Feb 22, 2022 at 8:51 PM Vijayant Kumar <
>>>> vijayant.ku...@mavenir.com.invalid> wrote:
>>>>
>>>> Hi All,
>>>>
>>>>
>>>>
>>>> Is anyone using Apache Spark with TensorFlow for building models? My
>>>> requirement is to run distributed TensorFlow model training across the
>>>> Spark executors.
>>>>
>>>> Please help me with some resources or some sample code.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Vijayant
>>>> ------------------------------
>>>>
>>>> This e-mail message may contain confidential or proprietary information
>>>> of Mavenir Systems, Inc. or its affiliates and is intended solely for the
>>>> use of the intended recipient(s). If you are not the intended recipient of
>>>> this message, you are hereby notified that any review, use or distribution
>>>> of this information is absolutely prohibited and we request that you delete
>>>> all copies in your control and contact us by e-mailing to
>>>> secur...@mavenir.com. This message contains the views of its author
>>>> and may not necessarily reflect the views of Mavenir Systems, Inc. or its
>>>> affiliates, who employ systems to monitor email messages, but make no
>>>> representation that such messages are authorized, secure, uncompromised, or
>>>> free from computer viruses, malware, or other defects. Thank You
>>>>
>>>
