Re: Spark for Image Processing Acceleration

2021-10-14 Thread Sean Owen
(The suggestion here is to use TensorFlow with Spark, which has been
doable for a long time with things like Horovod. Spark handles the image
processing just fine.)
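
As a rough illustration of "Spark handles the image processing", one common pattern is to distribute file paths rather than image bytes and let each executor stream its own files. The sketch below is not from the thread. It assumes PySpark is installed and that every path is readable from every executor (for example on a shared filesystem). The names `summarize_file` and `process_on_spark` are hypothetical, and the CRC is only a placeholder for real per-image work.

```python
import zlib


def summarize_file(path, chunk_size=64 * 1024 * 1024):
    """Stream one file in chunks and return (path, size_in_bytes, crc32).

    Placeholder for real per-image processing. Reading in chunks avoids
    holding a multi-GB image in memory all at once.
    """
    size = 0
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)
            crc = zlib.crc32(chunk, crc)  # running CRC over all chunks
    return path, size, crc


def process_on_spark(paths, num_partitions=10):
    """Run summarize_file over `paths` in parallel on a Spark cluster.

    Assumption: each path is reachable from every executor.
    """
    # Imported lazily so the per-file function can be tested without Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("image-batch").getOrCreate()
    rdd = spark.sparkContext.parallelize(paths, num_partitions)
    return rdd.map(summarize_file).collect()
```

Only the paths and the small result tuples travel through Spark here, so individual records of 1 to 5 GB never have to fit into a single Spark row.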

On Thu, Oct 14, 2021 at 10:17 AM Artemis User wrote:

> Spark is good with SQL-type structured data, not image data, unless
> your algorithms don't require dealing with image data directly. I guess
> your best option would be to go with TensorFlow, since it has image
> classification models built in and can integrate with NVIDIA GPUs out of
> the box. There are no out-of-the-box data source APIs for image data in
> Spark. Hope this helps.
>
> -- ND
>
> On 10/13/21 11:54 PM, 刘沛文 wrote:
>
> Hi,
> My name is Peiwen. I'm working with Dr. Brain, an AI company focused on
> medical image processing and deep learning. Our website is
> http://drbrain.net/index_en.aspx
> We basically do two major things: 1. image processing, like lesion
> drawing; 2. deep learning for neurological disease prediction, like
> stroke and Alzheimer's disease.
> Currently we use TensorFlow and other deep learning frameworks. Due to the
> size of the medical images (1 to 5 GB per record), with a traditional
> framework on a single computer, data processing and model training take a
> long time (a few hours) before we get a result.
> I'm writing to check whether Apache Spark can provide a good solution to
> accelerate the computation.
> I know TensorFlow can work with Spark. I just want a brief understanding
> of how much faster Apache Spark can be compared to traditional
> TensorFlow, say with a cluster of 10 nodes.
>
> Thank you very much!
>
> Peiwen


Re: Spark for Image Processing Acceleration

2021-10-14 Thread Artemis User
Spark is good with SQL-type structured data, not image data, unless
your algorithms don't require dealing with image data directly. I guess
your best option would be to go with TensorFlow, since it has image
classification models built in and can integrate with NVIDIA GPUs out of
the box. There are no out-of-the-box data source APIs for image data in
Spark. Hope this helps.


-- ND

On 10/13/21 11:54 PM, 刘沛文 wrote:

Hi,
My name is Peiwen. I'm working with Dr. Brain, an AI company focused
on medical image processing and deep learning. Our website is
http://drbrain.net/index_en.aspx
We basically do two major things: 1. image processing, like lesion
drawing; 2. deep learning for neurological disease prediction, like
stroke and Alzheimer's disease.
Currently we use TensorFlow and other deep learning frameworks. Due to
the size of the medical images (1 to 5 GB per record), with a traditional
framework on a single computer, data processing and model training take
a long time (a few hours) before we get a result.
I'm writing to check whether Apache Spark can provide a good solution
to accelerate the computation.
I know TensorFlow can work with Spark. I just want a brief
understanding of how much faster Apache Spark can be compared to
traditional TensorFlow, say with a cluster of 10 nodes.


Thank you very much!

Peiwen




Re: Spark for Image Processing Acceleration

2021-10-13 Thread Sean Owen
You could distribute the computation across a cluster with Spark and
Horovod (and Petastorm), for example:
https://github.com/horovod/horovod
https://github.com/uber/petastorm

If you're at a few hours, it may not be worth it: it's not hard to set up,
but it is more involved. You may do better with a larger GPU or multiple
GPUs on a single VM first. But it's totally possible; I have definitely
seen this work.
On Wed, Oct 13, 2021 at 11:01 PM 刘沛文 wrote:

> Hi,
> My name is Peiwen. I'm working with Dr. Brain, an AI company focused on
> medical image processing and deep learning. Our website is
> http://drbrain.net/index_en.aspx
> We basically do two major things: 1. image processing, like lesion
> drawing; 2. deep learning for neurological disease prediction, like
> stroke and Alzheimer's disease.
> Currently we use TensorFlow and other deep learning frameworks. Due to the
> size of the medical images (1 to 5 GB per record), with a traditional
> framework on a single computer, data processing and model training take a
> long time (a few hours) before we get a result.
> I'm writing to check whether Apache Spark can provide a good solution to
> accelerate the computation.
> I know TensorFlow can work with Spark. I just want a brief understanding
> of how much faster Apache Spark can be compared to traditional
> TensorFlow, say with a cluster of 10 nodes.
>
> Thank you very much!
>
> Peiwen
>