Spark is good with SQL-style structured data, not image data, unless
your algorithms don't need to deal with the image data directly. I guess
your best option would be to go with TensorFlow, since it has image
classification models built in and can integrate with NVIDIA GPUs out of
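the box. Spark's built-in image data source (added in 2.4) reads common
formats such as JPEG and PNG, but there is no out-of-the-box data source
API for medical imaging formats. Hope this helps.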
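
On the 10-node question below: I can't give a real number without profiling your pipeline, but Amdahl's law gives a rough upper bound. Here is a minimal sketch in Python. The function name and the parallel fractions are my own assumptions for illustration, not measurements, and the formula ignores data transfer and scheduling overhead, which can be significant with records of 1 to 5 GB.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of a job can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time; the parallel part is split evenly.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical parallel fractions, not measured on any real pipeline.
    for fraction in (0.5, 0.9, 0.99):
        bound = amdahl_speedup(fraction, 10)
        print(f"parallel fraction {fraction:.2f}: at most {bound:.2f}x on 10 nodes")
```

So a 10-node cluster only gets close to 10x if nearly all of the work (loading, preprocessing, training) can be spread across the nodes.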
-- ND
On 10/13/21 11:54 PM, 刘沛文 wrote:
Hi,
My name is Peiwen. I'm working with Dr. Brain, an AI company focused
on medical image processing and deep learning. Our website is
http://drbrain.net/index_en.aspx
We basically do two major things:
1. image processing, such as lesion drawing
2. deep learning for neurological disease prediction, such as stroke and
Alzheimer's disease.
Currently we use TensorFlow and other deep learning frameworks. Because
of the size of the medical images (1 to 5 GB per record), data
processing and model training with a traditional framework on a single
computer take a long time (a few hours) before we get a result.
I'm writing to ask whether Apache Spark offers a good solution for
accelerating the computation.
I know TensorFlow can work with Spark. I'd just like a rough idea of how
much faster Apache Spark could be compared with plain TensorFlow on,
say, a cluster of 10 nodes.
Thank you very much!
Peiwen