Spark is good with SQL-style structured data, not image data, unless your algorithms don't need to work with the image data directly. I think your best option is to go with TensorFlow, since it has built-in image classification models and works with NVIDIA GPUs out of the box. Spark has no out-of-the-box data source API suited to images of this size; its built-in image source is aimed at common formats such as JPEG and PNG. Hope this helps.
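
For the 10-node question, one rough way to bound the answer is Amdahl's law: the speedup is limited by the part of the job that cannot be spread across nodes. The sketch below is only an illustration. The function name and the parallel fractions are hypothetical, not measurements of any real pipeline, and the formula ignores the cost of moving 1 to 5 GB records between nodes, so treat the numbers as upper bounds.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of a job can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs as before; the parallel part is split across workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical fractions; profile your own job to get real values.
    for fraction in (0.5, 0.9, 0.99):
        bound = amdahl_speedup(fraction, 10)
        print(f"parallel fraction {fraction:.2f}: at most {bound:.2f}x on 10 nodes")
```

If most of the few hours is spent in steps that process each record independently, the bound approaches 10x; if a large share is serial (for example, single-GPU training), adding nodes helps much less.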

-- ND

On 10/13/21 11:54 PM, 刘沛文 wrote:
Hi,
My name is Peiwen. I work at Dr. Brain, an AI company focused on medical image processing and deep learning. Our website is http://drbrain.net/index_en.aspx

We do two main things:
1. Image processing, such as lesion drawing.
2. Deep learning for predicting neurological diseases, such as stroke and Alzheimer's disease.

Currently we use TensorFlow and other deep learning frameworks. Because the medical images are large (1 to 5 GB per record), data processing and model training on a single machine with a traditional framework take a few hours before we get a result.

I'm writing to ask whether Apache Spark offers a good way to speed up these computations. I know TensorFlow can work with Spark. I'd just like a rough idea of how much faster Apache Spark could be than plain TensorFlow on, say, a cluster of 10 nodes.

Thank you very much!

Peiwen
