GitHub user avulanov commented on the pull request:

    https://github.com/apache/spark/pull/1290#issuecomment-64283686

@manishamde Thanks for the useful references! It seems that model parallelization for ANNs is a challenging problem. I put this question to a few presenters at the recent AMP Camp, and they confirmed it, given that the present MLlib interfaces are not well suited to this task. Moreover, even for big models that still fit into memory, the update step will incur a huge communication overhead. I took a look at the algorithms other than backpropagation listed in this paper: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=393138&tag=1. A genetic algorithm needs to evaluate a number of models, which makes the task even harder. Simulated annealing, which is a global optimization routine, seems more promising. However, with the model distributed across several nodes, one needs to copy the data points to every node that stores part of the model. I suggest sticking with the current implementation until someone finds a clearly better approach. Does that make sense?
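To make the simulated annealing option concrete, here is a minimal single-machine sketch in plain Python. It is not Spark or MLlib code and not the implementation in this pull request. The names `simulated_annealing` and `toy_loss`, the geometric cooling schedule, and all parameter defaults are assumptions chosen for illustration; `toy_loss` is a hypothetical stand-in for a network's training error.

```python
import math
import random


def simulated_annealing(loss, weights, steps=5000, t_start=1.0, t_end=1e-3,
                        step_size=0.1, seed=0):
    """Minimise `loss` over a flat list of weights (illustrative sketch only)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    current = list(weights)
    current_loss = loss(current)
    best, best_loss = list(current), current_loss
    for k in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        # Propose a Gaussian perturbation of one randomly chosen weight.
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.gauss(0.0, step_size)
        candidate_loss = loss(candidate)
        delta = candidate_loss - current_loss
        # Always accept improvements; accept a worse candidate with
        # probability exp(-delta / t), which shrinks as t cools.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_loss = candidate, candidate_loss
            if current_loss < best_loss:
                best, best_loss = list(current), current_loss
    return best, best_loss


def toy_loss(w):
    # Hypothetical non-convex objective standing in for a training error.
    return sum((x * x - 1.0) ** 2 for x in w) + 0.1 * sum(w)


if __name__ == "__main__":
    start = [2.0, -2.0, 0.5]
    best, best_loss = simulated_annealing(toy_loss, start)
    print(toy_loss(start), best_loss)
```

Each iteration calls `loss` once on a full candidate weight vector. In a distributed setting that call is where every data point would have to meet every part of the model, which is the copying cost mentioned above.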