GitHub user avulanov commented on the pull request:

    https://github.com/apache/spark/pull/1290#issuecomment-64283686
  
    @manishamde Thanks for the useful references! It seems that model 
parallelization for ANNs is a challenging problem. I put this question to a few 
presenters at the recent AMP Camp, and they confirmed this point, given that the 
present MLlib interfaces are not well suited to this task. Moreover, 
there will be a huge communication overhead during the update step, even for big 
models that can still fit into memory. I took a look at the algorithms 
other than backpropagation listed in this paper: 
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=393138&tag=1. A genetic 
algorithm needs to evaluate a number of models, which makes the task even harder. 
Simulated annealing, which is a global optimization routine, seems to be more 
promising. However, with the model distributed across several nodes, one needs 
to copy the data points to all nodes that store the model. I suggest sticking with 
the current implementation until someone finds a clearly better approach. Does that 
make sense?
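
For reference, here is a minimal single-machine sketch of simulated annealing in Python. It is not taken from the paper or from the PR; the function names, the toy loss, and the cooling parameters are all assumptions chosen for illustration. The toy loss stands in for the network's training error. Note that every candidate requires a full evaluation of the loss, which is where the data-copying cost mentioned above would show up in a distributed setting.

```python
import math
import random


def simulated_annealing(energy, x0, steps=2000, t0=1.0, cooling=0.995,
                        step_size=0.5, seed=0):
    """Minimize `energy` starting from `x0` (hypothetical helper).

    Uses Gaussian perturbations and a geometric cooling schedule.
    Returns the best point seen and its energy.
    """
    rng = random.Random(seed)
    x, e = list(x0), energy(x0)
    best_x, best_e = list(x), e
    t = t0
    for _ in range(steps):
        # Propose a random neighbour of the current point.
        cand = [xi + rng.gauss(0.0, step_size) for xi in x]
        cand_e = energy(cand)  # one full loss evaluation per candidate
        delta = cand_e - e
        # Always accept improvements; accept worse points with
        # probability exp(-delta / t), which shrinks as t cools.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, e = cand, cand_e
            if e < best_e:
                best_x, best_e = list(x), e
        t *= cooling
    return best_x, best_e


def toy_loss(w):
    """Assumed stand-in for a training loss; minimum at w = (3, -1)."""
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2


if __name__ == "__main__":
    w, loss = simulated_annealing(toy_loss, [0.0, 0.0])
    print(w, loss)
```

With a real network, `toy_loss` would be replaced by a pass over the training data, so the cost per step is dominated by that evaluation rather than by the annealing logic itself.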


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
