[ https://issues.apache.org/jira/browse/SYSTEMML-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050817#comment-16050817 ]

Janardhan commented on SYSTEMML-1159:
-------------------------------------

Hi [~dusenberrymw],

I think your idea of globally sharing the whole dataset across all models (trained in 
parallel) could be implemented with a blackboard system. Please take a look at the 
brainstorming on parameter servers in SYSTEMML-1695.

> Enable Remote Hyperparameter Tuning
> -----------------------------------
>
>                 Key: SYSTEMML-1159
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1159
>             Project: SystemML
>          Issue Type: Improvement
>    Affects Versions: SystemML 1.0
>            Reporter: Mike Dusenberry
>            Priority: Blocker
>
> Training a parameterized machine learning model (such as a large neural net 
> in deep learning) requires learning a set of ideal model parameters from the 
> data, as well as determining appropriate hyperparameters (or "settings") for 
> the training process itself.  In the latter case, the hyperparameters (e.g. 
> learning rate, regularization strength, dropout percentage, model 
> architecture, etc.) cannot be learned from the data, and instead are 
> determined via a search across a space for each hyperparameter.  For large 
> numbers of hyperparameters (such as in deep learning models), the current 
> literature points to performing staged, randomized grid searches over the 
> space to produce distributions of performance, narrowing the space after each 
> search [1].  Thus, for efficient hyperparameter optimization, it is 
> desirable to train several models in parallel, with each model trained over 
> the full dataset.  For deep learning models, a mini-batch training approach 
> is currently state-of-the-art, and thus separate models with different 
> hyperparameters could, conceivably, be easily trained on each of the nodes in 
> a cluster.
> In order to allow for the training of deep learning models, SystemML needs to 
> determine a solution to enable this scenario with the Spark backend.  
> Specifically, if the user has a {{train}} function that takes a set of 
> hyperparameters and trains a model with a mini-batch approach (and thus is 
> only making use of single-node instructions within the function), the user 
> should be able to wrap this function with, for example, a remote {{parfor}} 
> construct that samples hyperparameters and calls the {{train}} function on 
> each machine in parallel.
> To be clear, each model would need access to the entire dataset, and each 
> model would be trained independently.
> [1]: http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
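
A rough DML sketch of the scenario described in the issue (untested, and just one 
possible shape): sample hyperparameters inside a remote {{parfor}} and call a 
{{train}} function on each worker, with every trial seeing the full dataset. The 
placeholder {{train}} body, the random data stand-ins, and the {{mode=REMOTE_SPARK}} / 
{{opt=CONSTRAINED}} settings are illustrative assumptions, not part of the issue.

{code}
# Placeholder for the user's mini-batch training function from the issue;
# here it just returns a dummy validation accuracy.
train = function(Matrix[Double] X, Matrix[Double] y, Double lr, Double reg)
    return (Double val_acc) {
  # the real mini-batch training loop (single-node instructions) would go here
  val_acc = as.scalar(rand(rows=1, cols=1))
}

# Stand-ins for the full training set, shared by all trials.
X = rand(rows=10000, cols=100)
y = rand(rows=10000, cols=1)

num_trials = 16
results = matrix(0, rows=num_trials, cols=3)

# One model per trial, each trained independently over the full dataset.
parfor (i in 1:num_trials, mode=REMOTE_SPARK, opt=CONSTRAINED) {
  # Randomized search: sample learning rate and regularization log-uniformly.
  lr  = 10 ^ (-5 * as.scalar(rand(rows=1, cols=1)))
  reg = 10 ^ (-5 * as.scalar(rand(rows=1, cols=1)))

  acc = train(X, y, lr, reg)

  # Disjoint row updates are merged back after the parfor completes.
  results[i, 1] = lr
  results[i, 2] = reg
  results[i, 3] = acc
}

print("best validation accuracy: " + max(results[, 3]))
{code}

A staged search as in [1] would simply rerun this with the sampling ranges narrowed 
around the best trials from the previous stage.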



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
