Hi all, I posted this earlier on the MXNet Slack channel; based on a suggestion there, I am reposting it here for a wider audience.
I was searching for ways to perform hyper-parameter optimization (HPO) for models built with MXNet, and I came across Sherpa, an open-source distributed HPO library presented at NeurIPS 2018 - https://openreview.net/pdf?id=HklSUMyJcQ. I have been trying it out, and it is very easy to use and to extend. It already supports Random Search, Grid Search, and Bayesian Optimization for searching the hyper-parameter space. I have submitted a PR with an example Gluon use case - https://github.com/sherpa-ai/sherpa/pull/27 (a minimal sketch of the basic loop is at the end of this mail). I have not yet tried it with large distributed training use cases, but the library does support a distributed mode for running heavy workloads. It also comes with a neat UI dashboard for monitoring the jobs being run.

[Screenshot of the dashboard: Screen Shot 2019-03-13 at 8.08.48 AM.png]

I think we should explore this as an option for performing HPO with Gluon. What might integration entail?

1. I have not fully evaluated what changes might be necessary, but I think the integration can be fairly unobtrusive for both repositories. As demonstrated above, we can already use Sherpa for HPO, but the experience is a bit clunky. It could be made smoother by adding a few callback functions that track and log the metrics of the different experiment runs (à la the Keras callback function defined here - https://github.com/sherpa-ai/sherpa/blob/master/sherpa/core.py#L368); a rough sketch of what that could look like for Gluon is also at the end of this mail.

2. The library is developed and maintained by folks in academia and is published under the GPL license. I was given to understand that the GPL might be a problem for Apache products, but since we would not be using it within MXNet as a sub-component, I am thinking we might have some wiggle room there.

MXNet needs HPO functionality, and instead of building something from scratch we could just use an existing open-source project. I would like to hear more from the community.

Thanks
Anirudh Acharya
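P.S. For anyone who wants a feel for the API before opening the PR, here is a minimal, self-contained sketch of Sherpa driving a Gluon training loop. The search space, toy model, and synthetic data are purely illustrative; the Sherpa calls themselves (Study, add_observation, finalize) are the library's documented API:

    import mxnet as mx
    from mxnet import autograd, gluon
    import sherpa

    # Search space and search algorithm (RandomSearch here; Grid Search and
    # Bayesian optimization are drop-in replacements for the algorithm).
    parameters = [sherpa.Continuous('learning_rate', [1e-4, 1e-1], scale='log'),
                  sherpa.Discrete('num_units', [32, 128])]
    algorithm = sherpa.algorithms.RandomSearch(max_num_trials=20)
    study = sherpa.Study(parameters=parameters, algorithm=algorithm,
                         lower_is_better=True,    # we minimize validation loss
                         disable_dashboard=True)  # set False for the web UI

    # Synthetic 10-class data, just so the sketch runs end to end.
    X = mx.nd.random.randn(1000, 20)
    y = mx.nd.random.randint(0, 10, shape=(1000,)).astype('float32')
    train_data = gluon.data.DataLoader(
        gluon.data.ArrayDataset(X[:800], y[:800]), batch_size=64)
    val_data = gluon.data.DataLoader(
        gluon.data.ArrayDataset(X[800:], y[800:]), batch_size=64)

    loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

    for trial in study:  # each trial is one hyper-parameter configuration
        net = gluon.nn.Sequential()
        net.add(gluon.nn.Dense(trial.parameters['num_units'], activation='relu'),
                gluon.nn.Dense(10))
        net.initialize(mx.init.Xavier())
        trainer = gluon.Trainer(net.collect_params(), 'sgd',
                                {'learning_rate': trial.parameters['learning_rate']})
        for epoch in range(5):
            for data, label in train_data:
                with autograd.record():
                    loss = loss_fn(net(data), label)
                loss.backward()
                trainer.step(data.shape[0])
            val_loss = sum(loss_fn(net(d), l).mean().asscalar()
                           for d, l in val_data) / len(val_data)
            # Report the metric back to Sherpa after every epoch.
            study.add_observation(trial=trial, iteration=epoch,
                                  objective=val_loss)
        study.finalize(trial)

    print(study.get_best_result())

Note that switching the search strategy only changes the `algorithm` line; the training loop is untouched.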
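And here is a rough sketch of the callback idea from point 1, mirroring Sherpa's Keras callback. To be clear, everything here is hypothetical: Gluon's hand-written training loops have no callback hooks today, and nothing named GluonSherpaCallback exists in either library. It only shows the shape an integration could take:

    class GluonSherpaCallback:
        """Hypothetical helper that reports metrics from a Gluon training
        loop back to a Sherpa study, one observation per epoch."""

        def __init__(self, study, trial, objective_name='val_loss'):
            self.study = study    # the sherpa.Study being run
            self.trial = trial    # the current Sherpa trial
            self.objective_name = objective_name

        def on_epoch_end(self, epoch, metrics):
            # `metrics` is assumed to be a dict of scalars computed by the
            # training loop at the end of the epoch, e.g. {'val_loss': 0.42}.
            self.study.add_observation(trial=self.trial,
                                       iteration=epoch,
                                       objective=metrics[self.objective_name],
                                       context=metrics)

        def on_train_end(self):
            self.study.finalize(self.trial)

With hooks like these invoked from a fit()-style utility, the per-epoch bookkeeping in the first sketch (add_observation/finalize) disappears from user code, which is the kind of unobtrusive integration I have in mind.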