Hey Andrew! Thank you so much for sharing this and starting this conversation. We had a meeting at All Hands with everyone interested in "Image Classification" (https://phabricator.wikimedia.org/T215413), and one of the open questions was exactly how to find a "common repository" for ML models that different groups and products within the organization can use. So, please, count me in!
Thanks,
M

On Thu, Feb 7, 2019 at 4:38 PM Aaron Halfaker <ahalfa...@wikimedia.org> wrote:

> Just gave the article a quick read. I think this article pushes on some
> key issues for sure. I definitely agree with the focus on python/jupyter
> as essential for a productive workflow that leverages the best from
> research scientists. We've been thinking about what ORES 2.0 would look
> like and event streams are the dominant proposal for improving on the
> limitations of our queue-based worker pool.
>
> One of the nice things about ORES/revscoring is that it provides a nice
> framework for operating using the *exact same code* no matter the
> environment. E.g. it doesn't matter if we're calling out to an API to get
> data for feature extraction or providing it via a stream. By investing in
> a dependency injection strategy, we get that flexibility. So to me, the
> hardest problem -- the one I don't quite know how to solve -- is how we'll
> mix and merge streams to get all of the data we want available for feature
> extraction. If I understand correctly, that's where Kafka shines. :)
>
> I'm definitely interested in fleshing out this proposal. We should
> probably be exploring the processes for training new types of models (e.g.
> image processing) using different strategies than ORES. In ORES, we're
> almost entirely focused on using sklearn but we have some basic
> abstractions for other estimator libraries. We also make some strong
> assumptions about running on a single CPU that could probably be broken for
> some performance gains using real concurrency.
>
> -Aaron
>
> On Thu, Feb 7, 2019 at 10:05 AM Goran Milovanovic <
> goran.milovanovic_...@wikimedia.de> wrote:
>
>> Hi Andrew,
>>
>> I have recently started a six month AI/Machine Learning Engineering
>> course which focuses exactly on the topics that you've shown interest in.
>>
>> So,
>>
>>> I'd love it if we had a working group (or whatever) that focused on
>>> how to standardize how we train and deploy ML for production use.
>>
>> Count me in.
>>
>> Regards,
>> Goran
>>
>>
>> Goran S. Milovanović, PhD
>> Data Scientist, Software Department
>> Wikimedia Deutschland
>>
>> ------------------------------------------------
>> "It's not the size of the dog in the fight,
>> it's the size of the fight in the dog."
>> - Mark Twain
>> ------------------------------------------------
>>
>>
>> On Thu, Feb 7, 2019 at 4:16 PM Andrew Otto <o...@wikimedia.org> wrote:
>>
>>> Just came across
>>>
>>> https://www.confluent.io/blog/machine-learning-with-python-jupyter-ksql-tensorflow
>>>
>>> In it, the author discusses some of what he calls the 'impedance
>>> mismatch' between data engineers and production engineers. The links to
>>> Uber's Michelangelo <https://eng.uber.com/michelangelo/> (which as far
>>> as I can tell has not been open sourced) and the Hidden Technical Debt
>>> in Machine Learning Systems paper
>>> <https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf>
>>> are also very interesting!
>>>
>>> At All Hands I've been hearing more and more about using ML in
>>> production, so these things seem very relevant to us. I'd love it if we
>>> had a working group (or whatever) that focused on how to standardize how we
>>> train and deploy ML for production use.
>>>
>>> :)
>>> _______________________________________________
>>> Analytics mailing list
>>> Analytics@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>
>
> --
>
> Aaron Halfaker
>
> Principal Research Scientist
>
> Head of the Scoring Platform team
> Wikimedia Foundation
> _______________________________________________
> Research-Internal mailing list
> research-inter...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/research-internal
>