Re: [Analytics] [Research-Internal] Article about ML in production woes

Miriam Redi Thu, 07 Feb 2019 08:41:02 -0800

Hey Andrew!

Thank you so much for sharing this and starting this conversation. We had a
meeting at All Hands with everyone interested in "Image Classification"
(https://phabricator.wikimedia.org/T215413), and one of the open questions
was exactly how to find a "common repository" for ML models that different
groups and products within the organization can use. So, please, count me
in!


Thanks,

M


On Thu, Feb 7, 2019 at 4:38 PM Aaron Halfaker <ahalfa...@wikimedia.org>
wrote:

> Just gave the article a quick read.  I think it pushes on some key issues
> for sure.  I definitely agree with the focus on Python/Jupyter as essential
> for a productive workflow that gets the best out of research scientists.
> We've been thinking about what ORES 2.0 would look like, and event streams
> are the dominant proposal for overcoming the limitations of our queue-based
> worker pool.
>
> One of the nice things about ORES/revscoring is that it provides a
> framework for running the *exact same code* no matter the environment.
> E.g., it doesn't matter whether we're calling out to an API to get data for
> feature extraction or providing it via a stream.  By investing in a
> dependency injection strategy, we get that flexibility.  So to me, the
> hardest problem -- the one I don't quite know how to solve -- is how we'll
> mix and merge streams to get all of the data we want available for feature
> extraction.  If I understand correctly, that's where Kafka shines.  :)
>
> I'm definitely interested in fleshing out this proposal.  We should
> probably be exploring processes for training new types of models (e.g.
> image processing) using strategies other than those in ORES.  In ORES, we're
> almost entirely focused on sklearn, but we have some basic abstractions for
> other estimator libraries.  We also make some strong assumptions about
> running on a single CPU; those could probably be relaxed to get some
> performance gains from real concurrency.
>
> -Aaron
>
> On Thu, Feb 7, 2019 at 10:05 AM Goran Milovanovic <
> goran.milovanovic_...@wikimedia.de> wrote:
>
>> Hi Andrew,
>>
>> I have recently started a six-month AI/Machine Learning Engineering
>> course that focuses on exactly the topics you've shown interest in.
>>
>> So,
>>
>> >>>  I'd love it if we had a working group (or whatever) that focused on
>> how to standardize how we train and deploy ML for production use.
>>
>> Count me in.
>>
>> Regards,
>> Goran
>>
>>
>> Goran S. Milovanović, PhD
>> Data Scientist, Software Department
>> Wikimedia Deutschland
>>
>> ------------------------------------------------
>> "It's not the size of the dog in the fight,
>> it's the size of the fight in the dog."
>> - Mark Twain
>> ------------------------------------------------
>>
>>
>> On Thu, Feb 7, 2019 at 4:16 PM Andrew Otto <o...@wikimedia.org> wrote:
>>
>>> Just came across
>>>
>>> https://www.confluent.io/blog/machine-learning-with-python-jupyter-ksql-tensorflow
>>>
>>> In it, the author discusses some of what he calls the 'impedance
>>> mismatch' between data engineers and production engineers.  The links to
>>> Uber's Michelangelo <https://eng.uber.com/michelangelo/> (which, as far
>>> as I can tell, has not been open-sourced) and the Hidden Technical Debt
>>> in Machine Learning Systems paper
>>> <https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf>
>>> are also very interesting!
>>>
>>> At All Hands, I've been hearing more and more about using ML in
>>> production, so these things seem very relevant to us.  I'd love it if we
>>> had a working group (or whatever) that focused on how to standardize how we
>>> train and deploy ML for production use.
>>>
>>> :)
>>> _______________________________________________
>>> Analytics mailing list
>>> Analytics@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>
>
> --
>
> Aaron Halfaker
>
> Principal Research Scientist
>
> Head of the Scoring Platform team
> Wikimedia Foundation
> _______________________________________________
> Research-Internal mailing list
> research-inter...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/research-internal
>
