hey everyone-

this concept of deploying your Spark ML Pipelines and Algos into Production
(user-facing production) has been coming up a lot recently.

so much so that i've dedicated the last few months of my research and
engineering efforts to building out the infrastructure to support this in a
highly-scalable, highly-available way.

i've combined my Netflix + NetflixOSS work experience with my
Databricks/IBM + Spark work experience into an open source project,
PipelineIO, here:  http://pipeline.io

we're even serving up TensorFlow AI models using the same infrastructure -
incorporating key patterns from TensorFlow Distributed + TensorFlow Serving!

everything is open source, based on Docker + Kubernetes + NetflixOSS +
Spark + TensorFlow + Redis + Kafka + Zeppelin + Jupyter/IPython - deployable
to hybrid cloud and on-premise environments - with a heavy emphasis on
metrics and monitoring of models and production server statistics.

we're doing code generation directly from the saved Spark ML models (thanks
Spark 2.0 for giving us save/load parity across all models!) for optimized
model serving using both CPUs and GPUs, incremental training of models,
autoscaling, the whole works.
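to give a feel for the save/load parity this relies on, here's a minimal
sketch in Scala (the pipeline stages and the /models path are just
hypothetical placeholders, not the actual PipelineIO code):

  import org.apache.spark.ml.{Pipeline, PipelineModel}
  import org.apache.spark.ml.classification.LogisticRegression
  import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

  // assemble a simple pipeline: tokenize -> hash features -> logistic regression
  val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
  val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
  val lr = new LogisticRegression().setMaxIter(10)
  val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

  // trainingDF is assumed to be a DataFrame with "text" and "label" columns
  val model: PipelineModel = pipeline.fit(trainingDF)

  // Spark 2.0 round-trips every fitted stage through save/load,
  // which is what lets us generate serving code from the saved model
  model.write.overwrite().save("/models/my-pipeline")
  val reloaded = PipelineModel.load("/models/my-pipeline")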

our friend from Netflix, Chaos Monkey, even makes a grim appearance from
time to time to prove that we're resilient to failure.

take a peek.  it's cool.  we've come a long way in the last couple months,
and we've got a lot of work left to do, but the core infrastructure is in
place, key features have been built, and we're moving quickly.

shoot me an email if you'd like to get involved.  lots of TODOs.

we're dedicating my upcoming Advanced Spark and TensorFlow Meetup on August
4th in SF to demo'ing this infrastructure to you all.

here's the link:
http://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/231457813/


video recording + screen capture will be posted afterward, as always.

we've got a workshop dedicated to building an end-to-end Spark ML and
Kafka-based Recommendation Pipeline - including the PipelineIO serving
platform.  link is here:  http://pipeline.io
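as a rough sketch of what the workshop builds toward - not the exact
workshop code - here's an ALS-based recommender in Scala, assuming the
Kafka events have already been landed into a ratingsDF DataFrame:

  import org.apache.spark.ml.recommendation.{ALS, ALSModel}

  // ratingsDF is assumed to have userId, itemId, and rating columns,
  // e.g. landed from the Kafka event stream into Parquet ahead of time
  val als = new ALS()
    .setUserCol("userId")
    .setItemCol("itemId")
    .setRatingCol("rating")
    .setRank(10)
    .setMaxIter(10)

  val alsModel: ALSModel = als.fit(ratingsDF)

  // save the fitted model so the serving layer can load and score it
  alsModel.write.overwrite().save("/models/recommendations")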

and i'm finishing a blog post soon to detail everything we've done so far -
and everything we're actively building.  this post will be available on
http://pipeline.io - as well as cross-posted to a number of my favorite
engineering blogs.

global demo roadshow starts 8/8.  shoot me an email if you want to see all
this in action, otherwise i'll see you at a workshop or meetup near you!  :)



On Fri, Jul 22, 2016 at 10:34 AM, Inam Ur Rehman <inam.rehma...@gmail.com>
wrote:

> Hello guys.. I know it's irrelevant to this topic, but I've been looking
> desperately for a solution. I am facing an exception:
> http://apache-spark-user-list.1001560.n3.nabble.com/how-to-resolve-you-must-build-spark-with-hive-exception-td27390.html
>
> Please help me.. I couldn't find any solution.
>
> On Fri, Jul 22, 2016 at 6:12 PM, Sean Owen <so...@cloudera.com> wrote:
>
>> No, there isn't anything in particular, beyond the various bits of
>> serialization support that write out something to put in your storage
>> to begin with. What you do with it after reading and before writing is
>> up to your app, on purpose.
>>
>> If you mean you're producing data outside the model that your model
>> uses, your model data might be produced by an RDD operation, and saved
>> that way. There it's no different than anything else you do with RDDs.
>>
>> What part are you looking to automate beyond those things? That's most of
>> it.
>>
>> On Fri, Jul 22, 2016 at 2:04 PM, Sergio Fernández <wik...@apache.org>
>> wrote:
>> > Hi Sean,
>> >
>> > On Fri, Jul 22, 2016 at 12:52 PM, Sean Owen <so...@cloudera.com> wrote:
>> >>
>> >> If you mean, how do you distribute a new model in your application,
>> >> then there's no magic to it. Just reference the new model in the
>> >> functions you're executing in your driver.
>> >>
>> >> If you implemented some other manual way of deploying model info, just
>> >> do that again. There's no special thing to know.
>> >
>> >
>> > Well, because the models can be huge, we typically bundle the logic
>> > (pipeline/application) and the models separately. Normally we use a
>> > shared store (e.g., HDFS) or coordinated distribution of the models.
>> > But I wanted to know if there is any infrastructure in Spark that
>> > specifically addresses such a need.
>> >
>> > Thanks.
>> >
>> > Cheers,
>> >
>> > P.S.: sorry Jacek, with "ml" I meant "Machine Learning". I thought it was
>> > a fairly widespread acronym. Sorry for the possible confusion.
>> >
>> >
>> > --
>> > Sergio Fernández
>> > Partner Technology Manager
>> > Redlink GmbH
>> > m: +43 6602747925
>> > e: sergio.fernan...@redlink.co
>> > w: http://redlink.co
>>
>>
>


-- 
*Chris Fregly*
Research Scientist @ PipelineIO
San Francisco, CA
pipeline.io
advancedspark.com
