To date, I haven't seen very good performance coming from MLeap. I believe Ram 
from Databricks keeps getting you guys on stage at the Spark Summits, but I've 
been unimpressed with the performance numbers - as well as your choice to 
reimplement your own non-standard "PMML-like" mechanism, which incurs heavy 
technical debt on the development side.

Creating technical debt is a very Databricks-like thing, as seen in their own 
product - so it's no surprise that Databricks supports and encourages this type 
of engineering effort.

@Hollin: please correct me if I'm wrong, but the numbers you guys have quoted 
in the past were at very low scale. At one point you were quoting 40-50ms, 
which is pretty bad. 11ms is better, but these are all low-scale numbers, which 
is not good.

I'm not sure where the 2-3ms numbers are coming from, but even that is not 
realistic in most real-world scenarios at scale.

Check out our 100% open-source solution to this exact problem, starting at 
http://pipeline.io. You'll find links to the GitHub repo, YouTube demos, 
SlideShare conference talks, online training, and lots more.

Our entire focus at PipelineIO is optimizing, deploying, A/B + bandit testing, 
and scaling Scikit-Learn + Spark ML + TensorFlow AI models for high-performance 
predictions.

This focus on performance and scale is an extension of our team's long history 
of building highly scalable, highly available, and highly performant 
distributed ML and AI systems at Netflix, Twitter, Mesosphere - and even 
Databricks. :)

As a reminder, everything here is 100% open source. No product pitches here. We 
work for you guys/gals - aka the community!

Please contact me directly if you're looking to solve this problem in the best 
way possible.

We can get you up and running in your own cloud-based or on-premises 
environment in minutes. We support AWS, Google Cloud, and Azure - basically 
anywhere that runs Docker.

Any time zone works. We're completely global, with free 24x7 support for 
everyone in the community.

Thanks! Hope this is useful.

Chris Fregly
Research Scientist @ PipelineIO
Founder @ Advanced Spark and TensorFlow Meetup
San Francisco - Chicago - Washington DC - London

On Feb 4, 2017, 12:06 PM -0600, Debasish Das <debasish.da...@gmail.com> wrote:
>
> Except, of course, LDA, ALS, and neural net models...for those, the model 
> needs to be either prescored and cached in a KV store, or the matrices / 
> graph should be kept in the KV store and accessed through a REST API to 
> serve the output...for neural nets it's more fun, since it's a distributed 
> or local graph over which the TensorFlow compute needs to run...
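>
> For ALS, for example, the prescore-and-cache path might look like this 
> minimal sketch (the keyspace/table names and the spark-cassandra-connector 
> dependency are assumptions on my part; sc is an existing SparkContext):
>
>     import com.datastax.spark.connector._  // spark-cassandra-connector, assumed on the classpath
>     import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
>
>     // Prescore top-10 products per user and push the results into Cassandra;
>     // the REST layer then reads this table instead of touching the model.
>     val model = MatrixFactorizationModel.load(sc, "hdfs:///models/als")  // hypothetical path
>     model.recommendProductsForUsers(10)
>       .flatMap { case (user, ratings) =>
>         ratings.map(r => (user, r.product, r.rating))
>       }
>       .saveToCassandra("reco", "top_products", SomeColumns("user_id", "product_id", "score"))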
>
>
> In trapezium we support writing these models to a store such as Cassandra 
> or Lucene, for example, and then provide a config-driven akka-http-based 
> API where you add the business logic that reads the model back from the 
> store and exposes model serving as a REST endpoint.
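>
> As a rough sketch of that shape (this is not the actual trapezium API; the 
> route, port, path, and local SparkSession are illustrative assumptions):
>
>     import akka.actor.ActorSystem
>     import akka.http.scaladsl.Http
>     import akka.http.scaladsl.server.Directives._
>     import akka.stream.ActorMaterializer
>     import org.apache.spark.ml.PipelineModel
>     import org.apache.spark.sql.SparkSession
>
>     object ModelServer extends App {
>       implicit val system = ActorSystem("model-serving")
>       implicit val mat = ActorMaterializer()
>
>       val spark = SparkSession.builder.master("local[*]").appName("serving").getOrCreate()
>       val model = PipelineModel.load("/models/my-pipeline")  // hypothetical path
>
>       // One JSON record in, one score out; business logic would go here.
>       // Note: parsing each request into a DataFrame carries exactly the
>       // per-request overhead being discussed in this thread.
>       val route = path("predict") {
>         post {
>           entity(as[String]) { json =>
>             val df = spark.read.json(spark.sparkContext.parallelize(Seq(json)))
>             val score = model.transform(df).select("prediction").head.getDouble(0)
>             complete(score.toString)
>           }
>         }
>       }
>
>       Http().bindAndHandle(route, "0.0.0.0", 8080)
>     }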
>
>
> We use matrix, graph, and kernel models a lot, and for those it turned out 
> that mllib-style model predict was useful as long as we could change the 
> underlying store...
>
> On Feb 4, 2017 9:37 AM, "Debasish Das" <debasish.da...@gmail.com> wrote:
> >
> > If we expose an API to access the raw models inside a PipelineModel, 
> > can't we call predict directly on them from an API? Is there a task open 
> > to expose the models out of PipelineModel so that predict can be called 
> > on them...there is no dependency on a Spark context in an ml model...
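> >
> > For illustration, here is a minimal sketch of pulling the fitted model 
> > out today (the path and the final-stage type are assumptions):
> >
> >     import org.apache.spark.ml.PipelineModel
> >     import org.apache.spark.ml.regression.LinearRegressionModel
> >
> >     // PipelineModel exposes its fitted stages as an array, so the final
> >     // model can be pulled out directly (load requires an active SparkSession).
> >     val pipeline = PipelineModel.load("/models/my-pipeline")  // hypothetical path
> >     val lr = pipeline.stages.last.asInstanceOf[LinearRegressionModel]
> >     println((lr.coefficients, lr.intercept))
> >
> >     // ...but transform() still expects a DataFrame; a Vector-in/Double-out
> >     // predict on the ml model is exactly what is being asked for here.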
> >
> > On Feb 4, 2017 9:11 AM, "Aseem Bansal" <asmbans...@gmail.com> wrote:
> > > In Spark 2.0 there is a class called PipelineModel. I know that the 
> > > title says pipeline, but it is actually talking about a PipelineModel 
> > > trained via a Pipeline.
> > > Why PipelineModel instead of Pipeline? Because there is usually a series 
> > > of steps that needs to be done when doing ML, which warrants an ordered 
> > > sequence of operations. Read the new Spark ML docs or one of the 
> > > Databricks blogs related to Spark pipelines. If you have used Python's 
> > > sklearn library, the concept is inspired from there.
> > > "once the model is deserialized as an ml model from the store of choice 
> > > within ms" - the timing of loading the model was not what I was 
> > > referring to when I was talking about timing.
> > > "it can be used on incoming features to score through the 
> > > spark.ml.Model predict API" - the predict API is in the old mllib 
> > > package, not the new ml package.
> > > "why are we using a DataFrame and not the ML model directly from the 
> > > API" - because as of now the new ml package does not have the direct 
> > > API.
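> > >
> > > In code, the distinction looks roughly like this (the stages, columns, 
> > > and toy data are illustrative, assuming an existing SparkSession spark):
> > >
> > >     import org.apache.spark.ml.Pipeline
> > >     import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
> > >     import org.apache.spark.ml.regression.LinearRegression
> > >
> > >     val trainingDf = spark.createDataFrame(Seq(
> > >       ("NYC", 30.0, 1.0), ("SF", 25.0, 0.0)
> > >     )).toDF("city", "age", "label")
> > >
> > >     // A Pipeline is the unfitted recipe: an ordered sequence of stages.
> > >     val pipeline = new Pipeline().setStages(Array(
> > >       new StringIndexer().setInputCol("city").setOutputCol("cityIdx"),
> > >       new VectorAssembler().setInputCols(Array("cityIdx", "age")).setOutputCol("features"),
> > >       new LinearRegression().setFeaturesCol("features").setLabelCol("label")
> > >     ))
> > >
> > >     // Fitting yields a PipelineModel: the same stages, now trained,
> > >     // which is what you actually ship for scoring.
> > >     val model = pipeline.fit(trainingDf)
> > >     val scored = model.transform(trainingDf)  // applies every stage in order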
> > >
> > >
> > >
> > > On Sat, Feb 4, 2017 at 10:24 PM, Debasish Das <debasish.da...@gmail.com> wrote:
> > > >
> > > > I am not sure why I would use a pipeline to do scoring...the idea is 
> > > > to build a model, use the model ser/deser feature to put it in the 
> > > > row or column store of choice, and provide API access to the 
> > > > model...we support these primitives in 
> > > > github.com/Verizon/trapezium...the API has access to a Spark context 
> > > > in local or distributed mode...once the model is deserialized as an 
> > > > ml model from the store of choice within ms, it can be used on 
> > > > incoming features to score through the spark.ml.Model predict API...I 
> > > > am not clear on the 2200x speedup...why are we using a DataFrame and 
> > > > not the ML model directly from the API?
> > > >
> > > > On Feb 4, 2017 7:52 AM, "Aseem Bansal" <asmbans...@gmail.com> wrote:
> > > > > Does this support Java 7?
> > > > > What is your time zone, in case someone wants to talk?
> > > > >
> > > > >
> > > > > On Fri, Feb 3, 2017 at 10:23 PM, Hollin Wilkins <hol...@combust.ml> wrote:
> > > > > > Hey Aseem,
> > > > > >
> > > > > > We have built pipelines that execute several string indexers, 
> > > > > > one-hot encoders, scaling, and a random forest or linear 
> > > > > > regression at the end. Execution time for the linear regression 
> > > > > > was on the order of 11 microseconds, a bit longer for the random 
> > > > > > forest. If your pipeline is simple, this can be further optimized 
> > > > > > to around 2-3 microseconds by using row-based transformations. 
> > > > > > The pipeline operated on roughly 12 input features, and by the 
> > > > > > time all the processing was done, we had somewhere around 1000 
> > > > > > features going into the linear regression after one-hot encoding 
> > > > > > and everything else.
> > > > > >
> > > > > > Hope this helps,
> > > > > > Hollin
> > > > > >
> > > > > >
> > > > > > On Fri, Feb 3, 2017 at 4:05 AM, Aseem Bansal <asmbans...@gmail.com> wrote:
> > > > > > > Does this support Java 7?
> > > > > > >
> > > > > > > On Fri, Feb 3, 2017 at 5:30 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
> > > > > > > > Is the computational time for predictions on the order of a 
> > > > > > > > few milliseconds (< 10 ms), like the old mllib library?
> > > > > > > >
> > > > > > > > On Thu, Feb 2, 2017 at 10:12 PM, Hollin Wilkins <hol...@combust.ml> wrote:
> > > > > > > > >
> > > > > > > > > Hey everyone,
> > > > > > > > >
> > > > > > > > > Some of you may have seen Mikhail and me talk at Spark/Hadoop 
> > > > > > > > > Summits about MLeap and how you can use it to build 
> > > > > > > > > production services from your Spark-trained ML pipelines. 
> > > > > > > > > MLeap is an open-source technology that allows Data 
> > > > > > > > > Scientists and Engineers to deploy Spark-trained ML Pipelines 
> > > > > > > > > and Models to a scoring engine instantly. The MLeap execution 
> > > > > > > > > engine has no dependencies on a Spark context, and the 
> > > > > > > > > serialization format is entirely based on Protobuf 3 and JSON.
> > > > > > > > >
> > > > > > > > > The recent 0.5.0 release provides serialization and inference 
> > > > > > > > > support for close to 100% of Spark transformers (we don’t yet 
> > > > > > > > > support ALS and LDA).
> > > > > > > > >
> > > > > > > > > MLeap is open source; take a look at our GitHub page:
> > > > > > > > > https://github.com/combust/mleap
> > > > > > > > >
> > > > > > > > > Or join the conversation on Gitter:
> > > > > > > > > https://gitter.im/combust/mleap
> > > > > > > > >
> > > > > > > > > We have a set of documentation to help get you started here:
> > > > > > > > > http://mleap-docs.combust.ml/
> > > > > > > > >
> > > > > > > > > We even have a set of demos for training ML pipelines and 
> > > > > > > > > linear, logistic, and random forest models:
> > > > > > > > > https://github.com/combust/mleap-demo
> > > > > > > > >
> > > > > > > > > Check out our latest MLeap-serving Docker image, which allows 
> > > > > > > > > you to expose a REST interface to your Spark ML pipeline 
> > > > > > > > > models:
> > > > > > > > > http://mleap-docs.combust.ml/mleap-serving/
> > > > > > > > >
> > > > > > > > > Several companies are using MLeap in production, and even 
> > > > > > > > > more are currently evaluating it. Take a look and tell us 
> > > > > > > > > what you think! We hope to talk with you soon and welcome 
> > > > > > > > > feedback/suggestions!
> > > > > > > > >
> > > > > > > > > Sincerely,
> > > > > > > > > Hollin and Mikhail
