Hey Asher,

A phone call may be the best way to discuss all of this. But in short:
1. It is quite easy to add custom pipelines/models to MLeap. All of our
out-of-the-box transformers serve as good examples of how to do this, and
we are also putting together a guide on the topic for our documentation
site.
2. MLlib models are not supported yet, but it wouldn't be too difficult to
add support for them.
3. We have benchmarked this: MLeap was roughly 2,200x faster than a
SparkContext with a LocalRelation-backed DataFrame. The pipeline we used
for benchmarking included string indexing, one-hot encoding, vector
assembly, scaling, and a linear regression model. The reason for the speed
difference is that MLeap is optimized for one-off requests, while Spark is
built for scoring large batches of data and spends time optimizing your
pipeline before execution. That optimization time is noticeable when you
try to build services around models.
4. TensorFlow support is early, but we have already built pipelines that
combine Spark transformers and a TensorFlow neural network, all served from
a single MLeap pipeline using the same data structures as a regular Spark
pipeline. Eventually we will offer TensorFlow support as a module that
*just works TM* from Maven Central, but we are not quite there yet.
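To give a rough idea of why row-based, one-off scoring is so much cheaper than going through a SparkContext, here is a simplified conceptual sketch in plain Scala. This is not the actual MLeap API; `RowTransformer`, `Scaler`, `LinearModel`, and `Pipeline` are hypothetical stand-ins for the kind of lightweight per-row stages a scoring engine composes:

```scala
// Conceptual sketch (NOT the real MLeap API): a pipeline of row-based
// transformers. Each stage is a plain function over an in-memory row,
// so scoring a single request involves no query planning or optimization.
object RowPipelineSketch {
  type Row = Map[String, Double]

  trait RowTransformer { def transform(row: Row): Row }

  // Hypothetical scaler stage: centers and scales one column.
  final case class Scaler(col: String, mean: Double, std: Double)
      extends RowTransformer {
    def transform(row: Row): Row =
      row.updated(col, (row(col) - mean) / std)
  }

  // Hypothetical linear model stage: dot product over named features.
  final case class LinearModel(weights: Map[String, Double], intercept: Double)
      extends RowTransformer {
    def transform(row: Row): Row =
      row.updated("prediction",
        weights.map { case (k, w) => w * row(k) }.sum + intercept)
  }

  // A pipeline is just transformers applied in sequence.
  final case class Pipeline(stages: Seq[RowTransformer]) extends RowTransformer {
    def transform(row: Row): Row =
      stages.foldLeft(row)((r, t) => t.transform(r))
  }

  def main(args: Array[String]): Unit = {
    val pipeline = Pipeline(Seq(
      Scaler("x", mean = 10.0, std = 2.0),
      LinearModel(Map("x" -> 3.0), intercept = 1.0)))
    val scored = pipeline.transform(Map("x" -> 14.0))
    println(scored("prediction")) // (14 - 10) / 2 * 3 + 1 = 7.0
  }
}
```

The real library adds serialization, schemas, and many more transformer types, but the core idea is the same: each stage is an eager, in-memory row operation, which is why microsecond-level execution is possible.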

Feel free to email me privately if you would like to discuss any of this
more, or join our gitter:
https://gitter.im/combust/mleap

Best,
Hollin

On Fri, Feb 3, 2017 at 10:48 AM, Asher Krim <ak...@hubspot.com> wrote:

> I have a bunch of questions for you Hollin:
>
> How easy is it to add support for custom pipelines/models?
> Are Spark mllib models supported?
> We currently run spark in local mode in an api service. It's not super
> terrible, but performance is a constant struggle. Have you benchmarked any
> performance differences between MLeap and vanilla Spark?
> What does Tensorflow support look like? I would love to serve models from
> a java stack while being agnostic to what framework was used to train them.
>
> Thanks,
> Asher Krim
> Senior Software Engineer
>
> On Fri, Feb 3, 2017 at 11:53 AM, Hollin Wilkins <hol...@combust.ml> wrote:
>
>> Hey Aseem,
>>
>> We have built pipelines that execute several string indexers, one hot
>> encoders, scaling, and a random forest or linear regression at the end.
>> Execution time for the linear regression was on the order of 11
>> microseconds, a bit longer for the random forest. If your pipeline is
>> simple, this can be further optimized to around 2-3 microseconds by
>> using row-based transformations. The pipeline operated on roughly 12
>> input features, and by the time all the processing was done, we had
>> somewhere around 1,000 features going into the linear regression after
>> one-hot encoding and everything else.
>>
>> Hope this helps,
>> Hollin
>>
>> On Fri, Feb 3, 2017 at 4:05 AM, Aseem Bansal <asmbans...@gmail.com>
>> wrote:
>>
>>> Does this support Java 7?
>>>
>>> On Fri, Feb 3, 2017 at 5:30 PM, Aseem Bansal <asmbans...@gmail.com>
>>> wrote:
>>>
>>>> Is computational time for predictions on the order of few milliseconds
>>>> (< 10 ms) like the old mllib library?
>>>>
>>>> On Thu, Feb 2, 2017 at 10:12 PM, Hollin Wilkins <hol...@combust.ml>
>>>> wrote:
>>>>
>>>>> Hey everyone,
>>>>>
>>>>>
>>>>> Some of you may have seen Mikhail and I talk at Spark/Hadoop Summits
>>>>> about MLeap and how you can use it to build production services from your
>>>>> Spark-trained ML pipelines. MLeap is an open-source technology that allows
>>>>> Data Scientists and Engineers to deploy Spark-trained ML Pipelines and
>>>>> Models to a scoring engine instantly. The MLeap execution engine has no
>>>>> dependencies on a Spark context and the serialization format is entirely
>>>>> based on Protobuf 3 and JSON.
>>>>>
>>>>>
>>>>> The recent 0.5.0 release provides serialization and inference support
>>>>> for close to 100% of Spark transformers (we don’t yet support ALS and 
>>>>> LDA).
>>>>>
>>>>>
>>>>> MLeap is open-source, take a look at our Github page:
>>>>>
>>>>> https://github.com/combust/mleap
>>>>>
>>>>>
>>>>> Or join the conversation on Gitter:
>>>>>
>>>>> https://gitter.im/combust/mleap
>>>>>
>>>>>
>>>>> We have a set of documentation to help get you started here:
>>>>>
>>>>> http://mleap-docs.combust.ml/
>>>>>
>>>>>
>>>>> We even have a set of demos, for training ML Pipelines and linear,
>>>>> logistic and random forest models:
>>>>>
>>>>> https://github.com/combust/mleap-demo
>>>>>
>>>>>
>>>>> Check out our latest MLeap-serving Docker image, which allows you to
>>>>> expose a REST interface to your Spark ML pipeline models:
>>>>>
>>>>> http://mleap-docs.combust.ml/mleap-serving/
>>>>>
>>>>>
>>>>> Several companies are using MLeap in production and even more are
>>>>> currently evaluating it. Take a look and tell us what you think! We hope 
>>>>> to
>>>>> talk with you soon and welcome feedback/suggestions!
>>>>>
>>>>>
>>>>> Sincerely,
>>>>>
>>>>> Hollin and Mikhail
>>>>>
>>>>
>>>>
>>>
>>
>