Re: Revisiting Online serving of Spark models?

Maximiliano Felice Sat, 02 Jun 2018 20:40:15 -0700

Hi!

We're already in San Francisco waiting for the summit. We even think that
we spotted @holdenk this afternoon.


@chris, we're really interested in the Meetup you're hosting. My team will
probably join from the beginning if you have room for us, and I'll join
later, after discussing the topics on this thread. I'll send you an email
regarding this request.

Thanks

On Fri, Jun 1, 2018 at 7:26 AM, Saikat Kanjilal <[email protected]>
wrote:

> @Chris This sounds fantastic, please send summary notes for Seattle folks
>
> @Felix I work in downtown Seattle and am wondering if we should hold a tech
> meetup around model serving in Spark at my work or elsewhere close by,
> thoughts?  I’m actually in the midst of building microservices to manage
> models, and when I say models I mean much more than machine learning models
> (think OR, process models as well)
>
> Regards
>
> Sent from my iPhone
>
> On May 31, 2018, at 10:32 PM, Chris Fregly <[email protected]> wrote:
>
> Hey everyone!
>
> @Felix:  thanks for putting this together.  i sent some of you a quick
> calendar event - mostly for me, so i don’t forget!  :)
>
> Coincidentally, this is the focus of June 6th's *Advanced Spark and
> TensorFlow Meetup*
> <https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/>
>  @5:30pm
> on June 6th (same night) here in SF!
>
> Everybody is welcome to come.  Here’s the link to the meetup that includes
> the signup link:
> https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/
>
> We have an awesome lineup of speakers covering a lot of deep, technical
> ground.
>
> For those who can’t attend in person, we’ll be broadcasting live - and
> posting the recording afterward.
>
> All details are in the meetup link above…
>
> @holden/felix/nick/joseph/maximiliano/saikat/leif:  you’re more than
> welcome to give a talk. I can move things around to make room.
>
> @joseph:  I’d personally like an update on the direction of the Databricks
> proprietary ML Serving export format which is similar to PMML but not a
> standard in any way.
>
> Also, the Databricks ML Serving Runtime is only available to Databricks
> customers.  This seems in conflict with the community efforts described
> here.  Can you comment on behalf of Databricks?
>
> Look forward to your response, joseph.
>
> See you all soon!
>
> —
>
>
> *Chris Fregly *Founder @ *PipelineAI* <https://pipeline.ai/> (100,000
> Users)
> Organizer @ *Advanced Spark and TensorFlow Meetup*
> <https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/> (85,000
> Global Members)
>
>
>
> *San Francisco - Chicago - Austin -  Washington DC - London - Dusseldorf *
> *Try our PipelineAI Community Edition with GPUs and TPUs!!
> <http://community.pipeline.ai/>*
>
>
> On May 30, 2018, at 9:32 AM, Felix Cheung <[email protected]>
> wrote:
>
> Hi!
>
> Thank you! Let’s meet then
>
> June 6 4pm
>
> Moscone West Convention Center
> 800 Howard Street, San Francisco, CA 94103
> <https://maps.google.com/?q=800+Howard+Street,+San+Francisco,+CA+94103&entry=gmail&source=g>
>
> Ground floor (outside of conference area - should be available for all) -
> we will meet and decide where to go
>
> (Would not send invite because that would be too much noise for dev@)
>
> To paraphrase Joseph, we will use this to kick off the discussion and
> post notes after and follow up online. As for Seattle, I would be very
> interested to meet in person later and discuss ;)
>
>
> _____________________________
> From: Saikat Kanjilal <[email protected]>
> Sent: Tuesday, May 29, 2018 11:46 AM
> Subject: Re: Revisiting Online serving of Spark models?
> To: Maximiliano Felice <[email protected]>
> Cc: Felix Cheung <[email protected]>, Holden Karau <
> [email protected]>, Joseph Bradley <[email protected]>, Leif Walsh
> <[email protected]>, dev <[email protected]>
>
>
> Would love to join but am in Seattle, thoughts on how to make this work?
>
> Regards
>
> Sent from my iPhone
>
> On May 29, 2018, at 10:35 AM, Maximiliano Felice <
> [email protected]> wrote:
>
> Big +1 to a meeting with fresh air.
>
> Could anyone send the invites? I don't really know which place Holden is
> talking about.
>
> 2018-05-29 14:27 GMT-03:00 Felix Cheung <[email protected]>:
>
>> You had me at blue bottle!
>>
>> _____________________________
>> From: Holden Karau <[email protected]>
>> Sent: Tuesday, May 29, 2018 9:47 AM
>> Subject: Re: Revisiting Online serving of Spark models?
>> To: Felix Cheung <[email protected]>
>> Cc: Saikat Kanjilal <[email protected]>, Maximiliano Felice <
>> [email protected]>, Joseph Bradley <[email protected]>,
>> Leif Walsh <[email protected]>, dev <[email protected]>
>>
>>
>>
>> I'm down for that, we could all go for a walk, maybe to the Mint Plaza
>> Blue Bottle, and grab coffee (if the weather holds, have our design meeting
>> outside :p)?
>>
>> On Tue, May 29, 2018 at 9:37 AM, Felix Cheung <[email protected]>
>> wrote:
>>
>>> Bump.
>>>
>>> ------------------------------
>>> *From:* Felix Cheung <[email protected]>
>>> *Sent:* Saturday, May 26, 2018 1:05:29 PM
>>> *To:* Saikat Kanjilal; Maximiliano Felice; Joseph Bradley
>>> *Cc:* Leif Walsh; Holden Karau; dev
>>>
>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>
>>> Hi! How about we meet the community and discuss on June 6 4pm at (near)
>>> the Summit?
>>>
>>> (I propose we meet at the venue entrance so we can accommodate people
>>> who might not be at the conference)
>>>
>>> ------------------------------
>>> *From:* Saikat Kanjilal <[email protected]>
>>> *Sent:* Tuesday, May 22, 2018 7:47:07 AM
>>> *To:* Maximiliano Felice
>>> *Cc:* Leif Walsh; Felix Cheung; Holden Karau; Joseph Bradley; dev
>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>
>>> I’m in the same exact boat as Maximiliano and have use cases as well for
>>> model serving and would love to join this discussion.
>>>
>>> Sent from my iPhone
>>>
>>> On May 22, 2018, at 6:39 AM, Maximiliano Felice <
>>> [email protected]> wrote:
>>>
>>> Hi!
>>>
>>> I don't usually write a lot on this list, but I keep up to date with
>>> the discussions and I'm a heavy user of Spark. This topic caught my
>>> attention, as we're currently facing this issue at work. I'm attending
>>> the summit and was wondering if it would be possible for me to join that
>>> meeting. I might be able to share some helpful use cases and ideas.
>>>
>>> Thanks,
>>> Maximiliano Felice
>>>
>>> On Tue, May 22, 2018 at 9:14 AM, Leif Walsh <[email protected]>
>>> wrote:
>>>
>>>> I’m with you on JSON being more readable than Parquet, but we’ve had
>>>> success using pyarrow’s Parquet reader and have been quite happy with it so
>>>> far. If your target is Python (and probably, if not now then soon, R), you
>>>> should look into it.
>>>>
>>>> On Mon, May 21, 2018 at 16:52 Joseph Bradley <[email protected]>
>>>> wrote:
>>>>
>>>>> Regarding model reading and writing, I'll give quick thoughts here:
>>>>> * Our approach was to use the same format but write JSON instead of
>>>>> Parquet.  It's easier to parse JSON without Spark, and using the same
>>>>> format simplifies architecture.  Plus, some people want to check files
>>>>> into version control, and JSON is nice for that.
>>>>> * The reader/writer APIs could be extended to take format parameters
>>>>> (just like DataFrame reader/writers) to handle JSON (and maybe,
>>>>> eventually, handle Parquet in the online serving setting).
>>>>>
>>>>> This would be a big project, so proposing a SPIP might be best.  If
>>>>> people are around at the Spark Summit, that could be a good time to meet
>>>>> up & then post notes back to the dev list.
>>>>>
>>>>> On Sun, May 20, 2018 at 8:11 PM, Felix Cheung <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Specifically I’d like to bring part of the discussion to Model and
>>>>>> PipelineModel, and various ModelReader and SharedReadWrite
>>>>>> implementations that rely on SparkContext. This is a big blocker on
>>>>>> reusing trained models outside of Spark for online serving.
>>>>>>
>>>>>> What’s the next step? Would folks be interested in getting together
>>>>>> to discuss/get some feedback?
>>>>>>
>>>>>>
>>>>>> _____________________________
>>>>>> From: Felix Cheung <[email protected]>
>>>>>> Sent: Thursday, May 10, 2018 10:10 AM
>>>>>> Subject: Re: Revisiting Online serving of Spark models?
>>>>>> To: Holden Karau <[email protected]>, Joseph Bradley <
>>>>>> [email protected]>
>>>>>> Cc: dev <[email protected]>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Huge +1 on this!
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:*[email protected] <[email protected]> on behalf of
>>>>>> Holden Karau <[email protected]>
>>>>>> *Sent:* Thursday, May 10, 2018 9:39:26 AM
>>>>>> *To:* Joseph Bradley
>>>>>> *Cc:* dev
>>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, May 10, 2018 at 9:25 AM, Joseph Bradley <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Thanks for bringing this up Holden!  I'm a strong supporter of this.
>>>>>>>
>>>>>> Awesome! I'm glad other folks think something like this belongs in
>>>>>> Spark.
>>>>>>
>>>>>>> This was one of the original goals for mllib-local: to have local
>>>>>>> versions of MLlib models which could be deployed without the big Spark
>>>>>>> JARs and without a SparkContext or SparkSession.  There are related
>>>>>>> commercial offerings like this : ) but the overhead of maintaining
>>>>>>> those offerings is pretty high.  Building good APIs within MLlib to
>>>>>>> avoid copying logic across libraries will be well worth it.
>>>>>>>
>>>>>>> We've talked about this need at Databricks and have also been
>>>>>>> syncing with the creators of MLeap.  It'd be great to get this
>>>>>>> functionality into Spark itself.  Some thoughts:
>>>>>>> * It'd be valuable to have this go beyond adding transform() methods
>>>>>>> taking a Row to the current Models.  Instead, it would be ideal to have
>>>>>>> local, lightweight versions of models in mllib-local, outside of the
>>>>>>> main mllib package (for easier deployment with smaller & fewer
>>>>>>> dependencies).
>>>>>>> * Supporting Pipelines is important.  For this, it would be ideal to
>>>>>>> utilize elements of Spark SQL, particularly Rows and Types, which could
>>>>>>> be moved into a local sql package.
>>>>>>> * This architecture may require some awkward APIs currently to have
>>>>>>> model prediction logic in mllib-local, local model classes in
>>>>>>> mllib-local, and regular (DataFrame-friendly) model classes in mllib.
>>>>>>> We might find it helpful to break some DeveloperApis in Spark 3.0 to
>>>>>>> facilitate this architecture while making it feasible for 3rd party
>>>>>>> developers to extend MLlib APIs (especially in Java).
>>>>>>>
>>>>>> I agree this could be interesting, and feed into the other discussion
>>>>>> around when (or if) we should be considering Spark 3.0.
>>>>>> I _think_ we could probably do it with optional traits people could
>>>>>> mix in to avoid breaking the current APIs but I could be wrong on that
>>>>>> point.
>>>>>>
>>>>>>> * It could also be worth discussing local DataFrames.  They might
>>>>>>> not be as important as per-Row transformations, but they would be
>>>>>>> helpful for batching for higher throughput.
>>>>>>>
>>>>>> That could be interesting as well.
>>>>>>
>>>>>>>
>>>>>>> I'll be interested to hear others' thoughts too!
>>>>>>>
>>>>>>> Joseph
>>>>>>>
>>>>>>> On Wed, May 9, 2018 at 7:18 AM, Holden Karau <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi y'all,
>>>>>>>>
>>>>>>>> With the renewed interest in ML in Apache Spark, now seems like as
>>>>>>>> good a time as any to revisit the online serving situation in Spark
>>>>>>>> ML. DB & others have done some excellent work moving a lot of the
>>>>>>>> necessary tools into a local linear algebra package that doesn't
>>>>>>>> depend on having a SparkContext.
>>>>>>>>
>>>>>>>> There are a few different commercial and non-commercial solutions
>>>>>>>> around this, but currently our individual transform/predict methods
>>>>>>>> are private, so they either need to copy or re-implement them (or put
>>>>>>>> themselves in org.apache.spark) to access them. How would folks feel
>>>>>>>> about adding a new trait for ML pipeline stages that exposes
>>>>>>>> transformation of single-element inputs (or local collections) and
>>>>>>>> could be optionally implemented by stages which support this? That
>>>>>>>> way we can have less copy-and-paste code possibly getting out of sync
>>>>>>>> with our model training.
>>>>>>>>
>>>>>>>> I think continuing to have on-line serving grow in different
>>>>>>>> projects is probably the right path forward (folks have different
>>>>>>>> needs), but I'd love to see us make it simpler for other projects to
>>>>>>>> build reliable serving tools.
>>>>>>>>
>>>>>>>> I realize this maybe puts some of the folks in an awkward position
>>>>>>>> with their own commercial offerings, but hopefully if we make it
>>>>>>>> easier for everyone the commercial vendors can benefit as well.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Holden :)
>>>>>>>>
>>>>>>>> --
>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Joseph Bradley
>>>>>>> Software Engineer - Machine Learning
>>>>>>> Databricks, Inc.
>>>>>>> [image: http://databricks.com] <http://databricks.com/>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Joseph Bradley
>>>>> Software Engineer - Machine Learning
>>>>> Databricks, Inc.
>>>>> [image: http://databricks.com] <http://databricks.com/>
>>>>>
>>>> --
>>>> --
>>>> Cheers,
>>>> Leif
>>>>
>>>
>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>>
>>
>
>
>
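To make the idea from the original post in this thread a bit more concrete (an optional trait that pipeline stages can implement to transform single-element inputs or local collections), here is a rough sketch. It is written in plain Python only so it runs anywhere; it is not Spark code and not an existing MLlib API. Every name in it (LocalTransformer, transform_one, transform_local, Scaler, ThresholdModel, BatchOnlyStage, serve) is hypothetical, and a plain dict stands in for a Row.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List

# Hypothetical stand-in for a Row: column name -> value.
Record = Dict[str, Any]


class LocalTransformer(ABC):
    """Hypothetical opt-in interface for stages that can run without a cluster."""

    @abstractmethod
    def transform_one(self, record: Record) -> Record:
        """Transform a single record and return a new record."""

    def transform_local(self, records: Iterable[Record]) -> List[Record]:
        """Transform a local collection by applying transform_one to each item."""
        return [self.transform_one(r) for r in records]


class Scaler(LocalTransformer):
    """Toy feature stage: multiplies one column by a constant factor."""

    def __init__(self, input_col: str, output_col: str, factor: float) -> None:
        self.input_col = input_col
        self.output_col = output_col
        self.factor = factor

    def transform_one(self, record: Record) -> Record:
        out = dict(record)  # copy so the caller's record is not mutated
        out[self.output_col] = record[self.input_col] * self.factor
        return out


class ThresholdModel(LocalTransformer):
    """Toy model stage: predicts 1.0 when the input exceeds a threshold."""

    def __init__(self, input_col: str, output_col: str, threshold: float) -> None:
        self.input_col = input_col
        self.output_col = output_col
        self.threshold = threshold

    def transform_one(self, record: Record) -> Record:
        out = dict(record)
        out[self.output_col] = 1.0 if record[self.input_col] > self.threshold else 0.0
        return out


class BatchOnlyStage:
    """A stage that does not opt in to local transformation."""


def serve(stages: List[Any], record: Record) -> Record:
    """Run one record through every stage, rejecting stages that did not opt in."""
    for stage in stages:
        if not isinstance(stage, LocalTransformer):
            raise TypeError(f"{type(stage).__name__} does not support local transform")
        record = stage.transform_one(record)
    return record
```

The point of keeping the interface optional is that existing stages stay untouched, and a serving layer can check up front whether a whole pipeline supports local use before it accepts traffic.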
>
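On the model reading and writing point quoted above (JSON being easier to parse without Spark, and friendly to version control), a small sketch of what a Spark-free round trip could look like. The file layout, the field names and the helper functions here are assumptions made up for illustration; this is not the on-disk format MLlib actually writes.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def save_model_json(path: str, model_class: str, params: Dict[str, Any],
                    coefficients: List[float]) -> None:
    """Write a model as one JSON document (hypothetical layout, not MLlib's)."""
    payload = {"class": model_class, "params": params, "coefficients": coefficients}
    with open(path, "w", encoding="utf-8") as f:
        # Sorted keys and indentation keep diffs small under version control.
        json.dump(payload, f, indent=2, sort_keys=True)


def load_model_json(path: str) -> Dict[str, Any]:
    """Read the model back with only the standard library, no SparkContext."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def predict(model: Dict[str, Any], features: List[float]) -> float:
    """Linear score: intercept plus the dot product of coefficients and features."""
    dot = sum(w * x for w, x in zip(model["coefficients"], features))
    return model["params"]["intercept"] + dot


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "model.json")
    save_model_json(path, "HypotheticalLinearModel", {"intercept": 0.5}, [1.0, 2.0])
    print(predict(load_model_json(path), [2.0, 3.0]))
```

A real implementation would presumably go through the reader/writer APIs with a format parameter, as suggested in the thread; the sketch only shows why a JSON payload is easy to consume from a serving process.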
