On Sat, Jun 2, 2018 at 8:39 PM, Maximiliano Felice <maximilianofel...@gmail.com> wrote:
> Hi!
>
> We're already in San Francisco waiting for the summit. We even think that we spotted @holdenk this afternoon.

Unless you happened to be walking by my garage, probably not super likely; I spent the day working on scooters/motorcycles (my style is a little less unique in SF :)). Also, if you see me, feel free to say hi, unless I look like I haven't had my first coffee of the day. I love chatting with folks IRL :)

> @chris, we're really interested in the Meetup you're hosting. My team will probably join it from the beginning if you have room for us, and I'll join it later after discussing the topics on this thread. I'll send you an email regarding this request.
>
> Thanks
>
> On Fri, Jun 1, 2018 at 7:26 AM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
>
>> @Chris This sounds fantastic, please send summary notes for Seattle folks.
>>
>> @Felix I work in downtown Seattle; I'm wondering if we should do a tech meetup around model serving in Spark at my work or somewhere else close by. Thoughts? I'm actually in the midst of building microservices to manage models, and when I say models I mean much more than machine learning models (think OR and process models as well).
>>
>> Regards
>>
>> Sent from my iPhone
>>
>> On May 31, 2018, at 10:32 PM, Chris Fregly <ch...@fregly.com> wrote:
>>
>> Hey everyone!
>>
>> @Felix: thanks for putting this together. I sent some of you a quick calendar event - mostly for me, so I don't forget! :)
>>
>> Coincidentally, this is the focus of the *Advanced Spark and TensorFlow Meetup* @5:30pm on June 6th (same night) here in SF!
>>
>> Everybody is welcome to come.
>> Here's the link to the meetup, which includes the signup link: https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/
>>
>> We have an awesome lineup of speakers covering a lot of deep, technical ground.
>>
>> For those who can't attend in person, we'll be broadcasting live - and posting the recording afterward.
>>
>> All details are in the meetup link above…
>>
>> @holden/felix/nick/joseph/maximiliano/saikat/leif: you're more than welcome to give a talk. I can move things around to make room.
>>
>> @joseph: I'd personally like an update on the direction of the Databricks proprietary ML Serving export format, which is similar to PMML but not a standard in any way.
>>
>> Also, the Databricks ML Serving Runtime is only available to Databricks customers. This seems in conflict with the community efforts described here. Can you comment on behalf of Databricks?
>>
>> Look forward to your response, Joseph.
>>
>> See you all soon!
>>
>> --
>> *Chris Fregly*
>> Founder @ *PipelineAI* <https://pipeline.ai/> (100,000 Users)
>> Organizer @ *Advanced Spark and TensorFlow Meetup* <https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/> (85,000 Global Members)
>>
>> *San Francisco - Chicago - Austin - Washington DC - London - Dusseldorf*
>> *Try our PipelineAI Community Edition with GPUs and TPUs! <http://community.pipeline.ai/>*
>>
>> On May 30, 2018, at 9:32 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>
>> Hi!
>>
>> Thank you!
>> Let's meet then:
>>
>> June 6, 4pm
>> Moscone West Convention Center
>> 800 Howard Street, San Francisco, CA 94103
>>
>> Ground floor (outside of the conference area - should be accessible to all). We will meet and decide where to go.
>>
>> (I won't send an invite because that would be too much noise for dev@.)
>>
>> To paraphrase Joseph, we will use this to kick off the discussion, post notes afterward, and follow up online. As for Seattle, I would be very interested to meet in person later and discuss ;)
>>
>> _____________________________
>> From: Saikat Kanjilal <sxk1...@hotmail.com>
>> Sent: Tuesday, May 29, 2018 11:46 AM
>> Subject: Re: Revisiting Online serving of Spark models?
>> To: Maximiliano Felice <maximilianofel...@gmail.com>
>> Cc: Felix Cheung <felixcheun...@hotmail.com>, Holden Karau <hol...@pigscanfly.ca>, Joseph Bradley <jos...@databricks.com>, Leif Walsh <leif.wa...@gmail.com>, dev <dev@spark.apache.org>
>>
>> Would love to join, but I'm in Seattle. Thoughts on how to make this work?
>>
>> Regards
>>
>> Sent from my iPhone
>>
>> On May 29, 2018, at 10:35 AM, Maximiliano Felice <maximilianofel...@gmail.com> wrote:
>>
>> Big +1 to a meeting with fresh air.
>>
>> Could anyone send the invites? I don't really know which place Holden is talking about.
>>
>> 2018-05-29 14:27 GMT-03:00 Felix Cheung <felixcheun...@hotmail.com>:
>>
>>> You had me at blue bottle!
>>>
>>> _____________________________
>>> From: Holden Karau <hol...@pigscanfly.ca>
>>> Sent: Tuesday, May 29, 2018 9:47 AM
>>> Subject: Re: Revisiting Online serving of Spark models?
>>> To: Felix Cheung <felixcheun...@hotmail.com>
>>> Cc: Saikat Kanjilal <sxk1...@hotmail.com>, Maximiliano Felice <maximilianofel...@gmail.com>, Joseph Bradley <jos...@databricks.com>, Leif Walsh <leif.wa...@gmail.com>, dev <dev@spark.apache.org>
>>>
>>> I'm down for that. We could all go for a walk, maybe to the Mint Plaza Blue Bottle, and grab coffee (and if the weather holds, have our design meeting outside :p)?
>>>
>>> On Tue, May 29, 2018 at 9:37 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>>
>>>> Bump.
>>>>
>>>> ------------------------------
>>>> *From:* Felix Cheung <felixcheun...@hotmail.com>
>>>> *Sent:* Saturday, May 26, 2018 1:05:29 PM
>>>> *To:* Saikat Kanjilal; Maximiliano Felice; Joseph Bradley
>>>> *Cc:* Leif Walsh; Holden Karau; dev
>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>
>>>> Hi! How about we meet with the community and discuss on June 6 at 4pm at (or near) the Summit?
>>>>
>>>> (I propose we meet at the venue entrance so we can accommodate people who might not be in the conference.)
>>>>
>>>> ------------------------------
>>>> *From:* Saikat Kanjilal <sxk1...@hotmail.com>
>>>> *Sent:* Tuesday, May 22, 2018 7:47:07 AM
>>>> *To:* Maximiliano Felice
>>>> *Cc:* Leif Walsh; Felix Cheung; Holden Karau; Joseph Bradley; dev
>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>
>>>> I'm in the exact same boat as Maximiliano. I have use cases for model serving as well and would love to join this discussion.
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On May 22, 2018, at 6:39 AM, Maximiliano Felice <maximilianofel...@gmail.com> wrote:
>>>>
>>>> Hi!
>>>>
>>>> I don't usually write a lot on this list, but I keep up to date with the discussions and I'm a heavy user of Spark. This topic caught my attention, as we're currently facing this issue at work. I'm attending the summit and was wondering if it would be possible for me to join that meeting.
>>>> I might be able to share some helpful use cases and ideas.
>>>>
>>>> Thanks,
>>>> Maximiliano Felice
>>>>
>>>> On Tue, May 22, 2018 at 9:14 AM, Leif Walsh <leif.wa...@gmail.com> wrote:
>>>>
>>>>> I'm with you on JSON being more readable than Parquet, but we've had success using pyarrow's Parquet reader and have been quite happy with it so far. If your target is Python (and probably, if not now then soon, R), you should look into it.
>>>>>
>>>>> On Mon, May 21, 2018 at 16:52 Joseph Bradley <jos...@databricks.com> wrote:
>>>>>
>>>>>> Regarding model reading and writing, I'll give quick thoughts here:
>>>>>> * Our approach was to use the same format but write JSON instead of Parquet. It's easier to parse JSON without Spark, and using the same format simplifies the architecture. Plus, some people want to check files into version control, and JSON is nice for that.
>>>>>> * The reader/writer APIs could be extended to take format parameters (just like DataFrame readers/writers) to handle JSON (and maybe, eventually, handle Parquet in the online serving setting).
>>>>>>
>>>>>> This would be a big project, so proposing a SPIP might be best. If people are around at the Spark Summit, that could be a good time to meet up and then post notes back to the dev list.
>>>>>>
>>>>>> On Sun, May 20, 2018 at 8:11 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>>>>>
>>>>>>> Specifically, I'd like to bring part of the discussion to Model and PipelineModel, and the various ModelReader and SharedReadWrite implementations that rely on SparkContext. This is a big blocker on reusing trained models outside of Spark for online serving.
>>>>>>>
>>>>>>> What's the next step? Would folks be interested in getting together to discuss/get some feedback?
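[To make the JSON-instead-of-Parquet point above concrete: the sketch below shows why JSON model metadata is easy to consume without Spark, using only the Python standard library. The field names and values here are illustrative assumptions, not the actual Spark ML persistence schema.]

```python
import json

# Hypothetical metadata for a fitted model, loosely modeled on the kind
# of information Spark ML persists (class name, uid, fitted params).
# The exact schema here is an illustrative assumption.
metadata = {
    "class": "org.apache.spark.ml.classification.LogisticRegressionModel",
    "uid": "logreg_example",
    "paramMap": {"regParam": 0.01, "maxIter": 100},
}

# JSON round-trips with nothing but the standard library -- the property
# that makes it easy to parse outside Spark, and diff-friendly enough to
# check into version control.
serialized = json.dumps(metadata, indent=2, sort_keys=True)
restored = json.loads(serialized)

assert restored["paramMap"]["regParam"] == 0.01
```

A serving process written in any language with a JSON parser could load this without pulling in the Spark JARs, which is the architectural upside being discussed.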
>>>>>>> _____________________________
>>>>>>> From: Felix Cheung <felixcheun...@hotmail.com>
>>>>>>> Sent: Thursday, May 10, 2018 10:10 AM
>>>>>>> Subject: Re: Revisiting Online serving of Spark models?
>>>>>>> To: Holden Karau <hol...@pigscanfly.ca>, Joseph Bradley <jos...@databricks.com>
>>>>>>> Cc: dev <dev@spark.apache.org>
>>>>>>>
>>>>>>> Huge +1 on this!
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* holden.ka...@gmail.com <holden.ka...@gmail.com> on behalf of Holden Karau <hol...@pigscanfly.ca>
>>>>>>> *Sent:* Thursday, May 10, 2018 9:39:26 AM
>>>>>>> *To:* Joseph Bradley
>>>>>>> *Cc:* dev
>>>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>>>>
>>>>>>> On Thu, May 10, 2018 at 9:25 AM, Joseph Bradley <jos...@databricks.com> wrote:
>>>>>>>
>>>>>>>> Thanks for bringing this up, Holden! I'm a strong supporter of this.
>>>>>>>
>>>>>>> Awesome! I'm glad other folks think something like this belongs in Spark.
>>>>>>>
>>>>>>>> This was one of the original goals for mllib-local: to have local versions of MLlib models which could be deployed without the big Spark JARs and without a SparkContext or SparkSession. There are related commercial offerings like this :) but the overhead of maintaining those offerings is pretty high. Building good APIs within MLlib to avoid copying logic across libraries will be well worth it.
>>>>>>>>
>>>>>>>> We've talked about this need at Databricks and have also been syncing with the creators of MLeap. It'd be great to get this functionality into Spark itself. Some thoughts:
>>>>>>>> * It'd be valuable to have this go beyond adding transform() methods taking a Row to the current Models.
>>>>>>>> Instead, it would be ideal to have local, lightweight versions of models in mllib-local, outside of the main mllib package (for easier deployment with smaller and fewer dependencies).
>>>>>>>> * Supporting Pipelines is important. For this, it would be ideal to utilize elements of Spark SQL, particularly Rows and Types, which could be moved into a local sql package.
>>>>>>>> * This architecture may currently require some awkward APIs, with model prediction logic in mllib-local, local model classes in mllib-local, and regular (DataFrame-friendly) model classes in mllib. We might find it helpful to break some DeveloperApis in Spark 3.0 to facilitate this architecture while making it feasible for 3rd-party developers to extend MLlib APIs (especially in Java).
>>>>>>>
>>>>>>> I agree this could be interesting, and it feeds into the other discussion around when (or if) we should be considering Spark 3.0. I _think_ we could probably do it with optional traits people could mix in to avoid breaking the current APIs, but I could be wrong on that point.
>>>>>>>
>>>>>>>> * It could also be worth discussing local DataFrames. They might not be as important as per-Row transformations, but they would be helpful for batching for higher throughput.
>>>>>>>
>>>>>>> That could be interesting as well.
>>>>>>>
>>>>>>>> I'll be interested to hear others' thoughts too!
>>>>>>>>
>>>>>>>> Joseph
>>>>>>>>
>>>>>>>> On Wed, May 9, 2018 at 7:18 AM, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>>
>>>>>>>>> Hi y'all,
>>>>>>>>>
>>>>>>>>> With the renewed interest in ML in Apache Spark, now seems like as good a time as any to revisit the online serving situation in Spark ML.
>>>>>>>>> DB and others have done some excellent work moving a lot of the necessary tools into a local linear algebra package that doesn't depend on having a SparkContext.
>>>>>>>>>
>>>>>>>>> There are a few different commercial and non-commercial solutions around this, but currently our individual transform/predict methods are private, so those solutions either need to copy or re-implement them (or put themselves in org.apache.spark) to access them. How would folks feel about adding a new trait for ML pipeline stages to expose transformation of single-element inputs (or local collections), which could be optionally implemented by stages that support this? That way we can have less copy-and-paste code that could get out of sync with our model training.
>>>>>>>>>
>>>>>>>>> I think continuing to have online serving grow in different projects is probably the right path forward (folks have different needs), but I'd love to see us make it simpler for other projects to build reliable serving tools.
>>>>>>>>>
>>>>>>>>> I realize this may put some folks in an awkward position with their own commercial offerings, but hopefully if we make it easier for everyone, the commercial vendors can benefit as well.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Holden :)
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>
>>>>>>>> --
>>>>>>>> Joseph Bradley
>>>>>>>> Software Engineer - Machine Learning
>>>>>>>> Databricks, Inc.
>>>>>
>>>>> --
>>>>> Cheers,
>>>>> Leif

--
Twitter: https://twitter.com/holdenkarau
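[The optional single-element-transform trait Holden proposes in this thread could take roughly the following shape. This is a hedged sketch only, written in Python for brevity (Spark's actual pipeline APIs are Scala/JVM); the names `LocalTransformSupport`, `transform_local`, and `ScalingStage` are invented for illustration and are not Spark APIs.]

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class LocalTransformSupport(ABC):
    """Optional mixin for pipeline stages that can transform a single
    element (or a small local collection) without a SparkContext.
    Class and method names here are illustrative, not Spark APIs."""

    @abstractmethod
    def transform_local(self, row: Dict[str, Any]) -> Dict[str, Any]:
        """Transform one input row locally."""

    def transform_local_batch(
        self, rows: List[Dict[str, Any]]
    ) -> List[Dict[str, Any]]:
        # Default batch path defined in terms of the single-element path,
        # so stages only have to implement transform_local.
        return [self.transform_local(r) for r in rows]


class ScalingStage(LocalTransformSupport):
    """Toy stage: multiplies the 'feature' column by a fitted scale."""

    def __init__(self, scale: float) -> None:
        self.scale = scale

    def transform_local(self, row: Dict[str, Any]) -> Dict[str, Any]:
        out = dict(row)
        out["feature"] = row["feature"] * self.scale
        return out


stage = ScalingStage(scale=2.0)
result = stage.transform_local_batch([{"feature": 1.5}, {"feature": 3.0}])
# result is [{"feature": 3.0}, {"feature": 6.0}]
```

Because the mixin is opt-in, stages that cannot meaningfully serve single elements simply don't implement it, which mirrors the "optionally implemented by stages which support this" framing in the thread.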