We’re by the registration sign and are going to start walking over at 4:05.

On Wed, Jun 6, 2018 at 2:43 PM Maximiliano Felice <maximilianofel...@gmail.com> wrote:
> Hi!
>
> Do we meet at the entrance?
>
> See you
>
> On Tue, Jun 5, 2018 at 3:07 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>
>> I will aim to join up at 4pm tomorrow (Wed) too. Look forward to it.
>>
>> On Sun, 3 Jun 2018 at 00:24 Holden Karau <hol...@pigscanfly.ca> wrote:
>>
>>> On Sat, Jun 2, 2018 at 8:39 PM, Maximiliano Felice <maximilianofel...@gmail.com> wrote:
>>>
>>>> Hi!
>>>>
>>>> We're already in San Francisco waiting for the summit. We even think that we spotted @holdenk this afternoon.
>>>>
>>> Unless you happened to be walking by my garage, probably not super likely; I spent the day working on scooters/motorcycles (my style is a little less unique in SF :)). Also, if you see me, feel free to say hi unless I look like I haven't had my first coffee of the day. Love chatting with folks IRL :)
>>>>
>>>> @chris, we're really interested in the Meetup you're hosting. My team will probably join it from the beginning if you have room for us, and I'll join it later after discussing the topics on this thread. I'll send you an email regarding this request.
>>>>
>>>> Thanks
>>>>
>>>> On Fri, Jun 1, 2018 at 7:26 AM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
>>>>
>>>>> @Chris This sounds fantastic, please send summary notes for Seattle folks.
>>>>>
>>>>> @Felix I work in downtown Seattle and am wondering if we should host a tech meetup around model serving in Spark at my work or somewhere else close by. Thoughts? I'm actually in the midst of building microservices to manage models, and when I say models I mean much more than machine learning models (think OR and process models as well).
>>>>>
>>>>> Regards
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On May 31, 2018, at 10:32 PM, Chris Fregly <ch...@fregly.com> wrote:
>>>>>
>>>>> Hey everyone!
>>>>>
>>>>> @Felix: thanks for putting this together. I sent some of you a quick calendar event - mostly for me, so I don't forget!
>>>>> :)
>>>>>
>>>>> Coincidentally, this is the focus of the June 6th *Advanced Spark and TensorFlow Meetup* <https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/> @5:30pm (same night) here in SF!
>>>>>
>>>>> Everybody is welcome to come. Here's the link to the meetup, which includes the signup link: https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/
>>>>>
>>>>> We have an awesome lineup of speakers covering a lot of deep, technical ground.
>>>>>
>>>>> For those who can't attend in person, we'll be broadcasting live - and posting the recording afterward.
>>>>>
>>>>> All details are in the meetup link above…
>>>>>
>>>>> @holden/felix/nick/joseph/maximiliano/saikat/leif: you're more than welcome to give a talk. I can move things around to make room.
>>>>>
>>>>> @joseph: I'd personally like an update on the direction of the Databricks proprietary ML Serving export format, which is similar to PMML but not a standard in any way.
>>>>>
>>>>> Also, the Databricks ML Serving Runtime is only available to Databricks customers. This seems in conflict with the community efforts described here. Can you comment on behalf of Databricks?
>>>>>
>>>>> Look forward to your response, Joseph.
>>>>>
>>>>> See you all soon!
>>>>>
>>>>> —
>>>>>
>>>>> *Chris Fregly*, Founder @ *PipelineAI* <https://pipeline.ai/> (100,000 Users)
>>>>> Organizer @ *Advanced Spark and TensorFlow Meetup* <https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/> (85,000 Global Members)
>>>>>
>>>>> *San Francisco - Chicago - Austin - Washington DC - London - Dusseldorf*
>>>>> *Try our PipelineAI Community Edition with GPUs and TPUs!!
>>>>> <http://community.pipeline.ai/>*
>>>>>
>>>>> On May 30, 2018, at 9:32 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> Thank you! Let's meet then.
>>>>>
>>>>> June 6, 4pm
>>>>>
>>>>> Moscone West Convention Center
>>>>> 800 Howard Street, San Francisco, CA 94103 <https://maps.google.com/?q=800+Howard+Street,+San+Francisco,+CA+94103&entry=gmail&source=g>
>>>>>
>>>>> Ground floor (outside of the conference area - should be accessible to all) - we will meet and decide where to go.
>>>>>
>>>>> (I won't send an invite because that would be too much noise for dev@.)
>>>>>
>>>>> To paraphrase Joseph, we will use this to kick off the discussion, post notes afterward, and follow up online. As for Seattle, I would be very interested to meet in person later and discuss ;)
>>>>>
>>>>> _____________________________
>>>>> From: Saikat Kanjilal <sxk1...@hotmail.com>
>>>>> Sent: Tuesday, May 29, 2018 11:46 AM
>>>>> Subject: Re: Revisiting Online serving of Spark models?
>>>>> To: Maximiliano Felice <maximilianofel...@gmail.com>
>>>>> Cc: Felix Cheung <felixcheun...@hotmail.com>, Holden Karau <hol...@pigscanfly.ca>, Joseph Bradley <jos...@databricks.com>, Leif Walsh <leif.wa...@gmail.com>, dev <dev@spark.apache.org>
>>>>>
>>>>> Would love to join but am in Seattle. Thoughts on how to make this work?
>>>>>
>>>>> Regards
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On May 29, 2018, at 10:35 AM, Maximiliano Felice <maximilianofel...@gmail.com> wrote:
>>>>>
>>>>> Big +1 to a meeting with fresh air.
>>>>>
>>>>> Could anyone send the invites? I don't really know which place Holden is talking about.
>>>>>
>>>>> 2018-05-29 14:27 GMT-03:00 Felix Cheung <felixcheun...@hotmail.com>:
>>>>>
>>>>>> You had me at blue bottle!
>>>>>>
>>>>>> _____________________________
>>>>>> From: Holden Karau <hol...@pigscanfly.ca>
>>>>>> Sent: Tuesday, May 29, 2018 9:47 AM
>>>>>> Subject: Re: Revisiting Online serving of Spark models?
>>>>>> To: Felix Cheung <felixcheun...@hotmail.com>
>>>>>> Cc: Saikat Kanjilal <sxk1...@hotmail.com>, Maximiliano Felice <maximilianofel...@gmail.com>, Joseph Bradley <jos...@databricks.com>, Leif Walsh <leif.wa...@gmail.com>, dev <dev@spark.apache.org>
>>>>>>
>>>>>> I'm down for that. We could all go for a walk, maybe to the Mint Plaza Blue Bottle, and grab coffee (and if the weather holds, have our design meeting outside :p)?
>>>>>>
>>>>>> On Tue, May 29, 2018 at 9:37 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>>>>>
>>>>>>> Bump.
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* Felix Cheung <felixcheun...@hotmail.com>
>>>>>>> *Sent:* Saturday, May 26, 2018 1:05:29 PM
>>>>>>> *To:* Saikat Kanjilal; Maximiliano Felice; Joseph Bradley
>>>>>>> *Cc:* Leif Walsh; Holden Karau; dev
>>>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>>>>
>>>>>>> Hi! How about we meet with the community and discuss on June 6 at 4pm at (near) the Summit?
>>>>>>>
>>>>>>> (I propose we meet at the venue entrance so we can accommodate people who might not be in the conference.)
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* Saikat Kanjilal <sxk1...@hotmail.com>
>>>>>>> *Sent:* Tuesday, May 22, 2018 7:47:07 AM
>>>>>>> *To:* Maximiliano Felice
>>>>>>> *Cc:* Leif Walsh; Felix Cheung; Holden Karau; Joseph Bradley; dev
>>>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>>>>
>>>>>>> I'm in the exact same boat as Maximiliano, have use cases for model serving as well, and would love to join this discussion.
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> On May 22, 2018, at 6:39 AM, Maximiliano Felice <maximilianofel...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi!
>>>>>>>
>>>>>>> I don't usually write a lot on this list, but I keep up to date with the discussions and I'm a heavy user of Spark. This topic caught my attention, as we're currently facing this issue at work. I'm attending the summit and was wondering if it would be possible for me to join that meeting. I might be able to share some helpful use cases and ideas.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Maximiliano Felice
>>>>>>>
>>>>>>> On Tue, May 22, 2018 at 9:14 AM, Leif Walsh <leif.wa...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I'm with you on JSON being more readable than Parquet, but we've had success using pyarrow's Parquet reader and have been quite happy with it so far. If your target is Python (and probably, if not now then soon, R), you should look into it.
>>>>>>>>
>>>>>>>> On Mon, May 21, 2018 at 16:52 Joseph Bradley <jos...@databricks.com> wrote:
>>>>>>>>
>>>>>>>>> Regarding model reading and writing, I'll give quick thoughts here:
>>>>>>>>> * Our approach was to use the same format but write JSON instead of Parquet. It's easier to parse JSON without Spark, and using the same format simplifies the architecture. Plus, some people want to check files into version control, and JSON is nice for that.
>>>>>>>>> * The reader/writer APIs could be extended to take format parameters (just like DataFrame readers/writers) to handle JSON (and maybe, eventually, handle Parquet in the online serving setting).
>>>>>>>>>
>>>>>>>>> This would be a big project, so proposing a SPIP might be best.
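[To make the write-the-same-format-as-JSON idea above concrete, here is a minimal, dependency-free sketch. Everything in it is hypothetical: `LinearModelParams` and `ModelJsonExport` are made-up names, not actual Spark ML APIs, and real code would use a JSON library rather than hand-rolled serialization.]

```java
// Hypothetical sketch: model parameters serialized as JSON so they can
// be parsed without Spark or a SparkContext. Not a real Spark ML API.
import java.util.Arrays;
import java.util.stream.Collectors;

final class LinearModelParams {
    final double[] coefficients;
    final double intercept;

    LinearModelParams(double[] coefficients, double intercept) {
        this.coefficients = coefficients;
        this.intercept = intercept;
    }

    // Hand-rolled JSON to keep the sketch dependency-free; production
    // code would use a proper JSON library.
    String toJson() {
        String coefs = Arrays.stream(coefficients)
                .mapToObj(Double::toString)
                .collect(Collectors.joining(",", "[", "]"));
        return "{\"coefficients\":" + coefs + ",\"intercept\":" + intercept + "}";
    }
}

public class ModelJsonExport {
    public static void main(String[] args) {
        LinearModelParams m = new LinearModelParams(new double[]{0.5, -1.25}, 2.0);
        System.out.println(m.toJson());
        // prints {"coefficients":[0.5,-1.25],"intercept":2.0}
    }
}
```

[The point of the sketch is that the same metadata Spark writes as Parquet could be emitted in a human-readable, version-control-friendly form.]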
>>>>>>>>> If people are around at the Spark Summit, that could be a good time to meet up and then post notes back to the dev list.
>>>>>>>>>
>>>>>>>>> On Sun, May 20, 2018 at 8:11 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Specifically, I'd like to bring part of the discussion to Model and PipelineModel, and the various ModelReader and SharedReadWrite implementations that rely on SparkContext. This is a big blocker on reusing trained models outside of Spark for online serving.
>>>>>>>>>>
>>>>>>>>>> What's the next step? Would folks be interested in getting together to discuss/get some feedback?
>>>>>>>>>>
>>>>>>>>>> _____________________________
>>>>>>>>>> From: Felix Cheung <felixcheun...@hotmail.com>
>>>>>>>>>> Sent: Thursday, May 10, 2018 10:10 AM
>>>>>>>>>> Subject: Re: Revisiting Online serving of Spark models?
>>>>>>>>>> To: Holden Karau <hol...@pigscanfly.ca>, Joseph Bradley <jos...@databricks.com>
>>>>>>>>>> Cc: dev <dev@spark.apache.org>
>>>>>>>>>>
>>>>>>>>>> Huge +1 on this!
>>>>>>>>>>
>>>>>>>>>> ------------------------------
>>>>>>>>>> *From:* holden.ka...@gmail.com <holden.ka...@gmail.com> on behalf of Holden Karau <hol...@pigscanfly.ca>
>>>>>>>>>> *Sent:* Thursday, May 10, 2018 9:39:26 AM
>>>>>>>>>> *To:* Joseph Bradley
>>>>>>>>>> *Cc:* dev
>>>>>>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>>>>>>>
>>>>>>>>>> On Thu, May 10, 2018 at 9:25 AM, Joseph Bradley <jos...@databricks.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for bringing this up, Holden! I'm a strong supporter of this.
>>>>>>>>>>>
>>>>>>>>>> Awesome! I'm glad other folks think something like this belongs in Spark.
>>>>>>>>>>
>>>>>>>>>>> This was one of the original goals for mllib-local: to have local versions of MLlib models which could be deployed without the big Spark JARs and without a SparkContext or SparkSession. There are related commercial offerings like this :) but the overhead of maintaining those offerings is pretty high. Building good APIs within MLlib to avoid copying logic across libraries will be well worth it.
>>>>>>>>>>>
>>>>>>>>>>> We've talked about this need at Databricks and have also been syncing with the creators of MLeap. It'd be great to get this functionality into Spark itself. Some thoughts:
>>>>>>>>>>> * It'd be valuable to have this go beyond adding transform() methods taking a Row to the current Models. Instead, it would be ideal to have local, lightweight versions of models in mllib-local, outside of the main mllib package (for easier deployment with smaller & fewer dependencies).
>>>>>>>>>>> * Supporting Pipelines is important. For this, it would be ideal to utilize elements of Spark SQL, particularly Rows and Types, which could be moved into a local sql package.
>>>>>>>>>>> * This architecture may currently require some awkward APIs to have model prediction logic in mllib-local, local model classes in mllib-local, and regular (DataFrame-friendly) model classes in mllib. We might find it helpful to break some DeveloperApis in Spark 3.0 to facilitate this architecture while making it feasible for 3rd-party developers to extend MLlib APIs (especially in Java).
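[A minimal sketch of the "local, lightweight model" idea above, assuming made-up names: `LocalRow` and `LocalLinearModel` are illustrative stand-ins, not real mllib-local classes. The point is that nothing here touches a SparkContext, a DataFrame, or the main mllib package.]

```java
// Hypothetical sketch of a local model in the spirit of mllib-local:
// plain Java, no cluster dependencies, single-row prediction.
final class LocalRow {
    final double[] features;
    LocalRow(double[] features) { this.features = features; }
}

final class LocalLinearModel {
    private final double[] coefficients;
    private final double intercept;

    LocalLinearModel(double[] coefficients, double intercept) {
        this.coefficients = coefficients;
        this.intercept = intercept;
    }

    // Single-row prediction: dot(coefficients, features) + intercept.
    double predict(LocalRow row) {
        if (row.features.length != coefficients.length) {
            throw new IllegalArgumentException("feature length mismatch");
        }
        double acc = intercept;
        for (int i = 0; i < coefficients.length; i++) {
            acc += coefficients[i] * row.features[i];
        }
        return acc;
    }
}

public class LocalModelDemo {
    public static void main(String[] args) {
        LocalLinearModel m = new LocalLinearModel(new double[]{2.0, 3.0}, 1.0);
        System.out.println(m.predict(new LocalRow(new double[]{1.0, 1.0})));
        // prints 6.0
    }
}
```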
>>>>>>>>>>
>>>>>>>>>> I agree this could be interesting, and it feeds into the other discussion around when (or if) we should be considering Spark 3.0. I _think_ we could probably do it with optional traits people could mix in to avoid breaking the current APIs, but I could be wrong on that point.
>>>>>>>>>>
>>>>>>>>>>> * It could also be worth discussing local DataFrames. They might not be as important as per-Row transformations, but they would be helpful for batching for higher throughput.
>>>>>>>>>>>
>>>>>>>>>> That could be interesting as well.
>>>>>>>>>>
>>>>>>>>>>> I'll be interested to hear others' thoughts too!
>>>>>>>>>>>
>>>>>>>>>>> Joseph
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 9, 2018 at 7:18 AM, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi y'all,
>>>>>>>>>>>>
>>>>>>>>>>>> With the renewed interest in ML in Apache Spark, now seems like as good a time as any to revisit the online serving situation in Spark ML. DB & others have done some excellent work moving a lot of the necessary tools into a local linear algebra package that doesn't depend on having a SparkContext.
>>>>>>>>>>>>
>>>>>>>>>>>> There are a few different commercial and non-commercial solutions around this, but currently our individual transform/predict methods are private, so they either need to copy or re-implement them (or put themselves in org.apache.spark) to access them. How would folks feel about adding a new trait for ML pipeline stages to expose transformation of single-element inputs (or local collections) that could be optionally implemented by stages which support this?
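[A sketch of what such an opt-in capability could look like, rendered here as a Java interface with a default method (roughly what a Scala trait with a default implementation would compile down to). `LocalTransform` and `ScalerStage` are made-up names for illustration only; nothing here is an actual Spark API.]

```java
// Hypothetical sketch of the optional-trait idea: stages that can score
// a single element without a SparkContext opt in by implementing this.
import java.util.List;
import java.util.stream.Collectors;

// The opt-in capability: single-element transform, plus a default
// local-collection transform built on top of it.
interface LocalTransform<I, O> {
    O transformOne(I input);

    default List<O> transformLocal(List<I> inputs) {
        return inputs.stream().map(this::transformOne).collect(Collectors.toList());
    }
}

// A stage opting in; its (imagined) DataFrame-based transform would be
// left untouched, so existing APIs would not break.
class ScalerStage implements LocalTransform<Double, Double> {
    private final double scale;
    ScalerStage(double scale) { this.scale = scale; }

    @Override
    public Double transformOne(Double input) { return input * scale; }
}

public class LocalTransformDemo {
    public static void main(String[] args) {
        ScalerStage s = new ScalerStage(2.0);
        System.out.println(s.transformOne(3.0));                 // prints 6.0
        System.out.println(s.transformLocal(List.of(1.0, 2.0))); // prints [2.0, 4.0]
    }
}
```

[Because the interface is optional, stages that cannot support local transformation simply don't implement it, and callers can feature-test with `instanceof`.]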
>>>>>>>>>>>> That way we can have less copy-and-paste code that could get out of sync with our model training.
>>>>>>>>>>>>
>>>>>>>>>>>> I think continuing to have online serving grow in different projects is probably the right path forward (folks have different needs), but I'd love to see us make it simpler for other projects to build reliable serving tools.
>>>>>>>>>>>>
>>>>>>>>>>>> I realize this maybe puts some of the folks in an awkward position with their own commercial offerings, but hopefully if we make it easier for everyone, the commercial vendors can benefit as well.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>
>>>>>>>>>>>> Holden :)
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Joseph Bradley
>>>>>>>>>>> Software Engineer - Machine Learning
>>>>>>>>>>> Databricks, Inc.
>>>>>>>>>>> [image: http://databricks.com] <http://databricks.com/>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Cheers,
>>>>>>>> Leif

--
Twitter: https://twitter.com/holdenkarau