Hi,

Since I know many of you don't read / are not part of the user list, I'll
summarize what happened at the summit:

We discussed the needs we have in order to start serving our predictions
with Spark. We mostly talked about alternative approaches to this work and
what we could expect in each of these areas.

I'm going to share our setup and needs here, hoping it will trigger further
discussion. We currently:

   - Use Spark as an ETL tool (rough sketch of this step below), followed by
   - a Python (numpy/pandas-based) pipeline to preprocess the data, and
   - TensorFlow to train our neural networks.
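
For context, the Spark part today is really just ETL; roughly something like
the sketch below (paths and column names are made up for illustration), with
everything downstream of the Parquet output living in pandas/NumPy:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().appName("etl").getOrCreate()

    spark.read.json("s3://bucket/raw/events/")        // raw events (made-up path)
      .filter(col("event_type") === "purchase")       // keep what we care about
      .select("user_id", "item_id", "price", "ts")    // made-up columns
      .write.mode("overwrite")
      .parquet("s3://bucket/curated/purchases/")      // picked up by the Python pipeline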


What we'd love to do, and why we don't:

   - Start using Spark for our full preprocessing pipeline. Because type
   safety. And distributed computation. And Catalyst. But mainly because
   *not-Python* (see the sketch after this list).
   Our main issues:
      - We want to use the same code for online serving. We're not willing
      to duplicate the preprocessing operations. Spark is not
      *serving-friendly*.
      - If we want to preprocess online, we need to copy/paste our
      custom transformations into MLeap.
      - It's painful to hand the preprocessed data over to a TensorFlow
      API for serving.
   - Use Spark to do hyperparameter tuning (also sketched below).
   We'd need:
      - GPU integration with Spark, letting us achieve finer tuning.
      - Better TensorFlow integration.
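
To make this concrete, here is a minimal sketch (not our real pipeline;
column names, stages and the LogisticRegression placeholder are made up) of
what preprocessing plus hyperparameter tuning with Spark's built-in tooling
could look like:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
    import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
    import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

    // Preprocessing expressed as spark.ml stages.
    val indexer = new StringIndexer().setInputCol("category").setOutputCol("categoryIdx")
    val assembler = new VectorAssembler()
      .setInputCols(Array("categoryIdx", "price"))
      .setOutputCol("features")

    // Placeholder estimator; ideally this slot would be GPU-backed TensorFlow training.
    val lr = new LogisticRegression().setLabelCol("label").setFeaturesCol("features")

    val pipeline = new Pipeline().setStages(Array(indexer, assembler, lr))

    // Hyperparameter tuning via grid search + cross-validation.
    val grid = new ParamGridBuilder()
      .addGrid(lr.regParam, Array(0.01, 0.1))
      .addGrid(lr.elasticNetParam, Array(0.0, 0.5))
      .build()

    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(new BinaryClassificationEvaluator().setLabelCol("label"))
      .setEstimatorParamMaps(grid)
      .setNumFolds(3)

    // trainingDf: a DataFrame with "category", "price" and "label" columns.
    val model = cv.fit(trainingDf)

The catch is the first point above: fitting this batch-side is fine, but the
resulting PipelineModel still wants a SparkSession and a DataFrame to
transform even a single record, which is what makes low-latency online
serving awkward without something like MLeap.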


Now that I'm on the @dev, do you think any of these issues could be
addressed? We talked at the summit about PFA (Portable Format for
Analytics) and how we would expect it to cover some of these issues.
Another discussion I remember was about *encoding operations
(functions/lambdas) in PFA itself*. And I don't remember having smoked
anything at that point, although we might as well have.
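
To illustrate what "operations" means in practice: the custom
transformations we'd want to express portably (whether via MLeap or PFA)
are things like this toy Spark UnaryTransformer (a hypothetical log1p
scaler, not something we actually ship):

    import org.apache.spark.ml.UnaryTransformer
    import org.apache.spark.ml.param.ParamMap
    import org.apache.spark.ml.util.Identifiable
    import org.apache.spark.sql.types.{DataType, DoubleType}

    // A trivial custom transformation. Today, logic like this has to be
    // re-implemented by hand for online serving (e.g. as an MLeap op); the
    // PFA discussion was about whether the function itself could instead be
    // encoded in a portable document.
    class Log1pScaler(override val uid: String)
      extends UnaryTransformer[Double, Double, Log1pScaler] {

      def this() = this(Identifiable.randomUID("log1pScaler"))

      override protected def createTransformFunc: Double => Double =
        (x: Double) => math.log1p(x)

      override protected def validateInputType(inputType: DataType): Unit =
        require(inputType == DoubleType, s"Expected DoubleType, got $inputType")

      override protected def outputDataType: DataType = DoubleType

      override def copy(extra: ParamMap): Log1pScaler = defaultCopy(extra)
    }

In a pipeline this would be used as
new Log1pScaler().setInputCol("price").setOutputCol("priceLog"), and it's
exactly this kind of stage that currently has no serving-side counterpart.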

Oh, and @Holden Karau <hol...@pigscanfly.ca> insisted that she would be
much happier with us if we started helping with code reviews. I'm willing
to make some time for that.


Sorry again for the delay in replying to this email *(and now sorry for
the length)*; looking forward to following up on this topic.

On Tue, Jul 3, 2018 at 15:37, Saikat Kanjilal (<sxk1...@hotmail.com>)
wrote:

> Ping, would love to hear back on this.
>
>
> ------------------------------
> *From:* Saikat Kanjilal <sxk1...@hotmail.com>
> *Sent:* Tuesday, June 26, 2018 7:27 AM
> *To:* dev@spark.apache.org
> *Subject:* Spark model serving
>
> HoldenK and interested folks,
> Am just following up on the Spark model serving discussions as this is
> highly relevant to what I'm embarking on at work.  Is there a concrete list
> of next steps, or can someone summarize what was discussed at the summit?
> Would love to have a Seattle version of this discussion with some folks.
>
> Look forward to hearing back and driving this.
>
> Regards
>
> Sent from my iPhone
>
