Re: Spark model serving

2018-08-03 Thread Saikat Kanjilal
@holdenK et al., ping on next steps.

Sent from my iPhone



Re: Spark model serving

2018-07-12 Thread Saikat Kanjilal
Thanks so much for responding, Maximiliano; I didn't want this discussion to 
disappear into the wilderness of dev emails :). Here's what I would like to see, 
or contribute to, for model serving within Spark. First off, I want to be clear 
on what we mean by model serving, so I'll add my interpretation of the 
definition here: model serving is the ability to discover what models exist, 
through the use of a model repository, and to serve up the contents of a 
particular model for invocation/consumption (a minimal sketch of what I mean 
follows below). Before we dive into the details you specify below, is this the 
definition that people have in mind? Finally, as I mentioned earlier, when I'm 
thinking about models I'm initially targeting deep/machine learning models, but 
eventually also models requiring lots of compute or I/O, such as those 
frequently found in Operations Research and other fields.
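
To make that definition concrete, here is a minimal sketch of the repository 
interface I have in mind. ServedModel and ModelRepository are hypothetical 
names for illustration only, not an existing Spark or Livy API:

    import scala.collection.concurrent.TrieMap

    // Hypothetical sketch only: illustrates discovery + serving, nothing more.
    trait ServedModel {
      def name: String
      def predict(features: Array[Double]): Double  // invocation/consumption
    }

    class ModelRepository {
      private val models = TrieMap.empty[String, ServedModel]

      def register(model: ServedModel): Unit = models.put(model.name, model)
      def discover(): Seq[String] = models.keys.toSeq  // what models exist
      def serve(name: String): Option[ServedModel] = models.get(name)
    }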

Given the above, I feel like we also need a more robustified (nice word, huh?) 
version of Livy: something that discovers and serves up any model for 
downstream computation, in addition to hooking it up to Zeppelin or some other 
downstream viz engine.
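
Purely as a sketch of that Livy-like piece, reusing the hypothetical 
ModelRepository above, with the JDK's built-in HttpServer standing in for 
whatever server we'd actually choose:

    import com.sun.net.httpserver.{HttpExchange, HttpServer}
    import java.net.InetSocketAddress

    object ModelServer {
      def main(args: Array[String]): Unit = {
        val repo = new ModelRepository  // from the sketch above

        val server = HttpServer.create(new InetSocketAddress(8080), 0)
        // GET /models -> newline-separated list of discoverable model names
        server.createContext("/models", (exchange: HttpExchange) => {
          val body = repo.discover().mkString("\n").getBytes("UTF-8")
          exchange.sendResponseHeaders(200, body.length.toLong)
          exchange.getResponseBody.write(body)
          exchange.close()
        })
        server.start()
      }
    }

Discovery is just an HTTP listing here; invocation would be a second context 
that looks up a model and calls predict.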

Would love to hear thoughts.



Re: Spark model serving

2018-07-12 Thread Maximiliano Felice
Hi,

As I know many of you don't read / aren't part of the user list, I'll
summarize what happened at the summit:

We discussed the needs we have in order to start serving our predictions
with Spark. We mostly talked about alternatives for this work and what we
could expect in these areas.

I'm going to share mine here, hoping it will trigger further discussion. We
currently:

   - Use Spark as an ETL tool, followed by
   - a Python (numpy/pandas based) pipeline to preprocess information and
   - use TensorFlow for training our neural networks


What we'd love to do, and why we don't:

   - Start using Spark for our full preprocessing pipeline. Because type
     safety. And distributed computation. And Catalyst. But mainly because
     *not-Python* (see the sketch after this list).
     Our main issue:
      - We want to use the same code for online serving. We're not willing
        to duplicate the preprocessing operations. Spark is not
        *serving-friendly*.
      - If we want it to preprocess online, we need to copy/paste our
        custom transformations to MLeap.
      - It's an issue to communicate with a TensorFlow API to give it the
        preprocessed data to serve.
   - Use Spark to do hyperparameter tuning.
     We'd need:
      - GPU integration with Spark, letting us achieve finer tuning.
      - Better TensorFlow integration.
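
As a concrete (if simplified) sketch of that first point, here is preprocessing 
expressed as standard spark.ml stages. The column names and the 
buildPreprocessor helper are invented for illustration; this is not our actual 
pipeline:

    import org.apache.spark.ml.{Pipeline, PipelineModel}
    import org.apache.spark.ml.feature.{StandardScaler, StringIndexer, VectorAssembler}
    import org.apache.spark.sql.DataFrame

    // Column names ("category", "amount", "age") are made up for illustration.
    def buildPreprocessor(trainingDf: DataFrame): PipelineModel = {
      val indexer = new StringIndexer()
        .setInputCol("category").setOutputCol("categoryIdx")
      val assembler = new VectorAssembler()
        .setInputCols(Array("categoryIdx", "amount", "age"))
        .setOutputCol("rawFeatures")
      val scaler = new StandardScaler()
        .setInputCol("rawFeatures").setOutputCol("features")

      new Pipeline().setStages(Array(indexer, assembler, scaler)).fit(trainingDf)
    }

The resulting PipelineModel only scores DataFrames, which is exactly the 
not-serving-friendly gap: to score single requests online we'd have to export 
it (e.g. with MLeap), and every custom transformer would have to be 
re-implemented there.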


Now that I'm on the @dev, do you think that any of these issues could be
addressed? We talked at the summit about PFA (Portable Format for
Analytics) and how we would expect it to cover some issues. Another
discussion I remember was about *encoding operations (functions/lambdas) in
PFA itself*. And I don't remember having smoked anything at that point,
although we could as well have.
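
For anyone who hasn't seen PFA: a scoring engine is a plain JSON/YAML
document. A minimal example in the style of the PFA tutorials, an engine
that adds 10 to its input (my recollection of the spec, not something
tested in this thread):

    input: double
    output: double
    action:
      - {+: [input, 10]}

The encoding-operations discussion was, as I understood it, about how much
of a real preprocessing function could be written in that action language
directly.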

Oh, and @Holden Karau insisted that she would be much happier with us if we
started helping with code reviews. I'm willing to make some time for that.


Sorry again for the delay in replying to this email *(and now sorry for the
length)*, looking forward to following up on this topic.



Re: Spark model serving

2018-07-03 Thread Saikat Kanjilal
Ping, would love to hear back on this.





Spark model serving

2018-06-26 Thread Saikat Kanjilal
HoldenK and interested folks,
I'm just following up on the Spark model serving discussions, as this is highly 
relevant to what I'm embarking on at work. Is there a concrete list of next 
steps, or can someone summarize what was discussed at the summit? I would love 
to have a Seattle version of this discussion with some folks.

Look forward to hearing back and driving this.

Regards 

Sent from my iPhone