Re: Feature (?): Setting custom parameters for a Spark MLlib pipeline

2021-11-11 Thread martin
Yes, that would be a suitable option. We could just extend the standard 
Spark MLlib Transformer and add the required meta-data.
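
An untested sketch of what I have in mind, in PySpark (the class name 
and Params are made up; note the class also has to stay importable 
under the same module path for loading to work):

    from pyspark import keyword_only
    from pyspark.ml import Transformer
    from pyspark.ml.param import Param, Params
    from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable

    class MetadataHolder(Transformer, DefaultParamsReadable, DefaultParamsWritable):
        # Pass-through stage whose only purpose is to carry meta-data
        # Params, so the values survive Pipeline save()/load().
        modelVersion = Param(Params._dummy(), "modelVersion", "model version tag")
        evalMetrics = Param(Params._dummy(), "evalMetrics", "JSON-encoded evaluation metrics")

        @keyword_only
        def __init__(self, modelVersion=None, evalMetrics=None):
            super().__init__()
            kwargs = self._input_kwargs
            self._set(**{k: v for k, v in kwargs.items() if v is not None})

        def setModelVersion(self, value):
            return self._set(modelVersion=value)

        def setEvalMetrics(self, value):
            return self._set(evalMetrics=value)

        def _transform(self, dataset):
            # No-op: the stage exists only for its Params.
            return dataset

A pipeline that includes such a stage would then write the meta-data 
out together with all the other stage parameters.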


Just out of curiosity: Is there a specific reason why the user of a 
standard Transformer cannot add arbitrary key-value pairs for 
additional meta-data? This could be handy not just for things like 
versioning, but also for storing evaluation metrics together with a 
trained pipeline (for people who aren't using something like MLflow 
yet).
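
With a stage like the sketch above, the metrics round trip could look 
like this (again untested; the path and values are placeholders):

    import json
    from pyspark.ml import Pipeline, PipelineModel
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1,)], ["x"])  # dummy data, just for the example

    # Only the hypothetical MetadataHolder stage here; in practice it
    # would sit alongside the real stages.
    model = Pipeline(stages=[MetadataHolder(modelVersion="1.2.3")]).fit(df)
    model.stages[0].setEvalMetrics(json.dumps({"accuracy": 0.93}))
    model.write().overwrite().save("/tmp/my_pipeline")

    reloaded = PipelineModel.load("/tmp/my_pipeline")
    holder = reloaded.stages[0]
    print(holder.getOrDefault(holder.modelVersion))             # 1.2.3
    print(json.loads(holder.getOrDefault(holder.evalMetrics)))  # {'accuracy': 0.93}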


Cheers,

Martin

On 2021-10-25 14:38, Sean Owen wrote:

> You can write a custom Transformer or Estimator?


Re: Feature (?): Setting custom parameters for a Spark MLlib pipeline

2021-10-25 Thread Sean Owen
You can write a custom Transformer or Estimator?


Re: Feature (?): Setting custom parameters for a Spark MLlib pipeline

2021-10-25 Thread Sonal Goyal
Hi Martin,

Agree, if you don't need the other features of MLflow then it is likely
overkill.

Cheers,
Sonal
https://github.com/zinggAI/zingg



Re: Feature (?): Setting custom parameters for a Spark MLlib pipeline

2021-10-25 Thread martin

Hi Sonal,

Thanks a lot for this suggestion. I presume it might indeed be possible 
to use MLflow for this purpose, but at present it seems a bit too much 
to introduce another framework only for storing arbitrary meta-data 
with trained ML pipelines. I was hoping there might be a way to do this 
natively in Spark ML. Otherwise, I'll just create a wrapper class for 
the trained models, along the lines of the sketch below.
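
Untested sketch (the class and paths are only illustrative; note that 
the meta-data JSON goes to the local filesystem here, not HDFS):

    import json
    from pyspark.ml import PipelineModel

    class VersionedModel:
        # Hypothetical wrapper pairing a trained pipeline with
        # free-form meta-data.
        def __init__(self, model: PipelineModel, metadata: dict):
            self.model = model
            self.metadata = metadata

        def save(self, path: str) -> None:
            # The pipeline uses Spark's own writer; the meta-data is
            # persisted separately as JSON.
            self.model.write().overwrite().save(f"{path}/model")
            with open(f"{path}/metadata.json", "w") as f:
                json.dump(self.metadata, f)

        @classmethod
        def load(cls, path: str) -> "VersionedModel":
            with open(f"{path}/metadata.json") as f:
                metadata = json.load(f)
            return cls(PipelineModel.load(f"{path}/model"), metadata)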


Cheers,

Martin


Re: Feature (?): Setting custom parameters for a Spark MLlib pipeline

2021-10-24 Thread Sonal Goyal
Does MLflow help you? https://mlflow.org/

I don't know if MLflow can save arbitrary key-value pairs and
associate them with a model, but versioning, evaluation etc. are
supported.
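
(For what it's worth, arbitrary key-value pairs can be attached to a 
run as tags; a rough, untested sketch with the standard tracking API:)

    import mlflow
    import mlflow.spark

    with mlflow.start_run():
        mlflow.set_tag("model_version", "1.2.3")         # free-form key-value pair
        mlflow.log_metric("accuracy", 0.93)              # evaluation metric
        mlflow.spark.log_model(model, "spark-pipeline")  # model: an assumed fitted PipelineModel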

Cheers,
Sonal
https://github.com/zinggAI/zingg



Feature (?): Setting custom parameters for a Spark MLlib pipeline

2021-10-20 Thread martin

Hello,

This is my first post to this list, so I hope I won't violate any 
(un)written rules.


I recently started working with SparkNLP for a larger project. SparkNLP 
in turn is based on Apache Spark's MLlib. One thing I found missing is 
the ability to store custom parameters in a Spark pipeline. It seems 
only certain pre-defined parameters are allowed (e.g. "stages" for 
the Pipeline class), as the snippet below illustrates.
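
For illustration (a made-up minimal pipeline):

    from pyspark.ml import Pipeline
    from pyspark.ml.feature import Tokenizer

    pipeline = Pipeline(stages=[Tokenizer(inputCol="text", outputCol="words")])

    # Only the pre-defined Params of the class are exposed; there is no
    # public API for registering a new Param on an existing instance.
    print(pipeline.explainParams())  # lists only: stages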


IMHO, it would be handy to be able to store custom parameters, e.g. 
model versions or other meta-data, so that these are persisted together 
with a trained pipeline. This could also be used to include evaluation 
results, such as accuracy, with trained ML models.


(I also asked this on Stackoverflow, but didn't get a response, yet: 
https://stackoverflow.com/questions/69627820/setting-custom-parameters-for-a-spark-mllib-pipeline)


Would does the community think about this proposal? Has it been 
discussed before perhaps? Any thoughts?


Cheers,

Martin