Re: Feature (?): Setting custom parameters for a Spark MLlib pipeline
Yes, that would be a suitable option. We could just extend the standard Spark MLlib Transformer and add the required metadata.

Just out of curiosity: is there a specific reason why the user of a standard Transformer is not able to add arbitrary key-value pairs as additional metadata? This could be handy not just for things like versioning, but also for storing evaluation metrics together with a trained pipeline (for people who aren't using something like MLflow yet).

Cheers,

Martin

On 2021-10-25 14:38, Sean Owen wrote:
> You can write a custom Transformer or Estimator?
Re: Feature (?): Setting custom parameters for a Spark MLlib pipeline
You can write a custom Transformer or Estimator?

On Mon, Oct 25, 2021 at 7:37 AM Sonal Goyal wrote:
> Hi Martin,
> Agree, if you don't need the other features of MLflow then it is likely overkill.
Re: Feature (?): Setting custom parameters for a Spark MLlib pipeline
Hi Martin,

Agree, if you don't need the other features of MLflow then it is likely overkill.

Cheers,
Sonal
https://github.com/zinggAI/zingg

On Mon, Oct 25, 2021 at 4:06 PM wrote:
Re: Feature (?): Setting custom parameters for a Spark MLlib pipeline
Hi Sonal,

Thanks a lot for this suggestion. It might indeed be possible to use MLflow for this purpose, but at present it seems a bit too much to introduce another framework only for storing arbitrary metadata with trained ML pipelines. I was hoping there might be a way to do this natively in Spark ML. Otherwise, I'll just create a wrapper class for the trained models.

Cheers,

Martin

On 2021-10-24 21:16, Sonal Goyal wrote:
> Does MLflow help you? https://mlflow.org/
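The wrapper class I have in mind would be something simple along these lines. A minimal sketch only: the name ModelBundle and the metadata keys are illustrative, not an existing API.

```python
# Sketch of the wrapper-class fallback: bundle a trained model with
# free-form key-value metadata outside of Spark ML itself.
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ModelBundle:
    """Pairs a trained model (e.g. a fitted PipelineModel) with metadata."""
    model: Any
    metadata: Dict[str, Any] = field(default_factory=dict)

    def with_meta(self, **kv: Any) -> "ModelBundle":
        # Attach arbitrary key-value pairs, e.g. a version or an accuracy.
        self.metadata.update(kv)
        return self

    def metadata_json(self) -> str:
        # Serialize only the metadata; the model itself would be saved
        # separately via Spark's own writer (model.write().save(path)).
        return json.dumps(self.metadata, sort_keys=True)
```

The obvious downside is that the metadata lives next to the model rather than inside the saved pipeline, so both parts have to be kept in sync manually.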
Re: Feature (?): Setting custom parameters for a Spark MLlib pipeline
Does MLflow help you? https://mlflow.org/

I don't know whether MLflow can save arbitrary key-value pairs and associate them with a model, but versioning, evaluation, etc. are supported.

Cheers,
Sonal
https://github.com/zinggAI/zingg

On Wed, Oct 20, 2021 at 12:59 PM wrote:
Feature (?): Setting custom parameters for a Spark MLlib pipeline
Hello,

This is my first post to this list, so I hope I won't violate any (un)written rules.

I recently started working with SparkNLP for a larger project. SparkNLP in turn is based on Apache Spark's MLlib. One thing I found missing is the ability to store custom parameters in a Spark pipeline. It seems only certain pre-configured parameters are allowed (e.g. "stages" for the Pipeline class).

IMHO, it would be handy to be able to store custom parameters, e.g. model versions or other metadata, so that these parameters are saved together with a trained pipeline. This could also be used to include evaluation results, such as accuracy, with trained ML models.

(I also asked this on Stack Overflow, but didn't get a response yet: https://stackoverflow.com/questions/69627820/setting-custom-parameters-for-a-spark-mllib-pipeline)

What does the community think about this proposal? Has it been discussed before, perhaps? Any thoughts?

Cheers,

Martin