Re: Enum parameter in ML
@Alexander Using Param[String] directly has worked for us. (I think that's because String is exactly java.lang.String, rather than a Scala version of it, so it's still Java-friendly.) In other classes, I've added a static list of supported values (e.g., NaiveBayes.supportedModelTypes), though coverage on that isn't consistent yet.

@Stephen It could be used, but I prefer String for spark.ml since it's easier to maintain consistent APIs across languages. That's what we've used so far, at least.

On Wed, Sep 16, 2015 at 6:00 PM, Stephen Boesch wrote:
> There was a long thread about enums initiated by Xiangrui several months
> back in which the final consensus was to use Java enums. Is that
> discussion (/decision) applicable here?
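To make the Param[String] plus static-list suggestion concrete, here is a minimal sketch, assuming spark-mllib is on the classpath. `Param`, `ParamValidators.inArray`, `Params`, and `Identifiable.randomUID` are Spark's; the `SolverParams` class, the `solver` param, and the `supportedSolvers` list are hypothetical names made up for illustration.

```scala
import org.apache.spark.ml.param.{Param, ParamMap, ParamValidators, Params}
import org.apache.spark.ml.util.Identifiable

// Hypothetical companion holding the allowed values, in the spirit of
// NaiveBayes.supportedModelTypes. The names are illustrative only.
object SolverParams {
  val supportedSolvers: Array[String] = Array("l-bfgs", "sgd")
}

// Hypothetical Params holder; a real Estimator would carry the param itself.
class SolverParams(override val uid: String) extends Params {
  def this() = this(Identifiable.randomUID("solverParams"))

  // A plain Param[String] whose validator only accepts the supported values.
  final val solver: Param[String] = new Param[String](
    this, "solver",
    "solver to use, one of: " + SolverParams.supportedSolvers.mkString(", "),
    ParamValidators.inArray[String](SolverParams.supportedSolvers))

  def setSolver(value: String): this.type = set(solver, value)
  def getSolver: String = $(solver)

  override def copy(extra: ParamMap): SolverParams = defaultCopy(extra)
}
```

With this shape, `new SolverParams().setSolver("sgd")` should be accepted and an unsupported string should be rejected when it is set. The list in the companion object is what answers the "how can the user get all possible values" question, and it should also be reachable from Java as `SolverParams.supportedSolvers()`.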
Re: Enum parameter in ML
There was a long thread about enums initiated by Xiangrui several months back in which the final consensus was to use Java enums. Is that discussion (/decision) applicable here?

2015-09-16 17:43 GMT-07:00 Ulanov, Alexander:
> Hi Joseph,
>
> Strings sound reasonable. However, there is no StringParam (only
> StringArrayParam). Should I create a new param type? Also, how can the user
> get all possible values of a String parameter?
>
> Best regards, Alexander
RE: Enum parameter in ML
Hi Joseph,

Strings sound reasonable. However, there is no StringParam (only StringArrayParam). Should I create a new param type? Also, how can the user get all possible values of a String parameter?

Best regards, Alexander

From: Joseph Bradley [mailto:jos...@databricks.com]
Sent: Wednesday, September 16, 2015 5:35 PM
To: Feynman Liang
Cc: Ulanov, Alexander; dev@spark.apache.org
Subject: Re: Enum parameter in ML

I've tended to use Strings. Params can be created with a validator (isValid), which ensures users get an immediate error if they try to pass an unsupported String. Not as nice as compile-time errors, but easier on the APIs.
Re: Enum parameter in ML
I've tended to use Strings. Params can be created with a validator (isValid), which ensures users get an immediate error if they try to pass an unsupported String. Not as nice as compile-time errors, but easier on the APIs.

On Mon, Sep 14, 2015 at 6:07 PM, Feynman Liang wrote:
> We usually write a Java test suite which exercises the public API (e.g.
> DCT
> <https://github.com/apache/spark/blob/master/mllib/src/test/java/org/apache/spark/ml/feature/JavaDCTSuite.java#L71>
> ).
>
> It may be possible to create a sealed trait with singleton concrete
> instances inside of a serializable companion object, then just introduce a
> Param[SealedTrait] to the model (e.g. StreamingDecay PR
> <https://github.com/apache/spark/pull/8022/files#diff-cea0bec4853b1b2748ec006682218894R99>).
> However, this would require Java users to use
> CompanionObject$.ConcreteInstanceName to access enum values, which isn't the
> prettiest syntax.
>
> Another option would be to just use Strings, which, although not type
> safe, does simplify implementation.
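The trade-off between a validator and compile-time checking can be sketched without Spark. The `ValidatedParam` class below is a simplified stand-in written for this illustration, not Spark's Param class; only the idea of an isValid function checked when a value is supplied comes from the message above.

```scala
// Simplified stand-in for a validated parameter; NOT Spark's Param class.
final case class ValidatedParam[T](name: String, isValid: T => Boolean) {
  // Returns the value if the validator accepts it, otherwise fails immediately.
  def check(value: T): T = {
    require(isValid(value), s"$name given invalid value $value")
    value
  }
}

object ValidatorDemo {
  val supported: Set[String] = Set("multinomial", "bernoulli")

  val modelType: ValidatedParam[String] =
    ValidatedParam[String]("modelType", supported.contains)

  def main(args: Array[String]): Unit = {
    println(modelType.check("bernoulli"))
    // A misspelled value compiles fine but is rejected as soon as it is checked.
    println(scala.util.Try(modelType.check("bernouli")).isFailure)
  }
}
```

The misspelled string is only caught at runtime, which is the "not as nice as compile-time errors" part, but the public API stays a plain String that every language binding can pass.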
Re: Enum parameter in ML
We usually write a Java test suite which exercises the public API (e.g. DCT <https://github.com/apache/spark/blob/master/mllib/src/test/java/org/apache/spark/ml/feature/JavaDCTSuite.java#L71>).

It may be possible to create a sealed trait with singleton concrete instances inside of a serializable companion object, then just introduce a Param[SealedTrait] to the model (e.g. StreamingDecay PR <https://github.com/apache/spark/pull/8022/files#diff-cea0bec4853b1b2748ec006682218894R99>). However, this would require Java users to use CompanionObject$.ConcreteInstanceName to access enum values, which isn't the prettiest syntax.

Another option would be to just use Strings, which, although not type safe, does simplify implementation.

On Mon, Sep 14, 2015 at 5:43 PM, Ulanov, Alexander wrote:
> Hi Feynman,
>
> Thank you for the suggestion. How can I ensure that there will be no problems
> for Java users? (I only use the Scala API.)
>
> Best regards, Alexander
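For reference, the sealed-trait option could look roughly like the sketch below. It uses only the Scala and Java standard libraries; `SolverType`, its two instances, and the helper object are hypothetical names and are not taken from the linked PR.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical enum-like type built from a sealed trait.
sealed trait SolverType extends Serializable

object SolverType {
  case object LBFGS extends SolverType
  case object SGD extends SolverType

  // Explicit list so callers can enumerate the allowed values.
  val values: Seq[SolverType] = Seq(LBFGS, SGD)
}

object SealedTraitDemo {
  // Round-trips a value through Java serialization.
  def roundTrip(value: AnyRef): AnyRef = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray)).readObject()
  }

  // Matching on a sealed trait lets the compiler warn about missing cases.
  def describe(solver: SolverType): String = solver match {
    case SolverType.LBFGS => "quasi-Newton"
    case SolverType.SGD   => "stochastic gradient descent"
  }

  def main(args: Array[String]): Unit = {
    println(describe(SolverType.SGD))
    println(roundTrip(SolverType.LBFGS) == SolverType.LBFGS)
  }
}
```

Scala callers get type safety and exhaustiveness checking from this shape. Java callers have to go through the classes the compiler generates for the nested objects, which is the awkward syntax mentioned above.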
RE: Enum parameter in ML
Hi Feynman,

Thank you for the suggestion. How can I ensure that there will be no problems for Java users? (I only use the Scala API.)

Best regards, Alexander

From: Feynman Liang [mailto:fli...@databricks.com]
Sent: Monday, September 14, 2015 5:27 PM
To: Ulanov, Alexander
Cc: dev@spark.apache.org
Subject: Re: Enum parameter in ML

Since PipelineStages are serializable, the params must also be serializable. We also have to keep the Java API in mind. Introducing a new enum Param type may work, but we will have to ensure that Java users can use it without dealing with ClassTags (I believe Scala will create new types for each possible value in the Enum) and that it can be serialized.
Re: Enum parameter in ML
Since PipelineStages are serializable, the params must also be serializable. We also have to keep the Java API in mind. Introducing a new enum Param type may work, but we will have to ensure that Java users can use it without dealing with ClassTags (I believe Scala will create new types for each possible value in the Enum) and that it can be serialized.

On Mon, Sep 14, 2015 at 4:31 PM, Ulanov, Alexander wrote:
> Dear Spark developers,
>
> I am currently implementing an Estimator in ML that has a parameter that
> can take several different values that are mutually exclusive. The most
> appropriate type seems to be Scala Enum (
> http://www.scala-lang.org/api/current/index.html#scala.Enumeration).
> However, the current ML API has the following parameter types:
>
> BooleanParam, DoubleArrayParam, DoubleParam, FloatParam, IntArrayParam,
> IntParam, LongParam, StringArrayParam
>
> Should I introduce a new parameter type in ML API that is based on Scala
> Enum?
>
> Best regards, Alexander
Enum parameter in ML
Dear Spark developers,

I am currently implementing an Estimator in ML that has a parameter that can take several different values that are mutually exclusive. The most appropriate type seems to be Scala Enum (http://www.scala-lang.org/api/current/index.html#scala.Enumeration). However, the current ML API has the following parameter types:

BooleanParam, DoubleArrayParam, DoubleParam, FloatParam, IntArrayParam, IntParam, LongParam, StringArrayParam

Should I introduce a new parameter type in ML API that is based on Scala Enum?

Best regards, Alexander
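For readers unfamiliar with scala.Enumeration, a minimal sketch of the kind of mutually exclusive option described above might look like this. The `Solver` enumeration and its values are hypothetical; only the standard-library Enumeration API is used.

```scala
// Hypothetical enumeration of mutually exclusive options.
object Solver extends Enumeration {
  val LBFGS, SGD = Value
}

object EnumerationDemo {
  def main(args: Array[String]): Unit = {
    // All allowed values, in declaration order.
    println(Solver.values.toList.map(_.toString))
    // Lookup by name; an unknown name throws NoSuchElementException.
    val chosen: Solver.Value = Solver.withName("SGD")
    println(chosen == Solver.SGD)
  }
}
```

An enumeration like this gives the set of allowed values and name lookup for free. Whether it can be exposed as a param type without friction for Java users is the open question in this message.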