Hi Joseph,

Strings sound reasonable. However, there is no StringParam (only 
StringArrayParam). Should I create a new param type? Also, how can the user get 
all possible values of a String parameter?

Best regards, Alexander

From: Joseph Bradley [mailto:jos...@databricks.com]
Sent: Wednesday, September 16, 2015 5:35 PM
To: Feynman Liang
Cc: Ulanov, Alexander; dev@spark.apache.org
Subject: Re: Enum parameter in ML

I've tended to use Strings.  Params can be created with a validator (isValid) 
which can ensure users get an immediate error if they try to pass an 
unsupported String.  Not as nice as compile-time errors, but easier on the APIs.

On Mon, Sep 14, 2015 at 6:07 PM, Feynman Liang 
<fli...@databricks.com> wrote:
We usually write a Java test suite which exercises the public API (e.g. 
DCT<https://github.com/apache/spark/blob/master/mllib/src/test/java/org/apache/spark/ml/feature/JavaDCTSuite.java#L71>).

It may be possible to create a sealed trait with singleton concrete instances 
inside of a serializable companion object, then just introduce a 
Param[SealedTrait] to the model (e.g. StreamingDecay 
PR<https://github.com/apache/spark/pull/8022/files#diff-cea0bec4853b1b2748ec006682218894R99>).
 However, this would require Java users to use 
CompanionObject$.ConcreteInstanceName to access enum values, which isn't the 
prettiest syntax.
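A stand-alone sketch of that sealed-trait pattern (DecayType and its instances are made-up names for illustration, not the actual StreamingDecay code):

```scala
// Illustrative sketch of the sealed-trait-in-companion-object pattern;
// DecayType and its instances are hypothetical names, not Spark code.
object SealedTraitSketch {
  sealed trait DecayType extends Serializable
  object DecayType extends Serializable {
    case object Exponential extends DecayType
    case object Polynomial extends DecayType
  }

  def main(args: Array[String]): Unit = {
    // Scala callers get the natural syntax:
    val d: DecayType = DecayType.Exponential
    println(d)
    // Java callers, however, must go through something like
    // SealedTraitSketch$DecayType$.MODULE$.Exponential() -- the awkward
    // CompanionObject$.ConcreteInstanceName access mentioned above.
  }
}
```

Sealing the trait means the compiler can check match exhaustiveness on the Scala side, and the case objects serialize cleanly; the cost is the MODULE$ syntax for Java users.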

Another option would just be to use Strings, which, although not type safe, 
does simplify the implementation.

On Mon, Sep 14, 2015 at 5:43 PM, Ulanov, Alexander 
<alexander.ula...@hpe.com> wrote:
Hi Feynman,

Thank you for suggestion. How can I ensure that there will be no problems for 
Java users? (I only use Scala API)

Best regards, Alexander

From: Feynman Liang [mailto:fli...@databricks.com]
Sent: Monday, September 14, 2015 5:27 PM
To: Ulanov, Alexander
Cc: dev@spark.apache.org
Subject: Re: Enum parameter in ML

Since PipelineStages are serializable, the params must also be serializable. We 
also have to keep the Java API in mind. Introducing a new enum Param type may 
work, but we will have to ensure that Java users can use it without dealing 
with ClassTags (I believe Scala will create new types for each possible value 
in the Enum) and that it can be serialized.

On Mon, Sep 14, 2015 at 4:31 PM, Ulanov, Alexander 
<alexander.ula...@hpe.com> wrote:
Dear Spark developers,

I am currently implementing an Estimator in ML that has a parameter which can 
take one of several mutually exclusive values. The most appropriate type seems 
to be a Scala Enumeration 
(http://www.scala-lang.org/api/current/index.html#scala.Enumeration). However, 
the current ML API has the following parameter types:
BooleanParam, DoubleArrayParam, DoubleParam, FloatParam, IntArrayParam, 
IntParam, LongParam, StringArrayParam

Should I introduce a new parameter type in ML API that is based on Scala Enum?

Best regards, Alexander
