[ 
https://issues.apache.org/jira/browse/SPARK-26172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-26172.
-------------------------------
    Resolution: Won't Fix

> Unify String Params' case-insensitivity in ML
> ---------------------------------------------
>
>                 Key: SPARK-26172
>                 URL: https://issues.apache.org/jira/browse/SPARK-26172
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 3.0.0
>            Reporter: zhengruifeng
>            Priority: Major
>
> Currently, string params in ML handle case in three different ways:
> 1. support case-insensitivity, e.g. {{LogisticRegression}};
> 2. support case-insensitivity, but with the getter returning the lower-case value 
> (not the value passed to the setter), e.g. {{ALS}}, {{DecisionTreeClassifier}};
> 3. do not support case-insensitivity, e.g. {{NaiveBayes}}.
>  
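To make the three behaviours concrete, here is a minimal sketch in plain Scala. It does not use Spark's actual Param API: StringParam, CaseBehaviours and the factory names are hypothetical and only model how the setter and getter treat case in each of the three styles.

```scala
import java.util.Locale

// Hypothetical stand-in for a string-valued param; not Spark's Param class.
final class StringParam(
    allowed: Set[String],      // allowed values, given in lower case
    caseInsensitive: Boolean,  // validate against the lower-cased input?
    normalizeOnSet: Boolean) { // store the lower-cased input instead of the raw one?

  private var value: Option[String] = None

  def set(v: String): this.type = {
    val key = if (caseInsensitive) v.toLowerCase(Locale.ROOT) else v
    require(allowed.contains(key), s"unsupported value: $v")
    value = Some(if (normalizeOnSet) key else v)
    this
  }

  def get: String = value.getOrElse(throw new NoSuchElementException("param not set"))
}

object CaseBehaviours {
  // Way 1: case-insensitive, getter returns the value exactly as passed.
  def preserving(allowed: Set[String]): StringParam =
    new StringParam(allowed, caseInsensitive = true, normalizeOnSet = false)

  // Way 2: case-insensitive, getter returns the lower-cased value.
  def lowering(allowed: Set[String]): StringParam =
    new StringParam(allowed, caseInsensitive = true, normalizeOnSet = true)

  // Way 3: case-sensitive, mixed-case input is rejected.
  def strict(allowed: Set[String]): StringParam =
    new StringParam(allowed, caseInsensitive = false, normalizeOnSet = false)
}
```

With allowed values Set("gini", "entropy"), setting "Gini" round-trips unchanged in the first style, comes back as "gini" in the second, and fails the require check in the third.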
> This inconsistency makes usage confusing. 
> I think we should choose the *first* way and support case-insensitivity for all 
> non-columnName string params, including:
>  * LogisticRegression: family
>  * MultilayerPerceptronClassifier: solver
>  * NaiveBayes: modelType
>  * DecisionTreeClassifier: impurity
>  * RandomForestClassifier: featureSubsetStrategy, impurity
>  * GBTClassifier: featureSubsetStrategy, impurity, lossType
>
>  * LinearRegression: solver, loss
>  * GeneralizedLinearRegression: family, link, solver
>  * DecisionTreeRegressor: impurity
>  * RandomForestRegressor: featureSubsetStrategy, impurity
>  * GBTRegressor: featureSubsetStrategy, impurity, lossType
>
>  * KMeans: initMode
>  * LDA: optimizer
>  * PowerIterationClustering: initMode
>
>  * ALS: coldStartStrategy, intermediateStorageLevel, finalStorageLevel
>
>  * Bucketizer: handleInvalid
>  * ChiSqSelector: selectorType
>  * Imputer: strategy
>  * QuantileDiscretizer: handleInvalid
>  * RFormula: handleInvalid, stringIndexerOrderType
>  * StringIndexer: handleInvalid, stringOrderType
>  * VectorAssembler: handleInvalid
>  * VectorIndexer: handleInvalid
>  * VectorSizeHint: handleInvalid
>  * OneHotEncoderEstimator: handleInvalid (*this will be left alone until the 
> breaking change*)
>
>  * BinaryClassificationEvaluator: metricName
>  * MulticlassClassificationEvaluator: metricName
>  * RegressionEvaluator: metricName
>  * ClusteringEvaluator: metricName, distanceMeasure
>  
>  
>  
> To do this:
>  * methods {{lowerCaseInArray}} and {{upperCaseInArray}} are created in 
> {{ParamValidators}} to validate values case-insensitively;
>  * methods {{$$(param: Param[String])}} and {{%%(param: Param[String])}} 
> are created in trait {{Params}} to conveniently lower-case/upper-case the param 
> value; this minimizes the modifications to existing code, since in many 
> cases we only need to change {{$(param)}} to {{$$(param)}};
>  * in *SharedParamsCodeGen*, *handleInvalid* and *distanceMeasure* are 
> updated to use {{lowerCaseInArray}}.
>  
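The proposed helpers can be sketched roughly as follows in plain Scala, without a Spark dependency. This is an assumption-laden illustration and not the code from the issue: the signatures of lowerCaseInArray and upperCaseInArray are guessed, and ParamStore is a hypothetical stand-in for trait Params that keys values by name instead of by Param[String].

```scala
import java.util.Locale

object CaseInsensitiveParams {
  // Assumed shape of the proposed validator: accept a value when its
  // lower-cased form matches one of the (lower-cased) allowed values.
  def lowerCaseInArray(allowed: Array[String]): String => Boolean = {
    val set = allowed.map(_.toLowerCase(Locale.ROOT)).toSet
    (value: String) => set.contains(value.toLowerCase(Locale.ROOT))
  }

  // Same idea, comparing upper-cased forms.
  def upperCaseInArray(allowed: Array[String]): String => Boolean = {
    val set = allowed.map(_.toUpperCase(Locale.ROOT)).toSet
    (value: String) => set.contains(value.toUpperCase(Locale.ROOT))
  }
}

// Hypothetical stand-in for trait Params; values are keyed by param name.
final class ParamStore {
  private val values = scala.collection.mutable.Map.empty[String, String]

  def set(name: String, value: String, isValid: String => Boolean): this.type = {
    require(isValid(value), s"invalid value for $name: $value")
    values(name) = value // keep the value exactly as the caller passed it
    this
  }

  // Analogue of $(param): the value as passed to the setter.
  def $(name: String): String = values(name)

  // Analogue of the proposed $$(param): lower-cased view for internal matching.
  def $$(name: String): String = values(name).toLowerCase(Locale.ROOT)

  // Analogue of the proposed %%(param): upper-cased view for internal matching.
  def %%(name: String): String = values(name).toUpperCase(Locale.ROOT)
}
```

Under this sketch the getter keeps returning the caller's original spelling (the first way in the description), while internal code that pattern-matches on the value reads it through $$ or %% instead of $.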



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

