[ https://issues.apache.org/jira/browse/SPARK-26172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-26172.
-------------------------------
    Resolution: Won't Fix

> Unify String Params' case-insensitivity in ML
> ---------------------------------------------
>
>                 Key: SPARK-26172
>                 URL: https://issues.apache.org/jira/browse/SPARK-26172
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 3.0.0
>            Reporter: zhengruifeng
>            Priority: Major
>
> For now, there are three ways to deal with case-insensitivity in ML:
> 1, support case-insensitivity, e.g. {{LogisticRegression}};
> 2, support case-insensitivity, but with the getter returning the lower-case
> value (not the value passed to the setter), e.g. {{ALS}},
> {{DecisionTreeClassifier}};
> 3, do not support case-insensitivity, e.g. {{NaiveBayes}}.
>
> This inconsistency confuses users.
> I think we should choose the *first* way and support case-insensitivity for
> all non-columnName string params, including:
> * LogisticRegression: family
> * MultilayerPerceptronClassifier: solver
> * NaiveBayes: modelType
> * DecisionTreeClassifier: impurity
> * RandomForestClassifier: featureSubsetStrategy, impurity
> * GBTClassifier: featureSubsetStrategy, impurity, lossType
>
> * LinearRegression: solver, loss
> * GeneralizedLinearRegression: family, link, solver
> * DecisionTreeRegressor: impurity
> * RandomForestRegressor: featureSubsetStrategy, impurity
> * GBTRegressor: featureSubsetStrategy, impurity, lossType
>
> * KMeans: initMode
> * LDA: optimizer
> * PowerIterationClustering: initMode
>
> * ALS: coldStartStrategy, intermediateStorageLevel, finalStorageLevel
>
> * Bucketizer: handleInvalid
> * ChiSqSelector: selectorType
> * Imputer: strategy
> * QuantileDiscretizer: handleInvalid
> * RFormula: handleInvalid, stringIndexerOrderType
> * StringIndexer: handleInvalid, stringOrderType
> * VectorAssembler: handleInvalid
> * VectorIndexer: handleInvalid
> * VectorSizeHint: handleInvalid
> * OneHotEncoderEstimator: handleInvalid (*this will be left alone until the
> breaking change*)
>
> * BinaryClassificationEvaluator: metricName
> * MulticlassClassificationEvaluator: metricName
> * RegressionEvaluator: metricName
> * ClusteringEvaluator: metricName, distanceMeasure
>
> To do this:
> * methods {{lowerCaseInArray}} and {{upperCaseInArray}} are created in
> {{ParamValidators}} to check values case-insensitively;
> * methods {{$$(param: Param[String])}} and {{%%(param: Param[String])}} are
> created in trait {{Params}} to conveniently lower-case/upper-case the param
> value. This minimizes the modifications to existing code, since in many
> cases we only need to change {{$(param)}} to {{$$(param)}};
> * in *SharedParamsCodeGen*, *handleInvalid* and *{{distanceMeasure}}* are
> updated to use {{lowerCaseInArray}}.
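
To illustrate the approach proposed in the description, here is a minimal, Spark-free sketch in Scala. The ticket was resolved as Won't Fix, so {{lowerCaseInArray}}, {{upperCaseInArray}} and {{$$}} should not be assumed to exist in Spark. Everything below (CaseInsensitiveValidators, ToyEstimator, normalizedFamily) is a hypothetical stand-in written for this example, and the actual patch may have looked different. It shows the "first way" from the description: the setter accepts any casing, the getter returns the value exactly as passed, and internal code reads a lower-cased view.

```scala
import java.util.Locale

// Hypothetical stand-ins for the validators proposed for ParamValidators.
// Each returns a predicate that normalizes the input's case before checking
// membership in the allowed values.
object CaseInsensitiveValidators {
  // `allowed` is assumed to contain lower-case values only.
  def lowerCaseInArray(allowed: Array[String]): String => Boolean =
    (value: String) => allowed.contains(value.toLowerCase(Locale.ROOT))

  // `allowed` is assumed to contain upper-case values only.
  def upperCaseInArray(allowed: Array[String]): String => Boolean =
    (value: String) => allowed.contains(value.toUpperCase(Locale.ROOT))
}

// Toy estimator with a single string param; this is NOT Spark's Params trait.
final class ToyEstimator {
  private val isValidFamily: String => Boolean =
    CaseInsensitiveValidators.lowerCaseInArray(
      Array("auto", "binomial", "multinomial"))

  private var family: String = "auto"

  // The setter validates case-insensitively but stores the value unchanged.
  def setFamily(value: String): this.type = {
    require(isValidFamily(value), s"Unsupported family: $value")
    family = value
    this
  }

  // The getter returns exactly what the caller passed to the setter.
  def getFamily: String = family

  // Rough analogue of the proposed $$(param): a lower-cased view of the
  // value, meant for internal pattern matching.
  def normalizedFamily: String = family.toLowerCase(Locale.ROOT)
}
```

With this sketch, `new ToyEstimator().setFamily("Binomial")` is accepted, `getFamily` returns "Binomial", `normalizedFamily` returns "binomial", and `setFamily("poisson")` fails the `require` check. Using `Locale.ROOT` keeps the case conversion independent of the JVM's default locale.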