[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/17995 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r125606191 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -229,6 +238,16 @@ object ParamValidators { def arrayLengthGt[T](lowerBound: Double): Array[T] => Boolean = { (value: Array[T]) => value.length > lowerBound } + + /** Check for value in an allowed set of string values. */ --- End diff -- ```Check for string value in an allowed set of string values in a case-insensitive way.``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r125564145 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala --- @@ -187,12 +188,12 @@ class NaiveBayes @Since("1.5.0") ( aggregated.foreach { case (label, (n, sumTermFreqs)) => labelArray(i) = label piArray(i) = math.log(n + lambda) - piLogDenom - val thetaLogDenom = $(modelType) match { + val thetaLogDenom = getModelType.toLowerCase(Locale.ROOT) match { case Multinomial => math.log(sumTermFreqs.values.sum + numFeatures * lambda) case Bernoulli => math.log(n + 2.0 * lambda) case _ => // This should never happen. - throw new UnknownError(s"Invalid modelType: ${$(modelType)}.") + throw new UnknownError(s"Invalid modelType: ${getModelType}.") --- End diff -- ```${getModelType}``` -> ```$getModelType``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r125606256 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -136,6 +137,14 @@ private[ml] object Param { s"${this.getClass.getName} must override jsonDecode to support its value type.") } } + + /** Looks for the corresponding case-insensitive string option. */ + def findStringOption(supportedValues: Array[String], value: String): String = { --- End diff -- I think this function is useless and involves extra computing cost. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r125642693 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -93,8 +93,8 @@ private[classification] trait LogisticRegressionParams extends ProbabilisticClas @Since("2.1.0") final val family: Param[String] = new Param(this, "family", "The name of family which is a description of the label distribution to be used in the " + - s"model. Supported options: ${supportedFamilyNames.mkString(", ")}.", -(value: String) => supportedFamilyNames.contains(value.toLowerCase(Locale.ROOT))) + s"model. Supported options: ${supportedFamilyNames.mkString("(", ",", ")")}.", --- End diff -- Please keep the original output format for supported family names. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r125647745 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -199,7 +199,7 @@ private[regression] trait GeneralizedLinearRegressionBase extends PredictorParam require(supportedFamilyAndLinkPairs.contains( Family.fromParams(this) -> Link.fromParams(this)), s"Generalized Linear Regression with ${$(family)} family " + -s"does not support ${$(link)} link function.") +s"does not support ${getLink.toLowerCase(Locale.ROOT)} link function.") --- End diff -- Please revert this change. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r125648566 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -460,7 +463,7 @@ object LinearRegression extends DefaultParamsReadable[LinearRegression] { val MAX_FEATURES_FOR_NORMAL_SOLVER: Int = WeightedLeastSquares.MAX_NUM_FEATURES /** String name for "auto". */ - private[regression] val AUTO = "auto" + private[regression] val Auto = "auto" /** String name for "normal". */ private[regression] val NORMAL = "normal" --- End diff -- ```NORMAL``` -> ```Normal``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r125605206 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala --- @@ -83,19 +85,16 @@ class BinaryClassificationEvaluator @Since("1.4.0") (@Since("1.4.0") override va case Row(rawPrediction: Double, label: Double) => (rawPrediction, label) } val metrics = new BinaryClassificationMetrics(scoreAndLabels) -val metric = $(metricName) match { - case "areaUnderROC" => metrics.areaUnderROC() - case "areaUnderPR" => metrics.areaUnderPR() +val metric = Param.findStringOption(supportedMetricNames, getMetricName) match { + case AreaUnderROC => metrics.areaUnderROC() + case AreaUnderPR => metrics.areaUnderPR() } metrics.unpersist() metric } @Since("1.5.0") - override def isLargerBetter: Boolean = $(metricName) match { -case "areaUnderROC" => true -case "areaUnderPR" => true - } + override def isLargerBetter: Boolean = true --- End diff -- It's better to keep the original style, since we may add new metric which is not comply with ```isLargerBetter```. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r125605561 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala --- @@ -106,4 +105,13 @@ object BinaryClassificationEvaluator extends DefaultParamsReadable[BinaryClassif @Since("1.6.0") override def load(path: String): BinaryClassificationEvaluator = super.load(path) + + /** String name for `areaUnderROC` metric name. */ + private[spark] val AreaUnderROC: String = "areaUnderROC" --- End diff -- Can we make it ```private[BinaryClassificationEvaluator]```? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r123931580 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala --- @@ -81,7 +83,8 @@ private[classification] trait MultilayerPerceptronParams extends PredictorParams final val solver: Param[String] = new Param[String](this, "solver", "The solver algorithm for optimization. Supported options: " + s"${MultilayerPerceptronClassifier.supportedSolvers.mkString(", ")}. (Default l-bfgs)", - ParamValidators.inArray[String](MultilayerPerceptronClassifier.supportedSolvers)) +(value: String) => MultilayerPerceptronClassifier.supportedSolvers + .contains(value.toLowerCase(Locale.ROOT))) --- End diff -- I think it a good idea. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r123921193 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala --- @@ -81,7 +83,8 @@ private[classification] trait MultilayerPerceptronParams extends PredictorParams final val solver: Param[String] = new Param[String](this, "solver", "The solver algorithm for optimization. Supported options: " + s"${MultilayerPerceptronClassifier.supportedSolvers.mkString(", ")}. (Default l-bfgs)", - ParamValidators.inArray[String](MultilayerPerceptronClassifier.supportedSolvers)) +(value: String) => MultilayerPerceptronClassifier.supportedSolvers + .contains(value.toLowerCase(Locale.ROOT))) --- End diff -- What do you think of adding a new function in ```object ParamValidators``` as ``` def inStringArray(allowed: Array[String]): String => Boolean = { (value: String) => allowed.contains(value.toLowerCase(java.util.Locale.ROOT)) } ``` to facilitate similar check here and other place. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r123920923 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -313,7 +313,11 @@ class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val * @group setParam */ @Since("2.0.0") - def setSolver(value: String): this.type = set(solver, value) + def setSolver(value: String): this.type = { +require("irls" == value.toLowerCase(Locale.ROOT), + s"Solver $value was not supported. Supported options: irls") +set(solver, value) + } --- End diff -- Actually we can't do this, since MLlib supports set params via other entrances. Currently we can leave as it is, until we resolved #16028. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r123919667 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -128,7 +130,8 @@ private[feature] trait ChiSqSelectorParams extends Params final val selectorType = new Param[String](this, "selectorType", "The selector type of the ChisqSelector. " + "Supported options: " + OldChiSqSelector.supportedSelectorTypes.mkString(", "), - ParamValidators.inArray[String](OldChiSqSelector.supportedSelectorTypes)) +(value: String) => OldChiSqSelector.supportedSelectorTypes.map(_.toLowerCase(Locale.ROOT)) --- End diff -- Supported selector types should always be stored as lower case, please update corresponding code snippet in ```mllib.feature.ChiSqSelector``` from: ``` private[spark] val NumTopFeatures: String = "numTopFeatures" .. ``` to ``` private[spark] val NumTopFeatures: String = "numTopFeatures".toLowerCase .. ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r123919494 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/RegressionEvaluator.scala --- @@ -48,10 +50,11 @@ final class RegressionEvaluator @Since("1.4.0") (@Since("1.4.0") override val ui * @group param */ @Since("1.4.0") - val metricName: Param[String] = { -val allowedParams = ParamValidators.inArray(Array("mse", "rmse", "r2", "mae")) -new Param(this, "metricName", "metric name in evaluation (mse|rmse|r2|mae)", allowedParams) - } + val metricName: Param[String] = new Param[String](this, "metricName", "metric name in" + +" evaluation (mse|rmse|r2|mae)", +(value: String) => Array("mse", "rmse", "r2", "mae") --- End diff -- Ditto. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r123888757 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala --- @@ -46,11 +48,10 @@ class BinaryClassificationEvaluator @Since("1.4.0") (@Since("1.4.0") override va * @group param */ @Since("1.2.0") - val metricName: Param[String] = { -val allowedParams = ParamValidators.inArray(Array("areaUnderROC", "areaUnderPR")) -new Param( - this, "metricName", "metric name in evaluation (areaUnderROC|areaUnderPR)", allowedParams) - } + val metricName: Param[String] = new Param[String](this, "metricName", "metric name in" + +" evaluation (areaUnderROC|areaUnderPR)", +(value: String) => Array("areaunderroc", "areaunderpr").contains( + value.toLowerCase(Locale.ROOT))) --- End diff -- Could we organize as ``` val AreaUnderROC: String = "areaUnderROC".toLowerCase val AreaUnderPR: String = "areaUnderPR".toLowerCase val supportedMetricNames = Set(AreaUnderROC, AreaUnderPR) ``` in ```object BinaryClassificationEvaluator```? This should be more clear. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r123920437 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala --- @@ -45,7 +47,7 @@ private[feature] trait ImputerParams extends Params with HasInputCols { final val strategy: Param[String] = new Param(this, "strategy", s"strategy for imputation. " + s"If ${Imputer.mean}, then replace missing values using the mean value of the feature. " + s"If ${Imputer.median}, then replace missing values using the median value of the feature.", -ParamValidators.inArray[String](Array(Imputer.mean, Imputer.median))) +(value: String) => Array(Imputer.mean, Imputer.median).contains(value.toLowerCase(Locale.ROOT))) --- End diff -- Ditto. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r123919458 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala --- @@ -44,12 +46,10 @@ class MulticlassClassificationEvaluator @Since("1.5.0") (@Since("1.5.0") overrid * @group param */ @Since("1.5.0") - val metricName: Param[String] = { -val allowedParams = ParamValidators.inArray(Array("f1", "weightedPrecision", - "weightedRecall", "accuracy")) -new Param(this, "metricName", "metric name in evaluation " + - "(f1|weightedPrecision|weightedRecall|accuracy)", allowedParams) - } + val metricName: Param[String] = new Param[String](this, "metricName", "metric name in" + +" evaluation (f1|weightedPrecision|weightedRecall|accuracy)", +(value: String) => Array("f1", "weightedprecision", "weightedrecall", "accuracy") + .contains(value.toLowerCase(Locale.ROOT))) --- End diff -- Ditto. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r123920352 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala --- @@ -70,6 +71,10 @@ class BinaryClassificationEvaluator @Since("1.4.0") (@Since("1.4.0") override va setDefault(metricName -> "areaUnderROC") + private def getFormattedMetricName = --- End diff -- Is this really necessary? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/17995 [SPARK-20762][ML]Make String Params Case-Insensitive ## What changes were proposed in this pull request? Make String Params Case-Insensitive ## How was this patch tested? existing tests and added tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengruifeng/spark str_get_set Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17995.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17995 commit cecc96826a393a6729488bbb9ca4a902a8422279 Author: Zheng RuiFengDate: 2017-05-15T09:28:37Z create pr commit 3554601b25a7a37ef73d66fcfd8eb2d9b85ec802 Author: Zheng RuiFeng Date: 2017-05-15T10:17:41Z fix glr commit 2c184f0843b14f3edf76138f725c121389a688e6 Author: Zheng RuiFeng Date: 2017-05-15T11:19:06Z update tests commit ce93b9ef7a492be669c53d1c9cd44a4d4a86df60 Author: Zheng RuiFeng Date: 2017-05-16T05:32:04Z create pr x commit 982220130494cf05081bd9c285e1ce582e4c7028 Author: Zheng RuiFeng Date: 2017-05-16T05:59:54Z fix some nits commit abac9042287c25bdaf6b9810b1d94d318c7acfd9 Author: Zheng RuiFeng Date: 2017-05-16T06:02:39Z fix some nits --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org