[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20633 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r179073826

Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala

```diff
@@ -351,27 +359,90 @@ private[ml] object DefaultParamsReader {
       timestamp: Long,
       sparkVersion: String,
       params: JValue,
+      defaultParams: JValue,
       metadata: JValue,
       metadataJson: String) {
+
+    private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
+      params match {
+        case JObject(pairs) => pairs
+        case _ =>
+          throw new IllegalArgumentException(
+            s"Cannot recognize JSON metadata: $metadataJson.")
+      }
+    }
+
     /**
      * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
      * This can be useful for getting a Param value before an instance of `Params`
-     * is available.
+     * is available. This will look up `params` first, if not existing then looking up
+     * `defaultParams`.
      */
     def getParamValue(paramName: String): JValue = {
       implicit val format = DefaultFormats
-      params match {
+
+      // Looking up for `params` first.
+      var pairs = getValueFromParams(params)
+      var foundPairs = pairs.filter { case (pName, jsonValue) =>
+        pName == paramName
+      }
+      if (foundPairs.length == 0) {
+        // Looking up for `defaultParams` then.
+        pairs = getValueFromParams(defaultParams)
+        foundPairs = pairs.filter { case (pName, jsonValue) =>
+          pName == paramName
+        }
+      }
+      assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
+        s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
+
+      foundPairs.map(_._2).head
+    }
+
+    /**
+     * Extract Params from metadata, and set them in the instance.
+     * This works if all Params (except params included by `skipParams` list) implement
+     * [[org.apache.spark.ml.param.Param.jsonDecode()]].
+     *
+     * @param skipParams The params included in `skipParams` won't be set. This is useful if some
+     *                   params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
+     *                   and need special handling.
+     */
+    def getAndSetParams(
+        instance: Params,
+        skipParams: Option[List[String]] = None): Unit = {
+      setParams(instance, skipParams, isDefault = false)
+
+      // For metadata file prior to Spark 2.4, there is no default section.
+      val (major, minor) = VersionUtils.majorMinorVersion(sparkVersion)
+      if (major >= 2 && minor >= 4) {
```

Good catch!
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r179073826

Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala

```diff
@@ -905,6 +905,15 @@ trait Params extends Identifiable with Serializable {
   }
 }
 
+object Params {
```

Yes. Done.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r178992553

Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala

```diff
@@ -351,27 +359,90 @@ private[ml] object DefaultParamsReader {
       timestamp: Long,
       sparkVersion: String,
       params: JValue,
+      defaultParams: JValue,
       metadata: JValue,
       metadataJson: String) {
+
+    private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
+      params match {
+        case JObject(pairs) => pairs
+        case _ =>
+          throw new IllegalArgumentException(
+            s"Cannot recognize JSON metadata: $metadataJson.")
+      }
+    }
+
     /**
      * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
      * This can be useful for getting a Param value before an instance of `Params`
-     * is available.
+     * is available. This will look up `params` first, if not existing then looking up
+     * `defaultParams`.
      */
     def getParamValue(paramName: String): JValue = {
       implicit val format = DefaultFormats
-      params match {
+
+      // Looking up for `params` first.
+      var pairs = getValueFromParams(params)
+      var foundPairs = pairs.filter { case (pName, jsonValue) =>
+        pName == paramName
+      }
+      if (foundPairs.length == 0) {
+        // Looking up for `defaultParams` then.
+        pairs = getValueFromParams(defaultParams)
+        foundPairs = pairs.filter { case (pName, jsonValue) =>
+          pName == paramName
+        }
+      }
+      assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
+        s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
+
+      foundPairs.map(_._2).head
+    }
+
+    /**
+     * Extract Params from metadata, and set them in the instance.
+     * This works if all Params (except params included by `skipParams` list) implement
+     * [[org.apache.spark.ml.param.Param.jsonDecode()]].
+     *
+     * @param skipParams The params included in `skipParams` won't be set. This is useful if some
+     *                   params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
+     *                   and need special handling.
+     */
+    def getAndSetParams(
+        instance: Params,
+        skipParams: Option[List[String]] = None): Unit = {
+      setParams(instance, skipParams, isDefault = false)
+
+      // For metadata file prior to Spark 2.4, there is no default section.
+      val (major, minor) = VersionUtils.majorMinorVersion(sparkVersion)
+      if (major >= 2 && minor >= 4) {
```

Ah, you're correct. I'll fix it.
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r178956696

Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala

```diff
@@ -351,27 +359,90 @@ private[ml] object DefaultParamsReader {
       timestamp: Long,
       sparkVersion: String,
       params: JValue,
+      defaultParams: JValue,
       metadata: JValue,
       metadataJson: String) {
+
+    private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
+      params match {
+        case JObject(pairs) => pairs
+        case _ =>
+          throw new IllegalArgumentException(
+            s"Cannot recognize JSON metadata: $metadataJson.")
+      }
+    }
+
     /**
      * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
      * This can be useful for getting a Param value before an instance of `Params`
-     * is available.
+     * is available. This will look up `params` first, if not existing then looking up
+     * `defaultParams`.
      */
     def getParamValue(paramName: String): JValue = {
       implicit val format = DefaultFormats
-      params match {
+
+      // Looking up for `params` first.
+      var pairs = getValueFromParams(params)
+      var foundPairs = pairs.filter { case (pName, jsonValue) =>
+        pName == paramName
+      }
+      if (foundPairs.length == 0) {
+        // Looking up for `defaultParams` then.
+        pairs = getValueFromParams(defaultParams)
+        foundPairs = pairs.filter { case (pName, jsonValue) =>
+          pName == paramName
+        }
+      }
+      assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
+        s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
+
+      foundPairs.map(_._2).head
+    }
+
+    /**
+     * Extract Params from metadata, and set them in the instance.
+     * This works if all Params (except params included by `skipParams` list) implement
+     * [[org.apache.spark.ml.param.Param.jsonDecode()]].
+     *
+     * @param skipParams The params included in `skipParams` won't be set. This is useful if some
+     *                   params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
+     *                   and need special handling.
+     */
+    def getAndSetParams(
+        instance: Params,
+        skipParams: Option[List[String]] = None): Unit = {
+      setParams(instance, skipParams, isDefault = false)
+
+      // For metadata file prior to Spark 2.4, there is no default section.
+      val (major, minor) = VersionUtils.majorMinorVersion(sparkVersion)
+      if (major >= 2 && minor >= 4) {
```

This should be:

```scala
if (major > 2 || (major == 2 && minor >= 4)) {
```
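The distinction matters once the major version moves past 2. A standalone sketch (not the actual Spark code; `VersionCheckSketch` and its method names are invented for illustration) contrasts the two predicates and shows that the original check would misclassify Spark 3.0 metadata as pre-2.4, because its minor version is 0:

```scala
// Standalone sketch, not Spark code: compare the buggy and corrected checks
// for "was this metadata written by Spark 2.4 or later?".
object VersionCheckSketch {
  // Corrected form suggested in the review: any major > 2, or 2.4+.
  def hasDefaultSection(major: Int, minor: Int): Boolean =
    major > 2 || (major == 2 && minor >= 4)

  // Original form: wrongly requires minor >= 4 even when major is 3+.
  def buggyCheck(major: Int, minor: Int): Boolean =
    major >= 2 && minor >= 4
}
```

With the buggy form, a hypothetical Spark 3.0 metadata file (major = 3, minor = 0) would be treated as if it lacked the default-params section.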
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r178955058

Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala

```diff
@@ -905,6 +905,15 @@ trait Params extends Identifiable with Serializable {
   }
 }
 
+object Params {
```

Shall we make this object package private (for cleaner docs)?
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r178955105

Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala

```diff
@@ -791,7 +791,7 @@ trait Params extends Identifiable with Serializable {
    * this method gets called.
    * @param value the default value
    */
-  protected final def setDefault[T](param: Param[T], value: T): this.type = {
+  private[ml] final def setDefault[T](param: Param[T], value: T): this.type = {
```

Makes sense; I like your solution.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r176310898

Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala

```diff
@@ -791,7 +791,7 @@ trait Params extends Identifiable with Serializable {
    * this method gets called.
    * @param value the default value
    */
-  protected final def setDefault[T](param: Param[T], value: T): this.type = {
+  private[ml] final def setDefault[T](param: Param[T], value: T): this.type = {
```

Because we need to call `setDefault` in `Metadata.setParams`, I can't leave it as `protected`. But I agree it is important to keep the extensibility, and I don't think we can make it `public` either. I added an object `Params` and use it to help us update the default params of a `Params` instance.
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r175965924

Diff: mllib/src/test/scala/org/apache/spark/ml/util/DefaultReadWriteTest.scala

```diff
@@ -169,4 +179,54 @@ class DefaultReadWriteSuite extends SparkFunSuite with MLlibTestSparkContext
     val myParams = new MyParams("my_params")
     testDefaultReadWrite(myParams)
   }
+
+  test("default param shouldn't become user-supplied param after persistence") {
+    val myParams = new MyParams("my_params")
+    myParams.set(myParams.shouldNotSetIfSetintParamWithDefault, 1)
+    myParams.checkExclusiveParams()
+    val loadedMyParams = testDefaultReadWrite(myParams)
+    loadedMyParams.checkExclusiveParams()
+    assert(loadedMyParams.getDefault(loadedMyParams.intParamWithDefault) ==
+      myParams.getDefault(myParams.intParamWithDefault))
+
+    loadedMyParams.set(myParams.intParamWithDefault, 1)
+    intercept[SparkException] {
+      loadedMyParams.checkExclusiveParams()
+    }
+  }
+
+  test("User-suppiled value for default param should be kept after persistence") {
```

suppiled -> supplied
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r175965272

Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala

```diff
@@ -351,27 +359,88 @@ private[ml] object DefaultParamsReader {
       timestamp: Long,
       sparkVersion: String,
       params: JValue,
+      defaultParams: JValue,
       metadata: JValue,
       metadataJson: String) {
+
+    private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
+      params match {
+        case JObject(pairs) => pairs
+        case _ =>
+          throw new IllegalArgumentException(
+            s"Cannot recognize JSON metadata: $metadataJson.")
+      }
+    }
+
     /**
      * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
      * This can be useful for getting a Param value before an instance of `Params`
-     * is available.
+     * is available. This will look up `params` first, if not existing then looking up
+     * `defaultParams`.
      */
     def getParamValue(paramName: String): JValue = {
       implicit val format = DefaultFormats
-      params match {
+
+      // Looking up for `params` first.
+      var pairs = getValueFromParams(params)
+      var foundPairs = pairs.filter { case (pName, jsonValue) =>
+        pName == paramName
+      }
+      if (foundPairs.length == 0) {
+        // Looking up for `defaultParams` then.
+        pairs = getValueFromParams(defaultParams)
+        foundPairs = pairs.filter { case (pName, jsonValue) =>
+          pName == paramName
+        }
+      }
+      assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
+        s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
+
+      foundPairs.map(_._2).head
+    }
+
+    /**
+     * Extract Params from metadata, and set them in the instance.
+     * This works if all Params (except params included by `skipParams` list) implement
+     * [[org.apache.spark.ml.param.Param.jsonDecode()]].
+     *
+     * @param skipParams The params included in `skipParams` won't be set. This is useful if some
+     *                   params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
+     *                   and need special handling.
+     */
+    def getAndSetParams(
+        instance: Params,
+        skipParams: Option[List[String]] = None): Unit = {
+      setParams(instance, false, skipParams)
+      setParams(instance, true, skipParams)
+    }
+
+    private def setParams(
+        instance: Params,
+        isDefault: Boolean,
+        skipParams: Option[List[String]]): Unit = {
+      implicit val format = DefaultFormats
+      val (major, minor) = VersionUtils.majorMinorVersion(sparkVersion)
+      val paramsToSet = if (isDefault) defaultParams else params
+      paramsToSet match {
         case JObject(pairs) =>
-          val values = pairs.filter { case (pName, jsonValue) =>
-            pName == paramName
-          }.map(_._2)
-          assert(values.length == 1, s"Expected one instance of Param '$paramName' but found" +
-            s" ${values.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
-          values.head
+          pairs.foreach { case (paramName, jsonValue) =>
+            if (skipParams == None || !skipParams.get.contains(paramName)) {
+              val param = instance.getParam(paramName)
+              val value = param.jsonDecode(compact(render(jsonValue)))
+              if (isDefault) {
+                instance.setDefault(param, value)
+              } else {
+                instance.set(param, value)
+              }
+            }
+          }
+        // For metadata file prior to Spark 2.4, there is no default section.
+        case JNothing if isDefault && (major == 2 && minor < 4 || major < 2) =>
```

This logic would be simpler if this check were put in the getAndSetParams method, which could just skip calling setParams(instance, true, skipParams) for Spark 2.3 and earlier.
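That suggested restructuring can be sketched with the reader's internals replaced by a hypothetical stand-in (`ReaderSketch` and its recording `setParams` are invented for illustration; they are not Spark APIs): the pre-2.4 check moves into `getAndSetParams`, which simply skips the default-params pass instead of pattern-matching `JNothing` inside `setParams`.

```scala
// Hypothetical stand-in for DefaultParamsReader.Metadata: setParams just
// records which passes ran, so the version-gating logic can be seen in isolation.
class ReaderSketch(sparkVersion: String) {
  val passesRun = scala.collection.mutable.Buffer.empty[String]

  private def majorMinor(v: String): (Int, Int) = {
    val parts = v.split("\\.")
    (parts(0).toInt, parts(1).toInt)
  }

  private def setParams(isDefault: Boolean): Unit =
    passesRun += (if (isDefault) "default" else "user")

  def getAndSetParams(): Unit = {
    setParams(isDefault = false)
    // Metadata written before Spark 2.4 has no default-params section,
    // so skip the default pass entirely for those files.
    val (major, minor) = majorMinor(sparkVersion)
    if (major > 2 || (major == 2 && minor >= 4)) {
      setParams(isDefault = true)
    }
  }
}
```

Reading 2.3 metadata then runs only the user-supplied pass, while 2.4+ metadata runs both.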
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r175961795

Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala

```diff
@@ -791,7 +791,7 @@ trait Params extends Identifiable with Serializable {
    * this method gets called.
    * @param value the default value
    */
-  protected final def setDefault[T](param: Param[T], value: T): this.type = {
+  private[ml] final def setDefault[T](param: Param[T], value: T): this.type = {
```

We should leave this as protected. It's important that 3rd-party libraries be able to extend MLlib APIs, and this is one they need.
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r175966231

Diff: mllib/src/test/scala/org/apache/spark/ml/util/DefaultReadWriteTest.scala

```diff
@@ -169,4 +179,54 @@ class DefaultReadWriteSuite extends SparkFunSuite with MLlibTestSparkContext
     val myParams = new MyParams("my_params")
     testDefaultReadWrite(myParams)
   }
+
+  test("default param shouldn't become user-supplied param after persistence") {
+    val myParams = new MyParams("my_params")
+    myParams.set(myParams.shouldNotSetIfSetintParamWithDefault, 1)
+    myParams.checkExclusiveParams()
+    val loadedMyParams = testDefaultReadWrite(myParams)
+    loadedMyParams.checkExclusiveParams()
+    assert(loadedMyParams.getDefault(loadedMyParams.intParamWithDefault) ==
+      myParams.getDefault(myParams.intParamWithDefault))
+
+    loadedMyParams.set(myParams.intParamWithDefault, 1)
+    intercept[SparkException] {
+      loadedMyParams.checkExclusiveParams()
+    }
+  }
+
+  test("User-suppiled value for default param should be kept after persistence") {
+    val myParams = new MyParams("my_params")
+    myParams.set(myParams.intParamWithDefault, 100)
+    val loadedMyParams = testDefaultReadWrite(myParams)
+    assert(loadedMyParams.get(myParams.intParamWithDefault).get == 100)
+  }
+
+  test("Read metadata without default field prior to 2.4") {
```

Nice! I like this setup.
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r175963464

Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala

```diff
@@ -351,27 +359,88 @@ private[ml] object DefaultParamsReader {
       timestamp: Long,
       sparkVersion: String,
       params: JValue,
+      defaultParams: JValue,
       metadata: JValue,
       metadataJson: String) {
+
+    private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
+      params match {
+        case JObject(pairs) => pairs
+        case _ =>
+          throw new IllegalArgumentException(
+            s"Cannot recognize JSON metadata: $metadataJson.")
+      }
+    }
+
     /**
      * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
      * This can be useful for getting a Param value before an instance of `Params`
-     * is available.
+     * is available. This will look up `params` first, if not existing then looking up
+     * `defaultParams`.
      */
     def getParamValue(paramName: String): JValue = {
       implicit val format = DefaultFormats
-      params match {
+
+      // Looking up for `params` first.
+      var pairs = getValueFromParams(params)
+      var foundPairs = pairs.filter { case (pName, jsonValue) =>
+        pName == paramName
+      }
+      if (foundPairs.length == 0) {
+        // Looking up for `defaultParams` then.
+        pairs = getValueFromParams(defaultParams)
+        foundPairs = pairs.filter { case (pName, jsonValue) =>
+          pName == paramName
+        }
+      }
+      assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
+        s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
+
+      foundPairs.map(_._2).head
+    }
+
+    /**
+     * Extract Params from metadata, and set them in the instance.
+     * This works if all Params (except params included by `skipParams` list) implement
+     * [[org.apache.spark.ml.param.Param.jsonDecode()]].
+     *
+     * @param skipParams The params included in `skipParams` won't be set. This is useful if some
+     *                   params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
+     *                   and need special handling.
+     */
+    def getAndSetParams(
+        instance: Params,
+        skipParams: Option[List[String]] = None): Unit = {
+      setParams(instance, false, skipParams)
```

style nit: It's nice to pass boolean args by name
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r175962188

Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala

```diff
@@ -296,14 +297,19 @@ private[ml] object DefaultParamsWriter {
       paramMap: Option[JValue] = None): String = {
     val uid = instance.uid
     val cls = instance.getClass.getName
-    val params = instance.extractParamMap().toSeq.asInstanceOf[Seq[ParamPair[Any]]]
+    val params = instance.paramMap.toSeq
+    val defaultParams = instance.defaultParamMap.toSeq
     val jsonParams = paramMap.getOrElse(render(params.map { case ParamPair(p, v) =>
       p.name -> parse(p.jsonEncode(v))
     }.toList))
+    val jsonDefaultParams = render(defaultParams.map { case ParamPair(p, v) =>
+      p.name -> parse(p.jsonEncode(v))
+    }.toList)
     val basicMetadata = ("class" -> cls) ~
       ("timestamp" -> System.currentTimeMillis()) ~
       ("sparkVersion" -> sc.version) ~
       ("uid" -> uid) ~
+      ("defaultParamMap" -> jsonDefaultParams) ~
```

nit: How about putting this below paramMap since that's nicer for viewing the JSON?
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r172420910

Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala

```diff
@@ -351,17 +359,21 @@ private[ml] object DefaultParamsReader {
       timestamp: Long,
       sparkVersion: String,
       params: JValue,
+      defaultParams: JValue,
       metadata: JValue,
       metadataJson: String) {
 
     /**
      * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
      * This can be useful for getting a Param value before an instance of `Params`
      * is available.
+     *
+     * @param isDefaultParam Whether the given param name is a default param. Default is false.
      */
-    def getParamValue(paramName: String): JValue = {
+    def getParamValue(paramName: String, isDefaultParam: Boolean = false): JValue = {
```

Sounds good. I will change this.
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r172139633

Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala

```diff
@@ -351,17 +359,21 @@ private[ml] object DefaultParamsReader {
       timestamp: Long,
       sparkVersion: String,
       params: JValue,
+      defaultParams: JValue,
       metadata: JValue,
       metadataJson: String) {
 
     /**
      * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
      * This can be useful for getting a Param value before an instance of `Params`
      * is available.
+     *
+     * @param isDefaultParam Whether the given param name is a default param. Default is false.
      */
-    def getParamValue(paramName: String): JValue = {
+    def getParamValue(paramName: String, isDefaultParam: Boolean = false): JValue = {
```

Should the logic be "look up `params` first, and if the name is not there, look up `defaultParams`"? If so, we don't need the `isDefaultParam` param.
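The lookup order proposed here can be sketched with plain Maps standing in for the JSON4s `JValue`s the real reader handles (`ParamLookupSketch` is an invented name, not Spark code): user-supplied params win, and defaults are consulted only on a miss.

```scala
// Minimal sketch of the fallback lookup: prefer the user-supplied section,
// fall back to the default section, and fail loudly if the name is in neither.
object ParamLookupSketch {
  def getParamValue(
      params: Map[String, String],
      defaultParams: Map[String, String],
      paramName: String): String = {
    params.getOrElse(paramName,
      defaultParams.getOrElse(paramName,
        throw new NoSuchElementException(s"Param '$paramName' not found")))
  }
}
```

With this shape, a caller never needs to say which section a param lives in, which is exactly why the `isDefaultParam` flag becomes unnecessary.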
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/20633

[SPARK-23455][ML] Default Params in ML should be saved separately in metadata

## What changes were proposed in this pull request?

We currently save ML's user-supplied params and default params as one entity in metadata. When loading a saved model, all loaded params are set on the created model instance as user-supplied params. This causes problems: for example, if some params are strictly disallowed from being set at the same time, a default param can fail the param check because it is treated as a user-supplied param after loading. Loaded default params should not become user-supplied params, so we should save ML default params separately in metadata. For backward compatibility, when loading a metadata file written by a previous Spark version, we shouldn't raise an error if the default param field is missing.

## How was this patch tested?

Pass existing tests and added tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 save-ml-default-params

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20633.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20633

commit 69648d67546b292037d26ef3a282bf26afd4863e
Author: Liang-Chi Hsieh
Date: 2018-02-17T02:34:11Z

    Save default params separately in JSON.
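Under this proposal, the persisted metadata would carry user-supplied and default params in separate sections, along the lines of the sketch below (the class, uid, and field values are illustrative, not taken from an actual saved model):

```json
{
  "class": "org.apache.spark.ml.feature.Binarizer",
  "timestamp": 1518834851000,
  "sparkVersion": "2.4.0",
  "uid": "binarizer_4f2a9c",
  "paramMap": {
    "threshold": 0.5
  },
  "defaultParamMap": {
    "outputCol": "binarizer_4f2a9c__output"
  }
}
```

A pre-2.4 file would look the same but without the `defaultParamMap` section, which is why the reader must tolerate its absence.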