[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-04-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20633


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-04-04 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20633#discussion_r179073826
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -351,27 +359,90 @@ private[ml] object DefaultParamsReader {
      timestamp: Long,
      sparkVersion: String,
      params: JValue,
+      defaultParams: JValue,
      metadata: JValue,
      metadataJson: String) {

+
+    private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
+      params match {
+        case JObject(pairs) => pairs
+        case _ =>
+          throw new IllegalArgumentException(
+            s"Cannot recognize JSON metadata: $metadataJson.")
+      }
+    }
+
     /**
      * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
      * This can be useful for getting a Param value before an instance of `Params`
-     * is available.
+     * is available. This will look up `params` first, if not existing then looking up
+     * `defaultParams`.
      */
     def getParamValue(paramName: String): JValue = {
       implicit val format = DefaultFormats
-      params match {
+
+      // Looking up for `params` first.
+      var pairs = getValueFromParams(params)
+      var foundPairs = pairs.filter { case (pName, jsonValue) =>
+        pName == paramName
+      }
+      if (foundPairs.length == 0) {
+        // Looking up for `defaultParams` then.
+        pairs = getValueFromParams(defaultParams)
+        foundPairs = pairs.filter { case (pName, jsonValue) =>
+          pName == paramName
+        }
+      }
+      assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
+        s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
+
+      foundPairs.map(_._2).head
+    }
+
+    /**
+     * Extract Params from metadata, and set them in the instance.
+     * This works if all Params (except params included by `skipParams` list) implement
+     * [[org.apache.spark.ml.param.Param.jsonDecode()]].
+     *
+     * @param skipParams The params included in `skipParams` won't be set. This is useful if some
+     *                   params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
+     *                   and need special handling.
+     */
+    def getAndSetParams(
+        instance: Params,
+        skipParams: Option[List[String]] = None): Unit = {
+      setParams(instance, skipParams, isDefault = false)
+
+      // For metadata file prior to Spark 2.4, there is no default section.
+      val (major, minor) = VersionUtils.majorMinorVersion(sparkVersion)
+      if (major >= 2 && minor >= 4) {
--- End diff --

Good catch!


---




[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-04-04 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20633#discussion_r179073750
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -905,6 +905,15 @@ trait Params extends Identifiable with Serializable {
   }
 }

+object Params {
--- End diff --

Yes. Done.


---




[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-04-03 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20633#discussion_r178992553
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -351,27 +359,90 @@ private[ml] object DefaultParamsReader {
      timestamp: Long,
      sparkVersion: String,
      params: JValue,
+      defaultParams: JValue,
      metadata: JValue,
      metadataJson: String) {

+
+    private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
+      params match {
+        case JObject(pairs) => pairs
+        case _ =>
+          throw new IllegalArgumentException(
+            s"Cannot recognize JSON metadata: $metadataJson.")
+      }
+    }
+
     /**
      * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
      * This can be useful for getting a Param value before an instance of `Params`
-     * is available.
+     * is available. This will look up `params` first, if not existing then looking up
+     * `defaultParams`.
      */
     def getParamValue(paramName: String): JValue = {
       implicit val format = DefaultFormats
-      params match {
+
+      // Looking up for `params` first.
+      var pairs = getValueFromParams(params)
+      var foundPairs = pairs.filter { case (pName, jsonValue) =>
+        pName == paramName
+      }
+      if (foundPairs.length == 0) {
+        // Looking up for `defaultParams` then.
+        pairs = getValueFromParams(defaultParams)
+        foundPairs = pairs.filter { case (pName, jsonValue) =>
+          pName == paramName
+        }
+      }
+      assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
+        s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
+
+      foundPairs.map(_._2).head
+    }
+
+    /**
+     * Extract Params from metadata, and set them in the instance.
+     * This works if all Params (except params included by `skipParams` list) implement
+     * [[org.apache.spark.ml.param.Param.jsonDecode()]].
+     *
+     * @param skipParams The params included in `skipParams` won't be set. This is useful if some
+     *                   params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
+     *                   and need special handling.
+     */
+    def getAndSetParams(
+        instance: Params,
+        skipParams: Option[List[String]] = None): Unit = {
+      setParams(instance, skipParams, isDefault = false)
+
+      // For metadata file prior to Spark 2.4, there is no default section.
+      val (major, minor) = VersionUtils.majorMinorVersion(sparkVersion)
+      if (major >= 2 && minor >= 4) {
--- End diff --

Ah, you're correct. I'll fix it.


---




[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20633#discussion_r178956696
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -351,27 +359,90 @@ private[ml] object DefaultParamsReader {
      timestamp: Long,
      sparkVersion: String,
      params: JValue,
+      defaultParams: JValue,
      metadata: JValue,
      metadataJson: String) {

+
+    private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
+      params match {
+        case JObject(pairs) => pairs
+        case _ =>
+          throw new IllegalArgumentException(
+            s"Cannot recognize JSON metadata: $metadataJson.")
+      }
+    }
+
     /**
      * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
      * This can be useful for getting a Param value before an instance of `Params`
-     * is available.
+     * is available. This will look up `params` first, if not existing then looking up
+     * `defaultParams`.
      */
     def getParamValue(paramName: String): JValue = {
       implicit val format = DefaultFormats
-      params match {
+
+      // Looking up for `params` first.
+      var pairs = getValueFromParams(params)
+      var foundPairs = pairs.filter { case (pName, jsonValue) =>
+        pName == paramName
+      }
+      if (foundPairs.length == 0) {
+        // Looking up for `defaultParams` then.
+        pairs = getValueFromParams(defaultParams)
+        foundPairs = pairs.filter { case (pName, jsonValue) =>
+          pName == paramName
+        }
+      }
+      assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
+        s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
+
+      foundPairs.map(_._2).head
+    }
+
+    /**
+     * Extract Params from metadata, and set them in the instance.
+     * This works if all Params (except params included by `skipParams` list) implement
+     * [[org.apache.spark.ml.param.Param.jsonDecode()]].
+     *
+     * @param skipParams The params included in `skipParams` won't be set. This is useful if some
+     *                   params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
+     *                   and need special handling.
+     */
+    def getAndSetParams(
+        instance: Params,
+        skipParams: Option[List[String]] = None): Unit = {
+      setParams(instance, skipParams, isDefault = false)
+
+      // For metadata file prior to Spark 2.4, there is no default section.
+      val (major, minor) = VersionUtils.majorMinorVersion(sparkVersion)
+      if (major >= 2 && minor >= 4) {
--- End diff --

This should be:
```if (major > 2 || (major == 2 && minor >= 4)) {```
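The off-by-one can be seen with a tiny standalone sketch (hypothetical helper names, not from the PR): a plain `major >= 2 && minor >= 4` comparison rejects Spark 3.0 even though 3.0 is newer than 2.4, while the suggested form accepts every version at or above 2.4.

```scala
// Hypothetical helpers contrasting the two version checks.
def suggested(major: Int, minor: Int): Boolean =
  major > 2 || (major == 2 && minor >= 4)

def naive(major: Int, minor: Int): Boolean =
  major >= 2 && minor >= 4

assert(suggested(2, 4) && suggested(3, 0) && !suggested(2, 3))
assert(!naive(3, 0)) // the bug: Spark 3.0 would be treated as pre-2.4
```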


---




[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20633#discussion_r178955058
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -905,6 +905,15 @@ trait Params extends Identifiable with Serializable {
   }
 }

+object Params {
--- End diff --

Shall we make this object package private (for cleaner docs)?


---




[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20633#discussion_r178955105
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -791,7 +791,7 @@ trait Params extends Identifiable with Serializable {
    *   this method gets called.
    * @param value  the default value
    */
-  protected final def setDefault[T](param: Param[T], value: T): this.type = {
+  private[ml] final def setDefault[T](param: Param[T], value: T): this.type = {
--- End diff --

Makes sense; I like your solution.


---




[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-03-21 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20633#discussion_r176310898
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -791,7 +791,7 @@ trait Params extends Identifiable with Serializable {
    *   this method gets called.
    * @param value  the default value
    */
-  protected final def setDefault[T](param: Param[T], value: T): this.type = {
+  private[ml] final def setDefault[T](param: Param[T], value: T): this.type = {
--- End diff --

Because we need to call `setDefault` in `Metadata.setParams`, I can't leave it as `protected`. But I think it is important to keep extensibility, and we can't make it `public` either. So I added an object `Params` and use it to help update the default params of a `Params` instance.
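The pattern can be sketched stand-alone with toy types (hypothetical names, not the real MLlib classes): `setDefault` stays `protected` on the trait so third-party subclasses keep using it, while a helper object gives ml-internal code (the metadata reader) a way in. Access modifiers are noted in comments; the sketch keeps everything public so it runs by itself.

```scala
trait ToyParams {
  private var defaultMap: Map[String, Int] = Map.empty
  // `protected final def setDefault` in the real proposal; public here so
  // the sketch compiles stand-alone.
  def setDefault(name: String, value: Int): this.type = {
    defaultMap += (name -> value); this
  }
  def getDefault(name: String): Option[Int] = defaultMap.get(name)
}

// `private[ml] object Params` in the real proposal: the one door through
// which ml-internal code reaches setDefault on an arbitrary instance.
object ToyParams {
  def setDefault(p: ToyParams, name: String, value: Int): Unit = {
    p.setDefault(name, value)
    ()
  }
}

class Model extends ToyParams

val m = new Model
ToyParams.setDefault(m, "maxIter", 100)
assert(m.getDefault("maxIter").contains(100))
```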



 


---




[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-03-20 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20633#discussion_r175965924
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/DefaultReadWriteTest.scala ---
@@ -169,4 +179,54 @@ class DefaultReadWriteSuite extends SparkFunSuite with MLlibTestSparkContext
     val myParams = new MyParams("my_params")
     testDefaultReadWrite(myParams)
   }
+
+  test("default param shouldn't become user-supplied param after persistence") {
+    val myParams = new MyParams("my_params")
+    myParams.set(myParams.shouldNotSetIfSetintParamWithDefault, 1)
+    myParams.checkExclusiveParams()
+    val loadedMyParams = testDefaultReadWrite(myParams)
+    loadedMyParams.checkExclusiveParams()
+    assert(loadedMyParams.getDefault(loadedMyParams.intParamWithDefault) ==
+      myParams.getDefault(myParams.intParamWithDefault))
+
+    loadedMyParams.set(myParams.intParamWithDefault, 1)
+    intercept[SparkException] {
+      loadedMyParams.checkExclusiveParams()
+    }
+  }
+
+  test("User-suppiled value for default param should be kept after persistence") {
--- End diff --

suppiled -> supplied


---




[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-03-20 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20633#discussion_r175965272
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -351,27 +359,88 @@ private[ml] object DefaultParamsReader {
      timestamp: Long,
      sparkVersion: String,
      params: JValue,
+      defaultParams: JValue,
      metadata: JValue,
      metadataJson: String) {

+
+    private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
+      params match {
+        case JObject(pairs) => pairs
+        case _ =>
+          throw new IllegalArgumentException(
+            s"Cannot recognize JSON metadata: $metadataJson.")
+      }
+    }
+
     /**
      * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
      * This can be useful for getting a Param value before an instance of `Params`
-     * is available.
+     * is available. This will look up `params` first, if not existing then looking up
+     * `defaultParams`.
      */
     def getParamValue(paramName: String): JValue = {
       implicit val format = DefaultFormats
-      params match {
+
+      // Looking up for `params` first.
+      var pairs = getValueFromParams(params)
+      var foundPairs = pairs.filter { case (pName, jsonValue) =>
+        pName == paramName
+      }
+      if (foundPairs.length == 0) {
+        // Looking up for `defaultParams` then.
+        pairs = getValueFromParams(defaultParams)
+        foundPairs = pairs.filter { case (pName, jsonValue) =>
+          pName == paramName
+        }
+      }
+      assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
+        s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
+
+      foundPairs.map(_._2).head
+    }
+
+    /**
+     * Extract Params from metadata, and set them in the instance.
+     * This works if all Params (except params included by `skipParams` list) implement
+     * [[org.apache.spark.ml.param.Param.jsonDecode()]].
+     *
+     * @param skipParams The params included in `skipParams` won't be set. This is useful if some
+     *                   params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
+     *                   and need special handling.
+     */
+    def getAndSetParams(
+        instance: Params,
+        skipParams: Option[List[String]] = None): Unit = {
+      setParams(instance, false, skipParams)
+      setParams(instance, true, skipParams)
+    }
+
+    private def setParams(
+        instance: Params,
+        isDefault: Boolean,
+        skipParams: Option[List[String]]): Unit = {
+      implicit val format = DefaultFormats
+      val (major, minor) = VersionUtils.majorMinorVersion(sparkVersion)
+      val paramsToSet = if (isDefault) defaultParams else params
+      paramsToSet match {
         case JObject(pairs) =>
-          val values = pairs.filter { case (pName, jsonValue) =>
-            pName == paramName
-          }.map(_._2)
-          assert(values.length == 1, s"Expected one instance of Param '$paramName' but found" +
-            s" ${values.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
-          values.head
+          pairs.foreach { case (paramName, jsonValue) =>
+            if (skipParams == None || !skipParams.get.contains(paramName)) {
+              val param = instance.getParam(paramName)
+              val value = param.jsonDecode(compact(render(jsonValue)))
+              if (isDefault) {
+                instance.setDefault(param, value)
+              } else {
+                instance.set(param, value)
+              }
+            }
+          }
+        // For metadata file prior to Spark 2.4, there is no default section.
+        case JNothing if isDefault && (major == 2 && minor < 4 || major < 2) =>
--- End diff --

This logic would be simpler if this check were put in the getAndSetParams method, which could just skip calling setParams(instance, true, skipParams) for Spark 2.3 and earlier.
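A minimal sketch of the suggested restructuring, with toy stand-ins rather than the real reader: the version gate lives in getAndSetParams, so setParams never needs a JNothing special case.

```scala
// Toy stand-in for the reader: it records which passes run instead of
// decoding real JSON params.
final class ToyReader(major: Int, minor: Int) {
  val passesRun = scala.collection.mutable.ListBuffer.empty[String]

  private def setParams(isDefault: Boolean): Unit =
    passesRun += (if (isDefault) "default" else "user")

  def getAndSetParams(): Unit = {
    setParams(isDefault = false)
    // Metadata written before Spark 2.4 has no default-params section,
    // so skip the default pass entirely for those files.
    if (major > 2 || (major == 2 && minor >= 4)) setParams(isDefault = true)
  }
}

val pre24 = new ToyReader(2, 3)
pre24.getAndSetParams()
assert(pre24.passesRun.toList == List("user"))

val post24 = new ToyReader(2, 4)
post24.getAndSetParams()
assert(post24.passesRun.toList == List("user", "default"))
```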


---




[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-03-20 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20633#discussion_r175961795
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -791,7 +791,7 @@ trait Params extends Identifiable with Serializable {
    *   this method gets called.
    * @param value  the default value
    */
-  protected final def setDefault[T](param: Param[T], value: T): this.type = {
+  private[ml] final def setDefault[T](param: Param[T], value: T): this.type = {
--- End diff --

We should leave this as protected.  It's important that 3rd-party libraries 
be able to extend MLlib APIs, and this is one they need.


---




[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-03-20 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20633#discussion_r175966231
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/DefaultReadWriteTest.scala ---
@@ -169,4 +179,54 @@ class DefaultReadWriteSuite extends SparkFunSuite with MLlibTestSparkContext
     val myParams = new MyParams("my_params")
     testDefaultReadWrite(myParams)
   }
+
+  test("default param shouldn't become user-supplied param after persistence") {
+    val myParams = new MyParams("my_params")
+    myParams.set(myParams.shouldNotSetIfSetintParamWithDefault, 1)
+    myParams.checkExclusiveParams()
+    val loadedMyParams = testDefaultReadWrite(myParams)
+    loadedMyParams.checkExclusiveParams()
+    assert(loadedMyParams.getDefault(loadedMyParams.intParamWithDefault) ==
+      myParams.getDefault(myParams.intParamWithDefault))
+
+    loadedMyParams.set(myParams.intParamWithDefault, 1)
+    intercept[SparkException] {
+      loadedMyParams.checkExclusiveParams()
+    }
+  }
+
+  test("User-suppiled value for default param should be kept after persistence") {
+    val myParams = new MyParams("my_params")
+    myParams.set(myParams.intParamWithDefault, 100)
+    val loadedMyParams = testDefaultReadWrite(myParams)
+    assert(loadedMyParams.get(myParams.intParamWithDefault).get == 100)
+  }
+
+  test("Read metadata without default field prior to 2.4") {
--- End diff --

Nice!  I like this setup.


---




[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-03-20 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20633#discussion_r175963464
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -351,27 +359,88 @@ private[ml] object DefaultParamsReader {
      timestamp: Long,
      sparkVersion: String,
      params: JValue,
+      defaultParams: JValue,
      metadata: JValue,
      metadataJson: String) {

+
+    private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
+      params match {
+        case JObject(pairs) => pairs
+        case _ =>
+          throw new IllegalArgumentException(
+            s"Cannot recognize JSON metadata: $metadataJson.")
+      }
+    }
+
     /**
      * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
      * This can be useful for getting a Param value before an instance of `Params`
-     * is available.
+     * is available. This will look up `params` first, if not existing then looking up
+     * `defaultParams`.
      */
     def getParamValue(paramName: String): JValue = {
       implicit val format = DefaultFormats
-      params match {
+
+      // Looking up for `params` first.
+      var pairs = getValueFromParams(params)
+      var foundPairs = pairs.filter { case (pName, jsonValue) =>
+        pName == paramName
+      }
+      if (foundPairs.length == 0) {
+        // Looking up for `defaultParams` then.
+        pairs = getValueFromParams(defaultParams)
+        foundPairs = pairs.filter { case (pName, jsonValue) =>
+          pName == paramName
+        }
+      }
+      assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
+        s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
+
+      foundPairs.map(_._2).head
+    }
+
+    /**
+     * Extract Params from metadata, and set them in the instance.
+     * This works if all Params (except params included by `skipParams` list) implement
+     * [[org.apache.spark.ml.param.Param.jsonDecode()]].
+     *
+     * @param skipParams The params included in `skipParams` won't be set. This is useful if some
+     *                   params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
+     *                   and need special handling.
+     */
+    def getAndSetParams(
+        instance: Params,
+        skipParams: Option[List[String]] = None): Unit = {
+      setParams(instance, false, skipParams)
--- End diff --

style nit: It's nice to pass boolean args by name
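The style nit in action, with a hypothetical method (not the real reader): at the call site, naming the boolean makes the intent readable without consulting the signature.

```scala
// Hypothetical two-arg method with a boolean flag.
def describe(instance: String, isDefault: Boolean): String =
  if (isDefault) s"$instance (default)" else s"$instance (user-set)"

// Hard to read at the call site: what does `true` mean here?
assert(describe("maxIter", true) == "maxIter (default)")
// Self-documenting with a named argument:
assert(describe("maxIter", isDefault = true) == "maxIter (default)")
```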


---




[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-03-20 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20633#discussion_r175962188
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -296,14 +297,19 @@ private[ml] object DefaultParamsWriter {
       paramMap: Option[JValue] = None): String = {
     val uid = instance.uid
     val cls = instance.getClass.getName
-    val params = instance.extractParamMap().toSeq.asInstanceOf[Seq[ParamPair[Any]]]
+    val params = instance.paramMap.toSeq
+    val defaultParams = instance.defaultParamMap.toSeq
     val jsonParams = paramMap.getOrElse(render(params.map { case ParamPair(p, v) =>
       p.name -> parse(p.jsonEncode(v))
     }.toList))
+    val jsonDefaultParams = render(defaultParams.map { case ParamPair(p, v) =>
+      p.name -> parse(p.jsonEncode(v))
+    }.toList)
     val basicMetadata = ("class" -> cls) ~
       ("timestamp" -> System.currentTimeMillis()) ~
       ("sparkVersion" -> sc.version) ~
       ("uid" -> uid) ~
+      ("defaultParamMap" -> jsonDefaultParams) ~
--- End diff --

nit: How about putting this below paramMap since that's nicer for viewing 
the JSON?


---




[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-03-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20633#discussion_r172420910
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -351,17 +359,21 @@ private[ml] object DefaultParamsReader {
      timestamp: Long,
      sparkVersion: String,
      params: JValue,
+      defaultParams: JValue,
      metadata: JValue,
      metadataJson: String) {

     /**
      * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
      * This can be useful for getting a Param value before an instance of `Params`
      * is available.
+     *
+     * @param isDefaultParam Whether the given param name is a default param. Default is false.
      */
-    def getParamValue(paramName: String): JValue = {
+    def getParamValue(paramName: String, isDefaultParam: Boolean = false): JValue = {
--- End diff --

Sounds good. I will change this.


---




[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-03-05 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20633#discussion_r172139633
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -351,17 +359,21 @@ private[ml] object DefaultParamsReader {
      timestamp: Long,
      sparkVersion: String,
      params: JValue,
+      defaultParams: JValue,
      metadata: JValue,
      metadataJson: String) {

     /**
      * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
      * This can be useful for getting a Param value before an instance of `Params`
      * is available.
+     *
+     * @param isDefaultParam Whether the given param name is a default param. Default is false.
      */
-    def getParamValue(paramName: String): JValue = {
+    def getParamValue(paramName: String, isDefaultParam: Boolean = false): JValue = {
--- End diff --

Should the logic be "look up `params` first; if the value doesn't exist, then look up `defaultParams`"?
If so, we don't need the `isDefaultParam` parameter.
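The fallback order WeichenXu123 suggests can be sketched dependency-free (plain Maps stand in for the JSON sections of the real metadata; names are hypothetical): consult the user-supplied params first, and fall back to the defaults only when the name is absent.

```scala
// Hypothetical stand-in for the metadata lookup: user-supplied params win,
// defaults are the fallback, and an unknown name is an error.
def getParamValue(
    paramName: String,
    params: Map[String, String],
    defaultParams: Map[String, String]): String =
  params.getOrElse(paramName,
    defaultParams.getOrElse(paramName,
      throw new NoSuchElementException(s"Param '$paramName' not found")))

val userParams = Map("maxIter" -> "10")
val defaults = Map("maxIter" -> "100", "tol" -> "1e-6")
assert(getParamValue("maxIter", userParams, defaults) == "10") // user value wins
assert(getParamValue("tol", userParams, defaults) == "1e-6")   // falls back to default
```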


---




[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-02-17 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/20633

[SPARK-23455][ML] Default Params in ML should be saved separately in metadata

## What changes were proposed in this pull request?

We save ML's user-supplied params and default params as one entity in metadata. When loading saved models, we set all the loaded params on the created ML model instances as user-supplied params.

This causes problems. For example, if we strictly disallow some params from being set at the same time, a default param can fail the param check because it is treated as a user-supplied param after loading.

Loaded default params should not be set as user-supplied params, so we should save ML default params separately in the metadata.

For backward compatibility, when loading a metadata file written by a previous Spark version, we shouldn't raise an error if the default param field is missing.
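The failure mode described above can be condensed into a toy model (hypothetical names, not the real MLlib Params classes): saving merges defaults into the user-supplied map, so after a save/load round trip a default looks user-supplied and would trip an exclusivity check.

```scala
// Toy stand-in for a Params instance: user-supplied and default values
// kept in separate maps.
final case class Toy(userSet: Map[String, Int], defaults: Map[String, Int]) {
  def isSet(name: String): Boolean = userSet.contains(name)
}

val before = Toy(userSet = Map("a" -> 1), defaults = Map("b" -> 2))

// Old behaviour: one merged map, reloaded entirely as user-supplied.
val reloadedOld = Toy(before.userSet ++ before.defaults, Map.empty)
assert(reloadedOld.isSet("b")) // default "b" now looks user-supplied

// Behaviour after the fix: the two maps round-trip separately.
val reloadedNew = Toy(before.userSet, before.defaults)
assert(!reloadedNew.isSet("b"))
```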

## How was this patch tested?

Pass existing tests and added tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 save-ml-default-params

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20633.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20633


commit 69648d67546b292037d26ef3a282bf26afd4863e
Author: Liang-Chi Hsieh 
Date:   2018-02-17T02:34:11Z

Save default params separately in JSON.




---
