[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/5431#issuecomment-92594483
  
Merged into master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5431


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5431#issuecomment-92565573
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30212/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5431#issuecomment-92565550
  
  [Test build #30212 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30212/consoleFull)
 for   PR 5431 at commit 
[`d19236d`](https://github.com/apache/spark/commit/d19236dd12d57ced30374e2b1ac22b720b5c42d8).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5431#issuecomment-92552866
  
  [Test build #30210 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30210/consoleFull)
 for   PR 5431 at commit 
[`26ae2d7`](https://github.com/apache/spark/commit/26ae2d7898fa598e6fef8a91721992f1d7c793e5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5431#issuecomment-92552876
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30210/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5431#issuecomment-92549813
  
  [Test build #30212 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30212/consoleFull)
 for   PR 5431 at commit 
[`d19236d`](https://github.com/apache/spark/commit/d19236dd12d57ced30374e2b1ac22b720b5c42d8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/5431#issuecomment-92548691
  
LGTM once tests pass


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5431#issuecomment-92544002
  
  [Test build #30210 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30210/consoleFull)
 for   PR 5431 at commit 
[`26ae2d7`](https://github.com/apache/spark/commit/26ae2d7898fa598e6fef8a91721992f1d7c793e5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28291065
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -179,52 +179,96 @@ trait Params extends Identifiable with Serializable {
   /**
* Sets a parameter (by name) in the embedded param map.
*/
-  private[ml] def set(param: String, value: Any): this.type = {
+  protected final def set(param: String, value: Any): this.type = {
 set(getParam(param), value)
   }
 
   /**
-   * Gets the value of a parameter in the embedded param map.
+   * Optionally returns the user-supplied value of a param.
+   */
+  final def get[T](param: Param[T]): Option[T] = {
--- End diff --

OK, that sounds fine, including making clear() non-public


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28290995
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -55,58 +49,42 @@ class Param[T] (
*/
   def ->(value: T): ParamPair[T] = ParamPair(this, value)
 
-  override def toString: String = {
-if (defaultValue.isDefined) {
-  s"$name: $doc (default: ${defaultValue.get})"
-} else {
-  s"$name: $doc"
-}
-  }
+  override def toString: String = s"$name: $doc"
--- End diff --

I like having the default.  Your call on the current value; that might 
actually be confusing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28290778
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -325,7 +379,7 @@ class ParamMap private[ml] (private val map: 
mutable.Map[Param[Any], Any]) exten
   }
 
   /**
-   * Make a copy of this param map.
+   * Creates a copy of this param map.
*/
   def copy: ParamMap = new ParamMap(map.clone())
--- End diff --

`()` is used if it changes the content of the instance or triggers IO. 
Another exception is there are overridden methods and we have to use `()` in 
Scala. For this case, I think `copy` without `()` is correct.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28290818
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala 
---
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.param.shared
+
+import java.io.PrintWriter
+
+import scala.reflect.ClassTag
+
+/**
+ * Code generator for shared params (sharedParams.scala). Run under the 
Spark folder with
+ * {{{
+ *   build/sbt "mllib/runMain 
org.apache.spark.ml.param.shared.SharedParamsCodeGen"
+ * }}}
+ */
+private[shared] object SharedParamsCodeGen {
+
+  def main(args: Array[String]): Unit = {
+val params = Seq(
+  ParamDesc[Double]("regParam", "regularization parameter"),
+  ParamDesc[Int]("maxIter", "max number of iterations"),
+  ParamDesc[String]("featuresCol", "features column name", 
Some("\"features\"")),
+  ParamDesc[String]("labelCol", "label column name", 
Some("\"label\"")),
+  ParamDesc[String]("predictionCol", "prediction column name", 
Some("\"prediction\"")),
+  ParamDesc[String]("rawPredictionCol", "raw prediction (a.k.a. 
confidence) column name",
+Some("\"rawPrediction\"")),
+  ParamDesc[String]("probabilityCol",
+"column name for predicted class conditional probabilities", 
Some("\"probability\"")),
+  ParamDesc[Double]("threshold", "threshold in prediction"),
--- End diff --

done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28290695
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -179,52 +179,96 @@ trait Params extends Identifiable with Serializable {
   /**
* Sets a parameter (by name) in the embedded param map.
*/
-  private[ml] def set(param: String, value: Any): this.type = {
+  protected final def set(param: String, value: Any): this.type = {
 set(getParam(param), value)
   }
 
   /**
-   * Gets the value of a parameter in the embedded param map.
+   * Optionally returns the user-supplied value of a param.
+   */
+  final def get[T](param: Param[T]): Option[T] = {
--- End diff --

I made this change in my last commit. `set` cannot be public because 
`Model` also extends `Params` and some params should be immutable in models. 
Maybe `clear` shouldn't be public either. But for get, getDefault, and 
getOrDefault, I think those could be public.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28290532
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -179,52 +179,96 @@ trait Params extends Identifiable with Serializable {
   /**
* Sets a parameter (by name) in the embedded param map.
*/
-  private[ml] def set(param: String, value: Any): this.type = {
+  protected final def set(param: String, value: Any): this.type = {
 set(getParam(param), value)
   }
 
   /**
-   * Gets the value of a parameter in the embedded param map.
+   * Optionally returns the user-supplied value of a param.
+   */
+  final def get[T](param: Param[T]): Option[T] = {
+shouldOwn(param)
+paramMap.get(param)
+  }
+
+  /**
+   * Clears the user-supplied value for the input param.
+   */
+  final def clear(param: Param[_]): this.type = {
+shouldOwn(param)
+paramMap.remove(param)
+this
+  }
+
+  /**
+   * Gets the value of a param in the embedded param map or its default 
value. Throws an exception
+   * if neither is set.
*/
-  protected def get[T](param: Param[T]): T = {
-require(param.parent.eq(this))
-paramMap(param)
+  final def getOrDefault[T](param: Param[T]): T = {
+shouldOwn(param)
+get(param).orElse(getDefault(param)).get
   }
 
   /**
-   * Internal param map.
+   * Sets a default value. Make sure that the input param is initialized 
before this gets called.
*/
-  protected val paramMap: ParamMap = ParamMap.empty
+  protected final def setDefault[T](param: Param[T], value: T): this.type 
= {
+shouldOwn(param)
+defaultParamMap.put(param, value)
+this
+  }
 
   /**
-   * Check whether the given schema contains an input column.
-   * @param colName  Input column name
-   * @param dataType  Input column DataType
+   * Sets default values. Make sure that the input params are initialized 
before this gets called.
*/
-  protected def checkInputColumn(schema: StructType, colName: String, 
dataType: DataType): Unit = {
-val actualDataType = schema(colName).dataType
-require(actualDataType.equals(dataType), s"Input column $colName must 
be of type $dataType" +
-  s" but was actually $actualDataType.  Column param description: 
${getParam(colName)}")
+  protected final def setDefault(paramPairs: ParamPair[_]*): this.type = {
+paramPairs.foreach { p =>
+  setDefault(p.param.asInstanceOf[Param[Any]], p.value)
+}
+this
   }
 
   /**
-   * Add an output column to the given schema.
-   * This fails if the given output column already exists.
-   * @param schema  Initial schema (not modified)
-   * @param colName  Output column name.  If this column name is an empy 
String "", this method
-   * returns the initial schema, unchanged.  This allows 
users to disable output
-   * columns.
-   * @param dataType  Output column DataType
-   */
-  protected def addOutputColumn(
-  schema: StructType,
-  colName: String,
-  dataType: DataType): StructType = {
-if (colName.length == 0) return schema
-val fieldNames = schema.fieldNames
-require(!fieldNames.contains(colName), s"Output column $colName 
already exists.")
-val outputFields = schema.fields ++ Seq(StructField(colName, dataType, 
nullable = false))
-StructType(outputFields)
+   * Gets the default value of a parameter.
+   */
+  final def getDefault[T](param: Param[T]): Option[T] = {
+shouldOwn(param)
+defaultParamMap.get(param)
+  }
+
+  /**
+   * Tests whether the input param has a default value set.
+   */
+  final def hasDefault[T](param: Param[T]): Boolean = {
+shouldOwn(param)
+defaultParamMap.contains(param)
+  }
+
+  /**
+   * Extracts the embedded default param values and user-supplied values, 
and then merges them with
+   * extra values from input into a flat param map, where the latter value 
is used if there exist
+   * conflicts.
--- End diff --

done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28290414
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -179,52 +179,96 @@ trait Params extends Identifiable with Serializable {
   /**
* Sets a parameter (by name) in the embedded param map.
*/
-  private[ml] def set(param: String, value: Any): this.type = {
+  protected final def set(param: String, value: Any): this.type = {
 set(getParam(param), value)
   }
 
   /**
-   * Gets the value of a parameter in the embedded param map.
+   * Optionally returns the user-supplied value of a param.
+   */
+  final def get[T](param: Param[T]): Option[T] = {
+shouldOwn(param)
+paramMap.get(param)
+  }
+
+  /**
+   * Clears the user-supplied value for the input param.
+   */
+  final def clear(param: Param[_]): this.type = {
+shouldOwn(param)
+paramMap.remove(param)
+this
+  }
+
+  /**
+   * Gets the value of a param in the embedded param map or its default 
value. Throws an exception
+   * if neither is set.
*/
-  protected def get[T](param: Param[T]): T = {
-require(param.parent.eq(this))
-paramMap(param)
+  final def getOrDefault[T](param: Param[T]): T = {
+shouldOwn(param)
+get(param).orElse(getDefault(param)).get
   }
 
   /**
-   * Internal param map.
+   * Sets a default value. Make sure that the input param is initialized 
before this gets called.
--- End diff --

done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28290419
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -179,52 +179,96 @@ trait Params extends Identifiable with Serializable {
   /**
* Sets a parameter (by name) in the embedded param map.
*/
-  private[ml] def set(param: String, value: Any): this.type = {
+  protected final def set(param: String, value: Any): this.type = {
 set(getParam(param), value)
   }
 
   /**
-   * Gets the value of a parameter in the embedded param map.
+   * Optionally returns the user-supplied value of a param.
+   */
+  final def get[T](param: Param[T]): Option[T] = {
+shouldOwn(param)
+paramMap.get(param)
+  }
+
+  /**
+   * Clears the user-supplied value for the input param.
+   */
+  final def clear(param: Param[_]): this.type = {
+shouldOwn(param)
+paramMap.remove(param)
+this
+  }
+
+  /**
+   * Gets the value of a param in the embedded param map or its default 
value. Throws an exception
+   * if neither is set.
*/
-  protected def get[T](param: Param[T]): T = {
-require(param.parent.eq(this))
-paramMap(param)
+  final def getOrDefault[T](param: Param[T]): T = {
+shouldOwn(param)
+get(param).orElse(getDefault(param)).get
   }
 
   /**
-   * Internal param map.
+   * Sets a default value. Make sure that the input param is initialized 
before this gets called.
*/
-  protected val paramMap: ParamMap = ParamMap.empty
+  protected final def setDefault[T](param: Param[T], value: T): this.type 
= {
+shouldOwn(param)
+defaultParamMap.put(param, value)
+this
+  }
 
   /**
-   * Check whether the given schema contains an input column.
-   * @param colName  Input column name
-   * @param dataType  Input column DataType
+   * Sets default values. Make sure that the input params are initialized 
before this gets called.
--- End diff --

done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28289791
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -55,58 +49,42 @@ class Param[T] (
*/
   def ->(value: T): ParamPair[T] = ParamPair(this, value)
 
-  override def toString: String = {
-if (defaultValue.isDefined) {
-  s"$name: $doc (default: ${defaultValue.get})"
-} else {
-  s"$name: $doc"
-}
-  }
+  override def toString: String = s"$name: $doc"
--- End diff --

I thought about this option but I'm not sure whether we should put default 
value and current value in `Param.toString` because they are stored in the 
parent. But no strong preference.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28289673
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/param/ParamsSuite.scala 
---
@@ -78,23 +81,42 @@ class ParamsSuite extends FunSuite {
   }
 
   test("params") {
--- End diff --

done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28289697
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala ---
@@ -17,15 +17,16 @@
 
 package org.apache.spark.ml.classification
 
-import org.apache.spark.annotation.{DeveloperApi, AlphaComponent}
+import org.apache.spark.annotation.{AlphaComponent, DeveloperApi}
 import org.apache.spark.ml.impl.estimator.{PredictionModel, Predictor, 
PredictorParams}
-import org.apache.spark.ml.param.{Params, ParamMap, HasRawPredictionCol}
+import org.apache.spark.ml.param.{ParamMap, Params}
+import org.apache.spark.ml.param.shared.HasRawPredictionCol
+import org.apache.spark.ml.util.SchemaUtils
 import org.apache.spark.mllib.linalg.{Vector, VectorUDT}
-import org.apache.spark.sql.functions._
 import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.functions._
 import org.apache.spark.sql.types.{DataType, DoubleType, StructType}
 
-
--- End diff --

No strong preference on this one:)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/5431#issuecomment-92513221
  
@mengxr I added minor comment but don't have major ones.  That's it for now!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28283755
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala 
---
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.param.shared
+
+import java.io.PrintWriter
+
+import scala.reflect.ClassTag
+
+/**
+ * Code generator for shared params (sharedParams.scala). Run under the 
Spark folder with
+ * {{{
+ *   build/sbt "mllib/runMain 
org.apache.spark.ml.param.shared.SharedParamsCodeGen"
+ * }}}
+ */
+private[shared] object SharedParamsCodeGen {
+
+  def main(args: Array[String]): Unit = {
+val params = Seq(
+  ParamDesc[Double]("regParam", "regularization parameter"),
+  ParamDesc[Int]("maxIter", "max number of iterations"),
+  ParamDesc[String]("featuresCol", "features column name", 
Some("\"features\"")),
+  ParamDesc[String]("labelCol", "label column name", 
Some("\"label\"")),
+  ParamDesc[String]("predictionCol", "prediction column name", 
Some("\"prediction\"")),
+  ParamDesc[String]("rawPredictionCol", "raw prediction (a.k.a. 
confidence) column name",
+Some("\"rawPrediction\"")),
+  ParamDesc[String]("probabilityCol",
+"column name for predicted class conditional probabilities", 
Some("\"probability\"")),
+  ParamDesc[Double]("threshold", "threshold in prediction"),
--- End diff --

"threshold in prediction" --> "threshold in binary classification"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28283689
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -325,7 +379,7 @@ class ParamMap private[ml] (private val map: 
mutable.Map[Param[Any], Any]) exten
   }
 
   /**
-   * Make a copy of this param map.
+   * Creates a copy of this param map.
*/
   def copy: ParamMap = new ParamMap(map.clone())
--- End diff --

Do we like copy methods to use parentheses or not?  I think we're not 
completely consistent.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28283381
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -179,52 +179,96 @@ trait Params extends Identifiable with Serializable {
   /**
* Sets a parameter (by name) in the embedded param map.
*/
-  private[ml] def set(param: String, value: Any): this.type = {
+  protected final def set(param: String, value: Any): this.type = {
 set(getParam(param), value)
   }
 
   /**
-   * Gets the value of a parameter in the embedded param map.
+   * Optionally returns the user-supplied value of a param.
+   */
+  final def get[T](param: Param[T]): Option[T] = {
--- End diff --

Why is get public and set private?  Shouldn't both be public (which might 
be handy for programmatic setting of parameters, such as from config files) or 
both be private (since implementing classes will provide getters/setters 
anyways)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28282819
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -179,52 +179,96 @@ trait Params extends Identifiable with Serializable {
   /**
* Sets a parameter (by name) in the embedded param map.
*/
-  private[ml] def set(param: String, value: Any): this.type = {
+  protected final def set(param: String, value: Any): this.type = {
 set(getParam(param), value)
   }
 
   /**
-   * Gets the value of a parameter in the embedded param map.
+   * Optionally returns the user-supplied value of a param.
+   */
+  final def get[T](param: Param[T]): Option[T] = {
+shouldOwn(param)
+paramMap.get(param)
+  }
+
+  /**
+   * Clears the user-supplied value for the input param.
+   */
+  final def clear(param: Param[_]): this.type = {
+shouldOwn(param)
+paramMap.remove(param)
+this
+  }
+
+  /**
+   * Gets the value of a param in the embedded param map or its default 
value. Throws an exception
+   * if neither is set.
*/
-  protected def get[T](param: Param[T]): T = {
-require(param.parent.eq(this))
-paramMap(param)
+  final def getOrDefault[T](param: Param[T]): T = {
+shouldOwn(param)
+get(param).orElse(getDefault(param)).get
   }
 
   /**
-   * Internal param map.
+   * Sets a default value. Make sure that the input param is initialized 
before this gets called.
*/
-  protected val paramMap: ParamMap = ParamMap.empty
+  protected final def setDefault[T](param: Param[T], value: T): this.type 
= {
+shouldOwn(param)
+defaultParamMap.put(param, value)
+this
+  }
 
   /**
-   * Check whether the given schema contains an input column.
-   * @param colName  Input column name
-   * @param dataType  Input column DataType
+   * Sets default values. Make sure that the input params are initialized 
before this gets called.
*/
-  protected def checkInputColumn(schema: StructType, colName: String, 
dataType: DataType): Unit = {
-val actualDataType = schema(colName).dataType
-require(actualDataType.equals(dataType), s"Input column $colName must 
be of type $dataType" +
-  s" but was actually $actualDataType.  Column param description: 
${getParam(colName)}")
+  protected final def setDefault(paramPairs: ParamPair[_]*): this.type = {
+paramPairs.foreach { p =>
+  setDefault(p.param.asInstanceOf[Param[Any]], p.value)
+}
+this
   }
 
   /**
-   * Add an output column to the given schema.
-   * This fails if the given output column already exists.
-   * @param schema  Initial schema (not modified)
-   * @param colName  Output column name.  If this column name is an empy 
String "", this method
-   * returns the initial schema, unchanged.  This allows 
users to disable output
-   * columns.
-   * @param dataType  Output column DataType
-   */
-  protected def addOutputColumn(
-  schema: StructType,
-  colName: String,
-  dataType: DataType): StructType = {
-if (colName.length == 0) return schema
-val fieldNames = schema.fieldNames
-require(!fieldNames.contains(colName), s"Output column $colName 
already exists.")
-val outputFields = schema.fields ++ Seq(StructField(colName, dataType, 
nullable = false))
-StructType(outputFields)
+   * Gets the default value of a parameter.
+   */
+  final def getDefault[T](param: Param[T]): Option[T] = {
+shouldOwn(param)
+defaultParamMap.get(param)
+  }
+
+  /**
+   * Tests whether the input param has a default value set.
+   */
+  final def hasDefault[T](param: Param[T]): Boolean = {
+shouldOwn(param)
+defaultParamMap.contains(param)
+  }
+
+  /**
+   * Extracts the embedded default param values and user-supplied values, 
and then merges them with
+   * extra values from input into a flat param map, where the latter value 
is used if there exist
+   * conflicts.
--- End diff --

This is a bit confusing.  I like specifying the priority ordering:
```
Priority ordering: default params < user-supplied params < extraParamMap
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-un

[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28282538
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -55,58 +49,42 @@ class Param[T] (
*/
   def ->(value: T): ParamPair[T] = ParamPair(this, value)
 
-  override def toString: String = {
-if (defaultValue.isDefined) {
-  s"$name: $doc (default: ${defaultValue.get})"
-} else {
-  s"$name: $doc"
-}
-  }
+  override def toString: String = s"$name: $doc"
--- End diff --

Why not print default using parent?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28282519
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala ---
@@ -17,15 +17,16 @@
 
 package org.apache.spark.ml.classification
 
-import org.apache.spark.annotation.{DeveloperApi, AlphaComponent}
+import org.apache.spark.annotation.{AlphaComponent, DeveloperApi}
 import org.apache.spark.ml.impl.estimator.{PredictionModel, Predictor, 
PredictorParams}
-import org.apache.spark.ml.param.{Params, ParamMap, HasRawPredictionCol}
+import org.apache.spark.ml.param.{ParamMap, Params}
+import org.apache.spark.ml.param.shared.HasRawPredictionCol
+import org.apache.spark.ml.util.SchemaUtils
 import org.apache.spark.mllib.linalg.{Vector, VectorUDT}
-import org.apache.spark.sql.functions._
 import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.functions._
 import org.apache.spark.sql.types.{DataType, DoubleType, StructType}
 
-
--- End diff --

People's opinions about this differ  : )   (See the DataFrames PR)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28282554
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -179,52 +179,96 @@ trait Params extends Identifiable with Serializable {
   /**
* Sets a parameter (by name) in the embedded param map.
*/
-  private[ml] def set(param: String, value: Any): this.type = {
+  protected final def set(param: String, value: Any): this.type = {
 set(getParam(param), value)
   }
 
   /**
-   * Gets the value of a parameter in the embedded param map.
+   * Optionally returns the user-supplied value of a param.
+   */
+  final def get[T](param: Param[T]): Option[T] = {
+shouldOwn(param)
+paramMap.get(param)
+  }
+
+  /**
+   * Clears the user-supplied value for the input param.
+   */
+  final def clear(param: Param[_]): this.type = {
+shouldOwn(param)
+paramMap.remove(param)
+this
+  }
+
+  /**
+   * Gets the value of a param in the embedded param map or its default 
value. Throws an exception
+   * if neither is set.
*/
-  protected def get[T](param: Param[T]): T = {
-require(param.parent.eq(this))
-paramMap(param)
+  final def getOrDefault[T](param: Param[T]): T = {
+shouldOwn(param)
+get(param).orElse(getDefault(param)).get
   }
 
   /**
-   * Internal param map.
+   * Sets a default value. Make sure that the input param is initialized 
before this gets called.
*/
-  protected val paramMap: ParamMap = ParamMap.empty
+  protected final def setDefault[T](param: Param[T], value: T): this.type 
= {
+shouldOwn(param)
+defaultParamMap.put(param, value)
+this
+  }
 
   /**
-   * Check whether the given schema contains an input column.
-   * @param colName  Input column name
-   * @param dataType  Input column DataType
+   * Sets default values. Make sure that the input params are initialized 
before this gets called.
--- End diff --

ditto: move to parameter-specific doc


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28282550
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -179,52 +179,96 @@ trait Params extends Identifiable with Serializable {
   /**
* Sets a parameter (by name) in the embedded param map.
*/
-  private[ml] def set(param: String, value: Any): this.type = {
+  protected final def set(param: String, value: Any): this.type = {
 set(getParam(param), value)
   }
 
   /**
-   * Gets the value of a parameter in the embedded param map.
+   * Optionally returns the user-supplied value of a param.
+   */
+  final def get[T](param: Param[T]): Option[T] = {
+shouldOwn(param)
+paramMap.get(param)
+  }
+
+  /**
+   * Clears the user-supplied value for the input param.
+   */
+  final def clear(param: Param[_]): this.type = {
+shouldOwn(param)
+paramMap.remove(param)
+this
+  }
+
+  /**
+   * Gets the value of a param in the embedded param map or its default 
value. Throws an exception
+   * if neither is set.
*/
-  protected def get[T](param: Param[T]): T = {
-require(param.parent.eq(this))
-paramMap(param)
+  final def getOrDefault[T](param: Param[T]): T = {
+shouldOwn(param)
+get(param).orElse(getDefault(param)).get
   }
 
   /**
-   * Internal param map.
+   * Sets a default value. Make sure that the input param is initialized 
before this gets called.
--- End diff --

"Make sure that the input param is initialized before this gets called." 
will be unclear to new developers.  Maybe move this to parameter-specific doc:
```
@param param:  Make sure that this param is initialized before this method 
gets called.
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28282562
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/SchemaUtils.scala 
---
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.util
+
+import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.sql.types.{DataType, StructField, StructType}
+
+/**
+ * :: DeveloperApi ::
+ * Utils for handling schemas.
+ */
+@DeveloperApi
+object SchemaUtils {
+
+  // TODO: Move the utility methods to SQL.
+
+  /**
+   * Check whether the given schema contains an column of the required 
data type.
+   * @param colName  column name
+   * @param dataType  required column data type
+   */
+  def checkColumnType(schema: StructType, colName: String, dataType: 
DataType): Unit = {
+val actualDataType = schema(colName).dataType
+require(actualDataType.equals(dataType),
+  s"Column $colName must be of type $dataType but was actually 
$actualDataType.")
+  }
+
+  /**
+   * Appends a new column to the input schema. This fails if the given 
output column already exists.
+   * @param schema input schema
+   * @param colName new column name. If this column name is en empty 
string "", this method returns
--- End diff --

typo: "en" -> "an"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28282559
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/SchemaUtils.scala 
---
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.util
+
+import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.sql.types.{DataType, StructField, StructType}
+
+/**
+ * :: DeveloperApi ::
+ * Utils for handling schemas.
+ */
+@DeveloperApi
+object SchemaUtils {
+
+  // TODO: Move the utility methods to SQL.
+
+  /**
+   * Check whether the given schema contains an column of the required 
data type.
--- End diff --

typo: "an" --> "a"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5431#discussion_r28282508
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/param/ParamsSuite.scala 
---
@@ -78,23 +81,42 @@ class ParamsSuite extends FunSuite {
   }
 
   test("params") {
--- End diff --

Add test for "params.clear"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/5431#issuecomment-92509721
  
Should this method in Params be made abstract?
```
def validate(paramMap: ParamMap): Unit = {}
```
I just realized we haven't been using it, and making it abstract would 
force users to think about it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5431#issuecomment-92255656
  
  [Test build #30150 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30150/consoleFull)
 for   PR 5431 at commit 
[`eec2264`](https://github.com/apache/spark/commit/eec2264ef8e49a0be5f2b7539c8bba6e4279c01a).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5431#issuecomment-92255674
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30150/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5957][ML] better handling of parameters

2015-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5431#issuecomment-92230105
  
  [Test build #30150 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30150/consoleFull)
 for   PR 5431 at commit 
[`eec2264`](https://github.com/apache/spark/commit/eec2264ef8e49a0be5f2b7539c8bba6e4279c01a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org