[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8830


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-25 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-143150875
  
LGTM. Merged into master. Thanks!





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8830#discussion_r40292341
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala ---
@@ -70,10 +93,10 @@ class VectorAssembler(override val uid: String)
   val group = AttributeGroup.fromStructField(field)
   if (group.attributes.isDefined) {
 // If attributes are defined, copy them with updated names.
+val prefix = $(groupPrefixes).get(c).getOrElse(c + "_")
 group.attributes.get.map { attr =>
   if (attr.name.isDefined) {
-// TODO: Define a rigorous naming scheme.
--- End diff --

Please keep this TODO.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-24 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-142841176
  
@ericl I think the only issue left is the `groupPrefixes` param in 
`VectorAssembler`. It would be nice to keep the feature name handling under the 
hood.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8830#discussion_r40292192
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala ---
@@ -47,6 +48,28 @@ class VectorAssembler(override val uid: String)
   /** @group setParam */
   def setOutputCol(value: String): this.type = set(outputCol, value)
 
+  /**
+   * By default, the attribute names of vector components will be 
`groupName + '_' + attrName`.
+   * This parameter allows the overriding of the group prefix per input 
column vector.
+   *
+   * @group param Mapping of input vector names to group prefixes. If not 
specified, the group
+   *  prefix for an input vector column will default to 
`groupName + '_'`.
+   * @param groupPrefixes
+   */
+  final val groupPrefixes: Param[Map[String, String]] = new Param(
--- End diff --

This would be a really advanced parameter for users. For this PR, is it 
possible to use existing attribute name generation in `VectorAssembler` and 
rename the generated attributes after? We could assign a unique prefix to new 
columns and then remove this prefix from all attribute names.

Otherwise, there are a couple of issues with this parameter:
1. It is not Java/Python friendly.
2. It is quite hard to understand.
I don't think those issues should be addressed in this PR.
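
The rename-after idea suggested above can be sketched without any Spark types. This is only an illustration under stated assumptions, not the code that was merged: `assembleNames` is a hypothetical stand-in for the default `groupName + '_' + attrName` naming described in the Scaladoc of the diff, and the `tmp_3f9a_` prefix is made up.

```scala
object AttrRenameSketch {
  // Hypothetical stand-in for the assembler's default naming:
  // groupName + "_" + attrName for every attribute of every input group.
  def assembleNames(groups: Seq[(String, Seq[String])]): Seq[String] =
    groups.flatMap { case (group, attrs) => attrs.map(a => group + "_" + a) }

  // Drop the temporary prefix from names that start with it; other names are unchanged.
  def stripTempPrefix(names: Seq[String], prefix: String): Seq[String] =
    names.map(_.stripPrefix(prefix))

  // New columns carry a unique temporary prefix, which is removed after assembly.
  def example: Seq[String] = {
    val prefix = "tmp_3f9a_" // assumed unique prefix, for illustration only
    val generated = assembleNames(Seq(
      (prefix + "a:b", Seq("0", "1")),
      ("c", Seq("x"))))
    stripTempPrefix(generated, prefix)
  }
}
```

With this shape the caller never sees a prefix parameter; the naming stays an internal detail of whatever builds the intermediate columns.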





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8830#discussion_r40292247
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaParserSuite.scala ---
@@ -79,4 +87,79 @@ class RFormulaParserSuite extends SparkFunSuite {
 assert(!RFormulaParser.parse("a ~ b - 1").hasIntercept)
 assert(!RFormulaParser.parse("a ~ b + 1 - 1").hasIntercept)
   }
+
+  test("parse interactions") {
+checkParse("y ~ a:b", "y", Seq("a:b"))
+checkParse("y ~ ._a:._x", "y", Seq("._a:._x"))
+checkParse("y ~ foo:bar", "y", Seq("foo:bar"))
+checkParse("y ~ a : b : c", "y", Seq("a:b:c"))
+checkParse("y ~ q + a:b:c + b:c + c:d + z", "y", Seq("q", "a:b:c", 
"b:c", "c:d", "z"))
+  }
+
+  test("parse basic interactions with dot") {
+val schema = (new StructType)
+  .add("a", "int", true)
+  .add("b", "long", false)
+  .add("c", "string", true)
+checkParse("y ~ .:x", "y", Seq("a:x", "b:x", "c:x"), schema)
--- End diff --

This is a little confusing because `x` and `y` do not appear in the schema.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8830#discussion_r40292275
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -31,27 +32,35 @@ private[ml] case class ParsedRFormula(label: ColumnRef, 
terms: Seq[Term]) {
* of the special '.' term. Duplicate terms will be removed during 
resolution.
*/
   def resolve(schema: StructType): ResolvedRFormula = {
-var includedTerms = Seq[String]()
+val dotTerms = expandDot(schema)
+var includedTerms = Seq[Seq[String]]()
 terms.foreach {
+  case term: ColumnRef =>
--- End diff --

minor: `term` -> `col`





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-143087972
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42994/





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-143087971
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-24 Thread ericl
Github user ericl commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-143085618
  
comment should be addressed





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-143085679
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-143085696
  
Merged build started.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-143094806
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42995/





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-143094803
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-143094740
  
  [Test build #42995 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42995/console)
 for   PR 8830 at commit 
[`145569a`](https://github.com/apache/spark/commit/145569a3dfbfc2fe6dab21fac0b1a374d5949081).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-143087143
  
  [Test build #42995 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42995/consoleFull)
 for   PR 8830 at commit 
[`145569a`](https://github.com/apache/spark/commit/145569a3dfbfc2fe6dab21fac0b1a374d5949081).





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-143086286
  
Merged build started.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-143086273
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8830#discussion_r40046360
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -31,20 +32,28 @@ private[ml] case class ParsedRFormula(label: ColumnRef, 
terms: Seq[Term]) {
* of the special '.' term. Duplicate terms will be removed during 
resolution.
*/
   def resolve(schema: StructType): ResolvedRFormula = {
-var includedTerms = Seq[String]()
+val dotTerms = expandDot(schema)
+var includedTerms = Seq[Seq[String]]()
 terms.foreach {
+  case term: ColumnRef =>
+includedTerms :+= Seq(term.value)
+  case ColumnInteraction(terms) =>
+includedTerms ++= expandInteraction(schema, terms)
   case Dot =>
-includedTerms ++= simpleTypes(schema).filter(_ != label.value)
-  case ColumnRef(value) =>
-includedTerms :+= value
+includedTerms ++= dotTerms.map(Seq(_))
   case Deletion(term: Term) =>
 term match {
-  case ColumnRef(value) =>
-includedTerms = includedTerms.filter(_ != value)
+  case inner: ColumnRef =>
+includedTerms = includedTerms.filter(_ != Seq(inner.value))
+  case ColumnInteraction(terms) =>
+val fromInteraction = expandInteraction(schema, 
terms).map(_.toSet)
+includedTerms = includedTerms.filter(t => 
!fromInteraction.contains(t.toSet))
   case Dot =>
 // e.g. "- .", which removes all first-order terms
-val fromSchema = simpleTypes(schema)
-includedTerms = includedTerms.filter(fromSchema.contains(_))
+includedTerms = includedTerms.filter {
+  case Seq(t) => !dotTerms.contains(t)
+  case _ => true
+}
   case _: Deletion =>
 assert(false, "Deletion terms cannot be nested")
--- End diff --

* not part of this PR: `throw new RuntimeException(...)`
* also not part of this PR: Shall we move `hasIntercept` to 
`ResolvedRFormula`? It is a little strange to have two places to store the 
fully parsed formula.
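
The first bullet can be illustrated with a reduced term hierarchy. In Scala, `assert(false, ...)` throws `java.lang.AssertionError`, which is an `Error` rather than an `Exception`, so callers that catch exceptions will not see it. The sketch below assumes simplified `Term` classes and a hypothetical `applyDeletion` helper; it is not the structure of `ParsedRFormula.resolve`, and it does not cover the second bullet about `hasIntercept`.

```scala
object DeletionSketch {
  // Reduced stand-ins for the parser's term classes (assumed, simplified).
  sealed trait Term
  case class ColumnRef(value: String) extends Term
  case class Deletion(term: Term) extends Term

  // Removes the target column from the included terms.
  // A nested deletion is reported as an exception instead of a failed assertion.
  def applyDeletion(included: Seq[String], target: Term): Seq[String] = target match {
    case ColumnRef(value) => included.filter(_ != value)
    case _: Deletion =>
      throw new RuntimeException("Deletion terms cannot be nested")
  }
}
```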





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8830#discussion_r40046354
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -31,20 +32,28 @@ private[ml] case class ParsedRFormula(label: ColumnRef, 
terms: Seq[Term]) {
* of the special '.' term. Duplicate terms will be removed during 
resolution.
*/
   def resolve(schema: StructType): ResolvedRFormula = {
-var includedTerms = Seq[String]()
+val dotTerms = expandDot(schema)
+var includedTerms = Seq[Seq[String]]()
 terms.foreach {
+  case term: ColumnRef =>
+includedTerms :+= Seq(term.value)
+  case ColumnInteraction(terms) =>
--- End diff --

`terms` shadows the class member `terms`. `terms` -> `cols`?
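
A small sketch of the shadowing in question, using reduced stand-in classes (assumed here; the real `ColumnInteraction` in the diff holds `InteractionComponent` values). Both methods compile and return the same result, but inside the second `case` body the name `terms` no longer refers to the class member.

```scala
object ShadowSketch {
  // Reduced stand-ins for the parser's term classes (assumed, simplified).
  sealed trait Term
  case class ColumnRef(value: String) extends Term
  case class ColumnInteraction(cols: Seq[ColumnRef]) extends Term

  case class Parsed(terms: Seq[Term]) {
    // Pattern variable named `cols`: `terms` still means the class member everywhere.
    def interactionSizes: Seq[Int] = terms.collect {
      case ColumnInteraction(cols) => cols.size
    }

    // Pattern variable named `terms`: inside the case body it hides the class member.
    def interactionSizesShadowed: Seq[Int] = terms.collect {
      case ColumnInteraction(terms) => terms.size
    }
  }
}
```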





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-142159885
  
Merged build started.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-142159870
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/8830#discussion_r40047937
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -31,20 +32,28 @@ private[ml] case class ParsedRFormula(label: ColumnRef, 
terms: Seq[Term]) {
* of the special '.' term. Duplicate terms will be removed during 
resolution.
*/
   def resolve(schema: StructType): ResolvedRFormula = {
-var includedTerms = Seq[String]()
+val dotTerms = expandDot(schema)
+var includedTerms = Seq[Seq[String]]()
 terms.foreach {
+  case term: ColumnRef =>
+includedTerms :+= Seq(term.value)
+  case ColumnInteraction(terms) =>
+includedTerms ++= expandInteraction(schema, terms)
   case Dot =>
-includedTerms ++= simpleTypes(schema).filter(_ != label.value)
-  case ColumnRef(value) =>
-includedTerms :+= value
+includedTerms ++= dotTerms.map(Seq(_))
   case Deletion(term: Term) =>
 term match {
-  case ColumnRef(value) =>
-includedTerms = includedTerms.filter(_ != value)
+  case inner: ColumnRef =>
+includedTerms = includedTerms.filter(_ != Seq(inner.value))
+  case ColumnInteraction(terms) =>
--- End diff --

Done





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/8830#discussion_r40047935
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -31,20 +32,28 @@ private[ml] case class ParsedRFormula(label: ColumnRef, 
terms: Seq[Term]) {
* of the special '.' term. Duplicate terms will be removed during 
resolution.
*/
   def resolve(schema: StructType): ResolvedRFormula = {
-var includedTerms = Seq[String]()
+val dotTerms = expandDot(schema)
+var includedTerms = Seq[Seq[String]]()
 terms.foreach {
+  case term: ColumnRef =>
+includedTerms :+= Seq(term.value)
+  case ColumnInteraction(terms) =>
--- End diff --

Done





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-142166909
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-142166873
  
  [Test build #42805 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42805/console)
 for   PR 8830 at commit 
[`41dc78b`](https://github.com/apache/spark/commit/41dc78b740f3e5a31ea3882f5881f04c87b6dd66).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-142166910
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42805/





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8830#discussion_r40046370
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -109,7 +157,15 @@ private[ml] object RFormulaParser extends RegexParsers 
{
   def columnRef: Parser[ColumnRef] =
 "([a-zA-Z]|\\.[a-zA-Z_])[a-zA-Z0-9._]*".r ^^ { case a => ColumnRef(a) }
 
-  def term: Parser[Term] = intercept | columnRef | "\\.".r ^^ { case _ => 
Dot }
+  def dot: Parser[InteractionComponent] = "\\.".r ^^ { case _ => Dot }
+
+  def interaction: Parser[List[InteractionComponent]] = repsep(columnRef | 
dot, ":")
--- End diff --

If we want to separate `columnRef` from `interaction`, we should use 
`rep1sep` here and update `term`. This might add several lines of code, but 
it would make the code easier to understand.
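
For readers unfamiliar with the two combinators: in Scala's parser combinator library, `repsep` accepts zero or more separated items, while `rep1sep` requires at least one. The sketch below re-implements both by hand so that it runs without the library; the `Parser` type alias, `ident`, and the function bodies are assumptions made for illustration and are not the library's implementation or the PR's grammar.

```scala
object SepSketch {
  // A parser takes the remaining input and returns a value plus the rest, or None.
  type Parser[A] = String => Option[(A, String)]

  // Matches one non-empty run of letters, e.g. a simple column name.
  val ident: Parser[String] = in => {
    val name = in.takeWhile(_.isLetter)
    if (name.isEmpty) None else Some((name, in.drop(name.length)))
  }

  // One or more `p` separated by `sep`; fails when the first `p` fails.
  def rep1sep[A](p: Parser[A], sep: Char): Parser[List[A]] = in =>
    p(in).map { case (first, rest) =>
      if (rest.headOption.contains(sep)) {
        rep1sep(p, sep)(rest.tail) match {
          case Some((more, rest2)) => (first :: more, rest2)
          case None => (List(first), rest) // trailing separator is left unconsumed
        }
      } else (List(first), rest)
    }

  // Zero or more `p` separated by `sep`; an empty match still succeeds.
  def repsep[A](p: Parser[A], sep: Char): Parser[List[A]] = in =>
    rep1sep(p, sep)(in).orElse(Some((List.empty[A], in)))
}
```

On empty input, `SepSketch.repsep(SepSketch.ident, ':')` succeeds with an empty list, whereas `SepSketch.rep1sep(SepSketch.ident, ':')` fails, which is the distinction the comment relies on.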


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8830#discussion_r40046356
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -31,20 +32,28 @@ private[ml] case class ParsedRFormula(label: ColumnRef, 
terms: Seq[Term]) {
* of the special '.' term. Duplicate terms will be removed during 
resolution.
*/
   def resolve(schema: StructType): ResolvedRFormula = {
-var includedTerms = Seq[String]()
+val dotTerms = expandDot(schema)
+var includedTerms = Seq[Seq[String]]()
 terms.foreach {
+  case term: ColumnRef =>
+includedTerms :+= Seq(term.value)
+  case ColumnInteraction(terms) =>
+includedTerms ++= expandInteraction(schema, terms)
   case Dot =>
-includedTerms ++= simpleTypes(schema).filter(_ != label.value)
-  case ColumnRef(value) =>
-includedTerms :+= value
+includedTerms ++= dotTerms.map(Seq(_))
   case Deletion(term: Term) =>
 term match {
-  case ColumnRef(value) =>
-includedTerms = includedTerms.filter(_ != value)
+  case inner: ColumnRef =>
+includedTerms = includedTerms.filter(_ != Seq(inner.value))
+  case ColumnInteraction(terms) =>
--- End diff --

ditto





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8830#discussion_r40046364
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -67,19 +76,52 @@ private[ml] case class ParsedRFormula(label: ColumnRef, 
terms: Seq[Term]) {
 intercept
   }
 
+  // expands the Dot operators in interaction terms
+  private def expandInteraction(
+  schema: StructType, terms: Seq[InteractionComponent]): 
Seq[Seq[String]] = {
+if (terms.isEmpty) {
+  return Seq(Nil)
+}
+
+val rest = expandInteraction(schema, terms.tail)
+val validInteractions = (terms.head match {
+  case Dot =>
+expandDot(schema).filter(_ != label.value).flatMap { t =>
--- End diff --

remove `.filter(_ != label.value)`
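
One reading of this request, assumed here and not confirmed by the thread, is that the dot expansion already leaves out the label column, which would make the extra filter redundant. The sketch below shows the recursive cross-product expansion over plain strings under that assumption; `Component`, `Col`, and the explicit `columns` and `label` parameters are simplifications, not the signatures used in `RFormulaParser.scala`.

```scala
object InteractionSketch {
  // Reduced stand-ins for the components of an interaction term (assumed).
  sealed trait Component
  case object Dot extends Component
  case class Col(name: String) extends Component

  // Assumption: expanding "." yields every column except the label.
  def expandDot(columns: Seq[String], label: String): Seq[String] =
    columns.filter(_ != label)

  // Cross product of the expansions of each component, preserving order.
  def expandInteraction(
      columns: Seq[String], label: String, comps: Seq[Component]): Seq[Seq[String]] =
    comps match {
      case Seq() => Seq(Nil)
      case head +: tail =>
        val rest = expandInteraction(columns, label, tail)
        val heads = head match {
          case Dot => expandDot(columns, label) // no second label filter needed here
          case Col(name) => Seq(name)
        }
        for (h <- heads; r <- rest) yield h +: r
    }
}
```

For columns `a`, `b`, `c`, `y` with label `y`, expanding `.:x` gives `a:x`, `b:x`, `c:x`, in line with the `"y ~ .:x"` case in `RFormulaParserSuite`.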





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8830#discussion_r40046365
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -87,11 +129,17 @@ private[ml] case class ResolvedRFormula(label: String, terms: Seq[String])
  */
 private[ml] sealed trait Term
 
+/** A term that may be part of an interaction, e.g. 'x' in 'x:y' */
+private[ml] sealed trait InteractionComponent extends Term
+
 /* R formula reference to all available columns, e.g. "." in a formula */
-private[ml] case object Dot extends Term
+private[ml] case object Dot extends InteractionComponent
 
 /* R formula reference to a column, e.g. "+ Species" in a formula */
-private[ml] case class ColumnRef(value: String) extends Term
+private[ml] case class ColumnRef(value: String) extends InteractionComponent
--- End diff --

This makes the implementation easier but the code harder to understand,
because a `ColumnRef` is not an interaction.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8830#discussion_r40046368
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -109,7 +157,15 @@ private[ml] object RFormulaParser extends RegexParsers {
   def columnRef: Parser[ColumnRef] =
     "([a-zA-Z]|\\.[a-zA-Z_])[a-zA-Z0-9._]*".r ^^ { case a => ColumnRef(a) }
 
-  def term: Parser[Term] = intercept | columnRef | "\\.".r ^^ { case _ => Dot }
+  def dot: Parser[InteractionComponent] = "\\.".r ^^ { case _ => Dot }
--- End diff --

minor: Those could be `val` instead of `def`, right?
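
For readers following the `val` versus `def` point: the practical difference is how often the right-hand side is evaluated. The sketch below uses plain Scala with no parser library; `build`, `dotDef`, and `dotVal` are hypothetical names chosen for illustration, not code from this PR. It uses `lazy val` rather than a strict `val` because a strict `val` in an object can be read before it is initialized when definitions refer to each other, which is a common concern with parser rules.

```scala
object DefVsLazyVal {
  // Counts how many times the stand-in "parser" is constructed.
  var builds = 0

  // Hypothetical stand-in for constructing a parser such as `dot`.
  private def build(name: String): String = { builds += 1; name }

  // `def`: a new value is constructed on every reference.
  def dotDef: String = build("dot")

  // `lazy val`: constructed once, on first reference, then cached.
  lazy val dotVal: String = build("dot")
}
```

Referencing `dotDef` twice runs `build` twice, while referencing `dotVal` any number of times runs it once.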





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-142155219
  
I only checked `RFormulaParser.scala`. Need more time to go through the 
rest.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-142160070
  
[Test build #42805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42805/consoleFull) for PR 8830 at commit [`41dc78b`](https://github.com/apache/spark/commit/41dc78b740f3e5a31ea3882f5881f04c87b6dd66).





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/8830#discussion_r40047990
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -87,11 +129,17 @@ private[ml] case class ResolvedRFormula(label: String, terms: Seq[String])
  */
 private[ml] sealed trait Term
 
+/** A term that may be part of an interaction, e.g. 'x' in 'x:y' */
+private[ml] sealed trait InteractionComponent extends Term
+
 /* R formula reference to all available columns, e.g. "." in a formula */
-private[ml] case object Dot extends Term
+private[ml] case object Dot extends InteractionComponent
 
 /* R formula reference to a column, e.g. "+ Species" in a formula */
-private[ml] case class ColumnRef(value: String) extends Term
+private[ml] case class ColumnRef(value: String) extends InteractionComponent
--- End diff --

I updated the parser so that Interaction doesn't entirely subsume `ColumnRef`.
Let me know what you think.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/8830#discussion_r40047963
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -31,20 +32,28 @@ private[ml] case class ParsedRFormula(label: ColumnRef, terms: Seq[Term]) {
    * of the special '.' term. Duplicate terms will be removed during resolution.
    */
   def resolve(schema: StructType): ResolvedRFormula = {
-    var includedTerms = Seq[String]()
+    val dotTerms = expandDot(schema)
+    var includedTerms = Seq[Seq[String]]()
     terms.foreach {
+      case term: ColumnRef =>
+        includedTerms :+= Seq(term.value)
+      case ColumnInteraction(terms) =>
+        includedTerms ++= expandInteraction(schema, terms)
       case Dot =>
-        includedTerms ++= simpleTypes(schema).filter(_ != label.value)
-      case ColumnRef(value) =>
-        includedTerms :+= value
+        includedTerms ++= dotTerms.map(Seq(_))
       case Deletion(term: Term) =>
         term match {
-          case ColumnRef(value) =>
-            includedTerms = includedTerms.filter(_ != value)
+          case inner: ColumnRef =>
+            includedTerms = includedTerms.filter(_ != Seq(inner.value))
+          case ColumnInteraction(terms) =>
+            val fromInteraction = expandInteraction(schema, terms).map(_.toSet)
+            includedTerms = includedTerms.filter(t => !fromInteraction.contains(t.toSet))
           case Dot =>
             // e.g. "- .", which removes all first-order terms
-            val fromSchema = simpleTypes(schema)
-            includedTerms = includedTerms.filter(fromSchema.contains(_))
+            includedTerms = includedTerms.filter {
+              case Seq(t) => !dotTerms.contains(t)
+              case _ => true
+            }
           case _: Deletion =>
             assert(false, "Deletion terms cannot be nested")
--- End diff --

Done (though it's not really needed in `ResolvedRFormula` and is a bit
awkward to extract for use in `SparkRWrappers`).
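
The deletion branches in the hunk above can be read in isolation. The following is a minimal sketch in plain Scala, assuming each included term is a list of column names (a single column is a one-element list), as in the diff. The object and method names (`DeletionSketch`, `removeInteraction`, `removeDot`) are hypothetical and exist only for illustration.

```scala
object DeletionSketch {
  // Removes every included term that names the same set of columns as a
  // deleted interaction, so deleting b:a also removes a:b.
  def removeInteraction(
      included: Seq[Seq[String]],
      deleted: Seq[Seq[String]]): Seq[Seq[String]] = {
    val deletedSets = deleted.map(_.toSet)
    included.filter(t => !deletedSets.contains(t.toSet))
  }

  // "- ." removes only first-order terms that come from the schema columns;
  // interaction terms (two or more columns) are kept.
  def removeDot(
      included: Seq[Seq[String]],
      dotTerms: Seq[String]): Seq[Seq[String]] =
    included.filter {
      case Seq(t) => !dotTerms.contains(t)
      case _ => true
    }
}
```

Comparing terms as sets is what makes deletion order-insensitive; comparing the sequences directly would treat `a:b` and `b:a` as different terms.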





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/8830#discussion_r40047964
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -67,19 +76,52 @@ private[ml] case class ParsedRFormula(label: ColumnRef, terms: Seq[Term]) {
     intercept
   }
 
+  // expands the Dot operators in interaction terms
+  private def expandInteraction(
+      schema: StructType, terms: Seq[InteractionComponent]): Seq[Seq[String]] = {
+    if (terms.isEmpty) {
+      return Seq(Nil)
+    }
+
+    val rest = expandInteraction(schema, terms.tail)
+    val validInteractions = (terms.head match {
+      case Dot =>
+        expandDot(schema).filter(_ != label.value).flatMap { t =>
--- End diff --

Done





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/8830#discussion_r40047995
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -109,7 +157,15 @@ private[ml] object RFormulaParser extends RegexParsers {
   def columnRef: Parser[ColumnRef] =
     "([a-zA-Z]|\\.[a-zA-Z_])[a-zA-Z0-9._]*".r ^^ { case a => ColumnRef(a) }
 
-  def term: Parser[Term] = intercept | columnRef | "\\.".r ^^ { case _ => Dot }
+  def dot: Parser[InteractionComponent] = "\\.".r ^^ { case _ => Dot }
+
+  def interaction: Parser[List[InteractionComponent]] = repsep(columnRef | dot, ":")
--- End diff --

Done





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-21 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/8830#discussion_r40047992
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -109,7 +157,15 @@ private[ml] object RFormulaParser extends RegexParsers {
   def columnRef: Parser[ColumnRef] =
     "([a-zA-Z]|\\.[a-zA-Z_])[a-zA-Z0-9._]*".r ^^ { case a => ColumnRef(a) }
 
-  def term: Parser[Term] = intercept | columnRef | "\\.".r ^^ { case _ => Dot }
+  def dot: Parser[InteractionComponent] = "\\.".r ^^ { case _ => Dot }
--- End diff --

Done





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-141606324
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42706/
Test PASSed.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-141606323
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-141602715
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-141602769
  
Merged build started.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-141603326
  
[Test build #42706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42706/consoleFull) for PR 8830 at commit [`15b4da7`](https://github.com/apache/spark/commit/15b4da72a6b6bf7e0e18255dcda7a72db207ba49).





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-141606284
  
[Test build #42706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42706/console) for PR 8830 at commit [`15b4da7`](https://github.com/apache/spark/commit/15b4da72a6b6bf7e0e18255dcda7a72db207ba49).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-18 Thread ericl
GitHub user ericl opened a pull request:

https://github.com/apache/spark/pull/8830

[SPARK-9681] [ML] Support R feature interactions in RFormula

This integrates the Interaction feature transformer with the SparkR R 
formula support (i.e. we support ':' now).
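
As a rough illustration of what supporting ':' involves, the sketch below expands an interaction such as `a:.` into concrete column combinations. It is a simplified, self-contained stand-in, not the PR's implementation: `columns` replaces the schema lookup done by `expandDot`, the type names mirror those in the diff, and collapsing repeated columns and treating `a:b` and `b:a` as the same term are assumptions of this sketch.

```scala
object InteractionSketch {
  sealed trait InteractionComponent
  case object Dot extends InteractionComponent
  final case class ColumnRef(value: String) extends InteractionComponent

  // Expands the components of one interaction into lists of column names.
  // `columns` is a hypothetical stand-in for the columns that "." refers to.
  def expand(
      columns: Seq[String],
      terms: Seq[InteractionComponent]): Seq[Seq[String]] =
    if (terms.isEmpty) Seq(Nil)
    else {
      val rest = expand(columns, terms.tail)
      val heads = terms.head match {
        case Dot => columns
        case ColumnRef(name) => Seq(name)
      }
      // Pair every head column with every expansion of the remaining terms;
      // `distinct` collapses a self-interaction such as a:a into a.
      val combos = for (h <- heads; r <- rest) yield (h +: r).distinct
      // Keep the first occurrence of each unordered combination (assumption).
      combos.foldLeft(Vector.empty[Seq[String]]) { (acc, c) =>
        if (acc.exists(_.toSet == c.toSet)) acc else acc :+ c
      }
    }
}
```

With columns `a`, `b`, `c`, the formula term `a:.` expands to `a`, `a:b`, and `a:c` under these assumptions.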


@mengxr 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ericl/spark interaction-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8830.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8830


commit b16795add215a879d6ab461ac53140179e184a73
Author: Eric Liang 
Date:   2015-09-18T21:38:03Z

Squashed commit of the following:

commit ca78c26f9928f0e4a0fb8e0adf38510161178ed1
Author: Eric Liang 
Date:   Fri Sep 18 14:37:00 2015 -0700

Fri Sep 18 14:37:00 PDT 2015

commit 68411c764fd39902088ab893064a6c226b694f94
Author: Eric Liang 
Date:   Fri Sep 18 14:36:15 2015 -0700

doc

commit abb81ebfcb535dba56432aba6ce9fcc60035a6d7
Author: Eric Liang 
Date:   Fri Sep 18 14:18:43 2015 -0700

Fri Sep 18 14:18:43 PDT 2015

commit 97750a6c1e8b7f4f985487d0e58bd4c56ca774cf
Author: Eric Liang 
Date:   Fri Sep 18 14:09:43 2015 -0700

Fri Sep 18 14:09:43 PDT 2015

commit 6518f62557a035d03000575eaeae0c38d3ae4ba4
Author: Eric Liang 
Date:   Fri Sep 18 13:03:26 2015 -0700

Fri Sep 18 13:03:26 PDT 2015

commit 853ab7d7f179d53de5fe5653ff0b677191a5c5f5
Author: Eric Liang 
Date:   Fri Sep 18 12:49:32 2015 -0700

Fri Sep 18 12:49:32 PDT 2015

commit 7ce9c289c822ca132f22f9e709742e8293d275ef
Author: Eric Liang 
Date:   Thu Sep 17 15:42:41 2015 -0700

Thu Sep 17 15:42:41 PDT 2015

commit 09b4e00b87669f785c112e0a70ddf1bbaf02dd34
Merge: e5099f6 4fbf332
Author: Eric Liang 
Date:   Thu Sep 17 15:41:31 2015 -0700

Merge branch 'master' into interaction-2

Conflicts:
mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala

commit e5099f695a9533e402ca15434330e2f3678d30f3
Author: Eric Liang 
Date:   Thu Aug 6 17:21:21 2015 -0700

tests and attribute refactorign

commit 4c11a773e74e677f237f23949eeb9dffa8bc43f2
Author: Eric Liang 
Date:   Wed Aug 5 23:45:35 2015 -0700

small nits

commit 5f7cb9b505e043039898df9860b999d5382c4ae0
Author: Eric Liang 
Date:   Wed Aug 5 23:15:14 2015 -0700

Wed Aug  5 23:15:14 PDT 2015

commit 3ad5464566076570438001714582082b11981479
Author: Eric Liang 
Date:   Wed Aug 5 23:08:34 2015 -0700

docs

commit 478ee8f2e1133901746e68cc202ecfc89de8eaa9
Author: Eric Liang 
Date:   Wed Aug 5 22:57:10 2015 -0700

add rformula test

commit 11bb70fcdd32f390a513fa39a8d8096a60e3d22e
Merge: 2957cb6 d5a9af3
Author: Eric Liang 
Date:   Wed Aug 5 22:20:26 2015 -0700

Merge branch 'master' into interaction

commit 2957cb686264303344c02b96e5ce166a7a66a959
Author: Eric Liang 
Date:   Wed Aug 5 22:19:28 2015 -0700

fix parser

commit dc8801a31cd7806ccd1e4ef902568e1d9bc85e94
Author: Eric Liang 
Date:   Wed Aug 5 19:59:50 2015 -0700

Wed Aug  5 19:59:50 PDT 2015

commit a12e58e58e7a9c44611124d21c6172cb0a4c6b53
Author: Eric Liang 
Date:   Wed Aug 5 15:58:14 2015 -0700

Wed Aug  5 15:58:14 PDT 2015

commit a3623aa6cbb2e248b45b489cfb216c9d87bc7c86
Author: Eric Liang 
Date:   Tue Aug 4 20:32:39 2015 -0700

attribute generation

commit ab2a3477512aba52f69f3ccd0bfe620b2da0cb39
Author: Eric Liang 
Date:   Tue Aug 4 18:14:31 2015 -0700

combiner

commit 0ece16cd2e30403f4fa8c5473e6cb6e7930ae52f
Author: Eric Liang 
Date:   Mon Aug 3 20:29:16 2015 -0700

compiles now

commit 429cb520682dcd446f01e204f178b5c0c932e6cf
Author: Eric Liang 
Date:   Mon Aug 3 18:32:54 2015 -0700

first pass





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-141576487
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-141577509
  
[Test build #42695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42695/consoleFull) for PR 8830 at commit [`b16795a`](https://github.com/apache/spark/commit/b16795add215a879d6ab461ac53140179e184a73).





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-141584803
  
[Test build #42695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42695/console) for PR 8830 at commit [`b16795a`](https://github.com/apache/spark/commit/b16795add215a879d6ab461ac53140179e184a73).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-141584845
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42695/
Test FAILed.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-141576507
  
Merged build started.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8830#issuecomment-141584844
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7987#issuecomment-128283405
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7987#issuecomment-128284672
  
[Test build #40011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/40011/console) for PR 7987 at commit [`4c11a77`](https://github.com/apache/spark/commit/4c11a773e74e677f237f23949eeb9dffa8bc43f2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7987#issuecomment-128547360
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7987#issuecomment-128547373
  
Merged build started.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-08-06 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/7987#issuecomment-128471053
  
@ericl Shall we split this PR into two?

1. Add `Interaction` as a transformer (SPARK-9698).
2. Support feature interaction in RFormula.

After 1) is merged, people can start working on the Python API, without 
being blocked by 2).






[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7987#issuecomment-128275868
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7987#issuecomment-128275891
  
Merged build started.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7987#issuecomment-128284709
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7987#issuecomment-128276283
  
[Test build #40009 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/40009/consoleFull) for PR 7987 at commit [`386881b`](https://github.com/apache/spark/commit/386881ba7c517b4affc90a4e91e84f860c603e11).





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7987#issuecomment-128277742
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7987#issuecomment-128277768
  
Merged build started.





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-08-06 Thread ericl
GitHub user ericl opened a pull request:

https://github.com/apache/spark/pull/7987

[SPARK-9681] [ML] Support R feature interactions in RFormula

This adds support for the interaction (:) operator to the RFormula 
feature transformer.
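
For readers unfamiliar with the R `:` operator, here is a minimal, dependency-free sketch of what an interaction term computes. It is not the code in this PR: the helper names `one_hot` and `interact` are hypothetical, full one-hot encoding is assumed for categorical columns, and the output ordering (first term varying slowest) is an illustrative choice that may differ from what RFormula produces.

```python
from itertools import product


def one_hot(value, levels):
    """Encode a categorical value as a 0/1 vector over the given levels (assumed full one-hot)."""
    return [1.0 if value == level else 0.0 for level in levels]


def interact(*terms):
    """Return the element-wise cross product of the given feature vectors.

    Each term is a list of floats: a numeric column is a one-element list,
    a categorical column is its one-hot vector. The result has one entry per
    combination of positions, with the first term varying slowest.
    """
    result = [1.0]
    for term in terms:
        result = [a * b for a, b in product(result, term)]
    return result


# numeric:numeric is a plain product
print(interact([2.0], [3.0]))
# categorical:numeric places the numeric value in the slot of the active level
print(interact(one_hot("b", ["a", "b"]), [4.0]))
```

Under these assumptions, a formula term such as `a:b` contributes one output column per combination of levels, which is why the generated feature vector grows multiplicatively with the number of categories involved.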

Design doc from umbrella task: 
https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit

@mengxr 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ericl/spark interaction

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7987.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7987


commit 429cb520682dcd446f01e204f178b5c0c932e6cf
Author: Eric Liang e...@databricks.com
Date:   2015-08-04T01:32:54Z

first pass

commit 0ece16cd2e30403f4fa8c5473e6cb6e7930ae52f
Author: Eric Liang e...@databricks.com
Date:   2015-08-04T03:29:16Z

compiles now

commit ab2a3477512aba52f69f3ccd0bfe620b2da0cb39
Author: Eric Liang e...@databricks.com
Date:   2015-08-05T01:14:31Z

combiner

commit a3623aa6cbb2e248b45b489cfb216c9d87bc7c86
Author: Eric Liang e...@databricks.com
Date:   2015-08-05T03:32:39Z

attribute generation

commit a12e58e58e7a9c44611124d21c6172cb0a4c6b53
Author: Eric Liang e...@databricks.com
Date:   2015-08-05T22:58:14Z

Wed Aug  5 15:58:14 PDT 2015

commit dc8801a31cd7806ccd1e4ef902568e1d9bc85e94
Author: Eric Liang e...@databricks.com
Date:   2015-08-06T02:59:50Z

Wed Aug  5 19:59:50 PDT 2015

commit 2957cb686264303344c02b96e5ce166a7a66a959
Author: Eric Liang e...@databricks.com
Date:   2015-08-06T05:19:28Z

fix parser

commit 11bb70fcdd32f390a513fa39a8d8096a60e3d22e
Author: Eric Liang e...@databricks.com
Date:   2015-08-06T05:20:26Z

Merge branch 'master' into interaction

commit 478ee8f2e1133901746e68cc202ecfc89de8eaa9
Author: Eric Liang e...@databricks.com
Date:   2015-08-06T05:57:10Z

add rformula test

commit 3ad5464566076570438001714582082b11981479
Author: Eric Liang e...@databricks.com
Date:   2015-08-06T06:08:34Z

docs

commit 5f7cb9b505e043039898df9860b999d5382c4ae0
Author: Eric Liang e...@databricks.com
Date:   2015-08-06T06:15:14Z

Wed Aug  5 23:15:14 PDT 2015

commit e44dd8397440f0e997a5384dc79220ac4ff2ba34
Author: Eric Liang e...@databricks.com
Date:   2015-08-06T06:45:35Z

small nits







[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7987#issuecomment-128277888
  
[Test build #40011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/40011/consoleFull) for PR 7987 at commit [`4c11a77`](https://github.com/apache/spark/commit/4c11a773e74e677f237f23949eeb9dffa8bc43f2).





[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...

2015-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7987#issuecomment-128283369
  
[Test build #40009 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/40009/console) for PR 7987 at commit [`386881b`](https://github.com/apache/spark/commit/386881ba7c517b4affc90a4e91e84f860c603e11).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class RInteraction(override val uid: String) extends Estimator[PipelineModel]`


