[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8830 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-143150875 LGTM. Merged into master. Thanks!
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8830#discussion_r40292341

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala ---
    @@ -70,10 +93,10 @@ class VectorAssembler(override val uid: String)
               val group = AttributeGroup.fromStructField(field)
               if (group.attributes.isDefined) {
                 // If attributes are defined, copy them with updated names.
    +            val prefix = $(groupPrefixes).get(c).getOrElse(c + "_")
                 group.attributes.get.map { attr =>
                   if (attr.name.isDefined) {
    -                // TODO: Define a rigorous naming scheme.
    --- End diff --

    Please keep this TODO.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-142841176 @ericl I think the only issue left is the `groupPrefixes` param in `VectorAssembler`. It would be nice to keep the feature name handling under the hood.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8830#discussion_r40292192

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala ---
    @@ -47,6 +48,28 @@ class VectorAssembler(override val uid: String)
       /** @group setParam */
       def setOutputCol(value: String): this.type = set(outputCol, value)
    +  /**
    +   * By default, the attribute names of vector components will be `groupName + '_' + attrName`.
    +   * This parameter allows the overriding of the group prefix per input column vector.
    +   *
    +   * @group param Mapping of input vector names to group prefixes. If not specified, the group
    +   *              prefix for an input vector column will default to `groupName + '_'`.
    +   * @param groupPrefixes
    +   */
    +  final val groupPrefixes: Param[Map[String, String]] = new Param(
    --- End diff --

    This would be a really advanced parameter for users. For this PR, is it possible to use the existing attribute name generation in `VectorAssembler` and rename the generated attributes afterwards? We could assign a unique prefix to new columns and then remove this prefix from all attribute names. Otherwise, there are a couple of issues with this parameter:

    1. It is not Java/Python friendly.
    2. It is quite hard to understand.

    I don't think those issues should be addressed in this PR.
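The rename-after-assembly idea in this comment can be sketched in plain Scala, without Spark. This is only an illustration under stated assumptions: `defaultAttrNames` is a hypothetical stand-in for the default `groupName + '_' + attrName` scheme quoted in the diff, and `tmp_3f9a` is a made-up unique column name. It is not the code that was merged.

```scala
object PrefixRenameSketch {
  // Hypothetical stand-in for the default naming scheme: groupName + "_" + attrName.
  def defaultAttrNames(groupName: String, attrNames: Seq[String]): Seq[String] =
    attrNames.map(a => groupName + "_" + a)

  // Strip a known temporary prefix from every generated name that carries it;
  // names without the prefix are returned unchanged.
  def stripPrefix(names: Seq[String], tmpPrefix: String): Seq[String] =
    names.map(n => if (n.startsWith(tmpPrefix)) n.substring(tmpPrefix.length) else n)

  def main(args: Array[String]): Unit = {
    val tmpCol = "tmp_3f9a" // assumption: a unique temporary column name
    val generated = defaultAttrNames(tmpCol, Seq("a:b", "a:c"))
    val renamed = stripPrefix(generated, tmpCol + "_")
    println(renamed.mkString(", "))
  }
}
```

The point of the design is that the prefix handling stays internal, so no extra user-facing parameter such as `groupPrefixes` is needed.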
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8830#discussion_r40292247

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaParserSuite.scala ---
    @@ -79,4 +87,79 @@ class RFormulaParserSuite extends SparkFunSuite {
         assert(!RFormulaParser.parse("a ~ b - 1").hasIntercept)
         assert(!RFormulaParser.parse("a ~ b + 1 - 1").hasIntercept)
       }
    +
    +  test("parse interactions") {
    +    checkParse("y ~ a:b", "y", Seq("a:b"))
    +    checkParse("y ~ ._a:._x", "y", Seq("._a:._x"))
    +    checkParse("y ~ foo:bar", "y", Seq("foo:bar"))
    +    checkParse("y ~ a : b : c", "y", Seq("a:b:c"))
    +    checkParse("y ~ q + a:b:c + b:c + c:d + z", "y", Seq("q", "a:b:c", "b:c", "c:d", "z"))
    +  }
    +
    +  test("parse basic interactions with dot") {
    +    val schema = (new StructType)
    +      .add("a", "int", true)
    +      .add("b", "long", false)
    +      .add("c", "string", true)
    +    checkParse("y ~ .:x", "y", Seq("a:x", "b:x", "c:x"), schema)
    --- End diff --

    This is a little confusing because `x` and `y` do not appear in the schema.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8830#discussion_r40292275

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
    @@ -31,27 +32,35 @@ private[ml] case class ParsedRFormula(label: ColumnRef, terms: Seq[Term]) {
        * of the special '.' term. Duplicate terms will be removed during resolution.
        */
       def resolve(schema: StructType): ResolvedRFormula = {
    -    var includedTerms = Seq[String]()
    +    val dotTerms = expandDot(schema)
    +    var includedTerms = Seq[Seq[String]]()
         terms.foreach {
    +      case term: ColumnRef =>
    --- End diff --

    minor: `term` -> `col`
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-143087972 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42994/ Test FAILed.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-143087971 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user ericl commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-143085618 comment should be addressed
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-143085679 Merged build triggered.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-143085696 Merged build started.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-143094806 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42995/ Test PASSed.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-143094803 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8830#issuecomment-143094740

    [Test build #42995 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42995/console) for PR 8830 at commit [`145569a`](https://github.com/apache/spark/commit/145569a3dfbfc2fe6dab21fac0b1a374d5949081).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-143087143 [Test build #42995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42995/consoleFull) for PR 8830 at commit [`145569a`](https://github.com/apache/spark/commit/145569a3dfbfc2fe6dab21fac0b1a374d5949081).
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-143086286 Merged build started.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-143086273 Merged build triggered.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8830#discussion_r40046360

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
    @@ -31,20 +32,28 @@ private[ml] case class ParsedRFormula(label: ColumnRef, terms: Seq[Term]) {
        * of the special '.' term. Duplicate terms will be removed during resolution.
        */
       def resolve(schema: StructType): ResolvedRFormula = {
    -    var includedTerms = Seq[String]()
    +    val dotTerms = expandDot(schema)
    +    var includedTerms = Seq[Seq[String]]()
         terms.foreach {
    +      case term: ColumnRef =>
    +        includedTerms :+= Seq(term.value)
    +      case ColumnInteraction(terms) =>
    +        includedTerms ++= expandInteraction(schema, terms)
           case Dot =>
    -        includedTerms ++= simpleTypes(schema).filter(_ != label.value)
    -      case ColumnRef(value) =>
    -        includedTerms :+= value
    +        includedTerms ++= dotTerms.map(Seq(_))
           case Deletion(term: Term) =>
             term match {
    -          case ColumnRef(value) =>
    -            includedTerms = includedTerms.filter(_ != value)
    +          case inner: ColumnRef =>
    +            includedTerms = includedTerms.filter(_ != Seq(inner.value))
    +          case ColumnInteraction(terms) =>
    +            val fromInteraction = expandInteraction(schema, terms).map(_.toSet)
    +            includedTerms = includedTerms.filter(t => !fromInteraction.contains(t.toSet))
               case Dot =>
                 // e.g. "- .", which removes all first-order terms
    -            val fromSchema = simpleTypes(schema)
    -            includedTerms = includedTerms.filter(fromSchema.contains(_))
    +            includedTerms = includedTerms.filter {
    +              case Seq(t) => !dotTerms.contains(t)
    +              case _ => true
    +            }
               case _: Deletion =>
                 assert(false, "Deletion terms cannot be nested")
    --- End diff --

    * not part of this PR: `throw new RuntimeException(...)`
    * also not part of this PR: Shall we move `hasIntercept` to `ResolvedRFormula`? It is a little strange to have two places to store the fully parsed formula.
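The two deletion branches in this hunk can be tried out in isolation. The following is a minimal sketch in plain Scala, assuming terms are already expanded to `Seq[Seq[String]]` as in the diff; the object and method names (`DeletionSketch`, `removeInteractions`, `removeDot`) are hypothetical and only mirror the two `filter` calls shown above.

```scala
object DeletionSketch {
  // Remove every included term that matches a deleted interaction, ignoring
  // column order (the same `toSet` comparison the diff uses), so deleting
  // "b:a" also removes an included "a:b".
  def removeInteractions(
      included: Seq[Seq[String]],
      deleted: Seq[Seq[String]]): Seq[Seq[String]] = {
    val deletedSets = deleted.map(_.toSet)
    included.filter(t => !deletedSets.contains(t.toSet))
  }

  // "- ." removes only first-order terms that came from the dot expansion;
  // interactions (terms with more than one column) are kept.
  def removeDot(included: Seq[Seq[String]], dotTerms: Seq[String]): Seq[Seq[String]] =
    included.filter {
      case Seq(t) => !dotTerms.contains(t)
      case _ => true
    }

  def main(args: Array[String]): Unit = {
    val included = Seq(Seq("a"), Seq("a", "b"), Seq("b", "c"))
    println(removeInteractions(included, Seq(Seq("b", "a"))))
    println(removeDot(included, Seq("a", "b", "c")))
  }
}
```

Comparing as sets makes the deletion order-insensitive, at the cost of treating a degenerate term such as `a:a` the same as `a`; whether that matters depends on how the parser normalizes interactions, which this sketch does not model.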
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8830#discussion_r40046354

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
    @@ -31,20 +32,28 @@ private[ml] case class ParsedRFormula(label: ColumnRef, terms: Seq[Term]) {
        * of the special '.' term. Duplicate terms will be removed during resolution.
        */
       def resolve(schema: StructType): ResolvedRFormula = {
    -    var includedTerms = Seq[String]()
    +    val dotTerms = expandDot(schema)
    +    var includedTerms = Seq[Seq[String]]()
         terms.foreach {
    +      case term: ColumnRef =>
    +        includedTerms :+= Seq(term.value)
    +      case ColumnInteraction(terms) =>
    --- End diff --

    `terms` shadows the class member `terms`. `terms` -> `cols`?
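A small sketch of why the rename helps. The types below are simplified, hypothetical stand-ins for the ones in `RFormulaParser.scala` (here `ColumnInteraction` holds only `ColumnRef`s). Binding the pattern variable as `cols` leaves the enclosing member `terms` reachable inside the branch.

```scala
object ShadowingSketch {
  sealed trait Term
  case class ColumnRef(value: String) extends Term
  case class ColumnInteraction(cols: Seq[ColumnRef]) extends Term

  case class Formula(terms: Seq[Term]) {
    def describe: Seq[String] = terms.map {
      // With the binding named `cols`, `terms` below still refers to the class
      // member. Had the binding been named `terms`, `terms.size` would count
      // the columns of this interaction instead.
      case ColumnInteraction(cols) =>
        cols.map(_.value).mkString(":") + s" (1 of ${terms.size} terms)"
      case ColumnRef(value) => value
    }
  }

  def main(args: Array[String]): Unit = {
    val f = Formula(Seq(ColumnRef("q"), ColumnInteraction(Seq(ColumnRef("a"), ColumnRef("b")))))
    f.describe.foreach(println)
  }
}
```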
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-142159885 Merged build started.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-142159870 Merged build triggered.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8830#discussion_r40047937

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
    @@ -31,20 +32,28 @@ private[ml] case class ParsedRFormula(label: ColumnRef, terms: Seq[Term]) {
        * of the special '.' term. Duplicate terms will be removed during resolution.
        */
       def resolve(schema: StructType): ResolvedRFormula = {
    -    var includedTerms = Seq[String]()
    +    val dotTerms = expandDot(schema)
    +    var includedTerms = Seq[Seq[String]]()
         terms.foreach {
    +      case term: ColumnRef =>
    +        includedTerms :+= Seq(term.value)
    +      case ColumnInteraction(terms) =>
    +        includedTerms ++= expandInteraction(schema, terms)
           case Dot =>
    -        includedTerms ++= simpleTypes(schema).filter(_ != label.value)
    -      case ColumnRef(value) =>
    -        includedTerms :+= value
    +        includedTerms ++= dotTerms.map(Seq(_))
           case Deletion(term: Term) =>
             term match {
    -          case ColumnRef(value) =>
    -            includedTerms = includedTerms.filter(_ != value)
    +          case inner: ColumnRef =>
    +            includedTerms = includedTerms.filter(_ != Seq(inner.value))
    +          case ColumnInteraction(terms) =>
    --- End diff --

    Done
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8830#discussion_r40047935

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
    @@ -31,20 +32,28 @@ private[ml] case class ParsedRFormula(label: ColumnRef, terms: Seq[Term]) {
        * of the special '.' term. Duplicate terms will be removed during resolution.
        */
       def resolve(schema: StructType): ResolvedRFormula = {
    -    var includedTerms = Seq[String]()
    +    val dotTerms = expandDot(schema)
    +    var includedTerms = Seq[Seq[String]]()
         terms.foreach {
    +      case term: ColumnRef =>
    +        includedTerms :+= Seq(term.value)
    +      case ColumnInteraction(terms) =>
    --- End diff --

    Done
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-142166909 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8830#issuecomment-142166873

    [Test build #42805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42805/console) for PR 8830 at commit [`41dc78b`](https://github.com/apache/spark/commit/41dc78b740f3e5a31ea3882f5881f04c87b6dd66).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-142166910 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42805/ Test PASSed.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8830#discussion_r40046370

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
    @@ -109,7 +157,15 @@ private[ml] object RFormulaParser extends RegexParsers {
       def columnRef: Parser[ColumnRef] =
         "([a-zA-Z]|\\.[a-zA-Z_])[a-zA-Z0-9._]*".r ^^ { case a => ColumnRef(a) }
    -  def term: Parser[Term] = intercept | columnRef | "\\.".r ^^ { case _ => Dot }
    +  def dot: Parser[InteractionComponent] = "\\.".r ^^ { case _ => Dot }
    +
    +  def interaction: Parser[List[InteractionComponent]] = repsep(columnRef | dot, ":")
    --- End diff --

    If we want to separate `columnRef` from `interaction`, we should use `rep1sep` here and update `term`. This might add several lines of code, but make the code easier to understand.
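For readers unfamiliar with the two combinators mentioned here: `repsep` also succeeds on zero repetitions (yielding an empty list), while `rep1sep` requires at least one element. A minimal sketch of just that difference, assuming the `scala-parser-combinators` module is on the classpath; `SepSketch` and its simplified `name` pattern are hypothetical and are not the grammar that was merged.

```scala
import scala.util.parsing.combinator.RegexParsers

object SepSketch extends RegexParsers {
  // Simplified column-name pattern, for illustration only.
  val name: Parser[String] = "[a-zA-Z][a-zA-Z0-9._]*".r

  // repsep: zero or more names separated by ":"; matches the empty input.
  val zeroOrMore: Parser[List[String]] = repsep(name, ":")

  // rep1sep: one or more names separated by ":"; rejects the empty input.
  val oneOrMore: Parser[List[String]] = rep1sep(name, ":")

  // Run a parser over the whole input and return the result, if any.
  def run(p: Parser[List[String]], in: String): Option[List[String]] =
    parseAll(p, in) match {
      case Success(result, _) => Some(result)
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    println(run(zeroOrMore, ""))      // the empty interaction is accepted
    println(run(oneOrMore, ""))       // the empty interaction is rejected
    println(run(oneOrMore, "a:b:c"))
  }
}
```

Because `repsep` accepts an empty interaction, a grammar built on it has to deal with that case elsewhere; `rep1sep` rules it out at the parser level.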
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8830#discussion_r40046356

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
    @@ -31,20 +32,28 @@ private[ml] case class ParsedRFormula(label: ColumnRef, terms: Seq[Term]) {
        * of the special '.' term. Duplicate terms will be removed during resolution.
        */
       def resolve(schema: StructType): ResolvedRFormula = {
    -    var includedTerms = Seq[String]()
    +    val dotTerms = expandDot(schema)
    +    var includedTerms = Seq[Seq[String]]()
         terms.foreach {
    +      case term: ColumnRef =>
    +        includedTerms :+= Seq(term.value)
    +      case ColumnInteraction(terms) =>
    +        includedTerms ++= expandInteraction(schema, terms)
           case Dot =>
    -        includedTerms ++= simpleTypes(schema).filter(_ != label.value)
    -      case ColumnRef(value) =>
    -        includedTerms :+= value
    +        includedTerms ++= dotTerms.map(Seq(_))
           case Deletion(term: Term) =>
             term match {
    -          case ColumnRef(value) =>
    -            includedTerms = includedTerms.filter(_ != value)
    +          case inner: ColumnRef =>
    +            includedTerms = includedTerms.filter(_ != Seq(inner.value))
    +          case ColumnInteraction(terms) =>
    --- End diff --

    ditto
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8830#discussion_r40046364

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
    @@ -67,19 +76,52 @@ private[ml] case class ParsedRFormula(label: ColumnRef, terms: Seq[Term]) {
         intercept
       }
    +  // expands the Dot operators in interaction terms
    +  private def expandInteraction(
    +      schema: StructType, terms: Seq[InteractionComponent]): Seq[Seq[String]] = {
    +    if (terms.isEmpty) {
    +      return Seq(Nil)
    +    }
    +
    +    val rest = expandInteraction(schema, terms.tail)
    +    val validInteractions = (terms.head match {
    +      case Dot =>
    +        expandDot(schema).filter(_ != label.value).flatMap { t =>
    --- End diff --

    remove `.filter(_ != label.value)`
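The hunk is cut off before the body of `expandInteraction` finishes, so the following is only a sketch of the recursive idea, not the PR's code. It assumes a plain list of column names in place of the schema, and it assumes that combinations repeating a column are dropped; `ExpandSketch`, `Col`, and `expand` are hypothetical names.

```scala
object ExpandSketch {
  sealed trait Component
  case object Dot extends Component
  case class Col(name: String) extends Component

  // Expand each Dot into every available column and build the cross product
  // with the expansion of the remaining components.
  def expand(columns: Seq[String], parts: Seq[Component]): Seq[Seq[String]] =
    parts match {
      case Seq() => Seq(Nil) // base case: a single empty interaction
      case head +: tail =>
        val rest = expand(columns, tail)
        val heads = head match {
          case Dot => columns
          case Col(name) => Seq(name)
        }
        for {
          h <- heads
          r <- rest
          if !r.contains(h) // assumption: skip combinations that repeat a column
        } yield h +: r
    }

  def main(args: Array[String]): Unit = {
    // Mirrors the "y ~ .:x" test case: each schema column interacts with x.
    println(expand(Seq("a", "b", "c"), Seq(Dot, Col("x"))))
  }
}
```

Note that `expand(Seq("a", "b"), Seq(Dot, Dot))` yields both `a:b` and `b:a`; removing such order-only duplicates would be a separate step, which this sketch leaves out.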
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8830#discussion_r40046365

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
    @@ -87,11 +129,17 @@ private[ml] case class ResolvedRFormula(label: String, terms: Seq[String])
      */
     private[ml] sealed trait Term
    +/** A term that may be part of an interaction, e.g. 'x' in 'x:y' */
    +private[ml] sealed trait InteractionComponent extends Term
    +
     /* R formula reference to all available columns, e.g. "." in a formula */
    -private[ml] case object Dot extends Term
    +private[ml] case object Dot extends InteractionComponent
     /* R formula reference to a column, e.g. "+ Species" in a formula */
    -private[ml] case class ColumnRef(value: String) extends Term
    +private[ml] case class ColumnRef(value: String) extends InteractionComponent
    --- End diff --

    This makes the implementation easier but code harder to understand, because a `ColRef` is not an interaction.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/8830#discussion_r40046368
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -109,7 +157,15 @@ private[ml] object RFormulaParser extends RegexParsers {
   def columnRef: Parser[ColumnRef] =
     "([a-zA-Z]|\\.[a-zA-Z_])[a-zA-Z0-9._]*".r ^^ { case a => ColumnRef(a) }
 
-  def term: Parser[Term] = intercept | columnRef | "\\.".r ^^ { case _ => Dot }
+  def dot: Parser[InteractionComponent] = "\\.".r ^^ { case _ => Dot }
--- End diff --
minor: Those could be `val` instead of `def`, right?
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-142155219 I only checked `RFormulaParser.scala`. Need more time to go through the rest.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-142160070 [Test build #42805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42805/consoleFull) for PR 8830 at commit [`41dc78b`](https://github.com/apache/spark/commit/41dc78b740f3e5a31ea3882f5881f04c87b6dd66).
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/8830#discussion_r40047990
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -87,11 +129,17 @@ private[ml] case class ResolvedRFormula(label: String, terms: Seq[String])
  */
 private[ml] sealed trait Term
 
+/** A term that may be part of an interaction, e.g. 'x' in 'x:y' */
+private[ml] sealed trait InteractionComponent extends Term
+
 /* R formula reference to all available columns, e.g. "." in a formula */
-private[ml] case object Dot extends Term
+private[ml] case object Dot extends InteractionComponent
 
 /* R formula reference to a column, e.g. "+ Species" in a formula */
-private[ml] case class ColumnRef(value: String) extends Term
+private[ml] case class ColumnRef(value: String) extends InteractionComponent
--- End diff --
I updated the parser so that Interaction doesn't entirely subsume ColRef, let me know what you think.
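The exchange above is about where a plain column reference sits in the term hierarchy. One plausible shape, sketched here under assumptions, keeps `ColumnRef` and `Dot` as components that may appear inside an interaction and models the interaction itself as a separate term. The names `ColumnInteraction` and `Deletion` appear in other diffs in this thread; their exact definitions are not shown there, so the field names below and the `describe` helper are hypothetical.

```scala
// Hypothetical, simplified term hierarchy; not the definitions from the PR.
sealed trait Term

// Something that may appear on either side of ':' in a formula.
sealed trait InteractionComponent extends Term
case object Dot extends InteractionComponent
final case class ColumnRef(value: String) extends InteractionComponent

// An interaction is its own kind of term, built from components.
final case class ColumnInteraction(terms: Seq[InteractionComponent]) extends Term

// Removal of a previously included term, e.g. "- x".
final case class Deletion(term: Term) extends Term

object TermSketch {
  // Illustrative helper: a standalone ColumnRef is matched on its own
  // and never has to be treated as a one-element interaction.
  def describe(t: Term): String = t match {
    case Dot => "all columns"
    case ColumnRef(v) => s"column $v"
    case ColumnInteraction(ts) =>
      "interaction of " + ts.map(c => describe(c)).mkString(", ")
    case Deletion(inner) => "remove " + describe(inner)
  }
}
```

Because every trait is sealed, the compiler can check that a match such as the one in `describe` covers all cases.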
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/8830#discussion_r40047963
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -31,20 +32,28 @@ private[ml] case class ParsedRFormula(label: ColumnRef, terms: Seq[Term]) {
    * of the special '.' term. Duplicate terms will be removed during resolution.
    */
   def resolve(schema: StructType): ResolvedRFormula = {
-    var includedTerms = Seq[String]()
+    val dotTerms = expandDot(schema)
+    var includedTerms = Seq[Seq[String]]()
     terms.foreach {
+      case term: ColumnRef =>
+        includedTerms :+= Seq(term.value)
+      case ColumnInteraction(terms) =>
+        includedTerms ++= expandInteraction(schema, terms)
       case Dot =>
-        includedTerms ++= simpleTypes(schema).filter(_ != label.value)
-      case ColumnRef(value) =>
-        includedTerms :+= value
+        includedTerms ++= dotTerms.map(Seq(_))
       case Deletion(term: Term) =>
         term match {
-          case ColumnRef(value) =>
-            includedTerms = includedTerms.filter(_ != value)
+          case inner: ColumnRef =>
+            includedTerms = includedTerms.filter(_ != Seq(inner.value))
+          case ColumnInteraction(terms) =>
+            val fromInteraction = expandInteraction(schema, terms).map(_.toSet)
+            includedTerms = includedTerms.filter(t => !fromInteraction.contains(t.toSet))
           case Dot =>
             // e.g. "- .", which removes all first-order terms
-            val fromSchema = simpleTypes(schema)
-            includedTerms = includedTerms.filter(fromSchema.contains(_))
+            includedTerms = includedTerms.filter {
+              case Seq(t) => !dotTerms.contains(t)
+              case _ => true
+            }
           case _: Deletion =>
             assert(false, "Deletion terms cannot be nested")
--- End diff --
Done (though, it's not really needed in ResolvedRFormula and a bit awkward to extract for use in SparkRWrappers)
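The three deletion cases in the diff above can be exercised on their own. The following is a sketch under assumptions: each included term is represented as a `Seq[String]` of column names, as in the diff, but the helper names (`removeColumn`, `removeInteractions`, `removeDot`) and the `ResolveSketch` object are hypothetical and do not exist in the PR.

```scala
object ResolveSketch {
  // "- x": drop only the first-order term for that column.
  def removeColumn(included: Seq[Seq[String]], name: String): Seq[Seq[String]] =
    included.filter(_ != Seq(name))

  // "- a:b": compare as sets, so the order of columns in an interaction does not matter.
  def removeInteractions(
      included: Seq[Seq[String]],
      removed: Seq[Seq[String]]): Seq[Seq[String]] = {
    val removedSets = removed.map(_.toSet)
    included.filter(t => !removedSets.contains(t.toSet))
  }

  // "- .": drop every first-order term that "." expands to, but keep interactions.
  def removeDot(included: Seq[Seq[String]], dotTerms: Seq[String]): Seq[Seq[String]] =
    included.filter {
      case Seq(t) => !dotTerms.contains(t)
      case _ => true
    }
}
```

Starting from the terms `a`, `b` and `a:b`, removing column `a` leaves `b` and `a:b`; removing the interaction `b:a` leaves `a` and `b`; and removing `.` leaves only `a:b`.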
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/8830#discussion_r40047964
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -67,19 +76,52 @@ private[ml] case class ParsedRFormula(label: ColumnRef, terms: Seq[Term]) {
     intercept
   }
 
+  // expands the Dot operators in interaction terms
+  private def expandInteraction(
+      schema: StructType, terms: Seq[InteractionComponent]): Seq[Seq[String]] = {
+    if (terms.isEmpty) {
+      return Seq(Nil)
+    }
+
+    val rest = expandInteraction(schema, terms.tail)
+    val validInteractions = (terms.head match {
+      case Dot =>
+        expandDot(schema).filter(_ != label.value).flatMap { t =>
--- End diff --
Done
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/8830#discussion_r40047995
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -109,7 +157,15 @@ private[ml] object RFormulaParser extends RegexParsers {
   def columnRef: Parser[ColumnRef] =
     "([a-zA-Z]|\\.[a-zA-Z_])[a-zA-Z0-9._]*".r ^^ { case a => ColumnRef(a) }
 
-  def term: Parser[Term] = intercept | columnRef | "\\.".r ^^ { case _ => Dot }
+  def dot: Parser[InteractionComponent] = "\\.".r ^^ { case _ => Dot }
+
+  def interaction: Parser[List[InteractionComponent]] = repsep(columnRef | dot, ":")
--- End diff --
Done
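The three parser rules quoted above can be tried out in a small standalone object. This sketch assumes the `scala-parser-combinators` module is on the classpath (it is a separate artifact from the Scala standard library in recent Scala versions); the rules are copied from the diff, while the object name `InteractionParserSketch`, the simplified types and the `parseInteraction` helper are hypothetical.

```scala
import scala.util.parsing.combinator.RegexParsers

// Simplified stand-ins for the types used by the quoted parser rules.
sealed trait InteractionComponent
case object Dot extends InteractionComponent
final case class ColumnRef(value: String) extends InteractionComponent

object InteractionParserSketch extends RegexParsers {
  def columnRef: Parser[ColumnRef] =
    "([a-zA-Z]|\\.[a-zA-Z_])[a-zA-Z0-9._]*".r ^^ { case a => ColumnRef(a) }

  def dot: Parser[InteractionComponent] = "\\.".r ^^ { case _ => Dot }

  // Zero or more components separated by ':'; columnRef is tried before dot.
  def interaction: Parser[List[InteractionComponent]] = repsep(columnRef | dot, ":")

  // Hypothetical helper: Some(components) if the whole input parses, None otherwise.
  def parseInteraction(s: String): Option[List[InteractionComponent]] =
    parseAll(interaction, s) match {
      case Success(result, _) => Some(result)
      case _ => None
    }
}
```

One property worth noting is that `repsep` also accepts zero repetitions, so in this sketch the empty string parses to an empty list; `rep1sep` would be the combinator to use if at least one component were required.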
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/8830#discussion_r40047992
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala ---
@@ -109,7 +157,15 @@ private[ml] object RFormulaParser extends RegexParsers {
   def columnRef: Parser[ColumnRef] =
     "([a-zA-Z]|\\.[a-zA-Z_])[a-zA-Z0-9._]*".r ^^ { case a => ColumnRef(a) }
 
-  def term: Parser[Term] = intercept | columnRef | "\\.".r ^^ { case _ => Dot }
+  def dot: Parser[InteractionComponent] = "\\.".r ^^ { case _ => Dot }
--- End diff --
Done
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-141606324 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42706/ Test PASSed.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-141606323 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-141602715 Merged build triggered.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-141602769 Merged build started.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-141603326 [Test build #42706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42706/consoleFull) for PR 8830 at commit [`15b4da7`](https://github.com/apache/spark/commit/15b4da72a6b6bf7e0e18255dcda7a72db207ba49).
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-141606284 [Test build #42706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42706/console) for PR 8830 at commit [`15b4da7`](https://github.com/apache/spark/commit/15b4da72a6b6bf7e0e18255dcda7a72db207ba49).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
GitHub user ericl opened a pull request:

    https://github.com/apache/spark/pull/8830

    [SPARK-9681] [ML] Support R feature interactions in RFormula

    This integrates the Interaction feature transformer with the SparkR R formula support (i.e. we support ':' now).

    @mengxr

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ericl/spark interaction-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8830.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #8830

commit b16795add215a879d6ab461ac53140179e184a73
Author: Eric Liang
Date:   2015-09-18T21:38:03Z

    Squashed commit of the following:

    commit ca78c26f9928f0e4a0fb8e0adf38510161178ed1
    Author: Eric Liang
    Date:   Fri Sep 18 14:37:00 2015 -0700

        Fri Sep 18 14:37:00 PDT 2015

    commit 68411c764fd39902088ab893064a6c226b694f94
    Author: Eric Liang
    Date:   Fri Sep 18 14:36:15 2015 -0700

        doc

    commit abb81ebfcb535dba56432aba6ce9fcc60035a6d7
    Author: Eric Liang
    Date:   Fri Sep 18 14:18:43 2015 -0700

        Fri Sep 18 14:18:43 PDT 2015

    commit 97750a6c1e8b7f4f985487d0e58bd4c56ca774cf
    Author: Eric Liang
    Date:   Fri Sep 18 14:09:43 2015 -0700

        Fri Sep 18 14:09:43 PDT 2015

    commit 6518f62557a035d03000575eaeae0c38d3ae4ba4
    Author: Eric Liang
    Date:   Fri Sep 18 13:03:26 2015 -0700

        Fri Sep 18 13:03:26 PDT 2015

    commit 853ab7d7f179d53de5fe5653ff0b677191a5c5f5
    Author: Eric Liang
    Date:   Fri Sep 18 12:49:32 2015 -0700

        Fri Sep 18 12:49:32 PDT 2015

    commit 7ce9c289c822ca132f22f9e709742e8293d275ef
    Author: Eric Liang
    Date:   Thu Sep 17 15:42:41 2015 -0700

        Thu Sep 17 15:42:41 PDT 2015

    commit 09b4e00b87669f785c112e0a70ddf1bbaf02dd34
    Merge: e5099f6 4fbf332
    Author: Eric Liang
    Date:   Thu Sep 17 15:41:31 2015 -0700

        Merge branch 'master' into interaction-2

        Conflicts:
            mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala

    commit e5099f695a9533e402ca15434330e2f3678d30f3
    Author: Eric Liang
    Date:   Thu Aug 6 17:21:21 2015 -0700

        tests and attribute refactorign

    commit 4c11a773e74e677f237f23949eeb9dffa8bc43f2
    Author: Eric Liang
    Date:   Wed Aug 5 23:45:35 2015 -0700

        small nits

    commit 5f7cb9b505e043039898df9860b999d5382c4ae0
    Author: Eric Liang
    Date:   Wed Aug 5 23:15:14 2015 -0700

        Wed Aug 5 23:15:14 PDT 2015

    commit 3ad5464566076570438001714582082b11981479
    Author: Eric Liang
    Date:   Wed Aug 5 23:08:34 2015 -0700

        docs

    commit 478ee8f2e1133901746e68cc202ecfc89de8eaa9
    Author: Eric Liang
    Date:   Wed Aug 5 22:57:10 2015 -0700

        add rformula test

    commit 11bb70fcdd32f390a513fa39a8d8096a60e3d22e
    Merge: 2957cb6 d5a9af3
    Author: Eric Liang
    Date:   Wed Aug 5 22:20:26 2015 -0700

        Merge branch 'master' into interaction

    commit 2957cb686264303344c02b96e5ce166a7a66a959
    Author: Eric Liang
    Date:   Wed Aug 5 22:19:28 2015 -0700

        fix parser

    commit dc8801a31cd7806ccd1e4ef902568e1d9bc85e94
    Author: Eric Liang
    Date:   Wed Aug 5 19:59:50 2015 -0700

        Wed Aug 5 19:59:50 PDT 2015

    commit a12e58e58e7a9c44611124d21c6172cb0a4c6b53
    Author: Eric Liang
    Date:   Wed Aug 5 15:58:14 2015 -0700

        Wed Aug 5 15:58:14 PDT 2015

    commit a3623aa6cbb2e248b45b489cfb216c9d87bc7c86
    Author: Eric Liang
    Date:   Tue Aug 4 20:32:39 2015 -0700

        attribute generation

    commit ab2a3477512aba52f69f3ccd0bfe620b2da0cb39
    Author: Eric Liang
    Date:   Tue Aug 4 18:14:31 2015 -0700

        combiner

    commit 0ece16cd2e30403f4fa8c5473e6cb6e7930ae52f
    Author: Eric Liang
    Date:   Mon Aug 3 20:29:16 2015 -0700

        compiles now

    commit 429cb520682dcd446f01e204f178b5c0c932e6cf
    Author: Eric Liang
    Date:   Mon Aug 3 18:32:54 2015 -0700

        first pass
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-141576487 Merged build triggered.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-141577509 [Test build #42695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42695/consoleFull) for PR 8830 at commit [`b16795a`](https://github.com/apache/spark/commit/b16795add215a879d6ab461ac53140179e184a73).
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-141584803 [Test build #42695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42695/console) for PR 8830 at commit [`b16795a`](https://github.com/apache/spark/commit/b16795add215a879d6ab461ac53140179e184a73).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-141584845 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42695/ Test FAILed.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-141576507 Merged build started.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8830#issuecomment-141584844 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7987#issuecomment-128283405 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7987#issuecomment-128284672 [Test build #40011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/40011/console) for PR 7987 at commit [`4c11a77`](https://github.com/apache/spark/commit/4c11a773e74e677f237f23949eeb9dffa8bc43f2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7987#issuecomment-128547360 Merged build triggered.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7987#issuecomment-128547373 Merged build started.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/7987#issuecomment-128471053 @ericl Shall we split this PR into two?
1. Add `Interaction` as a transformer (SPARK-9698).
2. Support feature interaction in RFormula.
After 1) is merged, people can start working on the Python API, without being blocked by 2).
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7987#issuecomment-128275868 Merged build triggered.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7987#issuecomment-128275891 Merged build started.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7987#issuecomment-128284709 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7987#issuecomment-128276283 [Test build #40009 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/40009/consoleFull) for PR 7987 at commit [`386881b`](https://github.com/apache/spark/commit/386881ba7c517b4affc90a4e91e84f860c603e11).
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7987#issuecomment-128277742 Merged build triggered.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7987#issuecomment-128277768 Merged build started.
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
GitHub user ericl opened a pull request:

    https://github.com/apache/spark/pull/7987

    [SPARK-9681] [ML] Support R feature interactions in RFormula

    This adds support for the interaction (:) operator to the RFormula feature transformer.

    Design doc from umbrella task: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit

    @mengxr

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ericl/spark interaction

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7987.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7987

----
commit 429cb520682dcd446f01e204f178b5c0c932e6cf
Author: Eric Liang e...@databricks.com
Date:   2015-08-04T01:32:54Z

    first pass

commit 0ece16cd2e30403f4fa8c5473e6cb6e7930ae52f
Author: Eric Liang e...@databricks.com
Date:   2015-08-04T03:29:16Z

    compiles now

commit ab2a3477512aba52f69f3ccd0bfe620b2da0cb39
Author: Eric Liang e...@databricks.com
Date:   2015-08-05T01:14:31Z

    combiner

commit a3623aa6cbb2e248b45b489cfb216c9d87bc7c86
Author: Eric Liang e...@databricks.com
Date:   2015-08-05T03:32:39Z

    attribute generation

commit a12e58e58e7a9c44611124d21c6172cb0a4c6b53
Author: Eric Liang e...@databricks.com
Date:   2015-08-05T22:58:14Z

    Wed Aug 5 15:58:14 PDT 2015

commit dc8801a31cd7806ccd1e4ef902568e1d9bc85e94
Author: Eric Liang e...@databricks.com
Date:   2015-08-06T02:59:50Z

    Wed Aug 5 19:59:50 PDT 2015

commit 2957cb686264303344c02b96e5ce166a7a66a959
Author: Eric Liang e...@databricks.com
Date:   2015-08-06T05:19:28Z

    fix parser

commit 11bb70fcdd32f390a513fa39a8d8096a60e3d22e
Author: Eric Liang e...@databricks.com
Date:   2015-08-06T05:20:26Z

    Merge branch 'master' into interaction

commit 478ee8f2e1133901746e68cc202ecfc89de8eaa9
Author: Eric Liang e...@databricks.com
Date:   2015-08-06T05:57:10Z

    add rformula test

commit 3ad5464566076570438001714582082b11981479
Author: Eric Liang e...@databricks.com
Date:   2015-08-06T06:08:34Z

    docs

commit 5f7cb9b505e043039898df9860b999d5382c4ae0
Author: Eric Liang e...@databricks.com
Date:   2015-08-06T06:15:14Z

    Wed Aug 5 23:15:14 PDT 2015

commit e44dd8397440f0e997a5384dc79220ac4ff2ba34
Author: Eric Liang e...@databricks.com
Date:   2015-08-06T06:45:35Z

    small nits
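For context on what the interaction (:) operator in the pull request description means, here is a minimal sketch in plain Python of how an R-style term such as `a:b` can be expanded into feature columns. It is not the code from this pull request and does not use Spark. The helper names `one_hot` and `interact` are hypothetical, and full one-hot encoding of categorical values is an assumption made for simplicity (R itself typically drops a reference level).

```python
from itertools import product


def one_hot(value, categories):
    """Encode a categorical value as a list of 0.0/1.0 over the given categories."""
    return [1.0 if value == c else 0.0 for c in categories]


def interact(*vectors):
    """Return every product formed by taking one element from each input vector.

    A numeric column is passed as a one-element vector, so numeric:numeric
    collapses to a plain product, and numeric:categorical yields the numeric
    value placed in the slot of the active category.
    """
    out = []
    for combo in product(*vectors):
        term = 1.0
        for x in combo:
            term *= x
        out.append(term)
    return out


# numeric : numeric
print(interact([2.0], [3.0]))                        # [6.0]
# numeric : categorical ("y" out of x, y, z)
print(interact([2.0], one_hot("y", ["x", "y", "z"])))  # [0.0, 2.0, 0.0]
# vector : vector
print(interact([1.0, 2.0], [3.0, 4.0]))              # [3.0, 4.0, 6.0, 8.0]
```

Treating every input as a vector keeps one code path for numeric and categorical columns; how the actual patch names and orders the resulting attributes is not shown in this thread.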
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/7987#issuecomment-128277888

    [Test build #40011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/40011/consoleFull) for PR 7987 at commit [`4c11a77`](https://github.com/apache/spark/commit/4c11a773e74e677f237f23949eeb9dffa8bc43f2).
[GitHub] spark pull request: [SPARK-9681] [ML] Support R feature interactio...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/7987#issuecomment-128283369

    [Test build #40009 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/40009/console) for PR 7987 at commit [`386881b`](https://github.com/apache/spark/commit/386881ba7c517b4affc90a4e91e84f860c603e11).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `class RInteraction(override val uid: String) extends Estimator[PipelineModel]`