[GitHub] spark pull request #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSel...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15214 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSel...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15214#discussion_r80355108 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/ChiSqSelectorSuite.scala --- @@ -76,7 +76,7 @@ class ChiSqSelectorSuite extends SparkFunSuite with MLlibTestSparkContext { LabeledPoint(1.0, Vectors.dense(Array(4.0))), LabeledPoint(1.0, Vectors.dense(Array(4.0))), LabeledPoint(2.0, Vectors.dense(Array(9.0 -val model = new ChiSqSelector().setAlpha(0.1).fit(labeledDiscreteData) +val model = new ChiSqSelector().setSelectorType("fpr").setAlpha(0.1).fit(labeledDiscreteData) --- End diff -- Added ML test case. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSel...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15214#discussion_r80355103 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -160,6 +166,12 @@ final class ChiSqSelector @Since("1.6.0") (@Since("1.6.0") override val uid: Str @Since("1.6.0") override def transformSchema(schema: StructType): StructType = { +val otherPairs = OldChiSqSelector.supportedTypeAndParamPairs.filter(_._1 == $(selectorType)) --- End diff -- Sorry, typo. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSel...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15214#discussion_r80355100 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -143,13 +149,13 @@ final class ChiSqSelector @Since("1.6.0") (@Since("1.6.0") override val uid: Str case Row(label: Double, features: Vector) => OldLabeledPoint(label, OldVectors.fromML(features)) } -var selector = new feature.ChiSqSelector() -ChiSqSelectorType.withName($(selectorType)) match { - case ChiSqSelectorType.KBest => +val selector = new feature.ChiSqSelector() +$(selectorType) match { + case OldChiSqSelector.KBest => selector.setNumTopFeatures($(numTopFeatures)) --- End diff -- Yes, updated. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSel...
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15214#discussion_r80278819 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/ChiSqSelectorSuite.scala --- @@ -76,7 +76,7 @@ class ChiSqSelectorSuite extends SparkFunSuite with MLlibTestSparkContext { LabeledPoint(1.0, Vectors.dense(Array(4.0))), LabeledPoint(1.0, Vectors.dense(Array(4.0))), LabeledPoint(2.0, Vectors.dense(Array(9.0 -val model = new ChiSqSelector().setAlpha(0.1).fit(labeledDiscreteData) +val model = new ChiSqSelector().setSelectorType("fpr").setAlpha(0.1).fit(labeledDiscreteData) --- End diff -- you should also do the same thing for https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/feature/ChiSqSelectorSuite.scala --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSel...
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15214#discussion_r80277785 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -160,6 +166,12 @@ final class ChiSqSelector @Since("1.6.0") (@Since("1.6.0") override val uid: Str @Since("1.6.0") override def transformSchema(schema: StructType): StructType = { +val otherPairs = OldChiSqSelector.supportedTypeAndParamPairs.filter(_._1 == $(selectorType)) --- End diff -- == or != ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSel...
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15214#discussion_r80277470 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -143,13 +149,13 @@ final class ChiSqSelector @Since("1.6.0") (@Since("1.6.0") override val uid: Str case Row(label: Double, features: Vector) => OldLabeledPoint(label, OldVectors.fromML(features)) } -var selector = new feature.ChiSqSelector() -ChiSqSelectorType.withName($(selectorType)) match { - case ChiSqSelectorType.KBest => +val selector = new feature.ChiSqSelector() +$(selectorType) match { + case OldChiSqSelector.KBest => selector.setNumTopFeatures($(numTopFeatures)) - case ChiSqSelectorType.Percentile => + case OldChiSqSelector.Percentile => selector.setPercentile($(percentile)) - case ChiSqSelectorType.FPR => + case OldChiSqSelector.FPR => selector.setAlpha($(alpha)) --- End diff -- ditto --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSel...
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15214#discussion_r80277277 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -143,13 +149,13 @@ final class ChiSqSelector @Since("1.6.0") (@Since("1.6.0") override val uid: Str case Row(label: Double, features: Vector) => OldLabeledPoint(label, OldVectors.fromML(features)) } -var selector = new feature.ChiSqSelector() -ChiSqSelectorType.withName($(selectorType)) match { - case ChiSqSelectorType.KBest => +val selector = new feature.ChiSqSelector() +$(selectorType) match { + case OldChiSqSelector.KBest => selector.setNumTopFeatures($(numTopFeatures)) --- End diff -- Do you need to set SelectorType here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSel...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/15214 [SPARK-17017][Follow-up][ML] Refactor of ChiSqSelector and add ML Python API. ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) You can merge this pull request into a Git repository by running: $ git pull https://github.com/yanboliang/spark spark-17017 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15214.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15214 commit 8d1536a683ac6c237f8995f335121929ad85e3c8 Author: Yanbo LiangDate: 2016-09-23T11:13:05Z Refactor of ChiSqSelector and add ML Python API. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org