[GitHub] spark pull request: [SPARK-11611][MLlib][Python] Python API for bi...
Github user yu-iskw closed the pull request at: https://github.com/apache/spark/pull/9583 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12874][ML] ML StringIndexer does not pr...
GitHub user yu-iskw opened a pull request: https://github.com/apache/spark/pull/11370

[SPARK-12874][ML] ML StringIndexer does not protect itself from column name duplication

## What changes were proposed in this pull request?

ML StringIndexer does not protect itself from column name duplication. The way `StringIndexer` and `StringIndexerModel` validate a schema still needs improvement, but it would be better to address that in a separate issue.

## How was this patch tested?

unit test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yu-iskw/spark SPARK-12874

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11370.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11370

commit 917851d3a3f554b0a0f93036da0205b8e24947f1
Author: Yu ISHIKAWA <yuu.ishik...@gmail.com>
Date: 2016-02-25T15:12:36Z

    [SPARK-12874][ML] ML StringIndexer does not protect itself from column name duplication
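The kind of guard this pull request describes can be sketched in a few lines. This is a minimal illustration in plain Python, not the code from the patch: the function name `validate_output_col` and the error message are hypothetical, and a schema is modelled as a simple list of column names.

```python
def validate_output_col(input_cols, output_col):
    """Return the output schema, refusing to duplicate an existing column name.

    Hypothetical helper for illustration only; it is not part of Spark.
    `input_cols` is the list of column names already present in the schema.
    """
    if output_col in input_cols:
        # Fail early instead of silently producing two columns with one name.
        raise ValueError(f"Output column {output_col} already exists.")
    return list(input_cols) + [output_col]


if __name__ == "__main__":
    print(validate_output_col(["id", "category"], "categoryIndex"))
```

Checking at schema-validation time means the error surfaces before any data is processed.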
[GitHub] spark pull request: [SPARK-13292][ML][PYTHON] QuantileDiscretizer ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/11362#issuecomment-188610588 @mengxr can you review it when you have time? Thanks!
[GitHub] spark pull request: [SPARK-13292][ML][PYTHON] QuantileDiscretizer ...
GitHub user yu-iskw opened a pull request: https://github.com/apache/spark/pull/11362

[SPARK-13292][ML][PYTHON] QuantileDiscretizer should take random seed in PySpark

## What changes were proposed in this pull request?

QuantileDiscretizer in Python should also specify a random seed.

## How was this patch tested?

unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yu-iskw/spark SPARK-13292

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11362.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11362

commit 02ffa763358ba0300f0ffcf6f8755951336a5b17
Author: Yu ISHIKAWA <yuu.ishik...@gmail.com>
Date: 2016-02-25T04:15:07Z

    [SPARK-13292][ML][PYTHON] QuantileDiscretizer should take random seed in PySpark
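Why a seed matters for a quantile-based discretizer can be shown with a small plain-Python sketch. It assumes the split points are computed from a random sample of the input, which the message above does not state explicitly. The function `quantile_splits` and its parameters are hypothetical and do not reflect the PySpark API.

```python
import random


def quantile_splits(values, num_buckets, sample_size, seed):
    """Pick bucket boundaries from a seeded random sample of `values`.

    Hypothetical illustration only; this is not Spark's implementation.
    """
    if not values:
        raise ValueError("values must not be empty")
    rng = random.Random(seed)  # a private generator, so the seed fully determines the sample
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    # Take the element at each i/num_buckets position of the sorted sample.
    return [sample[(i * len(sample)) // num_buckets] for i in range(1, num_buckets)]


if __name__ == "__main__":
    data = list(range(1000))
    first = quantile_splits(data, num_buckets=4, sample_size=100, seed=42)
    second = quantile_splits(data, num_buckets=4, sample_size=100, seed=42)
    print(first == second)  # the same seed reproduces the same splits
```

Without a fixed seed, two runs over the same data may draw different samples and therefore produce different bucket boundaries.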
[GitHub] spark pull request: [SPARK-13265][ML] Refactoring of basic ML impo...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/11151#issuecomment-182865831 cc @mengxr
[GitHub] spark pull request: [SPARK-13265][ML] Refactoring of basic ML impo...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/11151#issuecomment-183116192 Thank you for merging it.
[GitHub] spark pull request: [SPARK-11515][ML] QuantileDiscretizer should t...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9535#issuecomment-183115660 Thank you for merging it. I will work on the issue. Thanks!
[GitHub] spark pull request: [SPARK-13265][ML] Refactoring of basic ML impo...
GitHub user yu-iskw opened a pull request: https://github.com/apache/spark/pull/11151

[SPARK-13265][ML] Refactoring of basic ML import/export for other file system besides HDFS

@jkbradley I tried to improve the function that exports a model. When I tried to export a model to S3 under Spark 1.6, it did not work. So, export should support S3 besides HDFS. Can you review it when you have time? Thanks!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yu-iskw/spark SPARK-13265

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11151.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11151
[GitHub] spark pull request: [SPARK-11515][ML] QuantileDiscretizer should t...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9535#issuecomment-182366711 @jkbradley ping
[GitHub] spark pull request: [SPARK-6519][ML] Add spark.ml API for bisectin...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9604#issuecomment-173415286 Thank you for merging it!
[GitHub] spark pull request: [SPARK-6519][ML] Add spark.ml API for bisectin...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9604#issuecomment-173152191 Jenkins, test this please.
[GitHub] spark pull request: [SQL][Minor] DigestUtils.shaHex is deprecated ...
GitHub user yu-iskw opened a pull request: https://github.com/apache/spark/pull/10831

[SQL][Minor] DigestUtils.shaHex is deprecated in misc.scala

Since `DigestUtils.shaHex` is deprecated, we should replace `shaHex` with `sha1Hex`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yu-iskw/spark minor-shaHex

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10831.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10831

commit ba303af06967a631aa4dc937b4ab354a6a578e86
Author: Yu ISHIKAWA <yuu.ishik...@gmail.com>
Date: 2016-01-19T07:57:03Z

    [SQL][Minor] DigestUtils.shaHex is deprecated
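For context, `shaHex` in Apache Commons Codec computes a SHA-1 digest and returns it as a hex string, which is why `sha1Hex` is the suggested replacement. The sketch below shows the same computation with Python's standard `hashlib`, used here only to keep the example self-contained. The helper name `sha1_hex` is made up for this illustration and is not the Scala code touched by the PR.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of `data` as a lowercase hex string."""
    return hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    # Standard SHA-1 test vector for the input "abc".
    print(sha1_hex(b"abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
```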
[GitHub] spark pull request: [SQL][Minor] DigestUtils.shaHex is deprecated ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/10831#issuecomment-172788036 @JoshRosen sorry, thank you for letting me know. I'll close this PR. Thanks!
[GitHub] spark pull request: [SQL][Minor] DigestUtils.shaHex is deprecated ...
Github user yu-iskw closed the pull request at: https://github.com/apache/spark/pull/10831
[GitHub] spark pull request: [SPARK-11515][ML] QuantileDiscretizer should t...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/9535#discussion_r50203290

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala ---
    @@ -23,8 +23,8 @@ import org.apache.spark.Logging
     import org.apache.spark.annotation.{Experimental, Since}
     import org.apache.spark.ml._
     import org.apache.spark.ml.attribute.NominalAttribute
    +import org.apache.spark.ml.param.shared.{HasSeed, HasInputCol, HasOutputCol}
    --- End diff --

Thank you for pointing that out. I forgot to run the lint script.
[GitHub] spark pull request: [SPARK-11515][ML] QuantileDiscretizer should t...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9535#issuecomment-173062339 @jkbradley I have rebased with master. Can you review it?
[GitHub] spark pull request: [SPARK-6519][ML] Add spark.ml API for bisectin...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9604#issuecomment-173055593 Sure!
[GitHub] spark pull request: [SPARK-12404] [SQL] Ensure objects passed to S...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/10357#issuecomment-165461189 @sarutak thank you for sending this PR. @marmbrus @rxin could you review it? I think this is a rather big issue. We should fix it before releasing Spark 1.6.
[GitHub] spark pull request: [SPARK-6518][MLlib][Example][DOC] Add example ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9952#issuecomment-165277966 Thank you for merging it!
[GitHub] spark pull request: [SPARK-9694] [ML] Add random seed Param to Sca...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9108#issuecomment-165277685 Thank you for merging this!
[GitHub] spark pull request: [SPARK-12215][ML][DOC] User guide section for ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/10244#issuecomment-164964860 @jkbradley thank you so much for the review. I modified the 2 points.
[GitHub] spark pull request: [SPARK-6518][MLlib][Example][DOC] Add example ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9952#issuecomment-164966445 @jkbradley thanks for the review. I fixed the incorrect indentation and excluded the import statements for `SparkConf` and `JavaSparkContext`.
[GitHub] spark pull request: [SPARK-11515][ML] QuantileDiscretizer should t...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9535#issuecomment-164631771 I have rebased this PR with master.
[GitHub] spark pull request: [SPARK-11515][ML] QuantileDiscretizer should t...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9535#issuecomment-164604399 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-6518][MLlib][Example][DOC] Add example ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9952#issuecomment-164341602 Oh, I'm terribly sorry about that. Pushing the update had failed.
- Modified what you pointed out
- Added a Java example and its doc
[GitHub] spark pull request: [SPARK-12215][ML][DOC] User guide section for ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/10244#issuecomment-164365254 @jkbradley thank you for reviewing it. I modified a few points. Can you review it again?
- Include the import statements of JavaKMeansExample in the doc
- Simplify the KMeansExample code, according to BinarizerExample.scala
- Modify the description of KMeans in `ml-clustering.md`
[GitHub] spark pull request: [SPARK-6518][MLlib][Example][DOC] Add example ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9952#issuecomment-163487878 ping @jkbradley
[GitHub] spark pull request: [SPARK-12215][ML][DOC] User guide section for ...
GitHub user yu-iskw opened a pull request: https://github.com/apache/spark/pull/10244

[SPARK-12215][ML][DOC] User guide section for KMeans in spark.ml

cc @jkbradley

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yu-iskw/spark SPARK-12215

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10244.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10244

commit e8e9c14fcaa994644dd7e2410b1c4d70eb842867
Author: Yu ISHIKAWA <yuu.ishik...@gmail.com>
Date: 2015-12-10T04:42:03Z

    [SPARK-12215][ML][DOC] User guide section for KMeans in spark.ml
[GitHub] spark pull request: [SPARK-12215][ML][DOC] User guide section for ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/10244#issuecomment-163487655 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-12215][ML][DOC] User guide section for ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/10244#issuecomment-163488306 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-10259] [ML] Add @since annotation to ml...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8534#issuecomment-161946643 @sarutak thank you for your help!
[GitHub] spark pull request: [SPARK-6518][MLlib][Example] Add example code ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9952#issuecomment-161983105 @jkbradley ping
[GitHub] spark pull request: [SPARK-10266][Documentation, ML] Fixed @Since ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9338#issuecomment-161460435 Thank you for merging it!
[GitHub] spark pull request: [SPARK-6518][MLlib][Example] Add example code ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9952#issuecomment-161466765 I moved the documentation from https://github.com/apache/spark/pull/9968 into this PR, because that PR depends on this one. I modified the docs for `BisectingKMeans` to use `include_example`.
[GitHub] spark pull request: [SPARK-6518][MLlib][DOC] Add example code and ...
Github user yu-iskw closed the pull request at: https://github.com/apache/spark/pull/9968
[GitHub] spark pull request: [SPARK-6518][MLlib][DOC] Add example code and ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9968#issuecomment-161466498 @jkbradley in the end, I want to merge this PR into https://github.com/apache/spark/pull/9952, because this PR depends on the example code.
[GitHub] spark pull request: [SPARK-10266][Documentation, ML] Fixed @Since ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9338#issuecomment-161301469 I'll fix them soon!
[GitHub] spark pull request: [SPARK-6518][MLlib][Example] Add example code ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9952#issuecomment-160012938 All right. I'll fix it.
[GitHub] spark pull request: [SPARK-6518][MLlib][Example] Add example code ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9952#issuecomment-160019724 @jkbradley could you review it?
[GitHub] spark pull request: [SPARK-6518][MLlib][DOC] Add example code and ...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/9968#discussion_r46012436

    --- Diff: data/mllib/sample_bisecting_kmeans_data.txt ---
    @@ -0,0 +1,20 @@
    +6.4,2.7,5.3,1.9
    --- End diff --

Yes, we can. I added a new data file because it felt a little strange to me to reuse the one for k-means. So, I'll remove the file and use the sample data for k-means instead.
[GitHub] spark pull request: [SPARK-6518][MLlib][DOC] Add example code and ...
GitHub user yu-iskw opened a pull request: https://github.com/apache/spark/pull/9968

[SPARK-6518][MLlib][DOC] Add example code and user guide for bisecting k-means

cc @jkbradley

This PR relates to https://github.com/apache/spark/pull/9952.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yu-iskw/spark SPARK-6518.docs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9968.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #9968

commit e476b0a6a8270d937255b0334879aa065cbb22ec
Author: Yu ISHIKAWA <yuu.ishik...@gmail.com>
Date: 2015-11-25T08:50:58Z

    [SPARK-6518][MLlib][DOC] Add example code and user guide for bisecting k-means
[GitHub] spark pull request: [WIP][SPARK-11602] [MLlib] Refine visibility f...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/9939#discussion_r45815661 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala --- @@ -33,7 +33,7 @@ import org.apache.spark.rdd.RDD @Since("1.6.0") @Experimental class BisectingKMeansModel @Since("1.6.0") ( -@Since("1.6.0") val root: ClusteringTreeNode +@Since("1.6.0")private[clustering] val root: ClusteringTreeNode --- End diff -- Sorry. It's not a public parameter.
[GitHub] spark pull request: [SPARK-10266][Documentation, ML] Fixed @Since ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9338#issuecomment-159444594 Sure. I'll resolve the conflicts by rebasing onto master. Just a minute!
[GitHub] spark pull request: [SPARK-10266][Documentation, ML] Fixed @Since ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9338#issuecomment-159457312 @jkbradley I have rebased onto master. @noel-smith could you take a look when you have time?
[GitHub] spark pull request: [SPARK-6518][MLlib] Add example code and user ...
GitHub user yu-iskw opened a pull request: https://github.com/apache/spark/pull/9952 [SPARK-6518][MLlib] Add example code and user guide for bisecting k-means This PR includes only the example code so that it can be finished quickly. I'll send another PR for the docs soon. You can merge this pull request into a Git repository by running: $ git pull https://github.com/yu-iskw/spark SPARK-6518 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9952.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9952 commit 7c8005d51283c818c6dde4737ae89571ad0d417d Author: Yu ISHIKAWA <yuu.ishik...@gmail.com> Date: 2015-11-25T02:02:56Z [SPARK-6518][MLlib] Add example code and user guide for bisecting k-means
[GitHub] spark pull request: [SPARK-10259] [ML] Add @since annotation to ml...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8534#issuecomment-159466727 @taishi-oss thank you for the update! LGTM @jkbradley could you take a look at this PR?
[GitHub] spark pull request: [SPARK-10263] [ML] Add @Since annotation to ml...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8935#issuecomment-159466857 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-10263] [ML] Add @Since annotation to ml...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8935#issuecomment-159467655 @taishi-oss thank you for your contribution. Could you modify this PR like https://github.com/apache/spark/pull/8534? We should add the annotation to all public classes/objects, public methods, and public variables.
[GitHub] spark pull request: [SPARK-10259] [ML] Add @since annotation to ml...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8534#issuecomment-159468625 @jkbradley could you run tests for this PR? This is the first PR for @taishi-oss.
[GitHub] spark pull request: [SPARK-6518][MLlib] Add example code and user ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9952#issuecomment-159470079 Jenkins, retest this please
[GitHub] spark pull request: [SPARK-6518][MLlib] Add example code and user ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9952#issuecomment-159474623 @jkbradley could you review this PR?
[GitHub] spark pull request: [Minor] Remove unnecessary spaces in `include_...
GitHub user yu-iskw opened a pull request: https://github.com/apache/spark/pull/9960 [Minor] Remove unnecessary spaces in `include_example.rb` You can merge this pull request into a Git repository by running: $ git pull https://github.com/yu-iskw/spark minor-remove-spaces Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9960.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9960 commit 7d093816541b03915ae862fd1f8450f05369ce8b Author: Yu ISHIKAWA <yuu.ishik...@gmail.com> Date: 2015-11-25T05:46:33Z [Minor] Remove unnecessary spaces in `include_example.rb`
[GitHub] spark pull request: [SPARK-10991][ML] logistic regression training...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9037#issuecomment-157283184 LGTM
[GitHub] spark pull request: [SPARK-10259] [ML] Add @since annotation to ml...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8534#issuecomment-156182872 ping @taishi-oss
[GitHub] spark pull request: SPARK-11420 Updating Stddev support via Impera...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9380#issuecomment-156181157 @JihongMA thanks for the update! Could you revert `Skewness.scala` and `Kurtosis.scala`? I don't think those changes relate to the issue. I know this is a minor thing, but we shouldn't change them in this PR.
[GitHub] spark pull request: [SPARK-10266][Documentation, ML] Fixed @Since ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9338#issuecomment-156187540 @noel-smith could you review it when you have time? Thanks!
[GitHub] spark pull request: SPARK-11420 Updating Stddev support via Impera...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9380#issuecomment-156177206 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-6519][ML] Add spark.ml API for bisectin...
GitHub user yu-iskw opened a pull request: https://github.com/apache/spark/pull/9604 [SPARK-6519][ML] Add spark.ml API for bisecting k-means You can merge this pull request into a Git repository by running: $ git pull https://github.com/yu-iskw/spark SPARK-6519 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9604.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9604 commit 58985b583167bc5b2da1345c8e48482fe7fb8bf5 Author: Yu ISHIKAWA <yuu.ishik...@gmail.com> Date: 2015-11-04T01:31:00Z [SPARK-6519][ML] Add spark.ml API for bisecting k-means
[GitHub] spark pull request: [SPARK-6519][ML] Add spark.ml API for bisectin...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9604#issuecomment-155587648 Thanks!
[GitHub] spark pull request: [SPARK-6519][ML] Add spark.ml API for bisectin...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9604#issuecomment-155587079 @jkbradley @mengxr could you review it when you have time? Thanks.
[GitHub] spark pull request: [SPARK-11566][MLlib][Python] Refactoring Gauss...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9534#issuecomment-155608020 cc @davies
[GitHub] spark pull request: [SPARK-11611][MLlib][Python] Python API for bi...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9583#issuecomment-155271852 @mengxr could you review it?
[GitHub] spark pull request: [SPARK-11611][MLlib][Python] Python API for bi...
GitHub user yu-iskw opened a pull request: https://github.com/apache/spark/pull/9583 [SPARK-11611][MLlib][Python] Python API for bisecting k-means You can merge this pull request into a Git repository by running: $ git pull https://github.com/yu-iskw/spark SPARK-11611 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9583.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9583 commit b8aa40d75fb54dbfa45b2d49b70f0280066e2d74 Author: Yu ISHIKAWA <yuu.ishik...@gmail.com> Date: 2015-11-03T08:30:11Z [SPARK-11611][MLlib][Python] Python API for bisecting k-means
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-155175314 @jkbradley sure!
[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/5267#discussion_r44322215 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeans.scala --- @@ -0,0 +1,489 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.clustering + +import java.util.Random + +import scala.collection.mutable + +import org.apache.spark.Logging +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.api.java.JavaRDD +import org.apache.spark.mllib.linalg.{BLAS, Vector, Vectors} +import org.apache.spark.mllib.util.MLUtils +import org.apache.spark.rdd.RDD +import org.apache.spark.storage.StorageLevel + +/** + * A bisecting k-means algorithm based on the paper "A comparison of document clustering techniques" + * by Steinbach, Karypis, and Kumar, with modification to fit Spark. + * The algorithm starts from a single cluster that contains all points. + * Iteratively it finds divisible clusters on the bottom level and bisects each of them using + * k-means, until there are `k` leaf clusters in total or no leaf clusters are divisible. 
+ * The bisecting steps of clusters on the same level are grouped together to increase parallelism. + * If bisecting all divisible clusters on the bottom level would result more than `k` leaf clusters, + * larger clusters get higher priority. + * + * @param k the desired number of leaf clusters (default: 4). The actual number could be smaller if + * there are no divisible leaf clusters. + * @param maxIterations the max number of k-means iterations to split clusters (default: 20) + * @param minDivisibleClusterSize the minimum number of points (if >= 1.0) or the minimum proportion + *of points (if < 1.0) of a divisible cluster (default: 1) + * @param seed a random seed (default: hash value of the class name) + * + * @see [[http://glaros.dtc.umn.edu/gkhome/fetch/papers/docclusterKDDTMW00.pdf + * Steinbach, Karypis, and Kumar, A comparison of document clustering techniques, + * KDD Workshop on Text Mining, 2000.]] + */ +@Since("1.6.0") +@Experimental +class BisectingKMeans private ( +private var k: Int, +private var maxIterations: Int, +private var minDivisibleClusterSize: Double, +private var seed: Long) extends Logging { + + import BisectingKMeans._ + + /** + * Constructs with the default configuration + */ + @Since("1.6.0") + def this() = this(4, 20, 1.0, classOf[BisectingKMeans].getName.##) + + /** + * Sets the desired number of leaf clusters (default: 4). + * The actual number could be smaller if there are no divisible leaf clusters. + */ + @Since("1.6.0") + def setK(k: Int): this.type = { +require(k > 0, s"k must be positive but got $k.") +this.k = k +this + } + + /** + * Gets the desired number of leaf clusters. + */ + @Since("1.6.0") + def getK: Int = this.k + + /** + * Sets the max number of k-means iterations to split clusters (default: 20). 
+ */ + @Since("1.6.0") + def setMaxIterations(maxIterations: Int): this.type = { +require(maxIterations > 0, s"maxIterations must be positive but got $maxIterations.") +this.maxIterations = maxIterations +this + } + + /** + * Gets the max number of k-means iterations to split clusters. + */ + @Since("1.6.0") + def getMaxIterations: Int = this.maxIterations + + /** + * Sets the minimum number of points (if >= `1.0`) or the minimum proportion of points + * (if < `1.0`) of a divisible cluster (default: 1). + */ + @Since("1.6.0") + def setMinDivisibleClusterSize(
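Note on the Scaladoc quoted above: it describes two rules that are easy to misread, namely how `minDivisibleClusterSize` is interpreted (a point count if >= 1.0, a proportion if < 1.0) and how larger clusters get priority when bisecting every divisible cluster would exceed `k`. The following is only a minimal sketch of those two rules in plain Scala, without Spark. The object and method names (`BisectSelectionSketch`, `minSize`, `chooseClustersToBisect`) are hypothetical and do not come from the patch, and the extra requirement that a cluster hold at least two points is an assumption of this sketch.

```scala
object BisectSelectionSketch {

  /** Hypothetical helper: turns minDivisibleClusterSize into a point-count threshold. */
  def minSize(minDivisibleClusterSize: Double, totalPoints: Long): Long =
    if (minDivisibleClusterSize >= 1.0) {
      // Values >= 1.0 are read as an absolute number of points.
      math.ceil(minDivisibleClusterSize).toLong
    } else {
      // Values < 1.0 are read as a proportion of all points.
      math.ceil(minDivisibleClusterSize * totalPoints).toLong
    }

  /**
   * Hypothetical helper: picks the leaf clusters to bisect on one level.
   * `leafSizes` maps a cluster id to its number of points.
   */
  def chooseClustersToBisect(
      leafSizes: Map[Int, Long],
      k: Int,
      minDivisibleClusterSize: Double): Seq[Int] = {
    val totalPoints = leafSizes.values.sum
    val threshold = minSize(minDivisibleClusterSize, totalPoints)
    // Each bisection adds exactly one leaf, so at most k - (current leaves) splits fit.
    val budget = math.max(k - leafSizes.size, 0)
    leafSizes.toSeq
      // Assumption of this sketch: a cluster also needs at least two points to be split.
      .filter { case (_, size) => size >= threshold && size >= 2 }
      // Larger clusters first; the cluster id breaks ties deterministically.
      .sortBy { case (id, size) => (-size, id) }
      .take(budget)
      .map { case (id, _) => id }
  }
}
```

For example, with leaf sizes `Map(0 -> 50L, 1 -> 30L, 2 -> 20L)`, `k = 4`, and `minDivisibleClusterSize = 1.0`, all three leaves are divisible but only one more leaf fits, so the sketch selects only cluster 0, the largest.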
[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/5267#discussion_r44321863 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeans.scala --- @@ -0,0 +1,489 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.clustering + +import java.util.Random + +import scala.collection.mutable + +import org.apache.spark.Logging +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.api.java.JavaRDD +import org.apache.spark.mllib.linalg.{BLAS, Vector, Vectors} +import org.apache.spark.mllib.util.MLUtils +import org.apache.spark.rdd.RDD +import org.apache.spark.storage.StorageLevel + +/** + * A bisecting k-means algorithm based on the paper "A comparison of document clustering techniques" + * by Steinbach, Karypis, and Kumar, with modification to fit Spark. + * The algorithm starts from a single cluster that contains all points. + * Iteratively it finds divisible clusters on the bottom level and bisects each of them using + * k-means, until there are `k` leaf clusters in total or no leaf clusters are divisible. 
+ * The bisecting steps of clusters on the same level are grouped together to increase parallelism. + * If bisecting all divisible clusters on the bottom level would result more than `k` leaf clusters, + * larger clusters get higher priority. + * + * @param k the desired number of leaf clusters (default: 4). The actual number could be smaller if + * there are no divisible leaf clusters. + * @param maxIterations the max number of k-means iterations to split clusters (default: 20) + * @param minDivisibleClusterSize the minimum number of points (if >= 1.0) or the minimum proportion + *of points (if < 1.0) of a divisible cluster (default: 1) + * @param seed a random seed (default: hash value of the class name) + * + * @see [[http://glaros.dtc.umn.edu/gkhome/fetch/papers/docclusterKDDTMW00.pdf + * Steinbach, Karypis, and Kumar, A comparison of document clustering techniques, + * KDD Workshop on Text Mining, 2000.]] + */ +@Since("1.6.0") +@Experimental +class BisectingKMeans private ( +private var k: Int, +private var maxIterations: Int, +private var minDivisibleClusterSize: Double, +private var seed: Long) extends Logging { + + import BisectingKMeans._ + + /** + * Constructs with the default configuration + */ + @Since("1.6.0") + def this() = this(4, 20, 1.0, classOf[BisectingKMeans].getName.##) + + /** + * Sets the desired number of leaf clusters (default: 4). + * The actual number could be smaller if there are no divisible leaf clusters. + */ + @Since("1.6.0") + def setK(k: Int): this.type = { +require(k > 0, s"k must be positive but got $k.") +this.k = k +this + } + + /** + * Gets the desired number of leaf clusters. + */ + @Since("1.6.0") + def getK: Int = this.k + + /** + * Sets the max number of k-means iterations to split clusters (default: 20). 
+ */ + @Since("1.6.0") + def setMaxIterations(maxIterations: Int): this.type = { +require(maxIterations > 0, s"maxIterations must be positive but got $maxIterations.") +this.maxIterations = maxIterations +this + } + + /** + * Gets the max number of k-means iterations to split clusters. + */ + @Since("1.6.0") + def getMaxIterations: Int = this.maxIterations + + /** + * Sets the minimum number of points (if >= `1.0`) or the minimum proportion of points + * (if < `1.0`) of a divisible cluster (default: 1). + */ + @Since("1.6.0") + def setMinDivisibleClusterSize(
[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/5267#issuecomment-155314648 @jkbradley thanks!
[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/5267#issuecomment-155231942 @freeman-lab thank you for your great support!
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-155232384 @jkbradley I sent the PR at https://github.com/apache/spark/pull/9577.
[GitHub] spark pull request: [SPARK-11610][MLlib][Python][Docs] Make the do...
GitHub user yu-iskw opened a pull request: https://github.com/apache/spark/pull/9577 [SPARK-11610][MLlib][Python][Docs] Make the docs of LDAModel.describeTopics in Python more specific cc @jkbradley You can merge this pull request into a Git repository by running: $ git pull https://github.com/yu-iskw/spark SPARK-11610 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9577.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9577 commit e07f118ccfe9e80befaa00686c28020ba1a84350 Author: Yu ISHIKAWA <yuu.ishik...@gmail.com> Date: 2015-11-09T22:56:45Z [SPARK-11610][MLlib][Python][Docs] Make the docs of LDAModel.describeTopics in Python more specific
[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/5267#issuecomment-155226627 Thank you for merging it!!
[GitHub] spark pull request: [SPARK-11610][MLlib][Python][Docs] Make the do...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9577#issuecomment-155244740 Thanks for merging it!
[GitHub] spark pull request: [SPARK-10863][SPARKR] Method coltypes() (New v...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9579#issuecomment-155249581 @olarayej could you please check the style warnings with `./dev/lint-r` on your local computer before pushing commits? Thanks.
[GitHub] spark pull request: [SPARK-10280][MLlib][PySpark][Docs] Add @since...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8690#issuecomment-154885652 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-10280][MLlib][PySpark][Docs] Add @since...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8690#issuecomment-154885616 @noel-smith thank you for the review.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154585403 @jkbradley @davies could you review it? I modified the type conversion using `SerDe.dumps`.
[GitHub] spark pull request: [SPARK-11515][ML] QuantileDiscretizer should t...
GitHub user yu-iskw opened a pull request: https://github.com/apache/spark/pull/9535 [SPARK-11515][ML] QuantileDiscretizer should take random seed cc @jkbradley You can merge this pull request into a Git repository by running: $ git pull https://github.com/yu-iskw/spark SPARK-11515 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9535.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9535 commit bfabeb2c88227c90d68a7699106accc01f1bf2f9 Author: Yu ISHIKAWA <yuu.ishik...@gmail.com> Date: 2015-11-07T02:28:35Z [SPARK-11515][ML] QuantileDiscretizer should take random seed
[GitHub] spark pull request: [SPARK-11566][MLlib][Python] Refactoring Gauss...
GitHub user yu-iskw opened a pull request: https://github.com/apache/spark/pull/9534 [SPARK-11566][MLlib][Python] Refactoring GaussianMixtureModel.gaussians in Python cc @jkbradley You can merge this pull request into a Git repository by running: $ git pull https://github.com/yu-iskw/spark SPARK-11566 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9534.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9534 commit 7334add72a43832e27652d5f917c543ac8f4d57f Author: Yu ISHIKAWA <yuu.ishik...@gmail.com> Date: 2015-11-07T01:31:20Z [SPARK-11566][MLlib][Python] Refactoring GaussianMixtureModel.gaussians in Python
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154629369 Jenkins, test this please
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154643744 Thank you for merging it and your great support!
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154618704 Jenkins, test this please
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154617157 Jenkins, test this please
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154640689 @davies thanks for the review. I fixed them.
[GitHub] spark pull request: [SPARK-8514] LU factorization on BlockMatrix
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8563#issuecomment-154501697 @nilmeier thanks. This document would help you with contributing to Spark. https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
[GitHub] spark pull request: [SPARK-11531] [ML] : SparseVector error Msg
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9525#issuecomment-154508219

I think it would be natural to fix the Scala error message and fix the condition in Python. The algorithm checks not whether the indices contain duplicate values, but whether the indices are sorted. @srowen What do you think?

## Python

Replace `>=` with `>`.

```
if self.indices[i] > self.indices[i + 1]:
    raise TypeError("indices array must be sorted")
```

## Scala

Change the message.

```
require(prev < i, s"indices array must be sorted: $i.")
```
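The two comparison operators mentioned in this comment (`>=` and `>`) differ only in how they treat repeated indices. The following is a minimal standalone sketch, not the PySpark implementation: the function name `validate_indices` and the `allow_duplicates` flag are hypothetical and exist only to put the two conditions side by side.

```python
def validate_indices(indices, allow_duplicates=False):
    """Raise TypeError when `indices` is out of order.

    Hypothetical helper for illustration only.
    allow_duplicates=False rejects a pair when indices[i] >= indices[i + 1],
    so the sequence must be strictly increasing.
    allow_duplicates=True rejects a pair only when indices[i] > indices[i + 1],
    so repeated indices pass the check.
    """
    for i in range(len(indices) - 1):
        if allow_duplicates:
            out_of_order = indices[i] > indices[i + 1]
        else:
            out_of_order = indices[i] >= indices[i + 1]
        if out_of_order:
            raise TypeError("indices array must be sorted")


def passes(indices, allow_duplicates=False):
    """Return True when validate_indices accepts the input."""
    try:
        validate_indices(indices, allow_duplicates)
        return True
    except TypeError:
        return False
```

With this sketch, `[0, 2, 2]` is rejected under the `>=` condition and accepted under the `>` condition, while `[0, 3, 2]` is rejected under both. In other words, switching to `>` changes what counts as valid input, not just the wording of the error.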
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r44164309 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.mllib.api.python + +import scala.collection.JavaConverters + +import org.apache.spark.SparkContext +import org.apache.spark.mllib.clustering.LDAModel +import org.apache.spark.mllib.linalg.Matrix + +/** + * Wrapper around LDAModel to provide helper methods in Python + */ +private[python] class LDAModelWrapper(model: LDAModel) { + + def topicsMatrix(): Matrix = model.topicsMatrix + + def vocabSize(): Int = model.vocabSize + + def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize) + + def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = { + +val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) => +Array.empty[Any] ++ terms ++ termWeights + }.toSeq +JavaConverters.seqAsJavaListConverter(seq).asJava --- End diff -- ping @davies
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r44168908 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.mllib.api.python + +import scala.collection.JavaConverters + +import org.apache.spark.SparkContext +import org.apache.spark.mllib.clustering.LDAModel +import org.apache.spark.mllib.linalg.Matrix + +/** + * Wrapper around LDAModel to provide helper methods in Python + */ +private[python] class LDAModelWrapper(model: LDAModel) { + + def topicsMatrix(): Matrix = model.topicsMatrix + + def vocabSize(): Int = model.vocabSize + + def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize) + + def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = { + +val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) => +Array.empty[Any] ++ terms ++ termWeights + }.toSeq +JavaConverters.seqAsJavaListConverter(seq).asJava --- End diff -- I tried to test serialization directly, and it worked well. Why do we fail to serialize the return value of `describeTopics`? https://gist.github.com/yu-iskw/22fb83895024a29ea048
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r44171965 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.mllib.api.python + +import scala.collection.JavaConverters + +import org.apache.spark.SparkContext +import org.apache.spark.mllib.clustering.LDAModel +import org.apache.spark.mllib.linalg.Matrix + +/** + * Wrapper around LDAModel to provide helper methods in Python + */ +private[python] class LDAModelWrapper(model: LDAModel) { + + def topicsMatrix(): Matrix = model.topicsMatrix + + def vocabSize(): Int = model.vocabSize + + def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize) + + def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = { + +val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) => +Array.empty[Any] ++ terms ++ termWeights + }.toSeq +JavaConverters.seqAsJavaListConverter(seq).asJava --- End diff -- Thanks! Let me think about it, because `call` in Python's mllib doesn't have an `encoding` option. Sorry, one more thing: what is the difference between the two cases in this gist? Comparing the return value with `1` seems to work. However, comparing it with the expected value fails. https://gist.github.com/yu-iskw/59c66bb90d9311c0b408
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r44185186 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.mllib.api.python + +import scala.collection.JavaConverters + +import org.apache.spark.SparkContext +import org.apache.spark.mllib.clustering.LDAModel +import org.apache.spark.mllib.linalg.Matrix + +/** + * Wrapper around LDAModel to provide helper methods in Python + */ +private[python] class LDAModelWrapper(model: LDAModel) { + + def topicsMatrix(): Matrix = model.topicsMatrix + + def vocabSize(): Int = model.vocabSize + + def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize) + + def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = { + +val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) => +Array.empty[Any] ++ terms ++ termWeights + }.toSeq +JavaConverters.seqAsJavaListConverter(seq).asJava --- End diff -- Thanks! I did it!!
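The quoted diff concatenates each topic's `terms` and `termWeights` into one flat array (`Array.empty[Any] ++ terms ++ termWeights`). As a rough illustration of what a consumer of that layout would have to do, here is a minimal Python sketch. It assumes each topic arrives as a single flat list whose first half holds term indices and whose second half holds the matching weights; the function name `split_topic` is hypothetical and this is not the code that was merged.

```python
def split_topic(flat):
    """Split one flattened topic row into (terms, term_weights).

    Assumed layout (taken from the `++` concatenation in the quoted diff):
    [term_0, ..., term_{n-1}, weight_0, ..., weight_{n-1}]
    """
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of entries per topic")
    half = len(flat) // 2
    terms = [int(t) for t in flat[:half]]        # term indices
    weights = [float(w) for w in flat[half:]]    # weights aligned with terms
    return terms, weights


def split_topics(rows):
    """Apply split_topic to every topic row."""
    return [split_topic(row) for row in rows]
```

A flat layout like this relies on the reader knowing where the terms end, which is one reason a structured return value (a pair of lists per topic) is easier to consume.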
[GitHub] spark pull request: [SPARK-11531] [ML] : SparseVector error Msg
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9525#issuecomment-154512438 @urvishparikh oh, got it. Thank you for letting me know. It's my fault. If the indexes are already sorted, the condition should be `==`, right?
[GitHub] spark pull request: [SPARK-11531] [ML] : SparseVector error Msg
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9525#issuecomment-154518546 All right. We should focus on changing the error message in this issue. Thank you for making it clear. LGTM
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154532750 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154537228 Jenkins, test this please
[GitHub] spark pull request: SPARK-11420 Updating Stddev support via Impera...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9380#issuecomment-154164156 @JihongMA I don't know if there are any strong reasons in terms of catalyst. However, personally I think we should separate changing the return type and `null` from the issue. So, we should focus on refactoring stdev in this PR. It seems that at least `Skewness` and `Kurtosis` have nothing to do with the issue. If we need to discuss them, it would be great to do so in another issue. Thanks.
[GitHub] spark pull request: [SPARK-10259] [ML] Add @since annotation to ml...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8534#discussion_r44093654 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -165,8 +185,9 @@ object GBTClassifier { * @param _treeWeights Weights for the decision trees in the ensemble. */ @Experimental +@Since("1.6.0") final class GBTClassificationModel private[ml]( -override val uid: String, +@Since("1.6.0") override val uid: String, private val _trees: Array[DecisionTreeRegressionModel], private val _treeWeights: Array[Double], override val numFeatures: Int) --- End diff -- Add the tag to `numFeatures`
[GitHub] spark pull request: [SPARK-10259] [ML] Add @since annotation to ml...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8534#discussion_r44093668 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -390,8 +404,9 @@ class LogisticRegression(override val uid: String) * Model produced by [[LogisticRegression]]. */ @Experimental +@Since("1.4.0") class LogisticRegressionModel private[ml] ( -override val uid: String, +@Since("1.4.0") override val uid: String, val coefficients: Vector, val intercept: Double) --- End diff -- ditto
[GitHub] spark pull request: [SPARK-10259] [ML] Add @since annotation to ml...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8534#discussion_r44093643 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala --- @@ -103,11 +116,12 @@ object DecisionTreeClassifier { * features. */ @Experimental +@Since("1.4.0") final class DecisionTreeClassificationModel private[ml] ( -override val uid: String, -override val rootNode: Node, -override val numFeatures: Int, -override val numClasses: Int) +@Since("1.5.0")override val uid: String, +@Since("1.5.0")override val rootNode: Node, --- End diff -- ditto
[GitHub] spark pull request: [SPARK-10259] [ML] Add @since annotation to ml...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8534#discussion_r44093665 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -390,8 +404,9 @@ class LogisticRegression(override val uid: String) * Model produced by [[LogisticRegression]]. */ @Experimental +@Since("1.4.0") class LogisticRegressionModel private[ml] ( -override val uid: String, +@Since("1.4.0") override val uid: String, val coefficients: Vector, --- End diff -- Add the tag
[GitHub] spark pull request: [SPARK-10259] [ML] Add @since annotation to ml...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8534#discussion_r44093682 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala --- @@ -110,8 +116,9 @@ class NaiveBayes(override val uid: String) * by D (number of features) */ @Experimental +@Since("1.5.0") class NaiveBayesModel private[ml] ( -override val uid: String, +@Since("1.5.0") override val uid: String, val pi: Vector, --- End diff -- Add the tag
[GitHub] spark pull request: [SPARK-10259] [ML] Add @since annotation to ml...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8534#discussion_r44093689 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -71,16 +71,19 @@ private[ml] trait OneVsRestParams extends PredictorParams { * (taking label 0). */ @Experimental +@Since("1.4.0") final class OneVsRestModel private[ml] ( -override val uid: String, +@Since("1.4.0") override val uid: String, labelMetadata: Metadata, val models: Array[_ <: ClassificationModel[_, _]]) --- End diff -- Add the tag