[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-03-03 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4654#issuecomment-76953094 @jkbradley I wrote a script here (https://gist.github.com/MechCoder/5939294f74f105e5c499) to compare the timings in this branch and master. It seems to me

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-03-02 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4654#issuecomment-76701605 Hmm. I came up with this, but surely there should be a more elegant way of doing it. import scala.util.Random rng = Random rng.setSeed(0

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-03-01 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4654#issuecomment-76669702 Hi, sorry for taking so much time to get back to this. I want to generate some random data to write this script using breeze. But I am unable to understand how the random

[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

2015-03-01 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4834#issuecomment-76622887 Great. Do you have any more comments? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark pull request: [SPARK-6025] [MLlib] Add helper method to effi...

2015-02-28 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-76527936 Also, the present code is unoptimized since there are two runs across the data RDD: one to update the residual, and the other to calculate the error. But that can

[GitHub] spark pull request: [SPARK-6025] [MLlib] Add helper method to effi...

2015-02-28 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-76527811 @jkbradley I am assuming that this is what you intended. It works but I'm not sure about the present design, which differs from the design that you had posted

[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

2015-02-28 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4834#issuecomment-76541233 Hmm. I get an accuracy of zero for the given example. Not sure where I'm going wrong though :(

[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

2015-02-28 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4834#issuecomment-76541036 cc: @mengxr Would you be able to verify this?

[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

2015-02-28 Thread MechCoder
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/4834 [SPARK-6083] [MLLib] [DOC] Make Python API example consistent in NaiveBayes You can merge this pull request into a Git repository by running: $ git pull https://github.com/MechCoder/spark

[GitHub] spark pull request: [SPARK-6083] [MLLib] [DOC] Make Python API exa...

2015-02-28 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4834#issuecomment-76578092 I changed the randomSplit seed and it works better. It should look good now.

[GitHub] spark pull request: [SPARK-6025] Add helper method to efficiently ...

2015-02-27 Thread MechCoder
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/4819 [SPARK-6025] Add helper method to efficiently compute error in GBT's While computing the error, with and without validation, for every iteration, the feature prediction of the previous trees

[GitHub] spark pull request: [SPARK-6025] Add helper method to efficiently ...

2015-02-27 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-76472686 @jkbradley Is this similar to what you had in mind?

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-24 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4654#issuecomment-75745207 Great, I'll do it tomorrow after I'm done with my exams.

[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...

2015-02-24 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4677#issuecomment-75745123 @jkbradley I have addressed your comments! Hopefully good to go. [off-topic] It would be really great and helpful if Spark would be interested in taking

[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...

2015-02-23 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4677#discussion_r25150164 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/GradientBoostedTreesSuite.scala --- @@ -158,6 +158,63 @@ class GradientBoostedTreesSuite

[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...

2015-02-22 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4677#discussion_r25143389 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/GradientBoostedTreesSuite.scala --- @@ -158,6 +158,63 @@ class GradientBoostedTreesSuite

[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...

2015-02-22 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4677#issuecomment-75487872 @jkbradley Addressed all your comments except the inline one.

[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...

2015-02-22 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4677#discussion_r25142849 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala --- @@ -76,8 +77,44 @@ class GradientBoostedTrees(private val

[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...

2015-02-20 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4677#issuecomment-75223810 @mengxr Fixed!

[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...

2015-02-20 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4677#discussion_r25062848 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala --- @@ -76,8 +77,42 @@ class GradientBoostedTrees(private val

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-20 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4654#issuecomment-75225799 Just to clarify, by cluster mode do you mean running `./bin/spark-shell --master spark://manoj-X550LD:7077` where the url is generated by doing `./sbin/start-master.sh

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-20 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4654#issuecomment-75285426 @tgaloppo Sorry for my noobness, all my work on MLlib has been on a single machine. I am not really sure how to run it on a cluster (and hence was verifying if my

[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...

2015-02-19 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4677#issuecomment-75026265 @jkbradley I have addressed your comments. Btw, why are there both a train and a run, which seem to me to do the same thing? Is it not better to have one way

[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...

2015-02-19 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4677#issuecomment-75195111 @jkbradley I've fixed up your comments. a] Negative tol is allowed. b] It makes sense to return based on the best validationError rather than the previous

[GitHub] spark pull request: [Minor] Minor doc fix in GBT classification ex...

2015-02-18 Thread MechCoder
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/4672 [Minor] Minor doc fix in GBT classification example numClassesForClassification has been renamed to numClasses. You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark pull request: [Minor] [MLlib] Minor doc fix in GBT classific...

2015-02-18 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4672#issuecomment-74834330 ping @jkbradley ? I was not sure if I had to open a JIRA for this, as it is minor.

[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...

2015-02-18 Thread MechCoder
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/4677 [SPARK-5436] [MLlib] Validate GradientBoostedTrees during train One can stop early if the decrease in error rate is less than a certain tol, or if the error increases if the training data
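The stopping rule described in this pull request can be sketched in a few lines. This is only an illustration under stated assumptions, not the code from the pull request: the function name `pick_best_iteration`, the list of per-iteration validation errors, and the exact comparison against `tol` are all hypothetical. It follows two points made in the thread: a negative `tol` is allowed, and the result is chosen by the best validation error rather than the previous one.

```python
def pick_best_iteration(validation_errors, tol):
    """Return the index of the iteration to keep (hypothetical helper).

    validation_errors: one validation error per boosting iteration.
    tol: minimum improvement over the best error seen so far that is
         required to keep training; a negative tol tolerates small increases.
    """
    best_err = validation_errors[0]
    best_idx = 0
    for i in range(1, len(validation_errors)):
        err = validation_errors[i]
        # Improvement relative to the best error so far; negative if the error grew.
        improvement = best_err - err
        if improvement < tol:
            # Stop early and keep the best iteration seen so far.
            return best_idx
        if err < best_err:
            best_err, best_idx = err, i
    return best_idx


if __name__ == "__main__":
    errors = [1.0, 0.8, 0.79, 0.5]
    print(pick_best_iteration(errors, tol=0.05))
```

With a negative `tol`, a slightly worse iteration does not stop training, but it is also never selected as the best one.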

[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...

2015-02-18 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4677#issuecomment-74953724 @jkbradley I just wanted to know if this is in the right direction.

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-17 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4654#discussion_r24877024 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala --- @@ -168,16 +182,26 @@ class GaussianMixture private

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-17 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-74819739 Thanks! Looking forward to learning lots more.

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-17 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4654#issuecomment-74740596 @tgaloppo I've addressed the issue with distributing the Gaussian updates, in the latest commit. But it breaks tests (Note that I've set distributeGaussian explicitly

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-17 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4654#issuecomment-74734084 I could not distribute the other Gaussian update, since this line (https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-17 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4654#issuecomment-74734555 Err... no. It must have been some other error while I tested it. Will update that in a while.

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-17 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-74723052 @manishamde Thanks. The LGTM suggests that this should be good to go in! ;)

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-17 Thread MechCoder
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/4654 [SPARK-5016] Distribute Gaussian Initialization in GaussianMixture Following discussion in the JIRA You can merge this pull request into a Git repository by running: $ git pull https

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-17 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4654#issuecomment-74734747 Do you want me to time any specific data?

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-17 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4654#discussion_r24846111 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala --- @@ -135,25 +135,39 @@ class GaussianMixture private

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-17 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4654#discussion_r24848796 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala --- @@ -168,16 +182,26 @@ class GaussianMixture private

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-16 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-74609225 @jkbradley fixed!

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-16 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4231#discussion_r24789326 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -1064,9 +1045,12 @@ object DecisionTree extends Serializable

[GitHub] spark pull request: [SPARK-5021] [MLlib] Gaussian Mixture now supp...

2015-02-10 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4459#issuecomment-73831296 Thanks @tgaloppo and @mengxr. Any idea what to touch in GaussianMixture next? The parallelized Gaussian initialization.

[GitHub] spark pull request: [SPARK-5021] [MLlib] Gaussian Mixture now supp...

2015-02-10 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4459#issuecomment-73667895 @mengxr Just to make it easier for you, a small description. GaussianMixture used to support sparse input, by converting it to DenseVectors, which is non-optimal

[GitHub] spark pull request: [SPARK-5021] [MLlib] Gaussian Mixture now supp...

2015-02-10 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4459#discussion_r24439928 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -235,12 +235,23 @@ private[spark] object BLAS extends Serializable

[GitHub] spark pull request: [SPARK-5021] [MLlib] Gaussian Mixture now supp...

2015-02-10 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4459#issuecomment-73777170 @mengxr Addressed your comments. Let me know if there is anything else.

[GitHub] spark pull request: [SPARK-5021] [MLlib] Gaussian Mixture now supp...

2015-02-09 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4459#issuecomment-73649664 @tgaloppo Alright, thanks for the explanation. What makes you think that the covariance matrix is wrong? I calculated it manually and it seems to be right. I added

[GitHub] spark pull request: [SPARK-5021] [MLlib] Gaussian Mixture now supp...

2015-02-09 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4459#issuecomment-73554216 Also a noob question, but what is the significance of the negative variance in the tests?

[GitHub] spark pull request: [SPARK-5021] [MLlib] Gaussian Mixture now supp...

2015-02-09 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4459#issuecomment-73551176 @tgaloppo I fixed it up. Can you have a look?

[GitHub] spark pull request: [SPARK-5021] [MLlib] Gaussian Mixture now supp...

2015-02-09 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4459#issuecomment-73508202 @tgaloppo Thanks for your valuable feedback. Do you have anything more to add as of now?

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-08 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-73348866 @jkbradley Thanks for your reviews. I fixed them up. Anyone else want to have a final look? cc @manishamde @mengxr ?

[GitHub] spark pull request: [SPARK-5021] Gaussian Mixture now supports Spa...

2015-02-08 Thread MechCoder
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/4459 [SPARK-5021] Gaussian Mixture now supports Sparse Input Following discussion in the Jira. You can merge this pull request into a Git repository by running: $ git pull https://github.com

[GitHub] spark pull request: [SPARK-5021] Gaussian Mixture now supports Spa...

2015-02-08 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4459#issuecomment-73429922 ping @tgaloppo and @jkbradley (whenever you are back!)

[GitHub] spark pull request: [SPARK-5021] Gaussian Mixture now supports Spa...

2015-02-08 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4459#discussion_r24310707 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala --- @@ -215,20 +217,29 @@ private object ExpectationSum { def

[GitHub] spark pull request: [SPARK-5021] Gaussian Mixture now supports Spa...

2015-02-08 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4459#discussion_r24310793 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/clustering/GaussianMixtureSuite.scala --- @@ -40,10 +41,15 @@ class GaussianMixtureSuite extends

[GitHub] spark pull request: [SPARK-5021] Gaussian Mixture now supports Spa...

2015-02-08 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4459#discussion_r24308815 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/clustering/GaussianMixtureSuite.scala --- @@ -40,10 +41,15 @@ class GaussianMixtureSuite extends

[GitHub] spark pull request: [SPARK-5021] Gaussian Mixture now supports Spa...

2015-02-08 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4459#issuecomment-73456587 @tgaloppo Thanks. I shall fix them in a while. However, does the general code look good to you?

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-06 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-73300352 ping @jkbradley ?

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-03 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-72696085 @jkbradley Sorry for being impatient, but would you be able to have a look anytime soon?

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-01-31 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-72331831 Thanks :) Looking forward to your reviews. I will work on other stuff till then.

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-01-30 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-72253005 Just out of curiosity, what happens if the deadline for code freeze (i.e. tomorrow) passes?

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-01-29 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-72090175 @jkbradley Can I please get a pass or comments on this? Or maybe from others who are familiar with the tree code (@mengxr)

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-01-27 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-71724940 ping @jkbradley. Would be great if you could have a look.

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-01-27 Thread MechCoder
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/4231 [SPARK-3381] [MLlib] Eliminate bins for unordered features For unordered features, it is sufficient to use splits since the threshold of the split corresponds to the threshold

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-01-27 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-71779796 It also returns empty bins, just to be compatible with the present API. Hopefully that's not a problem.

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-01-27 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4231#discussion_r23665261 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/TreePoint.scala --- @@ -96,14 +96,12 @@ private[tree] object TreePoint { * Find bin

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-26 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-71584396 @mengxr This can also be viewed as a bugfix which prevents overwriting of the param `subSamplingRate`, which was hardcoded to 1.0

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-24 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-71335512 ping @jkbradley Could you please have a final look?

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-24 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-71349920 @jkbradley Fixed. I can haz merge?

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-21 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70982719 @jkbradley Thanks for the tip. Fixed. Anything more?

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-21 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70983304 Repushed after fixing the style checks.

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-20 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4073#discussion_r23250726 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/RandomForestSuite.scala --- @@ -196,6 +196,24 @@ class RandomForestSuite extends FunSuite

[GitHub] spark pull request: [SPARK-5022] [SQL] [MLlib] Change VectorUDT to...

2015-01-20 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4099#issuecomment-70788974 Thanks for the comment. I meant that now udt is a ClassTag. If udt were to be an Object or an instance of a class, then I'm not sure if it is possible

[GitHub] spark pull request: [SPARK-5022] [SQL] [MLlib] Change VectorUDT to...

2015-01-20 Thread MechCoder
Github user MechCoder closed the pull request at: https://github.com/apache/spark/pull/4099

[GitHub] spark pull request: [SPARK-5022] [SQL] [MLlib] Change VectorUDT to...

2015-01-20 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4099#issuecomment-70789550 Alright, closing. Could you please mark the issue as fixed? Sad that I've had to close two pull requests for the same reasons. Hopefully I'll find something

[GitHub] spark pull request: [SPARK-5022] [SQL] [MLlib] Change VectorUDT to...

2015-01-20 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4099#issuecomment-70717091 @jkbradley Thanks for the clarification. I gave a few attempts, but I don't think I'm doing it the right way. This is what you mean right? Presently

[GitHub] spark pull request: [SPARK-5022] [SQL] [MLlib] Change VectorUDT to...

2015-01-19 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4099#issuecomment-70458668 @jkbradley I'm also not sure how the `SQLUserDefinedType` is supposed to work. Do you mind giving me a short note?

[GitHub] spark pull request: [SPARK-5257] [MLlib] SparseVector indices must...

2015-01-19 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4096#issuecomment-70487065 @mengxr The other PR that you refer to implements sorting; this, however, implements an O(size) check for sparse vectors where size is typically really small (It does

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-19 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70538422 @mengxr @jkbradley Any more comments? Sorry for spamming, but I would like to work on other issues related to GBRT and RandomForests as well.

[GitHub] spark pull request: [SPARK-5257] [MLlib] SparseVector indices must...

2015-01-19 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4096#issuecomment-70537775 Alright, closing. Please mark the issue as resolved.

[GitHub] spark pull request: [SPARK-5257] [MLlib] SparseVector indices must...

2015-01-19 Thread MechCoder
Github user MechCoder closed the pull request at: https://github.com/apache/spark/pull/4096

[GitHub] spark pull request: [SPARK-5022] [Sql] Change VectorUDT to object

2015-01-18 Thread MechCoder
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/4099 [SPARK-5022] [Sql] Change VectorUDT to object You can merge this pull request into a Git repository by running: $ git pull https://github.com/MechCoder/spark spark-5022 Alternatively you

[GitHub] spark pull request: [SPARK-5022] [Sql] Change VectorUDT to object

2015-01-18 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4099#issuecomment-70450637 cc @rxin I am unable to understand how to change this line `@SQLUserDefinedType(udt = classOf[VectorUDT])` . I tried doing `@SQLUserDefinedType(udt

[GitHub] spark pull request: [SPARK-5257] [MLlib] SparseVector indices must...

2015-01-18 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4096#issuecomment-70441989 Alright, but maybe the documentation can be updated that the indices should be non-negative?

[GitHub] spark pull request: [SPARK-5257] SparseVector indices must be non-...

2015-01-18 Thread MechCoder
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/4096 [SPARK-5257] SparseVector indices must be non-negative As far as I know, Scala does not support negative indexing, so fail early if negative indices are used. You can merge this pull request into a Git

[GitHub] spark pull request: [SPARK-5257] SparseVector indices must be non-...

2015-01-18 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4096#issuecomment-70410393 @davies @mengxr This is a minor PR. Can you please review?

[GitHub] spark pull request: [SPARK-5257] [MLlib] SparseVector indices must...

2015-01-18 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4096#issuecomment-70420474 Sorry for the phony Jenkins comments. Is there a way I can turn it off until I explicitly turn it on?

[GitHub] spark pull request: [SPARK-5257] [MLlib] SparseVector indices must...

2015-01-18 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4096#issuecomment-70421943 I am extremely sorry for the phony comments due to Jenkins. Is there any way to test it only when asked?

[GitHub] spark pull request: [SPARK-5257] SparseVector indices must be non-...

2015-01-18 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4096#issuecomment-70414610 Hmm. If I do that, would it seem odd if `SparseVector` in the Python API allows negative indexing, but not negative indices in the constructor? I implemented

[GitHub] spark pull request: [SPARK-5257] SparseVector indices must be non-...

2015-01-18 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4096#issuecomment-70414854 Thinking again, negative indexing makes sense, but not supplying an array of negative integers as indices. Let me know what you think about it.
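
The distinction drawn in this comment (negative lookup positions are fine, negative stored indices are not) can be illustrated with a small, self-contained Python sketch. `ToySparseVector` is a hypothetical stand-in written for this note; it is not the `pyspark.mllib.linalg.SparseVector` implementation, and the exact checks and error messages are assumptions.

```python
from bisect import bisect_left


class ToySparseVector:
    """Hypothetical illustration only; not the PySpark SparseVector."""

    def __init__(self, size, indices, values):
        if len(indices) != len(values):
            raise ValueError("indices and values must have the same length")
        # Fail early: stored indices must be non-negative, within the
        # declared size, and strictly increasing.
        previous = -1
        for i in indices:
            if i < 0:
                raise ValueError("negative index %d is not allowed" % i)
            if i >= size:
                raise ValueError("index %d out of range for size %d" % (i, size))
            if i <= previous:
                raise ValueError("indices must be strictly increasing")
            previous = i
        self.size = size
        self.indices = list(indices)
        self.values = list(values)

    def __getitem__(self, index):
        # Negative lookup positions count from the end, as with Python lists.
        if not -self.size <= index < self.size:
            raise IndexError("index %d out of range" % index)
        if index < 0:
            index += self.size
        # Binary search over the sorted stored indices.
        pos = bisect_left(self.indices, index)
        if pos < len(self.indices) and self.indices[pos] == index:
            return self.values[pos]
        return 0.0  # entries that are not stored are implicitly zero
```

With this sketch, `ToySparseVector(4, [1, 3], [2.0, 5.0])[-1]` looks up position 3, while `ToySparseVector(4, [-1, 2], [1.0, 1.0])` is rejected in the constructor.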

[GitHub] spark pull request: [SPARK-5257] SparseVector indices must be non-...

2015-01-18 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4096#issuecomment-70415840 @davies Thanks. I've fixed it up. Anything more?

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-17 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70369703 Could you please tell me what the preferred way to generate random data in Spark is?

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-17 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70375368 @jkbradley I've added a test modeled on the other tests in the `RandomForestSuite`. Let me know if there is anything left.

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-16 Thread MechCoder
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/4073 [SPARK-3726] [MLlib] Allow sampling_rate not equal to 1.0 I've added support for sampling_rate not equal to 1.0. I have two major questions. 1. A Scala style test is failing, since

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-16 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70283608 @jkbradley @mengxr it would be great if you could have a look.

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-16 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70297536 I've made changes such that this does not break anything, i.e. everything is backward compatible.

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-16 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70300939 @jkbradley, the issue is that the function `train` has more than 10 args.

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-16 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70302451 @jkbradley Oops, the comments got deleted somehow. I meant that this is because there are 10 arguments in `trainClassifier` and `trainRegressor`

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-16 Thread MechCoder
Github user MechCoder closed the pull request at: https://github.com/apache/spark/pull/4073

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-16 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70304290 Oh well, but still, if I'm not mistaken, the `subSamplingRate` is overridden by the condition `numTrees > 1`. This should not be the case as having a lower sampling

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-16 Thread MechCoder
GitHub user MechCoder reopened a pull request: https://github.com/apache/spark/pull/4073 [SPARK-3726] [MLlib] Allow sampling_rate not equal to 1.0 I've added support for sampling_rate not equal to 1.0. I have two major questions. 1. A Scala style test is failing, since

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-16 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70306545 Thanks. Also, a design question: is it worthwhile to add this as an option to `train`, given that it is now within the style limit?
