[GitHub] spark issue #13959: [SPARK-14351] [MLlib] [ML] Optimize findBestSplits metho...

2017-05-18 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13959 I don't understand. If you don't have time to review, that is fine (I've been there too), but there is no need to close a PR due to unavailability of committers. One of the reasons, that I

[GitHub] spark issue #17621: [SPARK-6227][MLLIB][PYSPARK] Implement PySpark wrappers ...

2017-04-14 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/17621 Thanks @MLnick ! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark pull request #14273: [SPARK-9140] [ML] Replace TimeTracker by MultiSto...

2017-02-27 Thread MechCoder
Github user MechCoder closed the pull request at: https://github.com/apache/spark/pull/14273

[GitHub] spark issue #7963: [SPARK-6227] [MLlib] [PySpark] Implement PySpark wrappers...

2016-10-11 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/7963 Thanks for the reviews, @holdenk. Unfortunately, I will not be able to work on this anytime soon. Feel free to cherry-pick the commits if you wish.

[GitHub] spark issue #14640: [SPARK-17055] [MLLIB] add labelKFold to CrossValidator

2016-08-23 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/14640 Just FYI, we plan to rename "LabelKFold" to "GroupKFold" in the next version of sklearn, as a label can mean several things (including the target label).

[GitHub] spark issue #13650: [SPARK-9623] [ML] Provide conditional variance for Rando...

2016-08-22 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13650 @yanboliang Sorry for the long delay! Hope you are still here. 1. The term variance in predictions is ambiguous and a bit misleading. Let us say that we have the original data generating

[GitHub] spark pull request #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/ca...

2016-08-19 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/14579#discussion_r75567101 --- Diff: python/pyspark/rdd.py --- @@ -188,6 +188,12 @@ def __init__(self, jrdd, ctx, jrdd_deserializer=AutoBatchedSerializer(PickleSeri

[GitHub] spark issue #12790: [SPARK-15018][PYSPARK][ML] Improve handling of PySpark P...

2016-08-19 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/12790 Yes, I agree that allowing steps to be an empty Sequence or a list in a Pipeline is non-intuitive but I'm fine with allowing that corner case.

[GitHub] spark issue #12790: [SPARK-15018][PYSPARK][ML] Fixed bug causing error if Py...

2016-08-18 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/12790 Awesome! Thanks!

[GitHub] spark pull request #12790: [SPARK-15018][PYSPARK][ML] Fixed bug causing erro...

2016-08-18 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/12790#discussion_r75390810 --- Diff: python/pyspark/status.py --- @@ -83,6 +85,8 @@ def getJobInfo(self, jobId): job = self._jtracker.getJobInfo(jobId

[GitHub] spark issue #12790: [SPARK-15018][PYSPARK][ML] Fixed bug causing error if Py...

2016-08-18 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/12790 If that's the case, then the piece of documentation that promises that the Pipeline behaves as an identity transformer when no stages are used has to be changed (removed).

[GitHub] spark pull request #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models shou...

2016-08-17 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/14653#discussion_r75230698 --- Diff: python/pyspark/ml/wrapper.py --- @@ -243,7 +240,7 @@ def __init__(self, java_model=None): """ Initializ

[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

2016-08-17 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13036 lgtm

[GitHub] spark issue #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models should cont...

2016-08-17 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/14653 Should we start having `PredictorParams` -> (HasLabelCol, HasFeaturesCol, HasPredictionCol) `ClassifierParams` -> (HasRawPredictionCol) as done in the Scal

[GitHub] spark pull request #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models shou...

2016-08-17 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/14653#discussion_r75228035 --- Diff: python/pyspark/ml/classification.py --- @@ -59,6 +59,16 @@ class LogisticRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredicti

[GitHub] spark pull request #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models shou...

2016-08-17 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/14653#discussion_r75227783 --- Diff: python/pyspark/ml/classification.py --- @@ -59,6 +59,16 @@ class LogisticRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredicti

[GitHub] spark pull request #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models shou...

2016-08-17 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/14653#discussion_r75225024 --- Diff: python/pyspark/ml/classification.py --- @@ -59,6 +59,16 @@ class LogisticRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredicti

[GitHub] spark issue #12790: [SPARK-15018][PYSPARK][ML] Fixed bug causing error if Py...

2016-08-16 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/12790 LGTM: cc @yanboliang @srowen

[GitHub] spark pull request #12790: [SPARK-15018][PYSPARK][ML] Fixed bug causing erro...

2016-08-16 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/12790#discussion_r75024959 --- Diff: python/pyspark/ml/pipeline.py --- @@ -57,9 +57,8 @@ def __init__(self, stages=None): """ __init__(se

[GitHub] spark pull request #12790: [SPARK-15018][PYSPARK][ML] Fixed bug causing erro...

2016-08-16 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/12790#discussion_r75023876 --- Diff: python/pyspark/ml/tests.py --- @@ -230,6 +230,15 @@ def test_pipeline(self): self.assertEqual(5, transformer3.dataset_index

[GitHub] spark pull request #14467: [SPARK-16861][PYSPARK][CORE] Refactor PySpark acc...

2016-08-15 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/14467#discussion_r74845665 --- Diff: python/pyspark/context.py --- @@ -173,9 +173,8 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize

[GitHub] spark pull request #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/ca...

2016-08-15 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/14579#discussion_r74813935 --- Diff: python/pyspark/rdd.py --- @@ -188,6 +188,12 @@ def __init__(self, jrdd, ctx, jrdd_deserializer=AutoBatchedSerializer(PickleSeri

[GitHub] spark pull request #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/ca...

2016-08-15 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/14579#discussion_r74811199 --- Diff: python/pyspark/rdd.py --- @@ -188,6 +188,12 @@ def __init__(self, jrdd, ctx, jrdd_deserializer=AutoBatchedSerializer(PickleSeri

[GitHub] spark issue #14273: [SPARK-9140] [ML] Replace TimeTracker by MultiStopwatch

2016-08-12 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/14273 bump?

[GitHub] spark pull request #12889: [SPARK-15113][PySpark][ML] Add missing num featur...

2016-08-08 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/12889#discussion_r73947457 --- Diff: python/pyspark/ml/classification.py --- @@ -44,6 +44,23 @@ @inherit_doc +class JavaClassificationModel(JavaPredictionModel

[GitHub] spark issue #12983: [SPARK-15213][PySpark] Unify 'range' usages

2016-08-03 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/12983 In sklearn, we use `sklearn.six.moves`, which lets `range` and `xrange` be used interchangeably. In Python3, both `range` and `xrange` would return a `range` instance and in Py2, both `xrange
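
The interchangeable `range`/`xrange` idea in the comment above can be sketched without any third-party package. This is only a minimal illustration of the aliasing pattern, assuming nothing about how sklearn or PySpark actually implement it:

```python
# Minimal sketch of a range/xrange compatibility alias (an assumption for
# illustration, not the code used by sklearn or PySpark).
try:
    xrange  # Defined on Python 2, where it is the lazy variant.
except NameError:
    # Python 3 has no xrange; its built-in range is already lazy.
    xrange = range

# Either name can now be used for lazy iteration on both major versions.
squares = [i * i for i in xrange(4)]
```

With this alias in place, calling code can iterate with `xrange` on either interpreter without materializing a list on Python 2.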

[GitHub] spark pull request #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose pote...

2016-07-29 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13571#discussion_r72870886 --- Diff: python/pyspark/sql/functions.py --- @@ -1731,13 +1749,115 @@ def sort_array(col, asc=True): # User

[GitHub] spark issue #14273: [SPARK-9140] [ML] Replace TimeTracker by MultiStopwatch

2016-07-28 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/14273 @jkbradley Would you be able to have a look?

[GitHub] spark pull request #12889: [SPARK-15113][PySpark][ML] Add missing num featur...

2016-07-20 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/12889#discussion_r71629288 --- Diff: python/pyspark/ml/classification.py --- @@ -581,8 +602,11 @@ def _create_model(self, java_model): @inherit_doc -class

[GitHub] spark issue #12889: [SPARK-15113][PySpark][ML] Add missing num features num ...

2016-07-20 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/12889 Just `LinearRegressionModel` seems missing to me. LGTM otherwise.

[GitHub] spark pull request #12374: [SPARK-14610][ML] Remove superfluous split for co...

2016-07-20 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/12374#discussion_r71615972 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -137,14 +137,47 @@ class RandomForestSuite extends

[GitHub] spark pull request #12374: [SPARK-14610][ML] Remove superfluous split for co...

2016-07-20 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/12374#discussion_r71615183 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -692,14 +692,20 @@ private[spark] object RandomForest extends

[GitHub] spark issue #12374: [SPARK-14610][ML] Remove superfluous split for continuou...

2016-07-20 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/12374 LGTM

[GitHub] spark pull request #12374: [SPARK-14610][ML] Remove superfluous split for co...

2016-07-20 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/12374#discussion_r71615061 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -137,14 +137,47 @@ class RandomForestSuite extends

[GitHub] spark issue #13248: [SPARK-15194] [ML] Add Python ML API for MultivariateGau...

2016-07-20 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13248 Can you please reopen the pull request against the Spark master branch?

[GitHub] spark issue #14273: [SPARK-9140] [ML] Replace TimeTracker by MultiStopwatch

2016-07-19 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/14273 ping @jkbradley @mengxr

[GitHub] spark pull request #14273: [SPARK-9140] [ML] Replace TimeTracker by MultiSto...

2016-07-19 Thread MechCoder
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/14273 [SPARK-9140] [ML] Replace TimeTracker by MultiStopwatch ## What changes were proposed in this pull request? Builds upon the work done by @hhbyyh in https://github.com/apache/spark/pull

[GitHub] spark issue #7871: [SPARK-9140][MLlib] Replace TimeTracker by Stopwatch

2016-07-19 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/7871 @hhbyyh What is your opinion about renaming `addLocal` to `addOrGetLocal`, which would return the local stopwatch if it already exists? That should solve your concerns.
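
The "add or get" semantics proposed above can be sketched as a small registry. The classes below are a hypothetical Python analogue written for illustration; the names `MultiStopwatch`, `LocalStopwatch`, and `add_or_get_local` are assumptions here and do not reproduce Spark's Scala implementation:

```python
import time


class LocalStopwatch(object):
    """Hypothetical stopwatch that accumulates elapsed wall-clock time."""

    def __init__(self, name):
        self.name = name
        self.elapsed = 0.0
        self._started_at = None

    def start(self):
        self._started_at = time.time()

    def stop(self):
        # Add the time since start() to the running total.
        self.elapsed += time.time() - self._started_at
        self._started_at = None
        return self.elapsed


class MultiStopwatch(object):
    """Hypothetical registry of named stopwatches."""

    def __init__(self):
        self._stopwatches = {}

    def add_or_get_local(self, name):
        # Create the stopwatch only if the name is new; otherwise return
        # the existing instance so repeated calls share one timer.
        if name not in self._stopwatches:
            self._stopwatches[name] = LocalStopwatch(name)
        return self._stopwatches[name]
```

Under this design, a caller that asks for the same name twice receives the same object, so there is no need to check for an existing stopwatch before adding one.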

[GitHub] spark issue #7871: [SPARK-9140][MLlib] Replace TimeTracker by Stopwatch

2016-07-19 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/7871 @hhbyyh What is your opinion about adding

[GitHub] spark issue #12374: [SPARK-14610][ML] Remove superfluous split for continuou...

2016-07-06 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/12374 Outside of this PR, I would like to either: 1. Update the documentation of `findSplitsForContinuousFeature` to reflect that the return type is an array of thresholds, rather than an array

[GitHub] spark pull request #12374: [SPARK-14610][ML] Remove superfluous split for co...

2016-07-06 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/12374#discussion_r69827161 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -137,14 +137,47 @@ class RandomForestSuite extends

[GitHub] spark issue #12374: [SPARK-14610][ML] Remove superfluous split for continuou...

2016-07-06 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/12374 @sethah Nice catch! This superfluous split seems to occur only for continuous features in which the number of unique values - 1 is less than or equal to the number of splits. Can you update the PR
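
The condition described above (number of unique values - 1 less than or equal to the number of splits) can be illustrated with a toy threshold finder. This is a sketch under stated assumptions, not Spark's `findSplitsForContinuousFeature`: the function name, the use of midpoints as thresholds, and the subsampling rule for the large case are all hypothetical.

```python
def candidate_thresholds(values, num_splits):
    """Toy illustration: a feature with k distinct values admits at most
    k - 1 useful thresholds, so no extra split is needed beyond those."""
    distinct = sorted(set(values))
    possible = len(distinct) - 1
    if possible <= 0:
        # A constant (or empty) feature cannot be split at all.
        return []
    # One threshold between each pair of neighbouring distinct values.
    midpoints = [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
    if possible <= num_splits:
        # Few distinct values: every boundary is a candidate, nothing more.
        return midpoints
    # Many distinct values: subsample evenly (illustrative rule only).
    step = possible / float(num_splits)
    return [midpoints[int(i * step)] for i in range(num_splits)]
```

For example, a feature with values 1, 1, 2, 3 and five requested splits yields only two thresholds, one per boundary between distinct values.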

[GitHub] spark pull request #12374: [SPARK-14610][ML] Remove superfluous split for co...

2016-07-06 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/12374#discussion_r69826097 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -137,14 +137,47 @@ class RandomForestSuite extends

[GitHub] spark pull request #12374: [SPARK-14610][ML] Remove superfluous split for co...

2016-07-06 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/12374#discussion_r69825860 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -137,14 +137,47 @@ class RandomForestSuite extends

[GitHub] spark pull request #12374: [SPARK-14610][ML] Remove superfluous split for co...

2016-07-06 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/12374#discussion_r69825752 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -137,14 +137,47 @@ class RandomForestSuite extends

[GitHub] spark pull request #12374: [SPARK-14610][ML] Remove superfluous split for co...

2016-07-06 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/12374#discussion_r69824592 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -137,14 +137,47 @@ class RandomForestSuite extends

[GitHub] spark pull request #12374: [SPARK-14610][ML] Remove superfluous split for co...

2016-07-06 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/12374#discussion_r69824338 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -114,7 +114,7 @@ class RandomForestSuite extends SparkFunSuite

[GitHub] spark pull request #12374: [SPARK-14610][ML] Remove superfluous split for co...

2016-07-06 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/12374#discussion_r69824214 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -712,17 +712,23 @@ private[spark] object RandomForest extends

[GitHub] spark issue #14016: [SPARK-16399] Force PYSPARK_PYTHON to python

2016-07-06 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/14016 I agree with you, I created a new JIRA and renamed the title.

[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-06 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13981#discussion_r69770334 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala --- @@ -96,6 +97,25 @@ class DecisionTreeRegressorSuite

[GitHub] spark issue #13981: [SPARK-16307] [ML] Add test to verify the predicted vari...

2016-07-06 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13981 Thanks @sethah @yanboliang for the reviews!!

[GitHub] spark issue #13981: [SPARK-16307] [ML] Add test to verify the predicted vari...

2016-07-06 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13981 @yanboliang Would appreciate it if you could look at https://github.com/apache/spark/pull/13650

[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-05 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13981#discussion_r69660324 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala --- @@ -96,6 +97,25 @@ class DecisionTreeRegressorSuite

[GitHub] spark issue #13981: [SPARK-16307] [ML] Add test to verify the predicted vari...

2016-07-05 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13981 OK, that should be it. I removed all the unused variables and imports.

[GitHub] spark issue #13981: [SPARK-16307] [ML] Add test to verify the predicted vari...

2016-07-05 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13981 @yanboliang Thanks! Addressed your comments. Let me know if there is anything else.

[GitHub] spark issue #8013: [SPARK-3181][MLLIB]: Add Robust Regression Algorithm with...

2016-07-05 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/8013 I'll be happy to review it.

[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-04 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13981#discussion_r69419165 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala --- @@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite

[GitHub] spark issue #13981: [SPARK-16307] [ML] Add test to verify the predicted vari...

2016-07-01 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13981 I'm slightly in favour of keeping the original test because the impurity is set to "variance" explicitly by the `setImpurity` method, so it's a safe assumption that the `calculate` meth

[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-01 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13981#discussion_r69363260 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala --- @@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite

[GitHub] spark issue #14016: [SPARK-15761] [FOLLOWUP] Set DEFAULT_PYTHON to python

2016-07-01 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/14016 @srowen fixed.

[GitHub] spark issue #13981: [SPARK-16307] [ML] Add test to verify the predicted vari...

2016-07-01 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13981 @sethah Thank you for your comments. I have addressed them. Do you have anything else?

[GitHub] spark issue #14016: [SPARK-15761] [FOLLOWUP] Set DEFAULT_PYTHON to python

2016-07-01 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/14016 Thanks for clarifying! It might be a good time to get rid of it.

[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-01 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13981#discussion_r69333966 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala --- @@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite

[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-01 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13981#discussion_r69332114 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala --- @@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite

[GitHub] spark issue #14016: [SPARK-15761] [FOLLOWUP] Set DEFAULT_PYTHON to python

2016-07-01 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/14016 ping @srowen @JoshRosen

[GitHub] spark pull request #14016: [SPARK-15761] [FOLLOWUP] Set DEFAULT_PYTHON to py...

2016-07-01 Thread MechCoder
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/14016 [SPARK-15761] [FOLLOWUP] Set DEFAULT_PYTHON to python ## What changes were proposed in this pull request? I would like to change ```bash if hash python2.7 2>/dev/n

[GitHub] spark issue #13503: [SPARK-15761] [MLlib] [PySpark] Load ipython when defaul...

2016-06-30 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13503 retest this please

[GitHub] spark issue #13503: [SPARK-15761] [MLlib] [PySpark] Load ipython when defaul...

2016-06-30 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13503 I would also like to change ```bash if hash python2.7 2>/dev/null; then # Attempt to use Python 2.7, if installed: DEFAULT_PYTHON="python2.7" else D

[GitHub] spark issue #13503: [SPARK-15761] [MLlib] [PySpark] Load ipython when defaul...

2016-06-30 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13503 @JoshRosen fixed, thanks! let me know if you need any other changes.

[GitHub] spark issue #13503: [SPARK-15761] [MLlib] [PySpark] Load ipython when defaul...

2016-06-30 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13503 bump?

[GitHub] spark issue #12983: [SPARK-15213][PySpark] Unify 'range' usages

2016-06-30 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/12983 I don't really get the difference; could you please explain it to me? The previous version renamed `range` in `Python3` to `xrange` and this pull request does the same thing by renaming

[GitHub] spark pull request #13997: [SPARK-16328][ML][MLLIB][PYSPARK] Add 'asML' and ...

2016-06-30 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13997#discussion_r69176771 --- Diff: python/pyspark/mllib/linalg/__init__.py --- @@ -1044,6 +1122,28 @@ def toSparse(self): return SparseMatrix(self.numRows

[GitHub] spark issue #13997: [SPARK-16328][ML][MLLIB][PYSPARK] Add 'asML' and 'fromML...

2016-06-30 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13997 LGTM pending nitpicks.

[GitHub] spark pull request #13997: [SPARK-16328][ML][MLLIB][PYSPARK] Add 'asML' and ...

2016-06-30 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13997#discussion_r69176457 --- Diff: python/pyspark/mllib/linalg/__init__.py --- @@ -846,6 +890,33 @@ def dense(*elements): return DenseVector(elements

[GitHub] spark pull request #13650: [SPARK-9623] [ML] Provide variance for RandomFore...

2016-06-29 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13650#discussion_r69039559 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/RandomForestRegressorSuite.scala --- @@ -105,6 +108,55 @@ class RandomForestRegressorSuite

[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-06-29 Thread MechCoder
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/13981 [SPARK-16307] [ML] Add test to verify the predicted variances of a DT on toy data ## What changes were proposed in this pull request? The current tests assume that `impurity.calculate

[GitHub] spark issue #13981: [SPARK-16307] [ML] Add test to verify the predicted vari...

2016-06-29 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13981 @yanboliang Could you have a look?

[GitHub] spark pull request #13650: [SPARK-9623] [ML] Provide variance for RandomFore...

2016-06-29 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13650#discussion_r68985017 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/RandomForestRegressor.scala --- @@ -168,15 +173,39 @@ class RandomForestRegressionModel

[GitHub] spark issue #13959: [SPARK-14351] [MLlib] [ML] Optimize findBestSplits metho...

2016-06-29 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13959 The test failure is just due to binary incompatibility. I can fix those once we decide that the current PR is the way to proceed.

[GitHub] spark issue #13959: [SPARK-14351] [MLlib] [ML] Optimize findBestSplits metho...

2016-06-28 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13959 @jkbradley @sethah Please have a look when free!

[GitHub] spark pull request #13959: [SPARK-14351] [MLlib] [ML] Optimize findBestSplit...

2016-06-28 Thread MechCoder
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/13959 [SPARK-14351] [MLlib] [ML] Optimize findBestSplits method for decision trees (and random forest) ## What changes were proposed in this pull request? The current `findBestSplits` method

[GitHub] spark issue #7963: [SPARK-6227] [MLlib] [PySpark] Implement PySpark wrappers...

2016-06-13 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/7963 Bump?

[GitHub] spark issue #13650: [SPARK-9623] [ML] Provide variance for RandomForestRegre...

2016-06-13 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13650 cc: @yanboliang @MLnick

[GitHub] spark pull request #13650: [SPARK-9623] [ML] Provide variance for RandomFore...

2016-06-13 Thread MechCoder
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/13650 [SPARK-9623] [ML] Provide variance for RandomForestRegressor predictions ## What changes were proposed in this pull request? It is useful to get the variance of predictions from

[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...

2016-06-07 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13493 lgtm cc: @MLnick

[GitHub] spark issue #13540: [SPARK-15788][PYSPARK][ML] PySpark IDFModel missing "idf...

2016-06-07 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13540 LGTM as well, pending the nitpick by @BryanCutler. Not related, but it's been a while since I hacked on Spark or PySpark; at some point, do we need better docs for PySpark? I couldn't

[GitHub] spark issue #12370: [SPARK-14599][ML] BaggedPoint should support sample weig...

2016-06-06 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/12370 Should there be a sanity check that provides an input RDD of instance objects and `extractSampleWeight` as a callable that just returns the weight for each instance?

[GitHub] spark pull request #12370: [SPARK-14599][ML] BaggedPoint should support samp...

2016-06-06 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/12370#discussion_r65994490 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/BaggedPoint.scala --- @@ -33,13 +33,20 @@ import org.apache.spark.util.random.XORShiftRandom

[GitHub] spark issue #13503: [SPARK-15761] [MLlib] [PySpark] Load ipython when defaul...

2016-06-06 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13503 Merge?

[GitHub] spark issue #13503: [SPARK-15761] [MLlib] [PySpark] Load ipython when defaul...

2016-06-04 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13503 cc @JoshRosen

[GitHub] spark issue #13248: [SPARK-15194] [ML] Add Python ML API for MultivariateGau...

2016-06-03 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13248 @praveendareddy21 Just made a first pass. Also, please run PEP8 on your code.

[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-06-03 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13248#discussion_r65794904 --- Diff: python/pyspark/ml/stat/distribution.py --- @@ -0,0 +1,267 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-06-03 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13248#discussion_r65794836 --- Diff: python/pyspark/ml/stat/distribution.py --- @@ -0,0 +1,267 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-06-03 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13248#discussion_r65794779 --- Diff: python/pyspark/ml/stat/distribution.py --- @@ -0,0 +1,267 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-06-03 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13248#discussion_r65794427 --- Diff: python/pyspark/ml/stat/distribution.py --- @@ -0,0 +1,267 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-06-03 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13248#discussion_r65794325 --- Diff: python/pyspark/ml/stat/distribution.py --- @@ -0,0 +1,267 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-06-03 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13248#discussion_r65794293 --- Diff: python/pyspark/ml/stat/distribution.py --- @@ -0,0 +1,267 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-06-03 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13248#discussion_r65794126 --- Diff: python/pyspark/ml/stat/distribution.py --- @@ -0,0 +1,267 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-06-03 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13248#discussion_r65794056 --- Diff: python/pyspark/ml/stat/distribution.py --- @@ -0,0 +1,267 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-06-03 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/13248#discussion_r65793951 --- Diff: python/pyspark/ml/stat/distribution.py --- @@ -0,0 +1,267 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more
