[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r127553451 --- Diff: python/pyspark/ml/tests.py --- @@ -1229,11 +1229,30 @@ def test_output_columns(self): (2.0

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r127551356 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -325,8 +326,11 @@ final class OneVsRest @Since("

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r127551019 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/HasParallelism.scala --- @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r127550735 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/HasParallelism.scala --- @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r127553419 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -101,6 +101,50 @@ class OneVsRestSuite extends

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r127550604 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/HasParallelism.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r127550407 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/HasParallelism.scala --- @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r127550217 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/HasParallelism.scala --- @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-07-14 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18281 Taking a look now --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r127548975 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/ValidatorParamsSuiteHelpers.scala --- @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames in PySpa...

2017-07-14 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18618 Thanks @HyukjinKwon ! I'm still in favor of adding this, partly to match Scala and partly to have API docs for it. I just had one question: Is there a reason fieldNames should return

[GitHub] spark issue #18428: [Spark-21221][ML] CrossValidator and TrainValidationSpli...

2017-07-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18428 Rats, one more thing: We need to use relative paths, not absolute ones, when we put paths in the persisted file. Could you please add a unit test which checks this, perhaps by saving a model
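The relative-path requirement in the comment above can be illustrated with a small sketch. This is not Spark's persistence code: the file name `metadata.json`, the `subModels` key, and both helper functions are hypothetical, chosen only to show why a saved directory stays loadable after it is moved when the persisted paths are relative.

```python
import json
import os
import shutil
import tempfile


def save_metadata(root, sub_model_dirs):
    """Write metadata.json under root, storing each sub-model dir relative to root."""
    relative = [os.path.relpath(d, root) for d in sub_model_dirs]
    with open(os.path.join(root, "metadata.json"), "w") as f:
        json.dump({"subModels": relative}, f)


def load_sub_model_dirs(root):
    """Resolve the stored relative paths against wherever root lives now."""
    with open(os.path.join(root, "metadata.json")) as f:
        meta = json.load(f)
    return [os.path.join(root, rel) for rel in meta["subModels"]]


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    old_root = os.path.join(base, "old")
    os.makedirs(os.path.join(old_root, "model_0"))
    save_metadata(old_root, [os.path.join(old_root, "model_0")])

    # Move the whole saved directory, then load from the new location.
    new_root = os.path.join(base, "new")
    shutil.move(old_root, new_root)
    print(load_sub_model_dirs(new_root))
```

A unit test along these lines (save, move, load) would fail if an absolute path had been written into the metadata, because the old location no longer exists after the move.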

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-07-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r127057083 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/ValidatorParams.scala --- @@ -183,8 +198,15 @@ private[ml] object ValidatorParams

[GitHub] spark issue #18428: [Spark-21221][ML] CrossValidator and TrainValidationSpli...

2017-07-11 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18428 Also, can you please add "OneVsRest" to the PR and JIRA titles since this touches that class?

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-07-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r126849117 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/ValidatorParams.scala --- @@ -183,8 +198,14 @@ private[ml] object ValidatorParams

[GitHub] spark issue #18428: [Spark-21221][ML] CrossValidator and TrainValidationSpli...

2017-07-11 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18428 LGTM I couldn't think of a great way to reduce code duplication between JavaWrapper and OneVsRest.

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-07-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r125176264 --- Diff: python/pyspark/ml/tests.py --- @@ -681,6 +682,76 @@ def test_save_load(self): self.assertEqual(loadedLrModel.uid, lrModel.uid

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-07-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r125176348 --- Diff: python/pyspark/ml/tuning.py --- @@ -263,8 +301,60 @@ def copy(self, extra=None): newCV.setEvaluator(self.getEvaluator().copy

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-07-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r125176320 --- Diff: python/pyspark/ml/tuning.py --- @@ -137,8 +140,43 @@ def getEvaluator(self): """ return se

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-07-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r125176253 --- Diff: python/pyspark/ml/classification.py --- @@ -1646,6 +1674,15 @@ class OneVsRestModel(Model, OneVsRestParams, MLReadable, MLWritable

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-07-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r125175867 --- Diff: python/pyspark/ml/classification.py --- @@ -1630,8 +1614,52 @@ def _to_java(self): _java_obj.setPredictionCol(self.getPredictionCol

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-07-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r125175232 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/TrainValidationSplitSuite.scala --- @@ -134,6 +134,59 @@ class TrainValidationSplitSuite

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-07-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r125176209 --- Diff: python/pyspark/ml/classification.py --- @@ -1630,8 +1614,52 @@ def _to_java(self): _java_obj.setPredictionCol(self.getPredictionCol

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-07-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r125176344 --- Diff: python/pyspark/ml/tuning.py --- @@ -263,8 +301,60 @@ def copy(self, extra=None): newCV.setEvaluator(self.getEvaluator().copy

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-07-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r125175799 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala --- @@ -156,6 +156,52 @@ class CrossValidatorSuite

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-07-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r125176362 --- Diff: python/pyspark/ml/wrapper.py --- @@ -111,7 +111,14 @@ def _make_java_param_pair(self, param, value): sc = SparkContext

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-07-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r125175225 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala --- @@ -156,6 +156,52 @@ class CrossValidatorSuite

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-07-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r125175803 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala --- @@ -156,6 +156,52 @@ class CrossValidatorSuite

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-07-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r125175405 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/TrainValidationSplitSuite.scala --- @@ -160,8 +213,21 @@ class TrainValidationSplitSuite

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-06-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r124184937 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/ValidatorParams.scala --- @@ -126,10 +126,22 @@ private[ml] object ValidatorParams

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-06-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r124161422 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala --- @@ -156,6 +156,46 @@ class CrossValidatorSuite

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-06-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r124185775 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/ValidatorParams.scala --- @@ -183,8 +195,14 @@ private[ml] object ValidatorParams

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-06-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r124185314 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/ValidatorParams.scala --- @@ -126,10 +126,22 @@ private[ml] object ValidatorParams

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-06-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r124161463 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala --- @@ -156,6 +156,46 @@ class CrossValidatorSuite

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-06-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r124180242 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/TrainValidationSplitSuite.scala --- @@ -136,6 +136,29 @@ class TrainValidationSplitSuite

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-06-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r124168033 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala --- @@ -156,6 +156,46 @@ class CrossValidatorSuite

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-06-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r124161468 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala --- @@ -156,6 +156,46 @@ class CrossValidatorSuite

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-06-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r124185896 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/ValidatorParams.scala --- @@ -126,10 +126,22 @@ private[ml] object ValidatorParams

[GitHub] spark pull request #18428: [Spark-21221][ML] CrossValidator and TrainValidat...

2017-06-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18428#discussion_r124161588 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala --- @@ -156,6 +156,46 @@ class CrossValidatorSuite

[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-06-21 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18281 Catching up here, it sounds like the current recommendations (which I'm on board with) are to: * Switch to Futures, including using sameThreadExecutor for the case of parallelism=1 * Try
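The first recommendation above (Futures, with a same-thread executor when parallelism is 1) can be sketched as follows. The PR under discussion is Scala; this is only a Python analogue built on `concurrent.futures`, and `SameThreadExecutor`, `make_executor`, and `run_all` are hypothetical names, not Spark APIs.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class SameThreadExecutor:
    """Runs each submitted task immediately in the calling thread (no pool)."""

    def submit(self, fn, *args, **kwargs):
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            future.set_exception(exc)
        return future

    def shutdown(self, wait=True):
        pass  # Nothing to release: no threads were created.


def make_executor(parallelism):
    """Return a same-thread executor for parallelism == 1, else a thread pool."""
    if parallelism < 1:
        raise ValueError("parallelism must be >= 1")
    if parallelism == 1:
        return SameThreadExecutor()
    return ThreadPoolExecutor(max_workers=parallelism)


def run_all(tasks, parallelism):
    """Submit every zero-argument task and collect results in submission order."""
    executor = make_executor(parallelism)
    try:
        futures = [executor.submit(task) for task in tasks]
        return [f.result() for f in futures]
    finally:
        executor.shutdown(wait=True)


if __name__ == "__main__":
    tasks = [lambda i=i: i * i for i in range(4)]
    print(run_all(tasks, parallelism=1))
    print(run_all(tasks, parallelism=3))
```

The calling code is identical for both cases; only the executor changes, so the common parallelism=1 case pays no thread-pool cost.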

spark git commit: [SPARK-20929][ML] LinearSVC should use its own threshold param

2017-06-20 Thread jkbradley
ies to rawPrediction instead of probability. This PR changes the param in the Scala, Python and R APIs. ## How was this patch tested? New unit test to make sure the threshold can be set to any Double value. Author: Joseph K. Bradley <jos...@databricks.com> Closes #18151 from jkbradley/ml-2.2-

spark git commit: [SPARK-20929][ML] LinearSVC should use its own threshold param

2017-06-20 Thread jkbradley
ies to rawPrediction instead of probability. This PR changes the param in the Scala, Python and R APIs. ## How was this patch tested? New unit test to make sure the threshold can be set to any Double value. Author: Joseph K. Bradley <jos...@databricks.com> Closes #18151 from jkbradley/ml-2.2-

[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

2017-06-20 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18151 Merging with master, branch-2.2 Thanks for reviewing!

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-06-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r122524554 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -101,6 +101,37 @@ class OneVsRestSuite extends

[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-06-16 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18281 @BryanCutler Thanks for the thoughts! I didn't see a response w.r.t. putting parallelism in a trait, so I'll say we won't do it for now. The usage of par collections / Futures in OneVsRest vs

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-06-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r122524395 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -325,8 +350,13 @@ final class OneVsRest @Since("

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-06-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r122524220 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -325,8 +350,13 @@ final class OneVsRest @Since("

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-06-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r122523766 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -325,8 +350,13 @@ final class OneVsRest @Since("

[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-06-14 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18281 One comment about putting parallelism in a trait vs. not: Would we want to avoid creating a "threadpool" when parallelism = 1? In that (common) case, maybe we'd want to avoid par c

[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-06-14 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18281 You're right about Scala being an issue. This actually works with Scala 2.10 and 2.11 but not 2.12, in which Scala drops its own ForkJoinPool in favor of the java one. As long as we drop 2.10

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-06-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r121836808 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -325,8 +343,13 @@ final class OneVsRest @Since("

[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-06-13 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18281 I agree; it'd be good to match on the Param name. Do you think "parallelism" is too vague? If not, then I like it since it's simple. I'd vote for default parallelism of 1

[GitHub] spark pull request #18281: [SPARK-21027][SPARK-21028][ML][PYTHON] Added tuna...

2017-06-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r121735491 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -101,6 +101,40 @@ class OneVsRestSuite extends

[GitHub] spark pull request #18281: [SPARK-21027][SPARK-21028][ML][PYTHON] Added tuna...

2017-06-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r121736014 --- Diff: python/pyspark/ml/classification.py --- @@ -1510,21 +1511,26 @@ class OneVsRest(Estimator, OneVsRestParams, MLReadable, MLWritable

[GitHub] spark pull request #18281: [SPARK-21027][SPARK-21028][ML][PYTHON] Added tuna...

2017-06-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r121740343 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -325,8 +343,13 @@ final class OneVsRest @Since("

[GitHub] spark pull request #18281: [SPARK-21027][SPARK-21028][ML][PYTHON] Added tuna...

2017-06-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r121733736 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -65,6 +67,12 @@ private[ml] trait OneVsRestParams extends

[GitHub] spark pull request #18281: [SPARK-21027][SPARK-21028][ML][PYTHON] Added tuna...

2017-06-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r121734558 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -283,6 +295,12 @@ final class OneVsRest @Since("

[GitHub] spark pull request #18281: [SPARK-21027][SPARK-21028][ML][PYTHON] Added tuna...

2017-06-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r121737179 --- Diff: python/pyspark/ml/classification.py --- @@ -1560,14 +1566,27 @@ def trainSingleClass(index

[GitHub] spark pull request #18281: [SPARK-21027][SPARK-21028][ML][PYTHON] Added tuna...

2017-06-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r121736656 --- Diff: python/pyspark/ml/classification.py --- @@ -1560,14 +1566,27 @@ def trainSingleClass(index

[GitHub] spark pull request #18281: [SPARK-21027][SPARK-21028][ML][PYTHON] Added tuna...

2017-06-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r121735870 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -101,6 +101,40 @@ class OneVsRestSuite extends

[GitHub] spark pull request #18281: [SPARK-21027][SPARK-21028][ML][PYTHON] Added tuna...

2017-06-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r121735342 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -101,6 +101,40 @@ class OneVsRestSuite extends

[GitHub] spark pull request #18281: [SPARK-21027][SPARK-21028][ML][PYTHON] Added tuna...

2017-06-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r121735271 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -101,6 +101,40 @@ class OneVsRestSuite extends

[GitHub] spark pull request #18281: [SPARK-21027][SPARK-21028][ML][PYTHON] Added tuna...

2017-06-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r121737455 --- Diff: python/pyspark/ml/tests.py --- @@ -1229,7 +1229,35 @@ def test_output_columns(self): (2.0

[GitHub] spark pull request #18281: [SPARK-21027][SPARK-21028][ML][PYTHON] Added tuna...

2017-06-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r121734092 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -65,6 +67,12 @@ private[ml] trait OneVsRestParams extends

[GitHub] spark pull request #18281: [SPARK-21027][SPARK-21028][ML][PYTHON] Added tuna...

2017-06-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r121737343 --- Diff: python/pyspark/ml/tests.py --- @@ -1229,7 +1229,35 @@ def test_output_columns(self): (2.0

[GitHub] spark issue #18281: [SPARK-21027][SPARK-21028][ML][PYTHON] Added tunable par...

2017-06-13 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18281 taking a look now

[GitHub] spark issue #18281: Added tunable parallelism to one vs. rest in pyspark

2017-06-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18281 add to whitelist

spark git commit: [SPARK-21050][ML] Word2vec persistence overflow bug fix

2017-06-12 Thread jkbradley
es #18265 from jkbradley/word2vec-save-fix. (cherry picked from commit ff318c0d2f283c3f46491f229f82d93714da40c7) Signed-off-by: Joseph K. Bradley <jos...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/48a8

spark git commit: [SPARK-21050][ML] Word2vec persistence overflow bug fix

2017-06-12 Thread jkbradley
8265 from jkbradley/word2vec-save-fix. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ff318c0d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ff318c0d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ff318c0d Bra

[GitHub] spark issue #18265: [SPARK-21050][ML] Word2vec persistence overflow bug fix

2017-06-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18265 I'm going to call this ready...but please say if you see other fixes I should make. Thanks! Merging with master and branch-2.2

[GitHub] spark issue #18265: [SPARK-21050][ML] Word2vec persistence overflow bug fix

2017-06-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18265 @Krimit Thanks for taking a look! Does it look ready to merge now?

[GitHub] spark issue #18265: [SPARK-21050][ML] Word2vec persistence overflow bug fix

2017-06-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18265 looks like a spurious failure, retesting

[GitHub] spark issue #18265: [SPARK-21050][ML] Word2vec persistence overflow bug fix

2017-06-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18265 Yep, someone hit the bug!

[GitHub] spark pull request #18265: [SPARK-21050][ML] Word2vec persistence overflow b...

2017-06-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18265#discussion_r121315755 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala --- @@ -188,6 +188,15 @@ class Word2VecSuite extends SparkFunSuite

[GitHub] spark pull request #18265: [SPARK-21050][ML] Word2vec persistence overflow b...

2017-06-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18265#discussion_r121315677 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala --- @@ -188,6 +188,15 @@ class Word2VecSuite extends SparkFunSuite

[GitHub] spark pull request #18265: [SPARK-21050][ML] Word2vec persistence overflow b...

2017-06-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18265#discussion_r121315648 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -355,9 +364,12 @@ object Word2VecModel extends MLReadable[Word2VecModel

[GitHub] spark issue #18265: [SPARK-21050][ML] Word2vec persistence overflow bug fix

2017-06-10 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18265 CC @Krimit and @srowen who had worked on the previous related patch

[GitHub] spark pull request #18265: [SPARK-21050][ML] Word2vec persistence overflow b...

2017-06-10 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/18265 [SPARK-21050][ML] Word2vec persistence overflow bug fix ## What changes were proposed in this pull request? The method calculateNumberOfPartitions() uses Int, not Long (unlike the MLlib
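The Int-versus-Long issue named in this PR description can be illustrated with a short sketch. The snippet above is truncated, so the size formula below (vocabulary size times vector size times 4 bytes per float) is an assumption for illustration, not the actual body of calculateNumberOfPartitions(); `to_int32` emulates how 32-bit JVM Int arithmetic wraps around.

```python
def to_int32(value):
    """Wrap an integer to signed 32 bits, as JVM Int arithmetic does on overflow."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def approx_size_int(num_words, vector_size):
    """Assumed size estimate computed with 32-bit wraparound at each step."""
    return to_int32(to_int32(num_words * vector_size) * 4)


def approx_size_long(num_words, vector_size):
    """Same estimate with wide integers (Python ints do not overflow)."""
    return num_words * vector_size * 4


if __name__ == "__main__":
    # 10 million words with 100-dimensional vectors: about 4 GB of floats.
    print(approx_size_int(10_000_000, 100))   # -294967296
    print(approx_size_long(10_000_000, 100))  # 4000000000
```

A negative size estimate would then feed into the partition count, which is the kind of failure a switch from Int to Long avoids.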

[GitHub] spark issue #18256: [SPARK-21042][SQL] Document Dataset.union is resolution ...

2017-06-09 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18256 Thanks! LGTM pending tests

[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

2017-05-31 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18151 So...good thing you asked for the test b/c transform() wasn't going through the corrected code path. Another bit of evidence that the Prediction APIs don't generalize that well...

[GitHub] spark pull request #18151: [SPARK-20929][ML] LinearSVC should use its own th...

2017-05-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18151#discussion_r119275003 --- Diff: R/pkg/R/mllib_classification.R --- @@ -62,7 +62,7 @@ setClass("NaiveBayesModel", representation(jo

[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

2017-05-30 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18151 CC @mlnick @yanboliang

[GitHub] spark pull request #18151: [SPARK-20929][ML] LinearSVC should use its own th...

2017-05-30 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/18151 [SPARK-20929][ML] LinearSVC should use its own threshold param ## What changes were proposed in this pull request? LinearSVC should use its own threshold param, rather than the shared

[GitHub] spark issue #18085: [SPARK-20631][FOLLOW-UP] Fix incorrect tests.

2017-05-24 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18085 Thanks for fixing this! Just curious: did you figure out why the test was working before?

[GitHub] spark pull request #17891: [SPARK-20631][PYTHON][ML] LogisticRegression._che...

2017-05-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17891#discussion_r118175057 --- Diff: python/pyspark/ml/tests.py --- @@ -807,6 +807,18 @@ def test_logistic_regression(self): except OSError: pass

spark git commit: [SPARK-20861][ML][PYTHON] Delegate looping over paramMaps to estimators

2017-05-23 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-2.2 d20c64695 -> 00dee3902 [SPARK-20861][ML][PYTHON] Delegate looping over paramMaps to estimators Changes: pyspark.ml Estimators can take either a list of param maps or a dict of params. This change allows the CrossValidator and

spark git commit: [SPARK-20861][ML][PYTHON] Delegate looping over paramMaps to estimators

2017-05-23 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 4816c2ef5 -> 9434280cf [SPARK-20861][ML][PYTHON] Delegate looping over paramMaps to estimators Changes: pyspark.ml Estimators can take either a list of param maps or a dict of params. This change allows the CrossValidator and

[GitHub] spark issue #18077: [SPARK-20861][ML][PYTHON] Delegate looping over paramMap...

2017-05-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18077 Merging with master and branch-2.2 which means this will get into 2.2.0 Thanks for the quick fix!

[GitHub] spark issue #18077: [SPARK-20861] Delegate looping over paramMaps to estimat...

2017-05-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18077 Other than the tags, this LGTM

[GitHub] spark issue #18077: [SPARK-20861] Delegate looping over paramMaps to estimat...

2017-05-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18077 @MrBago Can you please add the tags "[ML][PYTHON]" to the title?

[GitHub] spark issue #17946: [SPARK-20707] [ML] ML deprecated APIs should be removed ...

2017-05-15 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17946 LGTM Thanks for doing this!

spark git commit: [SPARK-20501][ML] ML 2.2 QA: New Scala APIs, docs

2017-05-15 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master d4022d495 -> dbe81633a [SPARK-20501][ML] ML 2.2 QA: New Scala APIs, docs ## What changes were proposed in this pull request? Review new Scala APIs introduced in 2.2. ## How was this patch tested? Existing tests. Author: Yanbo Liang

spark git commit: [SPARK-20501][ML] ML 2.2 QA: New Scala APIs, docs

2017-05-15 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-2.2 a869e8bfd -> 57c87cf2d [SPARK-20501][ML] ML 2.2 QA: New Scala APIs, docs ## What changes were proposed in this pull request? Review new Scala APIs introduced in 2.2. ## How was this patch tested? Existing tests. Author: Yanbo Liang

[GitHub] spark issue #17934: [SPARK-20501] [ML] ML 2.2 QA: New Scala APIs, docs

2017-05-15 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17934 LGTM I'll merge this with master and branch-2.2 Thanks all!

[GitHub] spark pull request #17934: [SPARK-20501] [ML] ML 2.2 QA: New Scala APIs, doc...

2017-05-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17934#discussion_r116631097 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -146,7 +146,7 @@ object StringIndexer extends

[GitHub] spark issue #17829: [SPARK-20047][FOLLOWUP][ML] Constrained Logistic Regress...

2017-05-15 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17829 Awesome, thank you!

[GitHub] spark issue #17944: Revert "[SPARK-20606][ML] ML 2.2 QA: Remove deprecated m...

2017-05-15 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17944 Thanks a lot @yanboliang !

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-05-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r116614596 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -148,7 +154,8 @@ sealed trait Matrix extends Serializable

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-05-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r116596231 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -148,7 +154,8 @@ sealed trait Matrix extends Serializable

[GitHub] spark issue #17867: [SPARK-20606][ML] ML 2.2 QA: Remove deprecated methods f...

2017-05-10 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17867 I actually think @srowen has a good point that we should maintain more stable minor releases. I'd be in support of reverting this patch and changing the deprecation comments to say the items
