Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15795#discussion_r87116350
--- Diff: docs/ml-features.md ---
@@ -1396,3 +1396,134 @@ for more details on the API.
{% include_example python/ml/chisq_selector_example.py
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15795#discussion_r87116745
--- Diff: docs/ml-features.md ---
@@ -1396,3 +1396,149 @@ for more details on the API.
{% include_example python/ml/chisq_selector_example.py
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15795#discussion_r87116485
--- Diff: docs/ml-features.md ---
@@ -1396,3 +1396,149 @@ for more details on the API.
{% include_example python/ml/chisq_selector_example.py
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/15074
@dbtsai and @holdenk have a good point about the equality being a actually
an equivalence check (we want to check that two matrices have the same content,
not that two python objects have
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12819#discussion_r87029278
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala ---
@@ -109,10 +118,89 @@ class NaiveBayes @Since("
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12819#discussion_r87025609
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala ---
@@ -109,10 +118,89 @@ class NaiveBayes @Since("
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12819#discussion_r87025222
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala ---
@@ -109,10 +118,89 @@ class NaiveBayes @Since("
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12819#discussion_r87025073
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala ---
@@ -109,10 +118,89 @@ class NaiveBayes @Since("
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/15051
LGTM.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r78610830
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -139,16 +145,32 @@ class KMeansSuite extends SparkFunSuite
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/15002
Thank you @srowen
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15002#discussion_r77905310
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
---
@@ -258,7 +263,10 @@ object QuantileSummaries
GitHub user thunterdb opened a pull request:
https://github.com/apache/spark/pull/15002
[ML][SPARK-17439] Fixing compression issues with approximate quantiles and
adding more tests
## What changes were proposed in this pull request?
This PR build on #14976 and fixes
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/15002
cc @clockfly
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/14976#discussion_r77902612
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
---
@@ -59,9 +59,14 @@ class QuantileSummaries
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/14976
By the way, you gave some great advice here. Is there a page on the wiki
where we collect all this internal knowledge?
---
If your project is set up for it, you can reply to this email and have
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/14976
@srowen LGTM, thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/14976#discussion_r77735611
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
---
@@ -236,7 +240,7 @@ object QuantileSummaries
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/14976#discussion_r77734419
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
---
@@ -59,9 +59,14 @@ class QuantileSummaries
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/14603
LGTM
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/14298#discussion_r72137984
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/PercentileApprox.scala
---
@@ -0,0 +1,456
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/14298#discussion_r71740068
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/PercentileApprox.scala
---
@@ -0,0 +1,456
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/14023
This looks good to me (as the original reporter of the issue).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/13789#discussion_r67778583
--- Diff: mllib/src/main/scala/org/apache/spark/ml/linalg/dataTypes.scala
---
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/13176
I will try as well this afternoon.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/13256
@Luwein can you please close this pull request? Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12836#issuecomment-221930188
That sounds good, we can add `aggregate` later.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/13139#discussion_r64779485
--- Diff: docs/ml-classification-regression.md ---
@@ -374,6 +374,148 @@ regression model and extracting model summary
statistics
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12836#issuecomment-221109113
@NarineK thank you for working on it, this is a great addition to SparkR!
Regarding the API, it is very close to R's `aggregate` function in the
`stats
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/13139#discussion_r64248278
--- Diff: docs/ml-classification-regression.md ---
@@ -374,6 +374,197 @@ regression model and extracting model summary
statistics
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/13194#discussion_r64090534
--- Diff: python/pyspark/ml/evaluation.py ---
@@ -316,17 +335,27 @@ def setParams(self, predictionCol="prediction",
label
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/13139#discussion_r64086909
--- Diff: docs/ml-classification-regression.md ---
@@ -374,6 +374,148 @@ regression model and extracting model summary
statistics
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/13139#discussion_r64086417
--- Diff: docs/ml-classification-regression.md ---
@@ -374,6 +374,197 @@ regression model and extracting model summary
statistics
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/13139#discussion_r64086020
--- Diff: docs/ml-classification-regression.md ---
@@ -374,6 +374,197 @@ regression model and extracting model summary
statistics
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/13198#issuecomment-220677539
It seems to me that there is not much cost (other than us arguing for or
against) and a slightly positive benefit (it is more clear that no expensive
copy occurs
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64079535
--- Diff: docs/ml-features.md ---
@@ -1064,7 +1069,8 @@ categorical features.
The bin ranges are chosen by taking a sample of the data and dividing
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64079509
--- Diff: docs/ml-features.md ---
@@ -1064,7 +1069,8 @@ categorical features.
The bin ranges are chosen by taking a sample of the data and dividing
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/13087#issuecomment-220659554
@marmbrus +1
I suggest to change the documentation of UDFs instead to underline the
expectation that they be deterministic for the time being.
---
If your
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12782#issuecomment-215917416
@jkbradley this looks to me
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12782#discussion_r61661429
--- Diff: python/pyspark/ml/tests.py ---
@@ -44,8 +44,7 @@
from pyspark import keyword_only
from pyspark.ml import Estimator, Model
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12792#issuecomment-215900920
As discussed in private, kmeans persistence is new for this release, so
breaking the format is not an issue at this point.
LGTM
---
If your project is set
GitHub user thunterdb opened a pull request:
https://github.com/apache/spark/pull/12789
[WIP][SPARKR][SPARK-14831]Make the SparkR MLlib API more consistent with
Spark
## What changes were proposed in this pull request?
This is a work in progress, not intended to be merged
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12536#issuecomment-215826014
@BenFradet thanks!
@jkbradley can you make a final pass?
---
If your project is set up for it, you can reply to this email and have your
reply appear
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12426#issuecomment-215821456
@shivaram @felixcheung @dongjoon-hyun thank you for your comments on my
first R pull request!
Also, I put a note in the ticket about updating
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12426#discussion_r61347932
--- Diff: R/pkg/R/context.R ---
@@ -226,6 +226,49 @@ setCheckpointDir <- function(sc, dirName) {
invisible(callJMethod(sc, "setCheck
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12536#issuecomment-215183774
@BenFradet it looks like there are some compilation issues
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12663#issuecomment-215091481
A quick look at the source code of scikit-learn shows that it always
reindexes, but it uses some efficient numpy primitive for doing that. I think
assuming an index
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12663#discussion_r61259678
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/ClassifierSuite.scala
---
@@ -29,4 +69,31 @@ object ClassifierSuite
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12663#discussion_r61259664
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -126,16 +127,16 @@ final class GBTClassifier @Since("
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12663#discussion_r61259639
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala ---
@@ -62,6 +65,57 @@ abstract class Classifier[
def
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12663#discussion_r61259624
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala ---
@@ -62,6 +65,57 @@ abstract class Classifier[
def
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12430#issuecomment-214898522
LGTM.
FYI, there are some corner cases (maps of arrays for example) in which the
`_` shortcut syntax will now do what you want, but it is remote enough
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12711#issuecomment-214895988
@jkbradley LGTM
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12536#discussion_r61168912
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -62,8 +64,9 @@ class DecisionTree private[spark] (private val
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12536#discussion_r61168881
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala
---
@@ -88,17 +88,30 @@ final class DecisionTreeRegressor
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12560#issuecomment-214890439
In any case, we should also log the success with
`instrLog.logSuccess(model)`
---
If your project is set up for it, you can reply to this email and have your
reply
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12560#discussion_r61166736
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -395,6 +395,10 @@ class ALS(@Since("1.4.0") override val u
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12560#issuecomment-214890157
@wangmiao1981 sorry for the delay. I agree with @MLnick that the developer
API of ALS is already brittle enough as it stands. One way to still do it is to
add one
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12426#discussion_r60950867
--- Diff: R/pkg/R/context.R ---
@@ -225,3 +225,26 @@ broadcast <- function(sc, object) {
setCheckpointDir <- function(sc, dirName) {
inv
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12426#discussion_r60950901
--- Diff: R/pkg/NAMESPACE ---
@@ -287,6 +287,7 @@ export("as.DataFrame",
"read.json",
"read.parquet
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12426#discussion_r60950880
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -136,3 +136,8 @@ test_that("sparkJars sparkPackages as comma-separated
st
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12536#issuecomment-213021639
Oh sorry, I had not realized that @jkbradley had asked you to remove the
parameter. @jkbradley there is no performance issue doing that?
---
If your project is set
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12536#issuecomment-213020180
@BenFradet thanks for the PR. I have one comment about performance.
---
If your project is set up for it, you can reply to this email and have your
reply appear
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12536#discussion_r60618838
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -610,7 +604,9 @@ private[spark] object RandomForest extends Logging
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12536#discussion_r60618857
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -668,6 +664,7 @@ private[spark] object RandomForest extends Logging
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12536#discussion_r60618828
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -80,6 +80,7 @@ private[spark] object RandomForest extends Logging
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12536#discussion_r60618750
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala
---
@@ -94,17 +94,33 @@ final class DecisionTreeClassifier
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12536#discussion_r60618432
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala
---
@@ -94,17 +94,33 @@ final class DecisionTreeClassifier
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12560#issuecomment-213009688
@wangmiao1981 thanks for the PR. Can you see if you could make it work
without having to evaluate the input data?
---
If your project is set up for it, you can
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12560#discussion_r60616196
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -395,13 +395,21 @@ class ALS(@Since("1.4.0") override val u
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12560#discussion_r60615798
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -395,13 +395,21 @@ class ALS(@Since("1.4.0") override val u
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12426#issuecomment-211506352
Forget my last comment, I found the other tests.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12433#issuecomment-211493984
+1, because we can then add a linter rule that disallows `Await.result`
---
If your project is set up for it, you can reply to this email and have your
reply appear
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12426#issuecomment-211484675
@felixcheung do you have an example I could follow for testing in R?
---
If your project is set up for it, you can reply to this email and have your
reply appear
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12432#discussion_r59950837
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
---
@@ -264,6 +264,9 @@ class KMeans @Since("1.5.0") (
overri
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12432#issuecomment-210671834
@keypointt thanks! I have one comment.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12426#issuecomment-210646935
Some other changes got merged, removing them
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
GitHub user thunterdb opened a pull request:
https://github.com/apache/spark/pull/12426
[SPARK-7264][ML] Parallel lapply for sparkR
## What changes were proposed in this pull request?
This PR adds a new function in SparkR called `sparkLapply(list, function)`.
This function
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12368#issuecomment-210090347
@jkbradley sorry for the delay, I have one comment. Also, you have some
merging issues.
---
If your project is set up for it, you can reply to this email and have
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12368#discussion_r59767952
--- Diff: python/pyspark/ml/pipeline.py ---
@@ -19,6 +19,7 @@
if sys.version > '3':
basestring = str
+unicode = str
---
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12331#issuecomment-209478984
@jkbradley comments addressed, thanks for the review! Regarding the other
messages, I would rather not change them for now, because they are not
instrumentation
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12331#discussion_r59557439
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala ---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12331#discussion_r59557153
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala ---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12331#discussion_r59557115
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala ---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12331#discussion_r59557130
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala ---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/12331#discussion_r59478228
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -273,6 +273,10 @@ class LogisticRegression @Since
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12331#issuecomment-209124853
cc @mengxr
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-209081278
I see that the size of the blocks can be tuned and is fairly small by
default (128). Out of curiosity, how did you pick this number, instead of the
full size
GitHub user thunterdb opened a pull request:
https://github.com/apache/spark/pull/12331
[SPARK-14568][ML] Instrumentation framework for logistic regression
## What changes were proposed in this pull request?
This adds extra logging information about a `LogisticRegression
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-200023253
@mengxr it looks great, I have no comment
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/11549#issuecomment-199516534
@hhbyyh thanks for the split. I have two small comments. Also, can you
include some tests in `test_mllib.R`? It should be very close to the manual
testing you did
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/11549#discussion_r56908061
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/SparkRWrappers.scala
---
@@ -17,15 +17,34 @@
package org.apache.spark.ml.api.r
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/11549#discussion_r56907102
--- Diff: R/pkg/R/mllib.R ---
@@ -29,15 +29,10 @@ setClass("PipelineModel", representation(model =
"jobj"))
#' @param formula A
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/11621#issuecomment-198108360
@BryanCutler thanks for the PR, I only have a few small comments
---
If your project is set up for it, you can reply to this email and have your
reply appear
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/11802#discussion_r56595542
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
@@ -277,7 +277,7 @@ abstract class DStream[T: ClassTag
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/11802#discussion_r56595456
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
@@ -269,7 +269,7 @@ abstract class DStream[T: ClassTag
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/11621#discussion_r56587628
--- Diff: python/pyspark/ml/classification.py ---
@@ -231,6 +232,210 @@ def intercept(self):
"""
return
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/11784#discussion_r56597344
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -508,12 +508,10 @@ class LogisticRegressionModel
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/11802#issuecomment-198135365
I had some small comments from my side. The changes in `ml` look good.
Someone from streaming should take over for the rest.
---
If your project is set up
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/11802#discussion_r56595285
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
@@ -132,7 +132,7 @@ class StreamingContext private[streaming
101 - 200 of 314 matches
Mail list logo