Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/22136
@huaxingao thank you for your pull request. Can you please add a test to
make sure this does not regress
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r150650409
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/HadoopUtils.scala
---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r150250118
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/HadoopUtils.scala
---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r150247999
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/HadoopUtils.scala
---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r147661505
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
---
@@ -0,0 +1,252 @@
+/*
+ * Licensed to the Apache Software
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r147661396
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
---
@@ -0,0 +1,252 @@
+/*
+ * Licensed to the Apache Software
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r147661078
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/HadoopUtils.scala
---
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/19439
@hhbyyh I recall now the reason for an extra `origin` field, which is to
get around the standard issue of many small image files in S3 or other
distributed file systems. It is standard to compact
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/19439
@hhbyyh regarding the data representation, one could indeed have each
of the representations encoded with the proper array information. This
brings some additional complexity
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/19439
@hhbyyh thank you for bringing up these questions. In response to your
questions:
> Does the current schema support or plan to support image feature data in
Floats[] or Doub
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/19156#discussion_r137603986
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -109,31 +108,47 @@ object Summarizer extends Logging
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/18798
Thank you @yanboliang.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/18798
@yanboliang do you feel comfortable merging this PR? I think that all the
questions have been addressed.
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r131971123
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,587 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r131970836
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,587 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r130742319
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,633 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r130742836
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,633 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r130741880
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,633 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r130742759
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,633 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r130742524
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,633 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r130742933
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,633 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r130743131
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/stat/SummarizerSuite.scala ---
@@ -0,0 +1,619 @@
+/*
+ * Licensed to the Apache Software
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r130741348
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,633 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/17419
I am going to close this PR, since this is being taken over by
@WeichenXu123 in #18798.
Github user thunterdb closed the pull request at:
https://github.com/apache/spark/pull/17419
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/18798
cc @hvanhovell as well.
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/18798
Thank you for the performance numbers, @WeichenXu123. I have a couple of
comments:
- you say that SQL uses adaptive compaction. How bad is that? I assume it
adds some overhead.
- did
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/18798
@WeichenXu123 thanks! Can you post some performance numbers as well?
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/18281#discussion_r121790182
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -325,8 +343,13 @@ final class OneVsRest @Since("
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/17419#discussion_r109063248
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,746 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/17419
I looked a bit deeper into the performance aspect. Here are some quick
insights:
- there was an immediate bottleneck in `VectorUDT`, which boosts the
performance already by 3x
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/17419#discussion_r108743634
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/stat/SummarizerSuite.scala ---
@@ -335,4 +335,65 @@ class SummarizerSuite extends SparkFunSuite
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/17419
I have added a small perf test to find the performance bottlenecks. Note
that this test works on the worst case (vectors of size 1) from the perspective
of overhead. Here are the numbers I
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/17419
@sethah it would have been nice, but I do not think we should merge it this
late into the release cycle.
GitHub user thunterdb opened a pull request:
https://github.com/apache/spark/pull/17419
[SPARK-19634][ML][WIP] Multivariate summarizer - dataframes API
## What changes were proposed in this pull request?
This patch adds the DataFrames API to the multivariate summarizer
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/17108
Tickets created:
- https://issues.apache.org/jira/browse/SPARK-20076
- https://issues.apache.org/jira/browse/SPARK-20077
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/17108#discussion_r107505718
--- Diff:
mllib-local/src/test/scala/org/apache/spark/ml/util/TestingUtils.scala ---
@@ -32,6 +32,10 @@ object TestingUtils {
* the relative
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/17108#discussion_r107505201
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Correlations.scala
---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/17108#discussion_r107505212
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Correlations.scala
---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/17108#discussion_r107505215
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Correlations.scala
---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/17108#discussion_r107505185
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Correlations.scala
---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/17108#discussion_r107505180
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/LinalgUtils.scala
---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/16483
It looks good to me.
cc @jkbradley or @mengxr for final approval
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/16483#discussion_r106746316
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala
---
@@ -322,13 +335,12 @@ object PageRank extends Logging {
def
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/16483
In addition, this introduces an extra step reduction at each iteration. I
am fine with that since it is for correctness, but @jkbradley may want to
comment as well.
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/16483#discussion_r106529377
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala
---
@@ -353,9 +365,19 @@ object PageRank extends Logging
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/16483#discussion_r106532078
--- Diff:
graphx/src/test/scala/org/apache/spark/graphx/lib/PageRankSuite.scala ---
@@ -68,26 +69,34 @@ class PageRankSuite extends SparkFunSuite
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/16483#discussion_r106535595
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala
---
@@ -322,13 +335,12 @@ object PageRank extends Logging {
def
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/16483#discussion_r106528007
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala
---
@@ -162,7 +162,15 @@ object PageRank extends Logging
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/16971#discussion_r106309333
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
---
@@ -245,7 +245,7 @@ object
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/17108
I moved the code to `Correlations` as suggested. @imatiach-msft, I addressed
your comments.
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/17108#discussion_r106307446
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/stat/StatisticsSuite.scala ---
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/17108#discussion_r106307517
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Statistics.scala ---
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/17108#discussion_r106306502
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/stat/StatisticsSuite.scala ---
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/17108#discussion_r106306385
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Statistics.scala ---
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/17108#discussion_r106306111
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Statistics.scala ---
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/17108#discussion_r106305822
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Statistics.scala ---
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/17110
@jkbradley LGTM, thanks!
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/17215
LGTM
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/13440#discussion_r104813864
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/ChiSquared.scala ---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/13440#discussion_r104813302
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
---
@@ -237,6 +237,41 @@ class
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/13440#discussion_r104812803
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala ---
@@ -50,6 +50,50 @@ trait Impurity extends Serializable
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/13440#discussion_r104812596
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala ---
@@ -50,6 +50,50 @@ trait Impurity extends Serializable
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/13440#discussion_r104812468
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala ---
@@ -50,6 +50,50 @@ trait Impurity extends Serializable
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/13440#discussion_r104812484
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala ---
@@ -50,6 +50,50 @@ trait Impurity extends Serializable
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/17108#discussion_r103596760
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Correlations.scala
---
@@ -0,0 +1,25 @@
+/*
+ * Licensed to the Apache Software
GitHub user thunterdb opened a pull request:
https://github.com/apache/spark/pull/17108
[SPARK-19636][ML] Feature parity for correlation statistics in MLlib
## What changes were proposed in this pull request?
This patch adds DataFrame-based support for the correlation
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/15770
Note that any of these formats would cause trouble for a graph with high
centrality (e.g., Lady Gaga in the Twitter graph). That being said, I do not have a
strong opinion as to which option we pick
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/16971
@zhengruifeng thanks for looking into this issue. I have one comment above.
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/16971#discussion_r102592174
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala
---
@@ -78,7 +80,12 @@ object StatFunctions extends Logging
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/16971#discussion_r102589719
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
@@ -89,18 +89,17 @@ final class DataFrameStatFunctions private
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/15770
@wangmiao1981 yes I had seen the discussions there. I believe that
eventually PIC should be moved into graphframes, but we can have a simple API
in `spark.ml` for the time being.
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/15770
You are right, I had forgotten that for this algorithm, the input is the
edges, and the output is the label for each of the vertices.
This is a tricky algorithm to put as a transformer
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/14299
@AnthonyTruchet thank you for the PR. This is definitely worth fixing for
large deployments. Now, as you noticed, this portion of code does not quite
abide by the best engineering practices
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/16774
Thanks for working on this task; this is a much-requested feature. While it
will work for simple cases in the current shape, it is going to cause some
issues for any complex deployments (Apache
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/16774#discussion_r101834675
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala ---
@@ -121,6 +121,33 @@ class CrossValidatorSuite
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/16973
These changes look good to me, but my knowledge of R is very limited.
@mengxr should confirm.
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/16557
I agree, let's break up this PR. It will go faster, and some changes may
require longer discussions.
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/16776
Sorry I missed the conversation here. LGTM.
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/15770
@wangmiao1981 thanks a lot! I would be very happy to see that PR in Spark
2.2 and I will gladly help you with that.
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15770#discussion_r101666251
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala
---
@@ -0,0 +1,153 @@
+/*
+ * Licensed
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15770#discussion_r101665899
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
---
@@ -0,0 +1,182 @@
+/*
+ * Licensed
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15770#discussion_r101664268
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
---
@@ -0,0 +1,182 @@
+/*
+ * Licensed
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15770#discussion_r101663790
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
---
@@ -0,0 +1,182 @@
+/*
+ * Licensed
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15770#discussion_r101662332
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
---
@@ -0,0 +1,182 @@
+/*
+ * Licensed
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15770#discussion_r101662298
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
---
@@ -0,0 +1,182 @@
+/*
+ * Licensed
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15770#discussion_r101662273
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
---
@@ -0,0 +1,182 @@
+/*
+ * Licensed
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15770#discussion_r101662038
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
---
@@ -0,0 +1,182 @@
+/*
+ * Licensed
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15770#discussion_r101662018
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
---
@@ -0,0 +1,182 @@
+/*
+ * Licensed
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/15826
@yanboliang that looks great, thank you. LGTM.
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15593#discussion_r87267275
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -489,13 +485,14 @@ class LogisticRegression @Since
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/15683
+1 for trying to get it into 2.1 (modulo tests)
Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/15809
LGTM
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15826#discussion_r87256910
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala ---
@@ -110,21 +110,20 @@ class NaiveBayes @Since("
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15826#discussion_r87256702
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala ---
@@ -226,13 +206,33 @@ class NaiveBayes @Since("
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15826#discussion_r87256589
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala ---
@@ -226,13 +206,33 @@ class NaiveBayes @Since("
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15795#discussion_r87116504
--- Diff: docs/ml-features.md ---
@@ -1396,3 +1396,149 @@ for more details on the API.
{% include_example python/ml/chisq_selector_example.py
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15795#discussion_r87113033
--- Diff: docs/ml-features.md ---
@@ -1396,3 +1396,149 @@ for more details on the API.
{% include_example python/ml/chisq_selector_example.py
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15795#discussion_r87113839
--- Diff: docs/ml-features.md ---
@@ -1396,3 +1396,149 @@ for more details on the API.
{% include_example python/ml/chisq_selector_example.py
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/15795#discussion_r87113728
--- Diff: docs/ml-features.md ---
@@ -1396,3 +1396,149 @@ for more details on the API.
{% include_example python/ml/chisq_selector_example.py