GitHub user sethah opened a pull request:
https://github.com/apache/spark/pull/7029
[SPARK-7739][MLlib] Improve ChiSqSelector example code in user guide
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sethah/spark
GitHub user sethah opened a pull request:
https://github.com/apache/spark/pull/8112
[SPARK-8971][MLLIB][ML] Support balanced class labels when splitting
train/cross validation sets
I'm leaving a few comments about some of the design choices made in this PR.
- both train
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/7655#issuecomment-126143616
@mengxr I updated the guide with your suggestions. I corrected the
capitalization scheme and other things you listed. Let me know if you find
anything else
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/7655#discussion_r35829175
--- Diff: docs/mllib-evaluation-metrics.md ---
@@ -0,0 +1,1476 @@
+---
+layout: global
+title: Evaluation Metrics - MLlib
+displayTitle: a href
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/7655#issuecomment-126513461
@mengxr I have created the
[JIRA](https://issues.apache.org/jira/browse/SPARK-9490). I may get a chance to
fix that sometime next week.
---
If your project is set up
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/7655#discussion_r35663441
--- Diff: docs/mllib-evaluation-metrics.md ---
@@ -0,0 +1,1475 @@
+---
+layout: global
+title: Evaluation Metrics - MLlib
+displayTitle: a href
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/7655#discussion_r35829163
--- Diff: docs/mllib-evaluation-metrics.md ---
@@ -0,0 +1,1476 @@
+---
+layout: global
+title: Evaluation Metrics - MLlib
+displayTitle: a href
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/7655#discussion_r35829230
--- Diff: docs/mllib-evaluation-metrics.md ---
@@ -0,0 +1,1476 @@
+---
+layout: global
+title: Evaluation Metrics - MLlib
+displayTitle: a href
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/7655#discussion_r35587655
--- Diff: docs/mllib-metrics.md ---
@@ -0,0 +1,1464 @@
+---
+layout: global
+title: Evaluation Metrics - MLlib
+displayTitle: a href=mllib
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/7655#discussion_r35587587
--- Diff: docs/mllib-metrics.md ---
@@ -0,0 +1,1464 @@
+---
+layout: global
+title: Evaluation Metrics - MLlib
+displayTitle: a href=mllib
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/7655#discussion_r35587545
--- Diff: docs/mllib-guide.md ---
@@ -48,6 +48,7 @@ This lists functionality included in `spark.mllib`, the
main MLlib API.
* [Feature extraction
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/7655#issuecomment-125345480
Sean,
I added a bit of background on things like TP, FP, precision, recall, ROC,
etc... to the guide. I tried to explain the base concepts for classification
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/7655#issuecomment-124870109
Sean,
Thanks for your feedback. I agree that more intuitive descriptions will be
helpful, and so I will work on getting those into the document. One thing I'd
GitHub user sethah opened a pull request:
https://github.com/apache/spark/pull/7655
[SPARK-6129][MLLIB][DOCS] Added user guide for evaluation metrics
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sethah/spark Working_on_6129
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42790961
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala ---
@@ -168,6 +191,28 @@ object QueryTest {
return None
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42889596
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -930,3 +930,327 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42889558
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -930,3 +930,327 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42889941
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -930,3 +930,327 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42889772
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -930,3 +930,327 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42889712
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -930,3 +930,327 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42793409
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -857,3 +857,329 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42820025
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -930,3 +930,327 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42677457
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -857,3 +857,329 @@ object
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/9003#issuecomment-151267064
I guess this is still failing HiveComparisonTest due to a small error in the
variance.
[info] !== HIVE - 1 row(s
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r43413026
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala
---
@@ -991,3 +991,73 @@ case class StddevFunction
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/9003#issuecomment-151211786
@mengxr Corrected `variance` to yield the population variance. Tests should
pass now.
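For readers unfamiliar with the distinction behind this fix, here is a minimal sketch of population versus sample variance. It is plain Python rather than the Spark SQL implementation under review, and the function names `population_variance` and `sample_variance` are hypothetical, chosen only for illustration.

```python
def population_variance(xs):
    """Divide the summed squared deviations by n (population variance)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def sample_variance(xs):
    """Divide the summed squared deviations by n - 1 (sample variance)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Illustrative data only, not taken from the PR's tests.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
pop_var = population_variance(data)
samp_var = sample_variance(data)
```

The two estimators differ only in the divisor, so for a small number of rows they can disagree by enough to fail a comparison test that expects the other definition.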
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42923519
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -930,3 +930,332 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42923553
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -930,3 +930,330 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42923579
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -930,3 +930,330 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42706392
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
@@ -221,4 +221,40 @@ class DataFrameAggregateSuite extends QueryTest
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9380#discussion_r43662570
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -1135,7 +992,76 @@ abstract class
GitHub user sethah opened a pull request:
https://github.com/apache/spark/pull/9474
[SPARK-10788][MLLIB][ML] Remove duplicate bins for decision trees
Decision trees in spark.ml (RandomForest.scala) communicate twice as much
data as needed for unordered categorical features. Here's
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/9474#issuecomment-153889541
cc @jkbradley
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/9474#issuecomment-154259011
test this
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/9003#issuecomment-149260063
@mengxr I am working on it, and I have incorporated changes from the note
you posted on the [Jira](https://issues.apache.org/jira/browse/SPARK-10641) -
thanks
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/9003#issuecomment-149424324
@mengxr I think having `VarianceSamp` inherit from `Variance` will be a
fine solution. I'm not clear on why it is better not to touch `InternalRow` in
the subclasses
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42454065
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -842,3 +699,302 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42453520
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -842,3 +699,302 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42453557
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -842,3 +699,302 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42584406
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -842,3 +699,302 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42584417
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -842,3 +699,302 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42584440
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -842,3 +699,302 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42584447
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -842,3 +699,302 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42584819
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -857,3 +857,329 @@ object
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/9003#issuecomment-149788583
@mengxr Addressed all comments and rearranged the class inheritances for
case classes. I made another abstract class `SecondMoment` which all `Variance`
classes inherit
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42584878
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -857,3 +857,329 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9008#discussion_r41573997
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -1211,4 +1212,34 @@ private[ml] object RandomForest extends Logging
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/9003#issuecomment-146005754
A few notes:
* I wrote this implementation before stddev was merged, and hence I ended
up with a slightly different implementation. The differences are mostly
GitHub user sethah opened a pull request:
https://github.com/apache/spark/pull/9003
[SPARK-10641][WIP][SQL] Add Skewness and Kurtosis Support
Implementing skewness and kurtosis support based on the following algorithm:
https://en.wikipedia.org/wiki
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r41324423
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -88,6 +88,276 @@ case class Average(child
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r41324846
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
---
@@ -734,10 +750,30 @@ class SQLQuerySuite extends QueryTest
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r41324803
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
@@ -221,4 +221,40 @@ class DataFrameAggregateSuite extends QueryTest
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r41325055
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -88,6 +88,276 @@ case class Average(child
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9008#discussion_r41597118
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -87,8 +86,10 @@ private[ml] object RandomForest extends Logging
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9008#discussion_r41648955
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -1211,4 +1213,28 @@ private[ml] object RandomForest extends Logging
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/7884#issuecomment-136408857
Just a thought I had while looking over this PR: It makes more sense to me
to refer to the sample weights as `sampleWeight` instead of `weight`, since the
regression
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/7884#issuecomment-136858157
Ah, I missed that, my apologies. That's a bit unfortunate!
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8675#discussion_r39110298
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala
---
@@ -166,6 +167,7 @@ private[ml] object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8675#discussion_r39110354
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
@@ -145,6 +145,10 @@ abstract class PredictionModel[FeaturesType, M <:
PredictionMo
GitHub user sethah opened a pull request:
https://github.com/apache/spark/pull/8675
[SPARK-9715][ML] Store numFeatures in all ML PredictionModel types
All prediction models should store `numFeatures` indicating the number of
features the model was trained on. Default value of -1
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8675#discussion_r39298668
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala
---
@@ -166,6 +167,7 @@ private[ml] object
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/8675#issuecomment-140489654
@feynmanliang One thing I'm curious about is if this would still be a
problem if all the constructors were private to ml? Right now, GBTs are the
only one
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/8675#issuecomment-140926689
@jkbradley Thanks for the feedback. I made the changes noted above and
fixed the GBT constructors. Let me know if you see anything else.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8675#discussion_r39698374
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -175,6 +177,14 @@ final class GBTClassificationModel
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8675#discussion_r39798759
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -167,7 +168,8 @@ object GBTClassifier {
final class
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-139362995
@mengxr this has been idle for a while. Will you have a chance to review it?
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40339870
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -267,6 +268,26 @@ object MLUtils
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40339841
--- Diff:
core/src/main/scala/org/apache/spark/util/random/StratifiedSamplingUtils.scala
---
@@ -216,6 +216,35 @@ private[spark] object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40339926
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
---
@@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40339580
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
---
@@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40339724
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -80,7 +96,18 @@ class CrossValidator(override val uid: String) extends
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40339822
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
---
@@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40339748
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
---
@@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-142979645
@dusenberrymw Thanks for the feedback. I have addressed each of your
comments. Let me know if you see anything else.
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/8675#issuecomment-142365103
@jkbradley Fixed the merge conflicts.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8883#discussion_r40399684
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala ---
@@ -129,12 +129,12 @@ abstract class ClassificationModel[FeaturesType
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9524#discussion_r46749563
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -189,6 +190,18 @@ final class GBTClassificationModel private[ml
GitHub user sethah opened a pull request:
https://github.com/apache/spark/pull/10231
[SPARK-12182][ML] Distributed binning for trees in spark.ml
This PR changes the `findSplits` method in spark.ml to perform split
calculations on the workers. This PR is meant to copy
[PR-8246
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/10231#issuecomment-163419168
@NathanHowell would you be able to review this?
cc @jkbradley
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/10231#issuecomment-163430383
This JIRA was actually created as a blocker JIRA for
[SPARK-12183](https://issues.apache.org/jira/browse/SPARK-12183) which is for
removing the MLlib code entirely
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10231#discussion_r47165377
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -842,60 +842,63 @@ private[ml] object RandomForest extends Logging
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10306#discussion_r47819551
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -250,114 +240,142 @@ class KMeans private
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10306#discussion_r47820060
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -250,114 +240,142 @@ class KMeans private
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10274#discussion_r47530009
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -94,7 +110,7 @@ private[ml] class WeightedLeastSquares
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10274#discussion_r47530361
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -86,6 +86,22 @@ private[ml] class WeightedLeastSquares
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10231#discussion_r47382165
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -842,60 +842,63 @@ private[ml] object RandomForest extends Logging
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r47933061
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
---
@@ -275,6 +274,40 @@ class
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r47933490
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
---
@@ -275,6 +274,40 @@ class
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r47933679
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala
---
@@ -100,6 +101,40 @@ class GBTClassifierSuite extends
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r47932753
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
---
@@ -275,6 +274,40 @@ class
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165517939
I assume since this is a WIP you are still going to add test cases for the
other predictors? Additionally, since ShortType and DecimalType also extend
NumericType, I
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10274#discussion_r47530742
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/optim/WeightedLeastSquaresSuite.scala
---
@@ -43,6 +44,18 @@ class WeightedLeastSquaresSuite extends
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10274#discussion_r47534238
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -86,6 +86,22 @@ private[ml] class WeightedLeastSquares
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9581#discussion_r48774100
--- Diff: python/pyspark/ml/param/__init__.py ---
@@ -247,7 +248,27 @@ def _set(self, **kwargs):
Sets user-supplied params
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/10231#issuecomment-168845446
@NathanHowell do you think you'll have any time to take a look at this?
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/10231#issuecomment-169395890
@NathanHowell Thank you for reviewing!
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10639#discussion_r49144625
--- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/GLMFamilies.scala
---
@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10639#discussion_r49144794
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala
---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/10639#issuecomment-169848523
@yanboliang Could you post a link to a reference paper? I find
documentation on IRLS scattered, so it would be nice to have something concrete
to point
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10639#discussion_r49226861
--- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/GLMFamilies.scala
---
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation