[GitHub] spark pull request: [SPARK-7739][MLlib] Improve ChiSqSelector exam...

2015-06-25 Thread sethah
GitHub user sethah opened a pull request: https://github.com/apache/spark/pull/7029 [SPARK-7739][MLlib] Improve ChiSqSelector example code in user guide You can merge this pull request into a Git repository by running: $ git pull https://github.com/sethah/spark

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-08-11 Thread sethah
GitHub user sethah opened a pull request: https://github.com/apache/spark/pull/8112 [SPARK-8971][MLLIB][ML] Support balanced class labels when splitting train/cross validation sets I'm leaving a few comments about some of the design choices made in this PR. - both train
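Balanced (stratified) splitting keeps each class label's share roughly the same in the training and cross-validation sets. The following is a minimal plain-Python sketch of that idea. `stratified_split` is a hypothetical helper, not the PR's code, and it assumes an exact per-class cut rather than whatever sampling strategy the PR actually adopts.

```python
import random
from collections import defaultdict


def stratified_split(labeled_points, train_fraction, seed=42):
    """Split (label, features) pairs so each class keeps roughly the same
    proportion in the training and validation sets (hypothetical helper)."""
    # Group the records by class label.
    by_label = defaultdict(list)
    for point in labeled_points:
        by_label[point[0]].append(point)

    rng = random.Random(seed)
    train, validation = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        # Cut every class at the same fraction, so class ratios are preserved.
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        validation.extend(group[cut:])
    return train, validation
```

With 80 records of class 0.0, 20 of class 1.0 and `train_fraction=0.75`, the training set gets 60 and 15 records respectively, so the 4:1 ratio survives in both splits.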

[GitHub] spark pull request: [SPARK-6129][MLLIB][DOCS] Added user guide for...

2015-07-29 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/7655#issuecomment-126143616 @mengxr I updated the guide with your suggestions. I corrected the capitalization scheme and other things you listed. Let me know if you find anything else

[GitHub] spark pull request: [SPARK-6129][MLLIB][DOCS] Added user guide for...

2015-07-29 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/7655#discussion_r35829175 --- Diff: docs/mllib-evaluation-metrics.md --- @@ -0,0 +1,1476 @@ +--- +layout: global +title: Evaluation Metrics - MLlib +displayTitle: a href

[GitHub] spark pull request: [SPARK-6129][MLLIB][DOCS] Added user guide for...

2015-07-30 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/7655#issuecomment-126513461 @mengxr I have created the [JIRA](https://issues.apache.org/jira/browse/SPARK-9490). I may get a chance to fix that sometime next week. --- If your project is set up

[GitHub] spark pull request: [SPARK-6129][MLLIB][DOCS] Added user guide for...

2015-07-28 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/7655#discussion_r35663441 --- Diff: docs/mllib-evaluation-metrics.md --- @@ -0,0 +1,1475 @@ +--- +layout: global +title: Evaluation Metrics - MLlib +displayTitle: a href

[GitHub] spark pull request: [SPARK-6129][MLLIB][DOCS] Added user guide for...

2015-07-29 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/7655#discussion_r35829163 --- Diff: docs/mllib-evaluation-metrics.md --- @@ -0,0 +1,1476 @@ +--- +layout: global +title: Evaluation Metrics - MLlib +displayTitle: a href

[GitHub] spark pull request: [SPARK-6129][MLLIB][DOCS] Added user guide for...

2015-07-29 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/7655#discussion_r35829230 --- Diff: docs/mllib-evaluation-metrics.md --- @@ -0,0 +1,1476 @@ +--- +layout: global +title: Evaluation Metrics - MLlib +displayTitle: a href

[GitHub] spark pull request: [SPARK-6129][MLLIB][DOCS] Added user guide for...

2015-07-27 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/7655#discussion_r35587655 --- Diff: docs/mllib-metrics.md --- @@ -0,0 +1,1464 @@ +--- +layout: global +title: Evaluation Metrics - MLlib +displayTitle: a href=mllib

[GitHub] spark pull request: [SPARK-6129][MLLIB][DOCS] Added user guide for...

2015-07-27 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/7655#discussion_r35587587 --- Diff: docs/mllib-metrics.md --- @@ -0,0 +1,1464 @@ +--- +layout: global +title: Evaluation Metrics - MLlib +displayTitle: a href=mllib

[GitHub] spark pull request: [SPARK-6129][MLLIB][DOCS] Added user guide for...

2015-07-27 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/7655#discussion_r35587545 --- Diff: docs/mllib-guide.md --- @@ -48,6 +48,7 @@ This lists functionality included in `spark.mllib`, the main MLlib API. * [Feature extraction

[GitHub] spark pull request: [SPARK-6129][MLLIB][DOCS] Added user guide for...

2015-07-27 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/7655#issuecomment-125345480 Sean, I added a bit of background on things like TP, FP, precision, recall, ROC, etc. to the guide. I tried to explain the base concepts for classification
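The base concepts mentioned here reduce to counts over (prediction, label) pairs. Below is a library-free sketch of precision and recall for binary labels, assuming 1.0 marks the positive class. It illustrates the definitions only and is not Spark's evaluation API.

```python
def binary_confusion(predictions_and_labels):
    """Count (TP, FP, TN, FN) for (prediction, label) pairs; 1.0 = positive."""
    tp = fp = tn = fn = 0
    for prediction, label in predictions_and_labels:
        if prediction == 1.0 and label == 1.0:
            tp += 1          # predicted positive, actually positive
        elif prediction == 1.0:
            fp += 1          # predicted positive, actually negative
        elif label == 1.0:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, tn, fn


def precision_recall(predictions_and_labels):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, tn, fn = binary_confusion(predictions_and_labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For the pairs `(1,1), (1,0), (0,1), (1,1), (0,0), (0,1)` this gives TP=2, FP=1, TN=1, FN=2, so precision is 2/3 and recall is 1/2.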

[GitHub] spark pull request: [SPARK-6129][MLLIB][DOCS] Added user guide for...

2015-07-25 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/7655#issuecomment-124870109 Sean, Thanks for your feedback. I agree that more intuitive descriptions will be helpful, and so I will work on getting those into the document. One thing I'd

[GitHub] spark pull request: [SPARK-6129][MLLIB][DOCS] Added user guide for...

2015-07-24 Thread sethah
GitHub user sethah opened a pull request: https://github.com/apache/spark/pull/7655 [SPARK-6129][MLLIB][DOCS] Added user guide for evaluation metrics You can merge this pull request into a Git repository by running: $ git pull https://github.com/sethah/spark Working_on_6129

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-22 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42790961 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala --- @@ -168,6 +191,28 @@ object QueryTest { return None

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-23 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42889596 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -930,3 +930,327 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-23 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42889558 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -930,3 +930,327 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-23 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42889941 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -930,3 +930,327 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-23 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42889772 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -930,3 +930,327 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-23 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42889712 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -930,3 +930,327 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-22 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42793409 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -857,3 +857,329 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-22 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42820025 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -930,3 +930,327 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-21 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42677457 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -857,3 +857,329 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-26 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/9003#issuecomment-151267064 I guess this is still failing HiveComparisonTest due to a small error in the variance. [info] !== HIVE - 1 row(s

[GitHub] spark pull request: [SPARK-10641][SQL] Add Skewness and Kurtosis S...

2015-10-29 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r43413026 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala --- @@ -991,3 +991,73 @@ case class StddevFunction

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-26 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/9003#issuecomment-151211786 @mengxr Corrected `variance` to yield the population variance. Tests should pass now.

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-23 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42923519 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -930,3 +930,332 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-23 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42923553 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -930,3 +930,330 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-23 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42923579 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -930,3 +930,330 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-21 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42706392 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala --- @@ -221,4 +221,40 @@ class DataFrameAggregateSuite extends QueryTest

[GitHub] spark pull request: SPARK-11420 Updating Stddev support via Impera...

2015-11-02 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9380#discussion_r43662570 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -1135,7 +992,76 @@ abstract class

[GitHub] spark pull request: [SPARK-10788][MLLIB][ML] Remove duplicate bins...

2015-11-04 Thread sethah
GitHub user sethah opened a pull request: https://github.com/apache/spark/pull/9474 [SPARK-10788][MLLIB][ML] Remove duplicate bins for decision trees Decision trees in spark.ml (RandomForest.scala) communicate twice as much data as needed for unordered categorical features. Here's
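Assuming the duplication comes from treating a category subset and its complement as separate splits (the PR description is cut off here), the counting argument can be checked directly: for M unordered categories there are 2^M - 2 non-empty proper subsets, but only 2^(M-1) - 1 distinct left/right partitions. The helpers below are hypothetical and do not reflect the code in `RandomForest.scala`.

```python
from itertools import combinations


def all_subset_splits(num_categories):
    """Every non-empty proper subset of categories that could go to the left child."""
    cats = range(num_categories)
    return [frozenset(c)
            for k in range(1, num_categories)
            for c in combinations(cats, k)]


def distinct_partitions(num_categories):
    """Keep one entry per {left, right} partition: a subset and its complement
    split the data identically, just with the children swapped."""
    full = frozenset(range(num_categories))
    return {frozenset([left, full - left]) for left in all_subset_splits(num_categories)}
```

For 4 categories, `all_subset_splits` yields 14 candidates while `distinct_partitions` yields 7, which is the factor-of-two redundancy the PR description refers to.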

[GitHub] spark pull request: [SPARK-10788][MLLIB][ML] Remove duplicate bins...

2015-11-04 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/9474#issuecomment-153889541 cc @jkbradley

[GitHub] spark pull request: [SPARK-10788][MLLIB][ML] Remove duplicate bins...

2015-11-05 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/9474#issuecomment-154259011 test this

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-19 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/9003#issuecomment-149260063 @mengxr I am working on it, and I have incorporated changes from the note you posted on the [Jira](https://issues.apache.org/jira/browse/SPARK-10641) - thanks

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-19 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/9003#issuecomment-149424324 @mengxr I think having `VarianceSamp` inherit from `Variance` will be a fine solution. I'm not clear on why it is better not to touch `InternalRow` in the subclasses

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-19 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42454065 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -842,3 +699,302 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-19 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42453520 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -842,3 +699,302 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-19 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42453557 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -842,3 +699,302 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-20 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42584406 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -842,3 +699,302 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-20 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42584417 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -842,3 +699,302 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-20 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42584440 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -842,3 +699,302 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-20 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42584447 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -842,3 +699,302 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-20 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42584819 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -857,3 +857,329 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-20 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/9003#issuecomment-149788583 @mengxr Addressed all comments and rearranged the class inheritances for case classes. I made another abstract class `SecondMoment` which all `Variance` classes inherit
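One way to picture the hierarchy described here, transcribed into Python, is a base class that accumulates the count, mean and second central moment, with subclasses that differ only in the final division. Only the names `SecondMoment`, `Variance` and `VarianceSamp` come from the comments. The incremental (Welford-style) update and everything else is an assumption, not the Catalyst aggregate code.

```python
from abc import ABC, abstractmethod


class SecondMoment(ABC):
    """Accumulates n, mean and M2 = sum((x - mean)^2) in a single pass."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        # Welford's update: multiply the deviation from the old mean
        # by the deviation from the new mean.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @abstractmethod
    def result(self):
        """Subclasses decide how M2 is normalized."""


class Variance(SecondMoment):
    def result(self):
        # Population variance: divide M2 by n.
        return self.m2 / self.n if self.n > 0 else float("nan")


class VarianceSamp(Variance):
    def result(self):
        # Sample variance: divide M2 by n - 1 (Bessel's correction).
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and M2 is 32, so `Variance` gives 4.0 and `VarianceSamp` gives 32/7. Dividing by n rather than n - 1 is the population-versus-sample distinction that the later `variance` fix in this thread is about.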

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-20 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42584878 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -857,3 +857,329 @@ object

[GitHub] spark pull request: [SPARK-9478] [ml] Add class weights to Random ...

2015-10-08 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9008#discussion_r41573997 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1211,4 +1212,34 @@ private[ml] object RandomForest extends Logging

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-06 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/9003#issuecomment-146005754 A few notes: * I wrote this implementation before stddev was merged, and hence I ended up with a slightly different implementation. The differences are mostly

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-06 Thread sethah
GitHub user sethah opened a pull request: https://github.com/apache/spark/pull/9003 [SPARK-10641][WIP][SQL] Add Skewness and Kurtosis Support Implementing skewness and kurtosis support based on following algorithm: https://en.wikipedia.org/wiki
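The Wikipedia link is truncated. Assuming it points to the single-pass higher-order moment updates on the "Algorithms for calculating variance" page, the accumulator below sketches that technique. It reports population (biased) skewness and excess kurtosis. Whether the PR exposes these or bias-corrected estimators is not stated in this snippet.

```python
import math


class MomentAccumulator:
    """Single-pass accumulator for n, mean and M2, M3, M4
    (sums of 2nd, 3rd and 4th powers of deviations from the mean)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0
        self.m3 = 0.0
        self.m4 = 0.0

    def update(self, x):
        n1 = self.n
        self.n += 1
        n = self.n
        delta = x - self.mean
        delta_n = delta / n
        delta_n2 = delta_n * delta_n
        term1 = delta * delta_n * n1
        self.mean += delta_n
        # Order matters: M4 uses the old M3 and M2, and M3 uses the old M2.
        self.m4 += (term1 * delta_n2 * (n * n - 3 * n + 3)
                    + 6 * delta_n2 * self.m2
                    - 4 * delta_n * self.m3)
        self.m3 += term1 * delta_n * (n - 2) - 3 * delta_n * self.m2
        self.m2 += term1

    def skewness(self):
        """Population skewness: sqrt(n) * M3 / M2^(3/2)."""
        return math.sqrt(self.n) * self.m3 / self.m2 ** 1.5

    def kurtosis(self):
        """Population excess kurtosis: n * M4 / M2^2 - 3 (0 for a normal)."""
        return self.n * self.m4 / (self.m2 * self.m2) - 3.0
```

For 1, 2, 3, 4, 5 the data are symmetric, so the skewness is 0. M2 = 10 and M4 = 34, which gives an excess kurtosis of 5 * 34 / 100 - 3 = -1.3. Because each update needs only the running n, mean, M2, M3 and M4, such states can in principle also be merged across partitions.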

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-06 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r41324423 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -88,6 +88,276 @@ case class Average(child

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-06 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r41324846 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -734,10 +750,30 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-06 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r41324803 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala --- @@ -221,4 +221,40 @@ class DataFrameAggregateSuite extends QueryTest

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-06 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r41325055 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -88,6 +88,276 @@ case class Average(child

[GitHub] spark pull request: [SPARK-9478] [ml] Add class weights to Random ...

2015-10-08 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9008#discussion_r41597118 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -87,8 +86,10 @@ private[ml] object RandomForest extends Logging

[GitHub] spark pull request: [SPARK-9478] [ml] Add class weights to Random ...

2015-10-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9008#discussion_r41648955 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1211,4 +1213,28 @@ private[ml] object RandomForest extends Logging

[GitHub] spark pull request: [SPARK-7685][ML] Apply weights to different sa...

2015-08-31 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/7884#issuecomment-136408857 Just a thought I had while looking over this PR: It makes more sense to me to refer to the sample weights as `sampleWeight` instead of `weight`, since the regression
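The naming concern makes sense because a regression's fitted coefficients are also commonly called weights, which is presumably where the truncated sentence was heading. The sketch below keeps the two apart: `sample_weights` scale each row's squared residual, while the intercept and slope are the model coefficients. It is a hypothetical one-feature weighted least-squares fit, not Spark's implementation.

```python
def weighted_simple_regression(xs, ys, sample_weights):
    """Fit y ~ intercept + slope * x by minimizing sum_i w_i * (y_i - fit_i)^2.

    sample_weights: one non-negative weight per row (NOT the model coefficients).
    Returns the model coefficients (intercept, slope).
    """
    total = sum(sample_weights)
    # Weighted means of x and y.
    x_bar = sum(w * x for w, x in zip(sample_weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(sample_weights, ys)) / total
    # Weighted covariance and variance terms of the normal equations.
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(sample_weights, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(sample_weights, xs))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope
```

A sample weight of 0 removes a row from the fit, and a sample weight of 2 behaves like duplicating that row.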

[GitHub] spark pull request: [SPARK-7685][ML] Apply weights to different sa...

2015-09-01 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/7884#issuecomment-136858157 Ah, I missed that, my apologies. That's a bit unfortunate!

[GitHub] spark pull request: [SPARK-9715][ML] Store numFeatures in all ML P...

2015-09-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8675#discussion_r39110298 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala --- @@ -166,6 +167,7 @@ private[ml] object

[GitHub] spark pull request: [SPARK-9715][ML] Store numFeatures in all ML P...

2015-09-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8675#discussion_r39110354 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala --- @@ -145,6 +145,10 @@ abstract class PredictionModel[FeaturesType, M <: PredictionMo

[GitHub] spark pull request: [SPARK-9715][ML] Store numFeatures in all ML P...

2015-09-09 Thread sethah
GitHub user sethah opened a pull request: https://github.com/apache/spark/pull/8675 [SPARK-9715][ML] Store numFeatures in all ML PredictionModel types All prediction models should store `numFeatures` indicating the number of features the model was trained on. Default value of -1
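Here is a small sketch of the convention described here, where -1 stands for "unknown". The class and its `check_input` method are hypothetical Python illustrations, not the spark.ml `PredictionModel` API.

```python
class PredictionModel:
    """Hypothetical model that remembers how many features it was trained on."""

    UNKNOWN = -1

    def __init__(self, num_features=UNKNOWN):
        # -1 means the training-time feature count was not recorded.
        self.num_features = num_features

    def check_input(self, features):
        """Reject feature vectors whose size disagrees with training, if known."""
        if self.num_features != self.UNKNOWN and len(features) != self.num_features:
            raise ValueError(
                f"expected {self.num_features} features, got {len(features)}")
```

A model trained on 3 features rejects a 1-element vector. A model left at the default of -1 cannot perform the check and accepts any size.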

[GitHub] spark pull request: [SPARK-9715][ML] Store numFeatures in all ML P...

2015-09-11 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8675#discussion_r39298668 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala --- @@ -166,6 +167,7 @@ private[ml] object

[GitHub] spark pull request: [SPARK-9715][ML] Store numFeatures in all ML P...

2015-09-15 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/8675#issuecomment-140489654 @feynmanliang One thing I'm curious about is whether this would still be a problem if all the constructors were private to ml. Right now, GBTs are the only one

[GitHub] spark pull request: [SPARK-9715][ML] Store numFeatures in all ML P...

2015-09-16 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/8675#issuecomment-140926689 @jkbradley Thanks for the feedback. I made the changes noted above and fixed the GBT constructors. Let me know if you see anything else.

[GitHub] spark pull request: [SPARK-9715][ML] Store numFeatures in all ML P...

2015-09-16 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8675#discussion_r39698374 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -175,6 +177,14 @@ final class GBTClassificationModel

[GitHub] spark pull request: [SPARK-9715][ML] Store numFeatures in all ML P...

2015-09-17 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8675#discussion_r39798759 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -167,7 +168,8 @@ object GBTClassifier { final class

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-10 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-139362995 @mengxr this has been idle for a while. Will you have a chance to review it?

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-24 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40339870 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -267,6 +268,26 @@ object MLUtils

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-24 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40339841 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSamplingUtils.scala --- @@ -216,6 +216,35 @@ private[spark] object

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-24 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40339926 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-24 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40339580 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-24 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40339724 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -80,7 +96,18 @@ class CrossValidator(override val uid: String) extends

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-24 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40339822 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-24 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40339748 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-24 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-142979645 @dusenberrymw Thanks for the feedback. I have addressed each of your comments. Let me know if you see anything else.

[GitHub] spark pull request: [SPARK-9715][ML] Store numFeatures in all ML P...

2015-09-22 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/8675#issuecomment-142365103 @jkbradley Fixed the merge conflicts.

[GitHub] spark pull request: [WIP] [SPARK-10413] [ML] Model should support ...

2015-09-24 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8883#discussion_r40399684 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala --- @@ -129,12 +129,12 @@ abstract class ClassificationModel[FeaturesType

[GitHub] spark pull request: [SPARK-10387][ML] Add code gen for gbt

2015-12-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9524#discussion_r46749563 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -189,6 +190,18 @@ final class GBTClassificationModel private[ml

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread sethah
GitHub user sethah opened a pull request: https://github.com/apache/spark/pull/10231 [SPARK-12182][ML] Distributed binning for trees in spark.ml This PR changes the `findSplits` method in spark.ml to perform split calculations on the workers. This PR is meant to copy [PR-8246
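Computing splits on the workers follows a map-then-merge shape. Each partition summarizes its own feature values, the summaries are combined, and thresholds are chosen from the merged summary. The sketch below simulates partitions as Python lists. `find_splits`, the exact value counting and the equal-frequency threshold rule are all assumptions made for illustration. They are not the logic of this PR or of spark.ml's `findSplits`.

```python
from collections import Counter
from functools import reduce


def summarize_partition(partition, num_features):
    """Worker-side step: count each value of every continuous feature."""
    counts = [Counter() for _ in range(num_features)]
    for row in partition:
        for j, value in enumerate(row):
            counts[j][value] += 1
    return counts


def merge_summaries(left, right):
    """Combine two per-feature summaries (associative, so merge order is free)."""
    return [a + b for a, b in zip(left, right)]


def choose_thresholds(value_counts, max_bins):
    """Pick at most max_bins - 1 thresholds at roughly equal-frequency cut points."""
    values = sorted(value_counts)
    if len(values) <= max_bins:
        # Few distinct values: every value except the largest can be a threshold.
        return values[:-1]
    stride = sum(value_counts.values()) / max_bins
    thresholds, cumulative, target = [], 0, stride
    for v in values[:-1]:
        cumulative += value_counts[v]
        if cumulative >= target:
            thresholds.append(v)
            # Skip any targets already passed so thresholds stay spaced out.
            while target <= cumulative:
                target += stride
    return thresholds


def find_splits(partitions, num_features, max_bins):
    """Hypothetical driver: summarize each partition, merge, then pick thresholds."""
    summaries = (summarize_partition(p, num_features) for p in partitions)
    merged = reduce(merge_summaries, summaries)
    return [choose_thresholds(counts, max_bins) for counts in merged]
```

The only data sent back from each partition is its per-feature summary. This mirrors the motivation of moving split computation off the driver, although a real implementation would likely summarize more compactly than exact counts.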

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163419168 @NathanHowell would you be able to review this? cc @jkbradley

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163430383 This JIRA was actually created as a blocker JIRA for [SPARK-12183](https://issues.apache.org/jira/browse/SPARK-12183) which is for removing the MLlib code entirely

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10231#discussion_r47165377 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -842,60 +842,63 @@ private[ml] object RandomForest extends Logging

[GitHub] spark pull request: [SPARK-8519] [ML] [MLlib] Blockify distance co...

2015-12-16 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10306#discussion_r47819551 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -250,114 +240,142 @@ class KMeans private

[GitHub] spark pull request: [SPARK-8519] [ML] [MLlib] Blockify distance co...

2015-12-16 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10306#discussion_r47820060 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -250,114 +240,142 @@ class KMeans private

[GitHub] spark pull request: [SPARK-12230][ML] WeightedLeastSquares.fit() s...

2015-12-14 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10274#discussion_r47530009 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -94,7 +110,7 @@ private[ml] class WeightedLeastSquares

[GitHub] spark pull request: [SPARK-12230][ML] WeightedLeastSquares.fit() s...

2015-12-14 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10274#discussion_r47530361 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -86,6 +86,22 @@ private[ml] class WeightedLeastSquares

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-11 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10231#discussion_r47382165 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -842,60 +842,63 @@ private[ml] object RandomForest extends Logging

[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

2015-12-17 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r47933061 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -275,6 +274,40 @@ class

[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

2015-12-17 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r47933490 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -275,6 +274,40 @@ class

[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

2015-12-17 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r47933679 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -100,6 +101,40 @@ class GBTClassifierSuite extends

[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

2015-12-17 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r47932753 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -275,6 +274,40 @@ class

[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

2015-12-17 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-165517939 I assume since this is a WIP you are still going to add test cases for the other predictors? Additionally, since ShortType and DecimalType also extend NumericType, I

[GitHub] spark pull request: [SPARK-12230][ML] WeightedLeastSquares.fit() s...

2015-12-14 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10274#discussion_r47530742 --- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/WeightedLeastSquaresSuite.scala --- @@ -43,6 +44,18 @@ class WeightedLeastSquaresSuite extends

[GitHub] spark pull request: [SPARK-12230][ML] WeightedLeastSquares.fit() s...

2015-12-14 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10274#discussion_r47534238 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -86,6 +86,22 @@ private[ml] class WeightedLeastSquares

[GitHub] spark pull request: [SPARK-7675][ML][PYSpark] sparkml params type ...

2016-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9581#discussion_r48774100 --- Diff: python/pyspark/ml/param/__init__.py --- @@ -247,7 +248,27 @@ def _set(self, **kwargs): Sets user-supplied params

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-01-04 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-168845446 @NathanHowell do you think you'll have any time to take a look at this?

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-01-06 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-169395890 @NathanHowell Thank you for reviewing!

[GitHub] spark pull request: [SPARK-9835] [ML] IterativelyReweightedLeastSq...

2016-01-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10639#discussion_r49144625 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/GLMFamilies.scala --- @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-9835] [ML] IterativelyReweightedLeastSq...

2016-01-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10639#discussion_r49144794 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-9835] [ML] IterativelyReweightedLeastSq...

2016-01-07 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/10639#issuecomment-169848523 @yanboliang Could you post a link to a reference paper? I find documentation on IRLS scattered, so it would be nice to have something concrete to point

[GitHub] spark pull request: [SPARK-9835] [ML] IterativelyReweightedLeastSq...

2016-01-08 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10639#discussion_r49226861 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/GLMFamilies.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation
