[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

2016-03-19 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/11621#discussion_r56587642 --- Diff: python/pyspark/ml/classification.py --- @@ -231,6 +232,210 @@ def intercept(self): """ return

[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

2016-03-19 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/11621#discussion_r56587676 --- Diff: python/pyspark/ml/wrapper.py --- @@ -223,3 +223,20 @@ def _call_java(self, name, *args): sc = SparkContext._active_spark_context

[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

2016-03-18 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/11621#discussion_r56587656 --- Diff: python/pyspark/ml/regression.py --- @@ -151,6 +151,209 @@ def intercept(self): """ return self._call_

[GitHub] spark pull request: [MINOR] Typo fixes

2016-03-18 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/11802#discussion_r56595822 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStreamCheckpointData.scala --- @@ -102,8 +102,8 @@ class DStreamCheckpointData[T

[GitHub] spark pull request: [MINOR] Typo fixes

2016-03-18 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/11802#discussion_r56595611 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStreamCheckpointData.scala --- @@ -45,7 +45,7 @@ class DStreamCheckpointData[T

[GitHub] spark pull request: Changes to support KMeans with large feature s...

2016-03-10 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10739#issuecomment-195090771 @levin-royl it looks like @hhbyyh has a branch with some code that tackles the same issue (see the jira discussion for more information); you may want to coordinate

[GitHub] spark pull request: [SPARK-13600] [MLlib] Use approxQuantile from ...

2016-03-10 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/11553#issuecomment-195054258 @oliverpierson I was thinking that `relativeError` should be automatically selected (and not exposed as a param). However, I am fine with exposing it for the sake

[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...

2016-03-09 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/11549#issuecomment-194528019 @hhbyyh thanks! I just have some small comments; my main comment being in the jira ticket regarding the choice of options 1/2/3. --- If your project is set up

[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...

2016-03-09 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/11549#discussion_r55597240 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -569,9 +572,46 @@ class

[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...

2016-03-09 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/11549#discussion_r55595546 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -569,9 +572,46 @@ class

[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...

2016-03-09 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/11549#discussion_r55592623 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/SparkRWrappers.scala --- @@ -17,15 +17,41 @@ package org.apache.spark.ml.api.r

[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...

2016-03-09 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/11549#discussion_r55591400 --- Diff: R/pkg/R/mllib.R --- @@ -51,13 +45,12 @@ setClass("PipelineModel", representation(model = "jobj")) #' summary(model)

[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...

2016-03-09 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/11549#discussion_r55563547 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/SparkRWrappers.scala --- @@ -17,15 +17,41 @@ package org.apache.spark.ml.api.r

[GitHub] spark pull request: [SPARK-13600] [MLlib] Incorrect number of buck...

2016-03-08 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/11553#issuecomment-193889292 If you use 0.0 for the relative error, it is going to return the exact quantiles. However, in this case, there will be no data compression and the algorithm
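To make the role of the relative error concrete: Spark's `approxQuantile` documentation describes the guarantee as a rank window, namely that for N rows, probability p and error err, the returned value x satisfies floor((p - err) * N) <= rank(x) <= ceil((p + err) * N). The pure-Python sketch below only checks that window on a small in-memory list; it is not Spark's sketching algorithm, and the helper names (`rank_window`, `exact_quantile`, `within_error_bound`) are hypothetical. With err = 0.0 the window collapses to a single rank, which is why the result becomes the exact quantile.

```python
import math
from typing import Sequence, Tuple


def rank_window(n: int, p: float, err: float) -> Tuple[int, int]:
    """Allowed 1-based rank window floor((p - err) * n) .. ceil((p + err) * n)."""
    return math.floor((p - err) * n), math.ceil((p + err) * n)


def exact_quantile(data: Sequence[float], p: float) -> float:
    """Exact quantile: sorted element at 1-based rank ceil(p * n), clamped to [1, n]."""
    ordered = sorted(data)
    n = len(ordered)
    rank = min(max(math.ceil(p * n), 1), n)
    return ordered[rank - 1]


def within_error_bound(data: Sequence[float], x: float, p: float, err: float) -> bool:
    """True if some occurrence of x in data has a rank inside the allowed window."""
    ordered = sorted(data)
    lo, hi = rank_window(len(ordered), p, err)
    # x may appear several times, so collect every 1-based rank it occupies.
    ranks = [i + 1 for i, v in enumerate(ordered) if v == x]
    return any(lo <= r <= hi for r in ranks)


if __name__ == "__main__":
    data = list(range(1, 101))
    print(exact_quantile(data, 0.5))                # 50
    print(within_error_bound(data, 47, 0.5, 0.05))  # True: inside the 5% window
    print(within_error_bound(data, 47, 0.5, 0.0))   # False: only rank 50 is allowed
```

A larger err widens the window, which is what lets a sketch keep fewer samples; err = 0.0 leaves no slack, so no compression is possible.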

[GitHub] spark pull request: [SPARK-11535][ML] handling empty string in Str...

2016-03-07 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/9522#issuecomment-193542005 @pravingadakh sorry for the delay. Would you mind resolving the conflicts?

[GitHub] spark pull request: [SPARK-6761][SQL][ML] Fixes to API and documen...

2016-02-23 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/11332#issuecomment-187964822 LGTM, thanks @mengxr

[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-23 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/6042#discussion_r53821901 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/stat/ApproxQuantileSuite.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-23 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/6042#discussion_r53821883 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala --- @@ -269,3 +291,41 @@ class DataFrameStatSuite extends QueryTest

[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-23 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/6042#discussion_r53821857 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala --- @@ -27,6 +30,312 @@ import

[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-23 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/6042#discussion_r53821725 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala --- @@ -27,6 +30,312 @@ import

[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-23 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/6042#discussion_r53821685 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -37,6 +37,16 @@ import org.apache.spark.util.sketch.{BloomFilter

[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-23 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/6042#discussion_r53821749 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala --- @@ -27,6 +30,312 @@ import

[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-23 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/6042#discussion_r53821735 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala --- @@ -27,6 +30,312 @@ import

[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-23 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/6042#discussion_r53821700 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala --- @@ -27,6 +30,312 @@ import

[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-23 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/6042#discussion_r53821676 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -37,6 +37,16 @@ import org.apache.spark.util.sketch.{BloomFilter

[GitHub] spark pull request: [SPARK-6761][SQL][ML] Fixes to API and documen...

2016-02-23 Thread thunterdb
GitHub user thunterdb opened a pull request: https://github.com/apache/spark/pull/11325 [SPARK-6761][SQL][ML] Fixes to API and documentation of approximate quantiles ## What changes were proposed in this pull request? This PR addresses the remaining comments from @mengxr

[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-23 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/6042#issuecomment-187774708 @mengxr thanks for the review, will do in another PR.

[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-11 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/6042#issuecomment-182958229 @viirya sorry I missed your email, I will look at your PR today.

[GitHub] spark pull request: [SPARK-13146][SQL] Management API for continuo...

2016-02-08 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/11030#discussion_r52201810 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ContinuousQuery.scala --- @@ -17,11 +17,47 @@ package org.apache.spark.sql

[GitHub] spark pull request: [SPARK-13146][SQL] Management API for continuo...

2016-02-08 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/11030#discussion_r52202089 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ContinuousQuery.scala --- @@ -17,11 +17,47 @@ package org.apache.spark.sql

[GitHub] spark pull request: [SPARK-11515][ML] QuantileDiscretizer should t...

2016-01-19 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/9535#discussion_r50163562 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala --- @@ -23,8 +23,8 @@ import org.apache.spark.Logging import

[GitHub] spark pull request: [SPARK-12765] [ML] [CountVectorizer]fix CountV...

2016-01-14 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10720#issuecomment-171778898 @sloth2012 this looks good to me, thanks for the fix. cc @jkbradley

[GitHub] spark pull request: [SPARK-12026] [MLlib] ChiSqTest gets slower an...

2016-01-12 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10146#issuecomment-170949938 @hhbyyh yes, option 3 sounds good. A caveat, though, about the numbers you posted: micro benchmarks on the JVM are very hard to get right, and a simple loop

[GitHub] spark pull request: [SPARK-11938][ML] Expose numFeatures in all ML...

2016-01-07 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/9936#discussion_r49143685 --- Diff: python/pyspark/ml/tests.py --- @@ -371,6 +378,103 @@ def test_fit_maximize_metric(self): self.assertEqual(1.0, bestModelMetric, "

[GitHub] spark pull request: [SPARK-11938][ML] Expose numFeatures in all ML...

2016-01-07 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/9936#discussion_r49143733 --- Diff: python/pyspark/ml/tests.py --- @@ -371,6 +378,103 @@ def test_fit_maximize_metric(self): self.assertEqual(1.0, bestModelMetric, "

[GitHub] spark pull request: [SPARK-11923][ML] Python API for ml.feature.Ch...

2016-01-07 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10186#issuecomment-169840549 LGTM cc @jkbradley

[GitHub] spark pull request: [SPARK-11944][PYSPARK][MLLIB] python mllib.clu...

2016-01-07 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10150#discussion_r49140159 --- Diff: python/pyspark/mllib/clustering.py --- @@ -38,13 +38,116 @@ from pyspark.mllib.util import Saveable, Loader, inherit_doc, JavaLoader

[GitHub] spark pull request: [SPARK-11944][PYSPARK][MLLIB] python mllib.clu...

2016-01-07 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10150#discussion_r49140117 --- Diff: python/pyspark/mllib/clustering.py --- @@ -38,13 +38,116 @@ from pyspark.mllib.util import Saveable, Loader, inherit_doc, JavaLoader

[GitHub] spark pull request: [SPARK-12618] [CORE] [STREAMING] [SQL] Clean u...

2016-01-07 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10570#discussion_r49139498 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -57,8 +57,8 @@ trait

[GitHub] spark pull request: [SPARK-12632][Python][Make Parameter Descripti...

2016-01-07 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10602#discussion_r49140273 --- Diff: python/pyspark/mllib/fpm.py --- @@ -130,15 +133,21 @@ def train(cls, data, minSupport=0.1, maxPatternLength=10, maxLocalProjDBSize=320

[GitHub] spark pull request: [SPARK-12632][Python][Make Parameter Descripti...

2016-01-07 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10602#discussion_r49140295 --- Diff: python/pyspark/mllib/recommendation.py --- @@ -239,6 +239,17 @@ def train(cls, ratings, rank, iterations=5, lambda_=0.01, blocks=-1, nonnegative

[GitHub] spark pull request: [SPARK-12618] [CORE] [STREAMING] [SQL] Clean u...

2016-01-07 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10570#issuecomment-169828910 @srowen I am confused, did you send me this message for this PR? It somehow does not show up in github, and I do not see the jenkins run that failed

[GitHub] spark pull request: [SPARK-12634][Python][MLlib][DOC] Update param...

2016-01-07 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10601#issuecomment-169830444 @vijaykiran thanks for the style fixes. cc @jkbradley

[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

2016-01-06 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10472#discussion_r48993085 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala --- @@ -79,13 +79,14 @@ class

[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

2016-01-06 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10472#discussion_r48993376 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/SchemaUtils.scala --- @@ -44,6 +44,23 @@ private[spark] object SchemaUtils

[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

2016-01-06 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10472#issuecomment-169418377 @BenFradet thanks! Just a small comment.

[GitHub] spark pull request: [SPARK-12346] [ML] Missing attribute names in ...

2016-01-06 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10323#issuecomment-169425244 @ericl this looks great, thanks!

[GitHub] spark pull request: [SPARK-12618] [CORE] [STREAMING] [SQL] Clean u...

2016-01-06 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10570#discussion_r48983200 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -57,8 +57,8 @@ trait

[GitHub] spark pull request: [SPARK-12618] [CORE] [STREAMING] [SQL] Clean u...

2016-01-06 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10570#discussion_r48983231 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -275,8 +275,8 @@ trait

[GitHub] spark pull request: [SPARK-12618] [CORE] [STREAMING] [SQL] Clean u...

2016-01-06 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10570#issuecomment-169396527 @srowen thanks a lot for the cleanup! Just two comments.

[GitHub] spark pull request: [SPARK-12618] [CORE] [STREAMING] [SQL] Clean u...

2016-01-06 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10570#discussion_r48983461 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala --- @@ -206,7 +206,7 @@ abstract class QueryTest extends PlanTest { val

[GitHub] spark pull request: [SPARK-12618] [CORE] [STREAMING] [SQL] Clean u...

2016-01-06 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10570#discussion_r48983495 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala --- @@ -231,7 +231,7 @@ abstract class QueryTest extends PlanTest { val

[GitHub] spark pull request: [SPARK-12631] [PYSPARK] [DOC] PySpark clusteri...

2016-01-06 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10610#issuecomment-169506325 @BryanCutler it looks great, thanks! One overall comment about the seeds: it is unclear what `None` means, and it has a different behavior between `spark.ml

[GitHub] spark pull request: [SPARK-12631] [PYSPARK] [DOC] PySpark clusteri...

2016-01-06 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10610#discussion_r49028171 --- Diff: python/pyspark/mllib/clustering.py --- @@ -774,17 +843,32 @@ def train(cls, rdd, k=10, maxIterations=20, docConcentration=-1.0

[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

2016-01-06 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10472#issuecomment-169483754 cc @jkbradley

[GitHub] spark pull request: [SPARK-12663] [MLlib] More informative error m...

2016-01-06 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10611#issuecomment-169483568 cc @jkbradley

[GitHub] spark pull request: [SPARK-11520][ML] RegressionMetrics should sup...

2016-01-06 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/9907#discussion_r49002581 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/RegressionMetricsSuite.scala --- @@ -109,4 +109,55 @@ class RegressionMetricsSuite

[GitHub] spark pull request: [SPARK-11520][ML] RegressionMetrics should sup...

2016-01-06 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/9907#issuecomment-169444237 @Lewuathe thanks for your patch. I think it will require more work in `RegressionMetrics` to fully implement weighted metrics. We need to do the following changes
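As a rough illustration of what "fully implementing weighted metrics" involves, the sketch below computes weighted versions of the usual regression metrics in plain Python: every per-row average becomes a weight-normalized sum (for example, weighted MSE = sum(w_i * (y_i - p_i)^2) / sum(w_i)), and R² uses the weighted mean of the labels in its total sum of squares. This is an assumption-laden sketch, not the `RegressionMetrics` API or the design that PR settled on; the function name `weighted_regression_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def weighted_regression_metrics(pred: Sequence[float], label: Sequence[float],
                                weight: Sequence[float]) -> Dict[str, float]:
    """Hypothetical weighted MSE, RMSE, MAE and R^2 over (prediction, label, weight) rows."""
    if not (len(pred) == len(label) == len(weight)):
        raise ValueError("pred, label and weight must have the same length")
    w_sum = sum(weight)
    if w_sum <= 0:
        raise ValueError("total weight must be positive")
    # Weighted mean of the labels, used for the total sum of squares.
    label_mean = sum(w * y for w, y in zip(weight, label)) / w_sum
    # Weighted residual and total sums of squares.
    ss_res = sum(w * (y - p) ** 2 for w, y, p in zip(weight, label, pred))
    ss_tot = sum(w * (y - label_mean) ** 2 for w, y in zip(weight, label))
    mse = ss_res / w_sum
    mae = sum(w * abs(y - p) for w, y, p in zip(weight, label, pred)) / w_sum
    return {
        "mse": mse,
        "rmse": mse ** 0.5,
        "mae": mae,
        # R^2 is undefined when all labels are equal (zero total variance).
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }
```

A convenient sanity check for any weighted implementation: an integer weight should behave exactly like duplicating that row, and scaling all weights by a constant should leave every metric unchanged.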

[GitHub] spark pull request: [SPARK-12618] [CORE] [STREAMING] [SQL] Clean u...

2016-01-06 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10570#issuecomment-169449477 cc @jkbradley @srowen I have some concerns about the number of files being touched by this PR, it may be hard to merge without an ever-present conflict

[GitHub] spark pull request: [SPARK-11520][ML] RegressionMetrics should sup...

2016-01-06 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/9907#discussion_r49000659 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -39,16 +51,17 @@ class RegressionMetrics @Since("

[GitHub] spark pull request: [SPARK-11520][ML] RegressionMetrics should sup...

2016-01-06 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/9907#discussion_r49002633 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/RegressionMetricsSuite.scala --- @@ -109,4 +109,55 @@ class RegressionMetricsSuite

[GitHub] spark pull request: [SPARK-12006][ML][PYTHON] Fix GMM failure if i...

2016-01-05 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/9986#discussion_r48880089 --- Diff: python/pyspark/mllib/clustering.py --- @@ -346,7 +346,7 @@ def train(cls, rdd, k, convergenceTol=1e-3, maxIterations=100, seed=None, initia

[GitHub] spark pull request: [SPARK-12663] [MLlib] More informative error m...

2016-01-05 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10611#discussion_r48919311 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -86,7 +86,7 @@ object MLUtils { val indicesLength

[GitHub] spark pull request: [SPARK-12663] [MLlib] More informative error m...

2016-01-05 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10611#issuecomment-169190994 @robert-dodier thanks for your PR. Can you please fix the style issue?

[GitHub] spark pull request: [SPARK-11515][ML] QuantileDiscretizer should t...

2016-01-04 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/9535#issuecomment-168861235 @yu-iskw thanks for fixing this issue. It looks good to me, but can you please resolve the conflicts?

[GitHub] spark pull request: [SPARK-12041] [ML] [PySpark] Add columnSimilar...

2016-01-04 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10158#issuecomment-168863404 @vectorijk thanks for the PR, it looks good to me. cc @jkbradley

[GitHub] spark pull request: [SPARK-12026] [MLlib] ChiSqTest gets slower an...

2016-01-04 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10146#issuecomment-168867750 @hhbyyh thanks for the fix; I just have one small comment.

[GitHub] spark pull request: [SPARK-11815] [ML] [PySpark] PySpark DecisionT...

2016-01-04 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/9807#issuecomment-168862678 This looks good to me. cc @jkbradley

[GitHub] spark pull request: [SPARK-12026] [MLlib] ChiSqTest gets slower an...

2016-01-04 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10146#discussion_r48805286 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/ChiSqTest.scala --- @@ -109,7 +109,9 @@ private[stat] object ChiSqTest extends Logging

[GitHub] spark pull request: [SPARK-12450][MLLib] Un-persist broadcasted va...

2016-01-04 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10415#issuecomment-168859502 @rnowling thanks for fixing this, I just have some style comments.

[GitHub] spark pull request: [SPARK-12450][MLLib] Un-persist broadcasted va...

2016-01-04 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10415#discussion_r48802820 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -353,6 +356,7 @@ class KMeans private ( ((r

[GitHub] spark pull request: [SPARK-12450][MLLib] Un-persist broadcasted va...

2016-01-04 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10415#discussion_r48802864 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -324,6 +326,7 @@ class KMeans private ( s0

[GitHub] spark pull request: [SPARK-11608][MLLIB][DOC] Added migration guid...

2015-12-16 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10235#issuecomment-165206222 This looks good to me.

[GitHub] spark pull request: [SPARK-12324][MLLIB][DOC] Fixes the sidebar in...

2015-12-15 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10297#discussion_r47720994 --- Diff: docs/_layouts/global.html --- @@ -128,19 +128,31 @@ {% if page.url contains "/ml" %} {% i

[GitHub] spark pull request: [SPARK-12324][MLLIB][DOC] Fixes the sidebar in...

2015-12-15 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10297#discussion_r47721217 --- Diff: docs/css/main.css --- @@ -171,24 +202,104 @@ a.anchorjs-link:hover { text-decoration: none; } * The left navigation bar

[GitHub] spark pull request: [SPARK-12324][MLLIB][DOC] Fixes the sidebar in...

2015-12-15 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10297#discussion_r47721302 --- Diff: docs/css/main.css --- @@ -39,13 +39,26 @@ margin-left: 10px; } +/* body .container-wrapper { position: absolute

[GitHub] spark pull request: [SPARK-12324][MLLIB][DOC] Fixes the sidebar in...

2015-12-15 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10297#discussion_r47721444 --- Diff: docs/css/main.css --- @@ -171,24 +202,104 @@ a.anchorjs-link:hover { text-decoration: none; } * The left navigation bar

[GitHub] spark pull request: [SPARK-12324][MLLIB][DOC] Fixes the sidebar in...

2015-12-15 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10297#discussion_r47721419 --- Diff: docs/css/main.css --- @@ -171,24 +202,104 @@ a.anchorjs-link:hover { text-decoration: none; } * The left navigation bar

[GitHub] spark pull request: [SPARK-12324][MLLIB][DOC] Fixes the sidebar in...

2015-12-15 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10297#discussion_r47727256 --- Diff: docs/css/main.css --- @@ -101,6 +98,26 @@ a:hover code { max-width: 914px; } +.content { + z-index: 1; + position

[GitHub] spark pull request: [SPARK-12324][MLLIB][DOC] Fixes the sidebar in...

2015-12-15 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10297#discussion_r47727246 --- Diff: docs/css/main.css --- @@ -101,6 +98,26 @@ a:hover code { max-width: 914px; } +.content { + z-index: 1; + position

[GitHub] spark pull request: [SPARK-12324][MLLIB][DOC] Fixes the sidebar in...

2015-12-14 Thread thunterdb
GitHub user thunterdb opened a pull request: https://github.com/apache/spark/pull/10297 [SPARK-12324][MLLIB][DOC] Fixes the sidebar in the ML documentation This fixes the sidebar, using a pure CSS mechanism to hide it when the browser's viewport is too narrow. Credit goes

[GitHub] spark pull request: [SPARK-12324][MLLIB][DOC] Fixes the sidebar in...

2015-12-14 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10297#issuecomment-164556778 @jkbradley can you take a look at this fix?

[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...

2015-12-10 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163723975 @jkbradley done: https://cloud.githubusercontent.com/assets/7594753/11725710/e9949f04-9f2f-11e5-8ba5-7f955e8b41fa.png

[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...

2015-12-10 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163713115 @jkbradley done with changes, let me know what you think.

[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...

2015-12-10 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10234#discussion_r47262597 --- Diff: docs/mllib-feature-extraction.md --- @@ -1,7 +1,7 @@ --- layout: global -title: Feature Extraction and Transformation - MLlib

[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...

2015-12-10 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10234#discussion_r47262584 --- Diff: docs/ml-classification-regression.md --- @@ -27,10 +27,10 @@ displayTitle: Classification and regression in spark.ml * This will become

[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...

2015-12-09 Thread thunterdb
GitHub user thunterdb opened a pull request: https://github.com/apache/spark/pull/10234 [SPARK-12212][ML][DOC] Clarifies the difference between spark.ml, spark.mllib and mllib in the documentation. Replaces a number of occurrences of `MLlib` in the documentation that were meant

[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...

2015-12-09 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163455399 @jkbradley I agree with you. Note that when you use the doc, the current section appears in bold in the side menu (and the submenu gets expanded). This is why I did

[GitHub] spark pull request: [SPARK-8517][MLLIB][DOC] Reorganizes the spark...

2015-12-08 Thread thunterdb
GitHub user thunterdb opened a pull request: https://github.com/apache/spark/pull/10207 [SPARK-8517][MLLIB][DOC] Reorganizes the spark.ml user guide This PR moves pieces of the spark.ml user guide to reflect suggestions in SPARK-8517. It does not introduce new content, as requested

[GitHub] spark pull request: [SPARK-8517][ML][DOC] Reorganizes the spark.ml...

2015-12-08 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10207#issuecomment-163009350 @jkbradley no, this PR just moves the text around, with little modification. More substantial changes will be done later.

[GitHub] spark pull request: [SPARK-8517][ML][DOC] Reorganizes the spark.ml...

2015-12-08 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10207#discussion_r47043656 --- Diff: docs/ml-classification-regression.md --- @@ -0,0 +1,762 @@ +--- +layout: global +title: Classification and regression - spark.ml

[GitHub] spark pull request: [SPARK-8517][ML][DOC] Reorganizes the spark.ml...

2015-12-08 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10207#discussion_r47018195 --- Diff: docs/ml-classification-regression.md --- @@ -0,0 +1,733 @@ +--- +layout: global +title: Classification and regression - spark.ml

[GitHub] spark pull request: [SPARK-8517][ML][DOC] Reorganizes the spark.ml...

2015-12-08 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10207#discussion_r47018213 --- Diff: docs/_data/menu-ml.yaml --- @@ -1,10 +1,10 @@ -- text: Feature extraction, transformation, and selection +- text: "Overview: estim

[GitHub] spark pull request: [SPARK-8517][ML][DOC] Reorganizes the spark.ml...

2015-12-08 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10207#discussion_r47017809 --- Diff: docs/mllib-guide.md --- @@ -66,15 +66,18 @@ We list major functionality from both below, with links to detailed guides. # spark.ml

[GitHub] spark pull request: [SPARK-12000] Fixes compilation issues with ja...

2015-11-30 Thread thunterdb
GitHub user thunterdb opened a pull request: https://github.com/apache/spark/pull/10048 [SPARK-12000] Fixes compilation issues with java8 and sbt. Currently, trying to publish spark locally fails when java version is >= 1.8.0: - a javadoc option has been removed - java

[GitHub] spark pull request: [SPARK-12000] Fixes compilation issues with ja...

2015-11-30 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/10048#discussion_r46210267 --- Diff: project/SparkBuild.scala --- @@ -155,13 +155,15 @@ object SparkBuild extends PomBuild { if (major.toInt >= 1 && minor.toIn

[GitHub] spark pull request: [SPARK-11835] Adds a sidebar menu to MLlib's d...

2015-11-20 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/9826#issuecomment-158541369 @mengxr comment addressed

[GitHub] spark pull request: [SPARK-11835] Adds a sidebar menu to MLlib's d...

2015-11-19 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/9826#issuecomment-158169615 @mengxr with the fixes, the top menu breaks before the side menu :-) ![screen shot 2015-11-19 at 11 30 13 am](https://cloud.githubusercontent.com/assets/7594753

[GitHub] spark pull request: [SPARK-11835] Adds a sidebar menu to MLlib's d...

2015-11-19 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/9826#issuecomment-158169959 This is about as much as I can do with my very limited knowledge of CSS, so additional fixes can be done in a separate PR.

[GitHub] spark pull request: [SPARK-11835] Adds a sidebar menu to MLlib's d...

2015-11-19 Thread thunterdb
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/9826#discussion_r45381998 --- Diff: docs/_includes/nav-left-wrapper-ml.html --- @@ -0,0 +1,6 @@ + +ML --- End diff -- sure