[GitHub] spark issue #21524: [SPARK-24212][ML][doc] Add the example and user guide fo...

2018-10-10 Thread tengpeng
Github user tengpeng commented on the issue: https://github.com/apache/spark/pull/21524 Let's close it for now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21524: [SPARK-24212][ML][doc] Add the example and user g...

2018-10-10 Thread tengpeng
Github user tengpeng closed the pull request at: https://github.com/apache/spark/pull/21524

[GitHub] spark issue #21524: [SPARK-24212][ML][doc] Add the example and user guide fo...

2018-09-23 Thread tengpeng
Github user tengpeng commented on the issue: https://github.com/apache/spark/pull/21524 Yes, but maybe not soon. Is there a "deadline" (e.g. branch cut) coming? On Tue, Sep 18, 2018 at 4:23 PM Sean Owen wrote: > @tengpeng <https://github.com

[GitHub] spark issue #21524: [SPARK-24212][ML][doc] Add the example and user guide fo...

2018-09-07 Thread tengpeng
Github user tengpeng commented on the issue: https://github.com/apache/spark/pull/21524 Right, the doc is mostly copied with minor modifications. Let me update my PR for the Python API this weekend. On Fri, Sep 7, 2018 at 12:28 PM Sean Owen wrote: > CC @tengp

[GitHub] spark issue #21861: [SPARK-24907][WIP] Migrate JDBC DataSource to JDBCDataSo...

2018-07-24 Thread tengpeng
Github user tengpeng commented on the issue: https://github.com/apache/spark/pull/21861 @gatorsmile Got you. I will update the implementation after DataSourceV2 API changes.

[GitHub] spark pull request #21861: [SPARK-24907][WIP] Migrate JDBC DataSource to JDB...

2018-07-24 Thread tengpeng
GitHub user tengpeng opened a pull request: https://github.com/apache/spark/pull/21861 [SPARK-24907][WIP] Migrate JDBC DataSource to JDBCDataSourceV2 Read using DataSourceV2 API ## What changes were proposed in this pull request? (After the update of DataSourceV2 API

[GitHub] spark issue #21123: [SPARK-24045][SQL]Create base class for file data source...

2018-07-22 Thread tengpeng
Github user tengpeng commented on the issue: https://github.com/apache/spark/pull/21123 Any updates on this PR? Yes, I know this is a temporary hack, but without it being merged, it is not possible to migrate other data sources to V2 (experimentally

[GitHub] spark pull request #20933: [SPARK-23817][SQL]Migrate ORC file format read pa...

2018-07-07 Thread tengpeng
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/20933#discussion_r200819285 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -241,39 +240,47 @@ final class DataFrameWriter[T] private[sql](ds

[GitHub] spark pull request #20933: [SPARK-23817][SQL]Migrate ORC file format read pa...

2018-06-27 Thread tengpeng
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/20933#discussion_r198456106 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FilePartitionUtil.scala --- @@ -0,0 +1,225 @@ +/* + * Licensed

[GitHub] spark issue #21524: [SPARK-24212][ML][doc] Add the example and user guide fo...

2018-06-22 Thread tengpeng
Github user tengpeng commented on the issue: https://github.com/apache/spark/pull/21524 Gentle ping @jkbradley @WeichenXu123 @mengxr Thanks!

[GitHub] spark pull request #21524: [SPARK-24212][ML][doc] Add the example and user g...

2018-06-10 Thread tengpeng
GitHub user tengpeng opened a pull request: https://github.com/apache/spark/pull/21524 [SPARK-24212][ML][doc] Add the example and user guide for ML PrefixSpan ## What changes were proposed in this pull request? There is no example or user guide for ML PrefixSpan

[GitHub] spark pull request #21522: [SPARK-24467][ML] VectorAssemblerEstimator

2018-06-09 Thread tengpeng
GitHub user tengpeng opened a pull request: https://github.com/apache/spark/pull/21522 [SPARK-24467][ML] VectorAssemblerEstimator Background: See the JIRA ticket. This PR is in its very early stage, and hopefully it would help us decide what's the right direction

[GitHub] spark pull request #21125: [Spark-24024][ML] Fix poisson deviance calculatio...

2018-04-23 Thread tengpeng
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/21125#discussion_r183386685 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -495,8 +495,8 @@ class

[GitHub] spark pull request #21125: [Spark-24024][ML] Fix poisson deviance calculatio...

2018-04-23 Thread tengpeng
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/21125#discussion_r183386571 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -782,8 +782,12 @@ object

[GitHub] spark pull request #21125: [Spark-24024][ML] Fix poisson deviance calculatio...

2018-04-23 Thread tengpeng
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/21125#discussion_r183386022 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -495,8 +495,8 @@ class

[GitHub] spark pull request #21125: [Spark-24024] Fix poisson deviance calculations i...

2018-04-22 Thread tengpeng
GitHub user tengpeng opened a pull request: https://github.com/apache/spark/pull/21125 [Spark-24024] Fix poisson deviance calculations in GLM to handle y = 0 ## What changes were proposed in this pull request? It is reported by Spark users that the deviance calculations
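The y = 0 case mentioned in this pull request title can be illustrated with the textbook Poisson unit deviance, 2 * (y * log(y / mu) - (y - mu)), where the y * log(y / mu) term is taken as 0 when y = 0 (its limit as y approaches 0). The sketch below is plain Python under that assumption and is not the code from the pull request; the function name `poisson_unit_deviance` is hypothetical.

```python
import math


def poisson_unit_deviance(y: float, mu: float) -> float:
    """Textbook Poisson unit deviance for one observation.

    Hypothetical helper for illustration only; not Spark's implementation.
    y is the observed count, mu the fitted mean (must be positive).
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    if y < 0:
        raise ValueError("y must be non-negative")
    # y * log(y / mu) is undefined at y = 0 if evaluated naively
    # (log(0)); its limit as y -> 0 is 0, so use that value directly.
    log_term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (log_term - (y - mu))
```

With this convention an observation of y = 0 contributes 2 * mu to the deviance instead of producing NaN or raising a math error.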

[GitHub] spark issue #20632: [SPARK-3159][ML] Add decision tree pruning

2018-04-09 Thread tengpeng
Github user tengpeng commented on the issue: https://github.com/apache/spark/pull/20632 (A note to me & future readers) It seems this is actually for [SPARK-3159] Check for reducible DecisionTree, rather than SPARK-3155 Support DecisionTree pruning. The title is confusing t

[GitHub] spark issue #20842: [SPARK-23162][PySpark][ML] Add r2adj into Python API in ...

2018-03-24 Thread tengpeng
Github user tengpeng commented on the issue: https://github.com/apache/spark/pull/20842 Looks good! Thanks!

[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...

2018-03-19 Thread tengpeng
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/19666#discussion_r175646373 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/DecisionTreeMetadata.scala --- @@ -152,15 +152,13 @@ private[spark] object

[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...

2018-03-19 Thread tengpeng
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/19666#discussion_r175646335 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/DecisionTreeMetadata.scala --- @@ -152,15 +152,13 @@ private[spark] object

[GitHub] spark pull request #20732: [SPARK-23578][ML] Add multicolumn support for Bin...

2018-03-04 Thread tengpeng
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/20732#discussion_r172059794 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Binarizer.scala --- @@ -45,66 +47,117 @@ final class Binarizer @Since("1.4.0") (@Si

[GitHub] spark pull request #20732: [SPARK-23578][ML] Add multicolumn support for Bin...

2018-03-04 Thread tengpeng
GitHub user tengpeng opened a pull request: https://github.com/apache/spark/pull/20732 [SPARK-23578][ML] Add multicolumn support for Binarizer ## What changes were proposed in this pull request? [Spark-20542] added an API to Bucketizer that can bin multiple columns

[GitHub] spark pull request #20729: [SPARK-23578][ML]Add multicolumn support for Bina...

2018-03-03 Thread tengpeng
Github user tengpeng closed the pull request at: https://github.com/apache/spark/pull/20729

[GitHub] spark pull request #20729: [SPARK-23578][ML]Add multicolumn support for Bina...

2018-03-03 Thread tengpeng
GitHub user tengpeng opened a pull request: https://github.com/apache/spark/pull/20729 [SPARK-23578][ML]Add multicolumn support for Binarizer [Spark-20542] added an API to Bucketizer that can bin multiple columns. Based on this change, multicolumn support is added for Binarizer

[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-11-23 Thread tengpeng
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r152891005 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -108,26 +164,53 @@ final class Bucketizer @Since("1.4.0"

[GitHub] spark issue #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMetrics

2017-11-14 Thread tengpeng
Github user tengpeng commented on the issue: https://github.com/apache/spark/pull/19638 Not sure what's happening here. The test on my local machine passed: Running Apache RAT checks

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-11 Thread tengpeng
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r150394001 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -764,13 +764,17 @@ class LinearRegressionSuite

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-11 Thread tengpeng
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r150393944 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -764,13 +764,17 @@ class LinearRegressionSuite

[GitHub] spark pull request #19660: [SPARK-18755][WIP][ML] Add Randomized Grid Search...

2017-11-10 Thread tengpeng
Github user tengpeng closed the pull request at: https://github.com/apache/spark/pull/19660

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-07 Thread tengpeng
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r149560345 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -764,13 +764,17 @@ class LinearRegressionSuite

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-07 Thread tengpeng
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r149558607 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -764,13 +764,17 @@ class LinearRegressionSuite

[GitHub] spark issue #19660: [SPARK-18755][WIP][ML] Add Randomized Grid Search to Spa...

2017-11-06 Thread tengpeng
Github user tengpeng commented on the issue: https://github.com/apache/spark/pull/19660 @srowen You are absolutely right. That's what 2 aims to accomplish. I believe implementing 1 & 2 is the goal, like what they did in sklearn. Need some discussions on de

[GitHub] spark pull request #19660: [SPARK-18755][WIP][ML] Add Randomized Grid Search...

2017-11-05 Thread tengpeng
GitHub user tengpeng opened a pull request: https://github.com/apache/spark/pull/19660 [SPARK-18755][WIP][ML] Add Randomized Grid Search to Spark ML ## What changes were proposed in this pull request? Python sklearn has a randomized grid search for reducing the time
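The general idea of a randomized search over a parameter grid can be sketched as follows: instead of evaluating every combination, draw a fixed number of candidate combinations at random. This is a plain-Python illustration of that idea only, not the design proposed in the pull request; the function name `sample_param_grid` and the parameter names in the usage note are hypothetical.

```python
import itertools
import random


def sample_param_grid(grid, n_iter, seed=None):
    """Draw up to n_iter distinct parameter combinations from a grid.

    Hypothetical helper for illustration only. `grid` maps a parameter
    name to a list of candidate values; the result is a list of dicts,
    one per sampled combination.
    """
    keys = sorted(grid)
    # Enumerate the full Cartesian product of candidate values.
    combos = [
        dict(zip(keys, values))
        for values in itertools.product(*(grid[k] for k in keys))
    ]
    rng = random.Random(seed)
    # Sample without replacement; cap at the size of the full grid.
    return rng.sample(combos, min(n_iter, len(combos)))
```

For example, `sample_param_grid({"maxDepth": [2, 4, 6], "regParam": [0.01, 0.1]}, 3, seed=0)` returns 3 of the 6 possible combinations, so only half of the grid would need to be fitted and evaluated.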

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-03 Thread tengpeng
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r148919520 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -764,13 +764,17 @@ class LinearRegressionSuite

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-03 Thread tengpeng
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r148869278 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -764,13 +764,17 @@ class LinearRegressionSuite

[GitHub] spark issue #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMetrics

2017-11-02 Thread tengpeng
Github user tengpeng commented on the issue: https://github.com/apache/spark/pull/19638 I have used @sethah's approach to address the issues we have. Since we are not adding a new method to the public trait, there is no more binary compatibility issue

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-02 Thread tengpeng
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r148692614 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -125,4 +125,14 @@ class RegressionMetrics @Since("

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-02 Thread tengpeng
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r148641672 --- Diff: mllib/src/test/scala/org/apache/spark/ml/evaluation/RegressionEvaluatorSuite.scala --- @@ -73,6 +73,11 @@ class RegressionEvaluatorSuite

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-02 Thread tengpeng
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r148626297 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/RegressionEvaluator.scala --- @@ -49,8 +49,8 @@ final class RegressionEvaluator @Since

[GitHub] spark issue #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMetrics

2017-11-02 Thread tengpeng
Github user tengpeng commented on the issue: https://github.com/apache/spark/pull/19638 @srowen I have fixed scaladocs and since issues. I will pay special attention to this issue next time.

[GitHub] spark issue #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMetrics

2017-11-01 Thread tengpeng
Github user tengpeng commented on the issue: https://github.com/apache/spark/pull/19638 Would it be possible to add me to the white list for test? Thanks. On Thu, Nov 2, 2017 at 12:17 AM UCB AMPLab <notificati...@github.com> wrote: > Can one of the admi

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-01 Thread tengpeng
GitHub user tengpeng opened a pull request: https://github.com/apache/spark/pull/19638 [SPARK-22422][ML] Add Adjusted R2 to RegressionMetrics ## What changes were proposed in this pull request? I added adjusted R2 as a regression metric which was implemented in all major
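For reference, adjusted R2 is commonly defined as 1 - (1 - R2) * (n - 1) / (n - p - 1), where n is the number of observations and p is the number of predictors, not counting the intercept. The sketch below is a plain-Python version of that textbook formula and is not the code from the pull request; the function name `adjusted_r2` and its parameters are hypothetical.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Textbook adjusted R2 for a model fitted with an intercept.

    Hypothetical helper for illustration only; not Spark's implementation.
    r2 is the ordinary coefficient of determination, n the number of
    observations, p the number of predictors (excluding the intercept).
    """
    if n - p - 1 <= 0:
        raise ValueError("need more than p + 1 observations")
    # Penalize R2 by the ratio of total to residual degrees of freedom.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

Unlike plain R2, this value can decrease when a predictor is added that does not improve the fit enough to offset the lost degree of freedom.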

[GitHub] spark issue #19600: Added more information to Imputer

2017-10-29 Thread tengpeng
Github user tengpeng commented on the issue: https://github.com/apache/spark/pull/19600 I will follow the guideline strictly next time. Thanks.

[GitHub] spark pull request #19600: Added more information to Imputer

2017-10-28 Thread tengpeng
GitHub user tengpeng opened a pull request: https://github.com/apache/spark/pull/19600 Added more information to Imputer Oftentimes we want to impute custom values other than 'NaN'. My addition helps people locate this function without reading the API. ## What changes