spark git commit: [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs.

2018-02-01 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.3 871fd48dc -> 205bce974 [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs. ## What changes were proposed in this pull request? Audit new APIs and docs in 2.3.0. ## How was this patch tested? No test. Author: Yanbo Liang

spark git commit: [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs.

2018-02-01 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 07cee3373 -> e15da5b14 [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs. ## What changes were proposed in this pull request? Audit new APIs and docs in 2.3.0. ## How was this patch tested? No test. Author: Yanbo Liang

spark git commit: [SPARK-23112][DOC] Update ML migration guide with breaking and behavior changes.

2018-01-31 Thread mlnick
nly Author: Nick Pentreath <ni...@za.ibm.com> Closes #20421 from MLnick/SPARK-23112-ml-guide. (cherry picked from commit 161a3f2ae324271a601500e3d2900db9359ee2ef) Signed-off-by: Nick Pentreath <ni...@za.ibm.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-23112][DOC] Update ML migration guide with breaking and behavior changes.

2018-01-31 Thread mlnick
nly Author: Nick Pentreath <ni...@za.ibm.com> Closes #20421 from MLnick/SPARK-23112-ml-guide. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/161a3f2a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/161a3f2a Diff: h

spark git commit: [SPARK-23138][ML][DOC] Multiclass logistic regression summary example and user guide

2018-01-29 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.3 bb7502f9a -> 107d4e293 [SPARK-23138][ML][DOC] Multiclass logistic regression summary example and user guide ## What changes were proposed in this pull request? User guide and examples are updated to reflect multiclass logistic

spark git commit: [SPARK-23138][ML][DOC] Multiclass logistic regression summary example and user guide

2018-01-29 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 8b983243e -> 5056877e8 [SPARK-23138][ML][DOC] Multiclass logistic regression summary example and user guide ## What changes were proposed in this pull request? User guide and examples are updated to reflect multiclass logistic regression

spark git commit: Revert "[SPARK-22797][PYSPARK] Bucketizer support multi-column"

2018-01-26 Thread mlnick
Repository: spark Updated Branches: refs/heads/master dd8e257d1 -> a8a3e9b7c Revert "[SPARK-22797][PYSPARK] Bucketizer support multi-column" This reverts commit c22eaa94e85aaac649566495dcf763a5de3c8d06. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: Revert "[SPARK-22797][PYSPARK] Bucketizer support multi-column"

2018-01-26 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.3 ca3613be2 -> f5911d489 Revert "[SPARK-22797][PYSPARK] Bucketizer support multi-column" This reverts commit ab1b5d921b395cb7df3a3a2c4a7e5778d98e6f01. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-22797][PYSPARK] Bucketizer support multi-column

2018-01-26 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.3 d6cdc699e -> ab1b5d921 [SPARK-22797][PYSPARK] Bucketizer support multi-column ## What changes were proposed in this pull request? Bucketizer support multi-column in the python side ## How was this patch tested? existing tests and

spark git commit: [SPARK-22797][PYSPARK] Bucketizer support multi-column

2018-01-26 Thread mlnick
Repository: spark Updated Branches: refs/heads/master cd3956df0 -> c22eaa94e [SPARK-22797][PYSPARK] Bucketizer support multi-column ## What changes were proposed in this pull request? Bucketizer support multi-column in the python side ## How was this patch tested? existing tests and added

spark git commit: [SPARK-22799][ML] Bucketizer should throw exception if single- and multi-column params are both set

2018-01-26 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.3 fdf140e25 -> d6cdc699e [SPARK-22799][ML] Bucketizer should throw exception if single- and multi-column params are both set ## What changes were proposed in this pull request? Currently there is a mixed situation when both single- and

spark git commit: [SPARK-22799][ML] Bucketizer should throw exception if single- and multi-column params are both set

2018-01-26 Thread mlnick
Repository: spark Updated Branches: refs/heads/master d1721816d -> cd3956df0 [SPARK-22799][ML] Bucketizer should throw exception if single- and multi-column params are both set ## What changes were proposed in this pull request? Currently there is a mixed situation when both single- and

spark git commit: [SPARK-23112][DOC] Add highlights and migration guide for 2.3

2018-01-25 Thread mlnick
<ni...@za.ibm.com> Closes #20363 from MLnick/SPARK-23112-ml-guide. (cherry picked from commit 8532e26f335b67b74c976712ad82c20ea6dbbf80) Signed-off-by: Nick Pentreath <ni...@za.ibm.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/

spark git commit: [SPARK-23112][DOC] Add highlights and migration guide for 2.3

2018-01-25 Thread mlnick
bm.com> Closes #20363 from MLnick/SPARK-23112-ml-guide. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8532e26f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8532e26f Diff: http://git-wip-us.apache.org/repos/asf/s

spark git commit: [SPARK-23048][ML] Add OneHotEncoderEstimator document and examples

2018-01-19 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.3 e58223171 -> ef7989d55 [SPARK-23048][ML] Add OneHotEncoderEstimator document and examples ## What changes were proposed in this pull request? We have `OneHotEncoderEstimator` now and `OneHotEncoder` will be deprecated since 2.3.0. We

spark git commit: [SPARK-23048][ML] Add OneHotEncoderEstimator document and examples

2018-01-19 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 60203fca6 -> b74366481 [SPARK-23048][ML] Add OneHotEncoderEstimator document and examples ## What changes were proposed in this pull request? We have `OneHotEncoderEstimator` now and `OneHotEncoder` will be deprecated since 2.3.0. We

spark git commit: [SPARK-23127][DOC] Update FeatureHasher guide for categoricalCols parameter

2018-01-19 Thread mlnick
How was this patch tested? Doc only Author: Nick Pentreath <ni...@za.ibm.com> Closes #20293 from MLnick/SPARK-23127-catCol-userguide. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/60203fca Tree: http://git-wip-us.apache.org/

spark git commit: [SPARK-23127][DOC] Update FeatureHasher guide for categoricalCols parameter

2018-01-19 Thread mlnick
How was this patch tested? Doc only Author: Nick Pentreath <ni...@za.ibm.com> Closes #20293 from MLnick/SPARK-23127-catCol-userguide. (cherry picked from commit 60203fca6a605ad158184e1e0ce5187e144a3ea7) Signed-off-by: Nick Pentreath <ni...@za.ibm.com> Project: http://git-wip-us.apach

spark git commit: [SPARK-23008][ML][FOLLOW-UP] mark OneHotEncoder python API deprecated

2018-01-12 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.3 3ae3e1bb7 -> d512d873b [SPARK-23008][ML][FOLLOW-UP] mark OneHotEncoder python API deprecated ## What changes were proposed in this pull request? mark OneHotEncoder python API deprecated ## How was this patch tested? N/A Author:

spark git commit: [SPARK-23008][ML][FOLLOW-UP] mark OneHotEncoder python API deprecated

2018-01-12 Thread mlnick
Repository: spark Updated Branches: refs/heads/master cbe7c6fbf -> a7d98d53c [SPARK-23008][ML][FOLLOW-UP] mark OneHotEncoder python API deprecated ## What changes were proposed in this pull request? mark OneHotEncoder python API deprecated ## How was this patch tested? N/A Author:

spark git commit: [SPARK-22801][ML][PYSPARK] Allow FeatureHasher to treat numeric columns as categorical

2017-12-31 Thread mlnick
ted as categorical features. ## How was this patch tested? New unit tests. Author: Nick Pentreath <ni...@za.ibm.com> Closes #19991 from MLnick/hasher-num-cat. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/028ee401 T

spark git commit: [SPARK-22397][ML] add multiple columns support to QuantileDiscretizer

2017-12-31 Thread mlnick
Repository: spark Updated Branches: refs/heads/master cfbe11e81 -> 3d8837e59 [SPARK-22397][ML] add multiple columns support to QuantileDiscretizer ## What changes were proposed in this pull request? add multi columns support to QuantileDiscretizer. When calculating the splits, we can either

spark git commit: [SPARK-22700][ML] Bucketizer.transform incorrectly drops row containing NaN

2017-12-12 Thread mlnick
Repository: spark Updated Branches: refs/heads/master bdb5e55c2 -> 874350905 [SPARK-22700][ML] Bucketizer.transform incorrectly drops row containing NaN ## What changes were proposed in this pull request? only drops the rows containing NaN in the input columns ## How was this patch tested?

spark git commit: [SPARK-20199][ML] : Provided featureSubsetStrategy to GBTClassifier and GBTRegressor

2017-11-10 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 28ab5bf59 -> 9b9827759 [SPARK-20199][ML] : Provided featureSubsetStrategy to GBTClassifier and GBTRegressor ## What changes were proposed in this pull request? (Provided featureSubset Strategy to GBTClassifier a) Moved

spark git commit: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can bin multiple columns

2017-11-09 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 6793a3dac -> 77f74539e [SPARK-20542][ML][SQL] Add an API to Bucketizer that can bin multiple columns ## What changes were proposed in this pull request? Current ML's Bucketizer can only bin a column of continuous features. If a dataset

svn commit: r22186 - /dev/spark/spark-2.1.2-rc4-bin/ /release/spark/spark-2.1.2/

2017-10-09 Thread mlnick
Author: mlnick Date: Mon Oct 9 19:37:45 2017 New Revision: 22186 Log: Release Spark 2.1.2 Added: release/spark/spark-2.1.2/ - copied from r22185, dev/spark/spark-2.1.2-rc4-bin/ Removed: dev/spark/spark-2.1.2-rc4-bin

spark git commit: [SPARK-20679][ML] Support recommending for a subset of users/items in ALSModel

2017-10-09 Thread mlnick
put dataframe. ## How was this patch tested? New unit tests in `ALSSuite` and Python doctests in `ALS`. Ran updated examples locally. Author: Nick Pentreath <ni...@za.ibm.com> Closes #18748 from MLnick/als-recommend-df. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http:

spark git commit: [SPARK-21958][ML] Word2VecModel save: transform data in the cluster

2017-09-15 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 3c6198c86 -> 79a4dab62 [SPARK-21958][ML] Word2VecModel save: transform data in the cluster ## What changes were proposed in this pull request? Change a data transformation while saving a Word2VecModel to happen with distributed data

spark git commit: [SPARK-19357][ML] Adding parallel model evaluation in ML tuning

2017-09-06 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 4ee7dfe41 -> 16c4c03c7 [SPARK-19357][ML] Adding parallel model evaluation in ML tuning ## What changes were proposed in this pull request? Modified `CrossValidator` and `TrainValidationSplit` to be able to evaluate models in parallel for

spark git commit: [SPARK-21469][ML][EXAMPLES] Adding Examples for FeatureHasher

2017-08-30 Thread mlnick
Repository: spark Updated Branches: refs/heads/master b30a11a6a -> 4133c1b0a [SPARK-21469][ML][EXAMPLES] Adding Examples for FeatureHasher ## What changes were proposed in this pull request? This PR adds ML examples for the FeatureHasher transform in Scala, Java, Python. ## How was this

spark git commit: [SPARK-21468][PYSPARK][ML] Python API for FeatureHasher

2017-08-21 Thread mlnick
8970 from MLnick/SPARK-21468-pyspark-hasher. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/988b84d7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/988b84d7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/988b84d

spark git commit: [SPARK-13969][ML] Add FeatureHasher transformer

2017-08-16 Thread mlnick
hor: Nick Pentreath <ni...@za.ibm.com> Closes #18513 from MLnick/FeatureHasher. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0bb8d1f3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0bb8d1f3 Diff: http:

spark git commit: [SPARK-20988][ML] Logistic regression uses aggregator hierarchy

2017-07-26 Thread mlnick
Repository: spark Updated Branches: refs/heads/master ae4ea5fe2 -> cf29828d7 [SPARK-20988][ML] Logistic regression uses aggregator hierarchy ## What changes were proposed in this pull request? This change pulls the `LogisticAggregator` class out of LogisticRegression.scala and makes it

spark git commit: [SPARK-20506][DOCS] Add HTML links to highlight list in MLlib guide for 2.2

2017-05-22 Thread mlnick
How was this patch tested? Built docs locally and tested links. Author: Nick Pentreath <ni...@za.ibm.com> Closes #18043 from MLnick/SPARK-20506-2.2-migration-guide-2. (cherry picked from commit be846db48b226de2b0dfb5f87d059eda15ecf7cd) Signed-off-by: Nick Pentreath <ni...@za.ibm.com> Proj

spark git commit: [SPARK-20506][DOCS] Add HTML links to highlight list in MLlib guide for 2.2

2017-05-22 Thread mlnick
How was this patch tested? Built docs locally and tested links. Author: Nick Pentreath <ni...@za.ibm.com> Closes #18043 from MLnick/SPARK-20506-2.2-migration-guide-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/be846db4 Tree: h

spark git commit: [SPARK-20506][DOCS] 2.2 migration guide

2017-05-19 Thread mlnick
treath <ni...@za.ibm.com> Closes #17996 from MLnick/SPARK-20506-2.2-migration-guide. (cherry picked from commit b5d8d9ba17d62167cfbacd5f6188a8b4a5b8a2be) Signed-off-by: Nick Pentreath <ni...@za.ibm.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-u

spark git commit: [SPARK-20506][DOCS] 2.2 migration guide

2017-05-19 Thread mlnick
th <ni...@za.ibm.com> Closes #17996 from MLnick/SPARK-20506-2.2-migration-guide. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b5d8d9ba Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b5d8d9ba Diff: ht

spark git commit: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all performance PRs

2017-05-16 Thread mlnick
bm.com> Closes #17919 from MLnick/SPARK-20677-als-perf-followup. (cherry picked from commit 25b4f41d239ac67402566c0254a893e2e58ae7d8) Signed-off-by: Nick Pentreath <ni...@za.ibm.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/a

spark git commit: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all performance PRs

2017-05-16 Thread mlnick
; Closes #17919 from MLnick/SPARK-20677-als-perf-followup. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/25b4f41d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/25b4f41d Diff: http://git-wip-us.apache.org/repos/asf/s

spark git commit: [SPARK-20553][ML][PYSPARK] Update ALS examples with recommend-all methods

2017-05-16 Thread mlnick
cally Author: Nick Pentreath <ni...@za.ibm.com> Closes #17950 from MLnick/SPARK-20553-update-als-examples. (cherry picked from commit 6af7b43b34942c662122e3905b0724b2dd40a63f) Signed-off-by: Nick Pentreath <ni...@za.ibm.com> Project: http://git-wip-us.apache.org/repos/asf/spark/re

spark git commit: [SPARK-20587][ML] Improve performance of ML ALS recommendForAll

2017-05-09 Thread mlnick
nce of `recommendAll` methods. ## How was this patch tested? Existing unit tests. Author: Nick Pentreath <ni...@za.ibm.com> Closes #17845 from MLnick/ml-als-perf. (cherry picked from commit 10b00abadf4a3473332eef996db7b66f491316f2) Signed-off-by: Nick Pentreath <ni...@za.ibm.com> Proj

spark git commit: [SPARK-20587][ML] Improve performance of ML ALS recommendForAll

2017-05-09 Thread mlnick
nce of `recommendAll` methods. ## How was this patch tested? Existing unit tests. Author: Nick Pentreath <ni...@za.ibm.com> Closes #17845 from MLnick/ml-als-perf. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/10b00aba T

spark git commit: [SPARK-11968][MLLIB] Optimize MLLIB ALS recommendForAll

2017-05-09 Thread mlnick
Repository: spark Updated Branches: refs/heads/master b952b44af -> 807942476 [SPARK-11968][MLLIB] Optimize MLLIB ALS recommendForAll The recommendForAll of MLLIB ALS is very slow. GC is a key problem of the current method. The task uses the following code to keep temporary results: val output = new

spark git commit: [SPARK-11968][MLLIB] Optimize MLLIB ALS recommendForAll

2017-05-09 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.2 54e074349 -> 72fca9a0a [SPARK-11968][MLLIB] Optimize MLLIB ALS recommendForAll The recommendForAll of MLLIB ALS is very slow. GC is a key problem of the current method. The task uses the following code to keep temporary results: val output =

spark git commit: [SPARK-20596][ML][TEST] Consolidate and improve ALS recommendAll test cases

2017-05-08 Thread mlnick
`k < num items` and `k = num items`. Technically we should also test that `k > num items` returns the same results as `k = num items`. ## How was this patch tested? Updated existing unit tests. Author: Nick Pentreath <ni...@za.ibm.com> Closes #17860 from MLnick/SPARK-20596-als-rec-t

spark git commit: [SPARK-20596][ML][TEST] Consolidate and improve ALS recommendAll test cases

2017-05-08 Thread mlnick
`k < num items` and `k = num items`. Technically we should also test that `k > num items` returns the same results as `k = num items`. ## How was this patch tested? Updated existing unit tests. Author: Nick Pentreath <ni...@za.ibm.com> Closes #17860 from MLnick/SPARK-20596-als-rec-
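The property noted in the snippet above (asking for more recommendations than there are items should give the same result as asking for exactly that many) can be illustrated with a minimal plain-Python sketch. This is not Spark's ALS code; `top_k` and the sample scores are hypothetical names and data chosen only for illustration.

```python
import heapq

def top_k(scores, k):
    """Return up to k (item, score) pairs, highest score first.

    Hypothetical helper for illustration; not part of Spark.
    """
    # nlargest returns at most len(scores) entries, so k may exceed the item count
    return heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])

scores = {"item_a": 0.9, "item_b": 0.4, "item_c": 0.7}  # made-up scores for one user

fewer = top_k(scores, 2)   # k < num items
exact = top_k(scores, 3)   # k = num items
more = top_k(scores, 10)   # k > num items: same result as k = num items
```

Comparing `more` with `exact` is the kind of check the commit message says the tests should also cover.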

spark git commit: [SPARK-6227][MLLIB][PYSPARK] Implement PySpark wrappers for SVD and PCA (v2)

2017-05-03 Thread mlnick
tch tested? New doc tests and unit tests. Ran all examples locally. Author: MechCoder <manojkumarsivaraj...@gmail.com> Author: Nick Pentreath <ni...@za.ibm.com> Closes #17621 from MLnick/SPARK-6227-pyspark-svd-pca. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-6227][MLLIB][PYSPARK] Implement PySpark wrappers for SVD and PCA (v2)

2017-05-03 Thread mlnick
How was this patch tested? New doc tests and unit tests. Ran all examples locally. Author: MechCoder <manojkumarsivaraj...@gmail.com> Author: Nick Pentreath <ni...@za.ibm.com> Closes #17621 from MLnick/SPARK-6227-pyspark-svd-pca. (cherry picked from commit db2fb84b4a3c45daa449cc9232340193ce8eb

spark git commit: [SPARK-20300][ML][PYSPARK] Python API for ALSModel.recommendForAllUsers, Items

2017-05-02 Thread mlnick
sts. Author: Nick Pentreath <ni...@za.ibm.com> Closes #17622 from MLnick/SPARK-20300-pyspark-recall. (cherry picked from commit e300a5a145820ecd466885c73245d6684e8cb0aa) Signed-off-by: Nick Pentreath <ni...@za.ibm.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-20300][ML][PYSPARK] Python API for ALSModel.recommendForAllUsers, Items

2017-05-02 Thread mlnick
sts. Author: Nick Pentreath <ni...@za.ibm.com> Closes #17622 from MLnick/SPARK-20300-pyspark-recall. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e300a5a1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e300a5a1 D

spark git commit: [SPARK-20097][ML] Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR

2017-04-11 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 734dfbfcf -> 0d2b79642 [SPARK-20097][ML] Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR ## What changes were proposed in this pull request? - made `numInstances` public in GLR - made `degreesOfFreedom`

spark git commit: [SPARK-20076][ML][PYSPARK] Add Python interface for ml.stats.Correlation

2017-04-07 Thread mlnick
Repository: spark Updated Branches: refs/heads/master ad3cc1312 -> 1a52a6237 [SPARK-20076][ML][PYSPARK] Add Python interface for ml.stats.Correlation ## What changes were proposed in this pull request? The Dataframes-based support for the correlation statistics is added in #17108. This

spark git commit: [SPARK-19953][ML] Random Forest Models use parent UID when being fit

2017-04-06 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 5142e5d4e -> e156b5dd3 [SPARK-19953][ML] Random Forest Models use parent UID when being fit ## What changes were proposed in this pull request? The ML `RandomForestClassificationModel` and `RandomForestRegressionModel` were not using the

spark git commit: [SPARK-19969][ML] Imputer doc and example

2017-04-03 Thread mlnick
Repository: spark Updated Branches: refs/heads/master fb5869f2c -> 4d28e8430 [SPARK-19969][ML] Imputer doc and example ## What changes were proposed in this pull request? Add docs and examples for spark.ml.feature.Imputer. Currently scala and Java examples are included. Python example will

spark git commit: [SPARK-19985][ML] Fixed copy method for some ML Models

2017-04-03 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 93dbfe705 -> 2a903a1ee [SPARK-19985][ML] Fixed copy method for some ML Models ## What changes were proposed in this pull request? Some ML Models were using `defaultCopy` which expects a default constructor, and others were not setting the

spark git commit: [SPARK-15040][ML][PYSPARK] Add Imputer to PySpark

2017-03-24 Thread mlnick
<ni...@za.ibm.com> Closes #17316 from MLnick/SPARK-15040-pyspark-imputer. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d9f4ce69 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d9f4ce69 Diff: http://git-wip-us.a

spark git commit: [SPARK-13568][ML] Create feature transformer to impute missing values

2017-03-16 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 1472cac4b -> d647aae27 [SPARK-13568][ML] Create feature transformer to impute missing values ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-13568 It is quite common to encounter

spark git commit: [SPARK-19345][ML][DOC] Add doc for "coldStartStrategy" usage in ALS

2017-03-02 Thread mlnick
Author: Nick Pentreath <ni...@za.ibm.com> Closes #17102 from MLnick/SPARK-19345-coldstart-doc. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9cca3dbf Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9cca3dbf D

spark git commit: [SPARK-19704][ML] AFTSurvivalRegression should support numeric censorCol

2017-03-02 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 625cfe09e -> 50c08e82f [SPARK-19704][ML] AFTSurvivalRegression should support numeric censorCol ## What changes were proposed in this pull request? make `AFTSurvivalRegression` support numeric censorCol ## How was this patch tested?

spark git commit: [SPARK-19733][ML] Removed unnecessary castings and refactored checked casts in ALS.

2017-03-02 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 8d6ef895e -> 625cfe09e [SPARK-19733][ML] Removed unnecessary castings and refactored checked casts in ALS. ## What changes were proposed in this pull request? The original ALS was performing unnecessary casting to the user and item ids

spark git commit: [SPARK-19787][ML] Changing the default parameter of regParam.

2017-03-01 Thread mlnick
nly exception is the unit-tests on ALSSuite but the change does not break them. Note: This PR should get the award of the laziest commit in Spark history. Originally I wanted to correct this on another PR but MLnick [suggested](https://github.com/apache/spark/pull/17059#issuecomment-28572) to cre

spark git commit: [SPARK-14489][ML][PYSPARK] ALS unknown user/item prediction strategy

2017-02-28 Thread mlnick
ng `coldStartStrategy` to `drop` results in valid metrics. Author: Nick Pentreath <ni...@za.ibm.com> Closes #12896 from MLnick/SPARK-14489-als-nan. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b4054665 Tree: http://git-
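Why dropping cold-start rows yields valid metrics can be sketched in plain Python. The sketch assumes, as the SPARK-14489 title and the `als-nan` branch name suggest, that predictions for unknown users or items come back as NaN; `rmse` and the sample rows are hypothetical and do not use Spark's API.

```python
import math

def rmse(rows):
    """Root-mean-square error over (rating, prediction) pairs."""
    return math.sqrt(sum((r - p) ** 2 for r, p in rows) / len(rows))

# Made-up (rating, prediction) pairs; the NaN stands in for an unseen user/item
rows = [(4.0, 3.5), (2.0, 2.5), (5.0, float("nan"))]

with_nan = rmse(rows)  # a single NaN prediction makes the whole metric NaN

# Mimic a "drop" strategy: discard rows whose prediction is NaN before evaluating
kept = [(r, p) for r, p in rows if not math.isnan(p)]
dropped = rmse(kept)   # finite, since only valid predictions remain
```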

spark git commit: [SPARK-19679][ML] Destroy broadcasted object without blocking

2017-02-22 Thread mlnick
Repository: spark Updated Branches: refs/heads/master ef3c73535 -> bf7bb4977 [SPARK-19679][ML] Destroy broadcasted object without blocking ## What changes were proposed in this pull request? Destroy broadcasted object without blocking use `find mllib -name '*.scala' | xargs -i bash -c 'egrep

spark git commit: [SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark for QuantileDiscretizer and Bucketizer

2016-11-30 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 56c82edab -> fe854f2e4 [SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark for QuantileDiscretizer and Bucketizer ## What changes were proposed in this pull request? added the new handleInvalid param for these transformers to Python

spark git commit: [SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark for QuantileDiscretizer and Bucketizer

2016-11-30 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.1 5e4afbfb6 -> 7043c6b69 [SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark for QuantileDiscretizer and Bucketizer ## What changes were proposed in this pull request? added the new handleInvalid param for these transformers to

spark git commit: [SPARK-15113][PYSPARK][ML] Add missing num features num classes

2016-08-22 Thread mlnick
Repository: spark Updated Branches: refs/heads/master bd9655063 -> b264cbb16 [SPARK-15113][PYSPARK][ML] Add missing num features num classes ## What changes were proposed in this pull request? Add missing `numFeatures` and `numClasses` to the wrapped Java models in PySpark ML pipelines.

spark git commit: [SPARK-15254][DOC] Improve ML pipeline Cross Validation Scaladoc & PyDoc

2016-07-27 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 045fc3606 -> 7e8279fde [SPARK-15254][DOC] Improve ML pipeline Cross Validation Scaladoc & PyDoc ## What changes were proposed in this pull request? Updated ML pipeline Cross Validation Scaladoc & PyDoc. ## How was this patch tested?

spark git commit: [SPARK-16261][EXAMPLES][ML] Fixed incorrect appNames in ML Examples

2016-06-29 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.0 1b4d63f6f -> ba71cf451 [SPARK-16261][EXAMPLES][ML] Fixed incorrect appNames in ML Examples ## What changes were proposed in this pull request? Some appNames in ML examples are incorrect, mostly in PySpark but one in Scala. This

spark git commit: [SPARK-16261][EXAMPLES][ML] Fixed incorrect appNames in ML Examples

2016-06-29 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 7ee9e39cb -> 21385d02a [SPARK-16261][EXAMPLES][ML] Fixed incorrect appNames in ML Examples ## What changes were proposed in this pull request? Some appNames in ML examples are incorrect, mostly in PySpark but one in Scala. This corrects

spark git commit: [SPARK-15997][DOC][ML] Update user guide for HashingTF, QuantileVectorizer and CountVectorizer

2016-06-24 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.0 201d5e8db -> 76741b570 [SPARK-15997][DOC][ML] Update user guide for HashingTF, QuantileVectorizer and CountVectorizer ## What changes were proposed in this pull request? Made changes to HashingTF, QuantileVectorizer and

spark git commit: [SPARK-15997][DOC][ML] Update user guide for HashingTF, QuantileVectorizer and CountVectorizer

2016-06-24 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 158af162e -> be88383e1 [SPARK-15997][DOC][ML] Update user guide for HashingTF, QuantileVectorizer and CountVectorizer ## What changes were proposed in this pull request? Made changes to HashingTF, QuantileVectorizer and CountVectorizer

spark git commit: [SPARK-15162][SPARK-15164][PYSPARK][DOCS][ML] update some pydocs

2016-06-22 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.0 e7a489c7f -> 838143a2a [SPARK-15162][SPARK-15164][PYSPARK][DOCS][ML] update some pydocs ## What changes were proposed in this pull request? Mark ml.classification algorithms as experimental to match Scala algorithms, update PyDoc for

spark git commit: [SPARK-15162][SPARK-15164][PYSPARK][DOCS][ML] update some pydocs

2016-06-22 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 0e3ce7533 -> d281b0baf [SPARK-15162][SPARK-15164][PYSPARK][DOCS][ML] update some pydocs ## What changes were proposed in this pull request? Mark ml.classification algorithms as experimental to match Scala algorithms, update PyDoc for

spark git commit: [SPARK-15788][PYSPARK][ML] PySpark IDFModel missing "idf" property

2016-06-09 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.0 eb9e8fc09 -> 10f759947 [SPARK-15788][PYSPARK][ML] PySpark IDFModel missing "idf" property ## What changes were proposed in this pull request? add method idf to IDF in pyspark ## How was this patch tested? add unit test Author: Jeff

spark git commit: [SPARK-15788][PYSPARK][ML] PySpark IDFModel missing "idf" property

2016-06-09 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 99386fe39 -> e594b4928 [SPARK-15788][PYSPARK][ML] PySpark IDFModel missing "idf" property ## What changes were proposed in this pull request? add method idf to IDF in pyspark ## How was this patch tested? add unit test Author: Jeff

spark git commit: [SPARK-15168][PYSPARK][ML] Add missing params to MultilayerPerceptronClassifier

2016-06-03 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.0 3670b2c64 -> f702e9941 [SPARK-15168][PYSPARK][ML] Add missing params to MultilayerPerceptronClassifier ## What changes were proposed in this pull request? MultilayerPerceptronClassifier is missing step size, solver, and weights. Add

spark git commit: [SPARK-15168][PYSPARK][ML] Add missing params to MultilayerPerceptronClassifier

2016-06-03 Thread mlnick
Repository: spark Updated Branches: refs/heads/master b1cc7da3e -> 67cc89ff0 [SPARK-15168][PYSPARK][ML] Add missing params to MultilayerPerceptronClassifier ## What changes were proposed in this pull request? MultilayerPerceptronClassifier is missing step size, solver, and weights. Add

spark git commit: [SPARK-15668][ML] ml.feature: update check schema to avoid confusion when user use MLlib.vector as input type

2016-06-02 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.0 698b6f67c -> 0802ff9f6 [SPARK-15668][ML] ml.feature: update check schema to avoid confusion when user use MLlib.vector as input type ## What changes were proposed in this pull request? ml.feature: update check schema to avoid

spark git commit: [SPARK-15668][ML] ml.feature: update check schema to avoid confusion when user use MLlib.vector as input type

2016-06-02 Thread mlnick
Repository: spark Updated Branches: refs/heads/master ccd298eb6 -> 5855e0057 [SPARK-15668][ML] ml.feature: update check schema to avoid confusion when user use MLlib.vector as input type ## What changes were proposed in this pull request? ml.feature: update check schema to avoid confusion

spark git commit: [MINOR] clean up style for storage param setters in ALS

2016-06-02 Thread mlnick
PRs that weren't cleaned up). ## How was this patch tested? Existing tests - no functionality change. Author: Nick Pentreath <ni...@za.ibm.com> Closes #13480 from MLnick/als-param-minor-cleanup. (cherry picked from commit ccd298eb6794cbcb220ac9889db60d745231e0fe) Signed-off-by: Nick Pent

spark git commit: [MINOR] clean up style for storage param setters in ALS

2016-06-02 Thread mlnick
PRs that weren't cleaned up). ## How was this patch tested? Existing tests - no functionality change. Author: Nick Pentreath <ni...@za.ibm.com> Closes #13480 from MLnick/als-param-minor-cleanup. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-15092][SPARK-15139][PYSPARK][ML] Pyspark TreeEnsemble missing methods

2016-06-02 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.0 a55454eb6 -> 847ccf793 [SPARK-15092][SPARK-15139][PYSPARK][ML] Pyspark TreeEnsemble missing methods ## What changes were proposed in this pull request? Add `toDebugString` and `totalNumNodes` to `TreeEnsembleModels` and add

spark git commit: [SPARK-15092][SPARK-15139][PYSPARK][ML] Pyspark TreeEnsemble missing methods

2016-06-02 Thread mlnick
Repository: spark Updated Branches: refs/heads/master d109a1bee -> 72353311d [SPARK-15092][SPARK-15139][PYSPARK][ML] Pyspark TreeEnsemble missing methods ## What changes were proposed in this pull request? Add `toDebugString` and `totalNumNodes` to `TreeEnsembleModels` and add

spark git commit: [SPARK-15587][ML] ML 2.0 QA: Scala APIs audit for ml.feature

2016-06-01 Thread mlnick
Repository: spark Updated Branches: refs/heads/master a71d1364a -> 07a98ca4c [SPARK-15587][ML] ML 2.0 QA: Scala APIs audit for ml.feature ## What changes were proposed in this pull request? ML 2.0 QA: Scala APIs audit for ml.feature. Mainly include: * Remove seed for

spark git commit: [SPARK-15587][ML] ML 2.0 QA: Scala APIs audit for ml.feature

2016-06-01 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.0 71e8aaeaa -> beb4ea0b4 [SPARK-15587][ML] ML 2.0 QA: Scala APIs audit for ml.feature ## What changes were proposed in this pull request? ML 2.0 QA: Scala APIs audit for ml.feature. Mainly include: * Remove seed for

spark git commit: [MINOR][DOC][ML] ml.clustering scala & python api doc sync

2016-05-31 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.0 20a07e443 -> 7feb79085 [MINOR][DOC][ML] ml.clustering scala & python api doc sync ## What changes were proposed in this pull request? Since we did the Scala API audit for ml.clustering at #13148, we should also fix and update the

spark git commit: [MINOR][DOC][ML] ml.clustering scala & python api doc sync

2016-05-31 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 9a74de18a -> 594484cd8 [MINOR][DOC][ML] ml.clustering scala & python api doc sync ## What changes were proposed in this pull request? Since we did the Scala API audit for ml.clustering at #13148, we should also fix and update the

spark git commit: [SPARK-15492][ML][DOC] Binarization scala example copy & paste to spark-shell error

2016-05-26 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 53d4abe9e -> e451f7f0c [SPARK-15492][ML][DOC] Binarization scala example copy & paste to spark-shell error ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) The Binarization scala example

spark git commit: [SPARK-15492][ML][DOC] Binarization scala example copy & paste to spark-shell error

2016-05-26 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.0 36acd53e8 -> c54a07348 [SPARK-15492][ML][DOC] Binarization scala example copy & paste to spark-shell error ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) The Binarization scala

spark git commit: [SPARK-15500][DOC][ML][PYSPARK] Remove default value in Param doc field in ALS

2016-05-25 Thread mlnick
Closes #13277 from MLnick/SPARK-15500-als-remove-default-storage-param. (cherry picked from commit 1cb347fbc446092b478ae0578fc7d1b0626a9294) Signed-off-by: Nick Pentreath <ni...@za.ibm.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-15500][DOC][ML][PYSPARK] Remove default value in Param doc field in ALS

2016-05-25 Thread mlnick
nParam(s)` so that default values are not displayed twice. We can revisit in the case that [SPARK-15130](https://issues.apache.org/jira/browse/SPARK-15130) moves ahead with adding defaults in some way to PySpark param doc fields. Tests N/A. Author: Nick Pentreath <ni...@za.ibm.com> Closes

spark git commit: [SPARK-15442][ML][PYSPARK] Add 'relativeError' param to PySpark QuantileDiscretizer

2016-05-24 Thread mlnick
est and built API docs locally to check HTML doc generation. Author: Nick Pentreath <ni...@za.ibm.com> Closes #13228 from MLnick/SPARK-15442-py-relerror-param. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6075f5b4 Tree: http:

spark git commit: [SPARK-15444][PYSPARK][ML][HOTFIX] Default value mismatch of param linkPredictionCol for GeneralizedLinearRegression

2016-05-20 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.0 53c09f065 -> 1346f3cd6 [SPARK-15444][PYSPARK][ML][HOTFIX] Default value mismatch of param linkPredictionCol for GeneralizedLinearRegression ## What changes were proposed in this pull request? Default value mismatch of param

spark git commit: [SPARK-15444][PYSPARK][ML][HOTFIX] Default value mismatch of param linkPredictionCol for GeneralizedLinearRegression

2016-05-20 Thread mlnick
Repository: spark Updated Branches: refs/heads/master c32b1b162 -> 4e7393311 [SPARK-15444][PYSPARK][ML][HOTFIX] Default value mismatch of param linkPredictionCol for GeneralizedLinearRegression ## What changes were proposed in this pull request? Default value mismatch of param

spark git commit: [SPARK-15316][PYSPARK][ML] Add linkPredictionCol to GeneralizedLinearRegression

2016-05-19 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.0 554e0f30a -> 97fd9a09c [SPARK-15316][PYSPARK][ML] Add linkPredictionCol to GeneralizedLinearRegression ## What changes were proposed in this pull request? Add linkPredictionCol to GeneralizedLinearRegression and fix the PyDoc to

spark git commit: [SPARK-15316][PYSPARK][ML] Add linkPredictionCol to GeneralizedLinearRegression

2016-05-19 Thread mlnick
Repository: spark Updated Branches: refs/heads/master f5065abf4 -> e71cd96bf [SPARK-15316][PYSPARK][ML] Add linkPredictionCol to GeneralizedLinearRegression ## What changes were proposed in this pull request? Add linkPredictionCol to GeneralizedLinearRegression and fix the PyDoc to generate

spark git commit: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python param example working with SparkSession

2016-05-19 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.0 34c743c4b -> b2a4dac2d [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python param example working with SparkSession ## What changes were proposed in this pull request? It seems most of Python examples were changed to use SparkSession by

spark git commit: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python param example working with SparkSession

2016-05-19 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 661c21049 -> e2ec32dab [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python param example working with SparkSession ## What changes were proposed in this pull request? It seems most of Python examples were changed to use SparkSession by

spark git commit: [DOC][MINOR] ml.feature Scala and Python API sync

2016-05-18 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 4987f39ac -> b1bc5ebdd [DOC][MINOR] ml.feature Scala and Python API sync ## What changes were proposed in this pull request? I reviewed Scala and Python APIs for ml.feature and corrected discrepancies. ## How was this patch tested?

spark git commit: [SPARK-14891][ML] Add schema validation for ALS

2016-05-18 Thread mlnick
hor: Nick Pentreath <ni...@za.ibm.com> Closes #12762 from MLnick/SPARK-14891-als-validate-schema. (cherry picked from commit e8b79afa024123f9d4ceaf0a1043a7e37d913a8d) Signed-off-by: Nick Pentreath <ni...@za.ibm.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:
