[GitHub] spark pull request #17996: [SPARK-20506][DOCS] 2.2 migration guide

2017-05-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17996


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17996: [SPARK-20506][DOCS] 2.2 migration guide

2017-05-19 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/17996#discussion_r117545657
  
--- Diff: docs/ml-guide.md ---
@@ -66,41 +66,59 @@ To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4
 [^1]: To learn more about the benefits and background of system optimised natives, you may wish to
 watch Sam Halliday's ScalaX talk on [High Performance Linear Algebra in Scala](http://fommil.github.io/scalax14/#/).
 
+# Highlights in 2.2
+
+The list below highlights some of the new features and enhancements added to MLlib in the `2.2`
+release of Spark:
+
+* `ALS` methods for _top-k_ recommendations for all users or items, matching the functionality
+ in `mllib` ([SPARK-19535](https://issues.apache.org/jira/browse/SPARK-19535)). Performance
+ was also improved for both `ml` and `mllib`
+ ([SPARK-11968](https://issues.apache.org/jira/browse/SPARK-11968) and
+ [SPARK-20587](https://issues.apache.org/jira/browse/SPARK-20587))
+* `Correlation` and `ChiSquareTest` stats functions for `DataFrames`
+ ([SPARK-19635](https://issues.apache.org/jira/browse/SPARK-19635) and
--- End diff --

Ah, right. Thanks for catching that.





[GitHub] spark pull request #17996: [SPARK-20506][DOCS] 2.2 migration guide

2017-05-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/17996#discussion_r117527699
  
--- Diff: docs/ml-guide.md ---
@@ -66,41 +66,59 @@ To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4
 [^1]: To learn more about the benefits and background of system optimised natives, you may wish to
 watch Sam Halliday's ScalaX talk on [High Performance Linear Algebra in Scala](http://fommil.github.io/scalax14/#/).
 
+# Highlights in 2.2
+
+The list below highlights some of the new features and enhancements added to MLlib in the `2.2`
+release of Spark:
+
+* `ALS` methods for _top-k_ recommendations for all users or items, matching the functionality
+ in `mllib` ([SPARK-19535](https://issues.apache.org/jira/browse/SPARK-19535)). Performance
+ was also improved for both `ml` and `mllib`
+ ([SPARK-11968](https://issues.apache.org/jira/browse/SPARK-11968) and
+ [SPARK-20587](https://issues.apache.org/jira/browse/SPARK-20587))
+* `Correlation` and `ChiSquareTest` stats functions for `DataFrames`
+ ([SPARK-19635](https://issues.apache.org/jira/browse/SPARK-19635) and
--- End diff --

Hi, @MLnick. The `Correlation` issue is SPARK-19636.





[GitHub] spark pull request #17996: [SPARK-20506][DOCS] 2.2 migration guide

2017-05-18 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/17996#discussion_r117174667
  
--- Diff: docs/ml-guide.md ---
@@ -72,35 +72,26 @@ MLlib is under active development.
 The APIs marked `Experimental`/`DeveloperApi` may change in future releases,
 and the migration guide below will explain all changes between releases.
 
-## From 2.0 to 2.1
+## From 2.1 to 2.2
 
 ### Breaking changes
- 
-**Deprecated methods removed**
 
-* `setLabelCol` in `feature.ChiSqSelectorModel`
-* `numTrees` in `classification.RandomForestClassificationModel` (This now refers to the Param called `numTrees`)
-* `numTrees` in `regression.RandomForestRegressionModel` (This now refers to the Param called `numTrees`)
-* `model` in `regression.LinearRegressionSummary`
-* `validateParams` in `PipelineStage`
-* `validateParams` in `Evaluator`
+There are no breaking changes.
 
 ### Deprecations and changes of behavior
 
 **Deprecations**
 
-* [SPARK-18592](https://issues.apache.org/jira/browse/SPARK-18592):
-  Deprecate all Param setter methods except for input/output column Params for `DecisionTreeClassificationModel`, `GBTClassificationModel`, `RandomForestClassificationModel`, `DecisionTreeRegressionModel`, `GBTRegressionModel` and `RandomForestRegressionModel`
+There are no deprecations.
 
 **Changes of behavior**
--- End diff --

Thanks - didn't catch that one





[GitHub] spark pull request #17996: [SPARK-20506][DOCS] 2.2 migration guide

2017-05-17 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17996#discussion_r117155950
  
--- Diff: docs/ml-guide.md ---
@@ -72,35 +72,26 @@ MLlib is under active development.
 The APIs marked `Experimental`/`DeveloperApi` may change in future releases,
 and the migration guide below will explain all changes between releases.
 
-## From 2.0 to 2.1
+## From 2.1 to 2.2
 
 ### Breaking changes
- 
-**Deprecated methods removed**
 
-* `setLabelCol` in `feature.ChiSqSelectorModel`
-* `numTrees` in `classification.RandomForestClassificationModel` (This now refers to the Param called `numTrees`)
-* `numTrees` in `regression.RandomForestRegressionModel` (This now refers to the Param called `numTrees`)
-* `model` in `regression.LinearRegressionSummary`
-* `validateParams` in `PipelineStage`
-* `validateParams` in `Evaluator`
+There are no breaking changes.
 
 ### Deprecations and changes of behavior
 
 **Deprecations**
 
-* [SPARK-18592](https://issues.apache.org/jira/browse/SPARK-18592):
-  Deprecate all Param setter methods except for input/output column Params for `DecisionTreeClassificationModel`, `GBTClassificationModel`, `RandomForestClassificationModel`, `DecisionTreeRegressionModel`, `GBTRegressionModel` and `RandomForestRegressionModel`
+There are no deprecations.
 
 **Changes of behavior**
--- End diff --

Should we include #17233 in this section?





[GitHub] spark pull request #17996: [SPARK-20506][DOCS] 2.2 migration guide

2017-05-16 Thread MLnick
GitHub user MLnick opened a pull request:

https://github.com/apache/spark/pull/17996

[SPARK-20506][DOCS] 2.2 migration guide

Update the ML guide for the `2.1` -> `2.2` migration and the previous-version migration guide section.

## How was this patch tested?

Built the docs locally.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MLnick/spark SPARK-20506-2.2-migration-guide

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17996.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17996


commit ba1097686041e6d57183e7979814dff0ff7ffa5a
Author: Nick Pentreath 
Date:   2017-05-16T08:20:28Z

Migration guide 2.1->2.2

commit 5a3d87b4a58b1c3db6ce49fe3a6aa9caf8ed9b42
Author: Nick Pentreath 
Date:   2017-05-16T08:34:56Z

Bump expected parity release number



