[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8498#discussion_r38176683
  
--- Diff: docs/mllib-migration-guides.md ---
@@ -7,6 +7,25 @@ description: MLlib migration guides from before Spark SPARK_VERSION_SHORT
 
The migration guide for the current Spark version is kept on the [MLlib Programming Guide main page](mllib-guide.html#migration-guide).
 
+## From 1.3 to 1.4
--- End diff --

No content change here. Just moved the paragraphs from `mllib-guide` and `ml-guide`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8498#issuecomment-135663172
  
  [Test build #41734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41734/consoleFull) for PR 8498 at commit [`2790270`](https://github.com/apache/spark/commit/279027032b21f98170d7729050bdcc697b91fb5d).





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread mengxr
GitHub user mengxr opened a pull request:

https://github.com/apache/spark/pull/8498

[SPARK-9671] [MLLIB] re-org user guide and add migration guide

This PR updates the MLlib user guide and adds a migration guide for 1.4 to 1.5.

* merge migration guide for `spark.mllib` and `spark.ml` packages
* remove dependency section from `spark.ml` guide
* move the paragraph about `spark.mllib` and `spark.ml` to the top and recommend `spark.ml`
* move Sam's talk to a footnote to keep the section focused on dependencies

@jkbradley @feynmanliang 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mengxr/spark SPARK-9671

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8498.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8498


commit 279027032b21f98170d7729050bdcc697b91fb5d
Author: Xiangrui Meng m...@databricks.com
Date:   2015-08-28T07:13:15Z

re-org user guide and add migration guide







[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8498#issuecomment-135662135
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8498#issuecomment-135662156
  
Merged build started.





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8498#issuecomment-135735710
  
  [Test build #41734 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41734/console) for PR 8498 at commit [`2790270`](https://github.com/apache/spark/commit/279027032b21f98170d7729050bdcc697b91fb5d).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * *(Breaking change)* The `apply` and `copy` methods for the case class [`BoostingStrategy`](api/scala/index.html#org.apache.spark.mllib.tree.configuration.BoostingStrategy) have been changed because of a modification to the case class fields. This could be an issue for users who use `BoostingStrategy` to set GBT parameters.
  * *(Breaking change)* The return value of [`LDA.run`](api/scala/index.html#org.apache.spark.mllib.clustering.LDA) has changed. It now returns an abstract class `LDAModel` instead of the concrete class `DistributedLDAModel`. The object of type `LDAModel` can still be cast to the appropriate concrete type, which depends on the optimization algorithm.
  * The `scoreCol` output column (with default value `score`) was renamed to `probabilityCol` (with default value `probability`). The type was originally `Double` (for the probability of class 1.0), but it is now `Vector` (for the probability of each class, to support multiclass classification in the future).
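The `scoreCol` → `probabilityCol` item above replaces a single score with a per-class probability vector. As an illustration only (a plain Python stand-in with a hypothetical function name, not the Spark API), the representational change amounts to:

```python
# Illustrative only -- not Spark code. In 1.2, LogisticRegression emitted a
# single Double score: P(class = 1.0). From 1.3 on, probabilityCol holds a
# Vector with one probability per class, which generalizes to multiclass.
def score_to_probability_vector(score):
    """Map the old binary score to the new [P(class 0), P(class 1)] layout."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    return [1.0 - score, score]
```

For K classes the vector simply grows to length K, which is why `Vector` was chosen over `Double`.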






[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8498#issuecomment-135735979
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41734/





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8498#issuecomment-135735975
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread feynmanliang
Github user feynmanliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/8498#discussion_r38233062
  
--- Diff: docs/mllib-guide.md ---
@@ -56,71 +63,63 @@ This lists functionality included in `spark.mllib`, the 
main MLlib API.
   * [limited-memory BFGS 
(L-BFGS)](mllib-optimization.html#limited-memory-bfgs-l-bfgs)
 * [PMML model export](mllib-pmml-model-export.html)
 
-MLlib is under active development.
-The APIs marked `Experimental`/`DeveloperApi` may change in future 
releases,
-and the migration guide below will explain all changes between releases.
-
 # spark.ml: high-level APIs for ML pipelines
 
-Spark 1.2 introduced a new package called `spark.ml`, which aims to 
provide a uniform set of
-high-level APIs that help users create and tune practical machine learning 
pipelines.
-
-*Graduated from Alpha!*  The Pipelines API is no longer an alpha 
component, although many elements of it are still `Experimental` or 
`DeveloperApi`.
-
-Note that we will keep supporting and adding features to `spark.mllib` 
along with the
-development of `spark.ml`.
-Users should be comfortable using `spark.mllib` features and expect more 
features coming.
-Developers should contribute new algorithms to `spark.mllib` and can 
optionally contribute
-to `spark.ml`.
-
-Guides for `spark.ml` include:
+**[spark.ml programming guide](ml-guide.html)** provides an overview of 
the Pipelines API and major
+concepts. It also contains sections on using algorithms within the 
Pipelines API, for example:
 
-* **[spark.ml programming guide](ml-guide.html)**: overview of the 
Pipelines API and major concepts
-* Guides on using algorithms within the Pipelines API:
-  * [Feature transformers](ml-features.html), including a few not in the 
lower-level `spark.mllib` API
-  * [Decision trees](ml-decision-tree.html)
-  * [Ensembles](ml-ensembles.html)
-  * [Linear methods](ml-linear-methods.html)
+* [Feature extractors and transformers](ml-features.html)
+* [Linear methods](ml-linear-methods.html)
+* [Decision trees](ml-decision-tree.html)
+* [Ensembles](ml-ensembles.html)
+* [Artificial neural network](ml-ann.html)
--- End diff --

This is referred to as "multilayer perceptron" in `ml-guide`; we should be consistent with how we refer to it (I prefer MLP because ANN usually includes other flavors, e.g. convnets and RBMs). `ml-ann.md` will also have to be renamed if we make this change.





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread feynmanliang
Github user feynmanliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/8498#discussion_r38232610
  
--- Diff: docs/ml-guide.md ---
@@ -868,34 +859,4 @@ jsc.stop();
 
 /div
 
-# Dependencies
-
-Spark ML currently depends on MLlib and has the same dependencies.
-Please see the [MLlib Dependencies guide](mllib-guide.html#dependencies) 
for more info.
-
-Spark ML also depends upon Spark SQL, but the relevant parts of Spark SQL 
do not bring additional dependencies.
-
-# Migration Guide
-
-## From 1.3 to 1.4
-
-Several major API changes occurred, including:
-* `Param` and other APIs for specifying parameters
-* `uid` unique IDs for Pipeline components
-* Reorganization of certain classes
-Since the `spark.ml` API was an Alpha Component in Spark 1.3, we do not 
list all changes here.
-
-However, now that `spark.ml` is no longer an Alpha Component, we will 
provide details on any API changes for future releases.
-
-## From 1.2 to 1.3
-
-The main API changes are from Spark SQL.  We list the most important 
changes here:
-
-* The old 
[SchemaRDD](http://spark.apache.org/docs/1.2.1/api/scala/index.html#org.apache.spark.sql.SchemaRDD)
 has been replaced with 
[DataFrame](api/scala/index.html#org.apache.spark.sql.DataFrame) with a 
somewhat modified API.  All algorithms in Spark ML which used to use SchemaRDD 
now use DataFrame.
-* In Spark 1.2, we used implicit conversions from `RDD`s of `LabeledPoint` 
into `SchemaRDD`s by calling `import sqlContext._` where `sqlContext` was an 
instance of `SQLContext`.  These implicits have been moved, so we now call 
`import sqlContext.implicits._`.
-* Java APIs for SQL have also changed accordingly.  Please see the 
examples above and the [Spark SQL Programming 
Guide](sql-programming-guide.html) for details.
-
-Other changes were in `LogisticRegression`:
-
-* The `scoreCol` output column (with default value score) was renamed to 
be `probabilityCol` (with default value probability).  The type was 
originally `Double` (for the probability of class 1.0), but it is now `Vector` 
(for the probability of each class, to support multiclass classification in the 
future).
-* In Spark 1.2, `LogisticRegressionModel` did not include an intercept.  
In Spark 1.3, it includes an intercept; however, it will always be 0.0 since it 
uses the default settings for 
[spark.mllib.LogisticRegressionWithLBFGS](api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS).
  The option to use an intercept will be added in the future.
+---
--- End diff --

Why are these dividers only present in `ml-guide` and `mllib-guide` but not in the other guides?





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread feynmanliang
Github user feynmanliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/8498#discussion_r38233148
  
--- Diff: docs/mllib-guide.md ---
@@ -56,71 +63,63 @@ This lists functionality included in `spark.mllib`, the 
main MLlib API.
   * [limited-memory BFGS 
(L-BFGS)](mllib-optimization.html#limited-memory-bfgs-l-bfgs)
 * [PMML model export](mllib-pmml-model-export.html)
 
-MLlib is under active development.
-The APIs marked `Experimental`/`DeveloperApi` may change in future 
releases,
-and the migration guide below will explain all changes between releases.
-
 # spark.ml: high-level APIs for ML pipelines
 
-Spark 1.2 introduced a new package called `spark.ml`, which aims to 
provide a uniform set of
-high-level APIs that help users create and tune practical machine learning 
pipelines.
-
-*Graduated from Alpha!*  The Pipelines API is no longer an alpha 
component, although many elements of it are still `Experimental` or 
`DeveloperApi`.
-
-Note that we will keep supporting and adding features to `spark.mllib` 
along with the
-development of `spark.ml`.
-Users should be comfortable using `spark.mllib` features and expect more 
features coming.
-Developers should contribute new algorithms to `spark.mllib` and can 
optionally contribute
-to `spark.ml`.
-
-Guides for `spark.ml` include:
+**[spark.ml programming guide](ml-guide.html)** provides an overview of 
the Pipelines API and major
+concepts. It also contains sections on using algorithms within the 
Pipelines API, for example:
 
-* **[spark.ml programming guide](ml-guide.html)**: overview of the 
Pipelines API and major concepts
-* Guides on using algorithms within the Pipelines API:
-  * [Feature transformers](ml-features.html), including a few not in the 
lower-level `spark.mllib` API
-  * [Decision trees](ml-decision-tree.html)
-  * [Ensembles](ml-ensembles.html)
-  * [Linear methods](ml-linear-methods.html)
+* [Feature extractors and transformers](ml-features.html)
+* [Linear methods](ml-linear-methods.html)
+* [Decision trees](ml-decision-tree.html)
+* [Ensembles](ml-ensembles.html)
+* [Artificial neural network](ml-ann.html)
--- End diff --

Should we just duplicate what's in `ml-guide`, since the content is identical apart from naming differences?





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread feynmanliang
Github user feynmanliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/8498#discussion_r38232345
  
--- Diff: docs/ml-guide.md ---
@@ -21,19 +21,10 @@ title: Spark ML Programming Guide
 \]`
 
 
-Spark 1.2 introduced a new package called `spark.ml`, which aims to 
provide a uniform set of
-high-level APIs that help users create and tune practical machine learning 
pipelines.
-
-*Graduated from Alpha!*  The Pipelines API is no longer an alpha 
component, although many elements of it are still `Experimental` or 
`DeveloperApi`.
-
-Note that we will keep supporting and adding features to `spark.mllib` 
along with the
-development of `spark.ml`.
-Users should be comfortable using `spark.mllib` features and expect more 
features coming.
-Developers should contribute new algorithms to `spark.mllib` and can 
optionally contribute
-to `spark.ml`.
-
-See the [Algorithm Guides section](#algorithm-guides) below for guides on 
sub-packages of `spark.ml`, including feature transformers unique to the 
Pipelines API, ensembles, and more.
-
+The `spark.ml` package aims to provide a uniform set of high-level APIs 
that help users create and
+tune practical machine learning pipelines.
--- End diff --

Should we mention DataFrames here in `ml-guide` as well as in `mllib-guide`?





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread feynmanliang
Github user feynmanliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/8498#discussion_r38233348
  
--- Diff: docs/mllib-guide.md ---
@@ -56,71 +63,63 @@ This lists functionality included in `spark.mllib`, the 
main MLlib API.
   * [limited-memory BFGS 
(L-BFGS)](mllib-optimization.html#limited-memory-bfgs-l-bfgs)
 * [PMML model export](mllib-pmml-model-export.html)
 
-MLlib is under active development.
-The APIs marked `Experimental`/`DeveloperApi` may change in future 
releases,
-and the migration guide below will explain all changes between releases.
-
 # spark.ml: high-level APIs for ML pipelines
 
-Spark 1.2 introduced a new package called `spark.ml`, which aims to 
provide a uniform set of
-high-level APIs that help users create and tune practical machine learning 
pipelines.
-
-*Graduated from Alpha!*  The Pipelines API is no longer an alpha 
component, although many elements of it are still `Experimental` or 
`DeveloperApi`.
-
-Note that we will keep supporting and adding features to `spark.mllib` 
along with the
-development of `spark.ml`.
-Users should be comfortable using `spark.mllib` features and expect more 
features coming.
-Developers should contribute new algorithms to `spark.mllib` and can 
optionally contribute
-to `spark.ml`.
-
-Guides for `spark.ml` include:
+**[spark.ml programming guide](ml-guide.html)** provides an overview of 
the Pipelines API and major
+concepts. It also contains sections on using algorithms within the 
Pipelines API, for example:
 
-* **[spark.ml programming guide](ml-guide.html)**: overview of the 
Pipelines API and major concepts
-* Guides on using algorithms within the Pipelines API:
-  * [Feature transformers](ml-features.html), including a few not in the 
lower-level `spark.mllib` API
-  * [Decision trees](ml-decision-tree.html)
-  * [Ensembles](ml-ensembles.html)
-  * [Linear methods](ml-linear-methods.html)
+* [Feature extractors and transformers](ml-features.html)
+* [Linear methods](ml-linear-methods.html)
+* [Decision trees](ml-decision-tree.html)
+* [Ensembles](ml-ensembles.html)
+* [Artificial neural network](ml-ann.html)
 
 # Dependencies
 
-MLlib uses the linear algebra package
-[Breeze](http://www.scalanlp.org/), which depends on
-[netlib-java](https://github.com/fommil/netlib-java) for optimised
-numerical processing. If natives are not available at runtime, you
-will see a warning message and a pure JVM implementation will be used
-instead.
+MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), 
which depends on
+[netlib-java](https://github.com/fommil/netlib-java) for optimised 
numerical processing.
+If natives libraries[^1] are not available at runtime, you will see a 
warning message and a pure JVM
+implementation will be used instead.
 
-To learn more about the benefits and background of system optimised
-natives, you may wish to watch Sam Halliday's ScalaX talk on
-[High Performance Linear Algebra in 
Scala](http://fommil.github.io/scalax14/#/)).
+Due to licensing issues with runtime proprietary binaries, we do not 
include `netlib-java`'s native
+proxies by default.
+To configure `netlib-java` / Breeze to use system optimised binaries, 
include
+`com.github.fommil.netlib:all:1.1.2` (or build Spark with `-Pnetlib-lgpl`) 
as a dependency of your
+project and read the [netlib-java](https://github.com/fommil/netlib-java) 
documentation for your
+platform's additional installation instructions.
 
-Due to licensing issues with runtime proprietary binaries, we do not
-include `netlib-java`'s native proxies by default. To configure
-`netlib-java` / Breeze to use system optimised binaries, include
-`com.github.fommil.netlib:all:1.1.2` (or build Spark with
-`-Pnetlib-lgpl`) as a dependency of your project and read the
-[netlib-java](https://github.com/fommil/netlib-java) documentation for
-your platform's additional installation instructions.
+To use MLlib in Python, you will need [NumPy](http://www.numpy.org) 
version 1.4 or newer.
 
-To use MLlib in Python, you will need [NumPy](http://www.numpy.org)
-version 1.4 or newer.
+[^1]: To learn more about the benefits and background of system optimised 
natives, you may wish to
+watch Sam Halliday's ScalaX talk on [High Performance Linear Algebra 
in Scala](http://fommil.github.io/scalax14/#/).
 

+# Migration guide
 
-# Migration Guide
+MLlib is under active development.
+The APIs marked `Experimental`/`DeveloperApi` may change in future 
releases,
+and the migration guide below will explain all changes between releases.
+
+## From 1.4 to 1.5
 
-For the `spark.ml` package, please see the [spark.ml 

[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8498#issuecomment-135885585
  
Merged into master and branch-1.5.





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8498





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8498#issuecomment-135878455
  
  [Test build #41757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41757/console) for PR 8498 at commit [`f8efdcc`](https://github.com/apache/spark/commit/f8efdcc6aa630f676fed2e17287f2e9bbbf278ed).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * *(Breaking change)* The `apply` and `copy` methods for the case class [`BoostingStrategy`](api/scala/index.html#org.apache.spark.mllib.tree.configuration.BoostingStrategy) have been changed because of a modification to the case class fields. This could be an issue for users who use `BoostingStrategy` to set GBT parameters.
  * *(Breaking change)* The return value of [`LDA.run`](api/scala/index.html#org.apache.spark.mllib.clustering.LDA) has changed. It now returns an abstract class `LDAModel` instead of the concrete class `DistributedLDAModel`. The object of type `LDAModel` can still be cast to the appropriate concrete type, which depends on the optimization algorithm.
  * The `scoreCol` output column (with default value `score`) was renamed to `probabilityCol` (with default value `probability`). The type was originally `Double` (for the probability of class 1.0), but it is now `Vector` (for the probability of each class, to support multiclass classification in the future).
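The `LDA.run` item above follows the common pattern of returning an abstract type and letting callers downcast when they need optimizer-specific features. A minimal sketch of that pattern with plain Python stand-ins (the class and function names here are hypothetical, not the Spark API):

```python
# Illustrative stand-ins for the Scala classes -- not the Spark API.
class LDAModel:
    """Abstract return type: what LDA.run is declared to produce."""
    pass

class DistributedLDAModel(LDAModel):
    """Concrete subtype with extra features not on the abstract type."""
    def top_documents(self):
        return ["doc-0"]

def run_lda(optimizer="em"):
    # The declared return type is the abstract LDAModel; the concrete class
    # depends on the optimization algorithm, so callers who need
    # subtype-only features must check the runtime type and cast.
    return DistributedLDAModel() if optimizer == "em" else LDAModel()

model = run_lda("em")
if isinstance(model, DistributedLDAModel):  # the "cast" step from the note
    docs = model.top_documents()
```

In Spark itself the concrete type likewise depends on the optimizer (e.g. the EM optimizer produces a `DistributedLDAModel`), which is why the doc text warns that the returned `LDAModel` may need to be cast.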






[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8498#issuecomment-135878545
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41757/





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8498#issuecomment-135878543
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread feynmanliang
Github user feynmanliang commented on the pull request:

https://github.com/apache/spark/pull/8498#issuecomment-135884424
  
LGTM. The `ml-ann.md` filename is inconsistent with all referencing text (which refers to it as MLP), but that's unrelated to this PR.





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8498#discussion_r38236867
  
--- Diff: docs/ml-guide.md ---
@@ -868,34 +859,4 @@ jsc.stop();
 
 /div
 
-# Dependencies
-
-Spark ML currently depends on MLlib and has the same dependencies.
-Please see the [MLlib Dependencies guide](mllib-guide.html#dependencies) 
for more info.
-
-Spark ML also depends upon Spark SQL, but the relevant parts of Spark SQL 
do not bring additional dependencies.
-
-# Migration Guide
-
-## From 1.3 to 1.4
-
-Several major API changes occurred, including:
-* `Param` and other APIs for specifying parameters
-* `uid` unique IDs for Pipeline components
-* Reorganization of certain classes
-Since the `spark.ml` API was an Alpha Component in Spark 1.3, we do not 
list all changes here.
-
-However, now that `spark.ml` is no longer an Alpha Component, we will 
provide details on any API changes for future releases.
-
-## From 1.2 to 1.3
-
-The main API changes are from Spark SQL.  We list the most important 
changes here:
-
-* The old 
[SchemaRDD](http://spark.apache.org/docs/1.2.1/api/scala/index.html#org.apache.spark.sql.SchemaRDD)
 has been replaced with 
[DataFrame](api/scala/index.html#org.apache.spark.sql.DataFrame) with a 
somewhat modified API.  All algorithms in Spark ML which used to use SchemaRDD 
now use DataFrame.
-* In Spark 1.2, we used implicit conversions from `RDD`s of `LabeledPoint` 
into `SchemaRDD`s by calling `import sqlContext._` where `sqlContext` was an 
instance of `SQLContext`.  These implicits have been moved, so we now call 
`import sqlContext.implicits._`.
-* Java APIs for SQL have also changed accordingly.  Please see the 
examples above and the [Spark SQL Programming 
Guide](sql-programming-guide.html) for details.
-
-Other changes were in `LogisticRegression`:
-
-* The `scoreCol` output column (with default value "score") was renamed to 
be `probabilityCol` (with default value "probability").  The type was 
originally `Double` (for the probability of class 1.0), but it is now `Vector` 
(for the probability of each class, to support multiclass classification in the 
future).
-* In Spark 1.2, `LogisticRegressionModel` did not include an intercept.  
In Spark 1.3, it includes an intercept; however, it will always be 0.0 since it 
uses the default settings for 
[spark.mllib.LogisticRegressionWithLBFGS](api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS).
  The option to use an intercept will be added in the future.
+---
--- End diff --

There is one footnote in `mllib-guide.md` with this PR. I should remove the 
one in `ml-guide.md`.
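
The 1.2 -> 1.3 migration steps quoted in the diff above (SchemaRDD replaced by DataFrame, implicits moved to `sqlContext.implicits._`) can be sketched as follows. This is a minimal, hypothetical illustration against the Spark 1.3-era API, not part of the PR itself; the app name and data are made up:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors

// Spark 1.2 (old):
//   import sqlContext._        // implicitly converted RDD[LabeledPoint] to SchemaRDD
//   val training: SchemaRDD = rdd

// Spark 1.3+ (new):
val conf = new SparkConf().setAppName("MigrationSketch").setMaster("local[2]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._   // the implicits were moved here in 1.3

val rdd = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(0.0, 1.1)),
  LabeledPoint(0.0, Vectors.dense(2.0, 1.0))))
val training: DataFrame = rdd.toDF()  // replaces the old implicit SchemaRDD conversion
```

All Spark ML algorithms that previously accepted a `SchemaRDD` take a `DataFrame` after this change.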





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8498#discussion_r38236878
  
--- Diff: docs/ml-guide.md ---
@@ -21,19 +21,10 @@ title: Spark ML Programming Guide
 \]`
 
 
-Spark 1.2 introduced a new package called `spark.ml`, which aims to 
provide a uniform set of
-high-level APIs that help users create and tune practical machine learning 
pipelines.
-
-*Graduated from Alpha!*  The Pipelines API is no longer an alpha 
component, although many elements of it are still `Experimental` or 
`DeveloperApi`.
-
-Note that we will keep supporting and adding features to `spark.mllib` 
along with the
-development of `spark.ml`.
-Users should be comfortable using `spark.mllib` features and expect more 
features coming.
-Developers should contribute new algorithms to `spark.mllib` and can 
optionally contribute
-to `spark.ml`.
-
-See the [Algorithm Guides section](#algorithm-guides) below for guides on 
sub-packages of `spark.ml`, including feature transformers unique to the 
Pipelines API, ensembles, and more.
-
+The `spark.ml` package aims to provide a uniform set of high-level APIs 
that help users create and
+tune practical machine learning pipelines.
--- End diff --

okay





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8498#issuecomment-135873183
  
Merged build started.





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8498#issuecomment-135873172
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8498#discussion_r38236937
  
--- Diff: docs/mllib-guide.md ---
@@ -56,71 +63,63 @@ This lists functionality included in `spark.mllib`, the 
main MLlib API.
   * [limited-memory BFGS 
(L-BFGS)](mllib-optimization.html#limited-memory-bfgs-l-bfgs)
 * [PMML model export](mllib-pmml-model-export.html)
 
-MLlib is under active development.
-The APIs marked `Experimental`/`DeveloperApi` may change in future 
releases,
-and the migration guide below will explain all changes between releases.
-
 # spark.ml: high-level APIs for ML pipelines
 
-Spark 1.2 introduced a new package called `spark.ml`, which aims to 
provide a uniform set of
-high-level APIs that help users create and tune practical machine learning 
pipelines.
-
-*Graduated from Alpha!*  The Pipelines API is no longer an alpha 
component, although many elements of it are still `Experimental` or 
`DeveloperApi`.
-
-Note that we will keep supporting and adding features to `spark.mllib` 
along with the
-development of `spark.ml`.
-Users should be comfortable using `spark.mllib` features and expect more 
features coming.
-Developers should contribute new algorithms to `spark.mllib` and can 
optionally contribute
-to `spark.ml`.
-
-Guides for `spark.ml` include:
+**[spark.ml programming guide](ml-guide.html)** provides an overview of 
the Pipelines API and major
+concepts. It also contains sections on using algorithms within the 
Pipelines API, for example:
 
-* **[spark.ml programming guide](ml-guide.html)**: overview of the 
Pipelines API and major concepts
-* Guides on using algorithms within the Pipelines API:
-  * [Feature transformers](ml-features.html), including a few not in the 
lower-level `spark.mllib` API
-  * [Decision trees](ml-decision-tree.html)
-  * [Ensembles](ml-ensembles.html)
-  * [Linear methods](ml-linear-methods.html)
+* [Feature extractors and transformers](ml-features.html)
+* [Linear methods](ml-linear-methods.html)
+* [Decision trees](ml-decision-tree.html)
+* [Ensembles](ml-ensembles.html)
+* [Artificial neural network](ml-ann.html)
--- End diff --

Okay. I tried to keep this list as a set of examples rather than a full list; it would 
be hard to keep two full lists in sync.





[GitHub] spark pull request: [SPARK-9671] [MLLIB] re-org user guide and add...

2015-08-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8498#issuecomment-135874304
  
  [Test build #41757 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41757/consoleFull)
 for   PR 8498 at commit 
[`f8efdcc`](https://github.com/apache/spark/commit/f8efdcc6aa630f676fed2e17287f2e9bbbf278ed).

