Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8498#discussion_r38236867

--- Diff: docs/ml-guide.md ---
@@ -868,34 +859,4 @@ jsc.stop();
 </div>
 
-# Dependencies
-
-Spark ML currently depends on MLlib and has the same dependencies.
-Please see the [MLlib Dependencies guide](mllib-guide.html#dependencies) for more info.
-
-Spark ML also depends upon Spark SQL, but the relevant parts of Spark SQL do not bring additional dependencies.
-
-# Migration Guide
-
-## From 1.3 to 1.4
-
-Several major API changes occurred, including:
-* `Param` and other APIs for specifying parameters
-* `uid` unique IDs for Pipeline components
-* Reorganization of certain classes
-Since the `spark.ml` API was an Alpha Component in Spark 1.3, we do not list all changes here.
-
-However, now that `spark.ml` is no longer an Alpha Component, we will provide details on any API changes for future releases.
-
-## From 1.2 to 1.3
-
-The main API changes are from Spark SQL. We list the most important changes here:
-
-* The old [SchemaRDD](http://spark.apache.org/docs/1.2.1/api/scala/index.html#org.apache.spark.sql.SchemaRDD) has been replaced with [DataFrame](api/scala/index.html#org.apache.spark.sql.DataFrame) with a somewhat modified API. All algorithms in Spark ML which used to use SchemaRDD now use DataFrame.
-* In Spark 1.2, we used implicit conversions from `RDD`s of `LabeledPoint` into `SchemaRDD`s by calling `import sqlContext._` where `sqlContext` was an instance of `SQLContext`. These implicits have been moved, so we now call `import sqlContext.implicits._`.
-* Java APIs for SQL have also changed accordingly. Please see the examples above and the [Spark SQL Programming Guide](sql-programming-guide.html) for details.
-
-Other changes were in `LogisticRegression`:
-
-* The `scoreCol` output column (with default value "score") was renamed to be `probabilityCol` (with default value "probability"). The type was originally `Double` (for the probability of class 1.0), but it is now `Vector` (for the probability of each class, to support multiclass classification in the future).
-* In Spark 1.2, `LogisticRegressionModel` did not include an intercept. In Spark 1.3, it includes an intercept; however, it will always be 0.0 since it uses the default settings for [spark.mllib.LogisticRegressionWithLBFGS](api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS). The option to use an intercept will be added in the future.
+---

--- End diff --

There is one footnote in `mllib-guide.md` with this PR. I should remove the one in `ml-guide.md`.
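For readers following the removed 1.2-to-1.3 note above: the key code-level change it describes is that the RDD-to-DataFrame implicit conversions moved under `sqlContext.implicits`. A minimal sketch of what that looks like in a Spark 1.3+ Scala application follows; the app name, master setting, and sample data are illustrative assumptions, not part of this PR.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

object ImplicitsSketch {
  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a self-contained example.
    val sc = new SparkContext(
      new SparkConf().setAppName("implicits-sketch").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)

    // Spark 1.2 used `import sqlContext._` to convert RDD[LabeledPoint]
    // into a SchemaRDD; from Spark 1.3 on, the implicits live here instead:
    import sqlContext.implicits._

    // Toy data, just to exercise the conversion.
    val labeled = sc.parallelize(Seq(
      LabeledPoint(1.0, Vectors.dense(0.5, 1.2)),
      LabeledPoint(0.0, Vectors.dense(1.5, 0.2))
    ))

    val df = labeled.toDF() // RDD[LabeledPoint] -> DataFrame via the implicits
    df.show()

    sc.stop()
  }
}
```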