GitHub user mengxr opened a pull request:

    https://github.com/apache/spark/pull/12274

    [WIP] [SPARK-14500] [ML] Accept Dataset[_] instead of DataFrame in MLlib APIs

    ## What changes were proposed in this pull request?
    
    This PR updates MLlib APIs to accept `Dataset[_]` as input wherever
    `DataFrame` was previously the input type. It does not change the output
    type. In Java, `Dataset[_]` maps to `Dataset<?>`, which includes
    `Dataset<Row>`. Some implementations were changed to return `DataFrame`.
    Tests and examples were updated.
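    
    A minimal sketch of the signature change, using stand-in types rather
    than Spark's own classes. `Row`, `LabeledPoint`, `Dataset`, `fitOld`, and
    `fitNew` below are hypothetical names defined only for this illustration;
    the alias `type DataFrame = Dataset[Row]` mirrors how Spark 2.x defines
    `DataFrame`. It is meant to show why widening the parameter type keeps
    existing `DataFrame` callers working while also admitting typed datasets:
    
    ```scala
    // Stand-in types for illustration only; these are NOT Spark's classes.
    final case class Row(values: Seq[Any])
    final case class LabeledPoint(label: Double, features: Seq[Double])
    
    // Invariant in T, like Spark's Dataset.
    final class Dataset[T](val rows: Seq[T]) {
      def count: Long = rows.size.toLong
    }
    
    object DatasetInputSketch {
      // Mirrors Spark 2.x, where DataFrame is an alias for Dataset[Row].
      type DataFrame = Dataset[Row]
    
      // Old-style signature (hypothetical): only a DataFrame is accepted.
      def fitOld(dataset: DataFrame): Long = dataset.count
    
      // New-style signature (hypothetical): any Dataset is accepted.
      def fitNew(dataset: Dataset[_]): Long = dataset.count
    
      def main(args: Array[String]): Unit = {
        val df: DataFrame = new Dataset(Seq(Row(Seq(1.0, "a"))))
        val points = new Dataset(
          Seq(LabeledPoint(1.0, Seq(0.5)), LabeledPoint(0.0, Seq(1.5))))
    
        println(fitNew(df))     // a DataFrame is still accepted
        println(fitNew(points)) // a typed Dataset is accepted too
        // fitOld(points) would not compile: Dataset[LabeledPoint] is not
        // a Dataset[Row].
      }
    }
    ```
    
    Because the wildcard only widens the input type, callers that already
    pass a `DataFrame` should need no source changes under this sketch.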
    
    TODOs:
    - [ ] update MiMaExcludes
    - [ ] Python
    - [ ] add a new test to accept Dataset[LabeledPoint]
    
    ## How was this patch tested?
    
    Existing unit tests with some modifications.
    
    cc: @rxin @jkbradley 


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mengxr/spark SPARK-14500

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12274.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12274
    
----
commit 7b8fe962c90fec92b0c35f911e490aeb358c8c8a
Author: Xiangrui Meng <m...@databricks.com>
Date:   2016-04-09T01:53:16Z

    accept Dataset[_] instead of DataFrame in MLlib

commit 67fd643a401544e52fc98e87e7552c7b30460ce2
Author: Xiangrui Meng <m...@databricks.com>
Date:   2016-04-09T16:19:54Z

    fix compile

commit 8420014fea9fdaced225c3785908898debb7aff3
Author: Xiangrui Meng <m...@databricks.com>
Date:   2016-04-09T16:54:40Z

    fix tests

commit 82ee0d9c23a403b635b88b58cbd2f3e2cb5a6321
Author: Xiangrui Meng <m...@databricks.com>
Date:   2016-04-09T17:01:09Z

    Merge remote-tracking branch 'apache/master' into SPARK-14500

commit 3f765dd75df8afb444bb433a209cf4237f584b29
Author: Xiangrui Meng <m...@databricks.com>
Date:   2016-04-09T17:27:40Z

    fix examples

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
