[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

ludatabricks Mon, 16 Apr 2018 11:00:59 -0700

GitHub user ludatabricks opened a pull request:

    https://github.com/apache/spark/pull/21081


    [SPARK-23975][ML]Allow Clustering to take Arrays of Double as input features

    ## What changes were proposed in this pull request?
    
    - Accept multiple possible input types in validateAndTransformSchema() 
and computeCost() when checking the column type
    
    - Add an if statement in transform() to support an array type as featuresCol
    
    - Add a case statement in fit() when selecting columns from the dataset
    
    These changes will be applied to KMeans first, then to other clustering 
methods
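
For illustration only, here is a minimal sketch of the idea in plain Python. It is not code from this patch and does not use Spark's API. The type tags, `validate_features_type`, `select_features`, and `predict` are all hypothetical names that model the schema check, the column selection in fit(), and the per-row assignment in transform().

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical type tags standing in for Spark SQL column types.
VECTOR = "vector"
ARRAY_DOUBLE = "array<double>"
ACCEPTED_FEATURE_TYPES = (VECTOR, ARRAY_DOUBLE)

Vector = Tuple[float, ...]


def validate_features_type(schema: Dict[str, str], features_col: str) -> None:
    """Schema check: accept any listed type rather than only a vector."""
    actual = schema[features_col]
    if actual not in ACCEPTED_FEATURE_TYPES:
        raise TypeError(
            f"Column {features_col!r} must be one of {ACCEPTED_FEATURE_TYPES}, "
            f"but was {actual!r}"
        )


def select_features(
    rows: List[dict], schema: Dict[str, str], features_col: str
) -> List[Vector]:
    """Column selection as in fit(): normalize every row to one vector form."""
    validate_features_type(schema, features_col)
    # Both accepted types hold a sequence of numbers, so one conversion
    # covers them here; a real implementation would branch on the column type.
    return [tuple(float(x) for x in row[features_col]) for row in rows]


def predict(features: Sequence[float], centers: List[Vector]) -> int:
    """Return the index of the nearest center by squared Euclidean distance."""
    def sq_dist(center: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(features, center))

    return min(range(len(centers)), key=lambda i: sq_dist(centers[i]))
```

With this sketch, a column tagged `array<double>` passes validation and is converted to the same tuple form as a vector column, while any other column type is rejected before any clustering work begins.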
    
    ## How was this patch tested?
    
    A unit test is added.
    
    Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ludatabricks/spark-1 SPARK-23975

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21081.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21081
    
----
commit ed890d35ff1e9edbe2a557f68732835b3e911906
Author: Lu WANG <lu.wang@...>
Date:   2018-04-16T17:32:02Z

    add Array input support for KMeans

commit badb0cc5ca6ca69bb8e8fc0fce5ea05a4100bca0
Author: Lu WANG <lu.wang@...>
Date:   2018-04-16T17:49:00Z

    remove redundant code

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
