[ https://issues.apache.org/jira/browse/MADLIB-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan resolved MADLIB-1380.
-------------------------------------
    Resolution: Fixed

https://github.com/apache/madlib/pull/433
https://github.com/apache/madlib/pull/441

> Select number of centroids in k-means
> -------------------------------------
>
>                 Key: MADLIB-1380
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1380
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: k-Means Clustering
>            Reporter: Frank McQuillan
>            Priority: Major
>             Fix For: v1.17
>
>
> {code}
> kmeans_random( rel_source,
>                expr_point,
>                k,                      -- can be a single value like now
>                                        -- or an array of k values
>                fn_dist,                -- optional
>                agg_centroid,           -- optional
>                max_num_iterations,     -- optional
>                min_frac_reassigned,    -- optional
>                k_selection_algorithm   -- optional (only applies if 'k'
>                                        -- parameter is an array with
>                                        -- multiple k values)
>              )
> {code}
> {code}
> kmeanspp( rel_source,
>           expr_point,
>           k,                      -- can be a single value like now
>                                   -- or an array of k values
>           fn_dist,                -- optional
>           agg_centroid,           -- optional
>           max_num_iterations,     -- optional
>           min_frac_reassigned,    -- optional
>           seeding_sample_ratio,   -- optional
>           k_selection_algorithm   -- optional (only applies if 'k'
>                                   -- parameter is an array with
>                                   -- multiple k values)
>         )
> {code}
> {code}
> k
> INTEGER or INTEGER[]. The number of centroids to calculate. Can be a
> single value or an array of k values to explore. If an array of k values
> is given, the parameter 'k_selection_algorithm' determines the
> evaluation method.
> {code}
> {code}
> k_selection_algorithm (optional)
> TEXT, default: 'elbow'. Method to evaluate number of centroids k.
> Only applies if the parameter 'k' is an array with multiple k values.
> Currently two approaches are supported: 'elbow' and 'silhouette'.
> The text can be any subset of the strings; e.g., 'silh' will use the
> silhouette method.
> {code}
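
The note about abbreviated method names can be illustrated outside the database. The following Python sketch is not MADlib code. It assumes that "any subset of the strings" means a leading substring, which matches the 'silh' example but is not confirmed by this issue. The function name resolve_k_selection_algorithm is hypothetical.

```python
# Hypothetical helper, not part of MADlib: resolves an abbreviated
# k_selection_algorithm value under the assumption that abbreviations
# are leading substrings of the full method name.
METHODS = ("elbow", "silhouette")


def resolve_k_selection_algorithm(text="elbow"):
    """Return 'elbow' or 'silhouette' for a full or abbreviated name."""
    key = text.strip().lower()
    if not key:
        raise ValueError("k_selection_algorithm must not be empty")
    matches = [name for name in METHODS if name.startswith(key)]
    if len(matches) != 1:
        raise ValueError(
            f"unknown or ambiguous k_selection_algorithm: {text!r}"
        )
    return matches[0]
```

Under this assumption, resolve_k_selection_algorithm('silh') yields 'silhouette', and calling it with no argument yields the documented default, 'elbow'.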
> e.g., 
> {code}
> SELECT * FROM madlib.kmeanspp (
>     'km_sample',                  -- rel_source
>     'points',                     -- expr_point
>     ARRAY[2, 4, 6, 8, 10],        -- k
>     'madlib.squared_dist_norm2',  -- fn_dist
>     'madlib.avg',                 -- agg_centroid
>     20,                           -- max_num_iterations
>     0.001,                        -- min_frac_reassigned
>     1.0,                          -- seeding_sample_ratio
>     'elbow'                       -- k_selection_algorithm
> );
> {code}
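
For readers unfamiliar with the 'elbow' option, here is a minimal Python sketch of one common way to locate an elbow. It assumes each candidate k already has an objective value (for example, the total squared distance to the nearest centroid) and picks the interior k where the slope of the curve flattens the most. This is an illustration only: the issue does not say how MADlib computes the elbow, and the function name pick_k_elbow is hypothetical.

```python
# Illustrative only: one common elbow heuristic, not necessarily the
# rule MADlib applies. Works for unevenly spaced k values.
def pick_k_elbow(ks, objectives):
    """Return the interior k with the largest change in slope."""
    if len(ks) != len(objectives):
        raise ValueError("ks and objectives must have the same length")
    if len(set(ks)) != len(ks):
        raise ValueError("k values must be distinct")
    if len(ks) < 3:
        raise ValueError("need at least three k values to find an elbow")

    pairs = sorted(zip(ks, objectives))
    best_k, best_bend = None, float("-inf")
    # Slide a window of three consecutive (k, objective) points.
    for (k0, f0), (k1, f1), (k2, f2) in zip(pairs, pairs[1:], pairs[2:]):
        slope_before = (f1 - f0) / (k1 - k0)
        slope_after = (f2 - f1) / (k2 - k1)
        bend = slope_after - slope_before  # large when the curve flattens
        if bend > best_bend:
            best_k, best_bend = k1, bend
    return best_k
```

With the k values from the example above and made-up objective values [100, 40, 30, 25, 22], the sharpest flattening is at k = 4, so the sketch returns 4. The first and last candidates can never be chosen, since no slope change is defined there.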

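The 'silhouette' option can be sketched in a similar way. The Python below computes a simplified silhouette score, which measures distances to centroids rather than to every other point, and then picks the k with the highest score. Whether MADlib uses the simplified or the full silhouette is not stated in this issue. Euclidean distance is assumed, and the function names are hypothetical.

```python
import math


# Illustrative only: simplified (centroid-based) silhouette with
# Euclidean distance. Not taken from MADlib.
def simplified_silhouette(points, centroids):
    """Mean of (b - a) / max(a, b) over all points, where a is the
    distance to the nearest centroid and b to the second nearest."""
    if len(centroids) < 2:
        raise ValueError("silhouette needs at least two centroids")
    if not points:
        raise ValueError("points must not be empty")
    total = 0.0
    for p in points:
        dists = sorted(math.dist(p, c) for c in centroids)
        a, b = dists[0], dists[1]  # a <= b, so max(a, b) == b
        total += 0.0 if b == 0 else (b - a) / b
    return total / len(points)


def pick_k_silhouette(scores_by_k):
    """Return the k whose silhouette score is highest."""
    return max(scores_by_k, key=scores_by_k.get)
```

Scores close to 1 indicate points that sit much nearer to their own centroid than to any other. Scores near 0 indicate points lying between two centroids.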


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
