[ https://issues.apache.org/jira/browse/MADLIB-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan resolved MADLIB-1380.
-------------------------------------
    Resolution: Fixed

https://github.com/apache/madlib/pull/433
https://github.com/apache/madlib/pull/441

> Select number of centroids in k-means
> -------------------------------------
>
>                 Key: MADLIB-1380
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1380
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: k-Means Clustering
>            Reporter: Frank McQuillan
>            Priority: Major
>             Fix For: v1.17
>
>
> {code}
> kmeans_random( rel_source,
>                expr_point,
>                k,                      -- can be a single value like now or an array of k values
>                fn_dist,                -- optional
>                agg_centroid,           -- optional
>                max_num_iterations,     -- optional
>                min_frac_reassigned,    -- optional
>                k_selection_algorithm   -- optional (only applies if 'k' parameter is an array with multiple k values)
>              )
> {code}
> {code}
> kmeanspp( rel_source,
>           expr_point,
>           k,                      -- can be a single value like now or an array of k values
>           fn_dist,                -- optional
>           agg_centroid,           -- optional
>           max_num_iterations,     -- optional
>           min_frac_reassigned,    -- optional
>           seeding_sample_ratio,   -- optional
>           k_selection_algorithm   -- optional (only applies if 'k' parameter is an array with multiple k values)
>         )
> {code}
> {code}
> k
> INTEGER or INTEGER[]. The number of centroids to calculate. Can be a single value
> or an array of k values to explore. If an array of k values is given, the parameter
> 'k_selection_algorithm' determines the evaluation method.
> {code}
> {code}
> k_selection_algorithm (optional)
> TEXT, default: 'elbow'. Method to evaluate number of centroids k.
> Only applies if the parameter 'k' is an array with multiple k values.
> Currently two approaches are supported: 'elbow' and 'silhouette'.
> The text can be abbreviated; e.g., 'silh' will use the silhouette method.
> {code}
> e.g.,
> {code}
> SELECT * FROM madlib.kmeanspp (
>     'km_sample',                   -- rel_source
>     'points',                      -- expr_point
>     'ARRAY[2, 4, 6, 8, 10]',       -- k
>     'madlib.squared_dist_norm2',   -- fn_dist
>     'madlib.avg',                  -- agg_centroid
>     20,                            -- max_num_iterations
>     0.001,                         -- min_frac_reassigned
>     1.0,                           -- seeding_sample_ratio
>     'elbow'                        -- k_selection_algorithm
> );
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
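
The issue names the 'elbow' option but does not say how MADlib computes it. As a rough illustration only, here is a minimal Python sketch of one common heuristic: pick the k where the clustering cost curve bends most sharply (largest second difference). The helper name `pick_k_elbow`, the cost values, and the assumption of evenly spaced k values are all hypothetical and are not taken from MADlib's implementation.

```python
def pick_k_elbow(ks, costs):
    """Return the k at the sharpest bend of the cost curve.

    Hypothetical helper, not part of MADlib. Assumes `ks` is sorted and
    evenly spaced, and that `costs[i]` is the clustering cost (e.g. the
    sum of squared distances to the nearest centroid) obtained for `ks[i]`.
    """
    if len(ks) != len(costs):
        raise ValueError("ks and costs must have the same length")
    if len(ks) < 3:
        raise ValueError("need at least three k values to locate a bend")

    best_k, best_bend = None, float("-inf")
    # Only interior points have a neighbour on both sides.
    for i in range(1, len(ks) - 1):
        drop_before = costs[i - 1] - costs[i]   # improvement on reaching ks[i]
        drop_after = costs[i] - costs[i + 1]    # improvement on going past ks[i]
        bend = drop_before - drop_after         # discrete second difference
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


# Made-up costs for the k values used in the example call above.
ks = [2, 4, 6, 8, 10]
costs = [100.0, 40.0, 30.0, 25.0, 22.0]
chosen_k = pick_k_elbow(ks, costs)  # the sharpest bend is at k = 4
```

With these invented costs the drop from k=2 to k=4 is large and later drops are small, so the sketch settles on k=4. A silhouette-based choice would instead score each k by how well separated the resulting clusters are and take the best score.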