[jira] [Updated] (MATH-1371) Provide accelerated kmeans++ implementation
     [ https://issues.apache.org/jira/browse/MATH-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilles Sadowski updated MATH-1371:
----------------------------------
    Fix Version/s:     (was: 4.0)
                   4.X

> Provide accelerated kmeans++ implementation
> -------------------------------------------
>
>                 Key: MATH-1371
>                 URL: https://issues.apache.org/jira/browse/MATH-1371
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Artem Barger
>            Assignee: Artem Barger
>            Priority: Major
>             Fix For: 4.X
>
>         Attachments: ElkanKmeansPlusPlusClusterer.java,
>                      ElkanKmeansPlusPlusClustererTest.java
>
>
> There is an updated version of the kmeans++ algorithm available, published
> in: Elkan, Charles. "Using the triangle inequality to accelerate k-means."
> ICML. Vol. 3. 2003.
> The main idea is to speed up the kmeans iterations by skipping distance
> computations between centers and points when they are not needed. For
> example, if a cluster center has not moved far from a point after an
> update, the assignment of that point does not change. The accelerated
> algorithm avoids unnecessary distance calculations by applying the triangle
> inequality in two different ways, and by keeping track of lower and upper
> bounds for the distances between points and centers.
> The algorithm description is available in the paper.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
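The two ideas in the description above (pruning with the triangle inequality, and keeping distance bounds) can be illustrated with a small sketch. This is not the attached ElkanKmeansPlusPlusClusterer.java; the class and method names below are hypothetical and chosen only for illustration. It assumes Euclidean distance and uses two facts from Elkan's paper: if d(c, c') >= 2 d(x, c) then d(x, c') >= d(x, c), so c' cannot be closer to x than c; and after a center moves by some distance s, an upper bound on the distance to it grows by at most s while a lower bound shrinks by at most s.

```java
final class TriangleInequalitySketch {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Pairwise center-to-center distances, computed once per iteration. */
    static double[][] centerDistances(double[][] centers) {
        int k = centers.length;
        double[][] cc = new double[k][k];
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                cc[i][j] = cc[j][i] = distance(centers[i], centers[j]);
            }
        }
        return cc;
    }

    /**
     * Returns the index of the center nearest to the point.
     * skipped[0] is incremented for every point-to-center distance
     * that the triangle inequality allowed us to skip.
     */
    static int nearest(double[] point, double[][] centers, double[][] cc, int[] skipped) {
        int best = 0;
        double bestDist = distance(point, centers[0]);
        for (int j = 1; j < centers.length; j++) {
            // If d(best, j) >= 2 * d(x, best), then d(x, j) >= d(x, best):
            // center j cannot win, so its distance is never computed.
            if (cc[best][j] >= 2 * bestDist) {
                skipped[0]++;
                continue;
            }
            double d = distance(point, centers[j]);
            if (d < bestDist) {
                bestDist = d;
                best = j;
            }
        }
        return best;
    }

    /**
     * Loosens the bounds after the centers have moved, instead of
     * recomputing distances. shift[j] is how far center j moved.
     * upper[i] bounds the distance from point i to its assigned center;
     * lower[i][j] bounds the distance from point i to center j from below.
     */
    static void relaxBounds(double[] upper, double[][] lower, int[] assignment, double[] shift) {
        for (int i = 0; i < upper.length; i++) {
            upper[i] += shift[assignment[i]];
            for (int j = 0; j < shift.length; j++) {
                lower[i][j] = Math.max(0, lower[i][j] - shift[j]);
            }
        }
    }
}
```

In a full implementation the relaxed bounds would be used to decide which points need any distance recomputed at all; the sketch only shows the pruning test and the bound update in isolation.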
[jira] [Updated] (MATH-1371) Provide accelerated kmeans++ implementation
     [ https://issues.apache.org/jira/browse/MATH-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Tompkins updated MATH-1371:
-------------------------------
    Fix Version/s: 4.0

> Provide accelerated kmeans++ implementation
> -------------------------------------------
>
>                 Key: MATH-1371
>                 URL: https://issues.apache.org/jira/browse/MATH-1371
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Artem Barger
>            Assignee: Artem Barger
>             Fix For: 4.0
>
>         Attachments: ElkanKmeansPlusPlusClusterer.java,
>                      ElkanKmeansPlusPlusClustererTest.java
[jira] [Updated] (MATH-1371) Provide accelerated kmeans++ implementation
     [ https://issues.apache.org/jira/browse/MATH-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Barger updated MATH-1371:
-------------------------------
    Attachment:     (was: ElkanKmeansPlusPlusClusterer.java)

> Provide accelerated kmeans++ implementation
> -------------------------------------------
>
>                 Key: MATH-1371
>                 URL: https://issues.apache.org/jira/browse/MATH-1371
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Artem Barger
>            Assignee: Artem Barger
>         Attachments: ElkanKmeansPlusPlusClusterer.java,
>                      ElkanKmeansPlusPlusClustererTest.java
[jira] [Updated] (MATH-1371) Provide accelerated kmeans++ implementation
     [ https://issues.apache.org/jira/browse/MATH-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Barger updated MATH-1371:
-------------------------------
    Attachment: ElkanKmeansPlusPlusClustererTest.java
                ElkanKmeansPlusPlusClusterer.java

Updated version of the kmeans implementation; all comments have been addressed as requested. Unit test added.

> Provide accelerated kmeans++ implementation
> -------------------------------------------
>
>                 Key: MATH-1371
>                 URL: https://issues.apache.org/jira/browse/MATH-1371
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Artem Barger
>            Assignee: Artem Barger
>         Attachments: ElkanKmeansPlusPlusClusterer.java,
>                      ElkanKmeansPlusPlusClusterer.java,
>                      ElkanKmeansPlusPlusClustererTest.java
[jira] [Updated] (MATH-1371) Provide accelerated kmeans++ implementation
     [ https://issues.apache.org/jira/browse/MATH-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Barger updated MATH-1371:
-------------------------------
    Attachment: ElkanKmeansPlusPlusClusterer.java

My Java implementation of the algorithm described here: Elkan, Charles. "Using the triangle inequality to accelerate k-means." ICML. Vol. 3. 2003. https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf.

I used it recently in my research project and found that it speeds up the kmeans clustering algorithm provided by CM by an order of magnitude. Also, I am not sure whether the current kmeans++ solution already includes Elkan's implementation.

> Provide accelerated kmeans++ implementation
> -------------------------------------------
>
>                 Key: MATH-1371
>                 URL: https://issues.apache.org/jira/browse/MATH-1371
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Artem Barger
>            Assignee: Artem Barger
>         Attachments: ElkanKmeansPlusPlusClusterer.java