[ https://issues.apache.org/jira/browse/MATH-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073100#comment-15073100 ]
Thomas Neidhart commented on MATH-1171:
---------------------------------------

In commit f0943a724, I have added an example to the user guide showing how to cluster images with the current API.

I did some first experiments to improve the API with a Dataset interface that provides access to all elements to cluster without the need to create explicit Clusterable instances.

To make image clustering efficient, more refactoring would be required to avoid unneeded allocations of double arrays (an image is usually one large array of its pixels / samples). The distance API currently only works with whole arrays, without offset / length arguments, so a separate array must be created for each pixel, which is more or less the same as creating a Clusterable.

Changing the API to support distance calculations on arrays with offset / length parameters would make it possible to create a Dataset that operates directly on the image data without creating intermediate objects. This might be beneficial for other use cases as well.

> clustering implementations have unnecessary overhead
> ----------------------------------------------------
>
>                 Key: MATH-1171
>                 URL: https://issues.apache.org/jira/browse/MATH-1171
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Mark
>
> I want to apply clustering algorithms like KMeansPlusPlusClusterer to
> pictures. And creating a point instance for each pixel is not a good idea.
> Therefore the interface should not be based on Collections, but on some
> interface that provides sort of "get(index)" accessors to data that is
> potentially stored in a pixel array etc.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
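
For illustration only, the offset / length idea from the comment could look roughly like the Java sketch below. It is not the code from commit f0943a724 and not part of the Commons Math API: the names Dataset, PixelDataset, euclidean and nearestCenter are hypothetical, and the sketch assumes the image is stored as one interleaved double array (e.g. R,G,B,R,G,B,...).

```java
/**
 * Hypothetical sketch: distance computation on array slices, so that a
 * dataset can be clustered without allocating one object per pixel.
 * None of these types exist in Commons Math.
 */
final class OffsetDistanceSketch {

    /** Assumed index-based view of the points to cluster. */
    interface Dataset {
        int size();             // number of points
        int dimension();        // components per point
        double[] data();        // shared backing array, never copied
        int offset(int index);  // start of point 'index' inside data()
    }

    /** Dataset over interleaved samples, e.g. R,G,B,R,G,B,... */
    static final class PixelDataset implements Dataset {
        private final double[] samples;
        private final int channels;

        PixelDataset(double[] samples, int channels) {
            if (channels <= 0 || samples.length % channels != 0) {
                throw new IllegalArgumentException("samples must hold whole pixels");
            }
            this.samples = samples;
            this.channels = channels;
        }

        @Override public int size() { return samples.length / channels; }
        @Override public int dimension() { return channels; }
        @Override public double[] data() { return samples; }
        @Override public int offset(int index) { return index * channels; }
    }

    /** Euclidean distance between a[aOff, aOff+len) and b[bOff, bOff+len). */
    static double euclidean(double[] a, int aOff, double[] b, int bOff, int len) {
        double sum = 0.0;
        for (int i = 0; i < len; i++) {
            double d = a[aOff + i] - b[bOff + i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Index of the center closest to point 'index'; centers are packed like the data. */
    static int nearestCenter(Dataset ds, int index, double[] centers) {
        int dim = ds.dimension();
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length / dim; c++) {
            // Both operands are slices; no temporary double[] is created per pixel.
            double d = euclidean(ds.data(), ds.offset(index), centers, c * dim, dim);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] rgb = {0, 0, 0, 3, 4, 0, 1, 1, 1};   // three RGB pixels
        double[] centers = {0, 0, 0, 3, 4, 0};        // two packed centers
        Dataset ds = new PixelDataset(rgb, 3);
        for (int i = 0; i < ds.size(); i++) {
            System.out.println("pixel " + i + " -> center " + nearestCenter(ds, i, centers));
        }
    }
}
```

The assignment loop only reads from the shared arrays, so the per-pixel cost is the distance computation itself. Whether an interface of this shape fits the existing clusterers is left open here.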