[ 
https://issues.apache.org/jira/browse/MATH-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073100#comment-15073100
 ] 

Thomas Neidhart commented on MATH-1171:
---------------------------------------

In commit f0943a724, I have added an example to the user guide that shows how 
to cluster images with the current API.

I also ran some initial experiments on improving the API with a Dataset 
interface that provides access to all elements to be clustered without the 
need to create explicit Clusterable instances.
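
As a rough illustration of the idea (a hypothetical sketch, not the 
experimental code mentioned above): the names Dataset, size(), dimension(), 
get(index, component) and InterleavedImageDataset are assumptions made for 
this example and are not part of Commons Math. The sketch views a flat, 
interleaved sample array as a set of points without allocating one object per 
pixel.

```java
/** Hypothetical index-based view of the points to cluster (not a Commons Math type). */
interface Dataset {
    /** Number of points in the dataset. */
    int size();

    /** Number of components per point. */
    int dimension();

    /** Returns one component of one point, without allocating anything. */
    double get(int index, int component);
}

/** Hypothetical Dataset over an interleaved sample array, e.g. RGBRGB... */
final class InterleavedImageDataset implements Dataset {
    private final double[] samples;
    private final int channels;

    InterleavedImageDataset(double[] samples, int channels) {
        if (channels <= 0 || samples.length % channels != 0) {
            throw new IllegalArgumentException(
                "samples length must be a multiple of the channel count");
        }
        // The array is wrapped, not copied.
        this.samples = samples;
        this.channels = channels;
    }

    @Override
    public int size() {
        return samples.length / channels;
    }

    @Override
    public int dimension() {
        return channels;
    }

    @Override
    public double get(int index, int component) {
        // Pixel `index` starts at offset index * channels in the flat array.
        return samples[index * channels + component];
    }
}
```

A clusterer written against such an interface could iterate from 0 to 
size() - 1 instead of walking a Collection of Clusterable objects.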

Making image clustering efficient would require some more refactoring to 
avoid unneeded allocations of double arrays (an image is usually one large 
array of its pixels / samples). The distance API currently only works with 
whole arrays and takes no offset / length arguments, so a separate array must 
be created for each pixel, which is more or less the same as creating a 
Clusterable.

Changing the API to support distance calculations on arrays with offset / 
length parameters would make it possible to create a Dataset that operates 
directly on the image data without creating intermediate objects. This might 
be beneficial for other use cases as well.
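
A minimal sketch of what such an offset-aware distance could look like, 
assuming a Euclidean measure. The class OffsetEuclideanDistance and its 
compute signature are hypothetical and only illustrate the proposal; they do 
not exist in the current distance API.

```java
/** Hypothetical offset-aware Euclidean distance (not part of Commons Math). */
final class OffsetEuclideanDistance {
    private OffsetEuclideanDistance() {
    }

    /**
     * Distance between a[aOffset .. aOffset + length) and
     * b[bOffset .. bOffset + length), computed in place without copying
     * either slice into a temporary array.
     */
    static double compute(double[] a, int aOffset,
                          double[] b, int bOffset,
                          int length) {
        double sum = 0.0;
        for (int i = 0; i < length; i++) {
            double diff = a[aOffset + i] - b[bOffset + i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

With an interleaved three-channel image stored in one array, the distance 
between pixel i and pixel j would then be 
compute(image, 3 * i, image, 3 * j, 3), with no per-pixel allocation.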

> clustering implementations have unnecessary overhead
> ----------------------------------------------------
>
>                 Key: MATH-1171
>                 URL: https://issues.apache.org/jira/browse/MATH-1171
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Mark
>
> I want to apply clustering algorithms like KMeansPlusPlusClusterer to 
> pictures. And creating a point instance for each pixel is not a good idea.
> Therefore the interface should not be based on Collections, but on some 
> interface that provides sort of "get(index)" accessors to data that is 
> potentially stored in a pixel array etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
