Hi, >Hello. > >Le mer. 18 mars 2020 à 17:57, Gilles Sadowski <gillese...@gmail.com> a écrit : >> >> Hi. >> >> 2020-03-18 15:10 UTC+01:00, chentao...@qq.com <chentao...@qq.com>: >> > Hi, >> > I have created a PR to show my aim: >> > https://github.com/apache/commons-math/pull/126/files >> >> Am I correct that the implementations of "ClustersPointExtractor" >> modify the argument of the "extract" method? >> If so, that seems quite unsafe. I would not expect this behaviour >> in a public API. > >To be clearer (hopefully): I'd contrast this (incomplete/non-compilable >code): > >public class Extractor { > T extract(List<List<T>> list); >}
I have read the exists code again, and I found "EmptyClusterStrategy" and related logic only used in K-Means. And the "Extractor" remove a point from exists Clusters is indeed unsuitable to be a public API(as you mentioned before.) I think another choice is simple make "EmptyClusterStrategy" and related logic protected, then it can be resuse by MiniBatchKMeans. Also after the "CentroidInitializer" has been move out, the "KMeansPlusPlusClusterer" should rename to "KMeansClusterer" with a construct parameter "CentroidInitializer" > >with something along those lines: > >public class ClusterSet<T> { > private Set<T> data; > private List<List<T>> clusters; > > void remove(T e) { > return data.remove(e); > } > > public interface Visitor { > T visit(List<List<T>> list); > } >} > >Key point is separation of concern (selection of element vs >removal of element). I propose the ClusterSet should has more member like "distanceMeasure" to store origin clustering parameters, or to accelerate or support cluster finding or point finding. ```java public class ClusterSet<T> { private List<Cluster> clusters; private DistanceMeasure measure; ... public List<Cluster> closestTopN(Clusterable point); public int pointSize(); } ``` > >Of course the devil is in the details (maintain consistency among >the fields of "ClusterSet", ensure that "Visitor" implementations >cannot modify the internal state, ...). > >> Unless I missed some point, I'd ask again that the API be reviewed >> *before* implementing several features (such as those "extractors") >> on top of something that does not look right. > >One perspective might be to try and come with a design for use in >multi-threaded applications (see e.g. package "o.a.c.m.ml.neuralnet"). > >Best regards, >Gilles > >> >> [...] > >--------------------------------------------------------------------- >To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >For additional commands, e-mail: dev-h...@commons.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org