Hi,
>Le ven. 20 mars 2020 à 04:47, [email protected] <[email protected]> a écrit :
>>
>> Hi,
>>
>> >Hello.
>> >
>> >Le mer. 18 mars 2020 à 17:57, Gilles Sadowski <[email protected]> a
>> >écrit :
>> >>
>> >> Hi.
>> >>
>> >> 2020-03-18 15:10 UTC+01:00, [email protected] <[email protected]>:
>> >> > Hi,
>> >> > I have created a PR to show my aim:
>> >> > https://github.com/apache/commons-math/pull/126/files
>> >>
>> >> Am I correct that the implementations of "ClustersPointExtractor"
>> >> modify the argument of the "extract" method?
>> >> If so, that seems quite unsafe. I would not expect this behaviour
>> >> in a public API.
>> >
>> >To be clearer (hopefully): I'd contrast this (incomplete/non-compilable
>> >code):
>> >
>> >public class Extractor {
>> > T extract(List<List<T>> list);
>> >}
>>
>> I have read the exists code again, and I found "EmptyClusterStrategy"
>> and related logic only used in K-Means.
>> And the "Extractor" remove a point from exists Clusters is indeed unsuitable
>> to be a public API(as you mentioned before.)
>> I think another choice is simple make "EmptyClusterStrategy" and related
>> logic protected,
>> then it can be resuse by MiniBatchKMeans.
>>
>> Also after the "CentroidInitializer" has been move out, the
>> "KMeansPlusPlusClusterer"
>> should rename to "KMeansClusterer" with a construct parameter
>>"CentroidInitializer"
>
>If you think that it makes sense to define a class hierarchy for
>"KMeans"-based algorithms, please do so. But note that "protected"
>is only slightly better than "public" (from a maintenance POV, it isn't
>better, since we'll be required to maintain compatibility all the same).
>Perhaps the base classe(s) can be package-private...
OK, package-private on "EmptyClusterStrategy" and related logic is fine.
>
>> >
>> >with something along those lines:
>> >
>> >public class ClusterSet<T> {
>> > private Set<T> data;
>> > private List<List<T>> clusters;
>> >
>> > void remove(T e) {
>> > return data.remove(e);
>> > }
>> >
>> > public interface Visitor {
>> > T visit(List<List<T>> list);
>> > }
>> >}
>> >
>> >Key point is separation of concern (selection of element vs
>> >removal of element).
>>
>> I propose the ClusterSet should has more member like "distanceMeasure"
>> to store origin clustering parameters,
>> or to accelerate or support cluster finding or point finding.
>> ```java
>> public class ClusterSet<T> {
>> private List<Cluster> clusters;
>> private DistanceMeasure measure;
>> ...
>> public List<Cluster> closestTopN(Clusterable point);
>
>How about providing a comparator factory?
>
>public <T extends Clusterable> static
> Comparator<Cluster<T>> getComparator(final Clusterable p, final
>DistanceMeasure m) {
> return new Comparator<>() {
> public int compare(Cluster<T> c1, Cluster<T> c2) {
> final double d1 = c1.distance(p, m); // "distance" method
>to be added to the interface (?).
> final double d2 = c2.distance(p, m);
> return d1 < d2 ? -1 : d1 > d2 ? 1 : 0;
> }
> }
>}
How do you think about the "distanceMeasure" to be a memeber of "ClusterSet",
In my opinion "ClusterSet" should contains all the parameters and result for
clustering and clusters.
>
>Then, if also providing an accessor to the clusters:
>
>public List<Cluster<T>> getClusters() {
> return Collections.umodifiableList(clusters);
>}
>
>users can do whatever they want, a.o. easily implement the
>ranking method which you suggest:
>
>/** @return clusters in descending order of proximity. */
>public List<Cluster> closest(Clusterable p, ClusterSet<T> s,
>DistanceMeasure m) {
> final List<Cluster<T>> cList = new ArrayList<>(s.getClusters());
> Collections.sort(cList, set.getComparator(p, m));
> return cList;
>}
>
>>
>> public int pointSize();
>> }
>> ```
>
>A priori, I'd favour more encapsulation, in order to ensure that
>multi-thread access can be controlled, e.g. hide the "ClusterSet"
>data-structures, and provide access through identifiers that are
>atomically created when clusters are added to the instance, and
>so on...
>
>It would be great if you can come up with a concrete (and fully
>tested) implementation of "ClusterSet". ;-)
I would like to try, but it is a big change to API, we need more time and
discussions.
>
>Best,
>Gilles
>
>> >
>> >Of course the devil is in the details (maintain consistency among
>> >the fields of "ClusterSet", ensure that "Visitor" implementations
>> >cannot modify the internal state, ...).
>> >
>> >> Unless I missed some point, I'd ask again that the API be reviewed
>> >> *before* implementing several features (such as those "extractors")
>> >> on top of something that does not look right.
>> >
>> >One perspective might be to try and come with a design for use in
>> >multi-threaded applications (see e.g. package "o.a.c.m.ml.neuralnet").
>> >
>> >Best regards,
>> >Gilles
>> >
>> >> >> [...]
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [email protected]
>For additional commands, e-mail: [email protected]
>
>