Hi Gilles:
  Sorry for my unfamiliar in contribution. I started a new PR for most of 
your suggestion:
https://github.com/apache/commons-math/pull/120
I remain have one question below:


------------------ Original ------------------
From:&nbsp;"Gilles Sadowski"<gillese...@gmail.com&gt;;
Date:&nbsp;Mon, Feb 24, 2020 09:52 PM
To:&nbsp;"dev"<dev@commons.apache.org&gt;;

Subject:&nbsp;Re: [math] Discuss: New feature MiniBatchKMeansClusterer



&gt;Hi.
&gt;
&gt;Le sam. 22 févr. 2020 à 14:37, CT <chentao...@qq.com&gt; a écrit :
&gt;&gt;
&gt;&gt; Hi Gilles:
&gt;&gt;&nbsp;&nbsp; I really appricate for your patient to help me to solve 
the mail sending problem, I try to set the only setting about charset "Use 
&gt;Unicode" for this mail.
&gt;
&gt;The contents seems fine.&nbsp; However, there is something wrong: replying 
to your
&gt;message sends to you instead of the mailing list...
&gt;
&gt;&gt;&nbsp;&nbsp; I have created a pull request: 
https://github.com/apache/commons-math/pull/118
&gt;
&gt;That PR fails the build:
&gt;&nbsp; 
&nbsp;https://travis-ci.org/apache/commons-math/builds/653293451?utm_source=github_status&amp;utm_medium=notification
&gt;
&gt;Also, the code doesn't take into account all the remarks which I
&gt;made in previous comments (e.g. file JIRA reports to allow tracking
&gt;of changes to existing code and not mix those with new code).
&gt;


I did not get what kinds of problem should tracking by JIRA.
In my opinion all the change is related to my PR.&nbsp;
The JIRA issue about Feature MiniBatch has been closed.
Should I fire a new issue?

&gt;In addition, you should avoid mixing massive cosmetic changes (e.g.
&gt;Javadoc reformatting) within a commit than contains other types of
&gt;change:
&gt;&nbsp; &nbsp; 
https://github.com/apache/commons-math/pull/118/commits/47a055e6264d084547854f9290461f020f2131cf
&gt;
&gt;&gt;&nbsp;&nbsp; And a comparsion between KMeans and MiniBatchKMeans by 
using the &gt;"org.apache.commons.math4.userguide.ClusterAlgorithmComparison":
&gt;
&gt;Functionality seems to work fine, but ability to review incremental
&gt;changes is extremely important in a collaborative project.
&gt;


Thanks for your&nbsp;remind, and I created a new PR that improve the problems 
one by one.

&gt;Thanks,
&gt;Gilles

Thank you,
Chentao

&gt;
&gt; Hello.
&gt;
&gt; Le mar. 18 févr. 2020 à 04:49, 陈 涛 <chentao...@hotmail.com&gt; a écrit :
&gt; &gt;
&gt; &gt; Hi Gilles:
&gt; &gt;
&gt; &gt;&nbsp;&nbsp;&nbsp; I really do not know if anyone received my last 
mail, no one replay me for a long time so I send it again and copy to you with 
another email system.
&gt;
&gt; Sorry for the delay. :-}
&gt;
&gt; &gt;
&gt; &gt; &gt; Some remarks:
&gt; &gt;
&gt; &gt; &gt; * I didn't get why the "KMeansPlusPlusCentroidInitializer" class
&gt; &gt; &gt; does not call the existing "KMeansPlusPlusClusterer".
&gt; &gt; &gt; Code seems duplicated: As there is a case for reuse, the 
currently
&gt; &gt; &gt; "private" centroid initialization code should be factored out.
&gt; &gt;
&gt; &gt; This is alpha version for discuss the "MiniBatchKMeansClusterer" 
algorithm,
&gt;
&gt; I guess that you mean that we discuss your implementation of the
&gt; algorithm referenced in the Javadoc.
&gt;
&gt; &gt; and when "centroidOf" is extracted from "KMeansPlusPlusClusterer",
&gt; &gt;
&gt; &gt; the "KMeansPlusPlusClusterer" is not "KMeansPlusPlusClusterer" 
anymore but "KMeansClusterer",
&gt;
&gt; I don't follow.
&gt;
&gt; &gt;
&gt; &gt; this is a significant change, so I did not reactor it.
&gt;
&gt; Significant changes are welcome (since the next release will contain
&gt; other major changes anyways) if they improve the code base (like e.g.
&gt; reducing code duplication).
&gt;
&gt; &gt;
&gt; &gt;
&gt; &gt;
&gt; &gt; &gt; * In "CentroidInitializer", I'd rename "chooseCentroids" to 
emphasize
&gt; &gt; &gt; that a computation is performed (as opposed to selecting from an
&gt; &gt; &gt; existing data structure).
&gt;
&gt; I think I'd prefer "selectCentroids".
&gt;
&gt; &gt;
&gt; &gt; It is extract from "KMeansPlusPlusClusterer.centroidOf", should 
remain be "centroidOf"?
&gt;
&gt; I don't understand.
&gt;
&gt; It would be easier if you create a pull request, so that we can clearly see
&gt; what codes are added/removed/changed.
&gt;
&gt; &gt;
&gt; &gt; The subclass "RandomCentroidInitializer" and 
"KMeansPlusPlusCentroidInitializer" indicate the algorithm used.
&gt; &gt;
&gt; &gt;
&gt; &gt; &gt; * Not convinced that there should be so many constructors (in 
most
&gt; &gt; &gt; cases, it makes no sense to pick default values that are likely 
to
&gt; &gt; &gt; be heavily application-dependent.
&gt; &gt;
&gt; &gt; I can add more constructors.
&gt;
&gt; No, the constructors with default values clutter the API, for
&gt; no obvious gain IMHO.
&gt; [If the default values make sense, they must be documented.]
&gt;
&gt; &gt;
&gt; &gt; I'd like a builder class more than constructors, but does not meet 
the historical code style.
&gt;
&gt; Now is the time for improving the API.
&gt; It would be quite helpful to create a report on JIRA with "sub-tasks"
&gt; for all such API proposed changes.
&gt;
&gt; &gt; &gt; * Input should generally be validated: e.g. the maximum number of
&gt; &gt; &gt; iterations should not be changed unwittingly; rather, an 
exception
&gt; &gt; &gt; should be raised if the user passed a negative value.
&gt; &gt;
&gt; &gt; Thanks for your advices, I will improve these.
&gt; &gt;
&gt; &gt; &gt; Could be nice to illustrate (not just with a picture, but in a 
table
&gt; &gt; &gt; with entries average over several runs) the differences in result
&gt; &gt; &gt; between the implementations, using various setups (number of
&gt; &gt; &gt; clusters, stopping criterion, etc.).
&gt; &gt;
&gt; &gt; I will make more tests, include benchmarks.
&gt; &gt;
&gt; &gt; It is a challenge for me to generate the various kinds of test data,
&gt; &gt;
&gt; &gt; could anybody supply me the test data of this comparsion: 
http://commons.apache.org/proper/commons-math/userguide/ml.html
&gt;
&gt; They are generated programmatically; code is here:
&gt;&nbsp;&nbsp;&nbsp;&nbsp; 
https://gitbox.apache.org/repos/asf?p=commons-math.git;a=tree;f=src/userguide/java/org/apache/commons/math4/userguide
&gt;
&gt; [I've just updated the codes so that they compiles and run using
&gt; the new dependencies (see the "README" file).]
&gt;
&gt; &gt; &gt; "MT_64" is probably not the best default.&nbsp; And this is one 
of the
&gt; &gt; &gt; parameters from which there should not be a default IMO.
&gt; &gt;
&gt; &gt; I will do more tests
&gt;
&gt; You don't need to test the generators; users should choose
&gt; by themselves (from those provided in "Commons RNG").
&gt;
&gt; &gt;
&gt; &gt; &gt; [Note: there are spurious characters in your message (see e.g. 
the
&gt; &gt; &gt; paragraph quoted just above) that make it difficult to read.]
&gt; &gt;
&gt; &gt; I had well format my mail in my mail box, it may been changed by the 
Mail List service.
&gt; &gt;
&gt; &gt; I will try various kinds of mail editor. It will helpful if you told 
me which mail editor is work well with the ML.
&gt;
&gt; It's probably an encoding thing (setting it to "UTF-8" should be
&gt; fine).
&gt;
&gt; Best regards,
&gt; Gilles
&gt;
&gt; &gt; &gt; [...]
&gt;

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to