Re: [MATH][GENETICS] Review of PR #197

Avijit Basak Sun, 07 Nov 2021 01:34:52 -0800

Hi All

         Please find my comments below:



The convention for the log summary (first line of the log message) is to
put the issue number in front; thus, instead of
---CUT---
Developed the new genetic algorithm module following the JIRA MATH-1563.
---CUT---
it should be something like
---CUT---
MATH-1563: Introducing new genetic algorithm module.
---CUT---

--Will change this.

> >
> > (B)
> > I'm confused by your defining "legacy" packages in new modules...
> > What kind of comparisons are you considering?
> > It is fine to depend (with scope "test") on CM v3.6.1 to perhaps (?)
> > regression testing; but please note that when your proposal is
> > merged, it will imply that the "legacy" codes *must* be removed.
> > [We don't want to keep duplicate functionality.]
> > -- The new implementation has improved the quality of optimization over
> the
> > legacy model.
>
> "Improved" in what sense?
> If you mean enhanced performance, such checks should be done
> using JMH (producing data to be published on the web site).
>
> --Along with performance and memory utilization, stochastic algorithms
have
> another comparison parameter "quality of result". In stochastic
algorithms,
> global optimum is not guaranteed, We have to compare the quality of the
> result along with performance and memory consumption to compare two
> algorithm implementations. I have kept the legacy example just for
> comparison between the new and the legacy implementations.

Great that you take care of checking improvements on the quality
measures.  Just make sure that the new code do not depend on
anything in module "commons-math-legacy".  As noted earlier, a
dependency on a previous release of CM with
  <scope>test</scope>
is fine.

--test scope can only be used for test packages. However, this is kept in
src of example module. Changing previous release's dependency scope to test
is showing compilation error in src package. I am not sure how to manage
this.

>
> Also (if no objections are made):
> 1. the declaration should go in the main POM (in the
<dependencyManagement>
>     section).
> 2. the library (CM) must not depend on a specific logger implementation.
> --Do you mean I should also remove the simple implementation
"slf4j-simple"
> in commons-math-ga.

Yes.
Specific implementations a user's choice that could be different in
each of the "examples" modules.

--Done.

> [...]
> >
> > * Class "RandomGenerator"
> > 1. Duplicates functionality (storage of thread-local instances)
> > readily available in "Commons RNG".
> > 2. (IMHO) Thread-local instances should not be used for "heavy" usage
> > (like in GA).
> > --Modified
>
> You still use "ThreadLocalRandomSource".
> I think that if some functionality requires a RNG instance it should be
> passed to its constructor (or created by it).
>
> I noticed that the interface "ChromosomeRepresentationUtils"
> is actually a utility class (as the name indicates): It only contains
> "static" functions.  This will probably require refactoring (utility
> classes are frowned upon nowadays).
> Some functionality in that class (e.g. sampling from a range of
> integers) exists in the "sampling" module of "Commons RNG".
>
> --Will look into this.

I understand that "hiding" the access to the RNG functionality makes
it for a seemingly nicer API, but I'm not sure that we should do it given
that the "random" aspect is a fundamental, and explicit, part of the GA
algorithm.
How about introducing "factories" to create the objects that use a RNG,
giving them their "own" RNG instance.  For example:
---CUT---
public interface MutationPolicy<P> {
    Chromosome<P> mutate(Chromosome<P> original, double mutationRate);

    interface Factory<P> {
        /**
         * Creates an instance with a dedicated source of randomness.
         *
         * @param rng RNG algorithm.
         * @param seed Seed.
         * @return an instance that must <em>not</em> be shared among
threads.
         */
        MutationPolicy<P> create(RandomSource rng, Object seed);

        default MutationPolicy<P> create(RandomSource rng) {
            return create(rng, null);
        }
    }
}
---CUT---
?

--[IMHO] We can give users the option of choosing RandomSource. However,
choosing the RandomSource in all places separately seems to be little
overwork for the users. Separate RandomSource for each place may be
redundant. I also believe the system should provide a default option for
RandomSource instead of completely depending on the choice of users. What
if we provide the feature of configuring RandomSource in our
RandomNumberGenerator(earlier RandomGenerator) class like this. Please
share your thoughts regarding this.

--CUT--
    /** The default RandomSource for random number generation. **/
    private static RandomSource randomSource =
RandomSource.XO_RO_SHI_RO_128_PP;

    /**
     * Sets the random source for this random generator.
     * @param randomSource
     */
    public static void configure(RandomSource randomSource) {
        RandomNumberGenerator.randomSource = randomSource;
    }

    /**
     * constructs the singleton instance.
     */
    private RandomNumberGenerator() {
    }

    /**
     * Returns the (static) random generator.
     * @return the static random generator shared by GA implementation
classes
     */
    public static UniformRandomProvider getRandomGenerator() {
        return
ThreadLocalRandomSource.current(RandomNumberGenerator.randomSource);
    }
--CUT--

> >
> > * Class "ValidationUtils"
> > -> should not be public (or should be defined in an "internal" package).
> > --Changed
>
> The class actually provides no added value.
>
> --I was thinking of having a validation utility which can be reused
> everywhere. Otherwise I have to duplicate the code in all places. Do you
> think that is a good way of doing this?

Reuse is good, sure; but in this case, the very small gain (one line)
is not worth the loss of clarity (method call vs direct conditional test).

--Made changes

>> [...]
> >
> > (E) Unit tests
> > * src/test
> > 1. New tests should use Junit 5.
> > 2. "Example usages" probably belong in the "examples-ga" module.
> > --JUnit version is inherited from commons-math module.
>
> Not sure what you mean.
> It's possible to use Junit5 in new modules/test classes even if
> other classes use older Junit.
> --Do you think it is fine to have two separate versions of JUnit library
in
> CM. [IMHO] we should keep only one version only.

In the mid-term, yes.  But the point is that we must start somewhere.
And as you are creating a new tests suite, it's worth using the up-to-date
framework.  [This will also reduce the burden on the people who'll take
on updating all the other tests.]

--Done

Thanks & Regards
--Avijit Basak

On Mon, 1 Nov 2021 at 21:09, Gilles Sadowski <gillese...@gmail.com> wrote:

> Hello.
>
> Le lun. 1 nov. 2021 à 08:56, Avijit Basak <avijit.ba...@gmail.com> a
> écrit :
> >
> > Hi All
> >
> >         Please find my comments below:
> >
> > >
> > > Hi All
> > >
> > >         I have fixed most of the review comments. The changes have been
> > > committed to PR#199.
> > >
> > > (A)
> > > Please "rebase" on "master".
> > > Please "squash" intermediate commits: For a new feature, a single
> commit
> > > should exist (that corresponds to the JIRA report describing it).
> > > --Will be done once all changes are finalized and committed.
> >
> > What is the rationale for not doing it right now?
> > The PR should always be "rebased" on the latest "master".
> >
> > --Done both rebase and squash.
>
> Thanks.
>
> The convention for the log summary (first line of the log message) is to
> put the issue number in front; thus, instead of
> ---CUT---
> Developed the new genetic algorithm module following the JIRA MATH-1563.
> ---CUT---
> it should be something like
> ---CUT---
> MATH-1563: Introducing new genetic algorithm module.
> ---CUT---
>
> > >
> > > (B)
> > > I'm confused by your defining "legacy" packages in new modules...
> > > What kind of comparisons are you considering?
> > > It is fine to depend (with scope "test") on CM v3.6.1 to perhaps (?)
> > > regression testing; but please note that when your proposal is
> > > merged, it will imply that the "legacy" codes *must* be removed.
> > > [We don't want to keep duplicate functionality.]
> > > -- The new implementation has improved the quality of optimization over
> > the
> > > legacy model.
> >
> > "Improved" in what sense?
> > If you mean enhanced performance, such checks should be done
> > using JMH (producing data to be published on the web site).
> >
> > --Along with performance and memory utilization, stochastic algorithms
> have
> > another comparison parameter "quality of result". In stochastic
> algorithms,
> > global optimum is not guaranteed, We have to compare the quality of the
> > result along with performance and memory consumption to compare two
> > algorithm implementations. I have kept the legacy example just for
> > comparison between the new and the legacy implementations.
>
> Great that you take care of checking improvements on the quality
> measures.  Just make sure that the new code do not depend on
> anything in module "commons-math-legacy".  As noted earlier, a
> dependency on a previous release of CM with
>   <scope>test</scope>
> is fine.
>
> > [...]
>
> > >
> > > (D) General design
> > > Class "ConsoleLogger"
> > > -> We should not reinvent the wheel.  We should consider whether
> logging
> > > is necessary, and in the affirmative, depend on the de facto standard:
> > > "slf4j".
> > > -- Introduced the slf4j with simple implementation and removed the
> > > consolelogger class.
> >
> > Could you please post a separate message highlighting the
> > new dependency?
> > --I shall do it.
>
> Thanks.
>
> >
> > Also (if no objections are made):
> > 1. the declaration should go in the main POM (in the
> <dependencyManagement>
> >     section).
> > 2. the library (CM) must not depend on a specific logger implementation.
> > --Do you mean I should also remove the simple implementation
> "slf4j-simple"
> > in commons-math-ga.
>
> Yes.
> Specific implementations a user's choice that could be different in
> each of the "examples" modules.
>
> > [...]
> > >
> > > * Class "RandomGenerator"
> > > 1. Duplicates functionality (storage of thread-local instances)
> > > readily available in "Commons RNG".
> > > 2. (IMHO) Thread-local instances should not be used for "heavy" usage
> > > (like in GA).
> > > --Modified
> >
> > You still use "ThreadLocalRandomSource".
> > I think that if some functionality requires a RNG instance it should be
> > passed to its constructor (or created by it).
> >
> > I noticed that the interface "ChromosomeRepresentationUtils"
> > is actually a utility class (as the name indicates): It only contains
> > "static" functions.  This will probably require refactoring (utility
> > classes are frowned upon nowadays).
> > Some functionality in that class (e.g. sampling from a range of
> > integers) exists in the "sampling" module of "Commons RNG".
> >
> > --Will look into this.
>
> I understand that "hiding" the access to the RNG functionality makes
> it for a seemingly nicer API, but I'm not sure that we should do it given
> that the "random" aspect is a fundamental, and explicit, part of the GA
> algorithm.
> How about introducing "factories" to create the objects that use a RNG,
> giving them their "own" RNG instance.  For example:
> ---CUT---
> public interface MutationPolicy<P> {
>     Chromosome<P> mutate(Chromosome<P> original, double mutationRate);
>
>     interface Factory<P> {
>         /**
>          * Creates an instance with a dedicated source of randomness.
>          *
>          * @param rng RNG algorithm.
>          * @param seed Seed.
>          * @return an instance that must <em>not</em> be shared among
> threads.
>          */
>         MutationPolicy<P> create(RandomSource rng, Object seed);
>
>         default MutationPolicy<P> create(RandomSource rng) {
>             return create(rng, null);
>         }
>     }
> }
> ---CUT---
> ?
>
> > >
> > > * Class "ValidationUtils"
> > > -> should not be public (or should be defined in an "internal"
> package).
> > > --Changed
> >
> > The class actually provides no added value.
> >
> > --I was thinking of having a validation utility which can be reused
> > everywhere. Otherwise I have to duplicate the code in all places. Do you
> > think that is a good way of doing this?
>
> Reuse is good, sure; but in this case, the very small gain (one line)
> is not worth the loss of clarity (method call vs direct conditional test).
>
> >>> [...]
> >
> > The abstraction is quite useful, but the point is indeed that the
> > abstraction
> > is somewhat broken by exposing the List<T> as the representation.
> > I understand that the advantage is to be able to define several
> > functionalities
> > (crossover, ...) in a generic way; but is it worth the obvious drawback
> is
> > that
> > object instances must be used to represent genes?
> >
> > --Initially I also thought of removing the List<T>. But I could not find
> > any wayout for crossover and mutation operators. Those are the primary
> > facilities this library should provide to the users. If any
> implementation
> > needs to use primitive type, then the new chromosome class would be
> > extended from the AbstractChromosome class. The new Binary chromosome
> would
> > follow the same design. However, that much memory optimization would be
> > required only for problems with quite higher dimensions like neuro
> > evolution.[IMHO] For most practical purposes, List of objects might be
> > sufficient.
>
> As this is a fundamental design decision about which another
> conclusion had been some time ago, it is worth posting a
> dedicated message to the ML (to establish that this is the
> choice being made).
>
> > > > Any domain specific new chromosome implementation extending the
> > > > AbstractListChromosome class can reuse all crossover and mutation
> > > operators.
> > > > For our proposed improvement of BinaryChromosome we should be able to
> > > > extend the AbstractChromosome (*not* AbstractListChromosome) for the
> new
> > > > class and provide the dedicated crossover and mutation operators for
> the
> > > > corresponding Genotype. Without an *explicit* abstraction,
> management of
> > > > crossover and mutation operators would be difficult.
> >
> > For sure, the design would be different. But it is not obvious to me that
> > the current one is necessarily better.
> >
> > ---CUT---
> > public interface CrossoverPolicy<P> {
> >    ChromosomePair<P> crossover(Chromosome<P> first, Chromosome<P>
> > second, double crossoverRate);
> > }
> > ---CUT---
> > vs
> > ---CUT---
> > public interface Chromosome {
> >    Chromosome crossover(Chromosome other, double crossoverRate);
> > }
> > ---CUT---
> >
> > -- [IMHO] The introduction of phenotype(P) is a better design. As this
> > would enable reuse of the same chromosome genotype for both direct and
> > indirect encodings. By virtue of this we have removed the RandomKey
> > chromosome. Now RealValuedChromosome can serve the purpose with proper
> > decoder.
> > -- I have introduced crossoverRate as a parameter just to give more
> control
> > to the crossover operator. This will enable users to change any logic by
> > only changing the corresponding CrossoverPolicy. This is also in line
> with
> > changes in MutationPolicy. This will minimize any impact on the algorithm
> > class.
> >
> >
> > > > Please share further thoughts regarding this.
>
> I cannot comment yet as I didn't delve into the implementation details...
>
> > >
> > > Whether we should have a data-based (subclass of "List<T>") or
> > > behaviour-based  access to individual genes is a fundamental design
> > > decision.
> > > The "legacy" implementation chose the former but it might be worth
> > > considering an alternative (lest we'll need to support several APIs if
> > > we later come to the conclusion that we need a more compact
> > > representation for some use cases.
> > > --How are you thinking the design to be behaviour-based. It will be
> > helpful
> > > if you share some examples.
> > > The legacy was only based on data types. But the current model
> introduces
> > > the concept of phenotype along with genotype. The genotype  (List<T>)
> > > represents the actual chromosome implementation and the phenotype (<P>)
> > > represents the problem domain. A Decoder is also introduced to convert
> > > genotype to phenotype. In this way this design would be capable of
> using a
> > > single genotype for a wide variety of problem domains(phenotype)
> including
> > > both direct and indirect encoding. Do you see any challenge to this
> > current
> > > implementation?
>
> You are the expert, who proposes design changes to allow
> extended usage. ;-)
>
> > > Kindly share your thoughts.
> >
> > I admit that I did not fully think about it (but I thought that we had
> > agreed about the List<T> being a problem)...
> > --Initially I also thought of removing the List<T>. But I could not find
> > any wayout for crossover and mutation operators. Those are the primary
> > facilities this library should provide to the users.
>
> Sure.
> If the "waste" in memory is worth the simplified handling, that's
> fine (TBD following your post about, as said above).
>
> >> [...]
> > >
> > > (E) Unit tests
> > > * src/test
> > > 1. New tests should use Junit 5.
> > > 2. "Example usages" probably belong in the "examples-ga" module.
> > > --JUnit version is inherited from commons-math module.
> >
> > Not sure what you mean.
> > It's possible to use Junit5 in new modules/test classes even if
> > other classes use older Junit.
> > --Do you think it is fine to have two separate versions of JUnit library
> in
> > CM. [IMHO] we should keep only one version only.
>
> In the mid-term, yes.  But the point is that we must start somewhere.
> And as you are creating a new tests suite, it's worth using the up-to-date
> framework.  [This will also reduce the burden on the people who'll take
> on updating all the other tests.]
>
> > [...]
> > >
> > > (F) Code readability
> > > * Please write one argument per line.
> > > * Write one condition check per line.
> > > * Avoid comments with no added value (like "constructor" for a
> > constructor).
> > > * Avoid "ASCII art" (see e.g. "OnePointCrossover"); a link[2] is often
> > > preferable.
> > > * Do no duplicate documentation (see e.g. "OnePointCrossover").
> > > --I have formatted the method declaration to have one parameter in one
> > line.
> > > --Most of the if conditions are having a single condition except very
> few
> > > pre existing ones. I could not see any way to format the if statement
> in
> > > eclipse like the suggestion. I cannot introduce any formatting rule
> which
> > > cannot be handled in eclipse as that will be very hard to manage.
> >
> > ?
> > [I can't imagine that Eclipse won't let you add a newline.]
> > --I searched a little bit but could not find anything relevant. However,
> I
> > found the following reference
> >
> https://stackoverflow.com/questions/31808237/formatting-if-else-in-eclipse
> > Putting parenthesis is not an option. Let me know if you find anything
> > relevant to this.
>
> We can leave this nit for later.
>
> >
> > > --ASCII art and other crossover classes are untouched for this release.
> >
> > What do you mean by "this release"?
> > In this instance, it is easy to make the docs clearer by using a link
> > rather than ASCII figures.  If you want to argue that the latter should
> > be kept, please start a new thread in order to collect other opinions.
> > --I can start making the changes.
>
> Great; thanks.
>
> Best regards,
> Gilles
>
> >>>> [...]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>

-- 
Avijit Basak

Re: [MATH][GENETICS] Review of PR #197

Reply via email to