Re: [Math] Review of "genetic algorithm" module

Avijit Basak Sun, 01 May 2022 01:07:21 -0700

Hi All

         Please find my responses.


>Currently; it prints everything on "stderr" (so that simple usage of
>the UNIX shell's "pipe" syntax is not possible).
-- This is the default behaviour of slf4j-simple.
It is possible to provide specific system properties as VM args to redirect
the log to desired location like stdout or file.
Kindly look into the documentation provided below:
https://www.slf4j.org/api/org/slf4j/impl/SimpleLogger.html

>
>> There was a discussion
>> earlier on this and the decision was to use console for example modules
for
>> log messages.
>
>I recall that we want a separation between the logger API (library
>dependency) and logger implementation ("examples" dependency).
>
>Is it a feature of "slf4j-simple" to only print to "stderr" or can it
>be configured?
>There is no issue on the "examples" module to depend on a
>full-featured logger (primary candidate would be "Log4j2").
--Already mentioned earlier.

>
>> Library users would add the implementation of slf4j according
>> to their need and provide required configurations to control the log
levels
>> and messages.
>
>We (developers) also want minimal flexibility for testing purposes...
>
>Also, IMO "examples" codes should make a difference
>between "logging" (optional information about internals)
>and "output" (mandatory result).
>For example, the TSP application result (itinerary and travel
>distance) could be saved in a file whose name is given as
>argument to a mandatory "--output" command-line option, as
>in module "commons-math-examples/examples-sofm/tsp".
--I have made the changes to accommodate the --output as input argument.

>
>> >
>> >There is still a mix between library code and application code (but
>> >this is to be discussed in MATH-1643.
>> -- Kindly update the JIRA with your suggestions.
>
>Sure, we'll come to that.  But I think we should first converge to
>a codebase that can be merged to "master" as a replacement of
>the "o.a.c.math4.genetics" package (to be removed).
>Do you confirm that all the unit tests in that package have an
>equivalent in the new "commons-math-ga" module?
-- I have accommodated mostly. But some are altered due to changes in
design. But it is good to have a review of the same.

>
>> >From browsing the library code, I'm tempted to believe that the
>> >dependency towards a logging framework is not necessary (or
>> >underused).
>> -- Do you mean dependency on the slf4j?
>
>Yes.
>
>>   I think that such a feature could be left to the application
>> >layer (per the "ConvergenceListener" registry).
>> -- ConvergenceListener is only to log the nature of convergence,
>> not internal details of all operators. Currently we have a very low
number
>> of log messages which can be increased later.
>
>"Increased later" for what purpose?
>Logging can be very useful (e.g. for debugging); however once
>this new implementation is trusted to behave correctly, I don't see
>that additional logging statements inside the library will be very
>useful; they would rather be needed by the application in order to
>e.g. display the content of the current "Population".  The "data" (to
>be eventually displayed) would be provided by the library through
>regular accessors.
>Another case in point is that the library cannot know what display
>format is suitable for the application (e.g. a GUI).
>
>If we really need (not clear as of the current codebase) to provide
>runtime access to other parts of the library (operators, etc.), why
>not generalise the "observer" pattern?
-- IMHO fine grain log statements are a better alternative than operator
level observer to convey internal behavior of operators for the specific
application.
Use of an observer for each operator is a bit unnecessary.

>
>>
>> >Likewise, the "PopulationStatisticsLogger" is not general enough to
>> >be worth being part of the library.
>> -- I would love to keep this simple implementation of convergence
listener.
>> This can provide a very basic overview on the nature of convergence.
>
>As a user, you'll be able to provide the feature to your application
>with less than 10 lines of code (by registering a "listener").
>Everyone else might want a slightly different output (contents and/or
>format) than what this class generates.
-- Users would always have the freedom to register their own listener.
PopulationStatisticsLogger provides only a very basic option to track the
population statistics during the convergence process.
This is never a very generic solution to meet the needs of all users.

>Yes.
>Not everybody follows strict rules, and certainly not all "Commons"
>components have the same rules (unfortunately), but since it was
>decided that this module belongs in "Commons Math", I'd like to
>minimize inconsistent coding styles (converging on what is used in
>non-"legacy" source files).
>
>> This would
>> initiate changes in almost all classes.
>
>Do you mean "all classes" in the GA module?
>Then so be it. ;-)
-- I shall start to work on this.

>> [...]
>> >In "AdaptiveGeneticAlgorithm":
>> >* There should be a single constructor (same remark as above).
>> -- Removed the constructor with default argument.
>>
>> >* Why the use of reflection ("isAssignableFrom")?
>> -- Replaced it by instanceof.
>
>Marginally better ;-) it still does not say why the statistics is disabled
>depending on the operator type...
--Statistics is required only for the adaptive probability generation.
In case application users choose to use a constant probability generation
scheme there is no use of this computation.
I shall document the same.

>
>
>
>(1)
>o.a.c.m.ga.GeneticAlgorithmTestPermutations
>(under "src/test")
>
>As per your comment in that class, it is a usage example.
>Given that its name does not end with "Test", it is not run by the
>test suite.  Please move it to the "examples" module.
-- This was taken from legacy GA. There is another example of binary
genotype called onemax. We already have examples in the "examples-ga"
module.
If we don't need these in test packages we can delete the same. But in that
case there won't be any end-to-end test case execution scenario during
build. Are we fine with that?

>
>(2)
>I'm missing a high-level doc that would enable a newbie to figure
>out what to implement in order to get going.
>E.g. what is the interplay between
> * genotype
> * allele
> * phenotype
> * decoder
> * fitness function
>?
>Several classes do not provide explanations (or links) about the
>concept which they represent.  For example, there is no doc about
>what a "RandomKeyDecoder" is, and the reason for using it (or not).
-- I need to work on that.

>
>(3)
>o.a.c.m.ga.utils.ChromosomeRepresentationUtils
>
>It seems to be a "mixed-bag" kind of class (that is being frowned
>upon nowadays).
>Its comment refers to "random" but some methods are not using
>any randomization.  Most methods are only used in unit tests.
>
-- Actually most methods are taken from the legacy GA application. In the
legacy library representation there was no separate concept for the decoder
and all representation utility methods were kept in the corresponding
Chromosome classes. ChromosomeRepresentationUtils is created as a
placeholder for those utility methods.

>(4)
>o.a.c.m.ga.RandomProviderManager
>
>As already discussed, this class should not be part of the public
>API, namely because the "getRandomProvider()" method returns
>an object that is not thread-safe.
>If used internally as "syntactic sugar", it should be located in a
>package named "internal"; however I'd tend to remove it
>altogether, and call "ThreadLocalRandomSource.current(...)"
>explicitly.
>
-- The class o.a.c.m.ga.RandomProviderManager has been removed.
As per our previous discussion I have implemented the customization of
RandomSource at each operator level.

>(5)
>Why does a "Chromosome" need an "identifier"?
>Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
>that is an internal class, where it seems that the chromosome itself
>(rather than its "id") could serve as the map's key.
-- How can we generate a map's key from the chromosome itself? Please
suggest.

>
>(6)
>o.a.c.m.ga.chromsome.AbstractChromosome
>
>Field "fitness" is not "final", yet it could be: a "FitnessFunction"
>object (used in "evaluate() to compute that field) is passed to the
>constructor.  Is there a reason for the "lazy" evaluation?
>Dropping it would make the instance immutable (and "evaluate()"
>should be renamed to "getFitness()").
>
>Why should the "FitnessFunction" be stored in every chromosome?
>
-- I have modified the fitness as final and initialized the same in the
constructor.

>(7)
>Spurious "@since" tags: In the new code (in "commons-math-ga"
>module), none should refer to a version < 4.0.
>
-- Some files are taken unchanged from the previous release. I have kept
the same @since tag for those files.
Do you need any change here?

>(8)
>@SuppressWarnings("unchecked")
>
>By default, I'm a bit suspicious about having to resort to these
annotations,
>especially for the kind of algorithms we are trying to implement.
>What do you think of the alternative approach outlined in the ZIP file
>attached in MATH-1618:
>    https://issues.apache.org/jira/browse/MATH-1618
>?
-- This annotation is required because we have kept an option to use
different types of genotypes including primitive.
Because of that our base interfaces only declares phenotype not genotype.
This introduced a kind of hierarchy in all operators and chromosome classes
which required us to use the mentioned annotation.
Even with the proposed new architecture we may not be able to avoid the
same.
-- It will be good if you can share some more information about the newly
proposed architecture. The areas of current design which it can improve as
well as the underlying intention.

>
>(9)
>Naming of factory methods should be harmonized to match the convention
>adopted in components like [RNG] and [Numbers].
>E.g. instead of "newChromosome(...)", please use "of(...)" or "from(...)"
>for "value object", and "create(...)" otherwise.
>
-- I have renamed the same for Chromosome classes.
What about the nextGeneration() method of ListPopulation class. Renaming
this to create() or from() won't convey the purpose of it.

>(10)
>o.a.c.m.ga.chromosome.AbstractListChromosome
>
>Constructor is called with an argument that is a previously instantiated
>"representation".  If the latter is mutable, the caller will be able to
modify
>the underlying data structure of the newly created chromosome.  [The
>doc assumes immutability of the representation but this cannot be
>enforced, and mixed ownership can entail subtle bugs.]
-- I think this applies to both representation as well as generic parameter
type T. But I don't see any other option but to rely on the user.
If you have any suggestions kindly share.

>
>(11)
>Do we agree that, in a GA, the most time-consuming task is the fitness
>computation?  Hence IMO, it should be the focus of the multithreading
>tools (i.e. "ExecutorService"), probably keeping the other parts (namely
>the genetic operators) within a simple sequential loop (as in class
>"GeneticAlgorithmFactory" in MATH-1618).
-- Current implementation uses separate threads for applying crossover and
mutation operators for each pair of selected chromosomes.
I think this ensures better utilization of multi-core processors compared
to use of multi-threading only for the fitness calculation.

-- Some codes are checked in. But there is a conflict in the pull request.
So I shall create a new one and delete the old branch itself.


Thanks & Regards
--Avijit Basak


On Fri, 15 Apr 2022 at 03:03, Gilles Sadowski <[email protected]> wrote:

> Hello.
>
> > > > [...]
>
> (1)
> o.a.c.m.ga.GeneticAlgorithmTestPermutations
> (under "src/test")
>
> As per your comment in that class, it is a usage example.
> Given that its name does not end with "Test", it is not run by the
> test suite.  Please move it to the "examples" module.
>
> (2)
> I'm missing a high-level doc that would enable a newbie to figure
> out what to implement in order to get going.
> E.g. what is the interplay between
>  * genotype
>  * allele
>  * phenotype
>  * decoder
>  * fitness function
> ?
> Several classes do not provide explanations (or links) about the
> concept which they represent.  For example, there is no doc about
> what a "RandomKeyDecoder" is, and the reason for using it (or not).
>
> (3)
> o.a.c.m.ga.utils.ChromosomeRepresentationUtils
>
> It seems to be a "mixed-bag" kind of class (that is being frowned
> upon nowadays).
> Its comment refers to "random" but some methods are not using
> any randomization.  Most methods are only used in unit tests.
>
> (4)
> o.a.c.m.ga.RandomProviderManager
>
> As already discussed, this class should not be part of the public
> API, namely because the "getRandomProvider()" method returns
> an object that is not thread-safe.
> If used internally as "syntactic sugar", it should be located in a
> package named "internal"; however I'd tend to remove it
> altogether, and call "ThreadLocalRandomSource.current(...)"
> explicitly.
>
> (5)
> Why does a "Chromosome" need an "identifier"?
> Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
> that is an internal class, where it seems that the chromosome itself
> (rather than its "id") could serve as the map's key.
>
> (6)
> o.a.c.m.ga.chromsome.AbstractChromosome
>
> Field "fitness" is not "final", yet it could be: a "FitnessFunction"
> object (used in "evaluate() to compute that field) is passed to the
> constructor.  Is there a reason for the "lazy" evaluation?
> Dropping it would make the instance immutable (and "evaluate()"
> should be renamed to "getFitness()").
>
> Why should the "FitnessFunction" be stored in every chromosome?
>
> (7)
> Spurious "@since" tags: In the new code (in "commons-math-ga"
> module), none should refer to a version < 4.0.
>
> (8)
> @SuppressWarnings("unchecked")
>
> By default, I'm a bit suspicious about having to resort to these
> annotations,
> especially for the kind of algorithms we are trying to implement.
> What do you think of the alternative approach outlined in the ZIP file
> attached in MATH-1618:
>     https://issues.apache.org/jira/browse/MATH-1618
> ?
>
> (9)
> Naming of factory methods should be harmonized to match the convention
> adopted in components like [RNG] and [Numbers].
> E.g. instead of "newChromosome(...)", please use "of(...)" or "from(...)"
> for "value object", and "create(...)" otherwise.
>
> (10)
> o.a.c.m.ga.chromosome.AbstractListChromosome
>
> Constructor is called with an argument that is a previously instantiated
> "representation".  If the latter is mutable, the caller will be able to
> modify
> the underlying data structure of the newly created chromosome.  [The
> doc assumes immutability of the representation but this cannot be
> enforced, and mixed ownership can entail subtle bugs.]
>
> (11)
> Do we agree that, in a GA, the most time-consuming task is the fitness
> computation?  Hence IMO, it should be the focus of the multithreading
> tools (i.e. "ExecutorService"), probably keeping the other parts (namely
> the genetic operators) within a simple sequential loop (as in class
> "GeneticAlgorithmFactory" in MATH-1618).
>
> To be continued...
>
> Regards,
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

-- 
Avijit Basak

Re: [Math] Review of "genetic algorithm" module

Reply via email to