Re: [Math] Review of "genetic algorithm" module

2022-07-16 Thread Gilles Sadowski
Hello.

Le dim. 3 juil. 2022 à 08:39, Avijit Basak  a écrit :
>
> Hi All
>
> Please find my responses. Sorry for the delay.
>
> > Comments related to new model:
> >>
> >> 1) Hierarchy of GeneticOperator: In the proposed model the genetic
> >> operators are designed hierarchically implementing a common interface and
> >> operators are accepted as a list.
> >> This enables interchangeability of operators. Library users would be able
> >> to use crossover and mutation operators interchangeably.
> >> However, in genetic algorithms or other population based search
> algorithms
> >> the operators are used for broadly two purposes, exploration and
> >> exploitation.
> >> In GA crossover is used for exploitation and mutation is used for
> >> exploration. Keeping them in a common hierarchy and allowing
> >> interchangeable operation violates that purpose.
> >
> >I'm not sure that the semantics of "exploitation" and "exploration"
> >should drive the API design.
> >Saying it differently: We don't need to know how various operators
> >will be used in order to implement them; hence IMO, there is no
> >need to discriminate at the API level.
>
> -- Even if we want to use a common abstraction, algorithm implementation
> needs to use them in different ways.
> The new proposed implementation does not consider different operators
> separately. I have mentioned this in the summary section at the end.

Please provide a concrete example of what the proposed implementation
is unable to handle in principle.  In practice, this implementation indeed
provides a "GeneticOperator" abstraction; but I do not understand how it
limits a user's possibility to select any concrete class that provides any
form of either "crossover" or "mutation".

> >>
> >> 2) Chromosome Fitness: In the new design the chromosome fitness is
> >> maintained as a hashmap where the key is the chromosome itself.
> >> Are we going to retain the fitness value in chromosome too otherwise
> >> implementation of Comparable won't be possible?
> >
> >Sorry, I don't follow.
> >
> >> Assuming the chromosome representation would be used to calculate
> hashcode,
> >> it would be a very time consuming process depending on the length of
> >> chromosome.
> >
> >Is this assumption correct?
> >For what purpose do we need to compute a custom hash code?
>
> -- A standard practice is to provide a custom implementation of hashCode
> and equals method whenever an object is used as key to any hash based data
> structure like HashMap we are using here.
> If you think it is good to rely on the JVM specific implementation, I have
> no concerns.

You are right that this should be examined carefully, to avoid unintended
behaviour.
A good start would perhaps be to check that "Population" behaves
correctly, with a minimal API.

> >
> >> E.g. in binary chromosomes we have provision to allow chromosome length
> >> upto (2^31 - 1).
> >
> >That's a lot. ;-)
> >Do you have use cases where such long representations can be handled?
>
> -- One such use case could be tuning of neural networks. Once we introduce
> genetic programming there would be a lot more use cases.

A discussion in itself, primarily focused on ensuring that the implementation
does not put undue limitations on genotype representation IIUC.
Is it the case with my proposal?

> >
> >> Along with that the chromosome implements Comparable
> >> interface.
> >> Equality by Comparable interface would be decided by fitness
> >
> >Sure.
> >
> >> however
> >> equality by equals() method would depend on representation.
> >
> >Do we need a custom "equals"?
>
> -- Mentioned above.

Yes, but without indicating whether there is a problem in my proposal.
I don't say that there isn't, but this should be confirmed with e.g. unit
tests (or "integration" tests?).

We don't want to support any unnecessary API or behaviours.
In this particular case, do we need to keep track of whether 2 genotypes
share the exact same representation?

> >
> >> As chromosomes with different representations may also have the same
> >> fitness, this would produce a conflict.
> >
> >Please provide an example.
>
> -- Think of a problem where we need to find out the minimum value of a
> function using GA where the function is a square of sinusoid like mentioned
> below.
> y = Math.pow(sin(x), 2)
>  where -180 deg <= x <= 180 deg
> We may have the same value of y for different values of x due to the
> replicative nature of the function.
> Similar situations can arrive in real life optimization problems too.
> Even the MathFunctionOptimization example we have used belongs to this
> category of problem.

I fail to see the "conflict" being referred to above (I forgot about the context
of that remark).

> >>
> >> 3) The model does not consider anything related to adaptive approaches.
> >> Would it be a separate factory?
> >
> >What I've proposed is an alternative "skeleton" for the API.  Of course,
> >more classes will provide specific functionality.
> >An adapti

Re: [Math] Review of "genetic algorithm" module

2022-07-02 Thread Avijit Basak
Hi All

Please find my responses. Sorry for the delay.

> Comments related to new model:
>>
>> 1) Hierarchy of GeneticOperator: In the proposed model the genetic
>> operators are designed hierarchically implementing a common interface and
>> operators are accepted as a list.
>> This enables interchangeability of operators. Library users would be able
>> to use crossover and mutation operators interchangeably.
>> However, in genetic algorithms or other population based search
algorithms
>> the operators are used for broadly two purposes, exploration and
>> exploitation.
>> In GA crossover is used for exploitation and mutation is used for
>> exploration. Keeping them in a common hierarchy and allowing
>> interchangeable operation violates that purpose.
>
>I'm not sure that the semantics of "exploitation" and "exploration"
>should drive the API design.
>Saying it differently: We don't need to know how various operators
>will be used in order to implement them; hence IMO, there is no
>need to discriminate at the API level.

-- Even if we want to use a common abstraction, algorithm implementation
needs to use them in different ways.
The new proposed implementation does not consider different operators
separately. I have mentioned this in the summary section at the end.

>
>>
>> 2) Chromosome Fitness: In the new design the chromosome fitness is
>> maintained as a hashmap where the key is the chromosome itself.
>> Are we going to retain the fitness value in chromosome too otherwise
>> implementation of Comparable won't be possible?
>
>Sorry, I don't follow.
>
>> Assuming the chromosome representation would be used to calculate
hashcode,
>> it would be a very time consuming process depending on the length of
>> chromosome.
>
>Is this assumption correct?
>For what purpose do we need to compute a custom hash code?

-- A standard practice is to provide a custom implementation of hashCode
and equals method whenever an object is used as key to any hash based data
structure like HashMap we are using here.
If you think it is good to rely on the JVM specific implementation, I have
no concerns.

>
>> E.g. in binary chromosomes we have provision to allow chromosome length
>> upto (2^31 - 1).
>
>That's a lot. ;-)
>Do you have use cases where such long representations can be handled?

-- One such use case could be tuning of neural networks. Once we introduce
genetic programming there would be a lot more use cases.

>
>
>
>> Along with that the chromosome implements Comparable
>> interface.
>> Equality by Comparable interface would be decided by fitness
>
>Sure.
>
>> however
>> equality by equals() method would depend on representation.
>
>Do we need a custom "equals"?

-- Mentioned above.

>
>> As chromosomes with different representations may also have the same
>> fitness, this would produce a conflict.
>
>Please provide an example.

-- Think of a problem where we need to find out the minimum value of a
function using GA where the function is a square of sinusoid like mentioned
below.
y = Math.pow(sin(x), 2)
 where -180 deg <= x <= 180 deg
We may have the same value of y for different values of x due to the
replicative nature of the function.
Similar situations can arrive in real life optimization problems too.
Even the MathFunctionOptimization example we have used belongs to this
category of problem.

>
>>
>> 3) The model does not consider anything related to adaptive approaches.
>> Would it be a separate factory?
>
>What I've proposed is an alternative "skeleton" for the API.  Of course,
>more classes will provide specific functionality.
>An adaptive operator "is-a" specialized genetic operator (perhaps the
>notion of "Adaptive" will be defined through an interface?).

-- In "commons-math4-ga2" module separate ApplicationRate has been
introduced to handle the adaptive nature of the algorithm.
But this model always uses sorting as the default option which is
unnecessary for simple genetic algorithms. Isn't it a performance
bottleneck?

>
>>
>> 4) Comparison of chromosomes: In the current model two chromosomes are
>> compared after decoding the genotype to phenotype using the internal
>> decoder.
>> How can we address this in the new model?
>
>As mentioned above: Do we really need to compare representations?

-- This is a minor scenario and can be ignored as of now.

>
>>
>> 5) Chromosome String representation: Currently we use the toString()
method
>> to print the chromosome's phenotype. In the new model we would need to
>> avoid this approach as decoders won't be available to chromosomes.
>
>This seems like a minor issue (or perhaps no issue at all?) unless I'm
>missing something.

-- We can ignore this for now.

[...]
>> >Doing manually will be too tedious.
>> >If you are reasonably confident that you've ported everything
>> >valuable, I guess that we'll rely on coverage tools...
>> >The current log statements could also be construed as (totally)
>> >unnecessary, as they merely document statements whic

Re: [Math] Review of "genetic algorithm" module

2022-05-31 Thread Gilles Sadowski
Responding below to some of my own questions following commit
   ddfd5bf859d04cc5da604b20021ceaba9de7def6
in branch
  feature__MATH-1563__genetic_algorithm

Le mar. 24 mai 2022 à 01:54, Gilles Sadowski  a écrit :
>
> Hello.
>
> Le mer. 18 mai 2022 à 16:34, Avijit Basak  a écrit :
> >
> > Hi All
> >
> > Please find my comments below.
> >
> > Comments related to new model:
> >
> > 1) Hierarchy of GeneticOperator: In the proposed model the genetic
> > operators are designed hierarchically implementing a common interface and
> > operators are accepted as a list.
> > This enables interchangeability of operators. Library users would be able
> > to use crossover and mutation operators interchangeably.
> > However, in genetic algorithms or other population based search algorithms
> > the operators are used for broadly two purposes, exploration and
> > exploitation.
> > In GA crossover is used for exploitation and mutation is used for
> > exploration. Keeping them in a common hierarchy and allowing
> > interchangeable operation violates that purpose.
>
> I'm not sure that the semantics of "exploitation" and "exploration"
> should drive the API design.
> Saying it differently: We don't need to know how various operators
> will be used in order to implement them; hence IMO, there is no
> need to discriminate at the API level.

The "core" GA algorithm (see class "GeneticAlgorithmFactory") is
oblivious to whether a genetic operator "is-a" crossover or mutation.

> >
> > 2) Chromosome Fitness: In the new design the chromosome fitness is
> > maintained as a hashmap where the key is the chromosome itself.
> > Are we going to retain the fitness value in chromosome too otherwise
> > implementation of Comparable won't be possible?
>
> Sorry, I don't follow.

Implementing "Comparable" is not necessary.

> > Assuming the chromosome representation would be used to calculate hashcode,
> > it would be a very time consuming process depending on the length of
> > chromosome.
>
> Is this assumption correct?
> For what purpose do we need to compute a custom hash code?

A custom "hashCode" method is not necessary.
The only consequence seems that 2 different instances of a genotype that is
logically the same (genotype-wise) could both be added to a population while
the same (in-memory) instance would only appear once in the hash map.

>
> > E.g. in binary chromosomes we have provision to allow chromosome length
> > upto (2^31 - 1).
>
> That's a lot. ;-)
> Do you have use cases where such long representations can be handled?

For now, I've use "BitSet" as the internal representation (but this could
be changed if necessary, because it is not part of the public API).

> > Along with that the chromosome implements Comparable
> > interface.
> > Equality by Comparable interface would be decided by fitness
>
> Sure.
>
> > however
> > equality by equals() method would depend on representation.
>
> Do we need a custom "equals"?

We don't.

> > As chromosomes with different representations may also have the same
> > fitness, this would produce a conflict.
>
> Please provide an example.
>
> >
> > 3) The model does not consider anything related to adaptive approaches.
> > Would it be a separate factory?
>
> What I've proposed is an alternative "skeleton" for the API.  Of course,
> more classes will provide specific functionality.
> An adaptive operator "is-a" specialized genetic operator (perhaps the
> notion of "Adaptive" will be defined through an interface?).

We don't need anything that complex (IIUC).
See interface "ApplicationRate" and its implementations in package "rate".

> >
> > 4) Comparison of chromosomes: In the current model two chromosomes are
> > compared after decoding the genotype to phenotype using the internal
> > decoder.
> > How can we address this in the new model?
>
> As mentioned above: Do we really need to compare representations?

Within the library, it does not seem necessary.

> >
> > 5) Chromosome String representation: Currently we use the toString() method
> > to print the chromosome's phenotype. In the new model we would need to
> > avoid this approach as decoders won't be available to chromosomes.
>
> This seems like a minor issue (or perhaps no issue at all?) unless I'm
> missing something.

The GA does not need to print the phenotype.
The decoder is user-defined, hence he can obviously apply it whenever
he needs a printable version of the chromosomes.

> >
> > 6) However in the new model the generic type parameters and java.util
> > packages are used in a more organized way. We can adopt the same in the
> > existing model.
>
> I don't understand what you are proposing.

I've used two generic parameters (one for genotype, one for phenotype).
The code does not use any "@SuppressWarning" annotation.

> >
> > >If we need to document software (e.g. the "examples") produced
> > >by a "Commons" component, it is preferable that we use and refer
> > >to "Log4j2".
> > >[The logger API is another matter since it do

Re: [Math] Review of "genetic algorithm" module

2022-05-23 Thread Gilles Sadowski
Hello.

Le mer. 18 mai 2022 à 16:34, Avijit Basak  a écrit :
>
> Hi All
>
> Please find my comments below.
>
> Comments related to new model:
>
> 1) Hierarchy of GeneticOperator: In the proposed model the genetic
> operators are designed hierarchically implementing a common interface and
> operators are accepted as a list.
> This enables interchangeability of operators. Library users would be able
> to use crossover and mutation operators interchangeably.
> However, in genetic algorithms or other population based search algorithms
> the operators are used for broadly two purposes, exploration and
> exploitation.
> In GA crossover is used for exploitation and mutation is used for
> exploration. Keeping them in a common hierarchy and allowing
> interchangeable operation violates that purpose.

I'm not sure that the semantics of "exploitation" and "exploration"
should drive the API design.
Saying it differently: We don't need to know how various operators
will be used in order to implement them; hence IMO, there is no
need to discriminate at the API level.

>
> 2) Chromosome Fitness: In the new design the chromosome fitness is
> maintained as a hashmap where the key is the chromosome itself.
> Are we going to retain the fitness value in chromosome too otherwise
> implementation of Comparable won't be possible?

Sorry, I don't follow.

> Assuming the chromosome representation would be used to calculate hashcode,
> it would be a very time consuming process depending on the length of
> chromosome.

Is this assumption correct?
For what purpose do we need to compute a custom hash code?

> E.g. in binary chromosomes we have provision to allow chromosome length
> upto (2^31 - 1).

That's a lot. ;-)
Do you have use cases where such long representations can be handled?

> Along with that the chromosome implements Comparable
> interface.
> Equality by Comparable interface would be decided by fitness

Sure.

> however
> equality by equals() method would depend on representation.

Do we need a custom "equals"?

> As chromosomes with different representations may also have the same
> fitness, this would produce a conflict.

Please provide an example.

>
> 3) The model does not consider anything related to adaptive approaches.
> Would it be a separate factory?

What I've proposed is an alternative "skeleton" for the API.  Of course,
more classes will provide specific functionality.
An adaptive operator "is-a" specialized genetic operator (perhaps the
notion of "Adaptive" will be defined through an interface?).

>
> 4) Comparison of chromosomes: In the current model two chromosomes are
> compared after decoding the genotype to phenotype using the internal
> decoder.
> How can we address this in the new model?

As mentioned above: Do we really need to compare representations?

>
> 5) Chromosome String representation: Currently we use the toString() method
> to print the chromosome's phenotype. In the new model we would need to
> avoid this approach as decoders won't be available to chromosomes.

This seems like a minor issue (or perhaps no issue at all?) unless I'm
missing something.

>
> 6) However in the new model the generic type parameters and java.util
> packages are used in a more organized way. We can adopt the same in the
> existing model.

I don't understand what you are proposing.

>
> >If we need to document software (e.g. the "examples") produced
> >by a "Commons" component, it is preferable that we use and refer
> >to "Log4j2".
> >[The logger API is another matter since it does not impact how the
> >software is used (from a user's POV).]
> -- I shall make the changes.

Thanks.

>
> > [...]
> >Doing manually will be too tedious.
> >If you are reasonably confident that you've ported everything
> >valuable, I guess that we'll rely on coverage tools...
> >The current log statements could also be construed as (totally)
> >unnecessary, as they merely document statements which I would
> >consider part of an "application", not the GA "library".
> >I believe that we do not agree yet on where to draw the line (my
> >proposal in MATH-1618 is related to that difference of opinion).
> >
> -- Probably I am missing something here. These log statements are more
> useful for library users. If we remove all log statements how users would
> be able to debug any issue.
> What if there are issues in any part of library code or may be anything
> fails due to some application related customization.
> It may not be necessarily a library bug.

I'm not clear about what is the intended purpose of logging.
But I'd rather defer this discussion, and first focus on the actual GA
functionality being implemented.

>
> >> >> >Likewise, the "PopulationStatisticsLogger" is not general enough to
> >> >> >be worth being part of the library.
> >> >> -- I would love to keep this simple implementation of convergence
> >> >> listener. This can provide a very basic overview on the nature of
> >> >> convergence.
> >> >
> >> >As a user, you'll be a

Re: [Math] Review of "genetic algorithm" module

2022-05-18 Thread Avijit Basak
Hi All

Please find my comments below.

Comments related to new model:

1) Hierarchy of GeneticOperator: In the proposed model the genetic
operators are designed hierarchically implementing a common interface and
operators are accepted as a list.
This enables interchangeability of operators. Library users would be able
to use crossover and mutation operators interchangeably.
However, in genetic algorithms or other population based search algorithms
the operators are used for broadly two purposes, exploration and
exploitation.
In GA crossover is used for exploitation and mutation is used for
exploration. Keeping them in a common hierarchy and allowing
interchangeable operation violates that purpose.

2) Chromosome Fitness: In the new design the chromosome fitness is
maintained as a hashmap where the key is the chromosome itself.
Are we going to retain the fitness value in chromosome too otherwise
implementation of Comparable won't be possible?
Assuming the chromosome representation would be used to calculate hashcode,
it would be a very time consuming process depending on the length of
chromosome.
E.g. in binary chromosomes we have provision to allow chromosome length
upto (2^31 - 1). Along with that the chromosome implements Comparable
interface.
Equality by Comparable interface would be decided by fitness however
equality by equals() method would depend on representation.
As chromosomes with different representations may also have the same
fitness, this would produce a conflict.

3) The model does not consider anything related to adaptive approaches.
Would it be a separate factory?

4) Comparison of chromosomes: In the current model two chromosomes are
compared after decoding the genotype to phenotype using the internal
decoder.
How can we address this in the new model?

5) Chromosome String representation: Currently we use the toString() method
to print the chromosome's phenotype. In the new model we would need to
avoid this approach as decoders won't be available to chromosomes.

6) However in the new model the generic type parameters and java.util
packages are used in a more organized way. We can adopt the same in the
existing model.



>If we need to document software (e.g. the "examples") produced
>by a "Commons" component, it is preferable that we use and refer
>to "Log4j2".
>[The logger API is another matter since it does not impact how the
>software is used (from a user's POV).]
-- I shall make the changes.

>
>> >
>> >> There was a discussion
>> >> earlier on this and the decision was to use console for example
modules
>> >> log messages.
>> >
>> >I recall that we want a separation between the logger API (library
>> >dependency) and logger implementation ("examples" dependency).
>> >
>> >Is it a feature of "slf4j-simple" to only print to "stderr" or can it
>> >be configured?
>> >There is no issue on the "examples" module to depend on a
>> >full-featured logger (primary candidate would be "Log4j2").
>> --Already mentioned earlier.
>
>We should change SimpleLogger -> Log4j2 equivalent.
--I shall do this.

[...]
>Doing manually will be too tedious.
>If you are reasonably confident that you've ported everything
>valuable, I guess that we'll rely on coverage tools...
>The current log statements could also be construed as (totally)
>unnecessary, as they merely document statements which I would
>consider part of an "application", not the GA "library".
>I believe that we do not agree yet on where to draw the line (my
>proposal in MATH-1618 is related to that difference of opinion).
>
-- Probably I am missing something here. These log statements are more
useful for library users. If we remove all log statements how users would
be able to debug any issue.
What if there are issues in any part of library code or may be anything
fails due to some application related customization.
It may not be necessarily a library bug.

>> >> >Likewise, the "PopulationStatisticsLogger" is not general enough to
>> >> >be worth being part of the library.
>> >> -- I would love to keep this simple implementation of convergence
>> >> listener. This can provide a very basic overview on the nature of
>> >> convergence.
>> >
>> >As a user, you'll be able to provide the feature to your application
>> >with less than 10 lines of code (by registering a "listener").
>> >Everyone else might want a slightly different output (contents and/or
>> >format) than what this class generates.
>> -- Users would always have the freedom to register their own listener.
>> PopulationStatisticsLogger provides only a very basic option to track the
>> population statistics during the convergence process.
>> This is never a very generic solution to meet the needs of all users.
>
>AFAICT, the "PopulationStatisticsLogger" class satisfies *your* needs
>(as a user), but as developers we should only include functionality that
>is broadly useful.  I think I provided arguments that it is not the case of
>that class.
-- So do you want this class to be rem

Re: [Math] Review of "genetic algorithm" module

2022-05-01 Thread Gilles Sadowski
Hello.

Le dim. 1 mai 2022 à 10:07, Avijit Basak  a écrit :
>
> Hi All
>
>  Please find my responses.
>
> >Currently; it prints everything on "stderr" (so that simple usage of
> >the UNIX shell's "pipe" syntax is not possible).
> -- This is the default behaviour of slf4j-simple.
> It is possible to provide specific system properties as VM args to redirect
> the log to desired location like stdout or file.
> Kindly look into the documentation provided below:
> https://www.slf4j.org/api/org/slf4j/impl/SimpleLogger.html

If we need to document software (e.g. the "examples") produced
by a "Commons" component, it is preferable that we use and refer
to "Log4j2".
[The logger API is another matter since it does not impact how the
software is used (from a user's POV).]

> >
> >> There was a discussion
> >> earlier on this and the decision was to use console for example modules
> >> log messages.
> >
> >I recall that we want a separation between the logger API (library
> >dependency) and logger implementation ("examples" dependency).
> >
> >Is it a feature of "slf4j-simple" to only print to "stderr" or can it
> >be configured?
> >There is no issue on the "examples" module to depend on a
> >full-featured logger (primary candidate would be "Log4j2").
> --Already mentioned earlier.

We should change SimpleLogger -> Log4j2 equivalent.

> >
> >> [...]
> >
> >Also, IMO "examples" codes should make a difference
> >between "logging" (optional information about internals)
> >and "output" (mandatory result).
> >For example, the TSP application result (itinerary and travel
> >distance) could be saved in a file whose name is given as
> >argument to a mandatory "--output" command-line option, as
> >in module "commons-math-examples/examples-sofm/tsp".
> --I have made the changes to accommodate the --output as input argument.

Thanks.

> >
> >> >
> >> >There is still a mix between library code and application code (but
> >> >this is to be discussed in MATH-1643.
> >> -- Kindly update the JIRA with your suggestions.
> >
> >Sure, we'll come to that.  But I think we should first converge to
> >a codebase that can be merged to "master" as a replacement of
> >the "o.a.c.math4.genetics" package (to be removed).
> >Do you confirm that all the unit tests in that package have an
> >equivalent in the new "commons-math-ga" module?
> -- I have accommodated mostly. But some are altered due to changes in
> design. But it is good to have a review of the same.

Doing manually will be too tedious.
If you are reasonably confident that you've ported everything
valuable, I guess that we'll rely on coverage tools...

> >
> >> >From browsing the library code, I'm tempted to believe that the
> >> >dependency towards a logging framework is not necessary (or
> >> >underused).
> >> -- Do you mean dependency on the slf4j?
> >
> >Yes.
> >
> >>   I think that such a feature could be left to the application
> >> >layer (per the "ConvergenceListener" registry).
> >> -- ConvergenceListener is only to log the nature of convergence,
> >> not internal details of all operators. Currently we have a very low
> >> number of log messages which can be increased later.
> >
> >"Increased later" for what purpose?
> >Logging can be very useful (e.g. for debugging); however once
> >this new implementation is trusted to behave correctly, I don't see
> >that additional logging statements inside the library will be very
> >useful; they would rather be needed by the application in order to
> >e.g. display the content of the current "Population".  The "data" (to
> >be eventually displayed) would be provided by the library through
> >regular accessors.
> >Another case in point is that the library cannot know what display
> >format is suitable for the application (e.g. a GUI).
> >
> >If we really need (not clear as of the current codebase) to provide
> >runtime access to other parts of the library (operators, etc.), why
> >not generalise the "observer" pattern?
> -- IMHO fine grain log statements are a better alternative than operator
> level observer to convey internal behavior of operators for the specific
> application.
> Use of an observer for each operator is a bit unnecessary.

The current log statements could also be construed as (totally)
unnecessary, as they merely document statements which I would
consider part of an "application", not the GA "library".
I believe that we do not agree yet on where to draw the line (my
proposal in MATH-1618 is related to that difference of opinion).

> >> >Likewise, the "PopulationStatisticsLogger" is not general enough to
> >> >be worth being part of the library.
> >> -- I would love to keep this simple implementation of convergence
> >> listener. This can provide a very basic overview on the nature of
> >> convergence.
> >
> >As a user, you'll be able to provide the feature to your application
> >with less than 10 lines of code (by registering a "listener").
> >Everyone else might want a slightly different output (contents and/or
>

Re: [Math] Review of "genetic algorithm" module

2022-05-01 Thread Avijit Basak
Hi All

 Please find my responses.

>Currently; it prints everything on "stderr" (so that simple usage of
>the UNIX shell's "pipe" syntax is not possible).
-- This is the default behaviour of slf4j-simple.
It is possible to provide specific system properties as VM args to redirect
the log to desired location like stdout or file.
Kindly look into the documentation provided below:
https://www.slf4j.org/api/org/slf4j/impl/SimpleLogger.html

>
>> There was a discussion
>> earlier on this and the decision was to use console for example modules
for
>> log messages.
>
>I recall that we want a separation between the logger API (library
>dependency) and logger implementation ("examples" dependency).
>
>Is it a feature of "slf4j-simple" to only print to "stderr" or can it
>be configured?
>There is no issue on the "examples" module to depend on a
>full-featured logger (primary candidate would be "Log4j2").
--Already mentioned earlier.

>
>> Library users would add the implementation of slf4j according
>> to their need and provide required configurations to control the log
levels
>> and messages.
>
>We (developers) also want minimal flexibility for testing purposes...
>
>Also, IMO "examples" codes should make a difference
>between "logging" (optional information about internals)
>and "output" (mandatory result).
>For example, the TSP application result (itinerary and travel
>distance) could be saved in a file whose name is given as
>argument to a mandatory "--output" command-line option, as
>in module "commons-math-examples/examples-sofm/tsp".
--I have made the changes to accommodate the --output as input argument.

>
>> >
>> >There is still a mix between library code and application code (but
>> >this is to be discussed in MATH-1643.
>> -- Kindly update the JIRA with your suggestions.
>
>Sure, we'll come to that.  But I think we should first converge to
>a codebase that can be merged to "master" as a replacement of
>the "o.a.c.math4.genetics" package (to be removed).
>Do you confirm that all the unit tests in that package have an
>equivalent in the new "commons-math-ga" module?
-- I have accommodated mostly. But some are altered due to changes in
design. But it is good to have a review of the same.

>
>> >From browsing the library code, I'm tempted to believe that the
>> >dependency towards a logging framework is not necessary (or
>> >underused).
>> -- Do you mean dependency on the slf4j?
>
>Yes.
>
>>   I think that such a feature could be left to the application
>> >layer (per the "ConvergenceListener" registry).
>> -- ConvergenceListener is only to log the nature of convergence,
>> not internal details of all operators. Currently we have a very low
number
>> of log messages which can be increased later.
>
>"Increased later" for what purpose?
>Logging can be very useful (e.g. for debugging); however once
>this new implementation is trusted to behave correctly, I don't see
>that additional logging statements inside the library will be very
>useful; they would rather be needed by the application in order to
>e.g. display the content of the current "Population".  The "data" (to
>be eventually displayed) would be provided by the library through
>regular accessors.
>Another case in point is that the library cannot know what display
>format is suitable for the application (e.g. a GUI).
>
>If we really need (not clear as of the current codebase) to provide
>runtime access to other parts of the library (operators, etc.), why
>not generalise the "observer" pattern?
-- IMHO fine grain log statements are a better alternative than operator
level observer to convey internal behavior of operators for the specific
application.
Use of an observer for each operator is a bit unnecessary.

>
>>
>> >Likewise, the "PopulationStatisticsLogger" is not general enough to
>> >be worth being part of the library.
>> -- I would love to keep this simple implementation of convergence
listener.
>> This can provide a very basic overview on the nature of convergence.
>
>As a user, you'll be able to provide the feature to your application
>with less than 10 lines of code (by registering a "listener").
>Everyone else might want a slightly different output (contents and/or
>format) than what this class generates.
-- Users would always have the freedom to register their own listener.
PopulationStatisticsLogger provides only a very basic option to track the
population statistics during the convergence process.
This is never a very generic solution to meet the needs of all users.

>Yes.
>Not everybody follows strict rules, and certainly not all "Commons"
>components have the same rules (unfortunately), but since it was
>decided that this module belongs in "Commons Math", I'd like to
>minimize inconsistent coding styles (converging on what is used in
>non-"legacy" source files).
>
>> This would
>> initiate changes in almost all classes.
>
>Do you mean "all classes" in the GA module?
>Then so be it. ;-)
-- I shall start to work on this.

>> [...]
>>

Re: [Math] Review of "genetic algorithm" module

2022-04-14 Thread Gilles Sadowski
Hello.

> > > [...]

(1)
o.a.c.m.ga.GeneticAlgorithmTestPermutations
(under "src/test")

As per your comment in that class, it is a usage example.
Given that its name does not end with "Test", it is not run by the
test suite.  Please move it to the "examples" module.

(2)
I'm missing a high-level doc that would enable a newbie to figure
out what to implement in order to get going.
E.g. what is the interplay between
 * genotype
 * allele
 * phenotype
 * decoder
 * fitness function
?
Several classes do not provide explanations (or links) about the
concept which they represent.  For example, there is no doc about
what a "RandomKeyDecoder" is, and the reason for using it (or not).

(3)
o.a.c.m.ga.utils.ChromosomeRepresentationUtils

It seems to be a "mixed-bag" kind of class (that is being frowned
upon nowadays).
Its comment refers to "random" but some methods are not using
any randomization.  Most methods are only used in unit tests.

(4)
o.a.c.m.ga.RandomProviderManager

As already discussed, this class should not be part of the public
API, namely because the "getRandomProvider()" method returns
an object that is not thread-safe.
If used internally as "syntactic sugar", it should be located in a
package named "internal"; however I'd tend to remove it
altogether, and call "ThreadLocalRandomSource.current(...)"
explicitly.

(5)
Why does a "Chromosome" need an "identifier"?
Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
that is an internal class, where it seems that the chromosome itself
(rather than its "id") could serve as the map's key.

(6)
o.a.c.m.ga.chromsome.AbstractChromosome

Field "fitness" is not "final", yet it could be: a "FitnessFunction"
object (used in "evaluate() to compute that field) is passed to the
constructor.  Is there a reason for the "lazy" evaluation?
Dropping it would make the instance immutable (and "evaluate()"
should be renamed to "getFitness()").

Why should the "FitnessFunction" be stored in every chromosome?

(7)
Spurious "@since" tags: In the new code (in "commons-math-ga"
module), none should refer to a version < 4.0.

(8)
@SuppressWarnings("unchecked")

By default, I'm a bit suspicious about having to resort to these annotations,
especially for the kind of algorithms we are trying to implement.
What do you think of the alternative approach outlined in the ZIP file
attached in MATH-1618:
https://issues.apache.org/jira/browse/MATH-1618
?

(9)
Naming of factory methods should be harmonized to match the convention
adopted in components like [RNG] and [Numbers].
E.g. instead of "newChromosome(...)", please use "of(...)" or "from(...)"
for "value object", and "create(...)" otherwise.

(10)
o.a.c.m.ga.chromosome.AbstractListChromosome

Constructor is called with an argument that is a previously instantiated
"representation".  If the latter is mutable, the caller will be able to modify
the underlying data structure of the newly created chromosome.  [The
doc assumes immutability of the representation but this cannot be
enforced, and mixed ownership can entail subtle bugs.]

(11)
Do we agree that, in a GA, the most time-consuming task is the fitness
computation?  Hence IMO, it should be the focus of the multithreading
tools (i.e. "ExecutorService"), probably keeping the other parts (namely
the genetic operators) within a simple sequential loop (as in class
"GeneticAlgorithmFactory" in MATH-1618).

To be continued...

Regards,
Gilles

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [Math] Review of "genetic algorithm" module

2022-04-13 Thread Gilles Sadowski
Hello.

> [...]
> -- Created a new PR https://github.com/apache/commons-math/pull/209.

Merged in branch "feature__MATH-1563__genetic_algorithm".

> > [...]

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [Math] Review of "genetic algorithm" module

2022-04-11 Thread Gilles Sadowski
Hello.

Le lun. 11 avr. 2022 à 07:36, Avijit Basak  a écrit :
>
> [...]
>
> >I can't seem to find how the logger is configured.  Currently, all
> >"INFO" messages are logged to the "standard error" console; one
> >should be able to e.g. redirect output to a file, or set the log level.
> -- There is a dependency for "slf4j-simple" in the "examples-ga" module
> which would print all log messages in the console.

Currently; it prints everything on "stderr" (so that simple usage of
the UNIX shell's "pipe" syntax is not possible).

> There was a discussion
> earlier on this and the decision was to use console for example modules for
> log messages.

I recall that we want a separation between the logger API (library
dependency) and logger implementation ("examples" dependency).

Is it a feature of "slf4j-simple" to only print to "stderr" or can it
be configured?
There is no issue on the "examples" module to depend on a
full-featured logger (primary candidate would be "Log4j2").

> Library users would add the implementation of slf4j according
> to their need and provide required configurations to control the log levels
> and messages.

We (developers) also want minimal flexibility for testing purposes...

Also, IMO "examples" codes should make a difference
between "logging" (optional information about internals)
and "output" (mandatory result).
For example, the TSP application result (itinerary and travel
distance) could be saved in a file whose name is given as
argument to a mandatory "--output" command-line option, as
in module "commons-math-examples/examples-sofm/tsp".

> >
> >There is still a mix between library code and application code (but
> >this is to be discussed in MATH-1643.
> -- Kindly update the JIRA with your suggestions.

Sure, we'll come to that.  But I think we should first converge to
a codebase that can be merged to "master" as a replacement of
the "o.a.c.math4.genetics" package (to be removed).
Do you confirm that all the unit tests in that package have an
equivalent in the new "commons-math-ga" module?

> >From browsing the library code, I'm tempted to believe that the
> >dependency towards a logging framework is not necessary (or
> >underused).
> -- Do you mean dependency on the slf4j?

Yes.

>   I think that such a feature could be left to the application
> >layer (per the "ConvergenceListener" registry).
> -- ConvergenceListener is only to log the nature of convergence,
> not internal details of all operators. Currently we have a very low number
> of log messages which can be increased later.

"Increased later" for what purpose?
Logging can be very useful (e.g. for debugging); however once
this new implementation is trusted to behave correctly, I don't see
that additional logging statements inside the library will be very
useful; they would rather be needed by the application in order to
e.g. display the content of the current "Population".  The "data" (to
be eventually displayed) would be provided by the library through
regular accessors.
Another case in point is that the library cannot know what display
format is suitable for the application (e.g. a GUI).

If we really need (not clear as of the current codebase) to provide
runtime access to other parts of the library (operators, etc.), why
not generalise the "observer" pattern?

>
> >Likewise, the "PopulationStatisticsLogger" is not general enough to
> >be worth being part of the library.
> -- I would love to keep this simple implementation of convergence listener.
> This can provide a very basic overview on the nature of convergence.

As a user, you'll be able to provide the feature to your application
with less than 10 lines of code (by registering a "listener").
Everyone else might want a slightly different output (contents and/or
format) than what this class generates.

> >A few (nit-pick) remarks about code style in general.
> >Javadoc is incomplete: All methods must be documented.
> >Please avoid redundant links like e.g.
> >---CUT---
> >/**
> > * @param crossoverPolicy  The {@link CrossoverPolicy}
> > * @param mutationPolicy   The {@link MutationPolicy}
> > * @param selectionPolicy  The {@link SelectionPolicy}
> > * @param convergenceListeners An optional collection of
> > * {@link ConvergenceListener} with
> >variable arity
> > */
> >@SafeVarargs
> >protected AbstractGeneticAlgorithm(final CrossoverPolicy
> crossoverPolicy,
> >final MutationPolicy mutationPolicy,
> >final SelectionPolicy selectionPolicy,
> >ConvergenceListener... convergenceListeners) {
> >this.crossoverPolicy = crossoverPolicy;
> >this.mutationPolicy = mutationPolicy;
> >this.selectionPolicy = selectionPolicy;
> >updateListenerRigistry(convergenceListeners);
> >}
> >---CUT---
> >Readers of the HTML-generated doc can already click on the various
> >arguments within the signature; so there is no need to add visual noi

Re: [Math] Review of "genetic algorithm" module

2022-04-10 Thread Avijit Basak
Hi All

  Please find my comments below:

>I've merged PR #208 into the feature branch (please open a
>new one for changes entailed by the comments below).
>I again had to delete the branch (and recreate it with the merged
>changes from PR #208).  [I must be missing something about the
>correct git workflow...]
-- I have taken the latest code from apache repo.

>There seems to be something wrong in the "examples-ga-tsp"
>application (fitness does not change).
-- I have fixed the issue in fitness function calculation.

>At the end of the run, one should be able to quickly assess the
>goodness of the solution; the new code prints a line with many
>"Node [...]" elements while the "--legacy" switch prints the "best"
>fitness and a list of indices.  In either case, the solution should
>consist of the list of visited cities (one per line) and the total
>distance.
-- I have made the changes.

>I can't seem to find how the logger is configured.  Currently, all
>"INFO" messages are logged to the "standard error" console; one
>should be able to e.g. redirect output to a file, or set the log level.
-- There is a dependency for "slf4j-simple" in the "examples-ga" module
which would print all log messages in the console. There was a discussion
earlier on this and the decision was to use console for example modules for
log messages. Library users would add the implementation of slf4j according
to their need and provide required configurations to control the log levels
and messages.

>
>There is still a mix between library code and application code (but
>this is to be discussed in MATH-1643.
-- Kindly update the JIRA with your suggestions.

>From browsing the library code, I'm tempted to believe that the
>dependency towards a logging framework is not necessary (or
>underused).
-- Do you mean dependency on the slf4j?

  I think that such a feature could be left to the application
>layer (per the "ConvergenceListener" registry).
-- ConvergenceListener is only to log the nature of convergence,
not internal details of all operators. Currently we have a very low number
of log messages which can be increased later.

>Likewise, the "PopulationStatisticsLogger" is not general enough to
>be worth being part of the library.
-- I would love to keep this simple implementation of convergence listener.
This can provide a very basic overview on the nature of convergence.

>A few (nit-pick) remarks about code style in general.
>Javadoc is incomplete: All methods must be documented.
>Please avoid redundant links like e.g.
>---CUT---
>/**
> * @param crossoverPolicy  The {@link CrossoverPolicy}
> * @param mutationPolicy   The {@link MutationPolicy}
> * @param selectionPolicy  The {@link SelectionPolicy}
> * @param convergenceListeners An optional collection of
> * {@link ConvergenceListener} with
>variable arity
> */
>@SafeVarargs
>protected AbstractGeneticAlgorithm(final CrossoverPolicy
crossoverPolicy,
>final MutationPolicy mutationPolicy,
>final SelectionPolicy selectionPolicy,
>ConvergenceListener... convergenceListeners) {
>this.crossoverPolicy = crossoverPolicy;
>this.mutationPolicy = mutationPolicy;
>this.selectionPolicy = selectionPolicy;
>updateListenerRigistry(convergenceListeners);
>}
>---CUT---
>Readers of the HTML-generated doc can already click on the various
>arguments within the signature; so there is no need to add visual noise
>in the source code just to be able to click from within the Javadoc part
>just above that signature.
>The Javadoc block above should be
>---CUT---
>/**
> * @param crossoverPolicy Crossover policy.
> * @param mutationPolicy Mutation policy.
> * @param selectionPolicy Selection policy.
> * @param convergenceListeners Collection of user-defined listeners.
> */
>---CUT---
>[Note the absence of "The" and the presence of a final "period".]
>
-- Are we following this standard in the commons-math project? This would
initiate changes in almost all classes.

>A blank line is welcome to separate ideas ("logical" blocks of code)
>However, there should not be an empty line after a closing brace if
>it is followed by another closing brace.
>Also, in all recent codes, there is no blank line between the instance
>fields; the (mandatory) Javadoc is enough to logically (and visually)
>separate the fields.
-- I have rectified this.

>
>In "AbstractGeneticAlgorithm":
>* There should be a single constructor (handling default values should
>   be left to the application layer).  [This would allow the removal of
>   "updateListenerRigistry" method (note: There is a typo in that name).]
--This is corrected.

>* Are annotations (@SafeVarargs, ...) necessary?  Please document.
-- This annotation is necessary for any parameterized vararg. This is also
used in legacy classes like o.a.c.m.l.a.i.FieldHermiteInterpolator and
o.a.c.m.l.o.n.RungeKuttaFie

Re: [Math] Review of "genetic algorithm" module

2022-04-03 Thread Gilles Sadowski
Hello.

Le mar. 29 mars 2022 à 17:08, Avijit Basak  a écrit :
>
> Hi All
>
>  Please find my comments below.
>
> [...]
>
> --I have made the changes and created a new PR. Kindly review the same and
> share your thoughts.
> https://github.com/apache/commons-math/pull/208

I've merged PR #208 into the feature branch (please open a
new one for changes entailed by the comments below).
I again had to delete the branch (and recreate it with the merged
changes from PR #208).  [I must be missing something about the
correct git workflow...]

There seems to be something wrong in the "examples-ga-tsp"
application (fitness does not change).

At the end of the run, one should be able to quickly assess the
goodness of the solution; the new code prints a line with many
"Node [...]" elements while the "--legacy" switch prints the "best"
fitness and a list of indices.  In either case, the solution should
consist of the list of visited cities (one per line) and the total
distance.

I can't seem to find how the logger is configured.  Currently, all
"INFO" messages are logged to the "standard error" console; one
should be able to e.g. redirect output to a file, or set the log level.

There is still a mix between library code and application code (but
this is to be discussed in MATH-1643.

>From browsing the library code, I'm tempted to believe that the
dependency towards a logging framework is not necessary (or
underused).  I think that such a feature could be left to the application
layer (per the "ConvergenceListener" registry).
Likewise, the "PopulationStatisticsLogger" is not general enough to
be worth being part of the library.

A few (nit-pick) remarks about code style in general.
Javadoc is incomplete: All methods must be documented.
Please avoid redundant links like e.g.
---CUT---
/**
 * @param crossoverPolicy  The {@link CrossoverPolicy}
 * @param mutationPolicy   The {@link MutationPolicy}
 * @param selectionPolicy  The {@link SelectionPolicy}
 * @param convergenceListeners An optional collection of
 * {@link ConvergenceListener} with
variable arity
 */
@SafeVarargs
protected AbstractGeneticAlgorithm(final CrossoverPolicy crossoverPolicy,
final MutationPolicy mutationPolicy,
final SelectionPolicy selectionPolicy,
ConvergenceListener... convergenceListeners) {
this.crossoverPolicy = crossoverPolicy;
this.mutationPolicy = mutationPolicy;
this.selectionPolicy = selectionPolicy;
updateListenerRigistry(convergenceListeners);
}
---CUT---
Readers of the HTML-generated doc can already click on the various
arguments within the signature; so there is no need to add visual noise
in the source code just to be able to click from within the Javadoc part
just above that signature.
The Javadoc block above should be
---CUT---
/**
 * @param crossoverPolicy Crossover policy.
 * @param mutationPolicy Mutation policy.
 * @param selectionPolicy Selection policy.
 * @param convergenceListeners Collection of user-defined listeners.
 */
---CUT---
[Note the absence of "The" and the presence of a final "period".]

A blank line is welcome to separate ideas ("logical" blocks of code)
However, there should not be an empty line after a closing brace if
it is followed by another closing brace.
Also, in all recent codes, there is no blank line between the instance
fields; the (mandatory) Javadoc is enough to logically (and visually)
separate the fields.

In "AbstractGeneticAlgorithm":
* There should be a single constructor (handling default values should
   be left to the application layer).  [This would allow the removal of
   "updateListenerRigistry" method (note: There is a typo in that name).]
* Are annotations (@SafeVarargs, ...) necessary?  Please document.

In "AdaptiveGeneticAlgorithm":
* There should be a single constructor (same remark as above).
* Why the use of reflection ("isAssignableFrom")?

Regards,
Gilles

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [Math] Review of "genetic algorithm" module

2022-03-29 Thread Avijit Basak
Hi All

 Please find my comments below.

[...]
>Just quickly commenting on this point.

>IIUC, your purpose is for users to be able to run (an example
>application of) the old implementation.
>
>This can be achieved by having all the "legacy" codes within
>module
>  commons-math-examples/examples-ga/examples-ga-math-functions
>(note: No "legacy" in the module's name), within a dedicated
>  o.a.c.m.examples.ga.mathfunctions.legacy
>package.
>
>This code is then called by the exact same code/application as
>for the new implementation (with the corresponding command
>line switch):
>  $ java -jar examples-ga-app.jar --legacy ... rest of the args ...
>
>Users can thus perform 2 runs; once with "--legacy" and one
>without it, and reach some conclusions.
>
>The duplicate codes only bring maintenance burden (to ensure
>that the "legacy" and non-"legacy" modules do indeed aim at
>solving the same problem).
>Whenever we then decide that the new code has been thoroughly
>tested, removal of the
>  o.a.c.m.examples.ga.mathfunctions.legacy
>package will be a minimal change (as compared to the removal
>of a module)

--I have made the changes and created a new PR. Kindly review the same and
share your thoughts.
*https://github.com/apache/commons-math/pull/208
*


Thanks & Regards
--Avijit Basak



On Mon, 28 Mar 2022 at 18:36, Gilles Sadowski  wrote:

> Hello.
>
> Le lun. 28 mars 2022 à 10:15, Avijit Basak  a
> écrit :
> >
> > [...]
> >
> > >The various "Standalone" classes also look quite similar; consolidating
> the
> > >"examples-ga" module (including full Javadoc) is necessary.
> > -- Could you please elaborate it more. IMHO as StandAlone classes are
> > dedicated to the specific module only, it would remain separate. Since we
> > have used a single domain to show utility of the different
> > types(adaptive/simple) of GA some classes have become similar.
> >
> > >I still don't
> > >understand why there are "...-legacy" modules in module "examples-ga".
> > >If you want to offer the option of running the "old" implementation, you
> > >could add a "legacy" flag (as "@Option" in the "Standalone"
> application).
> > -- There was a discussion on this some time back. The sole purpose of
> > keeping the legacy example module is for comparison with the new
> > implementation. It will be easier for anyone to visualize the quality
> > improvement we achieved here. I don't want to mix(by legacy flag) this
> > anyway with the new implementation.
> >
>
> Just quickly commenting on this point.
>
> IIUC, your purpose is for users to be able to run (an example
> application of) the old implementation.
>
> This can be achieved by having all the "legacy" codes within
> module
>   commons-math-examples/examples-ga/examples-ga-math-functions
> (note: No "legacy" in the module's name), within a dedicated
>   o.a.c.m.examples.ga.mathfunctions.legacy
> package.
>
> This code is then called by the exact same code/application as
> for the new implementation (with the corresponding command
> line switch):
>   $ java -jar examples-ga-app.jar --legacy ... rest of the args ...
>
> Users can thus perform 2 runs; once with "--legacy" and one
> without it, and reach some conclusions.
>
> The duplicate codes only bring maintenance burden (to ensure
> that the "legacy" and non-"legacy" modules do indeed aim at
> solving the same problem).
> Whenever we then decide that the new code has been thoroughly
> tested, removal of the
>   o.a.c.m.examples.ga.mathfunctions.legacy
> package will be a minimal change (as compared to the removal
> of a module).
>
> Regards,
> Gilles
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>

-- 
Avijit Basak


Re: [Math] Review of "genetic algorithm" module

2022-03-28 Thread Gilles Sadowski
Hello.

Le lun. 28 mars 2022 à 10:15, Avijit Basak  a écrit :
>
> [...]
>
> >The various "Standalone" classes also look quite similar; consolidating the
> >"examples-ga" module (including full Javadoc) is necessary.
> -- Could you please elaborate it more. IMHO as StandAlone classes are
> dedicated to the specific module only, it would remain separate. Since we
> have used a single domain to show utility of the different
> types(adaptive/simple) of GA some classes have become similar.
>
> >I still don't
> >understand why there are "...-legacy" modules in module "examples-ga".
> >If you want to offer the option of running the "old" implementation, you
> >could add a "legacy" flag (as "@Option" in the "Standalone" application).
> -- There was a discussion on this some time back. The sole purpose of
> keeping the legacy example module is for comparison with the new
> implementation. It will be easier for anyone to visualize the quality
> improvement we achieved here. I don't want to mix(by legacy flag) this
> anyway with the new implementation.
>

Just quickly commenting on this point.

IIUC, your purpose is for users to be able to run (an example
application of) the old implementation.

This can be achieved by having all the "legacy" codes within
module
  commons-math-examples/examples-ga/examples-ga-math-functions
(note: No "legacy" in the module's name), within a dedicated
  o.a.c.m.examples.ga.mathfunctions.legacy
package.

This code is then called by the exact same code/application as
for the new implementation (with the corresponding command
line switch):
  $ java -jar examples-ga-app.jar --legacy ... rest of the args ...

Users can thus perform 2 runs; once with "--legacy" and one
without it, and reach some conclusions.

The duplicate codes only bring maintenance burden (to ensure
that the "legacy" and non-"legacy" modules do indeed aim at
solving the same problem).
Whenever we then decide that the new code has been thoroughly
tested, removal of the
  o.a.c.m.examples.ga.mathfunctions.legacy
package will be a minimal change (as compared to the removal
of a module).

Regards,
Gilles

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [Math] Review of "genetic algorithm" module

2022-03-28 Thread Avijit Basak
Hi All

   Please find my comments.

[...]
>I don't think that it's the right way to go; instantiating an
"ExecutorService"
>belongs to the GA application, not the GA library (whose relevant classes
>need "only" be thread-safe).
>There is some misunderstanding to be clarified in a dedicated discussion
>(please file a new JIRA ticket).
-- I have created a subtask under the same Jira(MATH-1563). Please share
your thoughts.
https://issues.apache.org/jira/browse/MATH-1643

>Side note: Conflicts and duplicate commits have accumulated in the
>dedicated "feature__MATH-1563__genetic_algorithm" branch.
>I did not know how to proceed in order to avoid ending up with a messy
>history in "master"; so I created a new branch (with the same name) with
>all the new GA-related files added as a single commit.
>Currently, this branch (based on your PR #205) fails the default goal,
>because of a CheckStyle issue.  You shoudl always check locally that
>running "mvn" without arguments does not generate any errors.

>I also noticed that classes in "examples-ga" use "forbidden" library
>classes: "GeneticIllegalArgumentException" is an "internal" class; we
>must not advertize such classes in the example applications.
-- I have replaced the "GeneticIllegalArgumentException" by
"IllegalArgumentException".

>In general, it seems that "examples-ga" contains several classes and
>methods that do not need to be "public".  This is especially true for
>classes like "MathFunction" and "Coordinate".  [Having those "private"
>helps users to tell what is part of the library's functionality from what
is
>just "dummy" placeholder code.]
-- I have replaced the MathFunction and CoordinateDecoder with lambda.
However, the Coordinate class is a domain object (phenotype). So this needs
to remain public. This can be used in more than one place for the entire
application.

>Finally (for now), I've just noticed that there exist several classes named
>"MathFunction", with same implementation!
>Code duplication must be avoided, especially where we purport to display
>best practices.
-- As mentioned above this has been removed.

>The various "Standalone" classes also look quite similar; consolidating the
>"examples-ga" module (including full Javadoc) is necessary.
-- Could you please elaborate it more. IMHO as StandAlone classes are
dedicated to the specific module only, it would remain separate. Since we
have used a single domain to show utility of the different
types(adaptive/simple) of GA some classes have become similar.

>I still don't
>understand why there are "...-legacy" modules in module "examples-ga".
>If you want to offer the option of running the "old" implementation, you
>could add a "legacy" flag (as "@Option" in the "Standalone" application).
-- There was a discussion on this some time back. The sole purpose of
keeping the legacy example module is for comparison with the new
implementation. It will be easier for anyone to visualize the quality
improvement we achieved here. I don't want to mix(by legacy flag) this
anyway with the new implementation.

>Please use the new branch for all these ("cleanup") changes, as the basis
>a PR (with a *single* commit).
-- I have taken the changes and will create a new PR soon with all my
changes.


Thanks & Regards
--Avijit Basak

On Sun, 13 Mar 2022 at 06:39, Gilles Sadowski  wrote:

> Hello.
>
> Le lun. 28 févr. 2022 à 07:11, Avijit Basak  a
> écrit :
> >
> > Hi All
> >
> > Please see my comments below.
> >
> > > [...]
> > >I just had a very quick look.
> > >IIUC, you always provide "convenience" methods (e.g. the various
> > >signatures for the "evolve" functionality).
> > >Prior to merging into "master", we should simplify and limit the
> > >discussion to the core functionality, i.e. not try and make decisions
> > >for the user (like default values, ...).  Please keep the API as simple
> > >as possible
> > -- I have removed the mentioned evolve method.
> > However, I had to catch two checked exceptions (InterruptedException,
> > ExecutionException) and rethrow them. As of now I have handled them using
> > the GeneticIllegalArgumentException. I think we need to introduce another
> > exception class to handle this. Please share your thought regarding this.
>
> I don't think that it's the right way to go; instantiating an
> "ExecutorService"
> belongs to the GA application, not the GA library (whose relevant classes
> need "only" be thread-safe).
> There is some misunderstanding to be clarified in a dedicated discussion
> (please file a new JIRA ticket).
>
> Side note: Conflicts and duplicate commits have accumulated in the
> dedicated "feature__MATH-1563__genetic_algorithm" branch.
> I did not know how to proceed in order to avoid ending up with a messy
> history in "master"; so I created a new branch (with the same name) with
> all the new GA-related files added as a single commit.
>
> Currently, this branch (based on your PR #205) fails the default goal,
> because of a CheckStyle issue.  Yo

Re: [Math] Review of "genetic algorithm" module

2022-03-12 Thread Gilles Sadowski
Hello.

Le lun. 28 févr. 2022 à 07:11, Avijit Basak  a écrit :
>
> Hi All
>
> Please see my comments below.
>
> > [...]
> >I just had a very quick look.
> >IIUC, you always provide "convenience" methods (e.g. the various
> >signatures for the "evolve" functionality).
> >Prior to merging into "master", we should simplify and limit the
> >discussion to the core functionality, i.e. not try and make decisions
> >for the user (like default values, ...).  Please keep the API as simple
> >as possible
> -- I have removed the mentioned evolve method.
> However, I had to catch two checked exceptions (InterruptedException,
> ExecutionException) and rethrow them. As of now I have handled them using
> the GeneticIllegalArgumentException. I think we need to introduce another
> exception class to handle this. Please share your thought regarding this.

I don't think that it's the right way to go; instantiating an "ExecutorService"
belongs to the GA application, not the GA library (whose relevant classes
need "only" be thread-safe).
There is some misunderstanding to be clarified in a dedicated discussion
(please file a new JIRA ticket).

Side note: Conflicts and duplicate commits have accumulated in the
dedicated "feature__MATH-1563__genetic_algorithm" branch.
I did not know how to proceed in order to avoid ending up with a messy
history in "master"; so I created a new branch (with the same name) with
all the new GA-related files added as a single commit.

Currently, this branch (based on your PR #205) fails the default goal,
because of a CheckStyle issue.  You shoudl always check locally that
running "mvn" without arguments does not generate any errors.

I also noticed that classes in "examples-ga" use "forbidden" library
classes: "GeneticIllegalArgumentException" is an "internal" class; we
must not advertize such classes in the example applications.
In general, it seems that "examples-ga" contains several classes and
methods that do not need to be "public".  This is especially true for
classes like "MathFunction" and "Coordinate".  [Having those "private"
helps users to tell what is part of the library's functionality from what is
just "dummy" placeholder code.]

Finally (for now), I've just noticed that there exist several classes named
"MathFunction", with same implementation!
Code duplication must be avoided, especially where we purport to display
best practices.
The various "Standalone" classes also look quite similar; consolidating the
"examples-ga" module (including full Javadoc) is necessary.  I still don't
understand why there are "...-legacy" modules in module "examples-ga".
If you want to offer the option of running the "old" implementation, you
could add a "legacy" flag (as "@Option" in the "Standalone" application).

Please use the new branch for all these ("cleanup") changes, as the basis
a PR (with a *single* commit).  Thanks.

Regards,
Gilles

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [Math] Review of "genetic algorithm" module

2022-02-27 Thread Avijit Basak
Hi All

Please see my comments below.

> [...]
>I just had a very quick look.
>IIUC, you always provide "convenience" methods (e.g. the various
>signatures for the "evolve" functionality).
>Prior to merging into "master", we should simplify and limit the
>discussion to the core functionality, i.e. not try and make decisions
>for the user (like default values, ...).  Please keep the API as simple
>as possible
-- I have removed the mentioned evolve method.
However, I had to catch two checked exceptions (InterruptedException,
ExecutionException) and rethrow them. As of now I have handled them using
the GeneticIllegalArgumentException. I think we need to introduce another
exception class to handle this. Please share your thought regarding this.

Thanks & Regards
--Avijit Basak



On Mon, 21 Feb 2022 at 20:11, Gilles Sadowski  wrote:

> Hello.
>
> Le lun. 21 févr. 2022 à 06:56, Avijit Basak  a
> écrit :
> >
> > Hi All
> >
> > Please find my comments below:
> >
> > [...]
> > >
> > >Another misunderstanding (probably); we must figure out where
> > >the parallelism will be implemented.
> > >IIUC the current state of the code, optimizing multiple populations
> > >in parallel would be the same as launching multiple JVMs; I'd want
> > >to explore low-level parallelism (i.e. at the "Chromosome" level).
> > -- I have implemented both muti-threading and multi-population
> parallelism.
>
> I just had a very quick look.
> IIUC, you always provide "convenience" methods (e.g. the various
> signatures for the "evolve" functionality).
> Prior to merging into "master", we should simplify and limit the
> discussion to the core functionality, i.e. not try and make decisions
> for the user (like default values, ...).  Please keep the API as simple
> as possible.
>
> Thanks,
> Gilles
>
>  [...]
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>

-- 
Avijit Basak


Re: [Math] Review of "genetic algorithm" module

2022-02-21 Thread Gilles Sadowski
Hello.

Le lun. 21 févr. 2022 à 06:56, Avijit Basak  a écrit :
>
> Hi All
>
> Please find my comments below:
>
> [...]
> >
> >Another misunderstanding (probably); we must figure out where
> >the parallelism will be implemented.
> >IIUC the current state of the code, optimizing multiple populations
> >in parallel would be the same as launching multiple JVMs; I'd want
> >to explore low-level parallelism (i.e. at the "Chromosome" level).
> -- I have implemented both muti-threading and multi-population parallelism.

I just had a very quick look.
IIUC, you always provide "convenience" methods (e.g. the various
signatures for the "evolve" functionality).
Prior to merging into "master", we should simplify and limit the
discussion to the core functionality, i.e. not try and make decisions
for the user (like default values, ...).  Please keep the API as simple
as possible.

Thanks,
Gilles

 [...]

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [Math] Review of "genetic algorithm" module

2022-02-21 Thread Gilles Sadowski
Hello.

Le lun. 21 févr. 2022 à 06:56, Avijit Basak  a écrit :
>
> Hi All
>
> Please find my comments below:
>
> >The build fails because of CheckStyle errors:
> >https://app.travis-ci.com/github/apache/commons-math/builds/246683712
> --Fixed the issues

Could you please squash all the new commits?
[Of course, it's great to spell out the various changes, in the
"long" description of that commit.  Note that this grouping is
in contrast to what would be done in "master", where each
type of change should have its own commit.  Here the
intention is to keep the history clean until the agreed-on
code is merged to "master".]

Thanks,
Gilles

> [...]

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [Math] Review of "genetic algorithm" module

2022-02-20 Thread Avijit Basak
Hi All

Please find my comments below:

>The build fails because of CheckStyle errors:
>https://app.travis-ci.com/github/apache/commons-math/builds/246683712
--Fixed the issues

 [...]
>> >> >
>> >> >I did not suggest to remove any Javadoc, only to rephrase it as:
[...]
>> >As hinted by my comment is the previous message, I've still to
>> >clarify my own expectations; but I vaguely sense some lost
>> >opportunity for simpler usage simpler and increased performance
>> >through the caller just needs to specify the number of "worker
>> >threads".
>> -- We should have both options. Users can execute the algorithm by
>> specifying only the number of worker threads with a single population as
>> well as optimize multiple populations in a parallel fashion.
>
>Another misunderstanding (probably); we must figure out where
>the parallelism will be implemented.
>IIUC the current state of the code, optimizing multiple populations
>in parallel would be the same as launching multiple JVMs; I'd want
>to explore low-level parallelism (i.e. at the "Chromosome" level).
-- I have implemented both muti-threading and multi-population parallelism.

>
>> For the multi-population option the common thread pool would be reused
for
>> all populations.
>> >
>> >Do we at least agree that
>> >1. Adding/retrieving a "Chromosome" to/from a "Population"
>> >must be thread-safe (and is not trivial)
>> >2. Fitness computation is where most time is usually spent
>> >(so that multi-threading must be achieved at that granularity)
>> >?
>> --The way I am thinking of designing a task is that it should accept the
>> current population and return an instance of ChromosomePair.
>> The chromosomes within this pair would be added to the population by the
>> caller thread. Population won't be updated by multiple threads.
>
>Then, it would not be a multi-threaded library.
>This kind of parallelism does not need support from the library,
>and can be implemented at the application level.
>
>> The code snippet below shows the body of the method which will be
executed
>> inside the task.
>> --CUT--
>>
>> //selection
>> ChromosomePair pair = getSelectionPolicy().select(current);
>>
>> // crossover
>> if (randGen.nextDouble() < getCrossoverRate()) {
>> // apply crossover policy to create two offspringoport
>> pair = getCrossoverPolicy().crossover(pair.getFirst(), pair.getSecond());
>> }
>>
>> // mutation
>> if (randGen.nextDouble() < getMutationRate()) {
>> // apply mutation policy to the chromosomes
>> pair = new ChromosomePair(
>> getMutationPolicy().mutate(pair.getFirst()),
>> getMutationPolicy().mutate(pair.getSecond()));
>> }
>>
>> return pair;
>>
>> --CUT--
>
>One of the issue with above code is, again, that "randGen"
>must be thread-safe (and it is not, usually).
>Also, it doesn't say anything about how to ensure that
>the fitness computation is thread-safe (and if you assume
>that it will be computed outside that "task", then the
>performance gain will be very low).
>
-- Current implementation is using a Thread local version of random number
generator.

>> >
>> >I'd surmise that "multiple instances of AbstractGeneticAlgorithm"
>> >is an application concern; unless I'm missing something, it's
>> >not what I've in mind when talking about multi-threading.
>> >Actually, I was wondering whether we could implement the
>> >analog of what is in the "commons-math-neuralnet" module,
>> >where
>> >* "Neuron" is the counterpart "Chromosome"
>> >* "Network" is the counterpart of "Population".
>> --"multiple instance of AbstractGeneticAlgorithm" is related to parallel
GA
>> with multiple populations not multi-threading.
>
>Yes, as I also mentioned above.
>But I'm interested in where multi-threading can be implemented to
>be used in both cases (single population and multiple populations).
>
>> Users can also implement parallel GA in a synchronous manner although
that
>> won't be a recommended way.
>> Multi-threading is only a way to improve performance using a user's multi
>> core CPU.
>> The threads in the thread pool would only be used to execute the task as
>> mentioned in the previous comment.
>
>That's where I've some doubt.
>But be free to implement benchmarks that demonstrate the
>expected performance improvement.
>
>> I think we have some misunderstanding over here. It is better to do an
>> implementation first and start the discussion.
>> It would be more productive.
>
>Agreed. ;-)

--I have checked-in the code for multi-threading along with parallel GA.
Kindly review and share your thoughts.


Thanks & Regards
--Avijit Basak

On Sun, 20 Feb 2022 at 05:25, Gilles Sadowski  wrote:

> Hello.
>
> Le ven. 18 févr. 2022 à 11:32, Avijit Basak  a
> écrit :
> >
> > Hi All
> >
> > Please see my comments below.
> >
> >> [...]
> >
> > --I have modified the exception class name to
> > GeneticIllegalArgumentException.
>
> The build fails because of CheckStyle errors:
> https://app.travis-ci.com/github/apache/commons-math/builds/246683712
>
> >>> [..

Re: [Math] Review of "genetic algorithm" module

2022-02-19 Thread Gilles Sadowski
Hello.

Le ven. 18 févr. 2022 à 11:32, Avijit Basak  a écrit :
>
> Hi All
>
> Please see my comments below.
>
>> [...]
>
> --I have modified the exception class name to
> GeneticIllegalArgumentException.

The build fails because of CheckStyle errors:
https://app.travis-ci.com/github/apache/commons-math/builds/246683712

>>> [...]
> >> >
> >> >I did not suggest to remove any Javadoc, only to rephrase it as:
> >> >---CUT---
> >> >/** "Message template". */
> >> >---CUT---
> >> >
> >> -- Just for confirmation. Do you need the comment to be changed from
> >> ---CUT---
> >> /** Error message for "out of range" condition. */
> >> ---CUT---
> >> *to*
> >> ---CUT---
> >> /** "out of range" */
> >> ---CUT---
> >
> >No, sorry, I just meant that e.g.
> >---CUT---
> >/** Message template. */
> >private static final String OUT_OF_RANGE = "Out of range: min={0} max={1}";
> >---CUT---
> >is sufficient. Since the field is private, the comment will mostly ever
> >be read while looking at the source code.
>
> --I think we have a misunderstanding here. The field is public, not
> private.

Sorry, my mistake.

> Will a hard coded "Message Template" comment be sufficient?

You are right, better be somewhat more specific.
However, it would even better if we can manage to keep
the internal classes private. [This can be explored further
when the design has stabilized.]

>
> >>   JDK(*ProviderBuilder.RandomSourceInternal.JDK*),
> >> ...
> >> }
> >> ---CUT---
> >> I am not sure how to achieve this for our algorithm classes.
> >
> >I'm a bit lost here; I don't get the relationship of this with my
> >remark above (constant vs adaptive)...  Maybe you could file
> >JIRA report to clarify and further discussion?
>
> --Let me try to clarify my original understanding. I think the proposal is
> to introduce an Enum and declare each type of algorithm classes as
> instances of enum. So it might follow this template:
> --CUT--
> enum GeneticAlgorithms {
> SGA([args...]),//Simple genetic algorithm equivalent to
> GeneticAlgorithm class
> AGA([args...]) //Adaptive genetic algorithm.
> }
> --CUT--
> Where [args...] are selection, crossover, mutation operators and
> probalities.

I didn't mean anything like that; AFAICT there may have
been some mixing between two comments. ;-)

>
> As the AdaptiveGeneticAlgorithm is a generalization of GeneticAlgorithm it
> would be possible to represent the later by former using constant crossover
> and mutation rate generators.
> But the problem lies in a different area.
> In enumeration the required arguments ([args...]) need to be specified
> during the declaration using constant values.
> But for our implementation those arguments need to be provided by the
> client programs. I don't know how to achieve that using enum.
> Let me know if you have any different views regarding this.

We can revisit this later.

> >> [...]
> >
> >As hinted by my comment is the previous message, I've still to
> >clarify my own expectations; but I vaguely sense some lost
> >opportunity for simpler usage simpler and increased performance
> >through the caller just needs to specify the number of "worker
> >threads".
> -- We should have both options. Users can execute the algorithm by
> specifying only the number of worker threads with a single population as
> well as optimize multiple populations in a parallel fashion.

Another misunderstanding (probably); we must figure out where
the parallelism will be implemented.
IIUC the current state of the code, optimizing multiple populations
in parallel would be the same as launching multiple JVMs; I'd want
to explore low-level parallelism (i.e. at the "Chromosome" level).

> For the multi-population option the common thread pool would be reused for
> all populations.
> >
> >Do we at least agree that
> >1. Adding/retrieving a "Chromosome" to/from a "Population"
> >must be thread-safe (and is not trivial)
> >2. Fitness computation is where most time is usually spent
> >(so that multi-threading must be achieved at that granularity)
> >?
> --The way I am thinking of designing a task is that it should accept the
> current population and return an instance of ChromosomePair.
> The chromosomes within this pair would be added to the population by the
> caller thread. Population won't be updated by multiple threads.

Then, it would not be a multi-threaded library.
This kind of parallelism does not need support from the library,
and can be implemented at the application level.

> The code snippet below shows the body of the method which will be executed
> inside the task.
> --CUT--
>
> //selection
> ChromosomePair pair = getSelectionPolicy().select(current);
>
> // crossover
> if (randGen.nextDouble() < getCrossoverRate()) {
> // apply crossover policy to create two offspringoport
> pair = getCrossoverPolicy().crossover(pair.getFirst(), pair.getSecond());
> }
>
> // mutation
> if (randGen.nextDouble() < getMutationRate()) {
> // apply mutation policy to the chromosomes
> pair = new C

Re: [Math] Review of "genetic algorithm" module

2022-02-18 Thread Avijit Basak
Hi All

Please see my comments below.

>> Please find my comments.
>>
>> >> (2)
>> >> >The "GeneticException" class seems to mostly deal with "illegal"
>> >> >arguments; hence it should be a subclass of the JDK's standard
>> >> >"IllegalArgumentException" (and be renamed accordingly).
>[...]
>> >
>> -- I shall make the changes. What could be the name for the Exception.
>> GeneticIllegalArgumentException would be fine?
>
>I guess so.
>
>The fool-proof, preferred approach, is to have package-private
>exception classes.
>
>Another suggestion would be to create a container class (within
>a package duly marked as "internal, not part of the public API"):
>---CUT---
>public GeneticException {
>
>public static class IllegalArgument extends IllegalArgumentException {
>// ...
>}
>
>// etc.
>}
>---CUT---
>
>The important thing is that the "@throws" Javadoc tags only
>document the base classes (i.e. the JDK's standard exception).
>

--I have modified the exception class name to
GeneticIllegalArgumentException.


>> >>
>> >> >[Exception messages need review for spelling and formatting.]
>> >> -- It will be really helpful if you can point out some specific
examples.
>> >
>> >We can fix this when the PR has reached some stability.
>> >
>> >>
>> >> (3)
>> >> >IMO Javadoc should avoid redundant phrases like "This class" as
>> >> >the first words of a class description.
>> >> --Refractored the javadoc comments. Please review and mention if you
need
>> >> any further changes.
>> >
>> >I've not looked yet, but thanks for taking it into account.
>> >Similarly to the previous point, these clean-ups can happen later.
>> >
>> >> >A similar remark holds for fields in "GeneticException" class: Since
>> >> >the name of the field is self-documenting, duplication in the Javadoc
>> >> >is visual noise ("Message template" is concise and clear enough).
>> >> --Removal of the javadoc comments produces a checkstyle error.
>> >
>> >I did not suggest to remove any Javadoc, only to rephrase it as:
>> >---CUT---
>> >/** "Message template". */
>> >---CUT---
>> >
>> -- Just for confirmation. Do you need the comment to be changed from
>> ---CUT---
>> /** Error message for "out of range" condition. */
>> ---CUT---
>> *to*
>> ---CUT---
>> /** "out of range" */
>> ---CUT---
>
>No, sorry, I just meant that e.g.
>---CUT---
>/** Message template. */
>private static final String OUT_OF_RANGE = "Out of range: min={0} max={1}";
>---CUT---
>is sufficient. Since the field is private, the comment will mostly ever
>be read while looking at the source code.

--I think we have a misunderstanding here. The field is public, not
private. Will a hard coded "Message Template" comment be sufficient?


>>   JDK(*ProviderBuilder.RandomSourceInternal.JDK*),
>> ...
>> }
>> ---CUT---
>> I am not sure how to achieve this for our algorithm classes.
>
>I'm a bit lost here; I don't get the relationship of this with my
>remark above (constant vs adaptive)...  Maybe you could file
>JIRA report to clarify and further discussion?

--Let me try to clarify my original understanding. I think the proposal is
to introduce an Enum and declare each type of algorithm classes as
instances of enum. So it might follow this template:
--CUT--
enum GeneticAlgorithms {
SGA([args...]),//Simple genetic algorithm equivalent to
GeneticAlgorithm class
AGA([args...]) //Adaptive genetic algorithm.
}
--CUT--
Where [args...] are selection, crossover, mutation operators and
probalities.

As the AdaptiveGeneticAlgorithm is a generalization of GeneticAlgorithm it
would be possible to represent the later by former using constant crossover
and mutation rate generators.
But the problem lies in a different area.
In enumeration the required arguments ([args...]) need to be specified
during the declaration using constant values.
But for our implementation those arguments need to be provided by the
client programs. I don't know how to achieve that using enum.
Let me know if you have any different views regarding this.

>
>> >
>> >>
>> >> (7)
>> >> >The currently available GA implementations are sequential.
>> >> >IIUC, the "nextGeneration" methods should provide an option
>> >> >that processes the population using multiple threads.
>> >> --This needs to be done. However,  I would like to address this along
>> with
>> >> parallel GA i.e. convergence of multiple populations together.
>> >
>> >The two features (multi-thread vs multiple populations) should
>> >be implemented independently:  Users that only need the "basic"
>> >GA should also be able to take advantage of their machine's
>> >multiple CPUs.
>> >[This is related to the design issue which I mentioned previously.]
>> >
>> -- I am thinking to leverage user's multiple CPUs for doing
>> multi-population GA.
>
>OK (sort-of, since "the devil is in the details", and I'm not sure
>that we mean the same thing by "multi", see below).
>
>> It would a global approach where same thread pool
>> would be used for both purposes

Re: [Math] Review of "genetic algorithm" module

2022-02-16 Thread Gilles Sadowski
Hello.

Le mer. 16 févr. 2022 à 17:31, Avijit Basak  a écrit :
>
> Hi All
>
> Please find my comments.
>
> >> (2)
> >> >The "GeneticException" class seems to mostly deal with "illegal"
> >> >arguments; hence it should be a subclass of the JDK's standard
> >> >"IllegalArgumentException" (and be renamed accordingly).
> >> >If other condition types are needed, then another internal class
> >> >should be defined with the corresponding standard semantics.
> >> --IMHO if we think of a single exception class we should extend it only
> >> from RuntimeException.
>
> >"single exception class" is not a requirement (it cannot be since
> >we agreed some time ago that it was better to align with the JDK's
> >delineation of error conditions (IAE, NPE, ILSE, AE, ...).
>
> >> If we think of multiple exception classes in one
> >> module we may need to think of a base exception class. Other classes
> would
> >> extend the same.
> >
> >Please no.  We'd taken that approach in "Commons Math" (cf.
> >base class now in module "commons-math-legacy-exception"),
> >as I've mentioned already IIRC: It was a failed experiment IMO.
> >[For more details, please refer to the archive of the "dev" ML.]
> >
> >> The approach mentioned above would mix up these two.
> >> Please share your opinion regarding this.
> >
> >Eventually, all new components ([RNG], [Number], [Geometry], ...)
> >adopted the simple approach of non-public API (ideally private
> >or package-private) exception classes only for the developer's
> >use (and the purpose of which is limited to avoiding duplication).
> >
> -- I shall make the changes. What could be the name for the Exception.
> GeneticIllegalArgumentException would be fine?

I guess so.

The fool-proof, preferred approach, is to have package-private
exception classes.

Another suggestion would be to create a container class (within
a package duly marked as "internal, not part of the public API"):
---CUT---
public GeneticException {

public static class IllegalArgument extends IllegalArgumentException {
// ...
}

// etc.
}
---CUT---

The important thing is that the "@throws" Javadoc tags only
document the base classes (i.e. the JDK's standard exception).

> >>
> >> >[Exception messages need review for spelling and formatting.]
> >> -- It will be really helpful if you can point out some specific examples.
> >
> >We can fix this when the PR has reached some stability.
> >
> >>
> >> (3)
> >> >IMO Javadoc should avoid redundant phrases like "This class" as
> >> >the first words of a class description.
> >> --Refractored the javadoc comments. Please review and mention if you need
> >> any further changes.
> >
> >I've not looked yet, but thanks for taking it into account.
> >Similarly to the previous point, these clean-ups can happen later.
> >
> >> >A similar remark holds for fields in "GeneticException" class: Since
> >> >the name of the field is self-documenting, duplication in the Javadoc
> >> >is visual noise ("Message template" is concise and clear enough).
> >> --Removal of the javadoc comments produces a checkstyle error.
> >
> >I did not suggest to remove any Javadoc, only to rephrase it as:
> >---CUT---
> >/** "Message template". */
> >---CUT---
> >
> -- Just for confirmation. Do you need the comment to be changed from
> ---CUT---
> /** Error message for "out of range" condition. */
> ---CUT---
> *to*
> ---CUT---
> /** "out of range" */
> ---CUT---

No, sorry, I just meant that e.g.
---CUT---
/** Message template. */
private static final String OUT_OF_RANGE = "Out of range: min={0} max={1}";
---CUT---
is sufficient.  Since the field is private, the comment will mostly ever
be read while looking at the source code.

> >> (6)
> >> >Why support inheritance for "AbstractGeneticAlgorithm"?
> >> >Why would users need their own subclass, rather than call those
> >> >implemented within the library (currently, "GeneticAlgorithm" and
> >> >"AdaptiveGeneticAlgorithm")?
> >> >Couldn't we encapsulate the choice of algorithm in an "enum",
> >> >similar to "RandomSource" in [RNG].
> >> >Do I understand correctly that the (only?) difference between the
> >> >two classes is the ability to adapt crossover and mutation rates?
> >> -- The difference between GeneticAlgorithm and AdaptiveGeneticAlgorithm
> is
> >> the ability to adapt crossover and mutation probability. However,  as per
> >> my understanding enum encapsulation is appropriate with the same set and
> >> type of constructor arguments, where the arguments can be provided during
> >> enum declaration. In our case the arguments would be provided by the
> client
> >> program and cannot be pre-initialized as part of an enum declaration.
> >
> >Why not?
> >A constant rate seems like a (trivial) type of adaptive rate.
>
> -- Our algorithm classes accept various user provided arguments like
> crossover, mutation and selection operators. Enum declaration requires all
> of the arguments to be provided at the declaration time in the class itself
> li

Re: [Math] Review of "genetic algorithm" module

2022-02-16 Thread Avijit Basak
Hi All

Please find my comments.

>> (2)
>> >The "GeneticException" class seems to mostly deal with "illegal"
>> >arguments; hence it should be a subclass of the JDK's standard
>> >"IllegalArgumentException" (and be renamed accordingly).
>> >If other condition types are needed, then another internal class
>> >should be defined with the corresponding standard semantics.
>> --IMHO if we think of a single exception class we should extend it only
>> from RuntimeException.

>"single exception class" is not a requirement (it cannot be since
>we agreed some time ago that it was better to align with the JDK's
>delineation of error conditions (IAE, NPE, ILSE, AE, ...).

>> If we think of multiple exception classes in one
>> module we may need to think of a base exception class. Other classes
would
>> extend the same.
>
>Please no.  We'd taken that approach in "Commons Math" (cf.
>base class now in module "commons-math-legacy-exception"),
>as I've mentioned already IIRC: It was a failed experiment IMO.
>[For more details, please refer to the archive of the "dev" ML.]
>
>> The approach mentioned above would mix up these two.
>> Please share your opinion regarding this.
>
>Eventually, all new components ([RNG], [Number], [Geometry], ...)
>adopted the simple approach of non-public API (ideally private
>or package-private) exception classes only for the developer's
>use (and the purpose of which is limited to avoiding duplication).
>
-- I shall make the changes. What could be the name for the Exception.
GeneticIllegalArgumentException would be fine?

>>
>> >[Exception messages need review for spelling and formatting.]
>> -- It will be really helpful if you can point out some specific examples.
>
>We can fix this when the PR has reached some stability.
>
>>
>> (3)
>> >IMO Javadoc should avoid redundant phrases like "This class" as
>> >the first words of a class description.
>> --Refractored the javadoc comments. Please review and mention if you need
>> any further changes.
>
>I've not looked yet, but thanks for taking it into account.
>Similarly to the previous point, these clean-ups can happen later.
>
>> >A similar remark holds for fields in "GeneticException" class: Since
>> >the name of the field is self-documenting, duplication in the Javadoc
>> >is visual noise ("Message template" is concise and clear enough).
>> --Removal of the javadoc comments produces a checkstyle error.
>
>I did not suggest to remove any Javadoc, only to rephrase it as:
>---CUT---
>/** "Message template". */
>---CUT---
>
-- Just for confirmation. Do you need the comment to be changed from
---CUT---
/** Error message for "out of range" condition. */
---CUT---
*to*
---CUT---
/** "out of range" */
---CUT---
[...]
>> (6)
>> >Why support inheritance for "AbstractGeneticAlgorithm"?
>> >Why would users need their own subclass, rather than call those
>> >implemented within the library (currently, "GeneticAlgorithm" and
>> >"AdaptiveGeneticAlgorithm")?
>> >Couldn't we encapsulate the choice of algorithm in an "enum",
>> >similar to "RandomSource" in [RNG].
>> >Do I understand correctly that the (only?) difference between the
>> >two classes is the ability to adapt crossover and mutation rates?
>> -- The difference between GeneticAlgorithm and AdaptiveGeneticAlgorithm
is
>> the ability to adapt crossover and mutation probability. However,  as per
>> my understanding enum encapsulation is appropriate with the same set and
>> type of constructor arguments, where the arguments can be provided during
>> enum declaration. In our case the arguments would be provided by the
client
>> program and cannot be pre-initialized as part of an enum declaration.
>
>Why not?
>A constant rate seems like a (trivial) type of adaptive rate.

-- Our algorithm classes accept various user provided arguments like
crossover, mutation and selection operators. Enum declaration requires all
of the arguments to be provided at the declaration time in the class itself
like
---CUT---
public enum RandomSource {
  JDK(*ProviderBuilder.RandomSourceInternal.JDK*),
...
}
---CUT---
I am not sure how to achieve this for our algorithm classes.

>
>>
>> (7)
>> >The currently available GA implementations are sequential.
>> >IIUC, the "nextGeneration" methods should provide an option
>> >that processes the population using multiple threads.
>> --This needs to be done. However,  I would like to address this along
with
>> parallel GA i.e. convergence of multiple populations together.
>
>The two features (multi-thread vs multiple populations) should
>be implemented independently:  Users that only need the "basic"
>GA should also be able to take advantage of their machine's
>multiple CPUs.
>[This is related to the design issue which I mentioned previously.]
>
-- I am thinking to leverage user's multiple CPUs for doing
multi-population GA. It would a global approach where same thread pool
would be used for both purposes. Another class would be introduced for
executing parallel genetic algor

Re: [Math] Review of "genetic algorithm" module

2022-02-14 Thread Gilles Sadowski
Hello.

Le lun. 14 févr. 2022 à 08:03, Avijit Basak  a écrit :
>
> Hi All
>
> Thanks for the review comments. Please find my comments below.
>
> (1)
> [...]
>
> (2)
> >The "GeneticException" class seems to mostly deal with "illegal"
> >arguments; hence it should be a subclass of the JDK's standard
> >"IllegalArgumentException" (and be renamed accordingly).
> >If other condition types are needed, then another internal class
> >should be defined with the corresponding standard semantics.
> --IMHO if we think of a single exception class we should extend it only
> from RuntimeException.

"single exception class" is not a requirement (it cannot be since
we agreed some time ago that it was better to align with the JDK's
delineation of error conditions (IAE, NPE, ILSE, AE, ...).

> If we think of multiple exception classes in one
> module we may need to think of a base exception class. Other classes would
> extend the same.

Please no.  We'd taken that approach in "Commons Math" (cf.
base class now in module "commons-math-legacy-exception"),
as I've mentioned already IIRC: It was a failed experiment IMO.
[For more details, please refer to the archive of the "dev" ML.]

> The approach mentioned above would mix up these two.
> Please share your opinion regarding this.

Eventually, all new components ([RNG], [Number], [Geometry], ...)
adopted the simple approach of non-public API (ideally private
or package-private) exception classes only for the developer's
use (and the purpose of which is limited to avoiding duplication).

>
> >[Exception messages need review for spelling and formatting.]
> -- It will be really helpful if you can point out some specific examples.

We can fix this when the PR has reached some stability.

>
> (3)
> >IMO Javadoc should avoid redundant phrases like "This class" as
> >the first words of a class description.
> --Refractored the javadoc comments. Please review and mention if you need
> any further changes.

I've not looked yet, but thanks for taking it into account.
Similarly to the previous point, these clean-ups can happen later.

> >A similar remark holds for fields in "GeneticException" class: Since
> >the name of the field is self-documenting, duplication in the Javadoc
> >is visual noise ("Message template" is concise and clear enough).
> --Removal of the javadoc comments produces a checkstyle error.

I did not suggest to remove any Javadoc, only to rephrase it as:
---CUT---
/** "Message template". */
---CUT---

> [...]
>
> (4)
> >Class "ConvergenceListenerRegistry" is generic but its code
> >contains undocumented "@SuppressWarnings" annotations.
> >Moreover, it is a singleton, and not thread-safe.
> >Why should there be such a global "registry"?
> >Since it is only accessed by the "AbstractGeneticAlgorithm" class,
> >it could be defined as a private inner class.
> --Made it a private inner class.

Thanks.
[We should nevertheless address the other issues mentioned in
the above paragraph.]

>
> (5)
> >In class "AbstractGeneticAlgorithm", methods "getCrossoverPolicy"
> >"getMutationPolicy", "getElitismRate" are public, yet they are only
> >ever called by a subclass.
> --Modified the public to protected.

Thanks.
I nevertheless suspect (as hinted at while mentioning the
expected multi-thread feature) that there is a design issue
here.  [I may be wrong, so I'll try to make my point stronger
after a more in-depth review.]

>
> (6)
> >Why support inheritance for "AbstractGeneticAlgorithm"?
> >Why would users need their own subclass, rather than call those
> >implemented within the library (currently, "GeneticAlgorithm" and
> >"AdaptiveGeneticAlgorithm")?
> >Couldn't we encapsulate the choice of algorithm in an "enum",
> >similar to "RandomSource" in [RNG].
> >Do I understand correctly that the (only?) difference between the
> >two classes is the ability to adapt crossover and mutation rates?
> -- The difference between GeneticAlgorithm and AdaptiveGeneticAlgorithm is
> the ability to adapt crossover and mutation probability. However,  as per
> my understanding enum encapsulation is appropriate with the same set and
> type of constructor arguments, where the arguments can be provided during
> enum declaration. In our case the arguments would be provided by the client
> program and cannot be pre-initialized as part of an enum declaration.

Why not?
A constant rate seems like a (trivial) type of adaptive rate.

>
> (7)
> >The currently available GA implementations are sequential.
> >IIUC, the "nextGeneration" methods should provide an option
> >that processes the population using multiple threads.
> --This needs to be done. However,  I would like to address this along with
> parallel GA i.e. convergence of multiple populations together.

The two features (multi-thread vs multiple populations) should
be implemented independently:  Users that only need the "basic"
GA should also be able to take advantage of their machine's
multiple CPUs.
[This is related to the design issue which I me

Re: [Math] Review of "genetic algorithm" module

2022-02-13 Thread Avijit Basak
Hi All

Thanks for the review comments. Please find my comments below.

(1)
>A commit log message should strive to be informative
>for the reviewer; saying the like of "fixed minor bugs" does
>not convey anything.
>Even minor changes, like e.g. formatting cleanup, should be
>designated as such.
>For this PR, the message (which I've amended) was misleading
>because the change was not about bugs, but about removing
>GUI code (and its dependency).
--I have maintained a detailed commit message this time.

(2)
>The "GeneticException" class seems to mostly deal with "illegal"
>arguments; hence it should be a subclass of the JDK's standard
>"IllegalArgumentException" (and be renamed accordingly).
>If other condition types are needed, then another internal class
>should be defined with the corresponding standard semantics.
--IMHO if we think of a single exception class we should extend it only
from RuntimeException. If we think of multiple exception classes in one
module we may need to think of a base exception class. Other classes would
extend the same. The approach mentioned above would mix up these two.
Please share your opinion regarding this.

>[Exception messages need review for spelling and formatting.]
-- It will be really helpful if you can point out some specific examples.

(3)
>IMO Javadoc should avoid redundant phrases like "This class" as
>the first words of a class description.
--Refractored the javadoc comments. Please review and mention if you need
any further changes.
>A similar remark holds for fields in "GeneticException" class: Since
>the name of the field is self-documenting, duplication in the Javadoc
>is visual noise ("Message template" is concise and clear enough).
--Removal of the javadoc comments produces a checkstyle error.
>Similarly, simple accessors don't need the exact same sentence
>repeated twice (a single "@return ..." tag is sufficient).
--Modified.

(4)
>Class "ConvergenceListenerRegistry" is generic but its code
>contains undocumented "@SuppressWarnings" annotations.
>Moreover, it is a singleton, and not thread-safe.
>Why should there be such a global "registry"?
>Since it is only accessed by the "AbstractGeneticAlgorithm" class,
>it could be defined as a private inner class.
--Made it a private inner class.

(5)
>In class "AbstractGeneticAlgorithm", methods "getCrossoverPolicy"
>"getMutationPolicy", "getElitismRate" are public, yet they are only
>ever called by a subclass.
--Modified the public to protected.

(6)
>Why support inheritance for "AbstractGeneticAlgorithm"?
>Why would users need their own subclass, rather than call those
>implemented within the library (currently, "GeneticAlgorithm" and
>"AdaptiveGeneticAlgorithm")?
>Couldn't we encapsulate the choice of algorithm in an "enum",
>similar to "RandomSource" in [RNG].
>Do I understand correctly that the (only?) difference between the
>two classes is the ability to adapt crossover and mutation rates?
-- The difference between GeneticAlgorithm and AdaptiveGeneticAlgorithm is
the ability to adapt crossover and mutation probability. However,  as per
my understanding enum encapsulation is appropriate with the same set and
type of constructor arguments, where the arguments can be provided during
enum declaration. In our case the arguments would be provided by the client
program and cannot be pre-initialized as part of an enum declaration.

(7)
>The currently available GA implementations are sequential.
>IIUC, the "nextGeneration" methods should provide an option
>that processes the population using multiple threads.
--This needs to be done. However,  I would like to address this along with
parallel GA i.e. convergence of multiple populations together.

(8)
>Do not use explicit "\n" and "\r" characters.[1]
--Done


Thanks & Regards
--Avijit Basak


On Mon, 7 Feb 2022 at 07:57, Gilles Sadowski  wrote:

> Hello.
>
> A few remarks (as of PR #205) and questions:
>
> (1)
> A commit log message should strive to be informative
> for the reviewer; saying the like of "fixed minor bugs" does
> not convey anything.
> Even minor changes, like e.g. formatting cleanup, should be
> designated as such.
> For this PR, the message (which I've amended) was misleading
> because the change was not about bugs, but about removing
> GUI code (and its dependency).
>
> (2)
> The "GeneticException" class seems to mostly deal with "illegal"
> arguments; hence it should be a subclass of the JDK's standard
> "IllegalArgumentException" (and be renamed accordingly).
> If other condition types are needed, then another internal class
> should be defined with the corresponding standard semantics.
> [Exception messages need review for spelling and formatting.]
>
> (3)
> IMO Javadoc should avoid redundant phrases like "This class" as
> the first words of a class description.
> A similar remark holds for fields in "GeneticException" class: Since
> the name of the field is self-documenting, duplication in the Javadoc
> is visual noise ("Message template" i

[Math] Review of "genetic algorithm" module

2022-02-06 Thread Gilles Sadowski
Hello.

A few remarks (as of PR #205) and questions:

(1)
A commit log message should strive to be informative
for the reviewer; saying the like of "fixed minor bugs" does
not convey anything.
Even minor changes, like e.g. formatting cleanup, should be
designated as such.
For this PR, the message (which I've amended) was misleading
because the change was not about bugs, but about removing
GUI code (and its dependency).

(2)
The "GeneticException" class seems to mostly deal with "illegal"
arguments; hence it should be a subclass of the JDK's standard
"IllegalArgumentException" (and be renamed accordingly).
If other condition types are needed, then another internal class
should be defined with the corresponding standard semantics.
[Exception messages need review for spelling and formatting.]

(3)
IMO Javadoc should avoid redundant phrases like "This class" as
the first words of a class description.
A similar remark holds for fields in "GeneticException" class: Since
the name of the field is self-documenting, duplication in the Javadoc
is visual noise ("Message template" is concise and clear enough).
Similarly, simple accessors don't need the exact same sentence
repeated twice (a single "@return ..." tag is sufficient).

(4)
Class "ConvergenceListenerRegistry" is generic but its code
contains undocumented "@SuppressWarnings" annotations.
Moreover, it is a singleton, and not thread-safe.
Why should there be such a global "registry"?
Since it is only accessed by the "AbstractGeneticAlgorithm" class,
it could be defined as a private inner class.

(5)
In class "AbstractGeneticAlgorithm", methods "getCrossoverPolicy"
"getMutationPolicy", "getElitismRate" are public, yet they are only
ever called by a subclass.

(6)
Why support inheritance for "AbstractGeneticAlgorithm"?
Why would users need their own subclass, rather than call those
implemented within the library (currently, "GeneticAlgorithm" and
"AdaptiveGeneticAlgorithm")?
Couldn't we encapsulate the choice of algorithm in an "enum",
similar to "RandomSource" in [RNG].
Do I understand correctly that the (only?) difference between the
two classes is the ability to adapt crossover and mutation rates?

(7)
The currently available GA implementations are sequential.
IIUC, the "nextGeneration" methods should provide an option
that processes the population using multiple threads.

(8)
Do not use explicit "\n" and "\r" characters.[1]


Regards,
Gilles

[1] See 
https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#lineSeparator--

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org