Re: [Vote] Create a "machine learning" component

2021-02-09 Thread Ralph Goers
-1 on commons-ml as the name. My first thought is such a repo would hold stuff 
related to mailing lists. Then again maybe it contains stuff relating to markup 
languages. Maybe it is Apache’s version of the ML Programming Language [1].

However, I wouldn’t be -1 on commons-math-ml, although at best I would be +0 
since it is still not obvious what it would contain.

Ralph

[1] http://web.cecs.pdx.edu/~black/CS311/ML.html

> On Feb 9, 2021, at 3:43 PM, Gilles Sadowski  wrote:
> 
> Hi.
> 
> Because of an offered contribution, a discussion happened on
> JIRA[1] and in another thread[2] about improving the genetic
> algorithm (GA) implementation currently in the
>   org.apache.commons.math4.genetics
> package of the "Commons Math" component.
> It would make sense to group "machine learning" algorithms[3]
> (to which GA belongs) within a single component, into which the code from
>  org.apache.commons.math4.ml.neuralnet
>  org.apache.commons.math4.ml.clustering
> would also be moved.
> This would be the fifth (and last) component resulting from my proposal
> (see e.g. [4] among other threads) for the reorganization of the "Commons
> Math"[5] code base into more maintainable components[6][7][8][9], each
> focused on genuinely related functionalities (thus *not* requiring the wide
> expertise necessary for the maintenance of a full-fledged math library).
> 
> I suggest "ML" for the name of the component.
> 
> Regards,
> Gilles
> 
> [1] https://issues.apache.org/jira/projects/MATH/issues/MATH-1563
> [2] https://markmail.org/message/dnujdcxuaq5bwuwe
> [3] https://en.wikipedia.org/wiki/Machine_learning
> [4] https://markmail.org/message/75vuyhzblfadc5op
> [5] http://commons.apache.org/proper/commons-math/
> [6] http://commons.apache.org/proper/commons-rng/
> [7] http://commons.apache.org/proper/commons-numbers/
> [8] http://commons.apache.org/proper/commons-geometry/
> [9] http://commons.apache.org/proper/commons-statistics/
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
> 
> 






Re: [Vote] Create a "machine learning" component

2021-02-10 Thread Emmanuel Bourg

-1 for commons-ml for the same reasons.

What about commons-machine-learning or commons-math-learning? The latter 
is as long as commons-configuration.


Emmanuel Bourg





Re: [Vote] Create a "machine learning" component

2021-02-10 Thread sebb
Likewise, commons-ml is too cryptic.

Also, the Spark project has a machine-learning library:

https://spark.apache.org/mllib/

Maybe that would be a better home?

I'm also a bit concerned as to whether there are sufficient developers
here with knowledge of the ML domain to be able to support the code in
the future.




Re: [Vote] Create a "machine learning" component

2021-02-10 Thread Gilles Sadowski
On Wed, 10 Feb 2021 at 09:27, Emmanuel Bourg wrote:
>
> -1 for commons-ml for the same reasons.
>
> What about commons-machine-learning or commons-math-learning? The latter
> is as long as commons-configuration.

Java users should be used to lengthy names.
It should thus be "commons-machinelearning" as hyphens, by convention,
separate items that become sub-packages in the Java code.

>
> Emmanuel Bourg
>
>
> On 2021-02-10 03:27, Ralph Goers wrote:
> > -1 on commons-ml as the name. My first thought is such a repo would
> > hold stuff related to mailing lists. Then again maybe it contains
> > stuff relating to markup languages. Maybe it is Apache’s version of
> > the ML Programming Language [1].

Strange rationale.  As if someone would not read the full name of a
library before deciding whether it provides what they need...

> >
> > However, I wouldn’t be -1 on commons-math-ml, although at best I would
> > be +0 since it is still not obvious what it would contain.

As explained, this is not a useful or descriptive name: ML is not something
that mathematicians would consider a part of mathematics.
ML is an area of computer science, inspired by biological processes.

Gilles

> >
> > Ralph




Re: [Vote] Create a "machine learning" component

2021-02-10 Thread Gilles Sadowski
On Wed, 10 Feb 2021 at 13:19, sebb wrote:
>
> Likewise, commons-ml is too cryptic.
>
> Also, the Spark project has a machine-learning library:
>
> https://spark.apache.org/mllib/

Thanks for the pointer.

>
> Maybe that would be a better home?

On the face of it, probably.
[For sure, Avijit should comment on the suggestion.]

On the other hand, "Commons" is the place where one can pick "bare-bones"
implementations and add the functionality to one's application
without necessarily having to comply with an overarching framework.
[I don't mean that framework compliance is bad; quite the contrary, it is
hopefully the result of a thorough reflection by experts.  But ... cf. the
numerous "no-dependency" discussions ...]

Actually, concerning Avijit's proposed contribution, didn't I say:[1]
---CUT---
Thus, I think that we must assess whether the "genetic algorithms"
functionality has a reasonable future within "Apache Commons" (i.e.
potential users and contributors) while there exist other libraries that
seem much more advanced for any serious usage.
---CUT---

> I'm also a bit concerned as to whether there are sufficient developers
> here with knowledge of the ML domain to be able to support the code in
> the future.

An interesting point; by no means a new one (see e.g. [2]).

Isn't it the same point I've been making about "Commons Math" (CM)?
There have been no releases because nobody here is able (or willing)
to support it.

Concerning the support of the purported "machinelearning" component:
1. Package
org.apache.commons.math4.ml.neuralnet
* I've written it entirely and I have applications that depend on it (and I
  cannot assume that I could easily switch to, or port it to, Spark), so I
  can reasonably ensure that it would be supported.
2. Package
org.apache.commons.math4.ml.clustering
* Functionality is mentioned in Spark's "mllib" user guide.
* When a new feature was last contributed[3], it was noticed[4][5][6]
  that improvements were needed (but there was no follow-up).
* I have an application that depends on it (from CM v3.6.1), but I wouldn't
  support it if shipped in CM v4.0.
3. Package
org.apache.commons.math4.genetics
* Part of my "end-of-study" project consisted of a GA implementation.
  I've never used the CM implementation, and I don't deny that there
  could be perfectly fine uses of it but, just looking at the code, it seems
  obvious that it cannot compete feature-wise with other libraries out there.
* I suggested long ago that, without anyone supporting it actively (and
  no known user community), it should be dropped from CM.
* Avijit expressed a willingness to improve the functionality:  Is this
  enough for the PMC to create a new component?  From the experience with
  the "clustering" package mentioned above, I'd tend to think (unfortunately)
  that it isn't.  He should first explore whether the Spark community would
  be interested in having the GA functionality moved over there.

Gilles

[1] https://issues.apache.org/jira/browse/MATH-1563
[2] https://markmail.org/message/26yxj5vhysdsoety
[3] https://issues.apache.org/jira/projects/MATH/issues/MATH-1509
[4] https://issues.apache.org/jira/projects/MATH/issues/MATH-1524
[5] https://issues.apache.org/jira/projects/MATH/issues/MATH-1528
[6] https://issues.apache.org/jira/projects/MATH/issues/MATH-1526




Re: [Vote] Create a "machine learning" component

2021-02-14 Thread Avijit Basak
Hi

   I would like to mention a few points here. Genetic algorithms have a
vast range of applications in optimization and search problems; machine
learning is only one of them.
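For readers less familiar with the technique, here is a minimal sketch of a generational GA applied to a toy optimization problem ("OneMax": maximize the number of `true` genes). It is a hypothetical illustration only: the class name, population size, mutation rate, and the choice of tournament selection, one-point crossover, and elitism are all assumptions, and none of it reflects the API of the existing `genetics` package or of the proposed contribution.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

/** Hypothetical minimal generational GA for "OneMax": maximize the number of true genes. */
class OneMaxGa {
    static final int GENES = 32;   // chromosome length (assumed)
    static final int POP = 50;     // population size (assumed)
    static final double MUTATION_RATE = 1.0 / GENES;

    /** Fitness: number of true genes in the chromosome. */
    static int fitness(boolean[] c) {
        int f = 0;
        for (boolean g : c) {
            if (g) f++;
        }
        return f;
    }

    /** Binary tournament: return the fitter of two randomly drawn individuals. */
    static boolean[] select(boolean[][] pop, Random rng) {
        boolean[] a = pop[rng.nextInt(pop.length)];
        boolean[] b = pop[rng.nextInt(pop.length)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    /** One-point crossover of two parents, then independent bit-flip mutation. */
    static boolean[] breed(boolean[] p1, boolean[] p2, Random rng) {
        int cut = rng.nextInt(GENES);
        boolean[] child = new boolean[GENES];
        for (int i = 0; i < GENES; i++) {
            child[i] = i < cut ? p1[i] : p2[i];
            if (rng.nextDouble() < MUTATION_RATE) {
                child[i] = !child[i];
            }
        }
        return child;
    }

    /** Evolves a random population for the given number of generations; returns the best fitness. */
    static int run(int generations, long seed) {
        Random rng = new Random(seed);
        boolean[][] pop = new boolean[POP][GENES];
        for (boolean[] c : pop) {
            for (int i = 0; i < GENES; i++) {
                c[i] = rng.nextBoolean();
            }
        }
        Comparator<boolean[]> byFitness = Comparator.comparingInt(OneMaxGa::fitness);
        for (int g = 0; g < generations; g++) {
            boolean[][] next = new boolean[POP][];
            next[0] = Arrays.stream(pop).max(byFitness).get(); // elitism: keep the current best
            for (int k = 1; k < POP; k++) {
                next[k] = breed(select(pop, rng), select(pop, rng), rng);
            }
            pop = next;
        }
        return Arrays.stream(pop).mapToInt(OneMaxGa::fitness).max().getAsInt();
    }

    public static void main(String[] args) {
        System.out.println("best fitness after 200 generations: " + run(200, 42L));
    }
}
```

Only `fitness` is problem-specific here; the selection and variation operators would be the same for any bit-string encoding, which is the kind of domain independence argued for above.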
   If we couple the new GA library with any specific domain like ML, it
would be meaningless for people working in other domains. They would have to
incorporate the entire ML library, which may be completely unrelated to
their project. Coupling it with a specific technology like Spark might also
limit its usability.
   If a separate component is not approved for this change, then we can
incorporate the changes as part of the *commons.math* library.
   The same library can be reused in ML or neural network libraries as
a dependency.
   Kindly share further views on this.

Thanks & Regards
--Avijit Basak


Re: [Vote] Create a "machine learning" component

2021-02-14 Thread Gilles Sadowski
On Sun, 14 Feb 2021 at 09:06, Avijit Basak wrote:
>
> Hi
>
>I would like to mention a few points here. Genetic Algorithm has a
> vast range of applications in optimization and search problems. Machine
> learning is only one of those.
>If we couple the new GA library with any specific domain like ml it
> would be meaningless for people working in other domains.

Isn't "meaningless" a slight overstatement?
We might have an issue of terminology: There is no necessary "coupling"
but maybe "acquaintance" (for lack of a better word), as a set of tools that
might come in handy for solving certain types of problems.  [For example,
the Traveling Salesman Problem can be tackled by GA and SOFM, both
of which are candidates for inclusion in the new component, although they
don't share any code.]
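To make the GA side of that TSP example concrete: a GA typically encodes a tour as a permutation of city indices, scores it by its total length, and recombines parents with a permutation-preserving operator such as order crossover (OX1). The sketch below is hypothetical; the class and method names are illustrative and are not taken from the existing `genetics` package or from the proposed contribution.

```java
import java.util.Arrays;

/** Hypothetical GA building blocks for the Traveling Salesman Problem (permutation encoding). */
class TspGaOperators {
    /** Length of the closed tour visiting cities in the given order (Euclidean distances). */
    static double tourLength(int[] tour, double[][] xy) {
        double len = 0;
        for (int i = 0; i < tour.length; i++) {
            double[] a = xy[tour[i]];
            double[] b = xy[tour[(i + 1) % tour.length]];
            len += Math.hypot(a[0] - b[0], a[1] - b[1]);
        }
        return len;
    }

    /**
     * Order crossover (OX1): copy p1[lo..hi] into the child, then fill the
     * remaining positions, starting right after hi and wrapping around, with
     * the cities of p2 in the order they appear after hi. The child is
     * always a valid permutation.
     */
    static int[] orderCrossover(int[] p1, int[] p2, int lo, int hi) {
        int n = p1.length;
        int[] child = new int[n];
        boolean[] used = new boolean[n];
        Arrays.fill(child, -1);
        for (int i = lo; i <= hi; i++) {
            child[i] = p1[i];
            used[p1[i]] = true;
        }
        int pos = (hi + 1) % n;
        for (int k = 0; k < n; k++) {
            int city = p2[(hi + 1 + k) % n];
            if (!used[city]) {
                child[pos] = city;
                used[city] = true;
                pos = (pos + 1) % n;
            }
        }
        return child;
    }
}
```

OX1 keeps a contiguous segment from one parent and the relative visiting order from the other, so every child is a valid tour; a plain one-point crossover on permutations would generally produce duplicated and missing cities.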

If the name "machine learning" is not the most appropriate one to convey
the intended scope, do you have another idea?
["AI" would perhaps be more correct if we consider a strict hierarchy, but
would obviously be far too presumptuous.]

> They have to
> incorporate the entire ml library

No, they won't.  Given the stated goal of "modularity", the "ga" module
will be available as a dedicated JAR (possibly with a dependency on
code that can be reused in other modules provided by the component).

> which may be completely unrelated to
> their project. Coupling it with any technology like spark might also limit
> it's usability.

You may be right; I have no idea about the "restrictions" imposed by
Spark.  [It seems that in this case, one would have to indeed depend
on Spark's "mllib" (?).  This would be one reason, as I already stated,
for having something in "Commons".]

Could you elaborate on a concrete use-case where one would be
starting to develop an application with the specific requirement that
Spark could not be used?
In particular, IIRC Spark has multi-threading built in.  Don't you see
it as a huge problem that CM would not provide such a feature?

>If a separate component is not approved for this change then we can
> incorporate the changes as part of *commons.math* library.

Of course, if somebody wants to do that, they're welcome.
[That will not be me, for all the reasons which I've explained.  In the last
5 years I've been pretty much alone in handling bug reports about CM;
I'm unwilling to assume implicit support for even more code.]

Also, with this solution, you'd now be accepting what you weren't
above: Anyone wanting to use the GA functionality would indeed have to
"incorporate" the whole of "Commons Math" (CM).
Of course, the latter could be modularized, but this would only mitigate the
issue, as any release of the GA functionality could then be held up by
issues in other parts of CM (which nobody has been able
to consistently support for more than 5 years now).

>The same library can be reused in ml or neural network libraries as
> a dependency.

It is the other way around:  The development version of CM currently
depends on "lower-level" components.
Furthermore, right now its (embryonic) "machine learning" functionality
has no substantial dependency on code outside the "o.a.c.math4.ml"
package.

>Kindly share further views on this.

In summary, to be clarified:
 (1) Why not Spark?  [At least post over there (?).]
 (2) Further develop a monolithic CM?  [Who will do it?]
 (3) Modularize CM? [Who will do it?]
 (4) New component (with another name) with the proposed contents?

To make things clear from my side:  As a *user*, I currently have some
stake in having a clean, independent "ml" component or an independent
"sofm" module.  So I could do (4).  Or help with (3), on the condition that
*other* people get things moving.

Regards,
Gilles


Re: [Vote] Create a "machine learning" component

2021-04-12 Thread Avijit Basak
Hi

 Sorry for the delayed response, and thanks for your patience. Please
find my comments below:

 (1) Why not Spark?  [At least post over there (?).]
  --We can move to Spark. But it would be very useful if things could
also run without Spark. Using Spark would make more sense in a
production environment, whereas a portable library would be more
useful for the non-prod environment. We can certainly reach out to the
Spark team and ask.
 (2) Further develop a monolithic CM?  [Who will do it?]
   --I can help with upgrading the existing library's GA functionality.
 (3) Modularize CM? [Who will do it?]
   --I can help with upgrading the existing library's GA functionality.
 (4) New component (with another name) with the proposed contents?
   --This is the best option, if permitted.

  The code which I have written can be reused with minor modifications,
so this activity won't take too much effort.
  Kindly share further thoughts.

Thanks & Regards
--Avijit Basak



Re: [Vote] Create a "machine learning" component

2021-04-13 Thread Gilles Sadowski
Hello.

On Mon, 12 Apr 2021 at 17:21, Avijit Basak wrote:
>
> Hi
>
>  Sorry for the delayed response. Thanks for your patience. Please
> find my comments below:
>
>  (1) Why not Spark?  [At least post over there (?).]
>   --We can move to Spark. But it will be very much useful if the things
> can also run without Spark. The use of Spark would make more sense in a
> production environment. But the portability of the library will be more
> useful for the non-prod environment.

I don't follow the distinction "prod" vs "non-prod".

> Definitely, we can reach the Spark
> team and query.

That would be a good idea...

>  (2) Further develop a monolithic CM?  [Who will do it?]
>--I can help with the upgrade of the existing library related to GA
> functionality.

Sure, but nobody is currently working on (2).

>  (3) Modularize CM? [Who will do it?]
>--I can help with the upgrade of the existing library related to GA
> functionality.

I don't doubt it; but the question was actually whether you are willing
to modularize CM (that is: in addition to, and before, contributing to
the GA functionality).

>  (4) New component (with another name) with the proposed contents?
>--This is the best option if permitted.

Currently, only the two of us are in favour of this alternative.

Nobody, by their action, is really in favour of any of the other alternatives.
So, as a way forward, I would suggest that you create a project on GitHub
(copying all the settings from a Commons modular component, such as
"Commons Numbers"), to be eventually integrated here, once its potential
has been demonstrated.

>   The code which I have written can be reused with minor modifications.
> So it won't take too much effort for this activity.

You did not expand on the usability/performance aspects (e.g. the issue of
multi-threading)...

Regards,
Gilles

>> [...]




Re: [Vote] Create a "machine learning" component

2021-04-13 Thread Avijit Basak
Hi

  Please find my comments below.

>> I don't follow the distinction "prod" vs "non-prod".
 -- Actually, in prod we really need a very high-performing system, so
the implicit parallelism in Spark would help us achieve it. But for
other types of work, like a POC or R&D, we may not need such performance.
>> the question was actually whether you are willing to modularize CM
 -- I am not very familiar with the other ML components in Commons. I will
look into it.
>> You did not expand on the usability/performance (e.g. the issue of
>> multi-threading)
 -- Are we planning to incorporate a parallel GA? If so, multi-threading
would be a more appropriate option.
>> So, as a way forward, I would suggest that you create a project on
>> GitHub (copying all the settings from a *Commons modular* component, such as
>> "Commons Numbers")
 -- Could you kindly share the GitHub repository URL of a Commons
modular component?

Thanks & Regards
--Avijit Basak



-- 
Avijit Basak


Re: [Vote] Create a "machine learning" component

2021-04-13 Thread Gilles Sadowski
On Tue, 13 Apr 2021 at 18:21, Avijit Basak wrote:
>
> Hi
>
>   Please find my comments below.
>
> >> I don't follow the distinction "prod" vs "non-prod".
>  -- Actually in Prod we really need a very high performing system. So
> use of implicit parallelism in spark would help us to achieve it. But for
> other types of work like POC or R&D we may not need such performance.

Isn't a GA inherently parallel?
If so, why not take advantage of the concurrency tools provided by the JDK?
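As an illustration of that suggestion: within one generation, the fitness of each chromosome can usually be evaluated independently, so that step maps directly onto a JDK `ExecutorService`. The helper below is a hypothetical sketch (its name and signature are not part of Commons Math or of the proposed contribution), and it assumes the supplied fitness function is thread-safe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToDoubleFunction;

/** Hypothetical helper: evaluate the fitness of every chromosome concurrently. */
final class ParallelFitness {
    private ParallelFitness() {}

    /** Returns fitness values in the same order as the population (fitness must be thread-safe). */
    static <C> double[] evaluate(List<C> population,
                                 ToDoubleFunction<? super C> fitness,
                                 ExecutorService executor)
            throws InterruptedException, ExecutionException {
        List<Callable<Double>> tasks = new ArrayList<>(population.size());
        for (C c : population) {
            tasks.add(() -> fitness.applyAsDouble(c));
        }
        // invokeAll blocks until every task has completed; futures keep the task order.
        List<Future<Double>> futures = executor.invokeAll(tasks);
        double[] result = new double[futures.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = futures.get(i).get();
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            // Toy "chromosomes" whose fitness is just their length.
            double[] f = evaluate(Arrays.asList("ga", "sofm", "kmeans"), String::length, pool);
            System.out.println(Arrays.toString(f)); // prints [2.0, 4.0, 6.0]
        } finally {
            pool.shutdown();
        }
    }
}
```

Selection and variation can stay sequential; only the evaluation, often the most expensive part, is distributed. Taking the `ExecutorService` as a parameter, rather than creating one internally, leaves thread management to the calling application.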

> >> the question was actually whether you are willing to modularize CM
>  -- I am not much aware of other ml components in commons. I would look
> into it.

I've mentioned them in earlier messages:
 * Self-organizing feature map (artificial neural net)
 * Clustering

The former is multi-threaded; the latter should be refactored to
take advantage of multi-threading.
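For clustering, a natural candidate is the assignment step of k-means: finding each point's nearest centroid depends only on read-only inputs, so points can be processed concurrently. The sketch below parallelizes just that step with a JDK parallel stream; it is a hypothetical illustration, not the API of `org.apache.commons.math4.ml.clustering`, and the class and method names are made up for this example.

```java
import java.util.stream.IntStream;

/** Hypothetical sketch: the k-means assignment step, run in parallel over the points. */
final class ParallelAssignment {
    private ParallelAssignment() {}

    /** Squared Euclidean distance between two points of equal dimension. */
    static double distanceSq(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Returns, for each point, the index of its nearest centroid. */
    static int[] assign(double[][] points, double[][] centroids) {
        // Each point's label depends only on unmodified shared arrays, so the
        // per-point work can run in parallel; toArray() preserves the point order.
        return IntStream.range(0, points.length).parallel().map(p -> {
            int best = 0;
            double bestD = distanceSq(points[p], centroids[0]);
            for (int c = 1; c < centroids.length; c++) {
                double d = distanceSq(points[p], centroids[c]);
                if (d < bestD) {
                    bestD = d;
                    best = c;
                }
            }
            return best;
        }).toArray();
    }
}
```

The centroid-update step could be parallelized similarly (for example by combining per-thread partial sums), which this sketch omits.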

> >>You did not expand about the usability/performance (e.g. the issue of
> multi-threading)
>  -- Are we planning to incorporate parallel GA.

Aren't you?

> Then multi-threading
> would be a more appropriate option.

IMHO, a necessary one.

> >> So, as a way forward, I would suggest that you create a project on
> GitHub (copying all the settings from a *Commons modular* component, such as
> "Commons Numbers")
>  -- Could you kindly share the GitHub repository URL for any Commons
> modular component.

https://github.com/apache/commons-rng
https://github.com/apache/commons-numbers
https://github.com/apache/commons-geometry
https://github.com/apache/commons-statistics

>
> Thanks & Regards
> --Avijit Basak
>
>
> On Tue, 13 Apr 2021 at 18:29, Gilles Sadowski  wrote:
>
> > Hello.
> >
> > On Mon, 12 Apr 2021 at 17:21, Avijit Basak wrote:
> > >
> > > Hi
> > >
> > >  Sorry for the delayed response. Thanks for your patience. Please
> > > find my comments below:
> > >
> > >  (1) Why not Spark?  [At least post over there (?).]
> > >   --We can move to Spark. But it will be very much useful if the
> > things
> > > can also run without Spark. The use of Spark would make more sense in a
> > > production environment. But the portability of the library will be more
> > > useful for the non-prod environment.
> >
> > I don't follow the distinction "prod" vs "non-prod".
> >
> > > Definitely, we can reach the Spark
> > > team and query.
> >
> > That would be a good idea...
> >
> > >  (2) Further develop a monolithic CM?  [Who will do it?]
> > >--I can help with the upgrade of the existing library related to
> > GA
> > > functionality.
> >
> > Sure, but nobody is currently working on (2).
> >
> > >  (3) Modularize CM? [Who will do it?]
> > >--I can help with the upgrade of the existing library related to
> > GA
> > > functionality.
> >
> > I don't doubt it; but the question was actually whether you are willing
> > to modularize CM (that is: in addition to, and before, contributing to
> > the GA functionality).
> >
> > >  (4) New component (with another name) with the proposed contents?
> > >--This is the best option if permitted.
> >
> > Currently, only the two of us are in favour of this alternative.
> >
> > Nobody, by their action, is really in favour of any of the other
> > alternatives.
> > So, as a way forward, I would suggest that you create a project on GitHub
> > (copying all the settings from a Commons modular component, such as
> > "Commons Numbers"), to be eventually integrated here, once its potential
> > has been demonstrated.
> >
> > >   The code which I have written can be reused with minor
> > modifications.
> > > So it won't take too much effort for this activity.
> >
> > You did not expand about the usability/performance (e.g. the issue of
> > multi-threading)...
> >
> > Regards,
> > Gilles
> >
> > >> [...]
> >




Re: [Vote] Create a "machine learning" component

2021-04-18 Thread Avijit Basak
Hi

>Isn't a GA inherently parallel?
>If so, why not take advantage of the concurrency tools provided by the JDK?
  -- Are we planning to implement multi-threading for GA operations within
a single population as well, or only for multi-population parallel GA?
  -- We can implement different types of co-evolution as part of parallel
GA. We need to decide which corresponding strategies we are going to
incorporate.
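One common form of multi-population parallel GA is the island model: several sub-populations evolve independently (and therefore concurrently) and periodically exchange individuals. The sketch below assumes a toy one-max problem, binary tournament selection, one-point crossover, bit-flip mutation, elitism, and a ring migration in which each island's best chromosome replaces the worst one on the next island. All names and parameters are hypothetical, and this is only one of many possible migration/co-evolution strategies.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public final class IslandGa {
    /** Chromosome length (hypothetical toy setting). */
    public static final int GENES = 32;

    /** Toy "one-max" fitness: number of true genes. */
    public static int fitness(boolean[] chromosome) {
        int n = 0;
        for (boolean gene : chromosome) {
            if (gene) n++;
        }
        return n;
    }

    /** One island: its own population and its own (non-shared) random generator. */
    private static final class Island {
        final List<boolean[]> pop = new ArrayList<>();
        final Random rng;

        Island(int size, long seed) {
            rng = new Random(seed);
            for (int i = 0; i < size; i++) {
                boolean[] c = new boolean[GENES];
                for (int g = 0; g < GENES; g++) c[g] = rng.nextBoolean();
                pop.add(c);
            }
        }

        boolean[] best() {
            return pop.stream().max(Comparator.comparingInt(IslandGa::fitness)).get();
        }

        /** Binary tournament selection. */
        boolean[] tournament() {
            boolean[] a = pop.get(rng.nextInt(pop.size()));
            boolean[] b = pop.get(rng.nextInt(pop.size()));
            return fitness(a) >= fitness(b) ? a : b;
        }

        /** One generation: elitism, one-point crossover and bit-flip mutation. */
        void step() {
            List<boolean[]> next = new ArrayList<>(pop.size());
            next.add(best().clone()); // carry the current best over unchanged
            while (next.size() < pop.size()) {
                boolean[] p1 = tournament();
                boolean[] p2 = tournament();
                int cut = rng.nextInt(GENES);
                boolean[] child = new boolean[GENES];
                for (int g = 0; g < GENES; g++) {
                    child[g] = g < cut ? p1[g] : p2[g];
                    if (rng.nextDouble() < 1.0 / GENES) child[g] = !child[g];
                }
                next.add(child);
            }
            pop.clear();
            pop.addAll(next);
        }

        /** Migration target: the incoming chromosome replaces the worst one. */
        void replaceWorst(boolean[] migrant) {
            int worst = 0;
            for (int i = 1; i < pop.size(); i++) {
                if (fitness(pop.get(i)) < fitness(pop.get(worst))) worst = i;
            }
            pop.set(worst, migrant.clone());
        }
    }

    /** Runs the island model and returns the best fitness found on any island. */
    public static int run(int islands, int size, int generations, int interval, long seed) {
        List<Island> all = new ArrayList<>();
        for (int i = 0; i < islands; i++) all.add(new Island(size, seed + i));
        for (int gen = 1; gen <= generations; gen++) {
            // Islands do not interact between migrations, so each can evolve on its own thread.
            all.parallelStream().forEach(Island::step);
            if (gen % interval == 0) {
                // Collect all migrants first, then apply the ring migration i -> i + 1.
                List<boolean[]> migrants = new ArrayList<>();
                for (Island island : all) migrants.add(island.best());
                for (int i = 0; i < islands; i++) {
                    all.get((i + 1) % islands).replaceWorst(migrants.get(i));
                }
            }
        }
        return all.stream().mapToInt(island -> fitness(island.best())).max().getAsInt();
    }

    public static void main(String[] args) {
        System.out.println("best fitness: " + run(4, 30, 200, 10, 42L));
    }
}
```

Each island owns its `Random`, so results do not depend on thread scheduling; other co-evolution schemes (e.g. competitive or cooperative fitness between sub-populations) would change what is exchanged, not this overall structure.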

Thanks & Regards
--Avijit Basak

On Wed, 14 Apr 2021 at 05:53, Gilles Sadowski  wrote:

> [...]

-- 
Avijit Basak


Re: [Vote] Create a "machine learning" component

2021-04-19 Thread Gilles Sadowski
Hello.

On Mon, 19 Apr 2021 at 08:35, Avijit Basak wrote:
>
> Hi
>
> >Isn't a GA inherently parallel?
> >If so, why not take advantage of the concurrency tools provided by the JDK?
>   -- Are we planning to implement multi-threading for GA operations even as
> part of a single population

This seems an obvious improvement to our current implementation
(provided that a chromosome's evaluation does not depend on the rest
of the population).

> or only for multi-population parallel GA.
>   -- We can implement different types of co-evolution as part of parallel
> GA. Need to decide on the corresponding strategies we are going to
> incorporate.

The discussion is still about the "administrative" question of whether
any of this should be implemented in the "Commons" project...

Did you ask the "Spark" people for their opinion on it?

As I said, if you are confident that you can bring our implementation to
a state where it can be used in real-life (performance-wise) applications,
then you should demonstrate it (in order to convince other people from
the Commons PMC that it is worth engaging in long-term maintenance).
AFAICT, a way to do it would be to create a GitHub project (aimed at
becoming a new "machine learning" component, or a Maven/JPMS
module within Commons Math).

Best regards,
Gilles




Re: [Vote] Create a "machine learning" component

2021-04-20 Thread Avijit Basak
Hi

  > Did you ask "Spark" people about their opinion about it?
-- Not yet. I am not sure what the right channel for this
communication would be. It would be good if you could approach them.
  > where it can be used in real-life (performance-wise)
applications, then you should demonstrate it
-- Do we have any kind of performance benchmark or use case
for this? Once that is decided, I can proceed.


Thanks & Regards
--Avijit Basak

On Mon, 19 Apr 2021 at 18:51, Gilles Sadowski  wrote:

> [...]

-- 
Avijit Basak


Re: [Vote] Create a "machine learning" component

2021-04-20 Thread Gilles Sadowski
On Tue, 20 Apr 2021 at 16:09, Avijit Basak wrote:
>
> Hi
>
>   > Did you ask "Spark" people about their opinion about it?
> -- Not yet. I am not sure what would be the right option for
> this communication. It will be good if you can approach them.

You are the one who proposes functionality that might be of interest
to the "Spark" project, perhaps subject to conditions on their part which
*you* are going to have to accept (or not).

In other words: it would be useless for *me* to go and tell them that there
exists some code in Commons Math which they could take and adapt for their
project (they can always do that).
What might be of value to them (as to the Commons project, too), is a
contributor willing to do the necessary work to create or improve a
community-supported feature.

>   > where it can be used in real-life (performance-wise)
> applications, then you should demonstrate it
> -- Do we have any kind of performance benchmark or use case
> regarding this?

Please assume that *you* are the person with the most GA expertise
in this forum.
There certainly are unit tests for the GA functionality, but I don't think
there are benchmarks; certainly, one task would be to set up a module
for (JMH-based) experimentation.

> Once that is decided,

One mantra of ASF communities is that "those who do the work get
to decide".
[The PMC can decide (by vote) whether to accept a new component;
but it's up to you to show that it's worth it (with the risk that the PMC
won't accurately judge the contribution, unfortunately)...]

> then I can proceed with this.

There is already a long list of things that can be done.

You don't *have* to contact "Spark" if you don't feel that it's the
right project for your work.  You could just hope for the best, and
start somewhere else (modularization of Commons Math, a fork
on GitHub of CM ML-related code, and so on).

The one thing which I won't be helping with is merging ad-hoc
GA-related changes into the current CM codebase.
This doesn't preclude that other committers might want to do that
for you; however judging by the last 5 years, I wouldn't count too
much on it. ;-)

Regards,
Gilles

>
>
> Thanks & Regards
> --Avijit Basak
>
> On Mon, 19 Apr 2021 at 18:51, Gilles Sadowski  wrote:
>
> > [...]




Re: [Vote] Create a "machine learning" component

2021-04-20 Thread Paul King
Hi Avijit Basak,

+1 to thanking you for your offer. Just a couple of comments from
someone who is only a marginal contributor to the commons project.

I would be keen to see a new commons component incorporating various
machine learning/data science algorithms. The other main contenders
that seem to be reasonably actively developed are Smile[1] and Weka[2],
which are licensed under the GPL or LGPL. Such a component would be a
natural fit for the algorithm you propose. If you look at Apache
Spark[3] and Apache Ignite[4], they both have some "machine learning"
offerings, but they tend to support only algorithms which are either
"embarrassingly" parallel or inherently parallel. They tend not to
include inherently sequential algorithms. Even "embarrassingly"
parallel algorithms are often not included, since they can typically
already be run by Spark, Ignite, Beam, Wayang, or home-grown
threads/fibres.

There has been previous research into parallel GAs (PGA) with Hadoop,
Spark and Ignite[5][6] but, as far as I know, none of that has made it
into those distributions so far. I don't know how customisable the
Ignite GA implementation[7] is, but it might be worth looking into.

With respect to component naming, you either go very broad with "math"
or something like "datascience", or potentially too narrow with
something like "ml" or "machinelearning". Of the latter two, "ml" is
the most common when bundled into some other framework. The other
alternative is simply to come up with another name, but the usual
convention within commons is to use a name descriptive of the purpose.
Numerous "ml" libraries also bundle things like regression into them,
so there is precedent for such libraries to cover algorithms broadly
across the topic space. On the commons math front, I think regression is
currently earmarked for statistics, but I am not sure it has made the jump
as of yet. An "ml" home would be equally suitable in my mind.

Having said all of that, as others have pointed out, the volunteer
pool in commons is somewhat lean at the moment. I would be happy to
help a little from the ASF side of things, but machine learning/data
science is neither my principal area of expertise nor a major part of my
"day job" activities, so it probably takes others with an interest to
give this the full effort it deserves. But sometimes someone has to get
the ball rolling before other interested parties show up.

Cheers, Paul

[1] https://haifengl.github.io/
[2] https://www.cs.waikato.ac.nz/ml/weka/
[3] https://spark.apache.org/mllib/
[4] https://ignite.apache.org/docs/latest/machine-learning/machine-learning
[5] https://hajirajabeen.github.io/publications/SGA.pdf
[6] https://dzone.com/articles/genetic-algorithms-with-apache-ignite
[7] https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/ml/util/genetic/GeneticAlgorithm.html

On Sun, Feb 14, 2021 at 6:06 PM Avijit Basak  wrote:
>
> Hi
>
>  I would like to mention a few points here. Genetic algorithms have a
> vast range of applications in optimization and search problems. Machine
> learning is only one of those.
>  If we couple the new GA library with any specific domain like ml, it
> would be meaningless for people working in other domains. They would have
> to incorporate the entire ml library, which may be completely unrelated to
> their project. Coupling it with any technology like Spark might also limit
> its usability.
>  If a separate component is not approved for this change, then we can
> incorporate the changes as part of the *commons.math* library.
>  The same library can be reused in ml or neural network libraries as
> a dependency.
>  Kindly share further views on this.
>
> Thanks & Regards
> --Avijit Basak
>
> On Wed, 10 Feb 2021 at 19:49, Gilles Sadowski  wrote:
>
> > On Wed, 10 Feb 2021 at 13:19, sebb wrote:
> > >
> > > Likewise, commons-ml is too cryptic.
> > >
> > > Also, the Spark project has a machine-learning library:
> > >
> > > https://spark.apache.org/mllib/
> >
> > Thanks for the pointer.
> >
> > >
> > > Maybe that would be better home?
> >
> > On the face of it, probably.
> > [For sure, Avijit should comment on the suggestion.]
> >
> > On the other hand, "Commons" is the place where one can pick "bare
> > bone" implementations, and add the functionality to one's application
> > without necessarily complying with an overarching framework.
> > [I don't mean that framework compliance is bad; quite the contrary, it is
> > hopefully the result of a thorough reflection by experts.  But ... cf. the
> > numerous "no-dependency" discussions ...]
> >
> > Actually, concerning Avijit's proposed contribution, didn't I say:[1]
> > ---CUT---
> > Thus, I think that we must assess whether the "genetic algorithms"
> > functionality has a reasonable future within "Apache Commons" (i.e.
> > potential users and contributors) while there exist other libraries that
> > seem much more advanced for any serious usage.
> > ---CUT---
> >
> > > I'm also a bit concerned as to whether there are sufficient

Re: [Vote] Create a "machine learning" component

2021-04-20 Thread Ralph Goers
Why are y’all having a long discussion on a vote thread?

Ralph

> On Apr 20, 2021, at 10:33 PM, Paul King  wrote:
> 
> Hi Avijit Basak,
> 
> +1 to thanking you for your offer. Just a couple of comments from
> someone who is only a marginal contributor to the commons project.
> 
> I would be keen to see a new commons component incorporating various
> machine learning/data science components. The other main contenders
> that seem to be reasonably actively developed are Smile[1] and Weka[2]
> which are licensed under GPL or LGPL. Such a component would be a
> natural fit for the algorithm you propose. If you look at Apache
> Spark[3] and Apache Ignite[4], they both offer some "machine learning"
> offerings but they tend to only support algorithms which are either
> "embarrassingly" parallel or inherently parallel. They tend not to
> include sequential by nature algorithms. Even "embarrassingly"
> parallel algorithms are often not included since they can typically
> already be used already by Spark, Ignite, Beam, Wayang, or home-grown
> threads/fibres.
> 
> There has been previous research into PGA with Hadoop, Spark and
> Ignite[5][6] but so far, none of that has made it into those
> distributions as far as I know. I don't know how customisable the
> Ignite GA algorithm[7] is but it might be worth looking into.
> 
> With respect to component naming, you either go very broad with "math"
> or something like "datascience", or potentially too narrow with
> something like "ml" or "machinelearning". Of the latter two, "ml" is
> most common when bundled into some other framework. The other
> alternative is to simply come up with another name but the typical
> convention within commons is to use a descriptive to purpose name.
> Numerous "ml" libraries also bundle things like regression into them,
> so there is precedence for such libraries to be algorithms broadly in
> the topic space. On the commons math front, I think regression is
> currently earmarked for statistics but not sure it has made the jump
> as of yet. An "ml" home would be equally suitable in my mind.
> 
> Having said all of that, as others have pointed out, the volunteer
> space in commons is somewhat lean at the moment. I would be happy to
> help a little from the ASF side of things but machine learning/data
> science isn't my principal area of expertise nor a major aspect in my
> "day job" activities, it probably takes others with interest to fully
> give this the effort it deserves. But sometimes someone has to get the
> ball rolling before other interested parties show up.
> 
> Cheers, Paul
> 
> [1] https://haifengl.github.io/ 
> [2] https://www.cs.waikato.ac.nz/ml/weka/ 
> 
> [3] https://spark.apache.org/mllib/ 
> [4] https://ignite.apache.org/docs/latest/machine-learning/machine-learning 
> 
> [5] https://hajirajabeen.github.io/publications/SGA.pdf 
> 
> [6] https://dzone.com/articles/genetic-algorithms-with-apache-ignite 
> 
> [7] 
> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/ml/util/genetic/GeneticAlgorithm.html
>  
> 
> 
> On Sun, Feb 14, 2021 at 6:06 PM Avijit Basak  > wrote:
>> 
>> Hi
>> 
>>   I would like to mention a few points here. Genetic Algorithm has a
>> vast range of applications in optimization and search problems. Machine
>> learning is only one of those.
>>   If we couple the new GA library with any specific domain like ml it
>> would be meaningless for people working in other domains. They have to
>> incorporate the entire ml library which may be completely unrelated to
>> their project. Coupling it with any technology like spark might also limit
>> it's usability.
>>   If a separate component is not approved for this change then we can
>> incorporate the changes as part of *commons.math* library.
>>   The same library can be reused in ml or neural network libraries as
>> a dependency.
>>   Kindly share further views on this.
>> 
>> Thanks & Regards
>> --Avijit Basak
>> 
>> On Wed, 10 Feb 2021 at 19:49, Gilles Sadowski > > wrote:
>> 
>>> Le mer. 10 févr. 2021 à 13:19, sebb >> > a écrit :
 
 Likewise, commons-ml is too cryptic.
 
 Also, the Spark project has a machine-learning library:
 
 https://spark.apache.org/mllib/ 
>>> 
>>> Thanks for the pointer.
>>> 
 
 Maybe that would be better home?
>>> 
>>> On the face of it, probably.
>>> [For sure, Avijit should comment on the suggestion.]
>>> 
>>> On the other hand, "Commons" is the place where one

Re: [Vote] Create a "machine learning" component

2021-04-20 Thread Paul King
On Wed, Apr 21, 2021 at 4:12 PM Ralph Goers  wrote:
>
> Why are y’all having a long discussion on Vote thread?

Fair enough. I am +1 (non-binding).

Cheers, Paul.

> > On Apr 20, 2021, at 10:33 PM, Paul King  wrote:
> >
> > Hi Avijit Basak,
> >
> > +1 to thanking you for your offer. Just a couple of comments from
> > someone who is only a marginal contributor to the commons project.
> >
> > I would be keen to see a new commons component incorporating various
> > machine learning/data science components. The other main contenders
> > that seem to be reasonably actively developed are Smile[1] and Weka[2]
> > which are licensed under GPL or LGPL. Such a component would be a
> > natural fit for the algorithm you propose. If you look at Apache
> > Spark[3] and Apache Ignite[4], they both offer some "machine learning"
> > offerings but they tend to only support algorithms which are either
> > "embarrassingly" parallel or inherently parallel. They tend not to
> > include sequential by nature algorithms. Even "embarrassingly"
> > parallel algorithms are often not included since they can typically
> > already be used already by Spark, Ignite, Beam, Wayang, or home-grown
> > threads/fibres.
> >
> > There has been previous research into PGA with Hadoop, Spark and
> > Ignite[5][6] but so far, none of that has made it into those
> > distributions as far as I know. I don't know how customisable the
> > Ignite GA algorithm[7] is but it might be worth looking into.
> >
> > With respect to component naming, you either go very broad with "math"
> > or something like "datascience", or potentially too narrow with
> > something like "ml" or "machinelearning". Of the latter two, "ml" is
> > most common when bundled into some other framework. The other
> > alternative is to simply come up with another name but the typical
> > convention within commons is to use a descriptive to purpose name.
> > Numerous "ml" libraries also bundle things like regression into them,
> > so there is precedence for such libraries to be algorithms broadly in
> > the topic space. On the commons math front, I think regression is
> > currently earmarked for statistics but not sure it has made the jump
> > as of yet. An "ml" home would be equally suitable in my mind.
> >
> > Having said all of that, as others have pointed out, the volunteer
> > space in commons is somewhat lean at the moment. I would be happy to
> > help a little from the ASF side of things but machine learning/data
> > science isn't my principal area of expertise nor a major aspect in my
> > "day job" activities, it probably takes others with interest to fully
> > give this the effort it deserves. But sometimes someone has to get the
> > ball rolling before other interested parties show up.
> >
> > Cheers, Paul
> >
> > [1] https://haifengl.github.io/ 
> > [2] https://www.cs.waikato.ac.nz/ml/weka/ 
> > 
> > [3] https://spark.apache.org/mllib/ 
> > [4] https://ignite.apache.org/docs/latest/machine-learning/machine-learning 
> > 
> > [5] https://hajirajabeen.github.io/publications/SGA.pdf 
> > 
> > [6] https://dzone.com/articles/genetic-algorithms-with-apache-ignite 
> > 
> > [7] 
> > https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/ml/util/genetic/GeneticAlgorithm.html
> >  
> > 
> >
> > On Sun, Feb 14, 2021 at 6:06 PM Avijit Basak  > > wrote:
> >>
> >> Hi
> >>
> >>   I would like to mention a few points here. Genetic Algorithm has a
> >> vast range of applications in optimization and search problems. Machine
> >> learning is only one of those.
> >>   If we couple the new GA library with any specific domain like ml it
> >> would be meaningless for people working in other domains. They have to
> >> incorporate the entire ml library which may be completely unrelated to
> >> their project. Coupling it with any technology like spark might also limit
> >> it's usability.
> >>   If a separate component is not approved for this change then we can
> >> incorporate the changes as part of *commons.math* library.
> >>   The same library can be reused in ml or neural network libraries as
> >> a dependency.
> >>   Kindly share further views on this.
> >>
> >> Thanks & Regards
> >> --Avijit Basak
> >>
> >> On Wed, 10 Feb 2021 at 19:49, Gilles Sadowski  >> > wrote:
> >>
> >>> Le mer. 10 févr. 2021 à 13:19, sebb  >>> > a écrit :
> 
>  Likewise, commons-ml is too cryptic.
> 
>  Also, the Spark project has a machine-learning library:
> 
>  https://spark.apache.org/mlli

Re: [Vote] Create a "machine learning" component

2021-04-21 Thread Gilles Sadowski
On Wed, 21 Apr 2021 at 08:56, Paul King wrote:
>
> On Wed, Apr 21, 2021 at 4:12 PM Ralph Goers  
> wrote:
> >
> > Why are y’all having a long discussion on Vote thread?

Paul King's comments contain interesting information that could
bear on people's decisions about the proposal (especially the
licensing issue).
As for the question of whether the purported functionality would
find a better home elsewhere within the ASF, I'm not sure what would
be the conclusion (apart from Avijit Basak's plain preference (?) to
develop a standalone component, as per Commons' requirement).

>
> Fair enough. I am +1 (non-binding).

So currently, IIRC the tally (on creating a dedicated component) is
  Gilles Sadowski +1
  Avijit Basak +1
  Paul King +1
And several -1 on the initially suggested name; but the proposed
name has been changed early on to "commons-machinelearning"
(in order to comply with Commons' tradition of full words and
descriptive names).
[Please correct if it doesn't reflect what has been expressed.]

Where does that lead us?

Regards,
Gilles

>>> [...]




Re: [Vote] Create a "machine learning" component

2021-04-21 Thread Ralph Goers



> On Apr 21, 2021, at 2:25 AM, Gilles Sadowski  wrote:
> 
> On Wed, 21 Apr 2021 at 08:56, Paul King wrote:
>> 
>> On Wed, Apr 21, 2021 at 4:12 PM Ralph Goers  
>> wrote:
>>> 
>>> Why are y’all having a long discussion on Vote thread?
> 
> Paul King's comments is interesting information that could
> bear on people's decision on the proposal (especially the
> licence's issue).

The point is that discussions shouldn’t happen on a vote thread. The thread
should be forked into its own [DISCUSS][VOTE] thread.

> As for the question of whether the purported functionality would
> find a better home elsewhere with the ASF, I'm sure what would
> be the conclusion (apart from Avijit Bask's plain preference (?) to
> develop a standalone component, as per Commons' requirement).
> 
>> 
>> Fair enough. I am +1 (non-binding).
> 
> So currently, IIRC the tally (on creating a dedicated component) is
>  Gilles Sadowski +1
>  Avijit Basak +1
>  Paul King +1
> And several -1 on the initially suggested name; but the proposed
> name has been changed early on to "commons-machinelearning"
> (in order to comply with Commons' tradition of full words and
> descriptive names).
> [Please correct if it doesn't reflect what has been expressed.]
> 
> Where does that lead us?

With a vote thread that has been open for over 2 months that apparently should 
have been a discussion thread.  I would suggest you cancel this vote and create 
a new Vote thread proposing commons-machinelearning.

Ralph




Re: [Vote] Create a "machine learning" component

2021-04-23 Thread Paul King
I added some more comments in the Jira on whether the proposed algorithm
belongs somewhere in the commons "math" area:

https://issues.apache.org/jira/browse/MATH-1563

Cheers, Paul.

On Wed, Apr 21, 2021 at 7:26 PM Gilles Sadowski  wrote:
> [...]




Re: [Vote] Create a "machine learning" component

2021-04-30 Thread Avijit Basak
Hi

 I would like to vote for *commons-ml*.

Thanks & Regards
--Avijit Basak


-- 
Avijit Basak


Re: [Vote] Create a "machine learning" component

2021-04-30 Thread Gilles Sadowski
Le ven. 30 avr. 2021 à 18:00, Avijit Basak  a écrit :
>
> Hi
>
>  I would like to vote for *commons-ml*.

Wrong thread (the vote on this one was cancelled because it had been
idle for too long). The new vote is here:
   https://markmail.org/message/g5gwof3qdkzyvedc

>>>  [...]
