On Fri, 16 Jan 2015 10:09:02 +0100, Thomas Neidhart wrote:
On 01/16/2015 01:30 AM, Gilles wrote:
On Thu, 15 Jan 2015 15:41:11 -0700, Phil Steitz wrote:
On 1/15/15 2:24 PM, Thomas Neidhart wrote:
On 01/08/2015 12:34 PM, Gilles wrote:
Hi.

Raising this issue once again.
Are we going to upgrade the requirement for the next major release?

  [ ] Java 5
  [x] Java 6
  [x] Java 7
  [ ] Java 8
  [ ] Java 9

A while ago I thought that it would be cool to switch to Java 7/8 for some of the nice new features (mainly fork/join, lambda expressions and diamond operator, the rest is more or less unimportant for math imho).

But after some thought I think they are not really needed for the
following reasons:

* the main focus of math is on developing high-quality, well-tested and documented algorithms; the existing language features are more than
enough for this

Sure.
Not so long ago, some people were claiming that nothing beats
programming in "assembly" language.

+1

* coming up with multi-threaded algorithms might be appealing, but it is also hard work, and I wonder if it really makes sense in times of
projects like mahout / hadoop / ... which aim for even better
scalability

+1

Hard work / easy work.  Yes and no.  It depends on the motivation
of the contributor. Or we have to (re)define clearly the scope of
CM, and start some serious clean-up.
It's not all black or white; I'm quite convinced that it's better
to handle multi-threading externally when the core computation is
sequential.  But CM already contains algorithms that are inherently
parallel (a.o. genetic algorithms) and improvement in those areas
would undoubtedly benefit from (internal) parallel processing.

I think the better approach is to support external parallelization
rather than trying to do it yourself. From a user POV, I would be scared to use a library that does some kind of parallelization internally which
I cannot control.

Some recent examples show how it can be done better: there were some
requests to make some of the statistics-related classes map/reducible so
that they can be used in Java 8 parallel streams.
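
To make "map/reducible" concrete, here is a minimal sketch of a statistic whose partial results can be merged, which is what a Java 8 parallel stream needs. The class name MergeableMoments and its methods are hypothetical and are not part of the Commons Math API; the sketch uses Welford's update for single values and the standard pairwise merge formula for combining partial results.

```java
import java.util.stream.DoubleStream;

/** Hypothetical accumulator for count, mean and variance; NOT a Commons Math class. */
final class MergeableMoments {
    private long n;
    private double mean;
    private double m2; // sum of squared deviations from the current mean

    /** "Map" side: folds one value into this partial result (Welford's update). */
    void accept(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    /** "Reduce" side: merges another partial result into this one. */
    void combine(MergeableMoments other) {
        if (other.n == 0) {
            return;
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        m2 += other.m2 + delta * delta * ((double) n * other.n / total);
        mean += delta * other.n / total;
        n = total;
    }

    long getN() {
        return n;
    }

    double getMean() {
        return mean;
    }

    /** Sample variance (divides by n - 1); NaN for fewer than two values. */
    double getVariance() {
        return n > 1 ? m2 / (n - 1) : Double.NaN;
    }

    /** The stream, sequential or parallel, is supplied by the caller. */
    static MergeableMoments of(DoubleStream values) {
        return values.collect(MergeableMoments::new,
                              MergeableMoments::accept,
                              MergeableMoments::combine);
    }
}
```

The caller decides whether to parallelize, for example by passing DoubleStream.of(data).parallel() instead of DoubleStream.of(data); the accumulator itself never creates a thread.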

@genetic algorithms: there are far better libraries out there for this area and the support we have in math is really very simplistic. You can basically do just a few demo examples with it and I am more in favor
of deprecating the package.

I pointed that out quite some time ago, but the deprecation idea was
rejected outright. [And further work was done on the package.]
This is IMO a major problem with CM: too many things are kept even
though there are no known users.
No user = no real-world testing = no improvement

My HO is we should focus on getting the best single-threaded
implementations we can and, where possible, setting things up to be
executed in parallel by other engines.  Spawning and managing
threads internal to [math] actually *reduces* the range of
applicability of our stuff.

Examples?

Because not everybody wants a library to do parallel stuff internally. Just imagine math being used in a web-application deployed together with
many other applications. It is clearly not an option that one
application might take over most/all of the available processors.

I agree, but this is a practical problem.
Is it inherently impossible to find a solution?

 Much better to let Hadoop / Mahout et
al parallelize using fast and accurate piece parts that we can
provide.

Do they really do that?
[Or do they implement their own algorithms knowing that they must
be thread-safe (which is something we don't focus a lot on).]

I guess they have mainly their own algorithms, but there are examples of
our stuff being used (using the map/reduce paradigm).

OK. Then, I would conclude that implementing the correct interface(s)
to allow this usage _must_ be among the top (yet unwritten) rules
for new contributions to, and refactoring of, CM.


 If there are parallel algorithms that we are really dying
to implement directly, I would rather see that done in a way that
encapsulates and enables externalization of the thread management.
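
One way to read "externalization of the thread management" is that the library code accepts an executor from the caller and never creates threads itself. The following is only a sketch under that assumption: ExternalizedEvaluation and evaluateAll are hypothetical names, not Commons Math API, and Java 8 syntax is used for brevity. It evaluates a fitness function over a list of candidates, as a genetic algorithm would for a population.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.ToDoubleFunction;

/** Hypothetical helper; NOT a Commons Math class. */
final class ExternalizedEvaluation {
    private ExternalizedEvaluation() {}

    /**
     * Evaluates every candidate on the executor supplied by the caller.
     * The caller owns the executor: its size, its lifetime and its shutdown.
     */
    static <T> double[] evaluateAll(List<T> candidates,
                                    ToDoubleFunction<T> fitness,
                                    ExecutorService executor) {
        List<Callable<Double>> tasks = new ArrayList<>();
        for (T candidate : candidates) {
            tasks.add(() -> fitness.applyAsDouble(candidate));
        }
        try {
            // invokeAll returns the futures in the same order as the tasks.
            List<Future<Double>> futures = executor.invokeAll(tasks);
            double[] results = new double[futures.size()];
            for (int i = 0; i < results.length; i++) {
                results[i] = futures.get(i).get();
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("Evaluation interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Fitness evaluation failed", e.getCause());
        }
    }
}
```

A web application sharing a machine with others could pass a small fixed pool (or a single-thread executor) and keep control of processor usage, while a batch job could pass a pool sized to the number of cores.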

* staying at Java 6/7 does not prevent users from using math in a Java 8
environment if wanted

+1 - the examples I have seen thus far are all things that could be
done fairly easily with client code. I know we don't all agree with
this, but I think the biggest service we can provide to our user
base is good, tested, supported implementations of standard
algorithms.  I wish we could find a way to focus more on that and
less on fiddling with the API or language features.

+1, I have the impression that the more we try to *optimize* an API, the more
we end up with an inferior solution (with a few exceptions).

There is too much discussion about API design. We should have our best
practices and use them to implement rock-solid algorithms, which is
already difficult enough.

I agree.

In the end it does not matter so much if you
have a fluent API or whatever, as long as it calculates the correct
result, and is easy to use, imho.

I don't agree. Maybe it doesn't matter for the users (although it should),
but it certainly does for the developers (maintenance, etc. etc.).

[If the "form" did not matter, why do several programming languages
exist?]

The problem is that those discussions constantly mix considerations
about contents, with political moves that do not necessarily match.
For example, a statement about contents would be: CM only provides
implementations of sequential mathematical algorithms.
But recent political moves, like changing the version control system
or advertising "free for all" commit rights, aim at increasing the
contributor base.

I think these considerations are orthogonal:

It would be so easy if it were true, but there are interactions...


 * what you want to do? aka scope of the projects
 * how you want to do it?
 * what infrastructure do you provide to your users/collaborators

I am trying to point out that the stated goal of trying to gather more
contributors does not match the overly cautious policy with regard
to the language evolution.

What about those people interested in API fixing and new language
features?  You'll make them want to contribute to another project.
Now that Java is, at last, beginning to catch up with other
languages incomparably more widely used in the scientific community,
Commons Math is discussing how far behind it is going to lag!

Afaik the scientific community uses mainly Python with its abundance of
great tools. I think Java is better suited in an engineering context.

That's a digression.

The point is to find the right balance (make users happy, make developers not too unhappy). But we must have facts to help determine a real balance,
not just a balance between opinions (which is unlikely to happen).
For example, I'd propose that we advertise a poll with several precise
questions, to collect statistics on various aspects that can influence
a roadmap, like:
 * What package(s) of CM are you directly "import"ing in your applications?
 * Which Java version are you using to develop applications that use CM?
 * Are you going to upgrade your applications with each new release of CM?
 * What do you miss most in CM?
 * etc.

Short of doing it seriously, we might as well skip the divination
part. [That, IMHO, prevents CM from making progress (even through
mistakes, that's fine).]


Gilles



Thomas


