On 01/16/2015 03:09 AM, Thomas Neidhart wrote:
On 01/16/2015 01:30 AM, Gilles wrote:
On Thu, 15 Jan 2015 15:41:11 -0700, Phil Steitz wrote:
On 1/15/15 2:24 PM, Thomas Neidhart wrote:
On 01/08/2015 12:34 PM, Gilles wrote:
Hi.

Raising this issue once again.
Are we going to upgrade the requirement for the next major release?

   [ ] Java 5
   [x] Java 6
   [x] Java 7
   [ ] Java 8
   [ ] Java 9

A while ago I thought that it would be cool to switch to Java 7/8 for
some of the nice new features (mainly fork/join, lambda expressions and
diamond operator, the rest is more or less unimportant for math imho).

But after some thoughts I think they are not really needed for the
following reasons:

  * the main focus of math is on developing high-quality, well tested and
documented algorithms, the existing language features are more than
enough for this
Sure.
Not so long ago, some people were claiming that nothing beats
programming in "assembly" language.

+1
  * coming up with multi-threaded algorithms might be appealing but it is
also hard work and I wonder if it really makes sense in the times of
projects like mahout / hadoop / ... which aim for even better
scalability
+1
Hard work / easy work.  Yes and no.  It depends on the motivation
of the contributor. Or we have to (re)define clearly the scope of
CM, and start some serious clean-up.
It's not all black or white; I'm quite convinced that it's better
to handle multi-threading externally when the core computation is
sequential.  But CM already contains algorithms that are inherently
parallel (genetic algorithms, among others), and improvements in those
areas would undoubtedly benefit from (internal) parallel processing.
I think the better approach is to support external parallelization
rather than trying to do it yourself. From a user POV, I would be scared
to use a library that does some kind of parallelization internally that
I cannot control.

Some recent examples show how it can be done better: there were some
requests to make some of the statistics-related classes map/reduce-able so
that they can be used in Java 8 parallel streams.
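A minimal sketch of what "map/reduce-able" could mean in practice (SumStats here is a hypothetical accumulator, not the actual [math] API): a class only needs a combine operation merging two partial results to be usable from a parallel stream's collect().

```java
import java.util.stream.IntStream;

// Hypothetical accumulator: accept() folds in one value, combine()
// merges partial results computed on different threads.
class SumStats {
    private long n = 0;
    private double sum = 0.0;

    void accept(double x) { n++; sum += x; }

    // The combiner is what makes the class safe for parallel collect():
    // each thread accumulates into its own instance, then they merge.
    SumStats combine(SumStats other) {
        n += other.n;
        sum += other.sum;
        return this;
    }

    double mean() { return sum / n; }
}

public class ParallelStatsDemo {
    public static void main(String[] args) {
        SumStats stats = IntStream.rangeClosed(1, 1000)
                .parallel()
                .mapToDouble(i -> i)
                .collect(SumStats::new, SumStats::accept, SumStats::combine);
        System.out.println(stats.mean()); // 500.5
    }
}
```

The library stays strictly single-threaded; all the parallelism lives in the caller's stream pipeline.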

@genetic algorithms: there are far better libraries out there for
this area, and the support we have in math is really very simplistic. You
can basically do just a few demo examples with it, and I am more in favor
of deprecating the package.

My HO is we should focus on getting the best single-threaded
implementations we can and, where possible, setting things up to be
executed in parallel by other engines.  Spawning and managing
threads internal to [math] actually *reduces* the range of
applicability of our stuff.
A year ago, yes.  These days it's so simple to create a Docker container that
wraps a Java service, deployable anywhere and scalable, that it's really
attractive to just have access to fast classes that are decoupled from a
massive framework.

Also the Stream API takes care of the spawning.  The common pool's parallelism
defaults to the number of available cores minus one:

    ForkJoinPool commonPool = ForkJoinPool.commonPool();
    System.out.println(commonPool.getParallelism());     // 3 (on a 4-core machine)

We can increase or decrease the parallelism by setting:

    -Djava.util.concurrent.ForkJoinPool.common.parallelism=5

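For instance (a sketch; the printed parallelism is machine-dependent), a parallel stream splits the work across that common pool with no thread creation or management in user code:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class CommonPoolDemo {
    public static void main(String[] args) {
        // Worker threads available to parallel streams (machine-dependent).
        System.out.println(ForkJoinPool.commonPool().getParallelism());

        // The stream splits the range across the common pool; no explicit
        // thread handling appears anywhere in user code.
        long sum = LongStream.rangeClosed(1, 1_000_000).parallel().sum();
        System.out.println(sum); // 500000500000
    }
}
```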
Examples?
because not everybody wants a library to do parallel stuff internally.
Just imagine math being used in a web-application deployed together with
many other applications. It is clearly not an option that one
application might take over most/all of the available processors.
So if we have a webapp that has the potential to do this, deploy it in its own
docker container on its own subdomain.  Spring Boot, for example, makes it
simple to generate an executable jar that contains the server (Tomcat,
Undertow, Jetty, etc.), which can then be dockerized.

I was sitting next to the founder of meetup.io at Starbucks the other day, and
he deployed a server to a new subdomain in 20 seconds using DNSimple and
DigitalOcean, at a cost of 10 cents an hour.

Incidentally what he wants for each app is speed and simple code.


  Much better to let Hadoop / Mahout et
al. parallelize using fast and accurate building blocks that we can
provide.
Do they really do that?
[Or do they implement their own algorithms knowing that they must
be thread-safe (which is something we don't focus a lot on).]
I guess they have mainly their own algorithms, but there are examples of
our stuff being used (using the map/reduce paradigm).

  If there are parallel algorithms that we are really dying
to implement directly, I would rather see that done in a way that
encapsulates and enables externalization of the thread management.
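One way to externalize the thread management (purely a hypothetical sketch, not an existing [math] interface) is to write the algorithm against tasks and let the caller supply the ExecutorService:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: the algorithm submits independent tasks; the
// caller owns the thread pool and decides the degree of parallelism.
public class ExternalizedExecutor {

    // Evaluate an expensive function at several points, on whatever
    // ExecutorService the caller decides to provide.
    static double[] evaluateAll(ExecutorService exec, double[] points)
            throws Exception {
        List<Callable<Double>> tasks = new ArrayList<>();
        for (double x : points) {
            tasks.add(() -> x * x); // stand-in for an expensive evaluation
        }
        List<Future<Double>> futures = exec.invokeAll(tasks);
        double[] result = new double[points.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = futures.get(i).get();
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // The caller chooses the pool size -- or could use a
        // single-threaded executor for purely sequential use.
        ExecutorService exec = Executors.newFixedThreadPool(2);
        double[] r = evaluateAll(exec, new double[] {1, 2, 3});
        System.out.println(Arrays.toString(r)); // [1.0, 4.0, 9.0]
        exec.shutdown();
    }
}
```

A webapp can then hand in a bounded pool, so the library can never take over all the processors on a shared host.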
It's just so nice to be able to use the other features that come with Java 8
right out of the box though, and let's face it,
we all take a step back and smile when we cut down a few lines of code, or
Thingy runs 10% faster :).

  * staying at Java 6/7 does not prevent users from using math in a Java 8
environment if wanted
+1 - the examples I have seen thus far are all things that could be
done fairly easily with client code.  I know we don't all agree with
this, but I think the biggest service we can provide to our user
base is good, tested, supported implementations of standard
algorithms.  I wish we could find a way to focus more on that and
less on fiddling with the API or language features.
+1, I have the impression that the more we try to *optimize* an API, the
more we end up with an inferior solution (with a few exceptions).

There is too much discussion about API design. We should have our best
practices and use them to implement rock-solid algorithms, which is
already difficult enough. In the end it does not matter so much if you
have a fluent API or whatever, as long as it calculates the correct
result, and is easy to use, imho.

The problem is that those discussions constantly mix considerations
about contents, with political moves that do not necessarily match.
For example, a statement about contents would be: CM only provides
implementations of sequential mathematical algorithms.
But recent political moves, like changing the version control system
or advertising "free for all" commit rights, aim at increasing the
contributor base.
I think these considerations are orthogonal:

  * what do you want to do? (i.e. the scope of the project)
  * how do you want to do it?
  * what infrastructure do you provide to your users/collaborators?
If someone needs a fast Thingy, and NodeJS has Thingy (and it's getting a lot
of Thingys every day), then maybe commons math
just lost out on a few collaborators because what was preferred was:
- the community is growing / plenty of pulse
- functional programming structures
- shorter more concise code
- easier to scale

Cheers,
Ole
