
Re: [All] Alpha/beta releases

2019-06-09 Thread Gilles Sadowski

Hi.

On Thu, 6 Jun 2019 at 15:18, sebb wrote:
>
> On Wed, 5 Jun 2019 at 23:40, Gary Gregory  wrote:
> >
> > Hi All:
> >
> > I see two lines of usage IRL from people:
> >
> > - I use whatever is "released" on Maven Central. I quote the word released
> > since that includes ANY artifacts, pre 1.0 like a 0.87 or -alpha, and
> > -betas.
>
> N.B. This by definition excludes SNAPSHOTs
>
> > - I am not allowed to use alpha, beta, or SNAPSHOT versions.
> >
> > The reality ends up being that you see some stacks that have a mix of both
> > of the above.
> >
> > We all know that Jar hell is created when breaking BC within the same
> > package name (and Maven coordinates.)
>
> Or changing Maven coordinates but not changing the package name.
>
> > We have clear rules of engagement of major, minor and maintenance releases.
> >
> > The question for me is how should we treat other kinds of releases: alphas
> > and betas. This is assuming that we want to keep on releasing alphas and
> > betas.
> >
> > Jar hell is, well, hellish. I like to avoid it.
>
> +1
>
> > Since the very nature of alphas and betas is that changing APIs should be
> > allowed, even encouraged in order to get the API in the right shape before
> > a x.y.z release, I am warming to using alpha and beta in package names...
>
> I thought API changes were restricted to alpha releases and beta for
> behaviour changes?
> But this is a minor detail.
>
> > If you are to be so bold as to use such a thing, then reflecting that in
> > the import states what you are doing clearly, and avoid any jar hell.
>
> Agreed, it's clearly the user choice here since they have to change
> their code (and POM) to use the new package.
>
> Note: this would also require use of new Maven coordinates.
>
> > That said, it should be left to each component to decide whether or not to
> > opt in such naming.
>
> +1

Ultimately the PMC still needs to vote on the release, no?
Hence I don't see what advantage there is in allowing different
beta policies.  [Of course, no component is required to provide
a beta release...]
What the proposal aims to avoid is JAR hell because of beta
releases that did not change the maven coordinates.

>
> I think it would be worth documenting step by step how the proposal
> works overall, to make sure that nothing has been overlooked.
> One can then look at whether any additional tooling is needed, or if
> it already exists.

Assuming the release process described for [RNG]:

https://gitbox.apache.org/repos/asf?p=commons-rng.git;a=blob;f=doc/release/release.howto.txt
there would be additional steps before step (1):
  * create a branch (say "1.0-beta1") and switch to it
  * change the version (to "1.0-beta1") in all the POM files
  * change the top-level package names in the POM files
  * modify/move the source files accordingly

Ideally, all this would happen automagically by adding
  -Dcommons.release.pre=beta1
to the command referred to in step (1) in the release howto.
[The setting would be picked-up by the release profile or
build plugin, I guess.]

The rest of the process is unchanged.
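
[To make the "modify/move the source files" step more concrete, here is a
minimal Java sketch of the textual part of the conversion: rewriting the
"package" and "import" statements so that the base package gains a pre-release
suffix.  The class "BetaPackageRenamer", its "rename" method and the example
package are hypothetical; nothing like this exists in the Commons build tooling
as far as this thread states.  Moving the files to the matching directories and
rewriting fully-qualified names inside code or Javadoc are not covered.]

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (an assumption for illustration, not existing tooling). */
final class BetaPackageRenamer {
    private BetaPackageRenamer() {}

    /**
     * Rewrites "package" and "import" statements so that {@code basePackage}
     * becomes {@code basePackage + "." + qualifier}.
     *
     * @param source Content of one Java source file.
     * @param basePackage Top-level package, e.g. "org.apache.commons.compid".
     * @param qualifier Pre-release qualifier, e.g. "beta1".
     * @return the modified source text.
     */
    static String rename(String source, String basePackage, String qualifier) {
        // Match the base package only right after "package", "import" or
        // "import static", and only when it ends at a package boundary
        // ("." or ";"), so that "compidx" is not mistaken for "compid".
        final Pattern statement = Pattern.compile(
            "(?m)^(\\s*(?:package|import(?:\\s+static)?)\\s+)"
            + Pattern.quote(basePackage) + "(?=[.;])");
        final String replacement =
            "$1" + Matcher.quoteReplacement(basePackage + "." + qualifier);
        return statement.matcher(source).replaceAll(replacement);
    }

    public static void main(String[] args) {
        final String src =
            "package org.apache.commons.compid;\n"
            + "import org.apache.commons.compid.core.Foo;\n";
        System.out.println(rename(src, "org.apache.commons.compid", "beta1"));
    }
}
```

A build plugin could apply such a rewrite to every file of the beta branch
before compiling; note that applying it twice would append the qualifier twice.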

Please note that the initial idea was whether we could do
away with part of the regular review of the release (no
compatibility requirement, no web site, no archiving, ...), on
the basis that such alpha/beta releases would not benefit
from any of the usual support, except for beta-testing.  [That
was the purpose of asking whether this could be handled by
just "shading" the whole library and creating the executable
Maven artefacts.]

If this is too brittle to get accepted because there is no
Apache policy for this, then we'd use the usual process
(explicitly create the modified sources and keep all the
beta branches indefinitely...), hopefully automated by the
build-plugin as per the above suggestion.


Regards,
Gilles

> > Gary
> >
> >
> > On Wed, Jun 5, 2019 at 6:25 PM Gilles Sadowski  wrote:
> >
> > > On Wed, 5 Jun 2019 at 23:14, sebb wrote:
> > > >
> > > > On Wed, 5 Jun 2019 at 17:16, Gilles Sadowski 
> > > wrote:
> > > > >
> > > > > > On Wed, 5 Jun 2019 at 17:57, James Carman wrote:
> > > > > >
> > > > > > I’m having a hard time understanding the comparing APIs use case.
> > > If I
> > > > > > were to want to try that, I’d create a branch and import the new
> > > dependency
> > > > > > version and see what breaks.  The performance part I wouldn’t think
> > > I’d use
> > > > > > one code base either.  I’m not suggesting my way is

Re: [math] MATH-1486 and release 3.6.2

2019-06-08 Thread Gilles Sadowski

> [We could set up a build on Jenkins.]

Done:
https://builds.apache.org/job/commons-math-unsupported

But the build fails due to an error during Javadoc generation:
https://builds.apache.org/job/commons-math-unsupported/6/console

Regards,
Gilles

>>> [...]

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [math] MATH-1486 and release 3.6.2

2019-06-08 Thread Gilles Sadowski

Hi.

On Fri, 7 Jun 2019 at 17:05, Bernd Eckenfels wrote:
>
> Gilles, did you mean to use JAVA_HOME not ANT_HOME?

Yes, my mistake indeed; sorry.

Thanks,
Gilles

> Maybe you have been building with Java9+?

>
> Regards
> Bernd
>
> --
> http://bernd.eckenfels.net
>
> 
> From: Alex Herbert
> Sent: Friday, June 7, 2019 4:51 PM
> To: Commons Developers List
> Subject: Re: [math] MATH-1486 and release 3.6.2
>
>
> On 07/06/2019 15:16, Gilles Sadowski wrote:
> > Hello.
> >
> > On Fri, 7 Jun 2019 at 11:54, Stephen Colebourne wrote:
> >> On Thu, 6 Jun 2019 at 23:21, Gilles Sadowski  wrote:
> >>> I was about to merge the PR but, on my machine, the build fails.
> >>> Did you try?
> >> `mvn clean verify` works for me (maven running on Java 7 and on Java 8).
> > It doesn't for me:
> > $ ANT_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ mvn clean verify
> > [... skipped...]
> > [ERROR] Failed to execute goal
> > org.apache.maven.plugins:maven-compiler-plugin:3.3:compile
> > (default-compile) on project commons-math3: Compilation failure:
> > Compilation failure:
> > [ERROR] Source option 5 is no longer supported. Use 6 or later.
> > [ERROR] Target option 1.5 is no longer supported. Use 1.6 or later.
> >
> > That one is easy to fix, but when done, there is another error.
> > I'm no maven expert...
> >
> > [We could set up a build on Jenkins.]
>
> Maybe there is something strange in your set-up Gilles.
>
> I've just run through the default GitHub merge instructions and the
> build works on two of my machines:
>
> git checkout MATH_3_X
> git checkout -b jodastephen-auto-module-name-MATH-1486 MATH_3_X
> git pull https://github.com/jodastephen/commons-math.git auto-module-name-MATH-1486
> mvn clean verify
>
> This is fine on JDK 8 and 7:
>
> mvn -v
>
> Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3; 2018-10-24T19:41:47+01:00)
> Maven home: /usr/local/apache-maven-3.6.0
> Java version: 1.8.0_212, vendor: Oracle Corporation, runtime: /usr/lib/jvm/java-8-openjdk-amd64/jre
> Default locale: en_GB, platform encoding: UTF-8
> OS name: "linux", version: "4.4.0-148-generic", arch: "amd64", family: "unix"
>
> mvn -v
>
> Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3; 2018-10-24T19:41:47+01:00)
> Maven home: /usr/local/apache-maven-3.6.0
> Java version: 1.7.0_201, vendor: Oracle Corporation, runtime: /usr/lib/jvm/java-7-openjdk-amd64/jre
> Default locale: en_GB, platform encoding: UTF-8
> OS name: "linux", version: "3.13.0-91-generic", arch: "amd64", family: "unix"
>
>
> >>> Back then (pre-fork), I was in favour of maintaining both lines (3.X
> >>> and 4.X); but the 3.X branch has not been maintained for more than
> >>> 3 years, and it shows. Now (post-fork), my opinion is that the effort
> >>> would be better placed in getting the new dependencies of the
> >>> development version of Commons Math released, and release CM
> >>> 4.0 thereafter.
> >> It's great that there is a plan to move forward. But that doesn't solve
> >> the key issue here. Commons-Math 3 is used by over 2300 open source
> >> repos on GitHub [1]. Of course not all are significant projects, but
> >> some are. While some of those projects may be able to move to
> >> Commons-Math 4 when it completes, others will not be able to (because
> >> of their own compatibility constraints). And some of those projects
> >> may want/need to use Java 9 modules, but can't because Commons-Math 3
> >> doesn't have a module name. I'm trying to provide a minimum effort way
> >> for you or another release manager to satisfy that need. I'm very
> >> definitely NOT trying to fix bugs or maintain the branch - in fact my
> >> proposed approach is closer to a security patch in scope.
> > It's how I had understood it, and you are most welcome to
> > drive such a maintenance/security release.
> > If the build process works on your machine, you are a better
> > RM candidate. ;-)
> >
> > Regards,
> > Gilles
> >
> >> Stephen
> >>
> >> [1] https://github.com/apache/commons-math/network/dependents
> >>
> >


Re: [math] MATH-1486 and release 3.6.2

2019-06-07 Thread Gilles Sadowski

On Fri, 7 Jun 2019 at 17:21, Stephen Colebourne wrote:
>
> On Fri, 7 Jun 2019 at 15:16, Gilles Sadowski  wrote:
> > drive such a maintenance/security release.
> > If the build process works on your machine, you are a better
> > RM candidate. ;-)
>
> Given I haven't committed to commons for 10+ years (at a guess),

An opportunity to break the spell. ;-)

> I'm
> not a PMC member

Not necessary.

> and probably don't have permission to push anymore,

https://people.apache.org/phonebook.html?uid=scolebourne

> I
> don't see how it is realistic for me to be RM.

How about just try first?

Regards,
Gilles

> Stephen
>


Re: [math] MATH-1486 and release 3.6.2

2019-06-07 Thread Gilles Sadowski

Hello.

On Fri, 7 Jun 2019 at 11:54, Stephen Colebourne wrote:
>
> On Thu, 6 Jun 2019 at 23:21, Gilles Sadowski  wrote:
> > I was about to merge the PR but, on my machine, the build fails.
> > Did you try?
>
> `mvn clean verify` works for me (maven running on Java 7 and on Java 8).

It doesn't for me:
$ ANT_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ mvn clean verify
[... skipped...]
[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-compiler-plugin:3.3:compile
(default-compile) on project commons-math3: Compilation failure:
Compilation failure:
[ERROR] Source option 5 is no longer supported. Use 6 or later.
[ERROR] Target option 1.5 is no longer supported. Use 1.6 or later.

That one is easy to fix, but when done, there is another error.
I'm no maven expert...

[We could set up a build on Jenkins.]

> > Back then (pre-fork), I was in favour of maintaining both lines (3.X
> > and 4.X); but the 3.X branch has not been maintained for more than
> > 3 years, and it shows.  Now (post-fork), my opinion is that the effort
> > would be better placed in getting the new dependencies of the
> > development version of Commons Math released, and release CM
> > 4.0 thereafter.
>
> It's great that there is a plan to move forward. But that doesn't solve
> the key issue here. Commons-Math 3 is used by over 2300 open source
> repos on GitHub [1]. Of course not all are significant projects, but
> some are. While some of those projects may be able to move to
> Commons-Math 4 when it completes, others will not be able to (because
> of their own compatibility constraints). And some of those projects
> may want/need to use Java 9 modules, but can't because Commons-Math 3
> doesn't have a module name. I'm trying to provide a minimum effort way
> for you or another release manager to satisfy that need. I'm very
> definitely NOT trying to fix bugs or maintain the branch - in fact my
> proposed approach is closer to a security patch in scope.

It's how I had understood it, and you are most welcome to
drive such a maintenance/security release.
If the build process works on your machine, you are a better
RM candidate. ;-)

Regards,
Gilles

>
> Stephen
>
> [1] https://github.com/apache/commons-math/network/dependents
>


Re: [math] MATH-1486 and release 3.6.2

2019-06-06 Thread Gilles Sadowski

Hello.

On Thu, 6 Jun 2019 at 18:14, Stephen Colebourne wrote:
>
> I've raised a GitHub PR [1] to add the Java 9 module name to [math] on
> the MATH_3_X branch. Assuming that is merged, I'm willing to raise
> another PR with the necessary bits to prepare the repo to release
> v3.6.2.
>
> This approach sidesteps all issues with commons-4 and does the minimum
> necessary for downstream users to use the project as a module in Java
> 9 onwards. (At my day job we produce open source that depends on
> commons-math, which means I can't add a module-info.java until
> commons-math has a module name.)
>
> While I'm technically still a commons committer, I think it would be
> highly inappropriate for me to try and shepherd the actual v3.6.2
> release. Is anyone willing to work with me to do the release? A v3.6.2
> release would contain just the module name change and one performance
> improvement that was added to the repo in 2016, so it should be a case
> of cranking the handle providing not too much has changed in the
> process since 2016.

I was about to merge the PR but, on my machine, the build fails.
Did you try?

Back then (pre-fork), I was in favour of maintaining both lines (3.X
and 4.X); but the 3.X branch has not been maintained for more than
3 years, and it shows.  Now (post-fork), my opinion is that the effort
would be better placed in getting the new dependencies of the
development version of Commons Math released, and release CM
4.0 thereafter.

Regards,
Gilles

> thanks
> Stephen
> [1] https://github.com/apache/commons-math/pull/107


[RNG] Re: [commons-rng] 04/11: RNG-75: Use SplitMix64.next()

2019-06-06 Thread Gilles Sadowski

Hi.

Not sure about the changes below.
It seems to me that "nextLong()" ensures that a "long" is generated,
while "next()" could, if the RNG type is later changed, return an "int"
widened to "long" (i.e. the upper half of its bits would carry no randomness).
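
[To illustrate the concern: in Java, an "int" assigned to a "long" is
sign-extended, so the upper 32 bits end up all zeros or all ones and add no
randomness.  The sketch below is self-contained and does not use the library;
"extractHi" and "extractLo" are local stand-ins written on the assumption that
they split a "long" into its upper and lower 32 bits, and "widen" plays the
role of a hypothetical "next()" that would return an "int".]

```java
/** Illustration only: effect of widening a 32-bit value to 64 bits. */
final class WideningDemo {
    private WideningDemo() {}

    /** Stand-in for a generator whose output is an "int" used as a "long". */
    static long widen(int value) {
        return value; // Implicit widening conversion: sign-extends.
    }

    /** Upper 32 bits of a "long" (local stand-in, see the note above). */
    static int extractHi(long v) {
        return (int) (v >>> 32);
    }

    /** Lower 32 bits of a "long" (local stand-in, see the note above). */
    static int extractLo(long v) {
        return (int) v;
    }

    public static void main(String[] args) {
        final long negative = widen(0x89abcdef); // Sign bit set.
        final long positive = widen(0x12345678); // Sign bit clear.
        // The high word is just a copy of the sign bit: ffffffff, then 0.
        System.out.println(Integer.toHexString(extractHi(negative)));
        System.out.println(Integer.toHexString(extractHi(positive)));
        // Only the low word carries the generated bits.
        System.out.println(Integer.toHexString(extractLo(negative)));
    }
}
```

With such a source, every value taken from the high word would be either 0 or
-1, which is why a method that is guaranteed to produce 64 random bits looks
safer here.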

Regards,
Gilles

On Thu, 6 Jun 2019 at 10:00,  wrote:
>
> This is an automated email from the ASF dual-hosted git repository.
>
> aherbert pushed a commit to branch master
> in repository https://gitbox.apache.org/repos/asf/commons-rng.git
>
> commit aa246979feb8c880c60c972faf7c9ffb9174f4cd
> Author: Alex Herbert 
> AuthorDate: Fri May 31 22:35:25 2019 +0100
>
> RNG-75: Use SplitMix64.next()
> ---
>  .../java/org/apache/commons/rng/simple/internal/Long2IntArray.java | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/commons-rng-simple/src/main/java/org/apache/commons/rng/simple/internal/Long2IntArray.java b/commons-rng-simple/src/main/java/org/apache/commons/rng/simple/internal/Long2IntArray.java
> index d98a77c..2f7660c 100644
> --- a/commons-rng-simple/src/main/java/org/apache/commons/rng/simple/internal/Long2IntArray.java
> +++ b/commons-rng-simple/src/main/java/org/apache/commons/rng/simple/internal/Long2IntArray.java
> @@ -62,11 +62,11 @@ public class Long2IntArray implements Seed2ArrayConverter {
>          int i = 0;
>          // Handle an odd size
>          if ((size & 1) == 1) {
> -            out[i++] = NumberFactory.extractHi(rng.nextLong());
> +            out[i++] = NumberFactory.extractHi(rng.next());
>          }
>          // Fill the remaining pairs
>          while (i < size) {
> -            final long v = rng.nextLong();
> +            final long v = rng.next();
>              out[i] = NumberFactory.extractHi(v);
>              out[i + 1] = NumberFactory.extractLo(v);
>              i += 2;
>


Re: [All] Alpha/beta releases

2019-06-05 Thread Gilles Sadowski

On Wed, 5 Jun 2019 at 23:14, sebb wrote:
>
> On Wed, 5 Jun 2019 at 17:16, Gilles Sadowski  wrote:
> >
> > On Wed, 5 Jun 2019 at 17:57, James Carman wrote:
> > >
> > > I’m having a hard time understanding the comparing APIs use case.  If I
> > > were to want to try that, I’d create a branch and import the new dependency
> > > version and see what breaks.  The performance part I wouldn’t think I’d use
> > > one code base either.  I’m not suggesting my way is the only or best way,
> > > just explaining why I’m having a hard time understanding what you’re
> > > doing.  Maybe this will be a learning opportunity for me! :)
> >
> > The main case in point is getting to the first release of new components.
> > This is happening now for [Imaging], and will be soon (hopefully) for
> > [Numbers], [Statistics] and [Geometry].
> >
> > IIUC, the former is going to release a beta version without modifying
> > the top-level package name.  This will create the possibility of JAR
> > hell (once 1.0 is out).
> >
> > Since we don't have that much review/feedback on these new and/or
> > refactored codes, I thought we could be on a safer ground if we first
> > release beta version(s) that
> >  * won't be subject to JAR hell and
> >  * will be easy (i.e. just add the dependency in the POM file) to
> >integrate for people kind enough to give it a try.
> >[If it's not easy[1], they won't do it.]
> >
> > Regards,
> > Gilles
> >
> > [1] Like: You "just" have to install "git", check out the source, install
> > "maven", run the "package" goal, then move the "target/whatever.jar"
> > file to where your code will look for it.
>
> No need to install Git; can just download the source archive and unpack it.
> I think we can assume they already have Maven, otherwise why are we
> worried about releasing to Maven?
>
> Note that the suggestion of using different package names will force
> users to edit their code.

So what; this is the purpose of beta-testing features that don't
exist in previous releases or in the previous beta version.

> They will then have to compile their source, probably using Maven.
>
> Seems to me the suggestion creates more work for end users.

People will have to do something.
When they raise an issue, it is easier for me and for them to point
to a one-line change in their dependencies (and the corresponding
change in their code), than to start explaining that they should
build from source.

From the discussion, I'm still missing the opinion stating explicitly
that "we don't care about JAR hell produced by a beta release".
My suggestion is only to avoid that.  Is the PMC fine releasing
*incompatible* beta releases (and of course incompatible with the
"stable" release that will follow) with the same package name?

Gilles

> > >
> > > On Wed, Jun 5, 2019 at 11:33 AM Gilles Sadowski 
> > > wrote:
> > >
> > > > On Wed, 5 Jun 2019 at 17:04, James Carman wrote:
> > > > >
> > > > > What sort of comparison are you looking to do within the same code?
> > > > > Performance?
> > > >
> > > > Yes, that's one possibility; another is comparing different APIs.
> > > >
> > > > Gilles
> > > >
> > > > >>>> [...]
> > > >


Re: [rng] Update ProviderBuilder factory methods

2019-06-05 Thread Gilles Sadowski

Hi.

On Mon, 3 Jun 2019 at 16:49, Alex Herbert wrote:
>
> Can I get a review of a PR that changes the ProviderBuilder [1]?
>
> The aim is to move the creation of the seed and the RNG into the
> RandomSourceInternal.
>
> This allows for customisation of the seed creation on a per-RNG basis.
> The reason is that some new generators in the library for v1.3 require
> that the seed be non-zero. An all-zero seed is a realistic possibility
> when the seed is an array of ints of size 2.
>
> The code change has required updating the routines using the
> SeedConverter interface and adding a Seed2ArrayConverter interface:
>
> public interface Seed2ArrayConverter<IN, OUT> extends SeedConverter<IN, OUT> {
>      /**
>       * Converts seed from input type to output type. The output type is expected to be an array.
>       *
>       * @param seed Original seed value.
>       * @param outputSize Output size.
>       * @return the converted seed value.
>       */
>      OUT convert(IN seed, int outputSize);
> }
>
> I've moved conversion to a new enum class for seed creation and
> conversion. Previously the ProviderBuilder used maps to store all the
> conversions. These are now explicitly written as conversion methods in
> the enum. The amount of code is the same and the conversions are the
> same. However use of the enum for conversion has removed the need to
> support the Seed2ArrayConverter interface in all the converters. This
> has made some methods and classes obsolete. I've not yet marked them as
> deprecated in the PR.
>
> i.e.
>
> private static final Map<Class<?>, SeedConverter<long[], ?>> CONV_LONG_ARRAY =
>      new ConcurrentHashMap<Class<?>, SeedConverter<long[], ?>>();
>
> does not have to become:
>
> private static final Map<Class<?>, Seed2ArrayConverter<long[], ?>> CONV_LONG_ARRAY =
>      new ConcurrentHashMap<Class<?>, Seed2ArrayConverter<long[], ?>>();
>
> with the corresponding conversion as:
>
> nativeSeed = CONV_LONG_ARRAY.get(source.getSeed()).convert((long[]) seed, source.getSeedSize());
>
>
> Currently conversions of arrays to arrays ignore the seed size. Creation
> of a new seed is limited to a maximum of 128.

+0
See below.

> This matches the previous
> functionality. It could be changed to create the full length seed.

+0
See below.

> The
> impact would be more work done within synchronized blocks in the
> SeedFactory. I would expect the generation to be slower but the seed
> quality will be higher.

As per the previous discussion, special needs should be covered
by the user.
However, it could be construed that 128 is large enough for a casual
user, and a larger seed could also pass as a special need that the
user can provide explicitly...

So either way is fine I guess.
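
[For readers following the seed-size discussion, here is a minimal,
self-contained Java sketch of expanding a single "long" into an "int[]" of a
requested size, in the spirit of a "convert(seed, outputSize)" method.  It
follows the fill pattern of the Long2IntArray change (one half-value for an
odd size, then pairs) and uses the well-known SplitMix64 mixing constants.
The class and method names are hypothetical, and this is not claimed to be the
code in the PR.]

```java
/** Illustration only: expand a "long" seed into an "int[]" of a given size. */
final class SeedArrayDemo {
    /** SplitMix64 state increment. */
    private static final long GOLDEN_GAMMA = 0x9e3779b97f4a7c15L;

    private SeedArrayDemo() {}

    /**
     * @param seed Original seed value.
     * @param size Number of "int" values to create.
     * @return an array of the requested size, fully determined by the seed.
     */
    static int[] longToIntArray(long seed, int size) {
        final int[] out = new int[size];
        long state = seed;
        int i = 0;
        // Handle an odd size: use the upper half of one 64-bit value.
        if ((size & 1) == 1) {
            state += GOLDEN_GAMMA;
            out[i++] = (int) (mix(state) >>> 32);
        }
        // Fill the remaining pairs: each 64-bit value yields two ints.
        while (i < size) {
            state += GOLDEN_GAMMA;
            final long v = mix(state);
            out[i] = (int) (v >>> 32);
            out[i + 1] = (int) v;
            i += 2;
        }
        return out;
    }

    /** SplitMix64 output function. */
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }
}
```

Since no shared state is involved, such a conversion needs no synchronization,
and its cost grows linearly with the requested size.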

>
> Only creation of arrays from int/long seeds will use the seed size. This
> does not operate within a synchronized block.
>
>
> Note that the RNG constructor is obtained using reflection. Previously this
> was done on each invocation. However, now that the method is within the
> RandomSourceInternal enum, caching the constructor is a natural
> modification since it is always the same.

+1
But please rename the local variable in method "getConstructor()".[1]
;-)
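
[A minimal sketch of what caching a reflectively-obtained constructor inside
an enum can look like, with the lookup deferred to first use rather than done
in the enum's own constructor.  The enum "DemoSource" and its single constant
are hypothetical and use "StringBuilder" as a stand-in for an RNG class; unlike
the approach described in the PR, this sketch wraps reflection failures in an
unchecked exception to stay short.]

```java
import java.lang.reflect.Constructor;

/** Illustration only: lazily cached reflective constructor in an enum. */
enum DemoSource {
    /** Stand-in for a generator type with a one-argument constructor. */
    STRING_BUILDER(StringBuilder.class, String.class);

    /** Type to instantiate. */
    private final Class<?> type;
    /** Type of the single constructor argument. */
    private final Class<?> argType;
    /** Cached constructor; looked up on first use, not at enum creation. */
    private volatile Constructor<?> cached;

    DemoSource(Class<?> type, Class<?> argType) {
        this.type = type;
        this.argType = argType;
    }

    /** @return the constructor, looked up once and then reused. */
    Constructor<?> getConstructor() {
        Constructor<?> ctor = cached;
        if (ctor == null) {
            try {
                ctor = type.getConstructor(argType);
            } catch (NoSuchMethodException e) {
                throw new IllegalStateException(e);
            }
            // A race here is harmless: any looked-up instance is equivalent.
            cached = ctor;
        }
        return ctor;
    }

    /** @return a new instance created from the given argument. */
    Object create(Object arg) {
        try {
            return getConstructor().newInstance(arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Deferring the lookup keeps "values()" cheap and lets a failure surface at the
point where an instance is actually requested.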

> I have not done this in the
> constructor for the RandomSourceInternal enum as:
>
> - it adds overhead when building all the enums (e.g.
> RandomSourceInternal.values())
>
> - it may throw lots of different types of exception. Rather than
> catching them all in the constructor for the enum they can now be thrown
> during creation of the RNG instance so the user gets the appropriate
> stack trace.
>
>
> The change to create an array using the correct size has performance
> implications (see [2]):
>
> - For small array sizes the creation is faster
>
> - For large array sizes built using an int/long the creation is
> marginally slower (but the seed should be better as it uses the SplitMix
> algorithm rather than the self-seeding strategy of the BaseProvider [3])

Where is it done?

>
> Caching the constructor for use with reflection has improved performance.

+1

Thanks,
Gilles

>
>
> [1] https://github.com/apache/commons-rng/pull/46
>
> [2] https://issues.apache.org/jira/browse/RNG-75
>
> [3] This could be tested by creating a new generator that implements the
> self-seeding strategy as its output and running it through the stress
> test applications.
>

Gilles

[1] https://www.linguee.com/french-english/translation/con.html


Re: [All] Alpha/beta releases

2019-06-05 Thread Gilles Sadowski

On Wed, 5 Jun 2019 at 17:57, James Carman wrote:
>
> I’m having a hard time understanding the comparing APIs use case.  If I
> were to want to try that, I’d create a branch and import the new dependency
> version and see what breaks.  The performance part I wouldn’t think I’d use
> one code base either.  I’m not suggesting my way is the only or best way,
> just explaining why I’m having a hard time understanding what you’re
> doing.  Maybe this will be a learning opportunity for me! :)

The main case in point is getting to the first release of new components.
This is happening now for [Imaging], and will be soon (hopefully) for
[Numbers], [Statistics] and [Geometry].

IIUC, the former is going to release a beta version without modifying
the top-level package name.  This will create the possibility of JAR
hell (once 1.0 is out).

Since we don't have that much review/feedback on these new and/or
refactored codes, I thought we could be on a safer ground if we first
release beta version(s) that
 * won't be subject to JAR hell and
 * will be easy (i.e. just add the dependency in the POM file) to
   integrate for people kind enough to give it a try.
   [If it's not easy[1], they won't do it.]

Regards,
Gilles

[1] Like: You "just" have to install "git", check out the source, install
"maven", run the "package" goal, then move the "target/whatever.jar"
file to where your code will look for it.

>
> On Wed, Jun 5, 2019 at 11:33 AM Gilles Sadowski 
> wrote:
>
> > On Wed, 5 Jun 2019 at 17:04, James Carman wrote:
> > >
> > > What sort of comparison are you looking to do within the same code?
> > > Performance?
> >
> > Yes, that's one possibility; another is comparing different APIs.
> >
> > Gilles
> >
> > >>>> [...]
> >
> >
> >


Re: [All] Alpha/beta releases

2019-06-05 Thread Gilles Sadowski

On Wed, 5 Jun 2019 at 17:04, James Carman wrote:
>
> What sort of comparison are you looking to do within the same code?
> Performance?

Yes, that's one possibility; another is comparing different APIs.

Gilles

>>>> [...]


Re: [All] Alpha/beta releases

2019-06-05 Thread Gilles Sadowski

On Wed, 5 Jun 2019 at 17:02, James Carman wrote:
>
> Wouldn’t you have a package collision between two different alpha releases?

Ah, I got it:
In "1.0-alpha1", class "o.a.c.somecomp.alpha1.Foo".
In "1.2-alpha1", class "o.a.c.somecomp.alpha1.Foo".

But those 2 classes can very well be different and incompatible.
However, we can consider that after the release of "1.0", all
"1.0-alphaX" releases are obsolete and serve zero purpose.
The beta releases are "comparable" only within the same base
(unreleased) version.
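
[The collision can be spelled out with a small Java sketch.  It assumes, as in
the proposal, that the package suffix is derived from the qualifier alone
(the part after the dash), while the artefact name keeps the full version.
The class "PreReleaseNaming" and the expanded package name
"org.apache.commons.somecomp" are illustrative assumptions.]

```java
/** Illustration only: names derived from a pre-release version string. */
final class PreReleaseNaming {
    private PreReleaseNaming() {}

    /**
     * @param basePackage Top-level package of the component.
     * @param version Version such as "1.0-alpha1" or "1.0".
     * @return the base package, with the qualifier appended if there is one.
     */
    static String packageName(String basePackage, String version) {
        final int dash = version.indexOf('-');
        if (dash < 0) {
            return basePackage; // Regular release: package unchanged.
        }
        return basePackage + "." + version.substring(dash + 1);
    }

    /**
     * @param artifactId Maven artefact identifier.
     * @param version Full version string.
     * @return the artefact name, e.g. "commons-somecomp-2.3-alpha1".
     */
    static String artifactName(String artifactId, String version) {
        return artifactId + "-" + version;
    }

    public static void main(String[] args) {
        final String base = "org.apache.commons.somecomp";
        // Same package for both pre-releases, although the artefacts differ.
        System.out.println(packageName(base, "1.0-alpha1"));
        System.out.println(packageName(base, "1.2-alpha1"));
        System.out.println(artifactName("commons-somecomp", "1.0-alpha1"));
        System.out.println(artifactName("commons-somecomp", "1.2-alpha1"));
    }
}
```

Including the base version in the suffix would remove the collision, at the
cost of longer package names; under the assumption stated above (older alphas
are obsolete once "1.0" is out), that is not needed.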

Gilles

>
> On Wed, Jun 5, 2019 at 10:56 AM Gilles Sadowski 
> wrote:
>
> > On Wed, 5 Jun 2019 at 16:47, James Carman wrote:
> > >
> > > Ok, what about 1.2?
> >
> > How is it different?
> >
> > Gilles
> >
> > >
> > > On Wed, Jun 5, 2019 at 10:44 AM Gilles Sadowski 
> > > wrote:
> > >
> > > > On Wed, 5 Jun 2019 at 16:18, James Carman wrote:
> > > > >
> > > > > What happens if/when you want to release a 2.0-alpha1 in the future?
> > > >
> > > > Hmm, what happens?
> > > > [At that point, we'd have renamed "o.a.c.compid" to "o.a.c.compid2".]
> > > >
> > > > Gilles
> > > >
> > > > >
> > > > > > On Tue, Jun 4, 2019 at 6:53 AM Gilles Sadowski wrote:
> > > > >
> > > > > > Hello.
> > > > > >
> > > > > > Does someone see a practical way to automate package names
> > > > > > > and source files conversions so that all alpha/beta releases
> > > > > > can be used together (e.g. to compare their behaviours).
> > > > > >
> > > > > > I mean, for release version "1.0-alpha1", the top-level package
> > > > > > name "o.a.c.compid" would be turned into "o.a.c.compid.alpha1".
> > > > > >
> > > > > > This would also solve issues with compatibility checkers (with the
> > > > > > added bonus that JAR hell could never happen).
> > > > > >
> > > > > > > Couldn't the "shade" plugin be put to use (so that all artefacts have
> > > > > > > their top-level package transparently set to "o.a.c.compid.alpha1"
> > > > > > > and all the tools operate on that)?
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Gilles
> > > > > >


Re: [All] Alpha/beta releases

2019-06-05 Thread Gilles Sadowski

On Wed, 5 Jun 2019 at 16:47, James Carman wrote:
>
> Ok, what about 1.2?

How is it different?

Gilles

>
> On Wed, Jun 5, 2019 at 10:44 AM Gilles Sadowski 
> wrote:
>
> > On Wed, 5 Jun 2019 at 16:18, James Carman wrote:
> > >
> > > What happens if/when you want to release a 2.0-alpha1 in the future?
> >
> > Hmm, what happens?
> > [At that point, we'd have renamed "o.a.c.compid" to "o.a.c.compid2".]
> >
> > Gilles
> >
> > >
> > > On Tue, Jun 4, 2019 at 6:53 AM Gilles Sadowski 
> > wrote:
> > >
> > > > Hello.
> > > >
> > > > Does someone see a practical way to automate package names
> > > > and source files conversions so that all alpha/beta releases
> > > > can be used together (e.g. to compare their behaviours).
> > > >
> > > > I mean, for release version "1.0-alpha1", the top-level package
> > > > name "o.a.c.compid" would be turned into "o.a.c.compid.alpha1".
> > > >
> > > > This would also solve issues with compatibility checkers (with the
> > > > added bonus that JAR hell could never happen).
> > > >
> > > > Couldn't the "shade" plugin be put to use (so that all artefacts have
> > > > their top-level package transparently set to "o.a.c.compid.alpha1"
> > > > and all the tools operate on that)?
> > > >
> > > >
> > > > Regards,
> > > > Gilles
> > > >


Re: [All] Alpha/beta releases

2019-06-05 Thread Gilles Sadowski

On Wed, 5 Jun 2019 at 16:22, sebb wrote:
>
> On Wed, 5 Jun 2019 at 15:06, Gilles Sadowski  wrote:
> >
> > On Wed, 5 Jun 2019 at 15:59, sebb wrote:
> > >
> > > On Wed, 5 Jun 2019 at 14:33, Gilles Sadowski  wrote:
> > > >
> > > > On Wed, 5 Jun 2019 at 15:18, sebb wrote:
> > > > >
> > > > > I'm not sure what problem this is trying to solve.
> > > > >
> > > > > How is it intended to use the facility?
> > > >
> > > > Ideally:
> > > > $ mvn -Pbetarelease [... other settings ...] -Dbetasubversion=alpha1
> > > > where the latter profile would take care of changing the
> > > > toplevel package name
> > > > o.a.c.somecomp
> > > > to
> > > > o.a.c.somecomp.alpha1
> > > >
> > > > And, if the upcoming version is, say, "2.3", the generated
> > > > artefact(s) would be:
> > > >   commons-somecomp-2.3-alpha1
> > >
> > > That's not what I intended to ask.
> > >
> > > What problem does the ability to readily change the package name actually solve?
> > > And how are the amended packages going to be used?
> >
> > Maybe, I don't understand the question.
> > The purpose is to be able to quickly produce several beta releases that
> > don't have to be compatible with other beta releases but that can coexist
> > for the purpose of allowing users to compare the impact of the changes.
>
> I don't understand how the user can actually test the release unless
> they also produce code that is likewise shaded to invoke the
> appropriate version of the package.

Of course, if they want to test "alpha1", they need to depend on it,
and modify their code accordingly.

> Surely it would be easier to test the code if it used the standard
> package names, i.e. no need to change the user code?

Yes, but that means that we cannot compare "alpha1" and "alpha2" in
the same code.

> i.e. take their app, and run it against the relevant alpha- or beta-release.

Then the main concern is the possibility of JAR hell (e.g. when several
"alpha" are in the classpath).

> This is already possible if the user has the ability to compile the
> component from source.

I think that if we hope to get help from users, we should provide a JAR.

Regards,
Gilles

>
> > Gilles
> >
> > >
> > > > Regards,
> > > > Gilles
> > > >
> > > > >
> > > > > On Tue, 4 Jun 2019 at 17:35, Matt Sicker  wrote:
> > > > > >
> > > > > > This sounds like a shade feature, yes. However, in order to
> > > > > > automatically extract the version extra data and detect a version
> > > > > > keyword like "alpha" may require some additional code, though maybe
> > > > > > the shade plugin already supports that.
> > > > > >
> > > > > > Alternatively, JUnit 5.x uses a tool called API Guardian for marking
> > > > > > which APIs are stable or not:
> > > > > > https://github.com/apiguardian-team/apiguardian
> > > > > >
> > > > > > On Tue, 4 Jun 2019 at 05:53, Gilles Sadowski  
> > > > > > wrote:
> > > > > > >
> > > > > > > Hello.
> > > > > > >
> > > > > > > Does someone see a practical way to automate package names
> > > > > > > and source files conversions so that each all alpha/beta releases
> > > > > > > can be used together (e.g. to compare their behaviours).
> > > > > > >
> > > > > > > I mean, for release version "1.0-alpha1", the top-level package
> > > > > > > name "o.a.c.compid" would be turned into "o.a.c.compid.alpha1".
> > > > > > >
> > > > > > > This would also solve issues with compatibility checkers (with the
> > > > > > > added bonus that JAR hell could never happen).
> > > > > > >
> > > > > > > Couldn't the "shade" plugin be put to use (so that all artefacts 
> > > > > > > have
> > > > > > > their top-level package transparently set to "o.a.c.compid.alpha1"
> > > > > > > and all the tools operate on that)?
> > > > > > >
> > > > > > >
> > > > > > > Regards,
> > > > > > > Gilles
> > > > > > >

Re: [All] Alpha/beta releases

2019-06-05 Thread Gilles Sadowski

On Wed, 5 Jun 2019 at 16:18, James Carman wrote:
>
> What happens if/when you want to release a 2.0-alpha1 in the future?

Hmm, what happens?
[At that point, we'd have renamed "o.a.c.compid" to "o.a.c.compid2".]

Gilles

>
> On Tue, Jun 4, 2019 at 6:53 AM Gilles Sadowski  wrote:
>
> > Hello.
> >
> > Does someone see a practical way to automate package names
> > and source files conversions so that each all alpha/beta releases
> > can be used together (e.g. to compare their behaviours).
> >
> > I mean, for release version "1.0-alpha1", the top-level package
> > name "o.a.c.compid" would be turned into "o.a.c.compid.alpha1".
> >
> > This would also solve issues with compatibility checkers (with the
> > added bonus that JAR hell could never happen).
> >
> > Couldn't the "shade" plugin be put to use (so that all artefacts have
> > their top-level package transparently set to "o.a.c.compid.alpha1"
> > and all the tools operate on that)?
> >
> >
> > Regards,
> > Gilles
> >

Re: [All] Alpha/beta releases

2019-06-05 Thread Gilles Sadowski

On Wed, 5 Jun 2019 at 16:18, Gary Gregory wrote:
>
> On Wed, Jun 5, 2019 at 10:06 AM Gilles Sadowski 
> wrote:
>
> > On Wed, 5 Jun 2019 at 15:59, sebb wrote:
> > >
> > > On Wed, 5 Jun 2019 at 14:33, Gilles Sadowski 
> > wrote:
> > > >
> > > > On Wed, 5 Jun 2019 at 15:18, sebb wrote:
> > > > >
> > > > > I'm not sure what problem this is trying to solve.
> > > > >
> > > > > How is it intended to use the facility?
> > > >
> > > > Ideally:
> > > > $ mvn -Pbetarelease [... other settings ...]
> > -Dbetasubversion=alpha1
> > > > where the latter profile would take care of changing the
> > > > toplevel package name
> > > > o.a.c.somecomp
> > > > to
> > > > o.a.c.somecomp.alpha1
> > > >
> > > > And, if the upcoming version is, say, "2.3", the generated
> > > > artefact(s) would be:
> > > >   commons-somecomp-2.3-alpha1
> > >
> > > That's not what I intended to ask.
> > >
> > > What problem does the ability to readily change the package name
> > actually solve?
> > > And how are the amended packages going to be used?
> >
> > Maybe, I don't understand the question.
> > The purpose is to be able to quickly produce several beta releases that
> > don't have to be compatible with other beta releases but that can coexist
> > for the purpose of allowing users to compare the impact of the changes.
> >
>
> This is over the top IMO. That's what JApiCmp is for unless I am missing
> something.

Seems so.  Or I am.
I'm talking about producing official releases; no idea how japicmp
is related...

>
> Gary
>
>
> >
> > Gilles
> >
> > >
> > > > Regards,
> > > > Gilles
> > > >
> > > > >
> > > > > On Tue, 4 Jun 2019 at 17:35, Matt Sicker  wrote:
> > > > > >
> > > > > > This sounds like a shade feature, yes. However, in order to
> > > > > > automatically extract the version extra data and detect a version
> > > > > > keyword like "alpha" may require some additional code, though maybe
> > > > > > the shade plugin already supports that.
> > > > > >
> > > > > > Alternatively, JUnit 5.x uses a tool called API Guardian for
> > marking
> > > > > > which APIs are stable or not:
> > > > > > https://github.com/apiguardian-team/apiguardian
> > > > > >
> > > > > > On Tue, 4 Jun 2019 at 05:53, Gilles Sadowski 
> > wrote:
> > > > > > >
> > > > > > > Hello.
> > > > > > >
> > > > > > > Does someone see a practical way to automate package names
> > > > > > > and source files conversions so that each all alpha/beta releases
> > > > > > > can be used together (e.g. to compare their behaviours).
> > > > > > >
> > > > > > > I mean, for release version "1.0-alpha1", the top-level package
> > > > > > > name "o.a.c.compid" would be turned into "o.a.c.compid.alpha1".
> > > > > > >
> > > > > > > This would also solve issues with compatibility checkers (with
> > the
> > > > > > > added bonus that JAR hell could never happen).
> > > > > > >
> > > > > > > Couldn't the "shade" plugin be put to use (so that all artefacts
> > have
> > > > > > > their top-level package transparently set to
> > "o.a.c.compid.alpha1"
> > > > > > > and all the tools operate on that)?
> > > > > > >
> > > > > > >
> > > > > > > Regards,
> > > > > > > Gilles
> > > > > > >

Re: [All] Alpha/beta releases

2019-06-05 Thread Gilles Sadowski

On Wed, 5 Jun 2019 at 16:04, Gary Gregory wrote:
>
> On Wed, Jun 5, 2019 at 9:59 AM sebb  wrote:
>
> > On Wed, 5 Jun 2019 at 14:33, Gilles Sadowski  wrote:
> > >
> > > On Wed, 5 Jun 2019 at 15:18, sebb wrote:
> > > >
> > > > I'm not sure what problem this is trying to solve.
> > > >
> > > > How is it intended to use the facility?
> > >
> > > Ideally:
> > > $ mvn -Pbetarelease [... other settings ...] -Dbetasubversion=alpha1
> > > where the latter profile would take care of changing the
> > > toplevel package name
> > > o.a.c.somecomp
> > > to
> > > o.a.c.somecomp.alpha1
> > >
> > > And, if the upcoming version is, say, "2.3", the generated
> > > artefact(s) would be:
> > >   commons-somecomp-2.3-alpha1
> >
> > That's not what I intended to ask.
> >
> > What problem does the ability to readily change the package name actually
> > solve?
> > And how are the amended packages going to be used?
> >
>
> Also, the renamed sources would need to be in git as well.

The script/profile/whatever could be:
 1. create a "beta-release-2.3-alpha1" branch
 2. perform the top-level package name change
 3. commit
 4. proceed as usual
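
Step 2 of the list above could be sketched roughly as follows. This is only an illustration, assuming a plain textual rewrite of the package prefix in each source file; the class "PackageRenamer" and its method are hypothetical names (not an existing tool), and moving the source directories to match the new package is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rewrites a top-level package prefix in Java source text. */
final class PackageRenamer {
    private PackageRenamer() {}

    /**
     * Replaces every occurrence of {@code oldPkg} that is a complete package
     * prefix (not a fragment of a longer identifier) with {@code newPkg}.
     * Assumption: meant to run once on a fresh branch; running it twice would
     * append the qualifier twice.
     */
    static String rename(String source, String oldPkg, String newPkg) {
        // (?<![\w.]) : not preceded by an identifier character or a dot
        // (?!\w)     : not followed by an identifier character
        Pattern prefix = Pattern.compile("(?<![\\w.])" + Pattern.quote(oldPkg) + "(?!\\w)");
        return prefix.matcher(source).replaceAll(Matcher.quoteReplacement(newPkg));
    }

    public static void main(String[] args) {
        String src = "package org.apache.commons.somecomp;\n"
                   + "import org.apache.commons.somecomp.util.Foo;\n";
        System.out.print(rename(src,
                                "org.apache.commons.somecomp",
                                "org.apache.commons.somecomp.alpha1"));
    }
}
```

A real script would also have to handle resources, Javadoc links and the POM coordinates; the sketch only covers "package" and "import" lines and other fully qualified names in the source text.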

Gilles

> Gary
>
>
> >
> > > Regards,
> > > Gilles
> > >
> > > >
> > > > On Tue, 4 Jun 2019 at 17:35, Matt Sicker  wrote:
> > > > >
> > > > > This sounds like a shade feature, yes. However, in order to
> > > > > automatically extract the version extra data and detect a version
> > > > > keyword like "alpha" may require some additional code, though maybe
> > > > > the shade plugin already supports that.
> > > > >
> > > > > Alternatively, JUnit 5.x uses a tool called API Guardian for marking
> > > > > which APIs are stable or not:
> > > > > https://github.com/apiguardian-team/apiguardian
> > > > >
> > > > > On Tue, 4 Jun 2019 at 05:53, Gilles Sadowski 
> > wrote:
> > > > > >
> > > > > > Hello.
> > > > > >
> > > > > > Does someone see a practical way to automate package names
> > > > > > and source files conversions so that each all alpha/beta releases
> > > > > > can be used together (e.g. to compare their behaviours).
> > > > > >
> > > > > > I mean, for release version "1.0-alpha1", the top-level package
> > > > > > name "o.a.c.compid" would be turned into "o.a.c.compid.alpha1".
> > > > > >
> > > > > > This would also solve issues with compatibility checkers (with the
> > > > > > added bonus that JAR hell could never happen).
> > > > > >
> > > > > > Couldn't the "shade" plugin be put to use (so that all artefacts
> > have
> > > > > > their top-level package transparently set to "o.a.c.compid.alpha1"
> > > > > > and all the tools operate on that)?
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Gilles

Re: [All] Alpha/beta releases

2019-06-05 Thread Gilles Sadowski

On Wed, 5 Jun 2019 at 15:59, sebb wrote:
>
> On Wed, 5 Jun 2019 at 14:33, Gilles Sadowski  wrote:
> >
> > On Wed, 5 Jun 2019 at 15:18, sebb wrote:
> > >
> > > I'm not sure what problem this is trying to solve.
> > >
> > > How is it intended to use the facility?
> >
> > Ideally:
> > $ mvn -Pbetarelease [... other settings ...] -Dbetasubversion=alpha1
> > where the latter profile would take care of changing the
> > toplevel package name
> > o.a.c.somecomp
> > to
> > o.a.c.somecomp.alpha1
> >
> > And, if the upcoming version is, say, "2.3", the generated
> > artefact(s) would be:
> >   commons-somecomp-2.3-alpha1
>
> That's not what I intended to ask.
>
> What problem does the ability to readily change the package name actually 
> solve?
> And how are the amended packages going to be used?

Maybe, I don't understand the question.
The purpose is to be able to quickly produce several beta releases that
don't have to be compatible with other beta releases but that can coexist
for the purpose of allowing users to compare the impact of the changes.

Gilles

>
> > Regards,
> > Gilles
> >
> > >
> > > On Tue, 4 Jun 2019 at 17:35, Matt Sicker  wrote:
> > > >
> > > > This sounds like a shade feature, yes. However, in order to
> > > > automatically extract the version extra data and detect a version
> > > > keyword like "alpha" may require some additional code, though maybe
> > > > the shade plugin already supports that.
> > > >
> > > > Alternatively, JUnit 5.x uses a tool called API Guardian for marking
> > > > which APIs are stable or not:
> > > > https://github.com/apiguardian-team/apiguardian
> > > >
> > > > On Tue, 4 Jun 2019 at 05:53, Gilles Sadowski  
> > > > wrote:
> > > > >
> > > > > Hello.
> > > > >
> > > > > Does someone see a practical way to automate package names
> > > > > and source files conversions so that each all alpha/beta releases
> > > > > can be used together (e.g. to compare their behaviours).
> > > > >
> > > > > I mean, for release version "1.0-alpha1", the top-level package
> > > > > name "o.a.c.compid" would be turned into "o.a.c.compid.alpha1".
> > > > >
> > > > > This would also solve issues with compatibility checkers (with the
> > > > > added bonus that JAR hell could never happen).
> > > > >
> > > > > Couldn't the "shade" plugin be put to use (so that all artefacts have
> > > > > their top-level package transparently set to "o.a.c.compid.alpha1"
> > > > > and all the tools operate on that)?
> > > > >
> > > > >
> > > > > Regards,
> > > > > Gilles
> > > > >

Re: [All] Alpha/beta releases

2019-06-05 Thread Gilles Sadowski

On Wed, 5 Jun 2019 at 15:18, sebb wrote:
>
> I'm not sure what problem this is trying to solve.
>
> How is it intended to use the facility?

Ideally:
$ mvn -Pbetarelease [... other settings ...] -Dbetasubversion=alpha1
where the latter profile would take care of changing the
toplevel package name
o.a.c.somecomp
to
o.a.c.somecomp.alpha1

And, if the upcoming version is, say, "2.3", the generated
artefact(s) would be:
  commons-somecomp-2.3-alpha1
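
The naming scheme above can be written down as a small sketch. It assumes the qualifier (e.g. "alpha1") is simply appended as an extra package segment and as a suffix of the artefact name; the class "BetaNames" and its methods are hypothetical and only restate the convention, they are not part of any Maven plugin.

```java
import java.util.Objects;

/** Hypothetical helper: derives the names a "-Pbetarelease" build would produce. */
final class BetaNames {
    private BetaNames() {}

    /** Appends the beta qualifier as an extra package segment. */
    static String relocatedPackage(String topLevelPackage, String qualifier) {
        Objects.requireNonNull(topLevelPackage, "topLevelPackage");
        Objects.requireNonNull(qualifier, "qualifier");
        return topLevelPackage + "." + qualifier;
    }

    /** Builds the artefact name as artifactId-version-qualifier. */
    static String artifactName(String artifactId, String version, String qualifier) {
        return artifactId + "-" + version + "-" + qualifier;
    }

    /** Maps a fully qualified class name into the relocated package; other names are left untouched. */
    static String relocateClass(String className, String topLevelPackage, String qualifier) {
        if (className.startsWith(topLevelPackage + ".")) {
            return relocatedPackage(topLevelPackage, qualifier)
                + className.substring(topLevelPackage.length());
        }
        return className;
    }

    public static void main(String[] args) {
        System.out.println(relocatedPackage("org.apache.commons.somecomp", "alpha1"));
        System.out.println(artifactName("commons-somecomp", "2.3", "alpha1"));
    }
}
```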

Regards,
Gilles

>
> On Tue, 4 Jun 2019 at 17:35, Matt Sicker  wrote:
> >
> > This sounds like a shade feature, yes. However, in order to
> > automatically extract the version extra data and detect a version
> > keyword like "alpha" may require some additional code, though maybe
> > the shade plugin already supports that.
> >
> > Alternatively, JUnit 5.x uses a tool called API Guardian for marking
> > which APIs are stable or not:
> > https://github.com/apiguardian-team/apiguardian
> >
> > On Tue, 4 Jun 2019 at 05:53, Gilles Sadowski  wrote:
> > >
> > > Hello.
> > >
> > > Does someone see a practical way to automate package names
> > > and source files conversions so that each all alpha/beta releases
> > > can be used together (e.g. to compare their behaviours).
> > >
> > > I mean, for release version "1.0-alpha1", the top-level package
> > > name "o.a.c.compid" would be turned into "o.a.c.compid.alpha1".
> > >
> > > This would also solve issues with compatibility checkers (with the
> > > added bonus that JAR hell could never happen).
> > >
> > > Couldn't the "shade" plugin be put to use (so that all artefacts have
> > > their top-level package transparently set to "o.a.c.compid.alpha1"
> > > and all the tools operate on that)?
> > >
> > >
> > > Regards,
> > > Gilles
> > >
> >
> >
> > --
> > Matt Sicker 
> >

Re: [Geometry] Build aborted on Jenkins

2019-06-05 Thread Gilles Sadowski

Hello.

On Fri, 31 May 2019 at 20:54, Karl Heinz Marbaise wrote:
>
> Hi,
>
> I have created an INFRA ticket, because one module is stuck for all
> builds, which blocks the whole build process.
>
> https://issues.apache.org/jira/browse/INFRA-18546

Ticket has been closed but the problem hasn't been fixed.
See

https://builds.apache.org/view/A-D/view/Commons/job/commons-geometry/65/console

>
> Kind regards
> Karl Heinz Marbaise
> On 31.05.19 20:10, Karl Heinz Marbaise wrote:
> > Hi,
> >
> > On 31.05.19 20:03, Gilles Sadowski wrote:
> >> Hi.
> >>
> >> All builds fail:
> >>
> >> https://builds.apache.org/view/A-D/view/Commons/job/commons-geometry/
> >
> > Based on a hard configured Time-Out Strategy of 15 minutes...
> >
> > There are currently modules hanging...
> >
> > I will try to kill job ...
> >
> >
> > Kind regards
> > Karl Heinz Marbaise
> >>

Re: [commons-statistics] branch STATISTICS-14 updated: WIP - [...]

2019-06-04 Thread Gilles Sadowski

Hi.

Is there a purpose to having mails from "WIP" commits?
It's difficult to know when/what to comment about.

Regards,
Gilles

P.S. In the below commits, I don't understand the purpose of
"UNKNOWN".  Also, I'd say that having a named constant ZERO
for the value "0" is overkill.

On Tue, 4 Jun 2019 at 22:53,  wrote:
>
> This is an automated email from the ASF dual-hosted git repository.
>
> khmarbaise pushed a commit to branch STATISTICS-14
> in repository https://gitbox.apache.org/repos/asf/commons-statistics.git
>
>
> The following commit(s) were added to refs/heads/STATISTICS-14 by this push:
>  new c634122  WIP - [STATISTICS-14] - BigDecimalStatistics - Continued.  
> - Fixed PMD issues. Only one left:PMD Failure: 
> BigDecimalSummaryStatistics:25 Rule:DataClass Priority:3The class 
> 'BigDecimalSummaryStatistics' is suspected to be a Data Class (WOC=25.000%, 
> NOPA=0, NOAM=4, WMC=27).
> c634122 is described below
>
> commit c634122fda93aa4bfaae2f3ce5b523a9f8e6b6d4
> Author: Karl Heinz Marbaise 
> AuthorDate: Tue Jun 4 22:50:35 2019 +0200
>
> WIP - [STATISTICS-14] - BigDecimalStatistics - Continued.
>  - Fixed PMD issues. Only one left:
>PMD Failure: BigDecimalSummaryStatistics:25 Rule:DataClass 
> Priority:3
>The class 'BigDecimalSummaryStatistics' is suspected to be a Data 
> Class (WOC=25.000%, NOPA=0, NOAM=4, WMC=27).
> ---
>  .../descriptive/BigDecimalSummaryStatistics.java   | 36 +++---
>  1 file changed, 25 insertions(+), 11 deletions(-)
>
> diff --git a/commons-statistics-bigdecimal/src/main/java/org/apache/commons/statistics/bigdecimal/descriptive/BigDecimalSummaryStatistics.java b/commons-statistics-bigdecimal/src/main/java/org/apache/commons/statistics/bigdecimal/descriptive/BigDecimalSummaryStatistics.java
> index ad5f05a..8eae837 100644
> --- a/commons-statistics-bigdecimal/src/main/java/org/apache/commons/statistics/bigdecimal/descriptive/BigDecimalSummaryStatistics.java
> +++ b/commons-statistics-bigdecimal/src/main/java/org/apache/commons/statistics/bigdecimal/descriptive/BigDecimalSummaryStatistics.java
> @@ -25,6 +25,10 @@ import java.util.function.Consumer;
>  public class BigDecimalSummaryStatistics implements Consumer {
>
>  /**
> + * This is used to assign min/max a useful value which is not {@code 
> null}.
> + */
> +private static final BigDecimal UNKNOWN = BigDecimal.ZERO;
> +/**
>   * The count value for zero.
>   */
>  private static final long ZERO_COUNT = 0L;
> @@ -46,14 +50,21 @@ public class BigDecimalSummaryStatistics implements 
> Consumer {
>  private BigDecimal max;
>
>  /**
> + * This keeps the information if min/max have been assigned a correct 
> value or not.
> + */
> +private boolean minMaxAssigned;
> +
> +/**
>   * Create an instance of BigDecimalSummaryStatistics. {@code count = 0} 
> and sum = {@link
>   * BigDecimal#ZERO}
>   */
>  public BigDecimalSummaryStatistics() {
>  this.count = ZERO_COUNT;
>  this.sum = BigDecimal.ZERO;
> -this.max = null;
> -this.min = null;
> +
> +this.minMaxAssigned = false;
> +this.max = UNKNOWN;
> +this.min = UNKNOWN;
>  }
>
>  /**
> @@ -102,6 +113,7 @@ public class BigDecimalSummaryStatistics implements 
> Consumer {
>
>  this.min = min;
>  this.max = max;
> +this.minMaxAssigned = true;
>  }
>
>  }
> @@ -121,12 +133,13 @@ public class BigDecimalSummaryStatistics implements 
> Consumer {
>  count++;
>  sum = sum.add(value);
>
> -if (min == null) {
> -min = value;
> -max = value;
> -} else {
> +if (minMaxAssigned) {
>  min = min.min(value);
>  max = max.max(value);
> +} else {
> +min = value;
> +max = value;
> +minMaxAssigned = true;
>  }
>  }
>
> @@ -144,12 +157,13 @@ public class BigDecimalSummaryStatistics implements 
> Consumer {
>  count += other.count;
>  sum = sum.add(other.sum);
>
> -if (min == null) {
> -min = other.min;
> -max = other.max;
> -} else {
> +if (minMaxAssigned) {
>  min = min.min(other.min);
>  max = max.max(other.max);
> +} else {
> +min = other.min;
> +max = other.max;
> +minMaxAssigned = true;
>  }
>  }
>
> @@

Re: [numbers-primes] Improving trial division code and algorithm

2019-06-04 Thread Gilles Sadowski

Hi.

On Tue, 4 Jun 2019 at 03:49, Heinrich Bohne wrote:
>
> I have been advised to raise this improvement suggestion on the
> developers' mailing list. It is about the trial division algorithm in
> the method SmallPrimes.boundedTrialDivision(int, int, List) in
> the primes module. Currently, this algorithm skips multiples of 2 and 3
> as trial candidates, which reduces the number of integers to be tried to
> 1/3 of all integers. The choice of these two prime numbers as the only
> ones whose multiples should be skipped seems arbitrary and is probably
> rooted in the fact that the way the code currently achieves this is
> based on code duplication (by hard-coding the alternation of the trial
> candidate's increment between 2 and 4), and choosing any other prime
> number, or more prime numbers, would increase the extent of the code
> duplication to insufferable dimensions.
>
> However, when altering the code's mechanism not to rely on code
> duplication, using more primes than just 2 and 3 is conceivable. For
> example, when skipping multiples of 2, 3, 5, 7 and 11, the number of
> integers to be tried can be reduced to 16/77 of all integers, at the
> cost of storing 480 pre-computed integers in an array. Considering that
> the class SmallPrimes already contains an array with the first 512 prime
> numbers, this does not seem like very much.
>
> Some more details about this suggestion are explained here:
> https://issues.apache.org/jira/browse/NUMBERS-104
>
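The wheel described above can be illustrated with a minimal sketch. It is not the code from NUMBERS-104 or from "SmallPrimes"; the class "WheelTrialDivision" and its methods are hypothetical. It assumes a wheel built from the residues modulo 2 * 3 * 5 * 7 * 11 = 2310 that are coprime to all five primes; there are 480 of them, hence 480 / 2310 = 16 / 77 of all integers remain as trial candidates.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of trial division that skips multiples of 2, 3, 5, 7 and 11. */
final class WheelTrialDivision {
    private static final int[] WHEEL_PRIMES = {2, 3, 5, 7, 11};
    private static final int CIRCUMFERENCE = 2 * 3 * 5 * 7 * 11; // 2310
    /** Residues in [1, 2310) that are coprime to every wheel prime. */
    private static final int[] RESIDUES = buildResidues();

    private WheelTrialDivision() {}

    private static int[] buildResidues() {
        List<Integer> list = new ArrayList<>();
        for (int r = 1; r < CIRCUMFERENCE; r++) {
            boolean coprime = true;
            for (int p : WHEEL_PRIMES) {
                if (r % p == 0) {
                    coprime = false;
                    break;
                }
            }
            if (coprime) {
                list.add(r);
            }
        }
        int[] out = new int[list.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = list.get(i);
        }
        return out;
    }

    /** Number of pre-computed residues stored for the wheel. */
    static int residueCount() {
        return RESIDUES.length;
    }

    /** Returns the prime factors of {@code n} (with multiplicity, ascending). */
    static List<Integer> factorize(int n) {
        if (n < 2) {
            throw new IllegalArgumentException("n must be >= 2: " + n);
        }
        List<Integer> factors = new ArrayList<>();
        // Divide out the wheel primes first.
        for (int p : WHEEL_PRIMES) {
            while (n % p == 0) {
                factors.add(p);
                n /= p;
            }
        }
        // Then try only candidates of the form k * 2310 + residue.
        for (long base = 0; ; base += CIRCUMFERENCE) {
            for (int r : RESIDUES) {
                long f = base + r;
                if (f == 1) {
                    continue; // 1 is coprime to everything but is not a divisor candidate
                }
                if (f * f > n) {
                    if (n > 1) {
                        factors.add(n); // what remains is prime
                    }
                    return factors;
                }
                while (n % f == 0) {
                    factors.add((int) f);
                    n = (int) (n / f);
                }
            }
        }
    }
}
```

The residue table is computed at class initialization here for brevity; storing it as a literal array, as the suggestion describes, trades a little source size for start-up time.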

Thanks for your contribution.
PR 46 has been merged.

Gilles

[All] Alpha/beta releases

2019-06-04 Thread Gilles Sadowski

Hello.

Does someone see a practical way to automate the conversion of package
names and source files so that all alpha/beta releases can be used
together (e.g. to compare their behaviours)?

I mean, for release version "1.0-alpha1", the top-level package
name "o.a.c.compid" would be turned into "o.a.c.compid.alpha1".

This would also solve issues with compatibility checkers (with the
added bonus that JAR hell could never happen).

Couldn't the "shade" plugin be put to use (so that all artefacts have
their top-level package transparently set to "o.a.c.compid.alpha1"
and all the tools operate on that)?


Regards,
Gilles

Re: [Commons][Descriptive][STATISTICS-7][GSoC] SummaryStatistics class design & Whether to use DoubleSummaryStatistics class from java.util package?

2019-06-03 Thread Gilles Sadowski

Hello.

Side note: Top-posting is quite annoying in these discussions...

On Sun, 2 Jun 2019 at 21:27, Eric Barnhill wrote:
>
> As discussed on prior threads you should have both. There will need to be
> static convenience methods for a user who wants to make a very simple call,
> say Stats.mean() . But, as Alex said, this convenience class will just be a
> front end for the statistics functionality itself. That needs to be in its
> own classes (Mean(), Variance()) which can produce instances that give the
> user more flexibility, For example storeless statistics like Mean() or
> Variance(), or StandardDeviation(), should be updatable, as Gilles said, or
> handle different kind of streams like Alex said. Yet these classes need to
> be designed so that they perform as well as simple implementations when
> desired.
>

Related discussion:
https://issues.apache.org/jira/browse/STATISTICS-14

I agree with the requirement that "simple" usage must be possible.
However, it seems to me that the discussion is upside-down: simple
usage can always be provided by another layer (similar to the "toArray"
method in JDK's "List").  Seamless integration with streams is not
as obvious; hence it should not be an afterthought.
Unless I'm mistaken, another way to look at it is the "in-memory" vs
"storeless" divide.  The latter is the most interesting case (when the
quantity can be computed) design-wise.

I suggest that the testing ground (read: code) is to provide the variance.
And see how it plays with a "DoubleStream", how it can also provide
"sum of squares" and "mean"; or how, inversely, "sum of squares" and
"mean" can be "combined" to provide variance.
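
As a starting point for such a testing ground, here is a minimal sketch of a storeless variance that works with "DoubleStream.collect". It is an assumption about one possible design, not the project's API: the class "RunningVariance" is hypothetical, it uses Welford's update for single values and the usual pairwise formula for merging two partial results, and "sum of squares" is taken to mean the sum of squared deviations from the mean.

```java
import java.util.function.DoubleConsumer;
import java.util.stream.DoubleStream;

/** Hypothetical storeless accumulator providing count, mean and variance. */
final class RunningVariance implements DoubleConsumer {
    private long n;
    private double mean;
    /** Sum of squared deviations from the current mean. */
    private double m2;

    /** Welford's update for one additional value. */
    @Override
    public void accept(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    /** Merges another partial result into this one (needed for parallel streams). */
    void combine(RunningVariance other) {
        if (other.n == 0) {
            return;
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        m2 += other.m2 + delta * delta * ((double) n * other.n / total);
        mean += delta * other.n / total;
        n = total;
    }

    long getCount() {
        return n;
    }

    /** Arithmetic mean, or NaN if no value was added. */
    double getMean() {
        return n == 0 ? Double.NaN : mean;
    }

    double getSumOfSquaredDeviations() {
        return m2;
    }

    /** Unbiased sample variance, or NaN for fewer than two values. */
    double getVariance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }

    /** "Simple usage" layer built on top of the stream-friendly accumulator. */
    static RunningVariance of(DoubleStream values) {
        return values.collect(RunningVariance::new,
                              RunningVariance::accept,
                              RunningVariance::combine);
    }
}
```

The static "of" method shows how the simple call can sit on top of the stream-oriented accumulator rather than the other way round.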

Regards,
Gilles

> On Sun, Jun 2, 2019 at 5:45 AM Virendra singh Rajpurohit <
> virendrasing...@gmail.com> wrote:
>
> > I've been trying to make summary statistics class. I have some doubt.
> > There is a class DoubleSummaryStatistics in java.util package(There are two
> > more for Int and Long). I'll attach this file here.
> > Do I have to design SummaryStatistics in this way only? I mean,
> > description on DoubleSummaryStatistics is "This class is designed to work
> > with (though does not require) streams
> > <https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html>.
> > For example, you can compute summary statistics on a stream of doubles with:
> >
> >
> >  DoubleSummaryStatistics stats = doubleStream.collect(DoubleSummaryStatistics::new,
> >                                                       DoubleSummaryStatistics::accept,
> >                                                       DoubleSummaryStatistics::combine);"
> > Earlier my understanding of the project was that the user just has to
> > call the function "getSummary()" and all the calculations will be done
> > automatically in streams. But as we can see in DoubleSummaryStatistics, we
> > have to call the collect() method.
> > There are some functions like max, min, sum, count, average which are
> > already defined in this class. So should I extend this class in my class or
> > not? Also, I'll have to add more statistics other than max, min, sum; for that
> > I have to override the accept() function, which will be used for streams.
> >
> > Warm Regards,
> > --
> > *Virendra Singh Rajpurohit*
> >
> > *University of Petroleum and Energy Studies,Dehradun*
> > Linkedin:https://www.linkedin.com/in/virendra-singh-rajpurohit

Re: [geometry] - GEOMETRY-54

2019-06-02 Thread Gilles Sadowski

Hello.

On Sun, 2 Jun 2019 at 10:00, Karl Heinz Marbaise wrote:
>
> Hi,
>
> On 02.06.19 00:06, Alex Herbert wrote:
> >
> >> On 1 Jun 2019, at 22:49, Gilles Sadowski  wrote:
> >>
> >> Hello.
> >>
> >> On Sat, 1 Jun 2019 at 21:56, Karl Heinz Marbaise wrote:
> >>>
> >>> Hi,
> >>> I've created a branch[1] which fixes some checkstyle reported issues.
> >>> The resulting build[2] shows the issues have been fixed.
> >>>
> >>> Can someone take a look at that...
> >>>
> >>> If there are no objections would it be Ok to merge that to master?
> >>
> >> I'm against "throws" clauses for unchecked exception; as per J. Bloch:
> >> "[...] do not provide throws clauses for unchecked exceptions".[1]
> >
> > I recently upgraded checkstyle in [rng] and [statistics] because it was 
> > using old configuration that did not utilise recent checkstyle features to 
> > enforce the coding style. Geometry was based on the same checkstyle and it 
> > should really be upgraded.
> >
> > However I did not upgrade [geometry] or [numbers] as a quick check showed 
> > there were a lot of problems [1] (and I did not have time). Even just 
> > upgrading checkstyle to 8.20 and keeping the same config finds additional 
> > problems as the checking is better.
> >
> > Would you consider incorporating an update to checkstyle in this branch or 
> > another PR? Most of the work is trivial and should not take long. The main 
> > source of problems are the tests which could be ignored during checks.
>
> I think it makes sense to make  separate JIRA + Branch and upgrade the
> configuration as in statistics...first (also in geometry would make sense).
>
> afterwards I can reconsider GEOMETRY-54 ...
>
> I would not ignore tests cause tests should check production code so it
> should have the same quality as the production code if not even better

Ideally yes, but their main purpose is to help ensure correctness
of the main code, in the same way that CheckStyle helps too, by
making the code more readable through consistency.
So, enforcing CheckStyle on the test code is a lower priority (wrt,
say, getting to a releasable state for the main code...).

We could perhaps have a maven profile that selects whether the
tests should be scanned or not.  And open a general JIRA issue
for the task for gradually upgrading all the test sources.
You are of course welcome to tackle that task. :-)

Please note that it would also mean porting the tests to JUnit 5
(as per another recent thread on this list).

Regards,
Gilles

>
> Kind regards
> Karl Heinz Marbaise
>
> >
> > Alex
> >
> >
> > [1] 
> > http://mail-archives.apache.org/mod_mbox/commons-dev/201905.mbox/%3C43eb34dc-ebdc-e0d8-c943-a35bc642d4ca%40gmail.com%3E
> >
> >>
> >> Regards,
> >> Gilles
> >>
> >> [1] 
> >> https://books.google.be/books?id=ka2VUBqHiWkC=PA253=PA253=effective+java+bloch+throws+clause+unchecked=bl=y_HoMgr2Q0=ACfU3U2ffB7Nq_sS4VAFz0vVACe8fPT8WA=fr=X=2ahUKEwiM0q-QncniAhUIY1AKHZm0CY4Q6AEwDHoECAkQAQ#v=onepage=effective%20java%20bloch%20throws%20clause%20unchecked=false
> >>
> >>>
> >>> Kind regards
> >>> Karl Heinz Marbaise
> >>>
> >>> [1]:
> >>> https://gitbox.apache.org/repos/asf?p=commons-geometry.git;a=commitdiff;h=6bfaf0653730bc8edc701b4e34f24d04adbaa78a
> >>> [2]: https://travis-ci.org/apache/commons-geometry/builds/540087047
> >>>

Re: [geometry] - GEOMETRY-54

2019-06-01 Thread Gilles Sadowski

Hello.

On Sat, 1 Jun 2019 at 21:56, Karl Heinz Marbaise wrote:
>
> Hi,
> I've created a branch[1] which fixes some checkstyle reported issues.
> The resulting build[2] shows the issues have been fixed.
>
> Can someone take a look at that...
>
> If there are no objections would it be Ok to merge that to master?

I'm against "throws" clauses for unchecked exception; as per J. Bloch:
"[...] do not provide throws clauses for unchecked exceptions".[1]
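
To illustrate the convention: the unchecked exception is documented with a Javadoc "@throws" tag but is not listed in the method signature. The class "Norms" and its method below are hypothetical and unrelated to the branch under review; they only show the shape.

```java
/** Hypothetical example: unchecked exception documented in Javadoc only. */
final class Norms {
    private Norms() {}

    /**
     * Scales a value by a strictly positive norm.
     *
     * @param value value to scale
     * @param norm norm to divide by
     * @return {@code value / norm}
     * @throws IllegalArgumentException if {@code norm} is not strictly positive
     */
    static double normalize(double value, double norm) { // no "throws" clause here
        if (!(norm > 0)) { // also rejects NaN
            throw new IllegalArgumentException("norm must be > 0: " + norm);
        }
        return value / norm;
    }
}
```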

Regards,
Gilles

[1] 
https://books.google.be/books?id=ka2VUBqHiWkC=PA253=PA253=effective+java+bloch+throws+clause+unchecked=bl=y_HoMgr2Q0=ACfU3U2ffB7Nq_sS4VAFz0vVACe8fPT8WA=fr=X=2ahUKEwiM0q-QncniAhUIY1AKHZm0CY4Q6AEwDHoECAkQAQ#v=onepage=effective%20java%20bloch%20throws%20clause%20unchecked=false

>
> Kind regards
> Karl Heinz Marbaise
>
> [1]:
> https://gitbox.apache.org/repos/asf?p=commons-geometry.git;a=commitdiff;h=6bfaf0653730bc8edc701b4e34f24d04adbaa78a
> [2]: https://travis-ci.org/apache/commons-geometry/builds/540087047
>
>

Re: [numbers] - Contributions to Commons Numbers

2019-05-31 Thread Gilles Sadowski

Hello.

>> [...]
> >>
> >> 1. The documentation[1] states that every Apache committer has write
> >> access to the commons projects.
> >
> > In practice, you may still have to ask the PMC chair (Gary) to grant
> > you the access rights of the "commons" team.
>
> That is just fine...should I send a mail personally to him?

As you wish, I guess.  Usually, he would notice such requests; if not
done by next week, it might help. ;-)

Regards,
Gilles

> [...]


[Geometry] Build aborted on Jenkins

2019-05-31 Thread Gilles Sadowski

Hi.

All builds fail:
https://builds.apache.org/view/A-D/view/Commons/job/commons-geometry/

Thanks,
Gilles


Re: [numbers] - Contributions to Commons Numbers

2019-05-31 Thread Gilles Sadowski

Hi Eric

Le ven. 31 mai 2019 à 19:29, Eric Barnhill  a écrit :
>
> This is well worth discussing.
>
> The protocol here could be improved. Where I work, we all write a lot of
> code and we all have write access. We also *always* submit PRs rather than
> push directly, and *always* request review from at least one other person.
> This is because it is always risky to push code that doesn't have other
> eyes on it.
>
> So whether you get/have write access or not, I think the protocol should
> always be PRs. That is common practice in industry. We could all make more
> use of the "request review" portion of the PR interface. For numbers, this
> might entail requesting review from Gilles and one peer. To clarify, this
> is only my suggestion and others may disagree.

Partly (depending on the contents of the change).
And if we think we missed something important, then "git revert" should
be fine to restart the discussion.

> Speaking to Fraction specifically where you have been contributing. First
> of all thank you for your contributions there. I just about finished my
> contributions to that module, but have been using my "Apache time" to
> mentor the GSoC coders, and have not had time to consider the recent
> suggestions. Please feel free to finish it and add your name as a
> contributor. If you do I would prefer that you submit a PR and request
> Gilles and myself for review.

The "fraction-dev" branch is fairly old (it no longer builds on Travis);
what about merging it into "master" (after a "rebase", I guess)?
Then we can ask for PRs against "master" for the recent suggestions.

Best,
Gilles

>
>
> On Fri, May 31, 2019 at 10:12 AM Karl Heinz Marbaise 
> wrote:
>
> > Hi to all,
> >
> > I have contributed some PR#s (via GitHub) to the commons-numbers
> > project...(They have been accepted and merged ;-))
> >
> > I have some questions:
> >
> > 1. The documentation[1] states that every Apache committer has write
> > access to the commons projects.
> >
> > So I could change to use gitbox directly via branch instead of GitHub PR's.
> >
> > The question is: What is the preferred way to contribute to the projects?
> >
> >   - via GitHub PR
> >   - via Branch GitBox ?
> >
> >
> > 2. I have already access to JIRA but unfortunately I can't assign JIRA
> > issue to myself ?
> >
> > Is this intentional or is this an issue?
> >
> >
> > Kind regards
> > Karl Heinz Marbaise
> >
> >
> > [1]: https://commons.apache.org/
> >
> >


Re: [numbers] - Contributions to Commons Numbers

2019-05-31 Thread Gilles Sadowski

Hi.

Le ven. 31 mai 2019 à 19:12, Karl Heinz Marbaise  a écrit :
>
> Hi to all,
>
> I have contributed some PR#s (via GitHub) to the commons-numbers
> project...(They have been accepted and merged ;-))
>
> I have some questions:
>
> 1. The documentation[1] states that every Apache committer has write
> access to the commons projects.

In practice, you may still have to ask the PMC chair (Gary) to grant
you the access rights of the "commons" team.

>
> So I could change to use gitbox directly via branch instead of GitHub PR's.
>
> The question is: What is the preferred way to contribute to the projects?
>
>   - via GitHub PR
>   - via Branch GitBox ?

If the change fixes the project config or many obvious little things like
typos, and you have the access rights, a PR is a lot of unnecessary work,
since I'd have to do the merge.

If the change is related to the code itself, the usual practice would be to
file a JIRA report and post to the "dev" ML, asking whether there is any
objection to the change.  With only +1s (or the absence of reaction
after a few days), it's normally safe to commit the change yourself.

>
> 2. I have already access to JIRA but unfortunately I can't assign JIRA
> issue to myself ?
>
> Is this intentional or is this an issue?

That should be fixed with Gary's action mentioned above.

Thanks,
Gilles

>
> Kind regards
> Karl Heinz Marbaise
>
>
> [1]: https://commons.apache.org/


Re: [statistics][descriptive] Classes or static methods for common descriptive statistics?

2019-05-29 Thread Gilles Sadowski

Hello.

Le mar. 28 mai 2019 à 20:36, Alex Herbert  a écrit :
>
>
>
> > On 28 May 2019, at 18:09, Eric Barnhill  wrote:
> >
> > The previous commons-math interface for descriptive statistics used a
> > paradigm of constructing classes for various statistical functions and
> > calling evaluate(). Example
> >
> > Mean mean = new Mean();
> > double mn = mean.evaluate(double[])
> >
> > I wrote this type of code all through grad school and always found it
> > unnecessarily bulky.  To me these summary statistics are classic use cases
> > for static methods:
> >
> > double mean = Mean.evaluate(double[])
> >
> > I don't have any particular problem with the evaluate() syntax.
> >
> > I looked over the old Math 4 API to see if there were any benefits to the
> > previous class-oriented approach that we might not want to lose. But I
> > don't think there were, the functionality outside of evaluate() is minimal.
>
> A quick check shows that evaluate comes from UnivariateStatistic. This has 
> some more methods that add little to an instance view of the computation:
>
> double evaluate(double[] values) throws MathIllegalArgumentException;
> double evaluate(double[] values, int begin, int length) throws 
> MathIllegalArgumentException;
> UnivariateStatistic copy();
>
> However it is extended by StorelessUnivariateStatistic which adds methods to 
> update the statistic:
>
> void increment(double d);
> void incrementAll(double[] values) throws MathIllegalArgumentException;
> void incrementAll(double[] values, int start, int length) throws 
> MathIllegalArgumentException;
> double getResult();
> long getN();
> void clear();
> StorelessUnivariateStatistic copy();
>
> This type of functionality would be lost by static methods.
>
> If you are moving to a functional interface type pattern for each statistic 
> then you will lose the other functionality possible with an instance state, 
> namely updating with more values or combining instances.
>
> So this is a question of whether updating a statistic is required after the 
> first computation.
>
> Will there be an alternative in the library for a map-reduce type operation 
> using instances that can be combined using Stream.collect:
>
> <R> R collect(Supplier<R> supplier,
>   ObjDoubleConsumer<R> accumulator,
>   BiConsumer<R, R> combiner);
>
> Here <R> would be Mean:
>
> double mean = Arrays.stream(new double[1000]).collect(Mean::new, Mean::add, 
> Mean::add).getMean() with:
>
> void add(double);
> void add(Mean);
> double getMean();
>
> (Untested code)
>
> >
> > Finally we should consider whether we really need a separate class for each
> > statistic at all. Do we want to call:
> >
> > Mean.evaluate()
> >
> > or
> >
> > SummaryStats.mean()
> >
> > or maybe
> >
> > Stats.mean() ?
> >
> > The last being nice and compact.
> >
> > Let's make a decision so our esteemed mentee Virendra knows in what
> > direction to take his work this summer. :)
>

I'm not sure I understand the implicit conclusions of this conversation
and the other one there:
https://markmail.org/message/7dmyhzuy6lublyb5

Do we agree that the core issue is *not* how to compute a mean, or a
median, or a fourth moment, but how any and all of those can be
computed seamlessly through a functional API (stream)?

As Alex pointed out, a useful functionality is the ability to "combine"
instances, e.g. if data are collected by several threads.
A potential use-case is the retrieval of the current value of (any)
statistical quantities while the data continues to be collected.
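
To make the "combine" idea concrete, here is a small sketch along the lines of what Alex outlined. The "Mean" class below is hypothetical (it is not an existing Commons class) and uses a plain running sum rather than a numerically careful algorithm; only "DoubleStream.collect" is a real JDK call.

```java
import java.util.Arrays;

/** Hypothetical combinable accumulator; not an existing Commons class. */
final class Mean {
    private long n;
    private double sum;

    /** Adds one value. */
    void add(double value) {
        n++;
        sum += value;
    }

    /** Merges the state of another instance (e.g. one filled by another thread). */
    void add(Mean other) {
        n += other.n;
        sum += other.sum;
    }

    /** @return the mean of the values added so far, or NaN if there are none. */
    double getMean() {
        return n == 0 ? Double.NaN : sum / n;
    }
}

final class MeanDemo {
    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4};
        // Supplier, accumulator and combiner, as required by DoubleStream.collect.
        double mean = Arrays.stream(data)
            .collect(Mean::new, Mean::add, Mean::add)
            .getMean();
        System.out.println(mean);
    }
}
```

Because the combiner only needs two instances, the same "add(Mean)" method could also merge partial results gathered by separate threads, which is the use-case mentioned above.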

An initial idea would be:
public interface StatQuantity {
    public double value(double[] data); // For "basic" usage.
    public double value(DoubleStream data); // For "advanced" usage.
}

public class StatCollection {
    /** Specify which quantities this collection will hold/compute. */
    // Type parameters assumed (name -> quantity).
    public StatCollection(Map<String, StatQuantity> stats) { /*... */ }

    /**
     * Start a worker thread.
     * @param data Values for which the stat quantities must be computed.
     */
    public void startCollector(DoubleStream data) { /* ... */ }

    /** Combine current state of workers. */
    public void collect() { /* ... */ }

    /** @return the current (combined) value of a named quantity. */
    public double get(String name) { /* ... */ }

    private class StatCollector implements Callable {
        StatCollector(DoubleStream data) { /* ... */ }
    }
}

This is all totally untested, very partial, and probably wrong-headed but
I thought that we were looking at this kind of refactoring.

Regards,
Gilles


Re: [rng] Upgrade examples to Java 8

2019-05-26 Thread Gilles Sadowski

Le dim. 26 mai 2019 à 19:28, Alex Herbert  a écrit :
>
> Currently the examples projects use:
>
> Examples-jmh: JDK 1.6
> Examples-quadrature: JDK 1.7
> Examples-stress: JDK 1.8
> Examples-sampling: JDK 1.8
> Examples-jpms: JDK 1.9
>
> Having JMH at level 1.6 prevents testing against algorithms in SecureRandom 
> (newer algorithms in 1.8), ThreadLocalRandom (1.7) and SplittableRandom (1.8).
>
> It would be good to upgrade JMH to 1.8 but I don’t see a reason to not 
> upgrade all to JDK 1.8. These are not part of the RNG distribution.
>
> Alex
>

It was assumed that upgrading sub-modules of "commons-rng-examples"
was allowed if a particular example requires it.
+1 for "jmh"
+0 for "quadrature"

Gilles


Re: Proposal to introduce JUnit 5 in commons-numbers

2019-05-25 Thread Gilles Sadowski

Hi.

Le ven. 24 mai 2019 à 06:01, Eitan Adler  a écrit :
>
> (please make sure to CC me on replies)
>
> +1 on this. One thing I'd like is for us to avoid a mess of different JUnit
> versions, which makes it difficult to know which runner will be executing the
> class. It would be great if we did a complete conversion and not just
> introduced new syntax.

+1

> I've actually done this before on a largish Java
> project from JUnit3/4 to 5. It isn't hard, just a fair amount of mechanical
> code changes.

Patch/PR welcome.

Regards,
Gilles

>
> On Wed, 22 May 2019 at 15:30, Eric Barnhill  wrote:
>
> > +1
> >
> > On Wed, May 22, 2019 at 3:15 PM Gilles Sadowski 
> > wrote:
> >
> > > Hi.
> > >
> > > Le mer. 22 mai 2019 à 18:43, Heinrich Bohne  a
> > > écrit :
> > > >
> > > > Right now, commons-numbers is using JUnit 4.12, the last stable version
> > > > of JUnit 4. As far as I am aware, there is no explicit syntax in JUnit
> > > > 4.12 for testing whether an exception is thrown apart from either using
> > > > the deprecated class ExpectedException or adding the "expected"
> > > > parameter to the Test annotation. The problem with the latter approach
> > > > is that it is impossible to ascertain where exactly in the annotated
> > > > method the exception is thrown – it could be thrown somewhere unexpected
> > > > and the test will still pass. Besides, when testing the same exception
> > > > trigger with multiple different inputs, it is impractical to create a
> > > > separate method for each test case, which would be necessary with both
> > > > aforementioned approaches.
> > > >
> > > > This has led to the creation of constructs where the expected exception
> > > > is swallowed, which has been deemed undesirable
> > > > <https://issues.apache.org/jira/browse/NUMBERS-99?focusedCommentId=16843419=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16843419>.
> > > > Because of this, I propose to add JUnit 5 as a dependency in
> > > > commons-numbers. JUnit 5 has several "assertThrows" methods that would
> > > > solve the described dilemma.
> > >
> > > +1
> > >
> > > Gilles
> > >


Re: [GSoC] Thursday mentee meeting

2019-05-23 Thread Gilles Sadowski

Hello.

Le mer. 22 mai 2019 à 20:51, Eric Barnhill  a écrit :
>
> Let's have another mentee meeting Thursday morning, same time as the
> previous two.

I won't be in front of the keyboard at 5 PM UTC.

Two potential contributors (Aleksander Ściborek and Ellen Kartysheva)
posted (here, on the "dev" ML) proposals that are connected to the GSoC
work discussed in these meetings.
Please engage with them in order to come up with a consensus on the
way(s) forward.

Thanks,
Gilles

> [...]


Re: [geometry] release

2019-05-23 Thread Gilles Sadowski

Hi.

Le jeu. 23 mai 2019 à 15:37, Rob Tompkins  a écrit :
>
>
>
> > On May 23, 2019, at 7:25 AM, Gilles Sadowski  wrote:
> >
> > Hi.
> >
> >> Le mer. 22 mai 2019 à 14:07, Matt Juntunen  a 
> >> écrit :
> >>
> >> Hi Sven,
> >>
> >> Until we can roll out an actual release of numbers and geometry,
> >
> > Any update of the roadmap? :-)
>
> Do we have a reason to not release?

Code is in the middle of being refactored, and contains many
to-be-deleted classes.

Gilles

>
> -Rob
>
> >
> >> [...]


Re: [rng] binary compatibility with final modifier

2019-05-23 Thread Gilles Sadowski

Hello.

Le jeu. 23 mai 2019 à 14:10, Alex Herbert  a écrit :
>
>
> On 23/05/2019 12:53, sebb wrote:
> > Are the classes supposed to be final?
> > Or just the existing constructor(s)?
>
> The two package-private classes are definitely helper classes and should
> be final.
>
> The class with the clirr issue (it is actually an info) only has static
> methods. So currently it is a utility class.
>
> Changing it to have a new role with instance methods would be a design
> update that could be served by introducing a new class. However this
> class has taken the best name.
>
> Any instance role for the class would require that it is typed for
> generics. But a quick try seems to pass clirr.
>
> Gilles, any opinion on a future for ListSampler as:
>
> public class ListSampler<T> {
>
> // Other static stuff (already in the class)...
>
> T sample();
>
> }

Unless I'm missing something, this use-case is covered by
"CollectionSampler".[1]
"ListSampler" is for other use-cases (sublist, in-place shuffle).[2]

Regards,
Gilles

[1] 
http://commons.apache.org/proper/commons-rng/commons-rng-sampling/javadocs/api-1.2/org/apache/commons/rng/sampling/CollectionSampler.html
[2] 
http://commons.apache.org/proper/commons-rng/commons-rng-sampling/javadocs/api-1.2/org/apache/commons/rng/sampling/ListSampler.html

>
> Alex
>
>
> >
> > On Thu, 23 May 2019 at 12:51, Gilles Sadowski  wrote:
> >> Hello.
> >>
> >> Le jeu. 23 mai 2019 à 13:43, Alex Herbert  a 
> >> écrit :
> >>> [rng] has three classes with a private constructor that are not
> >>> currently marked as final. 1 is public and 2 are package private.
> >>>
> >>> If I mark them as final then clirr:check ignores the package private
> >>> ones and produces this warning for the public one:
> >> If it's a "Warning" and not an "Error", I don't think that it could
> >> count as a release blocker.  [Confirmation from PMC members
> >> welcome...]
> >>
> >>> "Added final modifier to class, but class was effectively final anyway"
> >>>
> >>>
> >>> Given the class could not have been extended (due to a private
> >>> constructor) it seems OK to allow the final modifier.
> >> I think so.
> >>
> >>> So can the final modifier be added? Is there a precedent here with
> >>> regard to releases?
> >> Cf. above.
> >>
> >> Gilles
> >>


Re: [rng] binary compatibility with final modifier

2019-05-23 Thread Gilles Sadowski

Hello.

Le jeu. 23 mai 2019 à 13:43, Alex Herbert  a écrit :
>
> [rng] has three classes with a private constructor that are not
> currently marked as final. 1 is public and 2 are package private.
>
> If I mark them as final then clirr:check ignores the package private
> ones and produces this warning for the public one:

If it's a "Warning" and not an "Error", I don't think that it could
count as a release blocker.  [Confirmation from PMC members
welcome...]

>
> "Added final modifier to class, but class was effectively final anyway"
>
>
> Given the class could not have been extended (due to a private
> constructor) it seems OK to allow the final modifier.

I think so.
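
A small sketch of why the modifier changes nothing in practice, assuming a hypothetical utility class ("SamplerUtils" is an illustrative name, not the class under discussion): a top-level class whose only constructor is private cannot be extended from another source file, with or without "final".

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

/** Hypothetical utility class, standing in for the public class discussed here. */
final class SamplerUtils {
    /** Private constructor: no instances, and no subclasses outside this file. */
    private SamplerUtils() {}

    static int twice(int x) {
        return 2 * x;
    }
}

final class FinalCheck {
    /** @return true if every declared constructor of the class is private. */
    static boolean hasOnlyPrivateConstructors(Class<?> c) {
        for (Constructor<?> ctor : c.getDeclaredConstructors()) {
            if (!Modifier.isPrivate(ctor.getModifiers())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(Modifier.isFinal(SamplerUtils.class.getModifiers()));
        System.out.println(hasOnlyPrivateConstructors(SamplerUtils.class));
    }
}
```

A declaration such as "class Sub extends SamplerUtils {}" in another file is rejected by the compiler in both cases, because the subclass constructor cannot call the private one.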

>
> So can the final modifier be added? Is there a precedent here with
> regard to releases?

Cf. above.

Gilles


Re: [statistics] PMD

2019-05-23 Thread Gilles Sadowski

Hi.

Le jeu. 23 mai 2019 à 13:33, Alex Herbert  a écrit :
>
> Having just fixed [rng] for PMD I had a look at why it was fine in
> statistics as some of the issues are present there.
>
> PMD in statistics is old. It uses rules that are deprecated.
>
> I have updated to a similar ruleset to [rng] and fixed all the problems.
> See this PR [1].
>
> One rule violation is this class name:
>
> SaddlePointExpansion
>
> It is a class copied from [math]. I've set the name to use an exclusion.
> However since the code has not been released we could just rename to
>
> SaddlePointExpansionHelper
> SaddlePointExpansionUtils

Sure.  Also, the class is package-private.

Gilles

>
> Alex
>
>
> [1] https://github.com/apache/commons-statistics/pull/13
>


Re: [geometry] release

2019-05-23 Thread Gilles Sadowski

Hi.

Le mer. 22 mai 2019 à 14:07, Matt Juntunen  a écrit :
>
> Hi Sven,
>
> Until we can roll out an actual release of numbers and geometry,

Any update of the roadmap? :-)

> I think your best bet is to fork commons-geometry and all of its SNAPSHOT 
> dependencies, change the groupIds and/or artifactIds in the poms to something 
> custom (to avoid conflicts later on), and then release those directly to your 
> artifactory server. You'll probably need to make other modifications to the 
> poms in order to get this to work. This process will most likely be very 
> painful. If you only use small portions of the geometry code, another option 
> would be to temporarily copy some of the classes from commons-geometry 
> directly into your application code. This would avoid a lot of messing around 
> with release processes and would make the commons-geometry behavior that your 
> application relies on directly visible. I would use this approach, if at all 
> possible, until a real release can be made.
>
> Godspeed.
>
> -Matt
> 
> From: Sven Rathgeber 
> Sent: Wednesday, May 22, 2019 3:23 AM
> To: dev@commons.apache.org
> Subject: [geometry] release
>
> Hi,
>
> I use in one of our applications the current state
> of https://github.com/apache/commons-geometry 
> (c45647f45df7d81819e47ad6bd0d342069fb305d ).
> (... which has a couple of child projects and relies on the current state of 
> commons-numbers -> in sum about 15 jars.)
>
> Now I have to release my application in order to bring it to our test system, 
> but maven
> is not happy about the SNAPSHOT release of commons-geometry.
>
> I tried with a profile to release it to our artifactory server  but that 
> looks like a pretty hard
> way. Currently maven wants me to commit to 'https://gitbox.apache.org' :), 
> which is not exactly what I want.
>
> Do you see a way how I can release my application ?

This could be an approach:
https://maven.apache.org/plugins/maven-shade-plugin/

Regards,
Gilles

>
> Cheers.
> Sven
>
>


Re: [rng] default maven goal

2019-05-23 Thread Gilles Sadowski

Hi.

Le jeu. 23 mai 2019 à 11:49, Alex Herbert  a écrit :
>
> The [rng] pom does not have a default goal. Here are the default goals
> in the projects I currently have checked out:
>
> commons-codec/pom.xml:clean verify apache-rat:check
> clirr:check javadoc:javadoc
>
> commons-collections/pom.xml:clean verify
> apache-rat:check clirr:check javadoc:javadoc
>
> commons-lang/pom.xml:   clean verify apache-rat:check
> clirr:check checkstyle:check spotbugs:check javadoc:javadoc
>
> commons-statistics/pom.xml:clean verify
> apache-rat:check clirr:check checkstyle:check pmd:check spotbugs:check
> javadoc:javadoc
>
> commons-text/pom.xml:clean verify apache-rat:check
> clirr:check checkstyle:check spotbugs:check javadoc:javadoc
>
> These seem to match at least this:
>
> clean verify apache-rat:check clirr:check javadoc:javadoc
>
> Some also run:
>
> checkstyle:check spotbugs:check
>
> The only projects I have without a default goal are:
>
> commons-geometry
> commons-math
> commons-numbers
> commons-rng
>
> I think it would be useful to run all the checks that the project is
> required to pass in travis as the default goal. This can be used by a
> developer as a final check before commit.
>
> Opinions?
>
> Is the defaultGoal a domain of the developer (as it is used above) or
> should it be for an end user, e.g. where I have seen it used for 'mvn
> install'.

No idea; I always specify something to do.

Before a commit, I generally run
$ mvn site site:stage
and have a look at the generated reports (mainly CheckStyle).

Regards,
Gilles

>
> Alex
>
>


Re: Proposal to introduce JUnit 5 in commons-numbers

2019-05-22 Thread Gilles Sadowski

Hi.

Le mer. 22 mai 2019 à 18:43, Heinrich Bohne  a écrit :
>
> Right now, commons-numbers is using JUnit 4.12, the last stable version
> of JUnit 4. As far as I am aware, there is no explicit syntax in JUnit
> 4.12 for testing whether an exception is thrown apart from either using
> the deprecated class ExpectedException or adding the "expected"
> parameter to the Test annotation. The problem with the latter approach
> is that it is impossible to ascertain where exactly in the annotated
> method the exception is thrown – it could be thrown somewhere unexpected
> and the test will still pass. Besides, when testing the same exception
> trigger with multiple different inputs, it is impractical to create a
> separate method for each test case, which would be necessary with both
> aforementioned approaches.
>
> This has led to the creation of constructs where the expected exception
> is swallowed, which has been deemed undesirable
> <https://issues.apache.org/jira/browse/NUMBERS-99?focusedCommentId=16843419=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16843419>.
> Because of this, I propose to add JUnit 5 as a dependency in
> commons-numbers. JUnit 5 has several "assertThrows" methods that would
> solve the described dilemma.

+1
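
To illustrate the point, here is a sketch in plain Java. The "assertThrows" helper below is a minimal stand-in with the same shape as JUnit 5's "Assertions.assertThrows(Class, Executable)", written out so that the example runs without the JUnit dependency; "divide" is a hypothetical method under test.

```java
/** Minimal stand-in for JUnit 5's assertThrows, so that the sketch runs without JUnit. */
final class ThrowsCheck {
    /** Same shape as org.junit.jupiter.api.function.Executable. */
    interface Executable {
        void execute() throws Throwable;
    }

    /** Runs the code and returns the thrown exception if it has the expected type. */
    static <T extends Throwable> T assertThrows(Class<T> expected, Executable code) {
        try {
            code.execute();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);
            }
            throw new AssertionError("Unexpected exception type: " + t, t);
        }
        throw new AssertionError("Expected " + expected.getName() + " but nothing was thrown");
    }

    /** Hypothetical method under test. */
    static int divide(int numerator, int denominator) {
        if (denominator == 0) {
            throw new ArithmeticException("zero denominator");
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        int[] numerators = {1, -7, Integer.MAX_VALUE};
        for (int n : numerators) {
            // Only the lambda body may throw; setup code outside it is not covered.
            ArithmeticException e = assertThrows(ArithmeticException.class, () -> divide(n, 0));
            System.out.println(n + ": " + e.getMessage());
        }
    }
}
```

Several inputs are checked within one test method, and an exception thrown anywhere outside the lambda would make the test fail instead of pass.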

Gilles


Re: [commons-numbers] branch fraction-dev updated (3b21325 -> 92de0b4)

2019-05-22 Thread Gilles Sadowski

Hi Eric.

Will you have a look at NUMBERS-100:
   https://issues.apache.org/jira/browse/NUMBERS-100

I've just thought that it might interfere with your changes in the
"fraction-dev" branch.

Regards,
Gilles


Le mer. 22 mai 2019 à 21:23,  a écrit :
>
> This is an automated email from the ASF dual-hosted git repository.
>
> ericbarnhill pushed a change to branch fraction-dev
> in repository https://gitbox.apache.org/repos/asf/commons-numbers.git.
>
>
> from 3b21325  NUMBERS-97: restoring pow() method, lost in rebase
>  new 092e816  NUMBERS-97: replacing pow method
>  new 97683d5  NUMBERS-97: test for Fraction parse method
>  new 3460841  NUMBERS-97: Added test of parse method in BigFractionTest, 
> and updated outdated use of RoundingMode
>  new 92de0b4  minor: login credentials test
>
> The 4 revisions listed above as "new" are entirely new to this
> repository and will be described in separate emails.  The revisions
> listed as "add" were already present in the repository and have only
> been added to this reference.
>
>
> Summary of changes:
>  .../commons/numbers/fraction/BigFraction.java  | 27 +++
>  .../commons/numbers/fraction/BigFractionTest.java  | 30 
> --
>  .../commons/numbers/fraction/FractionTest.java |  4 +--
>  3 files changed, 57 insertions(+), 4 deletions(-)
>


Re: [rng] Suppress PMD violations

2019-05-22 Thread Gilles Sadowski

Hi Alex.

PR looks fine.
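
For reference, the first two suppression options listed in the quoted message below look roughly like this in code. The class, the methods and the choice of rule are only illustrative; they are not taken from the PR.

```java
/** Illustrative class; not part of commons-rng. */
final class PmdSuppressionExample {
    /** Option 1: annotation, here limited to a single rule. */
    @SuppressWarnings("PMD.OnlyOneReturn")
    static int sign(int x) {
        if (x < 0) {
            return -1;
        }
        if (x > 0) {
            return 1;
        }
        return 0;
    }

    /** Option 2: a "NOPMD" marker comment at the end of the offending line. */
    static int abs(int x) {
        if (x < 0) {
            return -x; // NOPMD - early exit kept for readability
        }
        return x;
    }
}
```

Both forms keep the suppression next to the code it applies to, which is precisely the clutter that the configuration-file approach avoids.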

Thanks,
Gilles


Le mer. 22 mai 2019 à 17:09, Alex Herbert  a écrit :
>
> I'm trying to get pmd:check to be useful.
>
> This means fixing all the PMD violations. To fix some would be a
> refactor of reference algorithms which I do not want to start doing. So
> I've opted for the easier fix of increasing the allowed complexity.
>
> Some PMD checks I had to disable were:
>
> - AccessorMethodGeneration
>
> This allows internal private classes to access private methods of the
> outer class and vice versa. It could instead be fixed by changing to
> package-private methods where appropriate.
>
> - OnlyOneReturn
>
> There are many code examples of fast exit from methods with multiple
> return statements.
>
> - BeanMembersShouldSerialize
>
> I do not think we intend to have the classes as Serializable.
>
> - DataflowAnomalyAnalysis
>
> This rule is not very reliable [1]. It does not like a lot of the
> algorithms in the code that are established.
>
>
> For one violation in sampling it can either be suppressed, or fixed by
> promoting a private class constructor to package-private. I think that
> the promotion to a package private constructor is OK. It is for this class:
>
> LargeMeanPoissonSampler.LargeMeanPoissonSamplerState
>
> This class is used by the LargeMeanPoissonSamplerCache and is already
> package-private. So making the constructor package private seems reasonable.
>
>
> The options to suppress violations [2] are:
>
> 1. Use annotations
>
> 2. Use // NOPMD comment at the end of the offending line
>
> 3. Add suppression to the pmd configuration xml.
>
> So not wanting to litter the code with comments and annotations I have
> updated the PMD xml to exclude certain checks.
>
> There does not appear to be a separate PMD exclusions file in the manner
> of spotbugs. The exclusions performed at the configuration file use
> regular expressions so can be configured. But it requires XPath and the
> syntax for regular expressions doesn't work with examples I have tried.
> I have fixed it with explicit 'or' statements for matching multiple
> classes. I have not found out how to match a class and a method in the
> same expression. This could be used to narrow the scope of exclusions.
>
>
> All changes are in this PR [3]. Have a look and see if you don't agree
> with the changes required.
>
>
> Alex
>
>
> [1] https://github.com/pmd/pmd/issues/873
>
> [2] https://pmd.github.io/latest/pmd_userdocs_suppressing_warnings.html
>
> [3] https://github.com/apache/commons-rng/pull/45
>
>
>
>


Re: [Statistics] BigDecimalStatistics proposition

2019-05-21 Thread Gilles Sadowski

Hello.

Le mar. 21 mai 2019 à 00:20, Aleksander Ściborek
 a écrit :
>
> Hi
>
> >Hence my question: Do you have use-cases? (Your own, or reference
> >to other libraries that use types similar to "BigDecimal" in the way you
> >propose to be implemented here.)
>
> My idea is to use that API in a similar way:
>
> https://pastebin.com/HGcKwv3V

This is a usage example.
What I meant is a "real-life" application.

>
> >Yes, but that "package" does not exist yet; hence we (you) have to
> >1. suggest a scope for a new maven module,
>
> My general proposition is to provide some new tools for a functional way of
> processing data in order to generate some statistics (right now I think
> mostly about BigDecimal/BigInteger). The scope for now is not big but I
> believe it is quite useful.

This component is a WIP, but there should be some rough roadmap.
The guideline is that at some point, all the functionality in the
  org.apache.commons.math4.stat.descriptive
package (and sub-packages thereof) of "Commons Math" should be
provided by a new
  commons-statistics-descriptive
maven module of "Commons Statistics".

Perhaps your proposed contribution fits that plan, but it is not
obvious to me.  For example, should there be a dedicated
  commons-statistics-descriptive-bigdecimal
module?  Or will the functionality be provided in a generic way?

IMHO, a worthy principle, to be applied for the new components, is to
keep them from becoming a dump of numerous disparate little tools.

Maybe I'm missing what you are getting at.
Hopefully, someone else can comment (Eric?).

Regards,
Gilles

>
> On Fri, 17 May 2019 at 15:37, Gilles Sadowski  wrote:
>
> > Hi.
> >
> > Le ven. 17 mai 2019 à 15:13, Aleksander Ściborek
> >  a écrit :
> > >
> > > Hi,
> > >
> > > >How is the "null" consideration related to the functionality of performing
> > > > an average, and other operations?
> > > >
> > > > I just don't know why they haven't implemented anything like that for
> > > > BigDecimal or BigInteger; it seems especially strange to me because
> > > > BigDecimal is a convenient type for financial calculations.
> >
> > Hence my question: Do you have use-cases? (Your own, or reference
> > to other libraries that use types similar to "BigDecimal" in the way you
> > propose to be implemented here.)
> >
> > >
> > > > Which "package"
> > >
> > > I was referring to your words:
> > >
> > > "I'm wary of dropping this in "Commons Statistics" without a broader view
> > > of the design of a package where it would perhaps fit with similar
> > > functionality"
> >
> > Yes, but that "package" does not exist yet; hence we (you) have to
> > 1. suggest a scope for a new maven module,
> > 2. show how your proposed code fits within the rest (even if it's not
> > implemented yet).
> >
> > >
> > > >How will a user get e.g. the variance too?
> > >
> > > I didn't plan that functionality
> >
> > But is it possible, in your plan?
> >
> > Regards,
> > Gilles
> >


Re: [rng][geometry][statistics][numbers] Updating Checkstyle rules

2019-05-21 Thread Gilles Sadowski

Hi.

Thanks for the CheckStyle update.

I'd say that we enforce it for the "main" parts of the repositories and
be lenient for the existing unit tests (but try to follow convention for
new tests).

Regards,
Gilles


Le mar. 21 mai 2019 à 17:23, Alex Herbert  a écrit :
>
> On 21/05/2019 12:47, Alex Herbert wrote:
> > The checkstyle file for all these projects has a common origin (of
> > [math]?). Checkstyle has advanced since the origin of these checks and
> > there are many more checks that can be added to maintain the current
> > coding style.
> >
> > I've looked at this for RNG starting from a template for the Sun
> > standards [1].
> >
> > The code is maintained to a high standard and the Sun standards need
> > little modification. Here are the changes:
> >
> > - Removed FinalParameters
> >
> > - Removed MagicNumber
> >
> > - Removed InnerAssignment
> >
> > - Change line length to 120
> >
> > - Changed ParameterNumber to only check methods (not constructors)
> >
> > - Changed NoWhitespaceAfter to remove checks for array initialisers
> > allowing the whitespace after the { here:
> >
> > double[] array = new double[] { 1, 2, 3 };
> >
> > (arguably this could be left at the default Sun coding style to have
> > 'new double[] {1, 2, 3}'.)
> >
> > - Changed WhitespaceAround to allow empty constructors (for private
> > utility constructors) and empty types (for marker interfaces)
> >
> > - Changed VisibilityModifier to allow protected fields
> >
> > - Changed OperatorWrap to the current checkstyle config
> >
> >
> > When run over RNG this requires the following changes:
> >
> > Some method javadoc required a missing end period.
> >
> > Line length: Some wrapping to 120 characters is required.
> >
> > Whitespace after: Some updates to change declarations of generics, e.g.
> >
> > Map to Map
> >
> > 3 utility classes should be final. These may be breaking API
> > changes. The classes have private constructors so should not be
> > inherited from by any user. Only one is public; the other two are
> > package private.
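
To illustrate the utility-class point above, here is a minimal sketch of a
class that would satisfy the stricter rules: declared final, with an empty
private constructor, and with whitespace around operators. The class and
method names are hypothetical and are not taken from the RNG code base.

```java
/**
 * Hypothetical utility class (not part of Commons RNG), laid out the way
 * the stricter Checkstyle configuration expects.
 */
final class CountUtilsSketch {
    /** Private constructor: a utility class is never instantiated. */
    private CountUtilsSketch() {}

    /**
     * Counts how often each 1-based value occurs in the samples.
     *
     * @param samples Values in the range [1, size].
     * @param size Number of distinct values.
     * @return the number of occurrences of each value.
     */
    static int[] countOccurrences(int[] samples, int size) {
        final int[] observedCounts = new int[size];
        for (final int s : samples) {
            // Whitespace around the operator, as required by the check.
            observedCounts[s - 1]++;
        }
        return observedCounts;
    }
}
```

Since the constructor is private, no code outside the class can subclass it,
which is why adding "final" is unlikely to break existing users.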
> >
> > 1 TODO comment is as yet undone.
> >
> > Indentation: This picked up a few formatting errors.
> >
> >
> > All the changes are in a new PR [2] so you can view the new additions
> > to the checkstyle file and the changes to the code that must be made.
> >
> > I have just appended the additions to the current checkstyle config.
> > However going forward it may be better to match the order of the
> > reference checkstyle template provided by checkstyle and then add to
> > that for commons specific requirements.
> >
> >
> > Statistics changes:
> >
> > I tried this on statistics. There are 415 errors, 50 errors if the
> > test sources are excluded. Most of these in the source are genuine
> > formatting issues. The only rule that is broken is
> > LocalFinalVariableName which requires variables to be named
> > '^[a-z][a-zA-Z0-9]*$'.
> >
> > I do not agree with naming conventions when the code is implementing
> > an equation so I would either remove this rule or add checkstyle
> > exceptions file for the two methods where the rule is broken.
> >
> > See the modifications for statistics in this PR [3]. I've not fixed
> > all the tests. They are mainly failures due to indentation or
> > whitespace, e.g:
> >
> > observedCounts[s-1]++;
> > vs
> > observedCounts[s - 1]++;
> >
> >
> > Fixing them would be easy but I am out of time for today.
> >
> >
> > [1]
> > https://github.com/checkstyle/checkstyle/blob/master/src/main/resources/sun_checks.xml
> >
> > [2] https://github.com/apache/commons-rng/pull/44
> >
> > [3] https://github.com/apache/commons-statistics/pull/10
> >
> >
> I've looked at [numbers] with the current config. It uses no indentation
> for case statements. This is recommended by Oracle. So I've updated the
> [rng] and [statistics] PRs to reflect this.
>
> I've not got time to fix numbers but the new checkstyle file finds a lot
> of legitimate problems with the current formatting.
>
> Old checks:
>
> [INFO] There are 6 errors reported by Checkstyle 8.20 with
> /home/ah403/git/commons-numbers/commons-numbers-core/../src/main/resources/checkstyle/checkstyle.xml
> ruleset.
> [INFO] There are 15 errors reported by Checkstyle 8.20 with
> /home/ah403/git/commons-numbers/commons-numbers-gamma/../src/main/resources/checkstyle/checkstyle.xml
> ruleset.
> [INF

Re: [Configuration] Optional Includes in Properties files

2019-05-20 Thread Gilles Sadowski

On Mon, 20 May 2019 at 14:51, Gary Gregory wrote:
>
> Hi All:
>
> Right now, if you use an 'include' in a properties file and the included
> file is missing, the rest of the file does not load.

IMHO, it seems like a bug.

If the content is required, failure should occur because of that
(later, according to code logic), not because the file is missing.

> I'd like to add an 'includesoptional' where nothing happens if the file is
> missing.
>
> Any objections or thoughts on a better name?

includeifexist
(?)

Gilles

>
> Gary
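
For illustration only, the behaviour under discussion could be sketched as
follows. This is not the Commons Configuration implementation: the class is
hypothetical, file lookup is modelled by a map (file name to content) so that
the sketch is self-contained, and 'includesoptional' is just the name proposed
above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical loader; not the Commons Configuration code. */
final class IncludeSketch {
    private IncludeSketch() {}

    /**
     * Loads "key = value" lines from the named file.
     *
     * @param name File to load.
     * @param files Stand-in for the file system (name -> content).
     * @return the properties, in order of appearance.
     * @throws IllegalArgumentException if a required file is missing.
     */
    static Map<String, String> load(String name, Map<String, String> files) {
        final String content = files.get(name);
        if (content == null) {
            throw new IllegalArgumentException("Missing file: " + name);
        }
        final Map<String, String> result = new LinkedHashMap<>();
        for (final String line : content.split("\n")) {
            final int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // Blank or malformed line: ignored in this sketch.
            }
            final String key = line.substring(0, eq).trim();
            final String value = line.substring(eq + 1).trim();
            if (key.equals("include")) {
                // Required include: a missing file is an error.
                result.putAll(load(value, files));
            } else if (key.equals("includesoptional")) {
                // Optional include: a missing file is silently skipped.
                if (files.containsKey(value)) {
                    result.putAll(load(value, files));
                }
            } else {
                result.put(key, value);
            }
        }
        return result;
    }
}
```

With this split, the properties that follow an optional include are still
loaded when the included file does not exist.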


Re: [gsoc][statistics] Enable Travis

2019-05-18 Thread Gilles Sadowski

Hi.

> >
> >> On 17 May 2019, at 23:24, Gilles Sadowski wrote:
> >>
> >> Hi.
> >>
> >> On Fri, 17 May 2019 at 18:33, Alex Herbert wrote:
> >>>
> >>> There is no .travis.yml file for Statistics and so PRs from GSoC mentees
> >>> are without checks.
> >>>
> >>> Plus I do not know how to check if the Travis integration is active for
> >>> this project.
> >>
> >> I filed a request at INFRA:
> >>  https://issues.apache.org/jira/browse/INFRA-18398
> >
> > OK. I didn’t know that was how it worked.
> >
> > I’ve pushed to my PR and Travis is working.
>
> Oh, no coverage info. Does coveralls require INFRA as well?
>

Seems so.  I filed:
  https://issues.apache.org/jira/browse/INFRA-18399

Gilles


Re: [gsoc][statistics] Enable Travis

2019-05-17 Thread Gilles Sadowski

Hi.

On Fri, 17 May 2019 at 18:33, Alex Herbert wrote:
>
> There is no .travis.yml file for Statistics and so PRs from GSoC mentees
> are without checks.
>
> Plus I do not know how to check if the Travis integration is active for
> this project.

I filed a request at INFRA:
  https://issues.apache.org/jira/browse/INFRA-18398

Gilles


Re: [commons-rng] 02/03: Fixed typo and updated deprecated generator.

2019-05-17 Thread Gilles Sadowski

Hello.

On Fri, 17 May 2019 at 16:10, Alex Herbert wrote:
>
>
> On 17/05/2019 14:41, Gilles Sadowski wrote:
> > Hi Alex.
> >
> > On Fri, 17 May 2019 at 15:33,  wrote:
> >> This is an automated email from the ASF dual-hosted git repository.
> >>
> >> aherbert pushed a commit to branch master
> >> in repository https://gitbox.apache.org/repos/asf/commons-rng.git
> >>
> >> commit 5cf993a22bcedfe8c66dd9ae6536c1ee2db146ea
> >> Author: aherbert 
> >> AuthorDate: Fri May 17 14:31:40 2019 +0100
> >>
> >>  Fixed typo and updated deprecated generator.
> >> ---
> >>   .../commons/rng/examples/sampling/UniformSamplingVisualCheck.java | 4 ++--
> >>   1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/commons-rng-examples/examples-sampling/src/main/java/org/apache/commons/rng/examples/sampling/UniformSamplingVisualCheck.java b/commons-rng-examples/examples-sampling/src/main/java/org/apache/commons/rng/examples/sampling/UniformSamplingVisualCheck.java
> >> index 8875934..a912791 100644
> >> --- a/commons-rng-examples/examples-sampling/src/main/java/org/apache/commons/rng/examples/sampling/UniformSamplingVisualCheck.java
> >> +++ b/commons-rng-examples/examples-sampling/src/main/java/org/apache/commons/rng/examples/sampling/UniformSamplingVisualCheck.java
> >> @@ -26,14 +26,14 @@ import org.apache.commons.rng.sampling.distribution.ContinuousSampler;
> >>
> >>   /**
> >>* Creates 2D plot of sampling output.
> >> - * It is a "manual" check that could help ensure that no artefacts
> >> + * It is a "manual" check that could help ensure that no artifacts
> > That was not a typo. ;-)
>
> Apparently 'Artefact' is the British spelling and 'Artifact' is the US
> spelling [1].

Indeed, I use British spelling. :-)

> So I do not know why my British configured IDE picked the
> original out as wrong.
>
> Anyway does commons use US spelling?

I don't know that there is such a rule.
[Why did the Americans change the spelling? ;-)]

> If so it should stay updated,
> otherwise I can revert.

Either is fine, I guess.

Regards,
Gilles

>
> [1] https://en.oxforddictionaries.com/definition/artefact
>
> >
> >>* exist in some tiny region of the expected range, due to loss of
> >>* accuracy, e.g. when porting C code based on 32-bits "float" to
> >>* "Commons RNG" that uses Java "double" (64-bits).
> >>*/
> >>   public class UniformSamplingVisualCheck {
> >>   /** RNG. */
> >> -private final UniformRandomProvider rng = RandomSource.create(RandomSource.XOR_SHIFT_1024_S);
> >> +private final UniformRandomProvider rng = RandomSource.create(RandomSource.XOR_SHIFT_1024_S_PHI);
> >>   /** Samplers. */
> >>   private final ContinuousSampler[] samplers = new ContinuousSampler[] {
> >>   new ZigguratNormalizedGaussianSampler(rng),
> >>


Re: [commons-rng] 02/03: Fixed typo and updated deprecated generator.

2019-05-17 Thread Gilles Sadowski

Hi Alex.

On Fri, 17 May 2019 at 15:33,  wrote:
>
> This is an automated email from the ASF dual-hosted git repository.
>
> aherbert pushed a commit to branch master
> in repository https://gitbox.apache.org/repos/asf/commons-rng.git
>
> commit 5cf993a22bcedfe8c66dd9ae6536c1ee2db146ea
> Author: aherbert 
> AuthorDate: Fri May 17 14:31:40 2019 +0100
>
> Fixed typo and updated deprecated generator.
> ---
>  .../commons/rng/examples/sampling/UniformSamplingVisualCheck.java | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/commons-rng-examples/examples-sampling/src/main/java/org/apache/commons/rng/examples/sampling/UniformSamplingVisualCheck.java b/commons-rng-examples/examples-sampling/src/main/java/org/apache/commons/rng/examples/sampling/UniformSamplingVisualCheck.java
> index 8875934..a912791 100644
> --- a/commons-rng-examples/examples-sampling/src/main/java/org/apache/commons/rng/examples/sampling/UniformSamplingVisualCheck.java
> +++ b/commons-rng-examples/examples-sampling/src/main/java/org/apache/commons/rng/examples/sampling/UniformSamplingVisualCheck.java
> @@ -26,14 +26,14 @@ import org.apache.commons.rng.sampling.distribution.ContinuousSampler;
>
>  /**
>   * Creates 2D plot of sampling output.
> - * It is a "manual" check that could help ensure that no artefacts
> + * It is a "manual" check that could help ensure that no artifacts

That was not a typo. ;-)

>   * exist in some tiny region of the expected range, due to loss of
>   * accuracy, e.g. when porting C code based on 32-bits "float" to
>   * "Commons RNG" that uses Java "double" (64-bits).
>   */
>  public class UniformSamplingVisualCheck {
>  /** RNG. */
> -private final UniformRandomProvider rng = RandomSource.create(RandomSource.XOR_SHIFT_1024_S);
> +private final UniformRandomProvider rng = RandomSource.create(RandomSource.XOR_SHIFT_1024_S_PHI);
>  /** Samplers. */
>  private final ContinuousSampler[] samplers = new ContinuousSampler[] {
>  new ZigguratNormalizedGaussianSampler(rng),
>


Re: [Statistics] BigDecimalStatistics proposition

2019-05-17 Thread Gilles Sadowski

Hi.

On Fri, 17 May 2019 at 15:13, Aleksander Ściborek wrote:
>
> Hi,
>
> >How is the "null" consideration related to the functionality of performing
> an average, and other operations?
>
> I just don't know why they haven't implemented anything like that for
> BigDecimal or BigInteger. It seems especially strange to me because
> BigDecimal is a convenient type for financial calculations.

Hence my question: Do you have use-cases? (Your own, or reference
to other libraries that use types similar to "BigDecimal" in the way you
propose to be implemented here.)

>
> > Which "package"
>
> I was referring to your words:
>
> "I'm wary of dropping this in "Commons Statistics" without a broader view
> of the design of a package where it would perhaps fit with similar
> functionality"

Yes, but that "package" does not exist yet; hence we (you) have to
1. suggest a scope for a new Maven module,
2. show how your proposed code fits within the rest (even if it's not
implemented yet).

>
> >How will a user get e.g. the variance too?
>
> I didn't plan that functionality

But is it possible, in your plan?

Regards,
Gilles
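
To make the variance question concrete: one way a design could leave room for
it is to accumulate the sum of squares next to the sum, so that the same
accept/combine pattern also yields the variance. The sketch below is only an
assumption about how that might look; the class name and API are hypothetical
and not part of any proposal in this thread.

```java
import java.math.BigDecimal;
import java.math.MathContext;

/** Hypothetical accumulator; name and API are illustrative only. */
final class BigDecimalMomentsSketch {
    private long count;
    private BigDecimal sum = BigDecimal.ZERO;
    private BigDecimal sumOfSquares = BigDecimal.ZERO;

    /** Adds one value (exact arithmetic, no rounding). */
    void accept(BigDecimal value) {
        count++;
        sum = sum.add(value);
        sumOfSquares = sumOfSquares.add(value.multiply(value));
    }

    /** Merges another accumulator (needed for parallel streams). */
    void combine(BigDecimalMomentsSketch other) {
        count += other.count;
        sum = sum.add(other.sum);
        sumOfSquares = sumOfSquares.add(other.sumOfSquares);
    }

    /** Mean, rounded according to the given context. */
    BigDecimal mean(MathContext mc) {
        checkNotEmpty();
        return sum.divide(BigDecimal.valueOf(count), mc);
    }

    /** Population variance: (n * sum(x^2) - sum(x)^2) / n^2. */
    BigDecimal variance(MathContext mc) {
        checkNotEmpty();
        final BigDecimal n = BigDecimal.valueOf(count);
        final BigDecimal numerator = n.multiply(sumOfSquares).subtract(sum.multiply(sum));
        // Only this final division is rounded.
        return numerator.divide(n.multiply(n), mc);
    }

    private void checkNotEmpty() {
        if (count == 0) {
            throw new IllegalStateException("No values");
        }
    }
}
```

It could be used with "Stream.collect(BigDecimalMomentsSketch::new,
BigDecimalMomentsSketch::accept, BigDecimalMomentsSketch::combine)".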


Re: [Statistics] BigDecimalStatistics proposition

2019-05-17 Thread Gilles Sadowski

Hello.

On Fri, 17 May 2019 at 14:40, Aleksander Ściborek wrote:
>
> Hi, I don't know why there is not such a class in JDK. Maybe because
> BigDecimal and BigInteger are objects not primitives therefore they can
> have null value.

How is the "null" consideration related to the functionality of performing
an average, and other operations?

> I believe that that package

Which "package"?

> should contain BigDecimalStatistics,
> BigIntegerStatistics and tools for downstream collectors in order to
> perform some downstream operations like averagingInt in the JDK.

How will a user get e.g. the variance too?

Regards,
Gilles

> I've already started implementing BigDecimalStatistics on my fork but I
> haven't pushed those changes on my fork
>
>
> On Fri, 17 May 2019 at 14:19, Gilles Sadowski  wrote:
>
> > Hello.
> >
> > On Fri, 17 May 2019 at 12:14, Aleksander Ściborek wrote:
> > >
> > > Hi,
> > > Right now I'm going to mimic IntSummaryStatistics. The original idea had
> > > been to create a BigDecimalAverager just to calculate an average in
> > > functional style, but after I saw IntSummaryStatistics from the JDK I
> > > decided to extend the functionality.
> >
> > I'm wary of dropping this in "Commons Statistics" without a broader view
> > of the design of a package where it would perhaps fit with similar
> > functionality for other number types.
> > Could the class be generic?  If so, what would be the API required to
> > perform the operations?  "Commons Numbers" has some suggestions[1][2]; those
> > can, and should, be adapted to actual use-cases, such as "streams"
> > (preferably before the first release).
> >
> > It would perhaps be helpful to know why there is no "BigDecimalStatistics"
> > in the JDK.
> >
> > Regards,
> > Gilles


Re: [Statistics] BigDecimalStatistics proposition

2019-05-17 Thread Gilles Sadowski

Hello.

On Fri, 17 May 2019 at 12:14, Aleksander Ściborek wrote:
>
> Hi,
> Right now I'm going to mimic IntSummaryStatistics. The original idea had been
> to create a BigDecimalAverager just to calculate an average in functional
> style, but after I saw IntSummaryStatistics from the JDK I decided to extend
> the functionality.

I'm wary of dropping this in "Commons Statistics" without a broader view
of the design of a package where it would perhaps fit with similar functionality
for other number types.
Could the class be generic?  If so, what would be the API required to perform
the operations?  "Commons Numbers" has some suggestions[1][2]; those can,
and should, be adapted to actual use-cases, such as "streams" (preferably
before the first release).

It would perhaps be helpful to know why there is no "BigDecimalStatistics" in
the JDK.

Regards,
Gilles

[1] 
https://gitbox.apache.org/repos/asf?p=commons-numbers.git;a=tree;f=commons-numbers-core/src/main/java/org/apache/commons/numbers/core
[2] 
https://gitbox.apache.org/repos/asf?p=commons-numbers.git;a=blob;f=commons-numbers-field/src/main/java/org/apache/commons/numbers/field/Field.java
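
For reference in this discussion, a class that "only" mimics the JDK's
IntSummaryStatistics for BigDecimal might look like the sketch below. It is
not the proposed contribution (that code has not been posted); the class name
is hypothetical, and passing a MathContext to the average is an assumption
made here because BigDecimal division generally needs a rounding rule.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.function.Consumer;

/** Hypothetical BigDecimal counterpart of IntSummaryStatistics. */
final class BigDecimalSummarySketch implements Consumer<BigDecimal> {
    private long count;
    private BigDecimal sum = BigDecimal.ZERO;
    private BigDecimal min; // null until a value is recorded
    private BigDecimal max; // null until a value is recorded

    /** Records a new value. */
    @Override
    public void accept(BigDecimal value) {
        count++;
        sum = sum.add(value);
        min = (min == null) ? value : min.min(value);
        max = (max == null) ? value : max.max(value);
    }

    /** Merges the state of another instance (used by parallel streams). */
    void combine(BigDecimalSummarySketch other) {
        count += other.count;
        sum = sum.add(other.sum);
        if (other.min != null) {
            min = (min == null) ? other.min : min.min(other.min);
            max = (max == null) ? other.max : max.max(other.max);
        }
    }

    long getCount() { return count; }

    BigDecimal getSum() { return sum; }

    /** @return the minimum, or null if no value was recorded. */
    BigDecimal getMin() { return min; }

    /** @return the maximum, or null if no value was recorded. */
    BigDecimal getMax() { return max; }

    /** @return the average (zero if no value was recorded). */
    BigDecimal getAverage(MathContext mc) {
        return count == 0 ? BigDecimal.ZERO : sum.divide(BigDecimal.valueOf(count), mc);
    }
}
```

Usage would follow the JDK pattern: "stream.collect(BigDecimalSummarySketch::new,
BigDecimalSummarySketch::accept, BigDecimalSummarySketch::combine)". Whether
such a class could be made generic over the number type is exactly the open
design question.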

>
> Regards, Aleksander
>
> On Fri, 17 May 2019 at 00:24, Gilles Sadowski  wrote:
>
> > Hi.
> >
> > On Thu, 16 May 2019 at 22:45, Aleksander Ściborek wrote:
> > >
> > > Should I create a new Maven commons-statistics submodule for this?
> >
> > [If the current idea is to put the functionality in "Commons Statistics",
> > you should change this thread's "Subject:" line.]
> >
> > Then, yes, there should be a new module.
> >
> > > Besides
> > > the BigDecimalStatistics I'm going to create support for downstream
> > > operators for BigDecimals and maybe BigIntegers.
> >
> > Is the goal to "only" mimic the JDK's "IntSummaryStatistics", or do you
> > have a specific use-case?
> > In the latter case, it will be worth considering how all the functionality
> > in
> > Commons Math's "o.a.c.math4.stat.descriptive" package[1] will be
> > supported.
> >
> > Regards,
> > Gilles
> >
> > [1]
> > https://gitbox.apache.org/repos/asf?p=commons-math.git;a=tree;f=src/main/java/org/apache/commons/math4/stat/descriptive
> >
> >
> >
> > > On Wed, 15 May 2019 at 03:36, Eric Barnhill 
> > wrote:
> > >
> > > > Yes. This sounds great for commons-statistics. Other work in a similar
> > vein
> > > > will be happening this summer by one of our GSOC mentees.
> > > >
> > > > On Tue, May 14, 2019, 15:04 Gary Gregory 
> > wrote:
> > > >
> > > > > We have a Commons Statistics component that might be a fit.
> > > > >
> > > > > Gary
> > > > >
> > > > > On Tue, May 14, 2019, 17:34 Aleksander Ściborek <
> > > > > aleksanderscibo...@gmail.com> wrote:
> > > > >
> > > > > > > Hi, I've come up with an idea to make it easier to use Stream
> > > > > > > with the BigDecimal class.
> > > > > > > The idea is to create a BigDecimalStatistics class which provides
> > > > > > > a convenient way of calculating max, min, average and sum of
> > > > > > > BigDecimals from a Stream.
> > > > > > > I think that it's very suitable for a commons library.
> > > > > > > Should it be implemented in commons lang or commons math? I
> > > > > > > believe that it's more suitable for commons lang.
> > > > > > This is a link to Jira Ticket : LANG-1459
> > > > > > <https://issues.apache.org/jira/browse/LANG-1459>
> > > > > > Aleksander
> > > > > >
> > > > >
> > > >
> >


Re: [Lang] BigDecimalStatistics proposition

2019-05-16 Thread Gilles Sadowski

Hi.

On Thu, 16 May 2019 at 22:45, Aleksander Ściborek wrote:
>
> Should I create a new Maven commons-statistics submodule for this?

[If the current idea is to put the functionality in "Commons Statistics", you
should change this thread's "Subject:" line.]

Then, yes, there should be a new module.

> Besides
> the BigDecimalStatistics I'm going to create support for downstream
> operators for BigDecimals and maybe BigIntegers.

Is the goal to "only" mimic the JDK's "IntSummaryStatistics", or do you
have a specific use-case?
In the latter case, it will be worth considering how all the functionality in
Commons Math's "o.a.c.math4.stat.descriptive" package[1] will be
supported.

Regards,
Gilles

[1] 
https://gitbox.apache.org/repos/asf?p=commons-math.git;a=tree;f=src/main/java/org/apache/commons/math4/stat/descriptive



> On Wed, 15 May 2019 at 03:36, Eric Barnhill  wrote:
>
> > Yes. This sounds great for commons-statistics. Other work in a similar vein
> > will be happening this summer by one of our GSOC mentees.
> >
> > On Tue, May 14, 2019, 15:04 Gary Gregory  wrote:
> >
> > > We have a Commons Statistics component that might be a fit.
> > >
> > > Gary
> > >
> > > On Tue, May 14, 2019, 17:34 Aleksander Ściborek <
> > > aleksanderscibo...@gmail.com> wrote:
> > >
> > > > > Hi, I've come up with an idea to make it easier to use Stream with
> > > > > the BigDecimal class.
> > > > > The idea is to create a BigDecimalStatistics class which provides a
> > > > > convenient way of calculating max, min, average and sum of
> > > > > BigDecimals from a Stream.
> > > > > I think that it's very suitable for a commons library.
> > > > > Should it be implemented in commons lang or commons math? I believe
> > > > > that it's more suitable for commons lang.
> > > > This is a link to Jira Ticket : LANG-1459
> > > > <https://issues.apache.org/jira/browse/LANG-1459>
> > > > Aleksander
> > > >
> > >
> >


Re: [rng] stress test results

2019-05-16 Thread Gilles Sadowski

Hi.

On Thu, 16 May 2019 at 16:04, Alex Herbert wrote:
>
>
>
> > On 16 May 2019, at 14:42, Gilles Sadowski  wrote:
> >
> > Hello.
> >
> > On Thu, 16 May 2019 at 12:06, Alex Herbert wrote:
> >>
> >> I have run the stress test using the new application. The new application 
> >> has two major changes over the previous application:
> >>
> >> 1. It detects the platform byte-order and sends the bits in the correct 
> >> order to be read by a C application
> >> 2. The bridge to TestU01 has been updated to use all the input int values, 
> >> previously it was using every other int value
> >>
> >> So we can expect differences from both test suites Dieharder and TestU01 
> >> BigCrush.
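
As an aside, the byte-order handling described in point 1 can be sketched
with the standard "java.nio" classes. This is not the code of the stress-test
application; the class and method names are hypothetical, and it only shows
how the same "int" values produce different byte streams depending on the
order in which they are written.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper; not the actual stress-test application code. */
final class ByteOrderSketch {
    private ByteOrderSketch() {}

    /**
     * Serialises every int (4 bytes each, none skipped) in the given order.
     *
     * @param values Output of the generator.
     * @param order Byte order expected by the reader.
     * @return the bytes to send to the test application.
     */
    static byte[] toBytes(int[] values, ByteOrder order) {
        final ByteBuffer buffer = ByteBuffer.allocate(4 * values.length).order(order);
        for (final int v : values) {
            buffer.putInt(v);
        }
        return buffer.array();
    }

    /** Same as above, using the byte order of the platform running the JVM. */
    static byte[] toNativeBytes(int[] values) {
        return toBytes(values, ByteOrder.nativeOrder());
    }
}
```

A C program that reads the stream into 32-bit integers sees the original
values only if the bytes were written in its native order; otherwise each
value arrives byte-swapped.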
> >>
> >> For reference here are the old results (from the user guide, reordered to 
> >> the RandomSource enum order):
> >>
> >> RNG                     Dieharder    TestU01 (BigCrush)
> >> JDK                     11, 12, 13   74, 72, 75
> >> WELL_512_A              0, 0, 0      7, 6, 6
> >> WELL_1024_A             0, 0, 0      4, 4, 5
> >> WELL_19937_A            0, 0, 0      3, 2, 3
> >> WELL_19937_C            0, 1, 0      2, 2, 3
> >> WELL_44497_A            0, 0, 0      2, 3, 3
> >> WELL_44497_B            0, 0, 0      2, 2, 2
> >> MT                      0, 1, 0      3, 2, 2
> >> ISAAC                   0, 0, 1      0, 1, 0
> >> SPLIT_MIX_64            0, 0, 0      2, 0, 0
> >> XOR_SHIFT_1024_S        0, 0, 0      2, 0, 0
> >> TWO_CMRES               1, 1, 1      0, 0, 1
> >> MT_64                   0, 0, 1      3, 2, 3
> >> MWC_256                 0, 0, 0      0, 0, 0
> >> KISS                    0, 0, 0      1, 2, 0
> >>
> >> Here are the new results:
> >>
> >> RNG                     Dieharder    TestU01 (BigCrush)
> >> JDK                     4,4,4,4,4    74,72,74,73,74
> >> WELL_512_A              0,0,0,0,0    7,6,6,6,6
> >> WELL_1024_A             0,0,0,0,0    4,4,5,4,4
> >> WELL_19937_A            0,1,0,0,1    3,3,2,2,2
> >> WELL_19937_C            0,0,0,0,0    2,2,3,2,2
> >> WELL_44497_A            0,0,0,0,0    2,2,2,2,3
> >> WELL_44497_B            0,0,0,0,0    2,3,2,2,2
> >> MT                      0,0,0,0,0    2,3,2,2,2
> >> ISAAC                   0,0,0,0,0    0,1,2,0,0
> >> SPLIT_MIX_64            0,0,0,0,0    1,0,0,0,0
> >> XOR_SHIFT_1024_S        0,0,0,0,0    0,0,0,0,0
> >> TWO_CMRES               2,2,2,2,2    4,3,3,5,4
> >> MT_64                   0,0,0,0,0    2,3,2,2,2
> >> MWC_256                 0,1,0,0,0    0,0,0,2,0
> >> KISS                    0,0,0,0,0    0,0,0,0,0
> >> XOR_SHIFT_1024_S_PHI    0,0,0,0,0    0,0,0,0,0
> >> XO_RO_SHI_RO_64_S       0,0,0,0,0    1,1,2,1,3
> >> XO_RO_SHI_RO_64_SS      0,0,0,0,0    0,0,0,0,0
> >> XO_SHI_RO_128_PLUS      0,0,1,0,0    1,2,2,1,1
> >> XO_SHI_RO_128_SS        0,0,0,1,0    0,1,0,0,0
> >> XO_RO_SHI_RO_128_PLUS   0,0,0,0,0    0,1,0,0,0
> >> XO_RO_SHI_RO_128_SS     0,0,0,0,0    1,0,1,0,0
> >> XO_SHI_RO_256_PLUS      0,1,0,0,0    0,0,0,0,0
> >> XO_SHI_RO_256_SS        0,0,0,0,0    0,1,0,2,1
> >> XO_SHI_RO_512_PLUS      0,0,0,0,1    0,0,0,2,2
> >> XO_SHI_RO_512_SS        0,0,0,0,0    0,1,0,1,0
> >>
> >> (Note: All of the single fails except one under Dieharder are for the 
> >> flawed diehard_sums test. I include it here for direct comparison with old 
> >> results. I would recommend we strip this from the new results for the user 
> >> guide.)
> >>
> >> I ran them 3 times. Then, because the results were different (mainly for
> >> the JDK generator for Dieharder), I double-checked everything and ran
> >> another 2. Results are still the same. Dieharder is much better for the
> >> JDK than previously. It systematically fails:
> >>
> >> diehard_opso:0
> >> diehard_oqso:0
> >> diehard_dna:0
> >> dab_bytedistrib:0
> >>
> >> The TWO_CMRES generator is now worse as it is systematically failing:
> >>
> >> diehard_oqso:0
> >> diehard_dna:0
> >>
> >> The results from BigCrush are similar for JDK and all the others except 
> >> TWO_CMRES. This is now failing a few more tests. It systematically fails:
>

Re: [rng] stress test results

2019-05-16 Thread Gilles Sadowski

DK trial 1
> dh_r_2_3 = Dieharder bit reversed for WELL_512_A trial 3
>
> I propose to:
>
> - Delete all the old results and add these new ones using a new directory 
> structure. All results can reside in a single directory.
> - Ignore for now the bit-reversed results.
> - Delete the old stress test code. The new code supersedes all functionality 
> of the old version.
> - Commit the new ‘results’ command when I have confirmed the APT table is 
> correctly generated.

+1

>
> Questions:
>
> 1. Do we stick to using 3 trials or update to 5 (because I have the results)?

+1

> 2. Do we remove the diehard_sums test result?
>
> I would recommend removing diehard_sums. It pollutes the results for most 
> generators with a spurious fail that should be ignored. So I think we should 
> ignore it.

+0 (as you wish)

Gilles


Re: [GSoC] commons-gsoc Thursday meeting?

2019-05-16 Thread Gilles Sadowski

Hi Eric.

I won't be able to attend (but I've already provided comments on the ML).

Best,
Gilles

On Tue, 14 May 2019 at 18:57, Rob Tompkins wrote:
>
>
> On 5/14/2019 12:47 PM, Eric Barnhill wrote:
> > Should we have another Slack meeting at the same time this Thursday, 5pm
> > UTC (9am California time)?
>
>> [...]


Re: [GSoC][STATISTICS][Regression] Architecture Implementation Suggestions

2019-05-16 Thread Gilles Sadowski

 functionalities, and on the other (in a *different* "maven module"), all the
conversions that may be implemented for the convenience of users.

> while allowing multiple types of regression to be calculated via a universal 
> form….
> which could become a challenge once details are in order.
>
>
>
> So this is the current state of my plan, with your input, I will move to the 
> next steps, plan more details and start creating the software flowchart.
>
>
>
> Thank you in advance for any advice/suggestions,

To summarize, my main suggestion is to split this post in more
manageable chunks.

Regards,
Gilles

>
> -Ben Nguyen
>


Re: [rng] RNG-101 new MarsagliaTsangWang discrete probability sampler

2019-05-11 Thread Gilles Sadowski

On Sat, 11 May 2019 at 23:32, Alex Herbert wrote:
>
>
>
> > On 10 May 2019, at 15:07, Gilles Sadowski  wrote:
> >
> > Hi.
> >
> > On Fri, 10 May 2019 at 15:53, Alex Herbert wrote:
> >>
> >>
> >> On 10/05/2019 14:27, Gilles Sadowski wrote:
> >>> Hi Alex.
> >>>
> >>> On Fri, 10 May 2019 at 13:57, Alex Herbert wrote:
> >>>> Can I get a review of the PR for RNG-101 please.
> >>> Thanks for this work!
> >>>
> >>> I didn't go into the details; however, I see many fields and methods like
> >>>   table1 ... table5
> >>>   fillTable1 ... fillTable5
> >>>   getTable1 ... getTable5
> >>> Wouldn't it be possible to use a 2D table:
> >>>   table[5][];
> >>> so that e.g. only one "fillTable(int tableIndex, /* other args */)" method
> >>> is necessary (where "tableIndex" runs from 0 to 4)?
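
To make the 2D-table suggestion concrete, here is a simplified sketch of the
table layout of the Marsaglia-Tsang-Wang method using "table[5][]" and a
single "fillTable(int tableIndex, ...)" method. It is not the code from the
PR. Assumptions made for brevity: the probabilities are already given as
30-bit fixed-point integers summing to 2^30, and the caller supplies the 30
random bits (e.g. "rng.nextInt() >>> 2"). All names are hypothetical.

```java
import java.util.Arrays;

/** Simplified sketch (base 2^6, five tables); not the RNG-101 code. */
final class TableSamplerSketch {
    /** Number of base-64 digits in a 30-bit probability. */
    private static final int DIGITS = 5;
    /** table[k] holds index i repeated "digit k of p[i]" times. */
    private final int[][] table = new int[DIGITS][];
    /** Exclusive upper bound of the 30-bit range served by each table. */
    private final int[] limit = new int[DIGITS];

    /** @param p Probabilities as 30-bit integers; they must sum to 2^30. */
    TableSamplerSketch(int[] p) {
        long total = 0;
        for (final int pi : p) {
            if (pi < 0) {
                throw new IllegalArgumentException("Negative probability");
            }
            total += pi;
        }
        if (total != 1L << 30) {
            throw new IllegalArgumentException("Probabilities must sum to 2^30");
        }
        int lower = 0;
        for (int k = 0; k < DIGITS; k++) {
            fillTable(k, p);
            lower += table[k].length << shift(k);
            limit[k] = lower;
        }
    }

    /** Bit shift that isolates digit k: 24, 18, 12, 6, 0. */
    private static int shift(int k) {
        return 24 - 6 * k;
    }

    /** Digit k of p; the leading digit is not masked so that p = 2^30 works. */
    private static int digit(int p, int k) {
        return k == 0 ? p >>> 24 : (p >>> shift(k)) & 63;
    }

    /** One method for all tables, selected by the table index. */
    private void fillTable(int k, int[] p) {
        int size = 0;
        for (final int pi : p) {
            size += digit(pi, k);
        }
        table[k] = new int[size];
        int pos = 0;
        for (int i = 0; i < p.length; i++) {
            final int n = digit(p[i], k);
            Arrays.fill(table[k], pos, pos + n, i);
            pos += n;
        }
    }

    /** @param j 30 random bits, i.e. a value in [0, 2^30). */
    int sample(int j) {
        int lower = 0;
        for (int k = 0; k < DIGITS; k++) {
            if (j < limit[k]) {
                return table[k][(j - lower) >>> shift(k)];
            }
            lower = limit[k];
        }
        throw new IllegalArgumentException("Not a 30-bit value: " + j);
    }
}
```

The loop in "sample" costs an extra array access compared with five dedicated
fields, so any speed difference would have to be measured rather than assumed.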
> >>
> >> Yes. The design is based around using 5 tables as per the example code.
> >>
> >> The sample() method knows which table it needs so it can directly jump
> >> to the table in question. I'd have to look at the difference in speed
> >> when using a 2D table as you are adding another array access but
> >> reducing the number of possible method calls (although you still need a
> >> method call). Maybe this will be optimised out by the JVM.
> >>
> >> If the speed is not a factor then I'll rewrite it. Otherwise it's
> >> probably better done for speed as this is the entire point of the
> >> sampler given it disregards any probability under 2^-31 (i.e. it's not a
> >> perfectly fair sampler).
> >>
> >> Note that 5 tables are needed for 5 digits in base 2^6 (30 bits in
> >> total). The paper states that using 3 tables of base 2^10 gives a speed
> >> increase (roughly 1.16x) at the cost of storage (roughly 9x). Changing to
> >> 2 tables of base 2^15 does not make it much faster again.
> >>
> >> I'll have a rethink to see if I can make the design work for different
> >> base sizes.
> >
> > That could be an extension made easier with the 2D table, but
> > I quite agree that given the relatively minor speed improvement
> > to be expected, it is not the main reason; the rationale was just to
> > make the code more compact and a little easier to grasp (IMHO).
> >
> > Gilles
>
> I’ve done a more extensive look at the implications of changing the 
> implementation of the algorithm. This tested using: 1D or 2D tables; 
> interfaced storage to dynamic table types; base 6 or base 10 for the 
> algorithm; and allowing the base to be chosen. Results are in the Jira 
> ticket. Basically 2D arrays are slower and supporting choices for the backing 
> storage or base of the algorithm is slower.
>
> To support the Poisson and Binomial samplers only requires 16-bit storage. So 
> a dedicated sampler using base 6 and short for the tables will be the best 
> compromise between storage space and speed. The base 10 sampler is faster but 
> takes about 9-10x more space in memory.
>
> Note I originally wrote the sampler to use only 16-bit storage. I then 
> modified it to use dynamic storage without measuring performance. And so I 
> made it slightly slower.
>
> The question is does the library even need to have a 32-bit storage 
> implementation? This would be used for a probability distribution with more 
> than 2^16 different possible samples. I think this would be an edge case. 
> Here the memory requirements will be in the tens of MB at a minimum but may 
> balloon to become much larger. In this case a different algorithm such as the 
> Alias method or a guide table is more memory stable as it only requires 12 
> bytes of storage per index, irrespective of the shape of the probability 
> distribution.
>
> If different implementations (of this algorithm) are added to the library 
> then the effect of using a sampler that dynamically chooses the storage space 
> and/or base for the algorithm is noticeable in the performance. In this case 
> these would be better served using a factory:
>
> class DiscreteProbabilitySamplerFactory {
>     DiscreteSampler createDiscreteProbabilitySampler(UniformRandomProvider, double[])
> }
>
> But if specifically targeting this algorithm it could be:
>
> class MarsagliaTsangWangDiscreteProbabilitySamplerFactory {
>     DiscreteSampler createDiscreteProbabilitySampler(UniformRandomProvider, double[], boolean useBase10)
> }
>
> Or someth

Re: [statistics] Mode function for Cauchy distribution

2019-05-11 Thread Gilles Sadowski

Hi.

On Fri, 10 May 2019 at 14:45, Udit Arora wrote:
>
> I am not sure what to say. I completely agree that most distributions have
> undefined statistical values. I don't really have any particular reason for
> adding mode to the interface, like the one mentioned by Sir Alex for mean
> and variance. Please let me know if I should go ahead.

If you don't see a reason, it's reason enough for not doing it. ;-)

Perhaps a more straightforward way to start contributing is to
browse the list of open issues; see e.g. the "Numbers"
project[1].  Help is most needed to progress towards a release,
because "Statistics", and others, depend on it.

Regards,
Gilles

[1] 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20NUMBERS%20AND%20status%20%3D%20Open

>
> On Fri, 10 May 2019, 2:15 am Alex Herbert,  wrote:
>
> >
> >
> > > On 9 May 2019, at 21:17, Eric Barnhill  wrote:
> > >
> > > Awesome!
> > >
> > > On Thu, May 9, 2019 at 10:44 AM Udit Arora 
> > wrote:
> > >
> > >> I will see what I can do. It will take some time, but I will get to know
> > >> more about the other distributions.
> > >>
> > >>
> > >> On Thu, 9 May 2019, 10:58 pm Eric Barnhill, 
> > >> wrote:
> > >>
> > >>> Udit, is it clear what to do here? Gilles recommends you propose some
> > >> edits
> > >>> to ContinuousDistribution instead, to return Mode and Median.
> > >>>
> > >>> But then, if an interface is altered, all the classes that implement
> > that
> > >>> interface need to have these functions added, so we hope you are up for
> > >> all
> > >>> that additional work. We can help you.
> >
> > I think it would be prudent to go through all the distributions and see
> > what is defined for each. Wikipedia has a helper table for all its
> > distributions containing:
> >
> > Mean
> > Median
> > Mode
> > Variance
> > Skewness
> > Ex. kurtosis
> > Entropy
> > Fisher Information
> >
> > If many are undefined then you are adding to an interface something not
> > generally supported.
> >
> > Currently the ContinuousDistribution interface only has the mean and the
> > variance. But note that these are used by the inverse cumulative
> > probability method in the base abstract class. Same goes for the
> > DiscreteDistribution.
> >
> > I am +0 for adding more methods. I don’t see a reason not to. But nor do I
> > see a need (within the library) to have them at the interface level if the
> > mode or median for example are not required in a generic way.
> >
> > >>>
> > >>> Last is the idea of accessor methods. if the method starts with get_()
> > >> then
> > >>> in principle this is just returning a field already present. But with
> > >> that
> > >>> in mind, I don't know why we already have a method name like getMean()
> > in
> > >>> this interface. We don't really know whether for a given distribution,
> > >> that
> > >>> would be a true accessor or need to be calculated. So I think all these
> > >>> method names should just be mean(), mode(), median(), etc.
> > >>>
> > >>> So sorry if this is blowing up into more work than you expected. It
> > often
> > >>> works that way! I certainly think these changes are worthwhile however.
> > >>>
> > >>>
> > >>>
> > >>> On Thu, May 9, 2019 at 7:17 AM Gilles Sadowski 
> > >>> wrote:
> > >>>
> > >>>> Hi Udit.
> > >>>>
> > >>>> On Thu, 9 May 2019 at 12:52, Udit Arora wrote:
> > >>>>>
> > >>>>> I intend to add a mode function for the Cauchy Distribution. It is a
> > >>>> small
> > >>>>> addition which I thought might be helpful.
> > >>>>
> > >>>> How will it be helpful?  I.e. what would an application developer
> > >>>> be able to do, that he can't with the current code?
> > >>>>
> > >>>> You've surely noted that the class you want to modify is but
> > >>>> one of the implementations of the interface "ContinuousDistribution".
> > >>>> So if you propose to change the API, the change should be done
> > >>>> at the interface level, and the appropriat

Re: [rng] RNG-101 new MarsagliaTsangWang discrete probability sampler

2019-05-10 Thread Gilles Sadowski

Hi.

On Fri, 10 May 2019 at 15:53, Alex Herbert wrote:
>
>
> On 10/05/2019 14:27, Gilles Sadowski wrote:
> > Hi Alex.
> >
> > On Fri, 10 May 2019 at 13:57, Alex Herbert wrote:
> >> Can I get a review of the PR for RNG-101 please.
> > Thanks for this work!
> >
> > I didn't go into the details; however, I see many fields and methods like
> >   table1 ... table5
> >   fillTable1 ... fillTable5
> >   getTable1 ... getTable5
> > Wouldn't it be possible to use a 2D table:
> >   table[5][];
> > so that e.g. only one "fillTable(int tableIndex, /* other args */)" method
> > is necessary (where "tableIndex" runs from 0 to 4)?
>
> Yes. The design is based around using 5 tables as per the example code.
>
> The sample() method knows which table it needs so it can directly jump
> to the table in question. I'd have to look at the difference in speed
> when using a 2D table as you are adding another array access but
> reducing the number of possible method calls (although you still need a
> method call). Maybe this will be optimised out by the JVM.
>
> If the speed is not a factor then I'll rewrite it. Otherwise it's
> probably better done for speed as this is the entire point of the
> sampler given it disregards any probability under 2^-31 (i.e. it's not a
> perfectly fair sampler).
>
> Note that 5 tables are needed for 5 digits of base 2^6. The paper
> states that using 3 tables of base 2^10 gives a speed increase
> (roughly 1.16x) at the cost of storage (roughly 9x). Changing to 2
> tables of base 2^15 does not make it much faster again.
>
> I'll have a rethink to see if I can make the design work for different
> base sizes.

That could be an extension made easier with the 2D table, but
I quite agree that given the relatively minor speed improvement
to be expected, it is not the main reason; the rationale was just to
make the code more compact and a little easier to grasp (IMHO).

Gilles

>
> >
> > The diff for "DiscreteSamplersList.java" refers to
> >   MarsagliaTsangWangBinomialSampler
> > but
> >   MarsagliaTsangWangSmallMeanPoissonSampler
> > seems to be missing.
>
> Oops, I missed adding that back. I built the PR from code where I was
> testing lots of implementations.
>
> I've just added it back and it is still passing locally. Travis should
> see that too as I pushed the change to the PR.
>
> >
> > Regards,
> > Gilles
> >
> >> This is a new sampler based on the source code from the paper:
> >>
> >> George Marsaglia, Wai Wan Tsang, Jingbo Wang (2004)
> >> Fast Generation of Discrete Random Variables.
> >> Journal of Statistical Software. Vol. 11, Issue. 3, pp. 1-11.
> >>
> >> https://www.jstatsoft.org/article/view/v011i03
> >>
> >> The code has no explicit licence.
> >>
> >> The paper states:
> >>
> >> "We have provided C versions of the two methods described here, for
> >> inclusion in the “Browse files” section of the journal. ... You may then
> >> want to examine the components of the two files, for illumination or for
> >> extracting portions that might be usefully applied to your discrete
> >> distributions."
> >>
> >> So I am assuming that it can be incorporated with little modification.
> >>
> >> The Java implementation has been rewritten to allow the storage to be
> >> optimised for the required size. The generation of the tables has been
> >> adapted appropriately and checks have been added on the input parameters
> >> to ensure the sampler does not generate exceptions once constructed (I
> >> found out the hard way that the original code was not entirely correct).
> >>
> >> Thanks.
> >>
> >> Alex

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [rng] RNG-101 new MarsagliaTsangWang discrete probability sampler

2019-05-10 Thread Gilles Sadowski

Hi Alex.

On Fri, 10 May 2019 at 13:57, Alex Herbert wrote:
>
> Can I get a review of the PR for RNG-101 please.

Thanks for this work!

I didn't go into the details; however, I see many fields and methods like
  table1 ... table5
  fillTable1 ... fillTable5
  getTable1 ... getTable5
Wouldn't it be possible to use a 2D table:
  table[5][];
so that e.g. only one "fillTable(int tableIndex, /* other args */)" method
is necessary (where "tableIndex" runs from 0 to 4)?
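
The suggestion above could look roughly like the following. This is only a minimal sketch, not the code from the PR: the class name, the "from"/"length"/"value" arguments and the defensive copy in "getTable" are assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical sketch: five tables held in one 2D array instead of table1 ... table5. */
final class TwoDimensionalTablesSketch {
    private static final int NUMBER_OF_TABLES = 5;

    /** table[i] plays the role of the former field "table(i+1)". */
    private final int[][] table = new int[NUMBER_OF_TABLES][];

    TwoDimensionalTablesSketch(int[] tableSizes) {
        if (tableSizes.length != NUMBER_OF_TABLES) {
            throw new IllegalArgumentException("Expected " + NUMBER_OF_TABLES + " sizes");
        }
        for (int i = 0; i < NUMBER_OF_TABLES; i++) {
            table[i] = new int[tableSizes[i]];
        }
    }

    /** Single replacement for fillTable1 ... fillTable5 ("tableIndex" runs from 0 to 4). */
    void fillTable(int tableIndex, int from, int length, int value) {
        Arrays.fill(table[tableIndex], from, from + length, value);
    }

    /** Single replacement for getTable1 ... getTable5; returns a copy. */
    int[] getTable(int tableIndex) {
        return table[tableIndex].clone();
    }
}
```

Whether the extra array access costs anything measurable in "sample()" would have to be checked with a benchmark.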

The diff for "DiscreteSamplersList.java" refers to
   MarsagliaTsangWangBinomialSampler
but
  MarsagliaTsangWangSmallMeanPoissonSampler
seems to be missing.

Regards,
Gilles

> This is a new sampler based on the source code from the paper:
>
> George Marsaglia, Wai Wan Tsang, Jingbo Wang (2004)
> Fast Generation of Discrete Random Variables.
> Journal of Statistical Software. Vol. 11, Issue. 3, pp. 1-11.
>
> https://www.jstatsoft.org/article/view/v011i03
>
> The code has no explicit licence.
>
> The paper states:
>
> "We have provided C versions of the two methods described here, for
> inclusion in the “Browse files” section of the journal. ... You may then
> want to examine the components of the two files, for illumination or for
> extracting portions that might be usefully applied to your discrete
> distributions."
>
> So I am assuming that it can be incorporated with little modification.
>
> The Java implementation has been rewritten to allow the storage to be
> optimised for the required size. The generation of the tables has been
> adapted appropriately and checks have been added on the input parameters
> to ensure the sampler does not generate exceptions once constructed (I
> found out the hard way that the original code was not entirely correct).
>
> Thanks.
>
> Alex


Re: [rng] Copying samplers

2019-05-09 Thread Gilles Sadowski

On Thu, 9 May 2019 at 17:00, Alex Herbert wrote:
>
>
> On 09/05/2019 15:39, Gilles Sadowski wrote:
> > On Thu, 9 May 2019 at 15:41, Alex Herbert wrote:
> >> On Sat, 4 May 2019 at 23:52, Alex Herbert  wrote:
> >>
> >>>
> >>>> On 4 May 2019, at 22:34, Gilles Sadowski  wrote:
> >>>>
> >>>> Hi.
> >>>>
> >>>> On Sat, 4 May 2019 at 21:31, Alex Herbert wrote:
> >>>>>
> >>>>>
> >>>>>> On 4 May 2019, at 14:46, Gilles Sadowski  wrote:
> >>>>>>
> >>>>>> Hello.
> >>>>>>
> >>>>>> On Fri, 3 May 2019 at 16:57, Alex Herbert wrote:
> >>>>>>> Most of the samplers in the library have very small states that are
> >>> easy
> >>>>>>> to compute. Some have computations that are more expensive, such as
> >>> the
> >>>>>>> LargeMeanPoissonSampler or the DiscreteProbabilityCollectionSampler.
> >>>>>>>
> >>>>>>> However once the state is computed the only part of the state that
> >>>>>>> changes is the RNG. I would like to suggest a way to copy samplers as
> >>>>>>> something like:
> >>>>>>>
> >>>>>>> DiscreteSampler newInstance(UniformRandomProvider)
> >>>>>>>
> >>>>>>> The new instance would share all the private state of the first
> >>> sampler
> >>>>>>> except the RNG. This can be used for multi-threaded applications which
> >>>>>>> require a new sampler per thread but sample from the same
> >>> distribution.
> >>>>>>> A particular case in point is the as yet not integrated
> >>>>>>> MarsagliaTsangWangSmallMeanPoissonSampler (see RNG-91 [1]) which has a
> >>>>>>> "large" state [2] that takes a "long" time [3] to compute but is
> >>>>>>> effectively immutable. This could be shared across instances saving
> >>>>>>> memory for parallel application.
> >>>>>>>
> >>>>>>> A copy instance would be almost zero set-up time and provide
> >>> opportunity
> >>>>>>> for caching of commonly used samplers.
> >>>>>> The goal is sharing (immutable) state so it seems that the semantics is
> >>>>>> not "copy".
> >>>>>>
> >>>>>> Isn't it a "factory" that we are after?  E.g. something like:
> >>>>>> public final class CachedSamplingFactory {
> >>>>>>     private static PoissonSamplerCache poisson = new PoissonSamplerCache();
> >>>>>>
> >>>>>>     public PoissonSampler createPoissonSampler(UniformRandomProvider rng,
> >>>>>>                                                double mean) {
> >>>>>>         if (!poisson.isCached(mean)) {
> >>>>>>             // Initialize (requires synchronization) ...
> >>>>>>             poisson.createCache(mean);
> >>>>>>         }
> >>>>>>         // Construct using pre-built state.
> >>>>>>         return new PoissonSampler(poisson.getCache(mean), rng);
> >>>>>>     }
> >>>>>> }
> >>>>>> [It may be overkill, more work, and less performant…]
> >>>>> But you need a factory for every class you want to share state for. And
> >>>>> the factory actually has to look in a cache. If you operate on an instance
> >>>>> then you get what you want: another working version of the same sampler.
> >>>>> It would also be thread safe without synchronisation as long as the state
> >>>>> is immutable. The only mutable state is the passed-in RNG.
> >>>> Agreed.  It was what I meant by the last sentence.
> >>>>
> >>>>>> IIUC, you suggest to add "newInstance" in the "DiscreteSampler"
> >>>>>> interface (?).
> >>>>> I did think of extending DiscreteSampler with this functionality. Not
> >>> adding to the interface as it currently is ‘functional’ as it has only one
> >>> method. I think that should not change. Having thought about it a bit more
> >>> I like the idea of a new functional interface. Perhap

Re: [rng] Utility for creating permutations of hex digits

2019-05-09 Thread Gilles Sadowski

On Thu, 9 May 2019 at 17:07, Alex Herbert wrote:
>
>
> On 09/05/2019 15:46, Gilles Sadowski wrote:
> > On Thu, 9 May 2019 at 14:30, Alex Herbert wrote:
> >>
> >>
> >>> On 9 May 2019, at 12:58, Gilles Sadowski  wrote:
> >>>
> >>> Hi.
> >>>
> >>> On Thu, 9 May 2019 at 13:31, Alex Herbert wrote:
> >>>> The Middle Square Weyl Sequence (MSWS) generator uses an internal Weyl 
> >>>> sequence [1] to create randomness. This is basically a linear increment 
> >>>> added to a sum that will eventually wrap (due to overflow) to restart at 
> >>>> the beginning. The MSWS paper recommends an increment with a high number 
> >>>> of different bits set in a random pattern across the 64-bit of the long. 
> >>>> The paper recommends using a permutation of 8 from the 16 hex digits for 
> >>>> the upper and lower 32-bits.
> >>>>
> >>>> The source code for the MSWS provides a routine that generates a 
> >>>> permutation. Unfortunately:
> >>>>
> >>>> - The code is GPL 3, which restricts its use under the Apache licence
> >>>> (without jumping through some hoops)
> >>>> - The algorithm is a simple rejection method that suffers from a high
> >>>> rejection probability as it approaches 8 digits already chosen
> >>>>
> >>>> I have created an alternative faster implementation for use when seeding 
> >>>> the MSWS generator. However it may be a function to be reused in other 
> >>>> places.
> >>>>
> >>>> The question is where to put this utility function. It requires a source 
> >>>> of randomness to create the permutation. It has the following signature:
> >>>>
> >>>> /**
> >>>> * Creates an {@code int} containing a permutation of 8 hex digits chosen 
> >>>> from 16.
> >>>> *
> >>>> * @param rng Source of randomness.
> >>>> * @return Hex digit permutation.
> >>>> */
> >>>> public static int createIntHexPermutation(UniformRandomProvider rng);
> >>>>
> >>>> Likewise:
> >>>>
> >>>> /**
> >>>> * Creates a {@code long} containing a permutation of 8 hex digits chosen 
> >>>> from 16 in
> >>>> * the upper and lower 32-bits.
> >>>> *
> >>>> * @param rng Source of randomness.
> >>>> * @return Hex digit permutation.
> >>>> */
> >>>> public static long createLongHexPermutation(UniformRandomProvider rng);
> >>>>
> >>>> Options:
> >>>>
> >>>> - Put it as a package private function inside the MSWS generator to be
> >>>> used only when creating this generator. Package private allows unit
> >>>> testing that the algorithm does provide a random 16-choose-8 permutation
> >>>> - Put it as a helper function in org.apache.commons.rng.core.util
> >>> - In "SeedFactory" (?).
> >>>
> >>> For MSWS ("core" module), the increment would be an argument to the
> >>> constructor (allowing the user to shoot himself in the foot, like when
> >>> passing a bad seed), and "RandomSource" ("simple" module) would offer to
> >>> provide an instance for which the increment was computed according to
> >>> the recommendation.
> >>
> >> OK. That makes it easier to build the reference implementation in Core as 
> >> it just matches the C reference code. I can add the seeding function to 
> >> SeedFactory in the Simple module. So if a user passes anything to be used 
> >> as the seed then it passes through unchanged (or converted). But if they 
> >> do not provide a seed then it should be generated appropriately.
> >>
> >> This means I should really get on with updating the RandomSourceInternal 
> >> and ProviderBuilder (RNG 75 [1]). It currently does not support creating 
> >> seeds based on the exact RandomSource. It just uses the native seed type 
> >> of the RandomSource. Here are the current use cases that should be handled:
> >>
> >> - MSWS recommends a seed with a permutation of hex digits.
> >> - XorShiRo family of generators all require seeds with at least some 
> >> non-

Re: [rng] Utility for creating permutations of hex digits

2019-05-09 Thread Gilles Sadowski

On Thu, 9 May 2019 at 14:30, Alex Herbert wrote:
>
>
>
> > On 9 May 2019, at 12:58, Gilles Sadowski  wrote:
> >
> > Hi.
> >
> > On Thu, 9 May 2019 at 13:31, Alex Herbert wrote:
> >>
> >> The Middle Square Weyl Sequence (MSWS) generator uses an internal Weyl 
> >> sequence [1] to create randomness. This is basically a linear increment 
> >> added to a sum that will eventually wrap (due to overflow) to restart at 
> >> the beginning. The MSWS paper recommends an increment with a high number 
> >> of different bits set in a random pattern across the 64-bit of the long. 
> >> The paper recommends using a permutation of 8 from the 16 hex digits for 
> >> the upper and lower 32-bits.
> >>
> >> The source code for the MSWS provides a routine that generates a 
> >> permutation. Unfortunately:
> >>
> >> - The code is GPL 3, which restricts its use under the Apache licence
> >> (without jumping through some hoops)
> >> - The algorithm is a simple rejection method that suffers from a high
> >> rejection probability as it approaches 8 digits already chosen
> >>
> >> I have created an alternative faster implementation for use when seeding 
> >> the MSWS generator. However it may be a function to be reused in other 
> >> places.
> >>
> >> The question is where to put this utility function. It requires a source 
> >> of randomness to create the permutation. It has the following signature:
> >>
> >> /**
> >> * Creates an {@code int} containing a permutation of 8 hex digits chosen 
> >> from 16.
> >> *
> >> * @param rng Source of randomness.
> >> * @return Hex digit permutation.
> >> */
> >> public static int createIntHexPermutation(UniformRandomProvider rng);
> >>
> >> Likewise:
> >>
> >> /**
> >> * Creates a {@code long} containing a permutation of 8 hex digits chosen 
> >> from 16 in
> >> * the upper and lower 32-bits.
> >> *
> >> * @param rng Source of randomness.
> >> * @return Hex digit permutation.
> >> */
> >> public static long createLongHexPermutation(UniformRandomProvider rng);
> >>
> >> Options:
> >>
> >> - Put it as a package private function inside the MSWS generator to be
> >> used only when creating this generator. Package private allows unit
> >> testing that the algorithm does provide a random 16-choose-8 permutation
> >> - Put it as a helper function in org.apache.commons.rng.core.util
> >
> > - In "SeedFactory" (?).
> >
> > For MSWS ("core" module), the increment would be an argument to the
> > constructor (allowing the user to shoot himself in the foot, like when
> > passing a bad seed), and "RandomSource" ("simple" module) would offer to
> > provide an instance for which the increment was computed according to
> > the recommendation.
>
>
> OK. That makes it easier to build the reference implementation in Core as it 
> just matches the C reference code. I can add the seeding function to 
> SeedFactory in the Simple module. So if a user passes anything to be used as 
> the seed then it passes through unchanged (or converted). But if they do not 
> provide a seed then it should be generated appropriately.
>
> This means I should really get on with updating the RandomSourceInternal and 
> ProviderBuilder (RNG 75 [1]). It currently does not support creating seeds 
> based on the exact RandomSource. It just uses the native seed type of the 
> RandomSource. Here are the current use cases that should be handled:
>
> - MSWS recommends a seed with a permutation of hex digits.
> - XorShiRo family of generators all require seeds with at least some non-zero 
> elements.
>
> My idea was to target this part of the ProviderBuilder createSeed method:
>
> if (seed == null) {
> // Create a random seed of the appropriate native type.
>
> if (source.getSeed().equals(Integer.class)) {
> nativeSeed = SeedFactory.createInt();
> } else if (source.getSeed().equals(Long.class)) {
> nativeSeed = SeedFactory.createLong();
>
>
> To change it to:
>
> if (seed == null) {
> // Delegate to the source to create an appropriate seed (since it knows best)
> return source.createSeed();

But IIUC, that would mean that the code for computing the seed
is in "core", not "simple" (where "SeedFactory" is defined).
My
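
The delegation idea quoted above ("let the source create its own seed") can be sketched as an enum whose constants override a default seed-creation method. This is a hypothetical illustration only: "SourceSketch", "BuilderSketch" and the use of java.util.Random as the randomness source are assumptions, not the actual "RandomSourceInternal" or "ProviderBuilder" code, and it does not settle in which module the code should live.

```java
import java.util.Random;

/** Hypothetical stand-in for the internal list of sources. */
enum SourceSketch {
    /** Any 64-bit value is an acceptable seed. */
    GENERIC_LONG,

    /** A source that must not be seeded with all-zero bits. */
    NON_ZERO_LONG {
        @Override
        long createSeed(Random rng) {
            long seed;
            do {
                seed = rng.nextLong();
            } while (seed == 0);
            return seed;
        }
    };

    /** Default behaviour: a random seed of the native type. */
    long createSeed(Random rng) {
        return rng.nextLong();
    }
}

/** Hypothetical stand-in for the builder's seed handling. */
final class BuilderSketch {
    private BuilderSketch() {}

    /** A user-supplied seed passes through unchanged; otherwise the source decides. */
    static long createSeed(SourceSketch source, Long seed, Random rng) {
        return seed != null ? seed : source.createSeed(rng);
    }
}
```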

Re: [rng] Copying samplers

2019-05-09 Thread Gilles Sadowski

On Thu, 9 May 2019 at 15:41, Alex Herbert wrote:
>
> On Sat, 4 May 2019 at 23:52, Alex Herbert  wrote:
>
> >
> >
> > > On 4 May 2019, at 22:34, Gilles Sadowski  wrote:
> > >
> > > Hi.
> > >
> > > On Sat, 4 May 2019 at 21:31, Alex Herbert wrote:
> > >>
> > >>
> > >>
> > >>> On 4 May 2019, at 14:46, Gilles Sadowski  wrote:
> > >>>
> > >>> Hello.
> > >>>
> > >>> On Fri, 3 May 2019 at 16:57, Alex Herbert wrote:
> > >>>>
> > >>>> Most of the samplers in the library have very small states that are
> > easy
> > >>>> to compute. Some have computations that are more expensive, such as
> > the
> > >>>> LargeMeanPoissonSampler or the DiscreteProbabilityCollectionSampler.
> > >>>>
> > >>>> However once the state is computed the only part of the state that
> > >>>> changes is the RNG. I would like to suggest a way to copy samplers as
> > >>>> something like:
> > >>>>
> > >>>> DiscreteSampler newInstance(UniformRandomProvider)
> > >>>>
> > >>>> The new instance would share all the private state of the first
> > sampler
> > >>>> except the RNG. This can be used for multi-threaded applications which
> > >>>> require a new sampler per thread but sample from the same
> > distribution.
> > >>>>
> > >>>> A particular case in point is the as yet not integrated
> > >>>> MarsagliaTsangWangSmallMeanPoissonSampler (see RNG-91 [1]) which has a
> > >>>> "large" state [2] that takes a "long" time [3] to compute but is
> > >>>> effectively immutable. This could be shared across instances saving
> > >>>> memory for parallel application.
> > >>>>
> > >>>> A copy instance would be almost zero set-up time and provide
> > opportunity
> > >>>> for caching of commonly used samplers.
> > >>>
> > >>> The goal is sharing (immutable) state so it seems that the semantics is
> > >>> not "copy".
> > >>>
> > >>> Isn't it a "factory" that we are after?  E.g. something like:
> > >>> public final class CachedSamplingFactory {
> > >>>     private static PoissonSamplerCache poisson = new PoissonSamplerCache();
> > >>>
> > >>>     public PoissonSampler createPoissonSampler(UniformRandomProvider rng,
> > >>>                                                double mean) {
> > >>>         if (!poisson.isCached(mean)) {
> > >>>             // Initialize (requires synchronization) ...
> > >>>             poisson.createCache(mean);
> > >>>         }
> > >>>         // Construct using pre-built state.
> > >>>         return new PoissonSampler(poisson.getCache(mean), rng);
> > >>>     }
> > >>> }
> > >>> [It may be overkill, more work, and less performant…]
> > >>
> > >> But you need a factory for every class you want to share state for. And
> > >> the factory actually has to look in a cache. If you operate on an instance
> > >> then you get what you want: another working version of the same sampler.
> > >> It would also be thread safe without synchronisation as long as the state
> > >> is immutable. The only mutable state is the passed-in RNG.
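
The instance-based approach described in the quoted paragraph could be sketched as below. It is a minimal illustration under stated assumptions: java.util.Random stands in for "UniformRandomProvider", the class name is made up, and a plain cumulative-probability table stands in for the expensive precomputed state of a real sampler.

```java
import java.util.Random;

/** Hypothetical sampler whose precomputed state is shared between instances. */
final class SharedStateSamplerSketch {
    /** Computed once; never modified afterwards, so it can be shared safely. */
    private final double[] cumulative;
    /** The only mutable state: one generator per instance (e.g. per thread). */
    private final Random rng;

    /** "Expensive" construction from the probabilities. */
    SharedStateSamplerSketch(Random rng, double[] probabilities) {
        this.rng = rng;
        cumulative = new double[probabilities.length];
        double sum = 0;
        for (int i = 0; i < probabilities.length; i++) {
            sum += probabilities[i];
            cumulative[i] = sum;
        }
    }

    /** Cheap construction that reuses an already computed table. */
    private SharedStateSamplerSketch(double[] sharedCumulative, Random rng) {
        this.cumulative = sharedCumulative;
        this.rng = rng;
    }

    /** Same distribution, different generator; no recomputation. */
    SharedStateSamplerSketch newInstance(Random otherRng) {
        return new SharedStateSamplerSketch(cumulative, otherRng);
    }

    int sample() {
        final double u = rng.nextDouble() * cumulative[cumulative.length - 1];
        for (int i = 0; i < cumulative.length; i++) {
            if (u < cumulative[i]) {
                return i;
            }
        }
        return cumulative.length - 1;
    }
}
```

Each thread would call "newInstance" with its own generator; no synchronisation is needed because the shared table is only read.
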
> > >
> > > Agreed.  It was what I meant by the last sentence.
> > >
> > >>>
> > >>> IIUC, you suggest to add "newInstance" in the "DiscreteSampler"
> > >>> interface (?).
> > >>
> > >> I did think of extending DiscreteSampler with this functionality. Not
> > adding to the interface as it currently is ‘functional’ as it has only one
> > method. I think that should not change. Having thought about it a bit more
> > I like the idea of a new functional interface. Perhaps:
> > >>
> > >> interface DiscreteSamplerProvider {
> > >>DiscreteSampler create(UniformRandomProvider rng);
> > >> }
> > >>
> > >> Substitute ‘Provider’ for:
> > >>
> > >> - Generator
> > >> - Supplier (possible clash or alignment with Java 8 depending on the
> > way it is done)
> > >> - Factory (though the method is not static so I do not like th

Re: [statistics] Mode function for Cauchy distribution

2019-05-09 Thread Gilles Sadowski

Hi Udit.

On Thu, 9 May 2019 at 12:52, Udit Arora wrote:
>
> I intend to add a mode function for the Cauchy Distribution. It is a small
> addition which I thought might be helpful.

How will it be helpful?  I.e. what would an application developer
be able to do, that he can't with the current code?

You've surely noted that the class you want to modify is but
one of the implementations of the interface "ContinuousDistribution".
So if you propose to change the API, the change should be done
at the interface level, and the appropriate computation performed, or
method overloads defined, for all implementations.

The "accessor" methods refer to fields that were set by the constructor;
e.g. for "CauchyDistribution", "median" and "scale".
In this case, it happens that "mode" has the same value as "median",
but does this warrant an additional method?
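
To illustrate the point, here is a minimal stand-alone sketch (not the library's "CauchyDistribution"; the class name and the package-private methods are assumptions). The Cauchy density 1 / (pi * scale * (1 + ((x - median) / scale)^2)) is largest at x = median, so a caller who needs the mode can already read it from the median accessor.

```java
/** Hypothetical minimal Cauchy distribution: "median" and "scale" set by the constructor. */
final class CauchySketch {
    private final double median;
    private final double scale;

    CauchySketch(double median, double scale) {
        if (scale <= 0) {
            throw new IllegalArgumentException("scale must be positive");
        }
        this.median = median;
        this.scale = scale;
    }

    /** Accessor: returns the field set by the constructor. */
    double getMedian() {
        return median;
    }

    /** Accessor: returns the field set by the constructor. */
    double getScale() {
        return scale;
    }

    /** Probability density; its maximum is reached at x == median. */
    double density(double x) {
        final double z = (x - median) / scale;
        return 1 / (Math.PI * scale * (1 + z * z));
    }
}
```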

Regards,
Gilles

> Thanks


Re: [rng] Utility for creating permutations of hex digits

2019-05-09 Thread Gilles Sadowski

Hi.

On Thu, 9 May 2019 at 13:31, Alex Herbert wrote:
>
> The Middle Square Weyl Sequence (MSWS) generator uses an internal Weyl 
> sequence [1] to create randomness. This is basically a linear increment added 
> to a sum that will eventually wrap (due to overflow) to restart at the 
> beginning. The MSWS paper recommends an increment with a high number of 
> different bits set in a random pattern across the 64-bit of the long. The 
> paper recommends using a permutation of 8 from the 16 hex digits for the 
> upper and lower 32-bits.
>
> The source code for the MSWS provides a routine that generates a permutation. 
> Unfortunately:
>
> - The code is GPL 3, which restricts its use under the Apache licence
> (without jumping through some hoops)
> - The algorithm is a simple rejection method that suffers from a high
> rejection probability as it approaches 8 digits already chosen
>
> I have created an alternative faster implementation for use when seeding the 
> MSWS generator. However it may be a function to be reused in other places.
>
> The question is where to put this utility function. It requires a source of 
> randomness to create the permutation. It has the following signature:
>
> /**
>  * Creates an {@code int} containing a permutation of 8 hex digits chosen 
> from 16.
>  *
>  * @param rng Source of randomness.
>  * @return Hex digit permutation.
>  */
> public static int createIntHexPermutation(UniformRandomProvider rng);
>
> Likewise:
>
> /**
>  * Creates a {@code long} containing a permutation of 8 hex digits chosen 
> from 16 in
>  * the upper and lower 32-bits.
>  *
>  * @param rng Source of randomness.
>  * @return Hex digit permutation.
>  */
> public static long createLongHexPermutation(UniformRandomProvider rng);
>
> Options:
>
> - Put it as a package private function inside the MSWS generator to be used
> only when creating this generator. Package private allows unit testing that
> the algorithm does provide a random 16-choose-8 permutation
> - Put it as a helper function in org.apache.commons.rng.core.util

- In "SeedFactory" (?).

For MSWS ("core" module), the increment would be an argument to the
constructor (allowing the user to shoot himself in the foot, like when
passing a bad seed), and "RandomSource" ("simple" module) would offer to
provide an instance for which the increment was computed according to
the recommendation.
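
For reference, one rejection-free way to build the increment described in the quoted message is a partial Fisher-Yates shuffle of the 16 hex digits. The sketch below is not Alex's implementation (which is not shown in this thread); the class name is made up and java.util.Random stands in for "UniformRandomProvider".

```java
import java.util.Random;

/** Hypothetical sketch of the hex-digit permutation utility. */
final class HexPermutationSketch {
    private HexPermutationSketch() {}

    /** Returns an int whose 8 hex digits are all different (8 chosen from 16). */
    static int createIntHexPermutation(Random rng) {
        final int[] digits = new int[16];
        for (int i = 0; i < 16; i++) {
            digits[i] = i;
        }
        int result = 0;
        // Partial Fisher-Yates shuffle: only the first 8 positions are needed.
        for (int i = 0; i < 8; i++) {
            final int j = i + rng.nextInt(16 - i);
            final int tmp = digits[i];
            digits[i] = digits[j];
            digits[j] = tmp;
            result = (result << 4) | digits[i];
        }
        return result;
    }

    /** Returns a long with an independent permutation in the upper and lower 32 bits. */
    static long createLongHexPermutation(Random rng) {
        final long upper = createIntHexPermutation(rng) & 0xffffffffL;
        final long lower = createIntHexPermutation(rng) & 0xffffffffL;
        return (upper << 32) | lower;
    }
}
```

Every call consumes exactly 8 random integers per 32-bit half, so the cost does not grow as digits are used up.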

Regards,
Gilles

>
> Note that the function is an alternative to that used by the SplittableRandom 
> to create an increment for its own Weyl sequence. That uses a fast method 
> that is prone to weak randomness in potential output.
>
> If other methods will potentially be added to the helper class a more generic 
> name should be used. Possibilities are:
>
> PermutationUtils
> SequenceUtils
> IncrementUtils
> SeedUtils
>
> Given that the method is for seeding Weyl sequences then I am favouring 
> SeedUtils.
>
>
> [1] https://en.wikipedia.org/wiki/Weyl_sequence


Re: [STATISTICS][Regression][Linear Math] Is there any plan/anyone working on a new Linear Math module currently?

2019-05-09 Thread Gilles Sadowski

Hi.

On Wed, 8 May 2019 at 23:59, Eric Barnhill wrote:
>
> It looks to me like the EJML library is the best choice for linear algebra

https://lessthanoptimal.github.io/Java-Matrix-Benchmark/runtime/2019_02_i53570/

> right now, is well supported, and we should not reinvent the wheel

+1

> unless
> we have the motivation and expertise to do so.

Quite unlikely to be done in time for it to be useful to the GSoC assignment.

>
> EJML is under the Apache 2.0 license which I read to mean we can use it in
> any derivative way we please so long as (and this would be true regardless
> if the license requires it IMO) we attribute the source.
>
> So as a default plan I would shade these libraries within the regression
> module,

+1

It may be prudent to delineate an interface between "Commons" and
the linear algebra functionalities providers (cf. list in the above link),
so that we can switch from one to another and analyze the impact of
doing so.
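
As a rough idea of what such an interface could look like, here is a minimal sketch. All names are hypothetical, the operation set is reduced to a single method, and the plain-array implementation is only a placeholder; an adapter backed by EJML or another provider would implement the same interface (not shown, since it depends on that library's API).

```java
/** Hypothetical boundary between "Commons" code and a linear algebra provider. */
interface MatrixOperations {
    /** Returns the matrix product a * b. */
    double[][] multiply(double[][] a, double[][] b);
}

/** Placeholder provider using plain arrays; assumes rectangular, non-empty matrices. */
final class PlainArrayMatrixOperations implements MatrixOperations {
    @Override
    public double[][] multiply(double[][] a, double[][] b) {
        final int rows = a.length;
        final int inner = b.length;
        final int cols = b[0].length;
        if (a[0].length != inner) {
            throw new IllegalArgumentException("Dimension mismatch");
        }
        final double[][] c = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int k = 0; k < inner; k++) {
                final double aik = a[i][k];
                for (int j = 0; j < cols; j++) {
                    c[i][j] += aik * b[k][j];
                }
            }
        }
        return c;
    }
}
```

Code in the regression module would depend only on the interface, so providers can be swapped and compared without touching the callers.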

Regards,
Gilles

> with thanks and attribution to the EJML site and org.
>
>
> On Wed, May 8, 2019 at 2:49 PM Rob Tompkins  wrote:
>
> >
> >
> > > On May 8, 2019, at 4:37 PM, Ben Nguyen  wrote:
> > >
> > > Hello,
> > >
> > > The regression module will require a lot of linear math (specifically
> > > matrix operations), which I’ve heard is outdated. Are there any updates on
> > > its development? Is this someone’s GSoC project? If not I could try to
> > > help by attempting to start porting regression essential operations. But
> > > the dependencies for the current library are vast, so this would end up
> > > being a large endeavor, and I know I am not the one to properly design a
> > > linear math library: I only know the basics, and it would probably become
> > > a mess. So if there is no current development plan, I fear I might have to
> > > start by using the old library for now until linear’s development kicks
> > > in. Is this okay?
> > >
> >
> > I suppose the question is: what is commons-numbers, and if a matrix is a
> > “number” or it is sufficiently different to warrant a separate component.
> >
> > It is worth noting that there have been past arguments over additional
> > math components before we get 1.0 releases for the current ones in flight
> > (but I feel like the fastest route to any component’s 1.0 should take
> > priority).
> >
> > What are other folks’ thoughts here? I would think that linear algebra
> > would likely be a widely used library as it’s fairly fundamental to a
> > collection of machine learning algorithms as they are based in least
> > squares.
> >
> > -Rob
> >
> > > Thank you,
> > > Ben
> >
> >
> >


Re: [commons-rng] branch master updated: Update pom.xml

2019-05-08 Thread Gilles Sadowski

On Thu, 9 May 2019 at 00:25,  wrote:
>
> This is an automated email from the ASF dual-hosted git repository.
>
> aherbert pushed a commit to branch master
> in repository https://gitbox.apache.org/repos/asf/commons-rng.git
>
>
> The following commit(s) were added to refs/heads/master by this push:
>  new 0996b4e  Update pom.xml
>  new 280f3ec  Merge pull request #42 from AbhishekSinghDhadwal/master
> 0996b4e is described below
>
> commit 0996b4e4292411d392e2aa86974323a58029
> Author: Abhishek Singh Dhadwal 
> <39513876+abhisheksinghdhad...@users.noreply.github.com>

Abhishek,

You might want to have your actual email address appear in the log...

Regards,
Gilles

> AuthorDate: Thu May 9 00:35:29 2019 +0530
>
> Update pom.xml
> ---
>  pom.xml | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/pom.xml b/pom.xml
> index cdac4f7..8d43d2a 100644
> --- a/pom.xml
> +++ b/pom.xml
> @@ -74,6 +74,9 @@
>  
>Artem Barger
>  
> + 
> +  Abhishek Singh Dhadwal
> +
>
>
>
>


Re: [statistics][numbers] set up develop branches?

2019-05-08 Thread Gilles Sadowski

On Wed, 8 May 2019 at 18:38, Eric Barnhill wrote:
>
> Since it looks like we will have some development in these libraries this
> summer (whee!) I propose starting 'develop' branches for these libraries.

+1
For "Statistics", it makes sense since there might be some flux,
as alternative designs are tried out.

Regards,
Gilles

> The mentees and others can then create feature branches off of develop, and
> submit pull requests for feature branches into develop. Then develop is
> merged into master periodically when all is clear. That is the typical
> GitHub cadence as I know it anyway. I am very used to this pattern and will
> be happy to be the person making sure it happens.
>
> So, perhaps interested parties could vote, if it goes ahead I will write
> the ticket, then create the develop branches.


Re: [statistics]Log-Cauchy Distribution

2019-05-08 Thread Gilles Sadowski

Hi.

I see that a discussion about this is still going on on GitHub[1]; thus,
I remind everyone that API changes *must* be agreed on here.  [Please start
a new thread.]

Best,
Gilles

[1] https://github.com/apache/commons-statistics/pull/4#discussion_r282004202


Re: [statistics]Log-Cauchy Distribution

2019-05-05 Thread Gilles Sadowski

Hi.

On Sun, 5 May 2019 at 17:40, Udit Arora wrote:
>
> Sir
> I am not able to download apache.commons.

"Commons" contains many projects, each with its separate source
repository.

> I even tried to change the mirror

Instructions for developers are here (for "Commons Statistics"):
  http://commons.apache.org/proper/commons-statistics/scm.html

> but to no avail. Also my familiarity with R is less. So I am not completely
> sure how to make the test cases for this distribution.

You could look for other sources of comparison; e.g.
https://keisan.casio.com/menu/system/0540

Gilles


> Thanks
>
> On Fri, May 3, 2019 at 7:26 PM Udit Arora  wrote:
>
> > Ok sir..
> > Thanks
> >
> > On Fri, 3 May 2019, 6:23 pm Gilles Sadowski,  wrote:
> >
> >> Hello.
> >>
> >> On Thu, 2 May 2019 at 19:34, Udit Arora wrote:
> >> >
> >> > This is a new discussion for making a Log-Cauchy Distribution.
> >> > I just want to add a new distribution to the already existing
> >> distribution
> >> > list. Just like Cauchy Distribution I intend to include CDF, PDF and
> >> some
> >> > other functions.
> >> > Please let me know if I should go ahead with this idea.
> >>
> >> Sure!
> >> For a new implementation, you should provide reference(s) and
> >> unit tests (based on those that exist for the other implementations,
> >> preferably reaching for full coverage).
> >>
> >> Thanks,
> >> Gilles
> >>
> >> > Thanks
> >> > Udit Arora
> >>
> >>
> >>


Re: [rng] Copying samplers

2019-05-04 Thread Gilles Sadowski

Hi.

On Sat, 4 May 2019 at 21:31, Alex Herbert wrote:
>
>
>
> > On 4 May 2019, at 14:46, Gilles Sadowski  wrote:
> >
> > Hello.
> >
> > On Fri, 3 May 2019 at 16:57, Alex Herbert wrote:
> >>
> >> Most of the samplers in the library have very small states that are easy
> >> to compute. Some have computations that are more expensive, such as the
> >> LargeMeanPoissonSampler or the DiscreteProbabilityCollectionSampler.
> >>
> >> However once the state is computed the only part of the state that
> >> changes is the RNG. I would like to suggest a way to copy samplers as
> >> something like:
> >>
> >> DiscreteSampler newInstance(UniformRandomProvider)
> >>
> >> The new instance would share all the private state of the first sampler
> >> except the RNG. This can be used for multi-threaded applications which
> >> require a new sampler per thread but sample from the same distribution.
> >>
> >> A particular case in point is the as yet not integrated
> >> MarsagliaTsangWangSmallMeanPoissonSampler (see RNG-91 [1]) which has a
> >> "large" state [2] that takes a "long" time [3] to compute but is
> >> effectively immutable. This could be shared across instances saving
> >> memory for parallel application.
> >>
> >> A copy instance would be almost zero set-up time and provide opportunity
> >> for caching of commonly used samplers.
> >
> > The goal is sharing (immutable) state so it seems that the semantics is
> > not "copy".
> >
> > Isn't it a "factory" that we are after?  E.g. something like:
> > public final class CachedSamplingFactory {
> >     private static PoissonSamplerCache poisson = new PoissonSamplerCache();
> >
> >     public PoissonSampler createPoissonSampler(UniformRandomProvider rng,
> >                                                double mean) {
> >         if (!poisson.isCached(mean)) {
> >             // Initialize (requires synchronization) ...
> >             poisson.createCache(mean);
> >         }
> >         // Construct using pre-built state.
> >         return new PoissonSampler(poisson.getCache(mean), rng);
> >     }
> > }
> > [It may be overkill, more work, and less performant…]
>
> But you need a factory for every class you want to share state for. And the 
> factory actually has to look in a cache. If you operate on an instance then 
> you get what you want. Another working version of the same sampler. It would 
> also be thread safe without synchronisation as long as the state is 
> immutable. The only mutable state is the passed in RNG.

Agreed.  It was what I meant by the last sentence.

> >
> > IIUC, you suggest adding "newInstance" to the "DiscreteSampler" interface (?).
>
> I did think of extending DiscreteSampler with this functionality. Not adding 
> to the interface as it currently is ‘functional’ as it has only one method. I 
> think that should not change. Having thought about it a bit more I like the 
> idea of a new functional interface. Perhaps:
>
> interface DiscreteSamplerProvider {
> DiscreteSampler create(UniformRandomProvider rng);
> }
>
> Substitute ‘Provider’ for:
>
> - Generator
> - Supplier (possible clash or alignment with Java 8 depending on the way it 
> is done)
> - Factory (though the method is not static so I do not like this)
> - etc
>
> So this then becomes a functional interface that can be used by anything. 
> However instances of a sampler would be expected to return a sampler matching 
> their own functionality.
>
> Note there are some samplers not implementing an interface that also could 
> benefit from this. Namely CollectionSampler and 
> DiscreteProbabilityCollectionSampler. So does this need a generic interface:
>
> Sampler<T> {
>     T sample();
> }
>
> To be complemented with:
>
> SamplerProvider<T> {
>     Sampler<T> create(UniformRandomProvider rng);
> }
>
> So the library would require:
>
> SamplerProvider<T>
> DiscreteSamplerProvider
> ContinuousSamplerProvider
>
> Any sampler can choose to implement being a Provider. There are some cases 
> where it is moot. For example a ZigguratNormalizedGaussianSampler just stores 
> the rng in the constructor. However it could still be a Provider just the 
> method would only call the constructor. It would allow writing a generic 
> multi-threaded application that just uses e.g. a DiscreteSamplerProvider to 
> create samplers for each thread. You can then drop in the actual 
> implementation you require. For example you could swap the

Re: [rng] Copying samplers

2019-05-04 Thread Gilles Sadowski

Hello.

On Fri, 3 May 2019 at 16:57, Alex Herbert wrote:
>
> Most of the samplers in the library have very small states that are easy
> to compute. Some have computations that are more expensive, such as the
> LargeMeanPoissonSampler or the DiscreteProbabilityCollectionSampler.
>
> However once the state is computed the only part of the state that
> changes is the RNG. I would like to suggest a way to copy samplers as
> something like:
>
> DiscreteSampler newInstance(UniformRandomProvider)
>
> The new instance would share all the private state of the first sampler
> except the RNG. This can be used for multi-threaded applications which
> require a new sampler per thread but sample from the same distribution.
>
> A particular case in point is the as yet not integrated
> MarsagliaTsangWangSmallMeanPoissonSampler (see RNG-91 [1]) which has a
> "large" state [2] that takes a "long" time [3] to compute but is
> effectively immutable. This could be shared across instances saving
> memory for parallel application.
>
> A copy instance would be almost zero set-up time and provide opportunity
> for caching of commonly used samplers.

The goal is sharing (immutable) state so it seems that the semantics is
not "copy".

Isn't it a "factory" that we are after?  E.g. something like:
public final class CachedSamplingFactory {
    private static PoissonSamplerCache poisson = new PoissonSamplerCache();

    public PoissonSampler createPoissonSampler(UniformRandomProvider rng,
                                               double mean) {
        if (!poisson.isCached(mean)) {
            // Initialize (requires synchronization) ...
            poisson.createCache(mean);
        }
        // Construct using pre-built state.
        return new PoissonSampler(poisson.getCache(mean), rng);
    }
}
[It may be overkill, more work, and less performant...]
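
The "requires synchronization" step in the factory idea could, as one possible approach, lean on ConcurrentHashMap.computeIfAbsent, which builds the value at most once per key. What follows is only a minimal sketch under stated assumptions: "PoissonState", "CachedPoissonSampler" and the cut-down "UniformRandomProvider" are hypothetical stand-ins written for this illustration, not the [RNG] API, and the sampler uses Knuth's multiplication method purely because it is short.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for the library interface (assumption: only nextDouble() is needed). */
interface UniformRandomProvider {
    double nextDouble();
}

/** Hypothetical immutable pre-computed state; a real sampler would tabulate more. */
final class PoissonState {
    final double p0; // exp(-mean), i.e. P(X = 0)

    PoissonState(double mean) {
        this.p0 = Math.exp(-mean);
    }
}

/** Hypothetical sampler built from shared state (Knuth's multiplication method). */
final class CachedPoissonSampler {
    private final PoissonState state;
    private final UniformRandomProvider rng;

    CachedPoissonSampler(PoissonState state, UniformRandomProvider rng) {
        this.state = state;
        this.rng = rng;
    }

    int sample() {
        int n = 0;
        double r = rng.nextDouble();
        // Multiply uniform deviates until the product drops below exp(-mean).
        while (r >= state.p0) {
            r *= rng.nextDouble();
            n++;
        }
        return n;
    }
}

/** Hypothetical factory: one immutable state per mean, shared by all samplers. */
final class CachedSamplingFactory {
    private final ConcurrentMap<Double, PoissonState> cache = new ConcurrentHashMap<>();
    private final AtomicInteger builds = new AtomicInteger(); // only to show the build count

    CachedPoissonSampler createPoissonSampler(UniformRandomProvider rng, double mean) {
        // computeIfAbsent creates the state atomically, at most once per distinct key.
        PoissonState state = cache.computeIfAbsent(mean, m -> {
            builds.incrementAndGet();
            return new PoissonState(m);
        });
        return new CachedPoissonSampler(state, rng);
    }

    int buildCount() {
        return builds.get();
    }
}
```

Keying the map on a boxed double is itself a design assumption (for instance, 0.0 and -0.0 are distinct keys); a real cache might prefer a different key.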

IIUC, you suggest adding "newInstance" to the "DiscreteSampler" interface (?).
I'm a bit wary that this would compound two different functionalities:
  * data generator (method "sample"),
  * generator generator (method "newInstance").
[But I currently don't have an example where this would be a problem.]
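
For concreteness, the "newInstance" idea might look roughly like the sketch below: the table is computed once and every new instance shares it, with only the generator differing. This is a hedged illustration; "TabulatedSampler", its "of" factory and "sharesTableWith" are hypothetical names, and the two interfaces are cut-down stand-ins for the library ones.

```java
/** Stand-in for the library interface (assumption: only nextDouble() is needed). */
interface UniformRandomProvider {
    double nextDouble();
}

/** Stand-in for the library interface of the same name. */
interface DiscreteSampler {
    int sample();
}

/** Hypothetical sampler over indices 0..n-1 with a pre-computed cumulative table. */
final class TabulatedSampler implements DiscreteSampler {
    private final double[] cumulative; // never modified after construction
    private final UniformRandomProvider rng;

    private TabulatedSampler(UniformRandomProvider rng, double[] cumulative) {
        this.rng = rng;
        this.cumulative = cumulative;
    }

    /** The "expensive" set-up: normalise the weights and build the table. */
    static TabulatedSampler of(UniformRandomProvider rng, double[] weights) {
        if (weights.length == 0) {
            throw new IllegalArgumentException("no weights");
        }
        double sum = 0;
        for (double w : weights) {
            sum += w;
        }
        final double[] table = new double[weights.length];
        double running = 0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i] / sum;
            table[i] = running;
        }
        table[table.length - 1] = 1.0; // guard against rounding
        return new TabulatedSampler(rng, table);
    }

    /** Same distribution, different generator; the table is shared, not copied. */
    TabulatedSampler newInstance(UniformRandomProvider newRng) {
        return new TabulatedSampler(newRng, cumulative);
    }

    /** Only here to make the sharing visible. */
    boolean sharesTableWith(TabulatedSampler other) {
        return cumulative == other.cumulative;
    }

    @Override
    public int sample() {
        final double u = rng.nextDouble();
        for (int i = 0; i < cumulative.length; i++) {
            if (u < cumulative[i]) {
                return i;
            }
        }
        return cumulative.length - 1;
    }
}
```

Since the table is never written after construction, instances created this way can be handed to different threads as long as each thread has its own generator.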

Regards,
Gilles

> Alex
>
> [1] https://issues.apache.org/jira/browse/RNG-91
>
> [2] kB, or possibly MB, of tabulated data
>
> [3] Set-up cost for a Poisson sampler is in the order of 30 to 165 times
> as long as a SmallMeanPoissonSampler for a mean of 2 and 32. Note
> however that construction still takes only 1.1 and 4.5 microseconds for
> the "long" time.


Re: [statistics]Log-Cauchy Distribution

2019-05-03 Thread Gilles Sadowski

Hello.

On Thu, 2 May 2019 at 19:34, Udit Arora wrote:
>
> This is a new discussion for making a Log-Cauchy Distribution.
> I just want to add a new distribution to the already existing distribution
> list. Just like Cauchy Distribution I intend to include CDF, PDF and some
> other functions.
> Please let me know if I should go ahead with this idea.

Sure!
For a new implementation, you should provide reference(s) and
unit tests (based on those that exist for the other implementations,
preferably reaching for full coverage).
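
As a starting point for such tests, the closed forms of the log-Cauchy distribution (location mu and scale sigma of ln X) can be written down directly. The class below is a minimal sketch, not the Commons Statistics API: the class name is hypothetical and the method names merely imitate the style of the existing distributions.

```java
/** Hypothetical sketch of a log-Cauchy distribution; not library code. */
final class LogCauchySketch {
    private final double mu;    // location of ln(X)
    private final double sigma; // scale of ln(X), must be positive

    LogCauchySketch(double mu, double sigma) {
        if (!(sigma > 0)) {
            throw new IllegalArgumentException("scale must be positive: " + sigma);
        }
        this.mu = mu;
        this.sigma = sigma;
    }

    /** PDF: sigma / (pi * x * ((ln x - mu)^2 + sigma^2)) for x > 0, else 0. */
    double density(double x) {
        if (x <= 0) {
            return 0;
        }
        final double d = Math.log(x) - mu;
        return sigma / (Math.PI * x * (d * d + sigma * sigma));
    }

    /** CDF: 1/2 + atan((ln x - mu) / sigma) / pi for x > 0, else 0. */
    double cumulativeProbability(double x) {
        if (x <= 0) {
            return 0;
        }
        return 0.5 + Math.atan((Math.log(x) - mu) / sigma) / Math.PI;
    }

    /** Quantile: exp(mu + sigma * tan(pi * (p - 1/2))) for 0 < p < 1. */
    double inverseCumulativeProbability(double p) {
        if (!(p >= 0 && p <= 1)) {
            throw new IllegalArgumentException("probability out of range: " + p);
        }
        if (p == 0) {
            return 0;
        }
        if (p == 1) {
            return Double.POSITIVE_INFINITY;
        }
        return Math.exp(mu + sigma * Math.tan(Math.PI * (p - 0.5)));
    }
}
```

Two checks that need no external reference: the median is exp(mu), so the CDF there is 1/2, and the quantile function should invert the CDF. Reference values from an independent source are still needed for the actual unit tests.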

Thanks,
Gilles

> Thanks
> Udit Arora


Re: [rng] Split and Jump functions

2019-05-03 Thread Gilles Sadowski

Hello.

On Thu, 2 May 2019 at 23:49, Alex Herbert wrote:
>
>
>
> > On 1 May 2019, at 23:15, Gilles Sadowski  wrote:
> >
> > Hi.
> >
> >>> [...]
> >>
> >> So do we do:
> >>
> >> UniformRandomProvider restrict(JumpableUniformRandomProvider);
> >> JumpableUniformRandomProvider restrict(LongJumpableUniformRandomProvider);
> >> UniformRandomProvider restrict(RestorableUniformRandomProvider);
> >>
> >> Or:
> >>
> >> UniformRandomProvider unjumpable(JumpableUniformRandomProvider);
> >> JumpableUniformRandomProvider 
> >> unlongJumpable(LongJumpableUniformRandomProvider);
> >
> > I'm a bit hesitant on the spelling…
>
> Do you mean unlongJumpable vs unLongJumpable vs unlongjumpable? In that 
> regard I was maintaining the likeness to unrestorable, but since there are 
> two words after 'un' I put the second with camelcase.

I had noticed that the consistent name would be "unlongJumpable" but
"unlong" just looks like a typo. :-{

>
> Or just the entire method name? I don’t like it much but its function is 
> clear: allow access to jump() but not longJump().

Shall we just leave out those convenience methods until there is an
explicit need?  As discussed previously, it doesn't seem to me that
the added safety is as useful as for "unrestorable".

Regards,
Gilles

>
> >
> >> UniformRandomProvider unrestorable(RestorableUniformRandomProvider);
> >>
> >> The latter option only adds two new methods. The first has 3 new methods 
> >> (deprecating unrestorable with restrict) but suffers from having to cast 
> >> instances of multiple interfaces to ensure the correct restrict is called.
> >
> > Oops indeed.
> > This is too error-prone.
> >
> >> So this makes me favour the verbosely named option.
> >
> > +1
> >
> > Regards,
> > Gilles
> >
> >>
> >> Alex


Re: [All] Help with GitHub "support"

2019-05-02 Thread Gilles Sadowski

Thank you Bruno and Alex!

Best,
Gilles


[All] Help with GitHub "support"

2019-05-02 Thread Gilles Sadowski

Hi.

Some people are providing PRs[1] on GitHub without engaging with
us, here, or on JIRA.
When this happens for codes[2] for which I'm the assumed reviewer,[3]
I'd need help from someone, with a GitHub account, who would post
a comment there, in order to let the "outside" contributors know that
we won't apply PRs without tracking information (JIRA ticket and/or
post on "dev"), as per the "contributions guidelines".[4]

Thanks,
Gilles

[1] Last examples:
https://github.com/apache/commons-math/pull/105
https://github.com/apache/commons-statistics/pull/4
[2] "RNG", "Numbers", "Statistics"
[3] Unless someone else is willing to engage in reviewing the
proposal on GitHub, and perform the merge.
[4] http://commons.apache.org/patches.html


Re: [rng] Split and Jump functions

2019-05-01 Thread Gilles Sadowski

Hi.

> > [...]
>
> So do we do:
>
> UniformRandomProvider restrict(JumpableUniformRandomProvider);
> JumpableUniformRandomProvider restrict(LongJumpableUniformRandomProvider);
> UniformRandomProvider restrict(RestorableUniformRandomProvider);
>
> Or:
>
> UniformRandomProvider unjumpable(JumpableUniformRandomProvider);
> JumpableUniformRandomProvider 
> unlongJumpable(LongJumpableUniformRandomProvider);

I'm a bit hesitant on the spelling...

> UniformRandomProvider unrestorable(RestorableUniformRandomProvider);
>
> The latter option only adds two new methods. The first has 3 new methods 
> (deprecating unrestorable with restrict) but suffers from having to cast 
> instances of multiple interfaces to ensure the correct restrict is called.

Oops indeed.
This is too error-prone.
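
To make the casting problem concrete, here is a hedged sketch with cut-down stand-in interfaces (the "saveState"/"restoreState" signatures are simplified assumptions, and "Restrictions" and "ToyRng" are hypothetical names). It uses the distinctly named helpers; the comment notes what would happen with overloads of a single "restrict" name.

```java
/** Stand-ins for the library interfaces (assumption: methods simplified). */
interface UniformRandomProvider {
    long nextLong();
}

interface JumpableUniformRandomProvider extends UniformRandomProvider {
    UniformRandomProvider jump();
}

interface RestorableUniformRandomProvider extends UniformRandomProvider {
    long saveState();

    void restoreState(long saved);
}

/** Hypothetical helpers following the distinctly named option. */
final class Restrictions {
    private Restrictions() {}

    static UniformRandomProvider unjumpable(JumpableUniformRandomProvider rng) {
        return rng::nextLong; // the returned view exposes nextLong() only
    }

    static UniformRandomProvider unrestorable(RestorableUniformRandomProvider rng) {
        return rng::nextLong;
    }

    // If both helpers were instead overloads named restrict(...), a call such as
    // restrict(rng) with an argument of type ToyRng (which implements both
    // interfaces) would be rejected by the compiler as ambiguous, and the caller
    // would have to cast the argument to select one of them.
}

/** Toy generator implementing both capabilities; not an algorithm from the library. */
final class ToyRng implements JumpableUniformRandomProvider, RestorableUniformRandomProvider {
    private static final long GAMMA = 0x9e3779b97f4a7c15L;
    private long state;

    ToyRng(long seed) {
        state = seed;
    }

    @Override
    public long nextLong() {
        return state += GAMMA; // a plain Weyl sequence, for illustration only
    }

    @Override
    public UniformRandomProvider jump() {
        final ToyRng copy = new ToyRng(state);
        state += 1000 * GAMMA; // skip 1000 outputs
        return copy;
    }

    @Override
    public long saveState() {
        return state;
    }

    @Override
    public void restoreState(long saved) {
        state = saved;
    }
}
```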

> So this makes me favour the verbosely named option.

+1

Regards,
Gilles

>
> Alex
>


Re: [rng] Split and Jump functions

2019-05-01 Thread Gilles Sadowski

Hi.

On Tue, 30 Apr 2019 at 17:08, Alex Herbert wrote:
>
> On 29/04/2019 22:14, Gilles Sadowski wrote:
> > Hello.
> >
> > On Mon, 29 Apr 2019 at 19:09, Alex Herbert wrote:
> >> On 28/04/2019 19:11, Gilles Sadowski wrote:
> >>> On Sun, 28 Apr 2019 at 17:02, Alex Herbert wrote:
> >>>>
> >>>>> On 28 Apr 2019, at 00:59, Bernd Eckenfels  
> >>>>> wrote:
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> Just a question, I am unclear on the terminology, is „jump“ (did I miss 
> >>>>> the discussion leading to it?) something invented here? It sounds to me 
> >>>>> like this is a generator where the state can be cloned and it is 
> >>>>> „seekable“. It probably makes sense to have those two dimensions 
> >>>>> separated anyway.
> >>>> Hi Bernd, thanks for the input.
> >>>>
> >>>> This thread started with the definition:
> >>>> Jump:
> >>>>
> >>>> To create a new instance of the generator that is deterministically 
> >>>> based on the state of the current instance but advanced a set number of 
> >>>> iterations.
> >>>>
> >>>>
> >>>> However it is not required to create a new instance at the same time as 
> >>>> jumping. You are correct in that this is two functionalities:
> >>>>
> >>>> 1. Jump forward in the sequence
> >>>> 2. Copy
> >>>>
> >>>> However the two are coupled. Having jump on its own is useless (why move 
> >>>> forward in the sequence without using it?). So a copy needs to be 
> >>>> created somewhere before/after the jump.
> >>>>
> >>>> The idea of a jump is to create a series of the generator at different 
> >>>> points in the state. The generators can be used for parallel 
> >>>> computations and are guaranteed not to overlap in their output sequence
> >>>> for the number of outputs skipped by the jump length.
> >>>>
> >>>> FYI. The generators that support this have jump sizes of 2^64, 2^96, 2^128,
> >>>> 2^192, 2^256 and 2^512. So this is a lot of output sequence to jump.
> >>>>
> >>>> Copy on its own works but for what purpose? If you want a second 
> >>>> generator at the moment you just create a new one (with a different 
> >>>> seed). Duplicate copies of generators are prone to potential pitfalls 
> >>>> where simulations are not as random as you intend. For a special use 
> >>>> case where you wish to run multiple simulations with the same generator 
> >>>> you can use the Restorable interface to save the state of one and 
> >>>> re-create it in other instances.
> >>>>
> >>>> The current thread came to the choice of:
> >>>>
> >>>>>>> So the options are (in all cases returning the copy):
> >>>>>>>
> >>>>>>> 1. createAndJumpCopy
> >>>>>>> 2. copyAndJumpParent
> >>>>>>> 3. jumpParentAndCopy
> >>>>>>> 4. jump and copy separately
> >>>> Jump and copy separately was ruled out to discourage misuse of copy.
> >>>>
> >>>> The current suggestion is 1. Create a copy and jump that ahead. The 
> >>>> current instance is not affected.
> >>>>
> >>>> I now consider this to be weaker for a variety of use cases than 2. This 
> >>>> copies the current state for use and then jumps the parent ahead. So 
> >>>> this alters the state of the parent generator.
> >>>>
> >>>> Note that all other methods of a generator alter its state. So having 
> >>>> jump alter its state is reasonable.
> >>>>
> >>>> The most flexible API is to separate jump and copy into two methods. We 
> >>>> can still support helper functions that take in a Jumpable generator and 
> >>>> create a jump series of generators for parallel work. Separating jump 
> >>>> and copy allows the functionality to be used in a larger number of ways 
> >>>> than any other interface that attempts to combine jump and copy.
> >>>>
> >>>> I am fine with having separate jump and copy. If so the copy method, 
> >>>> [...]

Re: [rng] Split and Jump functions

2019-04-29 Thread Gilles Sadowski

Hello.

On Mon, 29 Apr 2019 at 19:09, Alex Herbert wrote:
>
> On 28/04/2019 19:11, Gilles Sadowski wrote:
> > On Sun, 28 Apr 2019 at 17:02, Alex Herbert wrote:
> >>
> >>
> >>> On 28 Apr 2019, at 00:59, Bernd Eckenfels  wrote:
> >>>
> >>> Hello,
> >>>
> >>> Just a question, I am unclear on the terminology, is „jump“ (did I miss 
> >>> the discussion leading to it?) something invented here? It sounds to me 
> >>> like this is a generator where the state can be cloned and it is 
> >>> „seekable“. It probably makes sense to have those two dimensions 
> >>> separated anyway.
> >> Hi Bernd, thanks for the input.
> >>
> >> This thread started with the definition:
> >> Jump:
> >>
> >> To create a new instance of the generator that is deterministically based 
> >> on the state of the current instance but advanced a set number of 
> >> iterations.
> >>
> >>
> >> However it is not required to create a new instance at the same time as 
> >> jumping. You are correct in that this is two functionalities:
> >>
> >> 1. Jump forward in the sequence
> >> 2. Copy
> >>
> >> However the two are coupled. Having jump on its own is useless (why move 
> >> forward in the sequence without using it?). So a copy needs to be created 
> >> somewhere before/after the jump.
> >>
> >> The idea of a jump is to create a series of the generator at different 
> >> points in the state. The generators can be used for parallel computations 
> >> and are guaranteed not to overlap in their output sequence for the number of
> >> outputs skipped by the jump length.
> >>
> >> FYI. The generators that support this have jump sizes of 2^64, 2^96, 2^128,
> >> 2^192, 2^256 and 2^512. So this is a lot of output sequence to jump.
> >>
> >> Copy on its own works but for what purpose? If you want a second generator 
> >> at the moment you just create a new one (with a different seed). Duplicate 
> >> copies of generators are prone to potential pitfalls where simulations are 
> >> not as random as you intend. For a special use case where you wish to run 
> >> multiple simulations with the same generator you can use the Restorable 
> >> interface to save the state of one and re-create it in other instances.
> >>
> >> The current thread came to the choice of:
> >>
> >>>>> So the options are (in all cases returning the copy):
> >>>>>
> >>>>> 1. createAndJumpCopy
> >>>>> 2. copyAndJumpParent
> >>>>> 3. jumpParentAndCopy
> >>>>> 4. jump and copy separately
> >> Jump and copy separately was ruled out to discourage misuse of copy.
> >>
> >> The current suggestion is 1. Create a copy and jump that ahead. The 
> >> current instance is not affected.
> >>
> >> I now consider this to be weaker for a variety of use cases than 2. This 
> >> copies the current state for use and then jumps the parent ahead. So this 
> >> alters the state of the parent generator.
> >>
> >> Note that all other methods of a generator alter its state. So having jump 
> >> alter its state is reasonable.
> >>
> >> The most flexible API is to separate jump and copy into two methods. We 
> >> can still support helper functions that take in a Jumpable generator and 
> >> create a jump series of generators for parallel work. Separating jump and 
> >> copy allows the functionality to be used in a larger number of ways than 
> >> any other interface that attempts to combine jump and copy.
> >>
> >> I am fine with having separate jump and copy. If so the copy method, being 
> >> part of the Jumpable interface, will be functionally coupled with the jump 
> >> method and should be described in Javadoc with the intended purpose to use 
> >> it to copy the parent state either before or after a jump into a child 
> >> generator.
> >>
> >> As a precursor this API is very flexible:
> >>
> >> JumpableUniformRandomProvider extends UniformRandomProvider {
> >>  /** Jump and return same instance. */
> >>  JumpableUniformRandomProvider jump();
> >>  /** Copy the instance. */
> >>  JumpableUniformRandomProvider copy();
> >> }
> >>
> >> Returning the same instance in jump() allows method chaining such as 
> >> either:

Re: [rng] Split and Jump functions

2019-04-28 Thread Gilles Sadowski

> state. */
> JumpableUniformRandomProvider jump();
> }
>
> JumpableUniformRandomProvider extends UniformRandomProvider {
> /** Copy the instance, then jump the copy ahead. Return the copy. The 
> current instance is not affected. */
> JumpableUniformRandomProvider jump();
> }
>
> So the split functions without allowing method chaining:
>
> JumpableUniformRandomProvider extends UniformRandomProvider {
> /** Jump the current instance ahead. */
> void jump();
> /** Copy the instance. This is intended to be used either before or after 
> a call to jump()
>  * to create a series of generators. */
> JumpableUniformRandomProvider copy();
> }

As you indicated above, there is no advantage in having separate
"jump()" and "copy()",
as counter-intuitive as it may look at first sight.

Regards,
Gilles

>
> WDYT?
>
> Alex
>


Re: [rng] Split and Jump functions

2019-04-28 Thread Gilles Sadowski

Hi.

>
> Just a question, I am unclear on the terminology, is „jump“ (did I miss the 
> discussion leading to it?) something invented here?

Not invented here: It's a functionality that exists for some RNG algorithms.

> It sounds to me like this is a generator where the state can be cloned and it 
> is „seekable“. It probably makes sense to have those two dimensions separated 
> anyway.
>
> Regards
> Bernd
>
>> [...]


Re: [rng] Split and Jump functions

2019-04-27 Thread Gilles Sadowski

Hello.

>
>
> > On 27 Apr 2019, at 14:49, Gilles Sadowski  wrote:
> >
> > Hi.
> >
> > On Sat, 27 Apr 2019 at 15:05, Alex Herbert wrote:
> >>
> >> I have created RNG-97 and RNG-98 for Jump and LongJump.
> >>
> >> Please take a look and comment.
> >>
> >> The documentation highlights the implementation detail that a jump or long 
> >> jump creates a copy that is far ahead. The original generator is not 
> >> effected.
> >>
> >> The use case is thus:
> >>
> >> rng1 = …;
> >> rng2 = rng1.jump();
> >> rng3 = rng2.jump();
> >> rng4 = rng3.jump();
> >>
> >> As opposed to:
> >>
> >> rng1 = …;
> >> rng2 = rng1.jump();
> >> rng3 = rng1.jump();
> >> rng4 = rng1.jump();
> >>
> >> Where rng1 will be advanced each time leaving behind a copy generator.
> >>
> >> In either case it will be an overlap problem if any of the children are 
> >> then used for jumping. So as long as the documentation is clear then this 
> >> is OK. The helper method to create a jump series (or long jump series) in 
> >> RandomSource seems the best way to avoid incorrect usage.
> >
> > +1
> >
> > I think that the default should be to prevent a "jump" on the returned
> > instances.
> > An overload could be defined with a parameter (e.g. "allowFurtherJump") but 
> > I'd
> > leave it out until it is requested based on an actual use-case.
>
> I presume you are talking about the helper method in RandomSource.
>
> However it does open the possibility instead of this:
>
> JumpableUniformRandomProvider {
> UniformRandomProvider jump();
> }
>
> This only works if the state is modified for the current instance to allow 
> chaining jumps.
>
> Having typed all this up into a summary for the two tickets I feel that they 
> implement the idea in the wrong way. I think the jump should advance the 
> state of the current generator. This is the master generator created and used 
> in the high level code that controls the number of jumps that are required. 
> The returned copy should be a copy of where the generator was. The copy 
> should not be used for further jumps. In this way the interface for jump 
> could be made to return a UniformRandomProvider.
>
> When done like that the jumpable RNG is the only thing you need to hold a 
> reference to. And you can later decide (perhaps dynamically) if you need to 
> do some more jumps to get another series. Each call to jump moves the master 
> along and leaves behind a RNG that can be used for a set number of cycles 
> (the jump length). So you can do:
>
> JumpableUniformRandomProvider rng = …;
>
> UniformRandomProvider[] series1 = RandomSource.createJumpSeries(rng);
> // Do work with series1 and then maybe
> UniformRandomProvider[] series2 = RandomSource.createJumpSeries(rng);
> // Do work with series2, etc
> UniformRandomProvider[] series3 = RandomSource.createJumpSeries(rng);
>
> Or
>
> JumpableUniformRandomProvider masterRng = …;
>
> ExecutorService executor = Executors.newCachedThreadPool();
> ArrayList<Future<Result>> futures = new ArrayList<>();
> for (Input input : inputs) {
>     final UniformRandomProvider rng = masterRng.jump();
>     futures.add(executor.submit(new Callable<Result>() {
>         @Override
>         public Result call() {
>             // Do something random with rng, then
>             return new Result(...);
>         }
>     }));
> }
>
> The latter example uses ‘inputs’ as something where perhaps the size is not 
> known such as an Iterable or likewise in Java 8 it could be written to 
> consume a Stream.

That's a convincing example!
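
A runnable sketch of that perspective (jump advances the master and leaves a copy behind) is given below. It rests on loud assumptions: the interfaces are cut-down stand-ins, "ToyJumpableRng" is a hypothetical counter-based generator in the style of SplitMix64 chosen only because jumping it is a single addition, and the jump length of 1000 is tiny on purpose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Stand-in for the library interface (assumption: only nextLong() is needed). */
interface UniformRandomProvider {
    long nextLong();
}

/** Proposed shape: jump() advances this instance and returns a copy of the old state. */
interface JumpableUniformRandomProvider extends UniformRandomProvider {
    UniformRandomProvider jump();
}

/** Toy counter-based generator; not an algorithm shipped by the library. */
final class ToyJumpableRng implements JumpableUniformRandomProvider {
    static final long JUMP_LENGTH = 1000; // deliberately small, for illustration
    private static final long GAMMA = 0x9e3779b97f4a7c15L;
    private long state;

    ToyJumpableRng(long seed) {
        state = seed;
    }

    @Override
    public long nextLong() {
        long z = (state += GAMMA);
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    @Override
    public UniformRandomProvider jump() {
        final ToyJumpableRng copy = new ToyJumpableRng(state); // where the master was
        state += JUMP_LENGTH * GAMMA; // same effect as JUMP_LENGTH calls to nextLong()
        return copy;
    }
}

final class JumpDemo {
    /** One generator per task, each owning a non-overlapping stretch of the sequence. */
    static List<Long> run(long seed, int tasks) throws Exception {
        final JumpableUniformRandomProvider master = new ToyJumpableRng(seed);
        final ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            final List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final UniformRandomProvider rng = master.jump();
                final Callable<Long> task = () -> rng.nextLong();
                futures.add(executor.submit(task));
            }
            final List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }
}
```

Only the master needs to be kept around; each call to jump() can happen whenever another worker turns up.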

> Similarly the LongJumpableUniformRandomProvider interface can return a 
> JumpableUniformRandomProvider so preventing the result from being used for 
> another long jump but it can be used for (short) jumps.
>
> Have a think on use cases but my feeling is that the interface is more 
> powerful if you do advance the state and leave copies behind, rather than 
> creating future copies which must be chained together to create a series.

OK to change the perspective. ;-)

Gilles

>
> Alex
>


Re: [rng] Split and Jump functions

2019-04-27 Thread Gilles Sadowski

Hi.

On Sat, 27 Apr 2019 at 15:05, Alex Herbert wrote:
>
> I have created RNG-97 and RNG-98 for Jump and LongJump.
>
> Please take a look and comment.
>
> The documentation highlights the implementation detail that a jump or long 
> jump creates a copy that is far ahead. The original generator is not affected.
>
> The use case is thus:
>
> rng1 = …;
> rng2 = rng1.jump();
> rng3 = rng2.jump();
> rng4 = rng3.jump();
>
> As opposed to:
>
> rng1 = …;
> rng2 = rng1.jump();
> rng3 = rng1.jump();
> rng4 = rng1.jump();
>
> Where rng1 will be advanced each time leaving behind a copy generator.
>
> In either case it will be an overlap problem if any of the children are then 
> used for jumping. So as long as the documentation is clear then this is OK. 
> The helper method to create a jump series (or long jump series) in 
> RandomSource seems the best way to avoid incorrect usage.

+1

I think that the default should be to prevent a "jump" on the returned
instances.
An overload could be defined with a parameter (e.g. "allowFurtherJump") but I'd
leave it out until it is requested based on an actual use-case.
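
A possible shape for such a helper, under the semantics described in this message (jump() returns an advanced copy and leaves the original untouched), is sketched below. Everything here is an assumption for illustration: "JumpSeries.create" is a hypothetical name rather than a method of RandomSource, the interfaces are cut-down stand-ins, and the toy generator is not one of the library's algorithms.

```java
/** Stand-in for the library interface (assumption: only nextLong() is needed). */
interface UniformRandomProvider {
    long nextLong();
}

/** Semantics assumed here: jump() returns an advanced copy; this instance is unchanged. */
interface JumpableUniformRandomProvider extends UniformRandomProvider {
    JumpableUniformRandomProvider jump();
}

/** Toy counter-based generator with a deliberately tiny jump length. */
final class ToyJumpableRng implements JumpableUniformRandomProvider {
    static final long JUMP_LENGTH = 1000;
    private static final long GAMMA = 0x9e3779b97f4a7c15L;
    private long state;

    ToyJumpableRng(long seed) {
        state = seed;
    }

    @Override
    public long nextLong() {
        long z = (state += GAMMA);
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    @Override
    public JumpableUniformRandomProvider jump() {
        // Copy whose state is JUMP_LENGTH outputs ahead of this instance.
        return new ToyJumpableRng(state + JUMP_LENGTH * GAMMA);
    }
}

/** Hypothetical helper: chains the jumps and hides jump() on the results. */
final class JumpSeries {
    private JumpSeries() {}

    static UniformRandomProvider[] create(JumpableUniformRandomProvider start, int n) {
        final UniformRandomProvider[] series = new UniformRandomProvider[n];
        JumpableUniformRandomProvider current = start;
        for (int i = 0; i < n; i++) {
            final JumpableUniformRandomProvider next = current.jump();
            series[i] = next::nextLong; // view exposing nextLong() only: no further jump
            current = next;
        }
        return series;
    }
}
```

Because the chaining happens inside the helper, the caller cannot accidentally jump from the wrong instance and create overlapping sequences.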

Best,
Gilles

>
> Alex


Re: [commons-parent] branch master updated: japicmp-maven-plugin should not break builds on source incompatible changes by default.

2019-04-19 Thread Gilles Sadowski

Hi.

On Fri, 19 Apr 2019 at 19:09,  wrote:
>
> This is an automated email from the ASF dual-hosted git repository.
>
> ggregory pushed a commit to branch master
> in repository https://gitbox.apache.org/repos/asf/commons-parent.git
>
>
> The following commit(s) were added to refs/heads/master by this push:
>  new ad831d8  japicmp-maven-plugin should not break builds on source 
> incompatible changes by default.
> ad831d8 is described below
>
> commit ad831d8c8eabed2dece24bf3c56015a9f817edd9
> Author: Gary Gregory 
> AuthorDate: Fri Apr 19 13:09:42 2019 -0400
>
> japicmp-maven-plugin should not break builds on source incompatible
> changes by default.

Doesn't seem to match the huge change!

Gilles

> ---
>  pom.xml | 3648 
> ---
>  src/changes/changes.xml |1 +
>  2 files changed, 1826 insertions(+), 1823 deletions(-)
>
> [... Too many lines ...]

Re: [rng] Split and Jump functions

2019-04-19 Thread Gilles Sadowski

On Thu, 18 Apr 2019 at 21:53, Alex Herbert wrote:
>
>
>
> > On 18 Apr 2019, at 14:12, Gilles Sadowski  wrote:
> >
> > Hello Alex.
> >
> >>>> [...]
> >>
> >> OK so this results in:
> >>
> >> /**
> >>  * Some summary.
> >>  */
> >> public interface JumpableUniformRandomProvider extends
> >> UniformRandomProvider {
> >> /**
> >>  * Creates a copy of the UniformRandomProvider and advances the
> >> state of the copy.
> >>  * The state of the current instance is not altered. The state of
> >> the copy will be
> >>  * advanced an equivalent of {@code n} sequential calls to a method
> >> that updates the
> >>  * state of the provider.
> >>  *
> >>  * @return the copy with an advanced state
> >>  */
> >> JumpableUniformRandomProvider jump();
> >> }
> >>
> >
> > +1
> > [Clean and lean: and no side-effects to explain...]
> >
> >>
> >> This can be documented in an implementation as:
> >>
> >> public class MyJumpingRNG implements JumpableUniformRandomProvider {
> >> /**
> >>  * {@inheritDoc}
> >>  *
> >>  * The jump size {@code n} is the equivalent of {@code 2^64}
> >> calls to
> >>  * {@link UniformRandomProvider#nextLong() nextLong()}.
> >>  */
> >> @Override
> >> public JumpableUniformRandomProvider jump() {
> >> // TODO Auto-generated method stub
> >> return null;
> >> }
> >> }
> >
> > +1
> >
> >>
> >> Do we add a second interface for LongJumpableUniformRandomProvider?
> >
> > Sure, if the functionality is provided by some of the algorithms implemented
> > in [RNG].
> > But let's have the two functionalities in separate commits.
> >
> >>
> >>>> So the options are (in all cases returning the copy):
> >>>>
> >>>> 1. createAndJumpCopy
> >>>> 2. copyAndJumpParent
> >>>> 3. jumpParentAndCopy
> >>>> 4. jump and copy separately
> >>>>
> >>>> 1. Your preferred option. A copy of the state is made. The state is 
> >>>> advanced in the copy and returned. But when called repeatedly it will 
> >>>> get the same generator and code must be organised appropriately.
> >>> We could provide a convenience method in  "RandomSource":
> >>>
> >>> public UniformRandomProvider[] jump(int n,
> >>>                                     JumpableUniformRandomProvider parent) {
> >>>     final UniformRandomProvider[] rngs = new UniformRandomProvider[n];
> >>>     // Declared as Jumpable so that jump() can be called on it.
> >>>     JumpableUniformRandomProvider tmp = parent;
> >>>     for (int i = 0; i < n; i++) {
> >>>         rngs[i] = restrict(tmp);
> >>>         tmp = tmp.jump();
> >>>     }
> >>>     return rngs;
> >>> }
> >>
> >> +1. Remove the need for the user to repeat boiler plate code.
> >>
> >> Same sort of idea of longJump() too.
> >
> > +1
> >
> >>>> It is not actually possible to jump forward a single instance. Only 
> >>>> children are advanced.
> >>> A feature: There is only one way to alter the state of an instance
> >>> (i.e. a call to "next()").
> >> OK.
> >
> > Great. :-)
> >
> > Gilles
>
> This sounds like a resolution. I will put the ideas into a Jira ticket for 
> Jumpable.

Thanks.

>
> I am a bit busy at the moment with other mini-projects (updates to 
> nextInt(int) being the main one, Poisson samplers (again) being another 
> leading to a family of log normal based distributions that may be supported 
> using cumulative probability look-up tables) but will hope to get this done 
> soon. The actual implementation should be quite easy.
>
> Here’s one for you to think about on the subject of Jumpable. What about 
> support for the generators that can be advanced by a user specified 
> increment? For example the SplitMix algorithm is based on a sequence and so 
> can be advanced from 1 to 2^64-1 steps. It does seem strange to support this 
> (if we add jumpable to SplitMix) using only one specific jump distance. A 
> Skippable can do this:
>
> SkippableURP {
> public SkippableURP skip(long steps);
>
> // or
>
> public SkippableURP skipPower2(long power);
> }
>
> Too much?

You read my mind. ;-)
What would be the uses of "short" jumps (i.e. having small non-overlapping
sequences from many instances, rather than a longer one from a single instance)?
IIUC, the hard-coded jump sizes in existing implementations seem to be a
compromise based on the number of potentially concurrent threads or independent
simulations.
Increasing that number does not seem necessary for the mid-term.
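
For reference, a minimal sketch of what such a "skippable" could look like,
assuming a generator whose state is a plain Weyl sequence (as described for
SplitMix in this thread). The class name and its methods are hypothetical and
not part of [RNG]; taking the power as an "int" is my choice for the sketch.

```java
/** Hypothetical Weyl-sequence state; illustrative only, not Commons RNG API. */
final class WeylSkipSketch {
    /** Increment quoted elsewhere in this thread for SplitMix64. */
    static final long INCREMENT = 0x9e3779b97f4a7c15L;

    private final long state;

    WeylSkipSketch(long state) {
        this.state = state;
    }

    long state() {
        return state;
    }

    /** Returns a copy advanced by a single step. */
    WeylSkipSketch next() {
        return new WeylSkipSketch(state + INCREMENT);
    }

    /** Returns a copy advanced by {@code steps} steps; long arithmetic wraps modulo 2^64. */
    WeylSkipSketch skip(long steps) {
        return new WeylSkipSketch(state + steps * INCREMENT);
    }

    /** Returns a copy advanced by 2^power steps, for 0 <= power <= 63. */
    WeylSkipSketch skipPower2(int power) {
        if (power < 0 || power > 63) {
            throw new IllegalArgumentException("power out of range: " + power);
        }
        return new WeylSkipSketch(state + (INCREMENT << power));
    }

    public static void main(String[] args) {
        WeylSkipSketch start = new WeylSkipSketch(42L);
        System.out.println(Long.toHexString(start.skip(1000L).state()));
        System.out.println(Long.toHexString(start.skipPower2(32).state()));
    }
}
```

Both methods return an advanced copy and leave the receiver untouched, which
matches the "no side-effects" choice made for jump().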

Regards,
Gilles

Re: [rng] Split and Jump functions

2019-04-18 Thread Gilles Sadowski

Hello Alex.

>>> [...]
>
> OK so this results in:
>
> /**
>   * Some summary.
>   */
> public interface JumpableUniformRandomProvider extends
> UniformRandomProvider {
>  /**
>   * Creates a copy of the UniformRandomProvider and advances the
> state of the copy.
>   * The state of the current instance is not altered. The state of
> the copy will be
>   * advanced an equivalent of {@code n} sequential calls to a method
> that updates the
>   * state of the provider.
>   *
>   * @return the copy with an advanced state
>   */
>  JumpableUniformRandomProvider jump();
> }
>

+1
[Clean and lean: and no side-effects to explain...]

>
> This can be documented in an implementation as:
>
> public class MyJumpingRNG implements JumpableUniformRandomProvider {
>  /**
>   * {@inheritDoc}
>   *
>   * The jump size {@code n} is the equivalent of {@code 2^64}
> calls to
>   * {@link UniformRandomProvider#nextLong() nextLong()}.
>   */
>  @Override
>  public JumpableUniformRandomProvider jump() {
>  // TODO Auto-generated method stub
>  return null;
>  }
> }

+1

>
> Do we add a second interface for LongJumpableUniformRandomProvider?

Sure, if the functionality is provided by some of the algorithms implemented
in [RNG].
But let's have the two functionalities in separate commits.

>
> >> So the options are (in all cases returning the copy):
> >>
> >> 1. createAndJumpCopy
> >> 2. copyAndJumpParent
> >> 3. jumpParentAndCopy
> >> 4. jump and copy separately
> >>
> >> 1. Your preferred option. A copy of the state is made. The state is 
> >> advanced in the copy and returned. But when called repeatedly it will get 
> >> the same generator and code must be organised appropriately.
> > We could provide a convenience method in  "RandomSource":
> >
> > public UniformRandomProvider[] jump(int n,
> >                                     JumpableUniformRandomProvider parent) {
> >     final UniformRandomProvider[] rngs = new UniformRandomProvider[n];
> >     // Declared as Jumpable so that jump() can be called on it.
> >     JumpableUniformRandomProvider tmp = parent;
> >     for (int i = 0; i < n; i++) {
> >         rngs[i] = restrict(tmp);
> >         tmp = tmp.jump();
> >     }
> >     return rngs;
> > }
>
> +1. Remove the need for the user to repeat boiler plate code.
>
> Same sort of idea of longJump() too.

+1
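
To make the convenience method concrete, here is a self-contained sketch.
Everything in it is an assumption for illustration: the interfaces "Rng" and
"JumpableRng" stand in for the real ones, and "ToyJumpable" is a bare Weyl
counter (no output mixing) whose jump size of 2^32 steps is arbitrary.

```java
/** Illustrative jump-series helper; all names here are hypothetical. */
final class JumpSeriesSketch {
    /** Stand-in for UniformRandomProvider. */
    interface Rng {
        long nextLong();
    }

    /** Stand-in for JumpableUniformRandomProvider. */
    interface JumpableRng extends Rng {
        /** Returns an advanced copy; the receiver is not altered. */
        JumpableRng jump();
    }

    /** Toy generator: the output is the Weyl state itself. */
    static final class ToyJumpable implements JumpableRng {
        private static final long INCREMENT = 0x9e3779b97f4a7c15L;
        private long state;

        ToyJumpable(long seed) {
            state = seed;
        }

        @Override
        public long nextLong() {
            return state += INCREMENT;
        }

        @Override
        public JumpableRng jump() {
            // Copy advanced by 2^32 steps (wrapping long arithmetic).
            return new ToyJumpable(state + (INCREMENT << 32));
        }
    }

    /** Hides everything except the data generation method. */
    static Rng restrict(Rng rng) {
        return rng::nextLong;
    }

    /** Chains the jumps: element i is i jumps ahead of the parent. */
    static Rng[] jumpSeries(int n, JumpableRng parent) {
        final Rng[] rngs = new Rng[n];
        // Typed as the jumpable interface so that jump() is callable.
        JumpableRng tmp = parent;
        for (int i = 0; i < n; i++) {
            rngs[i] = restrict(tmp);
            tmp = tmp.jump();
        }
        return rngs;
    }

    public static void main(String[] args) {
        for (Rng rng : jumpSeries(4, new ToyJumpable(0L))) {
            System.out.println(Long.toHexString(rng.nextLong()));
        }
    }
}
```

Note that, as in the quoted snippet, element 0 wraps the parent itself and so
shares its state; the caller should not keep drawing from the parent afterwards.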

> >> It is not actually possible to jump forward a single instance. Only 
> >> children are advanced.
> > A feature: There is only one way to alter the state of an instance
> > (i.e. a call to "next()").
> OK.

Great. :-)

Gilles

> >
>>> [...]

Re: [rng] Split and Jump functions

2019-04-17 Thread Gilles Sadowski

Hello.

On Mon, 15 Apr 2019 at 01:03, Alex Herbert wrote:
>
>
>
> > On 14 Apr 2019, at 01:31, Gilles Sadowski  wrote:
> >
> > Hello.
> >
> >> On 11/04/2019 13:22, Gilles Sadowski wrote:
> >>>>> [...]
> >>>> Not adding a dedicated method would mean everyone has to do this:
> >>>>
> >>>> JumpableUniformRandomProvider rng = (JumpableUniformRandomProvider) 
> >>>> RandomSource.create(…)
> >>>>
> >>>> But adding mirror methods:
> >>>>
> >>>> JumpableUniformRandomProvider RandomSource::createJumpable(…)
> >>>> LongJumpableUniformRandomProvider RandomSource::createLongJumpable(…)
> >>>>
> >>>> Does seem to add clutter to RandomSource. If we leave it out for now it 
> >>>> can be added in the future.
> >>> +1 (for leaving out).
> >>>
> >>>> Is there scope for needing the Jumpable to be detected through the API? 
> >>>> E.g. add:
> >>>>
> >>>> boolean RandomSource::isJumpable(RandomSource)
> >>>>
> >>>> We would just need to maintain an EnumSet for Jumpable and likewise for 
> >>>> LongJumpable. Again more clutter to the RandomSource interface but 
> >>>> perhaps less so than a mirror create method that would throw 
> >>>> IllegalArgumentException for any RandomSource specified that is not 
> >>>> Jumpable.
> >>> +1 (for leaving out for now).
> >>>
> >>>>> [...]
> >>>>>> I was hoping to avoid creating a duplicate class. But actually that
> >>>>>> would be fine and easier for testing. The implementation is trivial 
> >>>>>> anyway.
> >>>>> Why duplicate?
> >>>>> IIUC, shouldn't the existing "core" class define an additional 
> >>>>> constructor
> >>>>> that accepts the "K" argument.  Then the current "SPLIT_MIX_64"
> >>>>> would use the default increment.
> >>>>> [Same as with "TWO_CMRES" and "TWO_CMRES_SELECT", no?.]
> >>>> OK. That was where I should have gone to start with. I’ll do that.
> >>>>
> >>>>>> I've just finished an initial implementation of the MSWS RNG that uses 
> >>>>>> a
> >>>>>> self-generated Weyl sequence. It works. So the idea for this can be
> >>>>>> applied to SPLIT_MIX_64_K by using the same Weyl sequence generation in
> >>>>>> both. Perhaps moving the code to create the Weyl sequence increment to
> >>>>>> NumberFactory.
> >>>>> +1
> >>>> OK. So I created a Jira ticket for the SPLIT_MIX_64_K. I’m not sure when 
> >>>> I’ll get around to doing this though. It seems less important than other 
> >>>> tasks. It may even be redundant if the only reason is to increase the 
> >>>> state space for the generator. Each generator would still only have a 
> >>>> period of 2^64. You can just create a lot of them with a weak assurance 
> >>>> they will not overlap and that the uniformity is statistically the same. 
> >>>> Since there are of the order of 2^63 variants we are not going to be 
> >>>> able to test this and would have to rely on the theory behind it.
> >>>>
> >>>> If we add Jumpable to SplitMix with jumps of 2^32 and 2^48 then you can 
> >>>> create either 2^32 rngs with no overlap for the first 2^32 output 
> >>>> numbers or 2^16 rngs that can each be jumped 2^16 times with no overlap 
> >>>> for the first 2^32 output numbers. That is a lot for a small parallel 
> >>>> situation and does have the assurance of no overlap. Any parallel usage 
> >>>> where longer sequences are expected to be used can use one of the 
> >>>> XorShiRo generators.
> >>> I'm a bit lost here.  I thought that "SplitMix64" did not provide
> >>> "jump",
> >>
> >> The state is a linear sum, so it can be jumped very easily.
> >>
> >> y = mx + c
> >>
> >> c is the current Weyl state, m the number of transitions and x the Weyl
> >> increment. So we can make SplitMix Jumpable. The maximum jump length is
> >> 2^64 which will wrap the state.
> >>
> >>> hence the
> >>> "Weyl way" to ensure no-over

Re: [rng] Split and Jump functions

2019-04-13 Thread Gilles Sadowski

Hello.

> On 11/04/2019 13:22, Gilles Sadowski wrote:
> >>> [...]
> >> Not adding a dedicated method would mean everyone has to do this:
> >>
> >> JumpableUniformRandomProvider rng = (JumpableUniformRandomProvider) 
> >> RandomSource.create(…)
> >>
> >> But adding mirror methods:
> >>
> >> JumpableUniformRandomProvider RandomSource::createJumpable(…)
> >> LongJumpableUniformRandomProvider RandomSource::createLongJumpable(…)
> >>
> >> Does seem to add clutter to RandomSource. If we leave it out for now it 
> >> can be added in the future.
> > +1 (for leaving out).
> >
> >> Is there scope for needing the Jumpable to be detected through the API? 
> >> E.g. add:
> >>
> >> boolean RandomSource::isJumpable(RandomSource)
> >>
> >> We would just need to maintain an EnumSet for Jumpable and likewise for 
> >> LongJumpable. Again more clutter to the RandomSource interface but perhaps 
> >> less so than a mirror create method that would throw 
> >> IllegalArgumentException for any RandomSource specified that is not 
> >> Jumpable.
> > +1 (for leaving out for now).
> >
> >>> [...]
> >>>> I was hoping to avoid creating a duplicate class. But actually that
> >>>> would be fine and easier for testing. The implementation is trivial 
> >>>> anyway.
> >>> Why duplicate?
> >>> IIUC, shouldn't the existing "core" class define an additional constructor
> >>> that accepts the "K" argument.  Then the current "SPLIT_MIX_64"
> >>> would use the default increment.
> >>> [Same as with "TWO_CMRES" and "TWO_CMRES_SELECT", no?.]
> >> OK. That was where I should have gone to start with. I’ll do that.
> >>
> >>>> I've just finished an initial implementation of the MSWS RNG that uses a
> >>>> self-generated Weyl sequence. It works. So the idea for this can be
> >>>> applied to SPLIT_MIX_64_K by using the same Weyl sequence generation in
> >>>> both. Perhaps moving the code to create the Weyl sequence increment to
> >>>> NumberFactory.
> >>> +1
> >> OK. So I created a Jira ticket for the SPLIT_MIX_64_K. I’m not sure when 
> >> I’ll get around to doing this though. It seems less important than other 
> >> tasks. It may even be redundant if the only reason is to increase the 
> >> state space for the generator. Each generator would still only have a 
> >> period of 2^64. You can just create a lot of them with a weak assurance 
> >> they will not overlap and that the uniformity is statistically the same. 
> >> Since there are of the order of 2^63 variants we are not going to be able 
> >> to test this and would have to rely on the theory behind it.
> >>
> >> If we add Jumpable to SplitMix with jumps of 2^32 and 2^48 then you can 
> >> create either 2^32 rngs with no overlap for the first 2^32 output numbers 
> >> or 2^16 rngs that can each be jumped 2^16 times with no overlap for the 
> >> first 2^32 output numbers. That is a lot for a small parallel situation 
> >> and does have the assurance of no overlap. Any parallel usage where longer 
> >> sequences are expected to be used can use one of the XorShiRo generators.
> > I'm a bit lost here.  I thought that "SplitMix64" did not provide
> > "jump",
>
> The state is a linear sum, so it can be jumped very easily.
>
> y = mx + c
>
> c is the current Weyl state, m the number of transitions and x the Weyl
> increment. So we can make SplitMix Jumpable. The maximum jump length is
> 2^64 which will wrap the state.
>
> > hence the
> > "Weyl way" to ensure no-overlap, with high probability (why "weak 
> > assurance"?)
> I used the term "weak assurance" as opposed to "concrete assurance". The
> wording used by the JDK is "with very high probability, the set of
> values collectively generated by the two objects has the same
> statistical properties as if the same quantity of values were generated
> by a single object". It is a bit of semantics.
>
> >
> > Anyways, I agree that SPLIT_MIX_64_K is low priority, but especially if we
> > cannot ensure that the added flexibility does not come with unsuspected
> > drawbacks (?).  [IIRC, the article about "TwoCmres" reported experiments for
> > choosing the (hard-coded) values that do produce good sequences.]
>
> The likelihood of overlap is

Re: [rng] Split and Jump functions

2019-04-11 Thread Gilles Sadowski

> > [...]
>
> Not adding a dedicated method would mean everyone has to do this:
>
> JumpableUniformRandomProvider rng = (JumpableUniformRandomProvider) 
> RandomSource.create(…)
>
> But adding mirror methods:
>
> JumpableUniformRandomProvider RandomSource::createJumpable(…)
> LongJumpableUniformRandomProvider RandomSource::createLongJumpable(…)
>
> Does seem to add clutter to RandomSource. If we leave it out for now it can 
> be added in the future.

+1 (for leaving out).

> Is there scope for needing the Jumpable to be detected through the API? E.g. 
> add:
>
> boolean RandomSource::isJumpable(RandomSource)
>
> We would just need to maintain an EnumSet for Jumpable and likewise for 
> LongJumpable. Again more clutter to the RandomSource interface but perhaps 
> less so than a mirror create method that would throw IllegalArgumentException 
> for any RandomSource specified that is not Jumpable.

+1 (for leaving out for now).

>
> > [...]
> >>
> >> I was hoping to avoid creating a duplicate class. But actually that
> >> would be fine and easier for testing. The implementation is trivial anyway.
> >
> > Why duplicate?
> > IIUC, shouldn't the existing "core" class define an additional constructor
> > that accepts the "K" argument.  Then the current "SPLIT_MIX_64"
> > would use the default increment.
> > [Same as with "TWO_CMRES" and "TWO_CMRES_SELECT", no?.]
>
> OK. That was where I should have gone to start with. I’ll do that.
>
> >
> >> I've just finished an initial implementation of the MSWS RNG that uses a
> >> self-generated Weyl sequence. It works. So the idea for this can be
> >> applied to SPLIT_MIX_64_K by using the same Weyl sequence generation in
> >> both. Perhaps moving the code to create the Weyl sequence increment to
> >> NumberFactory.
> >
> > +1
>
> OK. So I created a Jira ticket for the SPLIT_MIX_64_K. I’m not sure when I’ll 
> get around to doing this though. It seems less important than other tasks. It 
> may even be redundant if the only reason is to increase the state space for 
> the generator. Each generator would still only have a period of 2^64. You can 
> just create a lot of them with a weak assurance they will not overlap and 
> that the uniformity is statistically the same. Since there are of the order 
> of 2^63 variants we are not going to be able to test this and would have to 
> rely on the theory behind it.
>
> If we add Jumpable to SplitMix with jumps of 2^32 and 2^48 then you can 
> create either 2^32 rngs with no overlap for the first 2^32 output numbers or 
> 2^16 rngs that can each be jumped 2^16 times with no overlap for the first 
> 2^32 output numbers. That is a lot for a small parallel situation and does 
> have the assurance of no overlap. Any parallel usage where longer sequences 
> are expected to be used can use one of the XorShiRo generators.

I'm a bit lost here.  I thought that "SplitMix64" did not provide "jump",
hence the "Weyl way" to ensure no-overlap, with high probability (why "weak
assurance"?)

Anyway, I agree that SPLIT_MIX_64_K is low priority, especially if we
cannot ensure that the added flexibility does not come with unsuspected
drawbacks (?).  [IIRC, the article about "TwoCmres" reported experiments for
choosing the (hard-coded) values that do produce good sequences.]

Gilles

Re: [rng] nextInt(int) and nextLong(long) can be improved

2019-04-10 Thread Gilles Sadowski

On Wed, 10 Apr 2019 at 18:26, Alex Herbert wrote:
>
>
> On 10/04/2019 15:59, Gilles Sadowski wrote:
> > Hello.
> >
> > On Wed, 10 Apr 2019 at 15:22, Alex Herbert wrote:
> >> On 10/04/2019 13:46, Alex Herbert wrote:
> >>> The code for nextInt(int) checks the range number n is a power of two
> >>> and if so it computes a fast solution:
> >>>
> >>>  return (int) ((n * (long) (nextInt() >>> 1)) >> 31);
> >>>
> >>> This scales a 31 bit positive number by a power of 2 (i.e. n) then
> >>> discards bits. So this is effectively just a left shift followed by a
> >>> right shift to discard significant bits from the number. The same
> >>> result can be achieved using a mask to discard the significant bits:
> >>>
> >>>  return nextInt() & (n-1)
> >>>
> >>> This works if n is a power of 2 as (n-1) will be all the bits set
> >>> below it. Note: This method is employed by ThreadLocalRandom.
> >>>
> >>> It also makes the method applicable to nextLong(long) since you no
> >>> longer require the long multiplication arithmetic.
> >>>
> >>> I suggest updating the methods to use masking. Note that the output
> >>> from the following method will be the same:
> >> Update: It will not be the same as this method returns the lower order
> >> bits, not the higher order bits. See below.
> >>>  public int nextInt(int n) {
> >>>  checkStrictlyPositive(n);
> >>>
> >>>  final int nm1 = n - 1;
> >>>  if ((n & nm1) == 0) {
> >>>  // Range is a power of 2
> >>>  return (nextInt() >>> 1) & nm1;
> >>>  }
> >>>  int bits;
> >>>  int val;
> >>>  do {
> >>>  bits = nextInt() >>> 1;
> >>>  val = bits % n;
> >>>  } while (bits - val + nm1 < 0);
> >>>
> >>>  return val;
> >>>  }
> >>>
> >>> It can be sped up by removing the unsigned shift for the power of 2
> >>> case, but that would make the output change as the least significant
> >>> bit is now part of the result.
> >>>
> >>>
> >> Note:
> >>
> >> The current method originates from the implementation in
> >> java.util.Random. There the Javadoc states:
> >>
> >> "The algorithm treats the case where n is a power of two specially: it
> >> returns the correct number of high-order bits from the underlying
> >> pseudo-random number generator. In the absence of special treatment, the
> >> correct number of low-order bits would be returned. Linear congruential
> >> pseudo-random number generators such as the one implemented by this
> >> class are known to have short periods in the sequence of values of their
> >> low-order bits. Thus, this special case greatly increases the length of
> >> the sequence of values returned by successive calls to this method if n
> >> is a small power of two."
> >>
> >> java.util.Random does not support nextLong(long).
> >>
> >> So the change to the implementation would require that the underlying
> >> generator is a good provider of lower order bits. This is assumed to be
> >> the case for ThreadLocalRandom which is why the implementation is
> >> different. The same faster method is also used in SplittableRandom.
> >>
> >> Given that the generators in Commons RNG are mostly good providers
> > With the notable exception of "JDK"...
> >
> >> of bits the change to use the lower order bits should be reasonable.
> > ... Hence, calling "nextInt(int)" on that provider will generate a
> > sequence even worse than it would be from direct calls to
> > "java.util.Random".
> >
> > That's a dilemma.
>
> Yes. The faster method will be OK for some but not all.
>
> This method is in BaseProvider. So is used by all generators. The method
> and its alternative could be moved to NumberFactory (and tested to do
> what they say).

+1
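
As a sketch of what the two alternatives could look like once isolated (and
testable), assuming they take the raw 32 bits as an argument; the class and
method names are hypothetical, not existing "NumberFactory" methods:

```java
/** Two ways to map 32 random bits to [0, n) when n is a power of two. */
final class PowerOfTwoBoundSketch {
    /** Current approach: keep the high-order bits of the 31-bit positive value. */
    static int highBits(int bits, int n) {
        return (int) ((n * (long) (bits >>> 1)) >> 31);
    }

    /** Proposed approach: mask the low-order bits of the 31-bit positive value. */
    static int lowBits(int bits, int n) {
        return (bits >>> 1) & (n - 1);
    }

    /** The precondition for both methods above. */
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    public static void main(String[] args) {
        int bits = 0xF0000003;
        int n = 8;
        System.out.println(highBits(bits, n) + " vs " + lowBits(bits, n));
    }
}
```

Both results lie in [0, n), but they are taken from opposite ends of the
word, so the choice only matters for a generator whose low-order bits are weak.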

> Then any generator that shows poor results in
> BigCrush/Dieharder can be set to use the current implementation on the
> assumption that the lower order bits are poor. Any generator that is
> good can use the faster implementation via @Override.
>
> Or conversely we

Re: Re: [rng] Split and Jump functions

2019-04-10 Thread Gilles Sadowski

Hi.

>
> On 10/04/2019 18:58, Gilles Sadowski wrote:
> > Hi.
> >
> >> [... long quote skipped where I think we largely agree on the
> >> conclusions ...]
> >> So do we have a working idea to:
> >>
> >> - Add interface 'JumpableUniformRandomProvider'
> > Do we need to add "createJumpable" factory methods in "RandomSource"
> > methods or is there a way to avoid the duplication?
> >
> > As mentioned in an earlier post, it would be cleaner/nicer (?) to add
> > methods
> > UniformRandomProvider jump();
> > boolean isJumpable();
> > to "UniformRandomProvider".
> > This would require dropping support of Java 6 and 7 and perhaps a good
> > reason to do so (?) ...
>
> And move to V2.0 with Java 8 giving the opportunity to clean up other
> deprecated stuff.

No!
Changing the major version would entail a package name change
(thus forcing updates to user code).
We don't need this since there isn't any breaking change if we add
default methods.

>
> Would the default implementation be to throw an
> UnsupportedOperationException?

Yes.
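
A minimal sketch of that idea, assuming Java 8 default methods. The interface
"Provider" is a hypothetical stand-in for "UniformRandomProvider", and the
two counter classes are toys:

```java
/** Sketch of adding jump support through default methods (requires Java 8). */
final class DefaultJumpSketch {
    /** Stand-in for UniformRandomProvider with the two added methods. */
    interface Provider {
        long nextLong();

        /** Not jumpable unless an implementation says otherwise. */
        default boolean isJumpable() {
            return false;
        }

        /** Default behaviour for generators without a jump function. */
        default Provider jump() {
            throw new UnsupportedOperationException("jump is not supported");
        }
    }

    /** A pre-existing implementation: compiles unchanged against the new interface. */
    static final class Counter implements Provider {
        private long state;

        @Override
        public long nextLong() {
            return ++state;
        }
    }

    /** An implementation that opts in by overriding both defaults. */
    static final class JumpingCounter implements Provider {
        private long state;

        JumpingCounter(long state) {
            this.state = state;
        }

        @Override
        public long nextLong() {
            return ++state;
        }

        @Override
        public boolean isJumpable() {
            return true;
        }

        @Override
        public Provider jump() {
            // Advanced copy; this instance is not altered.
            return new JumpingCounter(state + (1L << 32));
        }
    }

    public static void main(String[] args) {
        System.out.println(new Counter().isJumpable());
        System.out.println(new JumpingCounter(0L).jump().nextLong());
    }
}
```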
>
> I'm +0 on this.
>
> I'm not against it but do think the UniformRandomProvider interface
> could become quite cluttered with these extra methods that would be in
> the minority (jump, longJump, isJumpable, isLongJumpable are not
> generally available). It would also allow methods/classes that currently
> use simple methods from UniformRandomProvider to have access to call
> jump on the generator and spawn lots of sub generators. I think this is
> bad. These methods should be written to explicitly require a Jumpable
> instance.

From this POV, I certainly agree.

> My approach would have been to leave RandomSource as is and then state
> that the returned generator can be tested to see if it is Jumpable using
> instanceof. Someone who is writing code to use a Jumpable RNG should be
> fine with that since they would have to know a priori what RandomSource
> to create to get a Jumpable.

If it makes more sense, I'm fine letting the user write the "ugly" code. ;-)

> I would add a method to mimic RandomSource.unrestorable as
> RandomSource.unjumpable. Or since they both would be doing the same
> thing a new method RandomSource.restrict to 'restrict' functionality to
> just the data generation methods in UniformRandomProvider. This restrict
> method can be called by RandomSource.unrestorable and make that deprecated.

Looks neat.
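
A sketch of how "instanceof" detection and a "restrict" wrapper could fit
together. All types below are hypothetical stand-ins ("Restorable" in
particular is a placeholder for the save/restore functionality), and the
anonymous class keeps the sketch compatible with Java 6:

```java
/** Sketch of restricting a generator to its data generation methods. */
final class RestrictSketch {
    interface Rng {
        long nextLong();
    }

    interface Jumpable extends Rng {
        Jumpable jump();
    }

    /** Placeholder for state save/restore functionality. */
    interface Restorable extends Rng {
        long saveState();
    }

    /** Toy generator that implements every optional interface. */
    static final class Full implements Jumpable, Restorable {
        private long state;

        Full(long state) {
            this.state = state;
        }

        @Override
        public long nextLong() {
            return ++state;
        }

        @Override
        public Jumpable jump() {
            return new Full(state + (1L << 32));
        }

        @Override
        public long saveState() {
            return state;
        }
    }

    /** The wrapper implements only Rng, so casts to the other interfaces fail. */
    static Rng restrict(final Rng delegate) {
        return new Rng() {
            @Override
            public long nextLong() {
                return delegate.nextLong();
            }
        };
    }

    public static void main(String[] args) {
        Rng limited = restrict(new Full(0L));
        System.out.println(limited instanceof Jumpable);
        System.out.println(limited.nextLong());
    }
}
```

The wrapper shares state with the wrapped instance; it only removes the
ability to reach the extra methods through a cast.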

> >
> >> - Add interface 'LongJumpable... extends JumpableUniformRandomProvider'
> > Same question...
> >
> >> - Test if a SplitMix variant with a self generated Weyl sequence can
> >> pass tests of uniformity. This would just require a seed of long[2], one
> >> for the state and one to use to derive the Weyl sequence increment.
> > Is the new seed length a temporary workaround for the test,
> > to be replaced with a new "SPLIT_MIX_64_K" provider, as
> > mentioned in your previous message, if the test passes?
> >
> > Gilles
>
> I was hoping to avoid creating a duplicate class. But actually that
> would be fine and easier for testing. The implementation is trivial anyway.

Why duplicate?
IIUC, shouldn't the existing "core" class define an additional constructor
that accepts the "K" argument.  Then the current "SPLIT_MIX_64"
would use the default increment.
[Same as with "TWO_CMRES" and "TWO_CMRES_SELECT", no?.]
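
Something along these lines is what I have in mind. This is only a sketch: the
class name is hypothetical, the mixing constants are those of the published
SplitMix64 algorithm, and forcing the increment to be odd is my assumption
about how to keep the full 2^64 period of the Weyl sequence.

```java
/** Sketch of a SplitMix64 variant with a configurable Weyl increment "K". */
final class SplitMix64KSketch {
    /** Increment used by the standard SplitMix64. */
    static final long DEFAULT_INCREMENT = 0x9e3779b97f4a7c15L;

    private final long increment;
    private long state;

    /** Standard generator: delegates with the default increment. */
    SplitMix64KSketch(long seed) {
        this(seed, DEFAULT_INCREMENT);
    }

    /** Variant with a user-supplied increment. */
    SplitMix64KSketch(long seed, long k) {
        state = seed;
        // Assumption: force an odd increment so the sequence visits all 2^64 states.
        increment = k | 1L;
    }

    long nextLong() {
        long z = (state += increment);
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(new SplitMix64KSketch(1L).nextLong()));
        System.out.println(Long.toHexString(new SplitMix64KSketch(1L, 3L).nextLong()));
    }
}
```

The sketch says nothing about the statistical quality of an arbitrary "K";
that still has to be checked separately.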

> I've just finished an initial implementation of the MSWS RNG that uses a
> self-generated Weyl sequence. It works. So the idea for this can be
> applied to SPLIT_MIX_64_K by using the same Weyl sequence generation in
> both. Perhaps moving the code to create the Weyl sequence increment to
> NumberFactory.

+1

Regards,
Gilles

> >
> >> Alex
> >>
> >>
> >>> Regards,
> >>> Gilles
> >>>
> >>>> Alex
> >>>>
> >>>>
> >>>> [1] https://en.wikipedia.org/wiki/Weyl_sequence
> >>>>
> >>>> [2] The Jira ticket RNG-85 had a note about the seeding algorithm for
> >>>> the generator being GPL. There are alternatives based on the
> >>>> SplittableRandom seeding method that could be used instead to
> >>>> create the
> >>>> increment for the Weyl sequence. I've speed tested the provided
> >>>> algorithm and it is about 85x slower than the one used in
> >>>> SplittableRandom. Since that algorithm has an issue with the unsigned
> >>>> shift not being modelled by the Binomial distribution an updated
>

Re: [rng] Split and Jump functions

2019-04-10 Thread Gilles Sadowski

Hi.

> [... long quote skipped where I think we largely agree on the conclusions ...]

> So do we have a working idea to:
>
> - Add interface 'JumpableUniformRandomProvider'

Do we need to add "createJumpable" factory methods in "RandomSource"
methods or is there a way to avoid the duplication?

As mentioned in an earlier post, it would be cleaner/nicer (?) to add methods
UniformRandomProvider jump();
boolean isJumpable();
 to "UniformRandomProvider".
This would require dropping support of Java 6 and 7 and perhaps a good
reason to do so (?) ...

>
> - Add interface 'LongJumpable... extends JumpableUniformRandomProvider'

Same question...

> - Test if a SplitMix variant with a self generated Weyl sequence can
> pass tests of uniformity. This would just require a seed of long[2], one
> for the state and one to use to derive the Weyl sequence increment.

Is the new seed length a temporary workaround for the test,
to be replaced with a new "SPLIT_MIX_64_K" provider, as
mentioned in your previous message, if the test passes?

Gilles

>
> Alex
>
>
> >
> > Regards,
> > Gilles
> >
> >> Alex
> >>
> >>
> >> [1] https://en.wikipedia.org/wiki/Weyl_sequence
> >>
> >> [2] The Jira ticket RNG-85 had a note about the seeding algorithm for
> >> the generator being GPL. There are alternatives based on the
> >> SplittableRandom seeding method that could be used instead to create the
> >> increment for the Weyl sequence. I've speed tested the provided
> >> algorithm and it is about 85x slower than the one used in
> >> SplittableRandom. Since that algorithm has an issue with the unsigned
> >> shift not being modelled by the Binomial distribution an updated
> >> algorithm could be used that would be novel so avoid the Oracle or GPL
> >> licences.
> >>
> >>> Best,
> >>> Gilles
> >>>
> >>>>>> Alex
> >>>>>>
> >>>>>> [1] https://github.com/aappleby/smhasher
> >>>>>>
> >>>>>> [2] Using Long.bitCount(long ^ (long >>> 1)) to count transitions
> >>>>>>
> >>>>>> [3] The SplitMix64 is a simple linear series and thus can be jumped in
> >>>>>> any power of 2 up to the maximum for a long (which causes sequence
> >>>>>> wrapping). It can actually be jumped any number of iterations using
> >>>>>> BigInteger arithmetic but jumping in powers of 2 can be implemented
> >>>>>> using long arithmetic where the rollover bits beyond 64 are naturally
> >>>>>> discarded:
> >>>>>>
> >>>>>> long jumps = 1234567;
> >>>>>> long increment = 0x9e3779b97f4a7c15L;
> >>>>>>
> >>>>>> state = BigInteger.valueOf(state)
> >>>>>>     .add(BigInteger.valueOf(increment).multiply(BigInteger.valueOf(jumps)))
> >>>>>>     .longValue(); // narrowing primitive conversion
> >>>>>>
> >>>>>> int jumpPower = 32;
> >>>>>>
> >>>>>> state = BigInteger.valueOf(state)
> >>>>>>     .add(BigInteger.valueOf(increment).shiftLeft(jumpPower))
> >>>>>>     .longValue(); // narrowing primitive conversion
> >>>>>>
> >>>>>> // Same as above by discarding overflow bits
> >>>>>> state = state + (increment << jumpPower);
> >>>>>>
> >>>>>> This is based on my understanding of BigInteger and signed/unsigned
> >>>>>> arithmetic and should be verified in tests.
> >>>>>>
> >>>>>> [4] Given the period of the SplitMix is 2^64, and the period may be 
> >>>>>> less
> >>>>>> than this in practice it may be better to only support jumps of e.g.
> >>>>>> 2^32 in a manner similar to those provided for the XorShiRo generators
> >>>>>> where the state can be advanced a factor of the period, or just not
> >>>>>> support jumps. I can see the utility in jumping more than
> >>>>>> Integer.MAX_VALUE (guaranteed unique outputs for the maximum array 
> >>>>>> size)
> >>>>>> but less than 2^32 is tending towards not very many random numbers from
> >>>>>> the original instance before sequence overlap with the jumped instance.
> >>>>>>
> >>>>>>

Re: [rng] nextInt(int) and nextLong(long) can be improved

2019-04-10 Thread Gilles Sadowski

Hello.

On Wed, 10 Apr 2019 at 15:22, Alex Herbert wrote:
>
> On 10/04/2019 13:46, Alex Herbert wrote:
> > The code for nextInt(int) checks the range number n is a power of two
> > and if so it computes a fast solution:
> >
> > return (int) ((n * (long) (nextInt() >>> 1)) >> 31);
> >
> > This scales a 31 bit positive number by a power of 2 (i.e. n) then
> > discards bits. So this is effectively just a left shift followed by a
> > right shift to discard significant bits from the number. The same
> > result can be achieved using a mask to discard the significant bits:
> >
> > return nextInt() & (n-1)
> >
> > This works if n is a power of 2 as (n-1) will be all the bits set
> > below it. Note: This method is employed by ThreadLocalRandom.
> >
> > It also makes the method applicable to nextLong(long) since you no
> > longer require the long multiplication arithmetic.
> >
> > I suggest updating the methods to use masking. Note that the output
> > from the following method will be the same:
> Update: It will not be the same as this method returns the lower order
> bits, not the higher order bits. See below.
> >
> > public int nextInt(int n) {
> >     checkStrictlyPositive(n);
> >
> >     final int nm1 = n - 1;
> >     if ((n & nm1) == 0) {
> >         // Range is a power of 2
> >         return (nextInt() >>> 1) & nm1;
> >     }
> >     int bits;
> >     int val;
> >     do {
> >         bits = nextInt() >>> 1;
> >         val = bits % n;
> >     } while (bits - val + nm1 < 0);
> >
> >     return val;
> > }
> >
> > It can be sped up by removing the unsigned shift for the power of 2
> > case, but that would make the output change as the least significant
> > bit is now part of the result.
> >
> >
> Note:
>
> The current method originates from the implementation in
> java.util.Random. There the Javadoc states:
>
> "The algorithm treats the case where n is a power of two specially: it
> returns the correct number of high-order bits from the underlying
> pseudo-random number generator. In the absence of special treatment, the
> correct number of low-order bits would be returned. Linear congruential
> pseudo-random number generators such as the one implemented by this
> class are known to have short periods in the sequence of values of their
> low-order bits. Thus, this special case greatly increases the length of
> the sequence of values returned by successive calls to this method if n
> is a small power of two."
>
> java.util.Random does not support nextLong(long).
>
> So the change to the implementation would require that the underlying
> generator is a good provider of lower order bits. This is assumed to be
> the case for ThreadLocalRandom which is why the implementation is
> different. The same faster method is also used in SplittableRandom.
>
> Given that the generators in Commons RNG are mostly good providers

With the notable exception of "JDK"...

> of bits the change to use the lower order bits should be reasonable.

... Hence, calling "nextInt(int)" on that provider will generate a
sequence even worse than it would be from direct calls to
"java.util.Random".
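
To make the difference concrete, here is a small sketch of the two power-of-two strategies discussed above, applied to the same raw 32-bit value. The class and method names (`PowerOfTwoBits`, `highBits`, `lowBits`) are hypothetical and exist only for illustration; the two expressions are the ones quoted in this thread.

```java
/** Hypothetical helpers comparing the two power-of-two strategies. */
final class PowerOfTwoBits {
    private PowerOfTwoBits() {}

    /** Multiply-and-shift variant: keeps the high-order bits of the 31-bit value. */
    static int highBits(int raw, int n) {
        return (int) ((n * (long) (raw >>> 1)) >> 31);
    }

    /** Masking variant: keeps the low-order bits; valid only if n is a power of 2. */
    static int lowBits(int raw, int n) {
        return (raw >>> 1) & (n - 1);
    }

    public static void main(String[] args) {
        final int raw = 0x80000003; // stands in for one call to nextInt()
        final int n = 8;
        System.out.println("high-order result: " + highBits(raw, n)); // 4
        System.out.println("low-order result:  " + lowBits(raw, n));  // 1
    }
}
```

Both results lie in [0, n), but they come from opposite ends of the raw value, which is why the quality of the low-order bits of the underlying generator matters for the masking variant.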

That's a dilemma.
Since the "JDK" provider is not recommended, and is provided mainly as
a baseline for showing that all the other implementations are better,
we could deprecate (but never delete) the "JDK" enum just to make those
points clear.

Then, we can *currently* make the "good providers" assumption,
but it could soon change since the plan was to also add algorithms
with known shortcomings.[1]

Gilles

[1] https://issues.apache.org/jira/browse/RNG-32

>
> Alex
>

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

[RNG] Re: [commons-rng] 02/02: Disable redundant JMH benchmarks and mark as deprecated.

2019-04-10 Thread Gilles Sadowski

Hello Alex.

> [...]
>
> commit 431a51b8211a40e5e8d9df0406563d4873eb4eb6
> Author: aherbert 
> AuthorDate: Wed Apr 10 11:01:05 2019 +0100
>
> Disable redundant JMH benchmarks and mark as deprecated.

Is there an advantage to keeping disabled code?
[Usually, it rots silently until nobody knows why it was there... ]

If you are sure that no functionality is lost, and that the replacement
is better, I'd favour removing it.  Code archaeologists can always turn
to the VCS, if need be.

Regards,
Gilles

> ---
>  .../rng/examples/jmh/GenerationPerformance.java    | 21 --
>  .../jmh/distribution/SamplersPerformance.java      | 33 --
>  2 files changed, 30 insertions(+), 24 deletions(-)
>
> diff --git a/commons-rng-examples/examples-jmh/src/main/java/org/apache/commons/rng/examples/jmh/GenerationPerformance.java b/commons-rng-examples/examples-jmh/src/main/java/org/apache/commons/rng/examples/jmh/GenerationPerformance.java
> index 6b4e993..741c510 100644
> --- a/commons-rng-examples/examples-jmh/src/main/java/org/apache/commons/rng/examples/jmh/GenerationPerformance.java
> +++ b/commons-rng-examples/examples-jmh/src/main/java/org/apache/commons/rng/examples/jmh/GenerationPerformance.java
> @@ -17,7 +17,6 @@
>
> [...]

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [rng] Split and Jump functions

2019-04-09 Thread Gilles Sadowski

Hello.

On Tue, 9 Apr 2019 at 16:42, Alex Herbert wrote:
>
> Hi Gilles,
>
> Lots of sensible discussion here. I'll just put all replies at the end.
>
> On 09/04/2019 13:28, Gilles Sadowski wrote:
> > Hello.
> >
> >> Hi Gilles,
> >>
> >> You ask some good questions which I may have been vague about due to
> >> familiarity with the possibilities. I hope to clarify a bit below.
> >>
> >> On 08/04/2019 16:05, Gilles Sadowski wrote:
> >>> Hi Alex.
> >>>
> >>> On Mon, 8 Apr 2019 at 14:36, Alex Herbert wrote:
> >>>> This is a starter for a discussion on the split and jump functionality
> >>>> for a random generator.
> >>>>
> >>>> Split:
> >>>>
> >>>> To create a new instance of the generator that is deterministically
> >>>> based on the state of the current instance but the probability that the
> >>>> sequence generated by the new instance and the current instance overlap
> >>>> is negligible.
> >>> I may well be mistaken but I seem to recall that a split is supposed
> >>> to create an instance with no overlap for a sequence below a certain
> >>> length.
> >> From the implementations I have found in the XorShiRo family, they have
> >> both a split and a jump.
> > The C implementations do not contain a "split" function, although the
> > Java ones do.
> > Is it only to preserve familiarity with the "SplittableRandom" API?
> > What are use-cases that would not be (better) covered by jump?
> >
> >> The split basically creates a new random generator. There are no
> >> guarantees about sequence overlap. This is like seeding a new instance.
> >> The difference is that it is deterministic based on the current state
> >> and will return an instance of the same generator, that will be
> >> different, and do it fast. It is very simple to do this. Just scramble
> >> the current state using an algorithm different from how the state is
> >> regularly updated. I see it as a "scrambled copy" type functionality.
> > Is the only purpose to get a second instance faster than by
> > going through the "RandomSource" factory?  [If so, how many
> > creations would be needed in order to see a noticeable difference?]
> > Or is there an inherent interest in generating an instance using
> > another's state as a seed?  [I'd tend to think there isn't, due to the lack
> > of this functionality in C codes.]
> >
> >> The jump is more constrained. It advances the generator to a point that
> >> would be reached after a large number of calls to next(). Here's how the
> >> documentation from XorShiro256StarStar describes its use (c-code):
> >>
> >> /* This is the jump function for the generator. It is equivalent
> >>  to 2^128 calls to next(); it can be used to generate 2^128
> >>  non-overlapping subsequences for parallel computations. */
> >>
> >> void jump(void);
> > Indeed, the "non-overlapping" guarantee is an added feature.
> >
> > In comparison, "split" looks pretty bland (as just another way
> > to instantiate a generator).
> >
> >> /* This is the long-jump function for the generator. It is equivalent to
> >>  2^192 calls to next(); it can be used to generate 2^64 starting points,
> >>  from each of which jump() will generate 2^64 non-overlapping
> >>  subsequences for parallel distributed computations. */
> >>
> >> void long_jump(void);
> >>
> >> So the idea is to seed an experiment once to get a single generator.
> >> Then jump it for each parallel computation. Each computation will then
> >> be guaranteed to run with a different sequence for at least as long as
> >> the jump length.
> >>
> >> This results in the following type of code using the API I suggested:
> >>
> >> // If jump returns a new instance
> >> JumpableUniformRandomProvider source = ...;
> >> UniformRandomProvider[] rngs = new UniformRandomProvider[128];
> >> for (int i = 0; i < rngs.length; i++) {
> >>   // Advance state
> >>   rngs[i] = source.jump();
> >>   source = rngs[i];
> >> }
> >>
> >> In my suggested API the jump returns a new instance. So calling jump
> >> repeatedly on the same generator keeps returning the same fast-forward
> >> state.
> > The le

Re: UNCHECKED [parent] Introducing Automatic-Module-Name

2019-04-09 Thread Gilles Sadowski

On Tue, 9 Apr 2019 at 14:11, Rob Tompkins wrote:
>
>
>
> > On Apr 9, 2019, at 7:21 AM, Gilles Sadowski  wrote:
> >
> > On Tue, 9 Apr 2019 at 13:03, sebb wrote:
> >>
> >> On Tue, 9 Apr 2019 at 11:43, Gilles Sadowski  wrote:
> >>>
> >>>> [...]
> >>>>>
> >>>>> $ git diff pom.xml
> >>>>> diff --git a/pom.xml b/pom.xml
> >>>>> index 2612dd6..54a88e4 100644
> >>>>> --- a/pom.xml
> >>>>> +++ b/pom.xml
> >>>>> @@ -570,6 +570,7 @@
> >>>>>   
> >>>>> ${implementation.build}
> >>>>>   
> >>>>> ${maven.compiler.source}
> >>>>>   
> >>>>> ${maven.compiler.target}
> >>>>> +  
> >>>>> ${commons.module.name}
> >>>
> >>> ${commons.automatic.module.name}
> >>>
> >>>>> 
> >>>>>   
> >>>>> 
> >>>>> @@ -1608,6 +1609,9 @@
> >>>>> 1.3
> >>>>> 1.3
> >>>>>
> >>>>> +
> >>>>> +${project.artifactId}
> >>>
> >>> No default should be defined (to avoid the risk of creating incompatible
> >>> but identically named modules).
> >>
> >> Surely that *should* be solved by using groupId + artifactId?
> >
> > From
> > https://blog.joda.org/2017/04/java-se-9-jpms-module-naming.html
> > ---CUT---
> > Module names must be valid Java identifiers! E.g. no Java keywords, no
> > dashes, no...
> > ---CUT---
> >
> >> We change one or the other when releasing an incompatible module.
> >>
> >>> Then the release plugin could be enhanced (?) so that it would check
> >>> whether the variable has been defined for each JAR to be created (and
> >>> fail the build otherwise).
> >>
> >> But how would that ensure incompatible modules were given different names?
> >
> > It would not.
> > [IIUC, same issue with OSGI config.]
>
> If it’s the same issue as OSGI, should we not then use the same value as we 
> do with OSGI, which we already have?

The requirements/limitations are not necessarily the same.
Also, the parent cannot know the specifics of each component, in
particular for modular projects.
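
To illustrate why "${project.artifactId}" is a questionable default given the naming constraint quoted above, here is a rough sketch that tests candidate names with the JDK's own identifier check. The class name `ModuleNameCheck` and its helper are hypothetical, and `SourceVersion.isName` is only an approximation of the module-name rules (dot-separated Java identifiers, no keywords); it is not the check performed by the module system itself.

```java
import javax.lang.model.SourceVersion;

/** Hypothetical helper: would a candidate string pass as a qualified Java name? */
final class ModuleNameCheck {
    private ModuleNameCheck() {}

    /** Approximation (assumption): a module name must be a valid qualified name. */
    static boolean looksLikeValidModuleName(String candidate) {
        return SourceVersion.isName(candidate);
    }

    public static void main(String[] args) {
        // A typical artifactId contains dashes, which are not allowed in identifiers.
        System.out.println(looksLikeValidModuleName("commons-rng-core"));            // false
        // A package-like name passes the check.
        System.out.println(looksLikeValidModuleName("org.apache.commons.rng.core")); // true
    }
}
```

A check of this kind could in principle back the "fail the build if undefined or invalid" idea mentioned earlier in the thread, but it says nothing about whether two incompatible modules share a name.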

Gilles

> >>>
> >>>>> +
> >>>>> 
> >>>>> false
> >>>>> 
> >>>>>
> >>>
> >

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [rng] Split and Jump functions

2019-04-09 Thread Gilles Sadowski

Hello.

>
> Hi Gilles,
>
> You ask some good questions which I may have been vague about due to
> familiarity with the possibilities. I hope to clarify a bit below.
>
> On 08/04/2019 16:05, Gilles Sadowski wrote:
> > Hi Alex.
> >
> > On Mon, 8 Apr 2019 at 14:36, Alex Herbert wrote:
> >> This is a starter for a discussion on the split and jump functionality
> >> for a random generator.
> >>
> >> Split:
> >>
> >> To create a new instance of the generator that is deterministically
> >> based on the state of the current instance but the probability that the
> >> sequence generated by the new instance and the current instance overlap
> >> is negligible.
> > I may well be mistaken but I seem to recall that a split is supposed
> > to create an instance with no overlap for a sequence below a certain
> > length.
>
> From the implementations I have found in the XorShiRo family, they have
> both a split and a jump.

The C implementations do not contain a "split" function, although the
Java ones do.
Is it only to preserve familiarity with the "SplittableRandom" API?
What are use-cases that would not be (better) covered by jump?

>
> The split basically creates a new random generator. There are no
> guarantees about sequence overlap. This is like seeding a new instance.
> The difference is that it is deterministic based on the current state
> and will return an instance of the same generator, that will be
> different, and do it fast. It is very simple to do this. Just scramble
> the current state using an algorithm different from how the state is
> regularly updated. I see it as a "scrambled copy" type functionality.

Is the only purpose to get a second instance faster than by
going through the "RandomSource" factory?  [If so, how many
creations would be needed in order to see a noticeable difference?]
Or is there an inherent interest in generating an instance using
another's state as a seed?  [I'd tend to think there isn't, due to the lack
of this functionality in C codes.]

> The jump is more constrained. It advances the generator to a point that
> would be reached after a large number of calls to next(). Here's how the
> documentation from XorShiro256StarStar describes its use (c-code):
>
> /* This is the jump function for the generator. It is equivalent
> to 2^128 calls to next(); it can be used to generate 2^128
> non-overlapping subsequences for parallel computations. */
>
> void jump(void);

Indeed, the "non-overlapping" guarantee is an added feature.

In comparison, "split" looks pretty bland (as just another way
to instantiate a generator).

>
> /* This is the long-jump function for the generator. It is equivalent to
> 2^192 calls to next(); it can be used to generate 2^64 starting points,
> from each of which jump() will generate 2^64 non-overlapping
> subsequences for parallel distributed computations. */
>
> void long_jump(void);
>
> So the idea is to seed an experiment once to get a single generator.
> Then jump it for each parallel computation. Each computation will then
> be guaranteed to run with a different sequence for at least as long as
> the jump length.
>
> This results in the following type of code using the API I suggested:
>
> // If jump returns a new instance
> JumpableUniformRandomProvider source = ...;
> UniformRandomProvider[] rngs = new UniformRandomProvider[128];
> for (int i = 0; i < rngs.length; i++) {
>  // Advance state
>  rngs[i] = source.jump();
>  source = rngs[i];
> }
>
> In my suggested API the jump returns a new instance. So calling jump
> repeatedly on the same generator keeps returning the same fast-forward
> state.

That is the least-surprise behaviour.

> This could lead to errors if not well documented how to use it.

To be documented, but it would be the same kind of errors as
forgetting to increment a counter.
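
The "forgotten counter" kind of error can be shown with a toy generator whose jump() returns a new instance and leaves the original untouched. This is only a sketch under stated assumptions: `ToyJumpableRng` is a hypothetical class, not the proposed JumpableUniformRandomProvider; it uses a SplitMix64-style state update and output mix, and a jump length of 2^32 chosen arbitrarily for the example.

```java
/** Hypothetical toy generator whose jump() returns a new, advanced instance. */
final class ToyJumpableRng {
    private static final long GAMMA = 0x9e3779b97f4a7c15L; // example increment
    private static final int JUMP_POWER = 32;              // assumed jump length 2^32

    private long state;

    ToyJumpableRng(long seed) {
        this.state = seed;
    }

    /** SplitMix64-style output: advance the state, then mix it. */
    long nextLong() {
        long z = (state += GAMMA);
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** Returns a new instance 2^32 steps ahead; this instance is not modified. */
    ToyJumpableRng jump() {
        return new ToyJumpableRng(state + (GAMMA << JUMP_POWER));
    }

    public static void main(String[] args) {
        ToyJumpableRng source = new ToyJumpableRng(42L);

        // Pitfall: jumping twice from the same, unchanged source yields
        // two generators in the same state.
        final long a = source.jump().nextLong();
        final long b = source.jump().nextLong();
        System.out.println("same output from repeated jump: " + (a == b)); // true

        // Intended usage: chain the jumps so that each generator starts
        // one jump length after the previous one.
        final ToyJumpableRng[] rngs = new ToyJumpableRng[4];
        for (int i = 0; i < rngs.length; i++) {
            rngs[i] = source.jump();
            source = rngs[i];
        }
        for (ToyJumpableRng rng : rngs) {
            System.out.println(Long.toHexString(rng.nextLong()));
        }
    }
}
```

With these semantics, the loop must reassign the source on every iteration, just as in the snippet quoted above; dropping that assignment silently gives every worker the same sequence.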

> The alternative is to advance the state of the same instance. So to get
> the same effect for parallel computations you must have an ability to
> copy a generator (which we do not have in the API).
>
> // If jump updates the current instance
> JumpableUniformRandomProvider source = ...;
> UniformRandomProvider[] rngs = new UniformRandomProvider[128];
> for (int i = 0; i < rngs.length; i++) {
>  // Advance state and copy
>  rngs[i] = source.jump().copy();
> }
>
> We previously discussed copy(). IIRC the conclusion was that copy()
> could be misused for parallel computations; it basically allows a
> parallel set of work all with a copy of the same generator and so
> limited randomness across the set.

Not sure what you mean by "l
