Re: [math] random boolean arrays

2015-07-12 Thread Phil Steitz
On 7/12/15 10:38 AM, Thomas Neidhart wrote:
 On 07/12/2015 04:58 PM, Phil Steitz wrote:
 On 7/12/15 2:50 AM, Thomas Neidhart wrote:
 On 07/11/2015 09:43 PM, Phil Steitz wrote:
 On 7/11/15 12:29 PM, Thomas Neidhart wrote:
 On 07/11/2015 09:08 PM, Phil Steitz wrote:
 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean arrays.
   Unfortunately, I think the implementation is not quite random (see
 comments on the ticket).  To verify it, we need to be able to test
 the random boolean array generation directly.  To do that, we have
 to either expose the method (at least as protected) in the KS class
 or add it somewhere else.  I propose the latter but am not sure
 where to put it.  For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).  It
 could go in the swelling MathArrays class, or RandomDataGenerator. 
 The latter probably makes more sense, but the API does not fit too
 well.  Any ideas?
 If it is just for testing purposes, you can also access the method in
 question via reflection, see an example here:
 http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
 Do you think it *should* be a private method of the K-S class?
 Right now, I do not see many uses outside the class, but if we decide to
 make it public then I would prefer a special util class in the random
 package to avoid cluttering the MathArrays class.
 OK, for now we can make it private and use the reflection method
 above to test it.
 ok, but I guess it is also fine to make it package private as sebb
 suggested. We did something similar recently for some of the improved
 sampling methods provided by Otmar.

 Regarding the RandomDataGenerator: I think this class should be
 deprecated and replaced by a Sampler interface as proposed by Gilles.
 Please consider keeping this class.  Consider this a user request. 
 I have quite a few applications that use this class for two reasons:
 ok, the reason why I thought the class should be deprecated is because
 it was not kept up-to-date with all the new discrete and continuous
 distributions that we added in the last 2-3 years. If you think it is
 useful, then we can keep it of course.

Thanks.  I don't think it needs to include data generation methods
for all distributions.  Just the ones most commonly used.  It is a
convenience class providing common random data generation methods
using a shared RandomGenerator.  All the distributions that I need
are there.  We can add others if / when users request them / provide
patches.

Phil

 1.  One object instance tied to one PRNG that generates data from
 multiple different distributions.   This is convenient.   Sure, I
 could refactor all of these apps to instantiate new objects for each
 type of generated data and hopefully still be able to peg them to
 one PRNG; but that is needless work that also complicates the code.

 2.  There are quite a few methods in this class that have nothing to
 do with sampling (nextPermutation, nextHexString, nextSecureXxx,
 etc) but which conveniently share the RandomGenerator.  I guess the
 utility methods get moved out somewhere else.  Again, I end up
 having to refactor all of my code that uses them and when I want
 simulations to be based on a single PRNG, I have to find a way to
 pass the RandomGenerator around to them.

 I don't yet see the need to refactor the sampling support in the
 distributions package; but as my own apps are not impacted by this,
 if everyone else sees the user impact of the refactoring as
 outweighed by the benefit, I won't stand in the way.   Please let's
 just keep the RandomDataGenerator convenience class in the random
 package in any case.  I will take care of whatever adjustments are
 needed to adapt to whatever we settle on for sampling in the
 distributions package.
 Well, it is not really necessary to do everything together and refactor
 the distributions.

 Probably it is better to start the other way round, and describe what I
 want to add, and see how other things fit in:

  * I want a generic Sampler interface, i.e. something like this:
  ** nextSample()
  ** nextSamples(int size)
  ** nextSamples(double[] samples)

 there could be a DiscreteSampler and ContinuousSampler interface to
 handle the cases for int / double values.

 The distributions could either be changed to return such a sampler as
 Gilles proposed (with the advantage that no random instance is tied to
 the distribution itself), or implement the interface directly (with the
 advantage that we would not need to refactor too much).

 One can then create a sampler for any distribution or from other
 sources, e.g. when needing a fast and efficient sampler without
 replacement (see MATH-1239).
 +1 for sequential sampling.  I don't follow exactly why that
 requires refactoring the distributions; but if it helps in a way I
 don't yet understand, 

[dbcp] 2.2 plan - feedback requested

2015-07-12 Thread Phil Steitz
There are a few decisions to make before we start rolling RCs for
DBCP 2.2.  I would appreciate feedback / patches for the following:

DBCP-436
Seems a reasonable request but hard to implement and test.  I would
say either WONT_FIX or bump to 3.0.  I am curious about how
important this actually is with modern drivers (i.e., whether there
actually is much cost in what we are doing in the 2.x line).

DBCP-438
I need help reviewing the code here.  The null check fix makes the
symptom go away, but I suspect there is a deeper problem here.

DBCP-427
I agree with Vladimir's comment on the re-open, but am hesitant to
change default behavior in a . release.  Interested in other
opinions on this.

DBCP-388
Can move down the road, as [pool] does not yet support this. So bump
to 2.3.

Thanks in advance!

Phil


-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[GitHub] commons-collections pull request: Improve ListUtils#longestCommonS...

2015-07-12 Thread kaching88
GitHub user kaching88 opened a pull request:

https://github.com/apache/commons-collections/pull/13

Improve ListUtils#longestCommonSubsequence methods.

Improved version of ListUtils#longestCommonSubsequence that accepts three
or more parameters.
The lists are raw types because in Java 1.6 there is no way to suppress
the "Type safety: A generic array is created for a varargs parameter"
warning at every method call.
I also deprecated the older two-argument versions of the method and wrote
some new unit tests for the three-or-more-parameter version.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kaching88/commons-collections longestCommonSubsequence

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/commons-collections/pull/13.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13


commit 5e49f80e17bc317c7e84d883a20c1b8ea1d54a43
Author: kaching88 wa...@o2.pl
Date:   2015-07-12T20:11:30Z

Improve ListUtils#longestCommonSubsequence to take third or more parameters.








Re: [VOTE][LAZY] Migrate Commons SCXML to Git

2015-07-12 Thread Woonsan Ko
Hi Benedikt,

I think the svn tree of scxml should go to _moved_to_git, leaving only
SCXMLNowUsesGit.txt. [1]
Would you or someone be able to do that? I don't think I have write
access there.

Thanks in advance,

Woonsan

[1] http://wiki.apache.org/commons/MovingToGit


On Thu, Jul 9, 2015 at 2:50 AM, Benedikt Ritter brit...@apache.org wrote:
 Hello Woonsan,

 one thing you will have to do after the migration is to create a ticket for
 the github mirror. Otherwise it will continue mirroring the SVN repo.

 Benedikt

 2015-07-09 6:06 GMT+02:00 Woonsan Ko woon...@apache.org:

 Thanks for your support, Ate! :-)

 It was already migrated to git (INFRA-9952). I commented about my
 validations there. Everything seems very fine.
 I will announce it to user/dev community once it gets fully available.

 Cheers,

 Woonsan

 On Wed, Jul 8, 2015 at 7:26 PM, Ate Douma a...@douma.nu wrote:
  Sorry for the too late response, but I would have voted +1 too :)
 
  Ate
 
 
  On 2015-07-08 20:27, Woonsan Ko wrote:
 
  Apache Commons Developers,
 
  This VOTE has passed with the following votes:
 
   +1 Dave Brosius
   +1 James Carman (PMC)
   +1 Gary Gregory (PMC)
   +1 Woonsan Ko
 
  Thank you all for voting!
 
  I will create an INFRA ticket soon and keep you updated about the
  progress and availabilities.
 
  Regards,
 
  Woonsan Ko
 
 
  On Wed, Jul 1, 2015 at 9:50 PM, Woonsan Ko woon...@apache.org wrote:
 
  Hi there,
 
  I think the experiences in Commons Math and Commons Lang using git as
  primary VCS have been successful. Also, we received requests from some
  new people about using git instead (through mailing list and JIRA
  tickets).
  So, I'd like to call a vote to migrate Commons SCXML to git, assuming
  most Commons SCXML developers feel comfortable with switching to git.
  (Please see [1] for summarized info about using git in Apache Commons
  project.)
 
  This vote by lazy consensus will close no sooner than 72 hours from now,
  i.e. after 2015-07-07 18:00 EDT.
 
  [ ] +1 go for it
  [ ] +0 OK, but...
  [ ] -0  Not happy about this, because...
  [ ] -1 We should not do this
 
  Thanks!
 
  Woonsan
 
  [1] https://wiki.apache.org/commons/UsingGIT
 
 
 
 
 
 





 --
 http://people.apache.org/~britter/
 http://www.systemoutprintln.de/
 http://twitter.com/BenediktRitter
 http://github.com/britter




Re: [math] random boolean arrays

2015-07-12 Thread Otmar Ertl
On Sun, Jul 12, 2015 at 8:16 PM, Gilles gil...@harfang.homelinux.org wrote:
 On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote:

 On 07/12/2015 04:58 PM, Phil Steitz wrote:

 On 7/12/15 2:50 AM, Thomas Neidhart wrote:

 On 07/11/2015 09:43 PM, Phil Steitz wrote:

 On 7/11/15 12:29 PM, Thomas Neidhart wrote:

 On 07/11/2015 09:08 PM, Phil Steitz wrote:

 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean arrays.
   Unfortunately, I think the implementation is not quite random (see
 comments on the ticket).  To verify it, we need to be able to test
 the random boolean array generation directly.  To do that, we have
 to either expose the method (at least as protected) in the KS class
 or add it somewhere else.  I propose the latter but am not sure
 where to put it.  For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).  It
 could go in the swelling MathArrays class, or RandomDataGenerator.
 The latter probably makes more sense, but the API does not fit too
 well.  Any ideas?

 If it is just for testing purposes, you can also access the method in
 question via reflection, see an example here:


 http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes

 Do you think it *should* be a private method of the K-S class?

 Right now, I do not see many uses outside the class, but if we decide to
 make it public then I would prefer a special util class in the random
 package to avoid cluttering the MathArrays class.


 OK, for now we can make it private and use the reflection method
 above to test it.


 ok, but I guess it is also fine to make it package private as sebb
 suggested. We did something similar recently for some of the improved
 sampling methods provided by Otmar.

 Regarding the RandomDataGenerator: I think this class should be
 deprecated and replaced by a Sampler interface as proposed by Gilles.


 Please consider keeping this class.  Consider this a user request.
 I have quite a few applications that use this class for two reasons:


 ok, the reason why I thought the class should be deprecated is because
 it was not kept up-to-date with all the new discrete and continuous
 distributions that we added in the last 2-3 years. If you think it is
 useful, then we can keep it of course.

 1.  One object instance tied to one PRNG that generates data from
 multiple different distributions.   This is convenient.   Sure, I
 could refactor all of these apps to instantiate new objects for each
 type of generated data and hopefully still be able to peg them to
 one PRNG; but that is needless work that also complicates the code.

 2.  There are quite a few methods in this class that have nothing to
 do with sampling (nextPermutation, nextHexString, nextSecureXxx,
 etc) but which conveniently share the RandomGenerator.  I guess the
 utility methods get moved out somewhere else.  Again, I end up
 having to refactor all of my code that uses them and when I want
 simulations to be based on a single PRNG, I have to find a way to
 pass the RandomGenerator around to them.

 I don't yet see the need to refactor the sampling support in the
 distributions package; but as my own apps are not impacted by this,
 if everyone else sees the user impact of the refactoring as
 outweighed by the benefit, I won't stand in the way.   Please let's
 just keep the RandomDataGenerator convenience class in the random
 package in any case.  I will take care of whatever adjustments are
 needed to adapt to whatever we settle on for sampling in the
 distributions package.


 Well, it is not really necessary to do everything together and refactor
 the distributions.

 Probably it is better to start the other way round, and describe what I
 want to add, and see how other things fit in:

  * I want a generic Sampler interface, i.e. something like this:
  ** nextSample()
  ** nextSamples(int size)
  ** nextSamples(double[] samples)


 +1

 there could be a DiscreteSampler and ContinuousSampler interface to
 handle the cases for int / double values.


 Perhaps the name should be IntegerSampler and DoubleSampler, to accommodate
 future needs (LongSampler, BooleanSampler (?)).

I would prefer consistent names for distributions and corresponding
samplers. The support of distributions must be of the same data type
as the return values of the corresponding sampler. Therefore, I would
call the samplers for RealDistribution and IntegerDistribution
RealSampler and IntegerSampler, respectively.


 The distributions could either be changed to return such a sampler as
 Gilles proposed (with the advantage that no random instance is tied to
 the distribution itself), or implement the interface directly (with the
 advantage that we would not need to refactor too much).


 Unless I'm missing something, the refactoring would be fairly the same:
 The latter case needs 

Re: [math] random boolean arrays

2015-07-12 Thread Thomas Neidhart
On 07/11/2015 09:43 PM, Phil Steitz wrote:
 On 7/11/15 12:29 PM, Thomas Neidhart wrote:
 On 07/11/2015 09:08 PM, Phil Steitz wrote:
 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean arrays.
   Unfortunately, I think the implementation is not quite random (see
 comments on the ticket).  To verify it, we need to be able to test
 the random boolean array generation directly.  To do that, we have
 to either expose the method (at least as protected) in the KS class
 or add it somewhere else.  I propose the latter but am not sure
 where to put it.  For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).  It
 could go in the swelling MathArrays class, or RandomDataGenerator. 
 The latter probably makes more sense, but the API does not fit too
 well.  Any ideas?
 If it is just for testing purposes, you can also access the method in
 question via reflection, see an example here:
 http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
 
 Do you think it *should* be a private method of the K-S class?

Right now, I do not see many uses outside the class, but if we decide to
make it public then I would prefer a special util class in the random
package to avoid cluttering the MathArrays class.
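
For illustration, here is a minimal sketch of what such a helper in a small
util class could look like. It is not the MATH-1242 code: the class name
BooleanArrayUtil, the extra Random parameter, and the use of java.util.Random
instead of a Commons Math RandomGenerator are all assumptions made to keep
the example self-contained. The array is filled in place and then shuffled
with Fisher-Yates, so no array copy is needed and each arrangement of nTrue
true values should be equally likely.

```java
import java.util.Random;

/** Hypothetical util class; name and signature are illustrative only. */
final class BooleanArrayUtil {
    private BooleanArrayUtil() {}

    /**
     * Overwrites b in place so that exactly nTrue entries are true,
     * at positions chosen uniformly at random.
     */
    static void randomize(boolean[] b, int nTrue, Random rng) {
        if (nTrue < 0 || nTrue > b.length) {
            throw new IllegalArgumentException("nTrue out of range: " + nTrue);
        }
        // Start from a fixed layout: nTrue leading true values.
        for (int i = 0; i < b.length; i++) {
            b[i] = i < nTrue;
        }
        // Fisher-Yates shuffle: swap slot i with a uniform slot in [0, i].
        for (int i = b.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            boolean tmp = b[i];
            b[i] = b[j];
            b[j] = tmp;
        }
    }
}
```

A direct test of such a method can then check both the count of true values
and the frequency with which each position comes up true over many calls.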

Regarding the RandomDataGenerator: I think this class should be
deprecated and replaced by a Sampler interface as proposed by Gilles.
One can then create a sampler for any distribution or from other
sources, e.g. when needing a fast and efficient sampler without
replacement (see MATH-1239).

I did not have time to complete a patch, but am working on it.

Thomas




Jenkins build is back to stable : commons-jcs #115

2015-07-12 Thread Apache Jenkins Server
See https://builds.apache.org/job/commons-jcs/115/changes





Jenkins build is back to stable : commons-jcs » Apache Commons JCS :: Core #115

2015-07-12 Thread Apache Jenkins Server
See 
https://builds.apache.org/job/commons-jcs/org.apache.commons$commons-jcs-core/115/





Jenkins build is still unstable: commons-jcs » Apache Commons JCS :: Core #114

2015-07-12 Thread Apache Jenkins Server
See 
https://builds.apache.org/job/commons-jcs/org.apache.commons$commons-jcs-core/changes





Jenkins build is still unstable: commons-jcs #114

2015-07-12 Thread Apache Jenkins Server
See https://builds.apache.org/job/commons-jcs/changes





Re: [VOTE][LAZY] Migrate Commons SCXML to Git

2015-07-12 Thread Woonsan Ko
GitHub integration (https://github.com/apache/commons-scxml) was done
successfully.
I validated it with a github pull request and merging it in the git
repo. It seems to take about 1 hr for the GitHub mirror to take the
changes from the git repo.

I'll also update the site documentation with this:
- https://issues.apache.org/jira/browse/SCXML-235

By the way, could someone please tweet this move on @ApacheCommons?

Enjoy!

Regards,

Woonsan

On Thu, Jul 9, 2015 at 6:14 PM, Woonsan Ko woon...@apache.org wrote:
 Hi there,

 GitHub integration request was made:
 - https://issues.apache.org/jira/browse/INFRA-9961

 Also, SCXML git repository is now writable:
 - https://git-wip-us.apache.org/repos/asf/commons-scxml.git

 Please enjoy working on the new git repo now!

 Cheers,

 Woonsan


 On Thu, Jul 9, 2015 at 2:50 AM, Benedikt Ritter brit...@apache.org wrote:
 Hello Woonsan,

 one thing you will have to do after the migration is to create a ticket for
 the github mirror. Otherwise it will continue mirroring the SVN repo.

 Benedikt

 2015-07-09 6:06 GMT+02:00 Woonsan Ko woon...@apache.org:

 Thanks for your support, Ate! :-)

 It was already migrated to git (INFRA-9952). I commented about my
 validations there. Everything seems very fine.
 I will announce it to user/dev community once it gets fully available.

 Cheers,

 Woonsan

 On Wed, Jul 8, 2015 at 7:26 PM, Ate Douma a...@douma.nu wrote:
  Sorry for the too late response, but I would have voted +1 too :)
 
  Ate
 
 
  On 2015-07-08 20:27, Woonsan Ko wrote:
 
  Apache Commons Developers,
 
  This VOTE has passed with the following votes:
 
   +1 Dave Brosius
   +1 James Carman (PMC)
   +1 Gary Gregory (PMC)
   +1 Woonsan Ko
 
  Thank you all for voting!
 
  I will create an INFRA ticket soon and keep you updated about the
  progress and availabilities.
 
  Regards,
 
  Woonsan Ko
 
 
  On Wed, Jul 1, 2015 at 9:50 PM, Woonsan Ko woon...@apache.org wrote:
 
  Hi there,
 
  I think the experiences in Commons Math and Commons Lang using git as
  primary VCS have been successful. Also, we received requests from some
  new people about using git instead (through mailing list and JIRA
  tickets).
  So, I'd like to call a vote to migrate Commons SCXML to git, assuming
  most Commons SCXML developers feel comfortable with switching to git.
  (Please see [1] for summarized info about using git in Apache Commons
  project.)
 
  This vote by lazy consensus will close no sooner than 72 hours from now,
  i.e. after 2015-07-07 18:00 EDT.
 
  [ ] +1 go for it
  [ ] +0 OK, but...
  [ ] -0  Not happy about this, because...
  [ ] -1 We should not do this
 
  Thanks!
 
  Woonsan
 
  [1] https://wiki.apache.org/commons/UsingGIT
 
 
 
 
 
 





 --
 http://people.apache.org/~britter/
 http://www.systemoutprintln.de/
 http://twitter.com/BenediktRitter
 http://github.com/britter




Re: [math] random boolean arrays

2015-07-12 Thread Thomas Neidhart
On 07/12/2015 04:58 PM, Phil Steitz wrote:
 On 7/12/15 2:50 AM, Thomas Neidhart wrote:
 On 07/11/2015 09:43 PM, Phil Steitz wrote:
 On 7/11/15 12:29 PM, Thomas Neidhart wrote:
 On 07/11/2015 09:08 PM, Phil Steitz wrote:
 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean arrays.
   Unfortunately, I think the implementation is not quite random (see
 comments on the ticket).  To verify it, we need to be able to test
 the random boolean array generation directly.  To do that, we have
 to either expose the method (at least as protected) in the KS class
 or add it somewhere else.  I propose the latter but am not sure
 where to put it.  For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).  It
 could go in the swelling MathArrays class, or RandomDataGenerator. 
 The latter probably makes more sense, but the API does not fit too
 well.  Any ideas?
 If it is just for testing purposes, you can also access the method in
 question via reflection, see an example here:
 http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
 Do you think it *should* be a private method of the K-S class?
 Right now, I do not see many uses outside the class, but if we decide to
 make it public then I would prefer a special util class in the random
 package to avoid cluttering the MathArrays class.
 
 OK, for now we can make it private and use the reflection method
 above to test it.

ok, but I guess it is also fine to make it package private as sebb
suggested. We did something similar recently for some of the improved
sampling methods provided by Otmar.

 Regarding the RandomDataGenerator: I think this class should be
 deprecated and replaced by a Sampler interface as proposed by Gilles.
 
 Please consider keeping this class.  Consider this a user request. 
 I have quite a few applications that use this class for two reasons:

ok, the reason why I thought the class should be deprecated is because
it was not kept up-to-date with all the new discrete and continuous
distributions that we added in the last 2-3 years. If you think it is
useful, then we can keep it of course.

 1.  One object instance tied to one PRNG that generates data from
 multiple different distributions.   This is convenient.   Sure, I
 could refactor all of these apps to instantiate new objects for each
 type of generated data and hopefully still be able to peg them to
 one PRNG; but that is needless work that also complicates the code.
 
 2.  There are quite a few methods in this class that have nothing to
 do with sampling (nextPermutation, nextHexString, nextSecureXxx,
 etc) but which conveniently share the RandomGenerator.  I guess the
 utility methods get moved out somewhere else.  Again, I end up
 having to refactor all of my code that uses them and when I want
 simulations to be based on a single PRNG, I have to find a way to
 pass the RandomGenerator around to them.
 
 I don't yet see the need to refactor the sampling support in the
 distributions package; but as my own apps are not impacted by this,
 if everyone else sees the user impact of the refactoring as
 outweighed by the benefit, I won't stand in the way.   Please let's
 just keep the RandomDataGenerator convenience class in the random
 package in any case.  I will take care of whatever adjustments are
 needed to adapt to whatever we settle on for sampling in the
 distributions package.

Well, it is not really necessary to do everything together and refactor
the distributions.

Probably it is better to start the other way round, and describe what I
want to add, and see how other things fit in:

 * I want a generic Sampler interface, i.e. something like this:
 ** nextSample()
 ** nextSamples(int size)
 ** nextSamples(double[] samples)

there could be a DiscreteSampler and ContinuousSampler interface to
handle the cases for int / double values.

The distributions could either be changed to return such a sampler as
Gilles proposed (with the advantage that no random instance is tied to
the distribution itself), or implement the interface directly (with the
advantage that we would not need to refactor too much).
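
The interfaces described above might be sketched roughly as follows. Only the
three method names and the two interface names come from this mail; the
return types, the UniformSampler example class, and the use of
java.util.Random in place of a Commons Math RandomGenerator are assumptions
for the sake of a self-contained example.

```java
import java.util.Random;

/** Sampler for double values (signatures are a guess at the proposal). */
interface ContinuousSampler {
    double nextSample();
    double[] nextSamples(int size);
    void nextSamples(double[] samples);
}

/** Sampler for int values, mirroring the continuous case. */
interface DiscreteSampler {
    int nextSample();
    int[] nextSamples(int size);
    void nextSamples(int[] samples);
}

/** Hypothetical example implementation: uniform values between lo and hi. */
final class UniformSampler implements ContinuousSampler {
    private final Random rng;
    private final double lo;
    private final double hi;

    UniformSampler(Random rng, double lo, double hi) {
        this.rng = rng;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    public double nextSample() {
        return lo + (hi - lo) * rng.nextDouble();
    }

    @Override
    public double[] nextSamples(int size) {
        double[] out = new double[size];
        nextSamples(out);
        return out;
    }

    @Override
    public void nextSamples(double[] samples) {
        // Fills the caller's array, so no extra allocation is needed.
        for (int i = 0; i < samples.length; i++) {
            samples[i] = nextSample();
        }
    }
}
```

The array-filling variant is the one that lets callers reuse a buffer, which
is the same concern as avoiding array copies in the boolean array case.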

 One can then create a sampler for any distribution or from other
 sources, e.g. when needing a fast and efficient sampler without
 replacement (see MATH-1239).
 
 +1 for sequential sampling.  I don't follow exactly why that
 requires refactoring the distributions; but if it helps in a way I
 don't yet understand, that will help convince me that refactoring
 sampling in the distributions package is worth the user pain.  

as I said above, I wanted to combine two things in one step, maybe it is
better to go step by step.

Thomas



Re: [math] random boolean arrays

2015-07-12 Thread Gilles

On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote:

On 07/12/2015 04:58 PM, Phil Steitz wrote:

On 7/12/15 2:50 AM, Thomas Neidhart wrote:

On 07/11/2015 09:43 PM, Phil Steitz wrote:

On 7/11/15 12:29 PM, Thomas Neidhart wrote:

On 07/11/2015 09:08 PM, Phil Steitz wrote:

The code implemented in MATH-1242 to improve performance of KS
monteCarloP in-lines efficient generation of random boolean arrays.
Unfortunately, I think the implementation is not quite random (see
comments on the ticket).  To verify it, we need to be able to test
the random boolean array generation directly.  To do that, we have
to either expose the method (at least as protected) in the KS class
or add it somewhere else.  I propose the latter but am not sure
where to put it.  For speed, we need to avoid array copies, so the
API will have to be something like randomize(boolean[], nTrue).  It
could go in the swelling MathArrays class, or RandomDataGenerator.
The latter probably makes more sense, but the API does not fit too
well.  Any ideas?

If it is just for testing purposes, you can also access the method in
question via reflection, see an example here:

http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes

Do you think it *should* be a private method of the K-S class?

Right now, I do not see many uses outside the class, but if we decide to
make it public then I would prefer a special util class in the random
package to avoid cluttering the MathArrays class.


OK, for now we can make it private and use the reflection method
above to test it.


ok, but I guess it is also fine to make it package private as sebb
suggested. We did something similar recently for some of the improved
sampling methods provided by Otmar.


Regarding the RandomDataGenerator: I think this class should be
deprecated and replaced by a Sampler interface as proposed by Gilles.


Please consider keeping this class.  Consider this a user request.
I have quite a few applications that use this class for two reasons:


ok, the reason why I thought the class should be deprecated is because
it was not kept up-to-date with all the new discrete and continuous
distributions that we added in the last 2-3 years. If you think it is
useful, then we can keep it of course.


1.  One object instance tied to one PRNG that generates data from
multiple different distributions.   This is convenient.   Sure, I
could refactor all of these apps to instantiate new objects for each
type of generated data and hopefully still be able to peg them to
one PRNG; but that is needless work that also complicates the code.

2.  There are quite a few methods in this class that have nothing to
do with sampling (nextPermutation, nextHexString, nextSecureXxx,
etc) but which conveniently share the RandomGenerator.  I guess the
utility methods get moved out somewhere else.  Again, I end up
having to refactor all of my code that uses them and when I want
simulations to be based on a single PRNG, I have to find a way to
pass the RandomGenerator around to them.

I don't yet see the need to refactor the sampling support in the
distributions package; but as my own apps are not impacted by this,
if everyone else sees the user impact of the refactoring as
outweighed by the benefit, I won't stand in the way.   Please let's
just keep the RandomDataGenerator convenience class in the random
package in any case.  I will take care of whatever adjustments are
needed to adapt to whatever we settle on for sampling in the
distributions package.


Well, it is not really necessary to do everything together and refactor
the distributions.

Probably it is better to start the other way round, and describe what I
want to add, and see how other things fit in:

 * I want a generic Sampler interface, i.e. something like this:
 ** nextSample()
 ** nextSamples(int size)
 ** nextSamples(double[] samples)


+1


there could be a DiscreteSampler and ContinuousSampler interface to
handle the cases for int / double values.


Perhaps the name should be IntegerSampler and DoubleSampler, to accommodate
future needs (LongSampler, BooleanSampler (?)).


The distributions could either be changed to return such a sampler as
Gilles proposed (with the advantage that no random instance is tied to
the distribution itself), or implement the interface directly (with the
advantage that we would not need to refactor too much).


Unless I'm missing something, the refactoring would be much the same:
The latter case requires implementing 3 methods (2 new ones, one with a
name change).
The former needs implementing the factory method proposed in MATH-1158,
plus the same methods as above (wrapped in the object returned by the
factory method).
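
For comparison, the two options might be sketched side by side as
below. This is only an illustration under stated assumptions: the
one-method DoubleSampler stand-in, the class names, the createSampler
name, and the use of java.util.Random are all hypothetical and are not
taken from MATH-1158 or from the existing distribution classes.

```java
import java.util.Random;

/** Hypothetical one-method stand-in for the proposed sampler interface. */
interface DoubleSampler {
    double nextSample();
}

/** Option 1: the distribution hands out samplers and holds no PRNG itself. */
final class ExponentialDistributionA {
    private final double mean;

    ExponentialDistributionA(double mean) {
        this.mean = mean;
    }

    /** Factory method; the name is an assumption, not the MATH-1158 signature. */
    DoubleSampler createSampler(Random rng) {
        // Inverse-transform sampling: -mean * ln(1 - U), with U uniform on [0, 1).
        return () -> -mean * Math.log(1.0 - rng.nextDouble());
    }
}

/** Option 2: the distribution implements the interface and owns its PRNG. */
final class ExponentialDistributionB implements DoubleSampler {
    private final double mean;
    private final Random rng;

    ExponentialDistributionB(double mean, Random rng) {
        this.mean = mean;
        this.rng = rng;
    }

    @Override
    public double nextSample() {
        // Same inverse-transform formula as in option 1.
        return -mean * Math.log(1.0 - rng.nextDouble());
    }
}
```

The sampling logic is identical in both; the difference is only
whether the PRNG is bound when the distribution is constructed or when
a sampler is requested.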


Gilles



One can then create a sampler for any distribution or from other
sources, e.g. when needing a fast and efficient sampler without
replacement (see MATH-1239).


+1 for sequential sampling.  I don't follow exactly why 

Re: [math] random boolean arrays

2015-07-12 Thread Phil Steitz
On 7/12/15 2:50 AM, Thomas Neidhart wrote:
 On 07/11/2015 09:43 PM, Phil Steitz wrote:
 On 7/11/15 12:29 PM, Thomas Neidhart wrote:
 On 07/11/2015 09:08 PM, Phil Steitz wrote:
 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean arrays.
   Unfortunately, I think the implementation is not quite random (see
 comments on the ticket).  To verify it, we need to be able to test
 the random boolean array generation directly.  To do that, we have
 to either expose the method (at least as protected) in the KS class
 or add it somewhere else.  I propose the latter but am not sure
 where to put it.  For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).  It
 could go in the swelling MathArrays class, or RandomDataGenerator. 
 The latter probably makes more sense, but the API does not fit too
 well.  Any ideas?
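
A minimal sketch of what an in-place randomize(boolean[], nTrue) could
look like is given below. It is not the MATH-1242 code: the
BooleanArrays class name, the extra Random parameter, and the choice
of a plain Fisher-Yates shuffle are assumptions made for illustration.

```java
import java.util.Random;

/** Hypothetical helper class; not part of Commons Math. */
final class BooleanArrays {
    private BooleanArrays() {}

    /**
     * Fills {@code flags} in place so that exactly {@code nTrue} entries are
     * true, with every such arrangement equally likely.
     */
    static void randomize(boolean[] flags, int nTrue, Random rng) {
        if (nTrue < 0 || nTrue > flags.length) {
            throw new IllegalArgumentException("nTrue out of range: " + nTrue);
        }
        // Start from a fixed arrangement: nTrue trues followed by falses.
        for (int i = 0; i < flags.length; i++) {
            flags[i] = i < nTrue;
        }
        // Fisher-Yates shuffle: swap each position with a uniformly chosen
        // position at or below it, working from the end of the array.
        for (int i = flags.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            boolean tmp = flags[i];
            flags[i] = flags[j];
            flags[j] = tmp;
        }
    }
}
```

Because the caller supplies the array, no copies are made, and a test
can check both the count of true entries and, over many runs, how
often each position is set.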
 If it is just for testing purposes, you can also access the method in
 question via reflection, see an example here:
 http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
 Do you think it *should* be a private method of the K-S class?
 Right now, I do not see many uses outside the class, but if we decide to
 make it public then I would prefer a special util class in the random
 package to avoid cluttering the MathArrays class.

OK, for now we can make it private and use the reflection method
above to test it.
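
For reference, testing a private static method via reflection might
look roughly like the sketch below. KsLike, countTrue and
PrivateAccess are hypothetical stand-ins, not the actual K-S class or
its methods; only the java.lang.reflect calls are real API.

```java
import java.lang.reflect.Method;

/** Stand-in for the class under test; the names here are hypothetical. */
final class KsLike {
    private KsLike() {}

    /** Private helper that a test wants to exercise directly. */
    private static int countTrue(boolean[] flags) {
        int n = 0;
        for (boolean f : flags) {
            if (f) {
                n++;
            }
        }
        return n;
    }
}

/** Test-side helper that calls a private static method by name. */
final class PrivateAccess {
    private PrivateAccess() {}

    static Object invokeStatic(Class<?> owner, String name,
                               Class<?>[] paramTypes, Object... args) {
        try {
            Method m = owner.getDeclaredMethod(name, paramTypes);
            m.setAccessible(true);        // bypass the private modifier
            return m.invoke(null, args);  // null receiver: the method is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Reflective call failed: " + name, e);
        }
    }
}
```

A test would then call, for example,
PrivateAccess.invokeStatic(KsLike.class, "countTrue",
new Class<?>[] { boolean[].class }, (Object) flags) and compare the
boxed result with the expected count. Making the method package
private avoids the reflection entirely when the test lives in the same
package.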

 Regarding the RandomDataGenerator: I think this class should be
 deprecated and replaced by a Sampler interface as proposed by Gilles.

Please consider keeping this class.  Consider this a user request. 
I have quite a few applications that use this class for two reasons:

1.  One object instance tied to one PRNG that generates data from
multiple different distributions.   This is convenient.   Sure, I
could refactor all of these apps to instantiate new objects for each
type of generated data and hopefully still be able to peg them to
one PRNG; but that is needless work that also complicates the code.

2.  There are quite a few methods in this class that have nothing to
do with sampling (nextPermutation, nextHexString, nextSecureXxx,
etc) but which conveniently share the RandomGenerator.  I guess the
utility methods get moved out somewhere else.  Again, I end up
having to refactor all of my code that uses them and when I want
simulations to be based on a single PRNG, I have to find a way to
pass the RandomGenerator around to them.

I don't yet see the need to refactor the sampling support in the
distributions package; but as my own apps are not impacted by this,
if everyone else sees the user impact of the refactoring as
outweighed by the benefit, I won't stand in the way.   Please let's
just keep the RandomDataGenerator convenience class in the random
package in any case.  I will take care of whatever adjustments are
needed to adapt to whatever we settle on for sampling in the
distributions package.

 One can then create a sampler for any distribution or from other
 sources, e.g. when needing a fast and efficient sampler without
 replacement (see MATH-1239).

+1 for sequential sampling.  I don't follow exactly why that
requires refactoring the distributions; but if it helps in a way I
don't yet understand, that will help convince me that refactoring
sampling in the distributions package is worth the user pain.  

Phil

 I did not have time to complete a patch, but am working on it

 Thomas

 -
 To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
 For additional commands, e-mail: dev-h...@commons.apache.org




