Re: [math] random boolean arrays

2015-07-13 Thread Gilles

On Mon, 13 Jul 2015 06:30:44 -0700, Phil Steitz wrote:

On 7/12/15 9:45 PM, Otmar Ertl wrote:
On Sun, Jul 12, 2015 at 8:16 PM, Gilles 
gil...@harfang.homelinux.org wrote:

On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote:

On 07/12/2015 04:58 PM, Phil Steitz wrote:

On 7/12/15 2:50 AM, Thomas Neidhart wrote:

On 07/11/2015 09:43 PM, Phil Steitz wrote:

On 7/11/15 12:29 PM, Thomas Neidhart wrote:

On 07/11/2015 09:08 PM, Phil Steitz wrote:
The code implemented in MATH-1242 to improve performance of 
KS
monteCarloP in-lines efficient generation of random boolean 
arrays.
  Unfortunately, I think the implementation is not quite 
random (see
comments on the ticket).  To verify it, we need to be able to 
test
the random boolean array generation directly.  To do that, we 
have
to either expose the method (at least as protected) in the KS 
class
or add it somewhere else.  I propose the latter but am not 
sure
where to put it.  For speed, we need to avoid array copies, 
so the
API will have to be something like randomize(boolean[], 
nTrue).  It
could go in the swelling MathArrays class, or 
RandomDataGenerator.
The latter probably makes more sense, but the API does not 
fit too

well.  Any ideas?
If it is just for testing purposes, you can also access the 
method in

question via reflection, see an example here:



http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes

Do you think it *should* be a private method of the K-S class?
Right now, I do not see much uses outside the class, but if we 
decide to
make it public then I would prefer a special util class in the 
random

package to avoid cluttering the MathArrays class.


OK, for now we can make it private and use the reflection method
above to test it.


ok, but I guess it is also fine to make it package private as sebb
suggested. We did something similar recently for some of the 
improved

sampling methods provided by Otmar.


Regarding the RandomDataGenerator: I think this class should be
deprecated and replaced by a Sampler interface as proposed by 
Gilles.


Please consider keeping this class.  Consider this a user 
request.
I have quite a few applications that use this class for two 
reasons:


ok, the reason why I thought the class should be deprecated is 
because
it was not kept up-to-date with all the new discrete and 
continuous
distributions that we added in the last 2-3 years. If you think it 
is

useful, then we can keep it of course.


1.  One object instance tied to one PRNG that generates data from
multiple different distributions.   This is convenient.   Sure, I
could refactor all of these apps to instantiate new objects for 
each

type of generated data and hopefully still be able to peg them to
one PRNG; but that is needless work that also complicates the 
code.


2.  There are quite a few methods in this class that have nothing 
to

do with sampling (nextPermutation, nextHexString, nextSecureXxx,
etc) but which conveniently share the RandomGenerator.  I guess 
the

utility methods get moved out somewhere else.  Again, I end up
having to refactor all of my code that uses them and when I want
simulations to be based on a single PRNG, I have to find a way to
pass the RandomGenerator around to them.

I don't yet see the need to refactor the sampling support in the
distributions package; but as my own apps are not impacted by 
this,

if everyone else sees the user impact of the refactoring as
outweighed by the benefit, I won't stand in the way.   Please 
lets

just keep the RandomDataGenerator convenience class in the random
package in any case.  I will take care of whatever adjustments 
are

needed to adapt to whatever we settle on for sampling in the
distributions package.


Well, it is not really necessary to do everything together and 
refactor

the distributions.

Probably it is better to start the other way round, and describe 
what I

want to add, and see how other things fit in:

 * I want a generic Sampler interface, i.e. something like this:
 ** nextSample()
 ** nextSamples(int size)
 ** nextSamples(double[] samples)


+1

there could be a DiscreteSampler and ContinuousSampler interface 
to

handle the cases for int / double values.


Perhaps the name should be IntegerSampler and DoubleSampler, to 
accomodate

future needs (LongSampler, BooleanSampler (?)).

I would prefer consistent names for distributions and corresponding
samplers. The support of distributions must be of the same data type
as the return values of the corresponding sampler. Therefore, I 
would

call the samplers for RealDistribution and IntegerDistribution
RealSampler and IntegerSampler, respectively.


Ideally yes.  But there is always the lingering question of what Real
means: the mathematical abstraction or the numerical representation?
The same real distributions could be implemented with floats.
If we ever need implementing a discrete distribution that is able to
use the long range, the Integer in 

Re: [math] random boolean arrays

2015-07-13 Thread Phil Steitz
On 7/12/15 9:45 PM, Otmar Ertl wrote:
 On Sun, Jul 12, 2015 at 8:16 PM, Gilles gil...@harfang.homelinux.org wrote:
 On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote:
 On 07/12/2015 04:58 PM, Phil Steitz wrote:
 On 7/12/15 2:50 AM, Thomas Neidhart wrote:
 On 07/11/2015 09:43 PM, Phil Steitz wrote:
 On 7/11/15 12:29 PM, Thomas Neidhart wrote:
 On 07/11/2015 09:08 PM, Phil Steitz wrote:
 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean arrays.
   Unfortunately, I think the implementation is not quite random (see
 comments on the ticket).  To verify it, we need to be able to test
 the random boolean array generation directly.  To do that, we have
 to either expose the method (at least as protected) in the KS class
 or add it somewhere else.  I propose the latter but am not sure
 where to put it.  For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).  It
 could go in the swelling MathArrays class, or RandomDataGenerator.
 The latter probably makes more sense, but the API does not fit too
 well.  Any ideas?
 If it is just for testing purposes, you can also access the method in
 question via reflection, see an example here:


 http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
 Do you think it *should* be a private method of the K-S class?
 Right now, I do not see much uses outside the class, but if we decide to
 make it public then I would prefer a special util class in the random
 package to avoid cluttering the MathArrays class.

 OK, for now we can make it private and use the reflection method
 above to test it.

 ok, but I guess it is also fine to make it package private as sebb
 suggested. We did something similar recently for some of the improved
 sampling methods provided by Otmar.

 Regarding the RandomDataGenerator: I think this class should be
 deprecated and replaced by a Sampler interface as proposed by Gilles.

 Please consider keeping this class.  Consider this a user request.
 I have quite a few applications that use this class for two reasons:

 ok, the reason why I thought the class should be deprecated is because
 it was not kept up-to-date with all the new discrete and continuous
 distributions that we added in the last 2-3 years. If you think it is
 useful, then we can keep it of course.

 1.  One object instance tied to one PRNG that generates data from
 multiple different distributions.   This is convenient.   Sure, I
 could refactor all of these apps to instantiate new objects for each
 type of generated data and hopefully still be able to peg them to
 one PRNG; but that is needless work that also complicates the code.

 2.  There are quite a few methods in this class that have nothing to
 do with sampling (nextPermutation, nextHexString, nextSecureXxx,
 etc) but which conveniently share the RandomGenerator.  I guess the
 utility methods get moved out somewhere else.  Again, I end up
 having to refactor all of my code that uses them and when I want
 simulations to be based on a single PRNG, I have to find a way to
 pass the RandomGenerator around to them.

 I don't yet see the need to refactor the sampling support in the
 distributions package; but as my own apps are not impacted by this,
 if everyone else sees the user impact of the refactoring as
 outweighed by the benefit, I won't stand in the way.   Please lets
 just keep the RandomDataGenerator convenience class in the random
 package in any case.  I will take care of whatever adjustments are
 needed to adapt to whatever we settle on for sampling in the
 distributions package.

 Well, it is not really necessary to do everything together and refactor
 the distributions.

 Probably it is better to start the other way round, and describe what I
 want to add, and see how other things fit in:

  * I want a generic Sampler interface, i.e. something like this:
  ** nextSample()
  ** nextSamples(int size)
  ** nextSamples(double[] samples)

 +1

 there could be a DiscreteSampler and ContinuousSampler interface to
 handle the cases for int / double values.

 Perhaps the name should be IntegerSampler and DoubleSampler, to accomodate
 future needs (LongSampler, BooleanSampler (?)).
 I would prefer consistent names for distributions and corresponding
 samplers. The support of distributions must be of the same data type
 as the return values of the corresponding sampler. Therefore, I would
 call the samplers for RealDistribution and IntegerDistribution
 RealSampler and IntegerSampler, respectively.

+1

Phil

 The distributions could either be changed to return such a sampler as
 Gilles proposed (with the advantage that no random instance is tied to
 the distribution itself), or implement the interface directly (with the
 advantage that we would not need to refactor too much).

 Unless I'm missing something, the refactoring would be fairly the 

Re: [math] random boolean arrays

2015-07-13 Thread Otmar Ertl
On Mon, Jul 13, 2015 at 3:51 PM, Gilles gil...@harfang.homelinux.org
wrote:
 On Mon, 13 Jul 2015 06:30:44 -0700, Phil Steitz wrote:

 On 7/12/15 9:45 PM, Otmar Ertl wrote:

 On Sun, Jul 12, 2015 at 8:16 PM, Gilles gil...@harfang.homelinux.org
 wrote:

 On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote:

 On 07/12/2015 04:58 PM, Phil Steitz wrote:

 On 7/12/15 2:50 AM, Thomas Neidhart wrote:

 On 07/11/2015 09:43 PM, Phil Steitz wrote:

 On 7/11/15 12:29 PM, Thomas Neidhart wrote:

 On 07/11/2015 09:08 PM, Phil Steitz wrote:

 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean
 arrays.
 Unfortunately, I think the implementation is not quite random
 (see
 comments on the ticket). To verify it, we need to be able to test
 the random boolean array generation directly. To do that, we have
 to either expose the method (at least as protected) in the KS
 class
 or add it somewhere else. I propose the latter but am not sure
 where to put it. For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).
 It
 could go in the swelling MathArrays class, or
RandomDataGenerator.
 The latter probably makes more sense, but the API does not fit
too
 well. Any ideas?

 If it is just for testing purposes, you can also access the method
 in
 question via reflection, see an example here:





http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes

 Do you think it *should* be a private method of the K-S class?

 Right now, I do not see much uses outside the class, but if we
decide
 to
 make it public then I would prefer a special util class in the
random
 package to avoid cluttering the MathArrays class.


 OK, for now we can make it private and use the reflection method
 above to test it.


 ok, but I guess it is also fine to make it package private as sebb
 suggested. We did something similar recently for some of the improved
 sampling methods provided by Otmar.

 Regarding the RandomDataGenerator: I think this class should be
 deprecated and replaced by a Sampler interface as proposed by
Gilles.


 Please consider keeping this class. Consider this a user request.
 I have quite a few applications that use this class for two reasons:


 ok, the reason why I thought the class should be deprecated is because
 it was not kept up-to-date with all the new discrete and continuous
 distributions that we added in the last 2-3 years. If you think it is
 useful, then we can keep it of course.

 1. One object instance tied to one PRNG that generates data from
 multiple different distributions. This is convenient. Sure, I
 could refactor all of these apps to instantiate new objects for each
 type of generated data and hopefully still be able to peg them to
 one PRNG; but that is needless work that also complicates the code.

 2. There are quite a few methods in this class that have nothing to
 do with sampling (nextPermutation, nextHexString, nextSecureXxx,
 etc) but which conveniently share the RandomGenerator. I guess the
 utility methods get moved out somewhere else. Again, I end up
 having to refactor all of my code that uses them and when I want
 simulations to be based on a single PRNG, I have to find a way to
 pass the RandomGenerator around to them.

 I don't yet see the need to refactor the sampling support in the
 distributions package; but as my own apps are not impacted by this,
 if everyone else sees the user impact of the refactoring as
 outweighed by the benefit, I won't stand in the way. Please lets
 just keep the RandomDataGenerator convenience class in the random
 package in any case. I will take care of whatever adjustments are
 needed to adapt to whatever we settle on for sampling in the
 distributions package.


 Well, it is not really necessary to do everything together and
refactor
 the distributions.

 Probably it is better to start the other way round, and describe what
I
 want to add, and see how other things fit in:

 * I want a generic Sampler interface, i.e. something like this:
 ** nextSample()
 ** nextSamples(int size)
 ** nextSamples(double[] samples)


 +1

Do we really need all those 3 methods? If the functionality provided by the
two latter methods is essential for convenience reasons, we could also
offer utility functions that are able to fill an array with random numbers
from a given sampler, e.g. MathArrays.fill(array, sampler) or
array=MathArrays.generate(sampler, size).

 there could be a DiscreteSampler and ContinuousSampler interface to
 handle the cases for int / double values.


 Perhaps the name should be IntegerSampler and DoubleSampler, to
 accomodate
 future needs (LongSampler, BooleanSampler (?)).

 I would prefer consistent names for distributions and corresponding
 samplers. The support of distributions must be of the same data type
 as the return values of the corresponding sampler. Therefore, I would
 

Re: [math] random boolean arrays

2015-07-13 Thread Phil Steitz
On 7/13/15 12:16 PM, Otmar Ertl wrote:
 On Mon, Jul 13, 2015 at 3:51 PM, Gilles gil...@harfang.homelinux.org
 wrote:
 On Mon, 13 Jul 2015 06:30:44 -0700, Phil Steitz wrote:
 On 7/12/15 9:45 PM, Otmar Ertl wrote:
 On Sun, Jul 12, 2015 at 8:16 PM, Gilles gil...@harfang.homelinux.org
 wrote:
 On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote:
 On 07/12/2015 04:58 PM, Phil Steitz wrote:
 On 7/12/15 2:50 AM, Thomas Neidhart wrote:
 On 07/11/2015 09:43 PM, Phil Steitz wrote:
 On 7/11/15 12:29 PM, Thomas Neidhart wrote:
 On 07/11/2015 09:08 PM, Phil Steitz wrote:
 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean
 arrays.
 Unfortunately, I think the implementation is not quite random
 (see
 comments on the ticket). To verify it, we need to be able to test
 the random boolean array generation directly. To do that, we have
 to either expose the method (at least as protected) in the KS
 class
 or add it somewhere else. I propose the latter but am not sure
 where to put it. For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).
 It
 could go in the swelling MathArrays class, or
 RandomDataGenerator.
 The latter probably makes more sense, but the API does not fit
 too
 well. Any ideas?
 If it is just for testing purposes, you can also access the method
 in
 question via reflection, see an example here:





 http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
 Do you think it *should* be a private method of the K-S class?
 Right now, I do not see much uses outside the class, but if we
 decide
 to
 make it public then I would prefer a special util class in the
 random
 package to avoid cluttering the MathArrays class.

 OK, for now we can make it private and use the reflection method
 above to test it.

 ok, but I guess it is also fine to make it package private as sebb
 suggested. We did something similar recently for some of the improved
 sampling methods provided by Otmar.

 Regarding the RandomDataGenerator: I think this class should be
 deprecated and replaced by a Sampler interface as proposed by
 Gilles.

 Please consider keeping this class. Consider this a user request.
 I have quite a few applications that use this class for two reasons:

 ok, the reason why I thought the class should be deprecated is because
 it was not kept up-to-date with all the new discrete and continuous
 distributions that we added in the last 2-3 years. If you think it is
 useful, then we can keep it of course.

 1. One object instance tied to one PRNG that generates data from
 multiple different distributions. This is convenient. Sure, I
 could refactor all of these apps to instantiate new objects for each
 type of generated data and hopefully still be able to peg them to
 one PRNG; but that is needless work that also complicates the code.

 2. There are quite a few methods in this class that have nothing to
 do with sampling (nextPermutation, nextHexString, nextSecureXxx,
 etc) but which conveniently share the RandomGenerator. I guess the
 utility methods get moved out somewhere else. Again, I end up
 having to refactor all of my code that uses them and when I want
 simulations to be based on a single PRNG, I have to find a way to
 pass the RandomGenerator around to them.

 I don't yet see the need to refactor the sampling support in the
 distributions package; but as my own apps are not impacted by this,
 if everyone else sees the user impact of the refactoring as
 outweighed by the benefit, I won't stand in the way. Please lets
 just keep the RandomDataGenerator convenience class in the random
 package in any case. I will take care of whatever adjustments are
 needed to adapt to whatever we settle on for sampling in the
 distributions package.

 Well, it is not really necessary to do everything together and
 refactor
 the distributions.

 Probably it is better to start the other way round, and describe what
 I
 want to add, and see how other things fit in:

 * I want a generic Sampler interface, i.e. something like this:
 ** nextSample()
 ** nextSamples(int size)
 ** nextSamples(double[] samples)

 +1

 Do we really need all those 3 methods? If the functionality provided by the
 two latter methods is essential for convenience reasons, we could also
 offer utility functions that are able to fill an array with random numbers
 from a given sampler, e.g. MathArrays.fill(array, sampler) or
 array=MathArrays.generate(sampler, size).

In some cases, there may be distribution-specific algorithms for
generating sequences of values that are more efficient than just
calling sample() repeatedly.  So while providing a default impl in
an abstract sampler that just calls sample() repeatedly would make
sense; it might also make sense to allow the distribution to
override with something more efficient.   An example (not exposed
now) is sampling from a 

Re: [math] random boolean arrays

2015-07-13 Thread Phil Steitz
On 7/13/15 12:55 PM, Phil Steitz wrote:
 On 7/13/15 12:16 PM, Otmar Ertl wrote:
 On Mon, Jul 13, 2015 at 3:51 PM, Gilles gil...@harfang.homelinux.org
 wrote:
 On Mon, 13 Jul 2015 06:30:44 -0700, Phil Steitz wrote:
 On 7/12/15 9:45 PM, Otmar Ertl wrote:
 On Sun, Jul 12, 2015 at 8:16 PM, Gilles gil...@harfang.homelinux.org
 wrote:
 On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote:
 On 07/12/2015 04:58 PM, Phil Steitz wrote:
 On 7/12/15 2:50 AM, Thomas Neidhart wrote:
 On 07/11/2015 09:43 PM, Phil Steitz wrote:
 On 7/11/15 12:29 PM, Thomas Neidhart wrote:
 On 07/11/2015 09:08 PM, Phil Steitz wrote:
 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean
 arrays.
 Unfortunately, I think the implementation is not quite random
 (see
 comments on the ticket). To verify it, we need to be able to test
 the random boolean array generation directly. To do that, we have
 to either expose the method (at least as protected) in the KS
 class
 or add it somewhere else. I propose the latter but am not sure
 where to put it. For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).
 It
 could go in the swelling MathArrays class, or
 RandomDataGenerator.
 The latter probably makes more sense, but the API does not fit
 too
 well. Any ideas?
 If it is just for testing purposes, you can also access the method
 in
 question via reflection, see an example here:





 http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
 Do you think it *should* be a private method of the K-S class?
 Right now, I do not see much uses outside the class, but if we
 decide
 to
 make it public then I would prefer a special util class in the
 random
 package to avoid cluttering the MathArrays class.
 OK, for now we can make it private and use the reflection method
 above to test it.
 ok, but I guess it is also fine to make it package private as sebb
 suggested. We did something similar recently for some of the improved
 sampling methods provided by Otmar.

 Regarding the RandomDataGenerator: I think this class should be
 deprecated and replaced by a Sampler interface as proposed by
 Gilles.
 Please consider keeping this class. Consider this a user request.
 I have quite a few applications that use this class for two reasons:
 ok, the reason why I thought the class should be deprecated is because
 it was not kept up-to-date with all the new discrete and continuous
 distributions that we added in the last 2-3 years. If you think it is
 useful, then we can keep it of course.

 1. One object instance tied to one PRNG that generates data from
 multiple different distributions. This is convenient. Sure, I
 could refactor all of these apps to instantiate new objects for each
 type of generated data and hopefully still be able to peg them to
 one PRNG; but that is needless work that also complicates the code.

 2. There are quite a few methods in this class that have nothing to
 do with sampling (nextPermutation, nextHexString, nextSecureXxx,
 etc) but which conveniently share the RandomGenerator. I guess the
 utility methods get moved out somewhere else. Again, I end up
 having to refactor all of my code that uses them and when I want
 simulations to be based on a single PRNG, I have to find a way to
 pass the RandomGenerator around to them.

 I don't yet see the need to refactor the sampling support in the
 distributions package; but as my own apps are not impacted by this,
 if everyone else sees the user impact of the refactoring as
 outweighed by the benefit, I won't stand in the way. Please lets
 just keep the RandomDataGenerator convenience class in the random
 package in any case. I will take care of whatever adjustments are
 needed to adapt to whatever we settle on for sampling in the
 distributions package.
 Well, it is not really necessary to do everything together and
 refactor
 the distributions.

 Probably it is better to start the other way round, and describe what
 I
 want to add, and see how other things fit in:

 * I want a generic Sampler interface, i.e. something like this:
 ** nextSample()
 ** nextSamples(int size)
 ** nextSamples(double[] samples)
 +1

 Do we really need all those 3 methods? If the functionality provided by the
 two latter methods is essential for convenience reasons, we could also
 offer utility functions that are able to fill an array with random numbers
 from a given sampler, e.g. MathArrays.fill(array, sampler) or
 array=MathArrays.generate(sampler, size).
 In some cases, there may be distribution-specific algorithms for
 generating sequences of values that are more efficient than just
 calling sample() repeatedly.  So while providing a default impl in
 an abstract sampler that just calls sample() repeatedly would make
 sense; it might also make sense to allow the distribution to
 override with something more efficient.   An 

Re: [math] random boolean arrays

2015-07-13 Thread Otmar Ertl
On Mon, Jul 13, 2015 at 9:59 PM, Phil Steitz phil.ste...@gmail.com wrote:
 On 7/13/15 12:55 PM, Phil Steitz wrote:
 On 7/13/15 12:16 PM, Otmar Ertl wrote:
 On Mon, Jul 13, 2015 at 3:51 PM, Gilles gil...@harfang.homelinux.org
 wrote:
 On Mon, 13 Jul 2015 06:30:44 -0700, Phil Steitz wrote:
 On 7/12/15 9:45 PM, Otmar Ertl wrote:
 On Sun, Jul 12, 2015 at 8:16 PM, Gilles gil...@harfang.homelinux.org
 wrote:
 On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote:
 On 07/12/2015 04:58 PM, Phil Steitz wrote:
 On 7/12/15 2:50 AM, Thomas Neidhart wrote:
 On 07/11/2015 09:43 PM, Phil Steitz wrote:
 On 7/11/15 12:29 PM, Thomas Neidhart wrote:
 On 07/11/2015 09:08 PM, Phil Steitz wrote:
 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean
 arrays.
 Unfortunately, I think the implementation is not quite random
 (see
 comments on the ticket). To verify it, we need to be able to test
 the random boolean array generation directly. To do that, we have
 to either expose the method (at least as protected) in the KS
 class
 or add it somewhere else. I propose the latter but am not sure
 where to put it. For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).
 It
 could go in the swelling MathArrays class, or
 RandomDataGenerator.
 The latter probably makes more sense, but the API does not fit
 too
 well. Any ideas?
 If it is just for testing purposes, you can also access the method
 in
 question via reflection, see an example here:





 http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
 Do you think it *should* be a private method of the K-S class?
 Right now, I do not see much uses outside the class, but if we
 decide
 to
 make it public then I would prefer a special util class in the
 random
 package to avoid cluttering the MathArrays class.
 OK, for now we can make it private and use the reflection method
 above to test it.
 ok, but I guess it is also fine to make it package private as sebb
 suggested. We did something similar recently for some of the improved
 sampling methods provided by Otmar.

 Regarding the RandomDataGenerator: I think this class should be
 deprecated and replaced by a Sampler interface as proposed by
 Gilles.
 Please consider keeping this class. Consider this a user request.
 I have quite a few applications that use this class for two reasons:
 ok, the reason why I thought the class should be deprecated is because
 it was not kept up-to-date with all the new discrete and continuous
 distributions that we added in the last 2-3 years. If you think it is
 useful, then we can keep it of course.

 1. One object instance tied to one PRNG that generates data from
 multiple different distributions. This is convenient. Sure, I
 could refactor all of these apps to instantiate new objects for each
 type of generated data and hopefully still be able to peg them to
 one PRNG; but that is needless work that also complicates the code.

 2. There are quite a few methods in this class that have nothing to
 do with sampling (nextPermutation, nextHexString, nextSecureXxx,
 etc) but which conveniently share the RandomGenerator. I guess the
 utility methods get moved out somewhere else. Again, I end up
 having to refactor all of my code that uses them and when I want
 simulations to be based on a single PRNG, I have to find a way to
 pass the RandomGenerator around to them.

 I don't yet see the need to refactor the sampling support in the
 distributions package; but as my own apps are not impacted by this,
 if everyone else sees the user impact of the refactoring as
 outweighed by the benefit, I won't stand in the way. Please lets
 just keep the RandomDataGenerator convenience class in the random
 package in any case. I will take care of whatever adjustments are
 needed to adapt to whatever we settle on for sampling in the
 distributions package.
 Well, it is not really necessary to do everything together and
 refactor
 the distributions.

 Probably it is better to start the other way round, and describe what
 I
 want to add, and see how other things fit in:

 * I want a generic Sampler interface, i.e. something like this:
 ** nextSample()
 ** nextSamples(int size)
 ** nextSamples(double[] samples)
 +1

 Do we really need all those 3 methods? If the functionality provided by the
 two latter methods is essential for convenience reasons, we could also
 offer utility functions that are able to fill an array with random numbers
 from a given sampler, e.g. MathArrays.fill(array, sampler) or
 array=MathArrays.generate(sampler, size).
 In some cases, there may be distribution-specific algorithms for
 generating sequences of values that are more efficient than just
 calling sample() repeatedly.  So while providing a default impl in
 an abstract sampler that just calls sample() repeatedly would make
 sense; it might also make sense to 

Re: [math] random boolean arrays

2015-07-13 Thread Phil Steitz
On 7/13/15 1:19 PM, Otmar Ertl wrote:
 On Mon, Jul 13, 2015 at 9:59 PM, Phil Steitz phil.ste...@gmail.com wrote:
 On 7/13/15 12:55 PM, Phil Steitz wrote:
 On 7/13/15 12:16 PM, Otmar Ertl wrote:
 On Mon, Jul 13, 2015 at 3:51 PM, Gilles gil...@harfang.homelinux.org
 wrote:
 On Mon, 13 Jul 2015 06:30:44 -0700, Phil Steitz wrote:
 On 7/12/15 9:45 PM, Otmar Ertl wrote:
 On Sun, Jul 12, 2015 at 8:16 PM, Gilles gil...@harfang.homelinux.org
 wrote:
 On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote:
 On 07/12/2015 04:58 PM, Phil Steitz wrote:
 On 7/12/15 2:50 AM, Thomas Neidhart wrote:
 On 07/11/2015 09:43 PM, Phil Steitz wrote:
 On 7/11/15 12:29 PM, Thomas Neidhart wrote:
 On 07/11/2015 09:08 PM, Phil Steitz wrote:
 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean
 arrays.
 Unfortunately, I think the implementation is not quite random
 (see
 comments on the ticket). To verify it, we need to be able to test
 the random boolean array generation directly. To do that, we have
 to either expose the method (at least as protected) in the KS
 class
 or add it somewhere else. I propose the latter but am not sure
 where to put it. For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).
 It
 could go in the swelling MathArrays class, or
 RandomDataGenerator.
 The latter probably makes more sense, but the API does not fit
 too
 well. Any ideas?
 If it is just for testing purposes, you can also access the method
 in
 question via reflection, see an example here:





 http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
 Do you think it *should* be a private method of the K-S class?
 Right now, I do not see much uses outside the class, but if we
 decide
 to
 make it public then I would prefer a special util class in the
 random
 package to avoid cluttering the MathArrays class.
 OK, for now we can make it private and use the reflection method
 above to test it.
 ok, but I guess it is also fine to make it package private as sebb
 suggested. We did something similar recently for some of the improved
 sampling methods provided by Otmar.

 Regarding the RandomDataGenerator: I think this class should be
 deprecated and replaced by a Sampler interface as proposed by
 Gilles.
 Please consider keeping this class. Consider this a user request.
 I have quite a few applications that use this class for two reasons:
 ok, the reason why I thought the class should be deprecated is because
 it was not kept up-to-date with all the new discrete and continuous
 distributions that we added in the last 2-3 years. If you think it is
 useful, then we can keep it of course.

 1. One object instance tied to one PRNG that generates data from
 multiple different distributions. This is convenient. Sure, I
 could refactor all of these apps to instantiate new objects for each
 type of generated data and hopefully still be able to peg them to
 one PRNG; but that is needless work that also complicates the code.

 2. There are quite a few methods in this class that have nothing to
 do with sampling (nextPermutation, nextHexString, nextSecureXxx,
 etc) but which conveniently share the RandomGenerator. I guess the
 utility methods get moved out somewhere else. Again, I end up
 having to refactor all of my code that uses them and when I want
 simulations to be based on a single PRNG, I have to find a way to
 pass the RandomGenerator around to them.

 I don't yet see the need to refactor the sampling support in the
 distributions package; but as my own apps are not impacted by this,
 if everyone else sees the user impact of the refactoring as
 outweighed by the benefit, I won't stand in the way. Please lets
 just keep the RandomDataGenerator convenience class in the random
 package in any case. I will take care of whatever adjustments are
 needed to adapt to whatever we settle on for sampling in the
 distributions package.
 Well, it is not really necessary to do everything together and
 refactor
 the distributions.

 Probably it is better to start the other way round, and describe what
 I
 want to add, and see how other things fit in:

 * I want a generic Sampler interface, i.e. something like this:
 ** nextSample()
 ** nextSamples(int size)
 ** nextSamples(double[] samples)
 +1

 Do we really need all those 3 methods? If the functionality provided by the
 two latter methods is essential for convenience reasons, we could also
 offer utility functions that are able to fill an array with random numbers
 from a given sampler, e.g. MathArrays.fill(array, sampler) or
 array=MathArrays.generate(sampler, size).
 In some cases, there may be distribution-specific algorithms for
 generating sequences of values that are more efficient than just
 calling sample() repeatedly.  So while providing a default impl in
 an abstract sampler that just calls sample() repeatedly would make

Re: [math] random boolean arrays

2015-07-12 Thread Phil Steitz
On 7/12/15 10:38 AM, Thomas Neidhart wrote:
 On 07/12/2015 04:58 PM, Phil Steitz wrote:
 On 7/12/15 2:50 AM, Thomas Neidhart wrote:
 On 07/11/2015 09:43 PM, Phil Steitz wrote:
 On 7/11/15 12:29 PM, Thomas Neidhart wrote:
 On 07/11/2015 09:08 PM, Phil Steitz wrote:
 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean arrays.
   Unfortunately, I think the implementation is not quite random (see
 comments on the ticket).  To verify it, we need to be able to test
 the random boolean array generation directly.  To do that, we have
 to either expose the method (at least as protected) in the KS class
 or add it somewhere else.  I propose the latter but am not sure
 where to put it.  For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).  It
 could go in the swelling MathArrays class, or RandomDataGenerator. 
 The latter probably makes more sense, but the API does not fit too
 well.  Any ideas?
 If it is just for testing purposes, you can also access the method in
 question via reflection, see an example here:
 http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
 Do you think it *should* be a private method of the K-S class?
 Right now, I do not see much uses outside the class, but if we decide to
 make it public then I would prefer a special util class in the random
 package to avoid cluttering the MathArrays class.
 OK, for now we can make it private and use the reflection method
 above to test it.
 ok, but I guess it is also fine to make it package private as sebb
 suggested. We did something similar recently for some of the improved
 sampling methods provided by Otmar.

 Regarding the RandomDataGenerator: I think this class should be
 deprecated and replaced by a Sampler interface as proposed by Gilles.
 Please consider keeping this class.  Consider this a user request. 
 I have quite a few applications that use this class for two reasons:
 ok, the reason why I thought the class should be deprecated is because
 it was not kept up-to-date with all the new discrete and continuous
 distributions that we added in the last 2-3 years. If you think it is
 useful, then we can keep it of course.

Thanks.  I don't think it needs to include data generation methods
for all distributions.  Just the ones most commonly used.  It is a
convenience class providing common random data generation methods
using a shared RandomGenerator.  All the distributions that I need
are there.  We can add others if / when users request them / provide
patches.

Phil

 1.  One object instance tied to one PRNG that generates data from
 multiple different distributions.   This is convenient.   Sure, I
 could refactor all of these apps to instantiate new objects for each
 type of generated data and hopefully still be able to peg them to
 one PRNG; but that is needless work that also complicates the code.

 2.  There are quite a few methods in this class that have nothing to
 do with sampling (nextPermutation, nextHexString, nextSecureXxx,
 etc) but which conveniently share the RandomGenerator.  I guess the
 utility methods get moved out somewhere else.  Again, I end up
 having to refactor all of my code that uses them and when I want
 simulations to be based on a single PRNG, I have to find a way to
 pass the RandomGenerator around to them.

 I don't yet see the need to refactor the sampling support in the
 distributions package; but as my own apps are not impacted by this,
 if everyone else sees the user impact of the refactoring as
 outweighed by the benefit, I won't stand in the way.   Please lets
 just keep the RandomDataGenerator convenience class in the random
 package in any case.  I will take care of whatever adjustments are
 needed to adapt to whatever we settle on for sampling in the
 distributions package.
 Well, it is not really necessary to do everything together and refactor
 the distributions.

 Probably it is better to start the other way round, and describe what I
 want to add, and see how other things fit in:

  * I want a generic Sampler interface, i.e. something like this:
  ** nextSample()
  ** nextSamples(int size)
  ** nextSamples(double[] samples)

 there could be a DiscreteSampler and ContinuousSampler interface to
 handle the cases for int / double values.

 The distributions could either be changed to return such a sampler as
 Gilles proposed (with the advantage that no random instance is tied to
 the distribution itself), or implement the interface directly (with the
 advantage that we would not need to refactor too much).

 One can then create a sampler for any distribution or from other
 sources, e.g. when needing a fast and efficient sampler without
 replacement (see MATH-1239).
 +1 for sequential sampling.  I don't follow exactly why that
 requires refactoring the distributions; but if it helps in a way I
 don't yet understand, 

Re: [math] random boolean arrays

2015-07-12 Thread Otmar Ertl
On Sun, Jul 12, 2015 at 8:16 PM, Gilles gil...@harfang.homelinux.org wrote:
 On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote:

 On 07/12/2015 04:58 PM, Phil Steitz wrote:

 On 7/12/15 2:50 AM, Thomas Neidhart wrote:

 On 07/11/2015 09:43 PM, Phil Steitz wrote:

 On 7/11/15 12:29 PM, Thomas Neidhart wrote:

 On 07/11/2015 09:08 PM, Phil Steitz wrote:

 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean arrays.
   Unfortunately, I think the implementation is not quite random (see
 comments on the ticket).  To verify it, we need to be able to test
 the random boolean array generation directly.  To do that, we have
 to either expose the method (at least as protected) in the KS class
 or add it somewhere else.  I propose the latter but am not sure
 where to put it.  For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).  It
 could go in the swelling MathArrays class, or RandomDataGenerator.
 The latter probably makes more sense, but the API does not fit too
 well.  Any ideas?

 If it is just for testing purposes, you can also access the method in
 question via reflection, see an example here:


 http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes

 Do you think it *should* be a private method of the K-S class?

 Right now, I do not see much uses outside the class, but if we decide to
 make it public then I would prefer a special util class in the random
 package to avoid cluttering the MathArrays class.


 OK, for now we can make it private and use the reflection method
 above to test it.


 ok, but I guess it is also fine to make it package private as sebb
 suggested. We did something similar recently for some of the improved
 sampling methods provided by Otmar.

 Regarding the RandomDataGenerator: I think this class should be
 deprecated and replaced by a Sampler interface as proposed by Gilles.


 Please consider keeping this class.  Consider this a user request.
 I have quite a few applications that use this class for two reasons:


 ok, the reason why I thought the class should be deprecated is because
 it was not kept up-to-date with all the new discrete and continuous
 distributions that we added in the last 2-3 years. If you think it is
 useful, then we can keep it of course.

 1.  One object instance tied to one PRNG that generates data from
 multiple different distributions.   This is convenient.   Sure, I
 could refactor all of these apps to instantiate new objects for each
 type of generated data and hopefully still be able to peg them to
 one PRNG; but that is needless work that also complicates the code.

 2.  There are quite a few methods in this class that have nothing to
 do with sampling (nextPermutation, nextHexString, nextSecureXxx,
 etc) but which conveniently share the RandomGenerator.  I guess the
 utility methods get moved out somewhere else.  Again, I end up
 having to refactor all of my code that uses them and when I want
 simulations to be based on a single PRNG, I have to find a way to
 pass the RandomGenerator around to them.

 I don't yet see the need to refactor the sampling support in the
 distributions package; but as my own apps are not impacted by this,
 if everyone else sees the user impact of the refactoring as
 outweighed by the benefit, I won't stand in the way.   Please lets
 just keep the RandomDataGenerator convenience class in the random
 package in any case.  I will take care of whatever adjustments are
 needed to adapt to whatever we settle on for sampling in the
 distributions package.


 Well, it is not really necessary to do everything together and refactor
 the distributions.

 Probably it is better to start the other way round, and describe what I
 want to add, and see how other things fit in:

  * I want a generic Sampler interface, i.e. something like this:
  ** nextSample()
  ** nextSamples(int size)
  ** nextSamples(double[] samples)


 +1

 there could be a DiscreteSampler and ContinuousSampler interface to
 handle the cases for int / double values.


 Perhaps the name should be IntegerSampler and DoubleSampler, to accomodate
 future needs (LongSampler, BooleanSampler (?)).

I would prefer consistent names for distributions and corresponding
samplers. The support of distributions must be of the same data type
as the return values of the corresponding sampler. Therefore, I would
call the samplers for RealDistribution and IntegerDistribution
RealSampler and IntegerSampler, respectively.


 The distributions could either be changed to return such a sampler as
 Gilles proposed (with the advantage that no random instance is tied to
 the distribution itself), or implement the interface directly (with the
 advantage that we would not need to refactor too much).


 Unless I'm missing something, the refactoring would be fairly the same:
 The latter case needs 

Re: [math] random boolean arrays

2015-07-12 Thread Thomas Neidhart
On 07/11/2015 09:43 PM, Phil Steitz wrote:
 On 7/11/15 12:29 PM, Thomas Neidhart wrote:
 On 07/11/2015 09:08 PM, Phil Steitz wrote:
 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean arrays.
   Unfortunately, I think the implementation is not quite random (see
 comments on the ticket).  To verify it, we need to be able to test
 the random boolean array generation directly.  To do that, we have
 to either expose the method (at least as protected) in the KS class
 or add it somewhere else.  I propose the latter but am not sure
 where to put it.  For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).  It
 could go in the swelling MathArrays class, or RandomDataGenerator. 
 The latter probably makes more sense, but the API does not fit too
 well.  Any ideas?
 If it is just for testing purposes, you can also access the method in
 question via reflection, see an example here:
 http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
 
 Do you think it *should* be a private method of the K-S class?

Right now, I do not see much uses outside the class, but if we decide to
make it public then I would prefer a special util class in the random
package to avoid cluttering the MathArrays class.

Regarding the RandomDataGenerator: I think this class should be
deprecated and replaced by a Sampler interface as proposed by Gilles.
One can then create a sampler for any distribution or from other
sources, e.g. when needing a fast and efficient sampler without
replacement (see MATH-1239).

I did not have time to complete a patch, but am working on it.

Thomas

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] random boolean arrays

2015-07-12 Thread Thomas Neidhart
On 07/12/2015 04:58 PM, Phil Steitz wrote:
 On 7/12/15 2:50 AM, Thomas Neidhart wrote:
 On 07/11/2015 09:43 PM, Phil Steitz wrote:
 On 7/11/15 12:29 PM, Thomas Neidhart wrote:
 On 07/11/2015 09:08 PM, Phil Steitz wrote:
 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean arrays.
   Unfortunately, I think the implementation is not quite random (see
 comments on the ticket).  To verify it, we need to be able to test
 the random boolean array generation directly.  To do that, we have
 to either expose the method (at least as protected) in the KS class
 or add it somewhere else.  I propose the latter but am not sure
 where to put it.  For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).  It
 could go in the swelling MathArrays class, or RandomDataGenerator. 
 The latter probably makes more sense, but the API does not fit too
 well.  Any ideas?
 If it is just for testing purposes, you can also access the method in
 question via reflection, see an example here:
 http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
 Do you think it *should* be a private method of the K-S class?
 Right now, I do not see much uses outside the class, but if we decide to
 make it public then I would prefer a special util class in the random
 package to avoid cluttering the MathArrays class.
 
 OK, for now we can make it private and use the reflection method
 above to test it.

ok, but I guess it is also fine to make it package private as sebb
suggested. We did something similar recently for some of the improved
sampling methods provided by Otmar.

 Regarding the RandomDataGenerator: I think this class should be
 deprecated and replaced by a Sampler interface as proposed by Gilles.
 
 Please consider keeping this class.  Consider this a user request. 
 I have quite a few applications that use this class for two reasons:

ok, the reason why I thought the class should be deprecated is because
it was not kept up-to-date with all the new discrete and continuous
distributions that we added in the last 2-3 years. If you think it is
useful, then we can keep it of course.

 1.  One object instance tied to one PRNG that generates data from
 multiple different distributions.   This is convenient.   Sure, I
 could refactor all of these apps to instantiate new objects for each
 type of generated data and hopefully still be able to peg them to
 one PRNG; but that is needless work that also complicates the code.
 
 2.  There are quite a few methods in this class that have nothing to
 do with sampling (nextPermutation, nextHexString, nextSecureXxx,
 etc) but which conveniently share the RandomGenerator.  I guess the
 utility methods get moved out somewhere else.  Again, I end up
 having to refactor all of my code that uses them and when I want
 simulations to be based on a single PRNG, I have to find a way to
 pass the RandomGenerator around to them.
 
 I don't yet see the need to refactor the sampling support in the
 distributions package; but as my own apps are not impacted by this,
 if everyone else sees the user impact of the refactoring as
 outweighed by the benefit, I won't stand in the way.   Please lets
 just keep the RandomDataGenerator convenience class in the random
 package in any case.  I will take care of whatever adjustments are
 needed to adapt to whatever we settle on for sampling in the
 distributions package.

Well, it is not really necessary to do everything together and refactor
the distributions.

Probably it is better to start the other way round, and describe what I
want to add, and see how other things fit in:

 * I want a generic Sampler interface, i.e. something like this:
 ** nextSample()
 ** nextSamples(int size)
 ** nextSamples(double[] samples)

there could be a DiscreteSampler and ContinuousSampler interface to
handle the cases for int / double values.

The distributions could either be changed to return such a sampler as
Gilles proposed (with the advantage that no random instance is tied to
the distribution itself), or implement the interface directly (with the
advantage that we would not need to refactor too much).

 One can then create a sampler for any distribution or from other
 sources, e.g. when needing a fast and efficient sampler without
 replacement (see MATH-1239).
 
 +1 for sequential sampling.  I don't follow exactly why that
 requires refactoring the distributions; but if it helps in a way I
 don't yet understand, that will help convince me that refactoring
 sampling in the distributions package is worth the user pain.  

as I said above, I wanted to combine two things in one step, maybe it is
better to go step by step.

Thomas

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org


Re: [math] random boolean arrays

2015-07-12 Thread Gilles

On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote:

On 07/12/2015 04:58 PM, Phil Steitz wrote:

On 7/12/15 2:50 AM, Thomas Neidhart wrote:

On 07/11/2015 09:43 PM, Phil Steitz wrote:

On 7/11/15 12:29 PM, Thomas Neidhart wrote:

On 07/11/2015 09:08 PM, Phil Steitz wrote:

The code implemented in MATH-1242 to improve performance of KS
monteCarloP in-lines efficient generation of random boolean 
arrays.
  Unfortunately, I think the implementation is not quite random 
(see
comments on the ticket).  To verify it, we need to be able to 
test
the random boolean array generation directly.  To do that, we 
have
to either expose the method (at least as protected) in the KS 
class

or add it somewhere else.  I propose the latter but am not sure
where to put it.  For speed, we need to avoid array copies, so 
the
API will have to be something like randomize(boolean[], nTrue).  
It
could go in the swelling MathArrays class, or 
RandomDataGenerator.
The latter probably makes more sense, but the API does not fit 
too

well.  Any ideas?
If it is just for testing purposes, you can also access the 
method in

question via reflection, see an example here:

http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes

Do you think it *should* be a private method of the K-S class?
Right now, I do not see much uses outside the class, but if we 
decide to
make it public then I would prefer a special util class in the 
random

package to avoid cluttering the MathArrays class.


OK, for now we can make it private and use the reflection method
above to test it.


ok, but I guess it is also fine to make it package private as sebb
suggested. We did something similar recently for some of the improved
sampling methods provided by Otmar.


Regarding the RandomDataGenerator: I think this class should be
deprecated and replaced by a Sampler interface as proposed by 
Gilles.


Please consider keeping this class.  Consider this a user request.
I have quite a few applications that use this class for two reasons:


ok, the reason why I thought the class should be deprecated is 
because

it was not kept up-to-date with all the new discrete and continuous
distributions that we added in the last 2-3 years. If you think it is
useful, then we can keep it of course.


1.  One object instance tied to one PRNG that generates data from
multiple different distributions.   This is convenient.   Sure, I
could refactor all of these apps to instantiate new objects for each
type of generated data and hopefully still be able to peg them to
one PRNG; but that is needless work that also complicates the code.

2.  There are quite a few methods in this class that have nothing to
do with sampling (nextPermutation, nextHexString, nextSecureXxx,
etc) but which conveniently share the RandomGenerator.  I guess the
utility methods get moved out somewhere else.  Again, I end up
having to refactor all of my code that uses them and when I want
simulations to be based on a single PRNG, I have to find a way to
pass the RandomGenerator around to them.

I don't yet see the need to refactor the sampling support in the
distributions package; but as my own apps are not impacted by this,
if everyone else sees the user impact of the refactoring as
outweighed by the benefit, I won't stand in the way.   Please lets
just keep the RandomDataGenerator convenience class in the random
package in any case.  I will take care of whatever adjustments are
needed to adapt to whatever we settle on for sampling in the
distributions package.


Well, it is not really necessary to do everything together and 
refactor

the distributions.

Probably it is better to start the other way round, and describe what 
I

want to add, and see how other things fit in:

 * I want a generic Sampler interface, i.e. something like this:
 ** nextSample()
 ** nextSamples(int size)
 ** nextSamples(double[] samples)


+1


there could be a DiscreteSampler and ContinuousSampler interface to
handle the cases for int / double values.


Perhaps the name should be IntegerSampler and DoubleSampler, to 
accomodate

future needs (LongSampler, BooleanSampler (?)).


The distributions could either be changed to return such a sampler as
Gilles proposed (with the advantage that no random instance is tied 
to
the distribution itself), or implement the interface directly (with 
the

advantage that we would not need to refactor too much).


Unless I'm missing something, the refactoring would be fairly the same:
The latter case needs implementing 3 methods (2 new ones, one with a 
name

change).
The former needs implementing the factory method proposed in MATH-1158,
plus the same methods as above (wrapped in the object returned by the
factory method).


Gilles



One can then create a sampler for any distribution or from other
sources, e.g. when needing a fast and efficient sampler without
replacement (see MATH-1239).


+1 for sequential sampling.  I don't follow exactly why 

Re: [math] random boolean arrays

2015-07-12 Thread Phil Steitz
On 7/12/15 2:50 AM, Thomas Neidhart wrote:
 On 07/11/2015 09:43 PM, Phil Steitz wrote:
 On 7/11/15 12:29 PM, Thomas Neidhart wrote:
 On 07/11/2015 09:08 PM, Phil Steitz wrote:
 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean arrays.
   Unfortunately, I think the implementation is not quite random (see
 comments on the ticket).  To verify it, we need to be able to test
 the random boolean array generation directly.  To do that, we have
 to either expose the method (at least as protected) in the KS class
 or add it somewhere else.  I propose the latter but am not sure
 where to put it.  For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).  It
 could go in the swelling MathArrays class, or RandomDataGenerator. 
 The latter probably makes more sense, but the API does not fit too
 well.  Any ideas?
 If it is just for testing purposes, you can also access the method in
 question via reflection, see an example here:
 http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
 Do you think it *should* be a private method of the K-S class?
 Right now, I do not see much uses outside the class, but if we decide to
 make it public then I would prefer a special util class in the random
 package to avoid cluttering the MathArrays class.

OK, for now we can make it private and use the reflection method
above to test it.

 Regarding the RandomDataGenerator: I think this class should be
 deprecated and replaced by a Sampler interface as proposed by Gilles.

Please consider keeping this class.  Consider this a user request. 
I have quite a few applications that use this class for two reasons:

1.  One object instance tied to one PRNG that generates data from
multiple different distributions.   This is convenient.   Sure, I
could refactor all of these apps to instantiate new objects for each
type of generated data and hopefully still be able to peg them to
one PRNG; but that is needless work that also complicates the code.

2.  There are quite a few methods in this class that have nothing to
do with sampling (nextPermutation, nextHexString, nextSecureXxx,
etc) but which conveniently share the RandomGenerator.  I guess the
utility methods get moved out somewhere else.  Again, I end up
having to refactor all of my code that uses them and when I want
simulations to be based on a single PRNG, I have to find a way to
pass the RandomGenerator around to them.

I don't yet see the need to refactor the sampling support in the
distributions package; but as my own apps are not impacted by this,
if everyone else sees the user impact of the refactoring as
outweighed by the benefit, I won't stand in the way.   Please lets
just keep the RandomDataGenerator convenience class in the random
package in any case.  I will take care of whatever adjustments are
needed to adapt to whatever we settle on for sampling in the
distributions package.

 One can then create a sampler for any distribution or from other
 sources, e.g. when needing a fast and efficient sampler without
 replacement (see MATH-1239).

+1 for sequential sampling.  I don't follow exactly why that
requires refactoring the distributions; but if it helps in a way I
don't yet understand, that will help convince me that refactoring
sampling in the distributions package is worth the user pain.  

Phil

 I did not have time to complete a patch, but am working on it

 Thomas

 -
 To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
 For additional commands, e-mail: dev-h...@commons.apache.org





-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] random boolean arrays

2015-07-11 Thread Phil Steitz
On 7/11/15 12:29 PM, Thomas Neidhart wrote:
 On 07/11/2015 09:08 PM, Phil Steitz wrote:
 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean arrays.
   Unfortunately, I think the implementation is not quite random (see
 comments on the ticket).  To verify it, we need to be able to test
 the random boolean array generation directly.  To do that, we have
 to either expose the method (at least as protected) in the KS class
 or add it somewhere else.  I propose the latter but am not sure
 where to put it.  For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).  It
 could go in the swelling MathArrays class, or RandomDataGenerator. 
 The latter probably makes more sense, but the API does not fit too
 well.  Any ideas?
 If it is just for testing purposes, you can also access the method in
 question via reflection, see an example here:
 http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes

Do you think it *should* be a private method of the K-S class?

Phik
 Thomas

 -
 To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
 For additional commands, e-mail: dev-h...@commons.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[math] random boolean arrays

2015-07-11 Thread Phil Steitz
The code implemented in MATH-1242 to improve performance of KS
monteCarloP in-lines efficient generation of random boolean arrays.
  Unfortunately, I think the implementation is not quite random (see
comments on the ticket).  To verify it, we need to be able to test
the random boolean array generation directly.  To do that, we have
to either expose the method (at least as protected) in the KS class
or add it somewhere else.  I propose the latter but am not sure
where to put it.  For speed, we need to avoid array copies, so the
API will have to be something like randomize(boolean[], nTrue).  It
could go in the swelling MathArrays class, or RandomDataGenerator. 
The latter probably makes more sense, but the API does not fit too
well.  Any ideas?

Phil


-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] random boolean arrays

2015-07-11 Thread Thomas Neidhart
On 07/11/2015 09:08 PM, Phil Steitz wrote:
 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean arrays.
   Unfortunately, I think the implementation is not quite random (see
 comments on the ticket).  To verify it, we need to be able to test
 the random boolean array generation directly.  To do that, we have
 to either expose the method (at least as protected) in the KS class
 or add it somewhere else.  I propose the latter but am not sure
 where to put it.  For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).  It
 could go in the swelling MathArrays class, or RandomDataGenerator. 
 The latter probably makes more sense, but the API does not fit too
 well.  Any ideas?

If it is just for testing purposes, you can also access the method in
question via reflection, see an example here:
http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes

Thomas

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] random boolean arrays

2015-07-11 Thread sebb
On 11 July 2015 at 20:29, Thomas Neidhart thomas.neidh...@gmail.com wrote:
 On 07/11/2015 09:08 PM, Phil Steitz wrote:
 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean arrays.
   Unfortunately, I think the implementation is not quite random (see
 comments on the ticket).  To verify it, we need to be able to test
 the random boolean array generation directly.  To do that, we have
 to either expose the method (at least as protected) in the KS class
 or add it somewhere else.  I propose the latter but am not sure
 where to put it.  For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).  It
 could go in the swelling MathArrays class, or RandomDataGenerator.
 The latter probably makes more sense, but the API does not fit too
 well.  Any ideas?

 If it is just for testing purposes, you can also access the method in
 question via reflection, see an example here:
 http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes

Or make it package-protected (with a comment saying why this was done)
and create the unit test in the same package.

If the tests really need to go in a different package, then add a
public access method in the test class in the same package

 Thomas

 -
 To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
 For additional commands, e-mail: dev-h...@commons.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] random boolean arrays

2015-07-11 Thread Phil Steitz
On 7/11/15 3:38 PM, Gilles wrote:
 On Sat, 11 Jul 2015 15:12:29 -0700, Phil Steitz wrote:
 On 7/11/15 3:06 PM, Gilles wrote:
 On Sat, 11 Jul 2015 12:08:16 -0700, Phil Steitz wrote:
 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean
 arrays.
   Unfortunately, I think the implementation is not quite random
 (see
 comments on the ticket).  To verify it, we need to be able to test
 the random boolean array generation directly.  To do that, we have
 to either expose the method (at least as protected) in the KS
 class
 or add it somewhere else.  I propose the latter but am not sure
 where to put it.  For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).

 Are you sure that copying is an issue for speed?
 I mean, the randomization will anyways copy new values into the
 array...

 The algorithm is used in a tight loop that would be slowed down
 considerably by allocating and copying arrays to / from the stack.

 Numbers?
 IIRC, someone said: First make it correct, then make it faster (if
 necessary). [I.e. the computation may always be inlined later.]

This is part of an optimization.  It replaces a much slower
implementation that did a lot more copying of data.

 [And I don't get the copying to / from the stack...]

Yeah, that's not the issue.  It's all the allocations and garbage
collection that would slow things down. 

Phil

 Gilles

 It
 could go in the swelling MathArrays class, or RandomDataGenerator.
 The latter probably makes more sense, but the API does not fit too
 well.  Any ideas?

 Isn't there some relationship with
   https://issues.apache.org/jira/browse/MATH-1158
 ?

 No.

 Phil


 Gilles


 -
 To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
 For additional commands, e-mail: dev-h...@commons.apache.org





-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] random boolean arrays

2015-07-11 Thread Gilles

On Sat, 11 Jul 2015 15:12:29 -0700, Phil Steitz wrote:

On 7/11/15 3:06 PM, Gilles wrote:

On Sat, 11 Jul 2015 12:08:16 -0700, Phil Steitz wrote:

The code implemented in MATH-1242 to improve performance of KS
monteCarloP in-lines efficient generation of random boolean arrays.
  Unfortunately, I think the implementation is not quite random 
(see

comments on the ticket).  To verify it, we need to be able to test
the random boolean array generation directly.  To do that, we have
to either expose the method (at least as protected) in the KS class
or add it somewhere else.  I propose the latter but am not sure
where to put it.  For speed, we need to avoid array copies, so the
API will have to be something like randomize(boolean[], nTrue).


Are you sure that copying is an issue for speed?
I mean, the randomization will anyways copy new values into the
array...


The algorithm is used in a tight loop that would be slowed down
considerably by allocating and copying arrays to / from the stack.


Numbers?
IIRC, someone said: First make it correct, then make it faster (if
necessary). [I.e. the computation may always be inlined later.]

[And I don't get the copying to / from the stack...]

Gilles


It
could go in the swelling MathArrays class, or RandomDataGenerator.
The latter probably makes more sense, but the API does not fit too
well.  Any ideas?


Isn't there some relationship with
  https://issues.apache.org/jira/browse/MATH-1158
?


No.

Phil



Gilles



-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] random boolean arrays

2015-07-11 Thread Ole Ersoy

How about a Randomize utility class?

Cheers,
- Ole



On 07/11/2015 02:40 PM, sebb wrote:

On 11 July 2015 at 20:29, Thomas Neidhart thomas.neidh...@gmail.com wrote:

On 07/11/2015 09:08 PM, Phil Steitz wrote:

The code implemented in MATH-1242 to improve performance of KS
monteCarloP in-lines efficient generation of random boolean arrays.
   Unfortunately, I think the implementation is not quite random (see
comments on the ticket).  To verify it, we need to be able to test
the random boolean array generation directly.  To do that, we have
to either expose the method (at least as protected) in the KS class
or add it somewhere else.  I propose the latter but am not sure
where to put it.  For speed, we need to avoid array copies, so the
API will have to be something like randomize(boolean[], nTrue).  It
could go in the swelling MathArrays class, or RandomDataGenerator.
The latter probably makes more sense, but the API does not fit too
well.  Any ideas?

If it is just for testing purposes, you can also access the method in
question via reflection, see an example here:
http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes

Or make it package-protected (with a comment saying why this was done)
and create the unit test in the same package.

If the tests really need to go in a different package, then add a
public access method in the test class in the same package


Thomas

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

.




-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] random boolean arrays

2015-07-11 Thread Gilles

On Sat, 11 Jul 2015 12:08:16 -0700, Phil Steitz wrote:

The code implemented in MATH-1242 to improve performance of KS
monteCarloP in-lines efficient generation of random boolean arrays.
  Unfortunately, I think the implementation is not quite random (see
comments on the ticket).  To verify it, we need to be able to test
the random boolean array generation directly.  To do that, we have
to either expose the method (at least as protected) in the KS class
or add it somewhere else.  I propose the latter but am not sure
where to put it.  For speed, we need to avoid array copies, so the
API will have to be something like randomize(boolean[], nTrue).


Are you sure that copying is an issue for speed?
I mean, the randomization will anyways copy new values into the 
array...



It
could go in the swelling MathArrays class, or RandomDataGenerator.
The latter probably makes more sense, but the API does not fit too
well.  Any ideas?


Isn't there some relationship with
  https://issues.apache.org/jira/browse/MATH-1158
?


Gilles


-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] random boolean arrays

2015-07-11 Thread Phil Steitz
On 7/11/15 3:06 PM, Gilles wrote:
 On Sat, 11 Jul 2015 12:08:16 -0700, Phil Steitz wrote:
 The code implemented in MATH-1242 to improve performance of KS
 monteCarloP in-lines efficient generation of random boolean arrays.
   Unfortunately, I think the implementation is not quite random (see
 comments on the ticket).  To verify it, we need to be able to test
 the random boolean array generation directly.  To do that, we have
 to either expose the method (at least as protected) in the KS class
 or add it somewhere else.  I propose the latter but am not sure
 where to put it.  For speed, we need to avoid array copies, so the
 API will have to be something like randomize(boolean[], nTrue).

 Are you sure that copying is an issue for speed?
 I mean, the randomization will anyways copy new values into the
 array...

The algorithm is used in a tight loop that would be slowed down
considerably by allocating and copying arrays to / from the stack. 

 It
 could go in the swelling MathArrays class, or RandomDataGenerator.
 The latter probably makes more sense, but the API does not fit too
 well.  Any ideas?

 Isn't there some relationship with
   https://issues.apache.org/jira/browse/MATH-1158
 ?

No.

Phil


 Gilles


 -
 To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
 For additional commands, e-mail: dev-h...@commons.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org