Re: [math] random boolean arrays
On Mon, 13 Jul 2015 06:30:44 -0700, Phil Steitz wrote: On 7/12/15 9:45 PM, Otmar Ertl wrote: On Sun, Jul 12, 2015 at 8:16 PM, Gilles gil...@harfang.homelinux.org wrote: On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote: On 07/12/2015 04:58 PM, Phil Steitz wrote: On 7/12/15 2:50 AM, Thomas Neidhart wrote: On 07/11/2015 09:43 PM, Phil Steitz wrote: On 7/11/15 12:29 PM, Thomas Neidhart wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Do you think it *should* be a private method of the K-S class? Right now, I do not see much uses outside the class, but if we decide to make it public then I would prefer a special util class in the random package to avoid cluttering the MathArrays class. OK, for now we can make it private and use the reflection method above to test it. ok, but I guess it is also fine to make it package private as sebb suggested. We did something similar recently for some of the improved sampling methods provided by Otmar. 
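For readers following the randomize(boolean[], nTrue) idea above, here is a minimal sketch of one way such a method could work in place, without array copies. It is not the MATH-1242 code. The class name BooleanArrays, the extra rng parameter, and the use of java.util.Random instead of the Commons Math RandomGenerator are all assumptions made for illustration. The technique is a plain Fisher-Yates shuffle of an array that starts with the first nTrue slots set.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper; name and placement are assumptions, not from the thread. */
final class BooleanArrays {
    private BooleanArrays() {}

    /**
     * Overwrites b in place so that exactly nTrue entries are true.
     * Given an ideal generator, every arrangement is equally likely.
     */
    static void randomize(boolean[] b, int nTrue, Random rng) {
        if (nTrue < 0 || nTrue > b.length) {
            throw new IllegalArgumentException("nTrue out of range: " + nTrue);
        }
        // Start from a fixed arrangement: the first nTrue slots are true.
        for (int i = 0; i < b.length; i++) {
            b[i] = i < nTrue;
        }
        // Fisher-Yates shuffle: swap slot i with a uniformly chosen slot in [0, i].
        for (int i = b.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            boolean tmp = b[i];
            b[i] = b[j];
            b[j] = tmp;
        }
    }

    public static void main(String[] args) {
        boolean[] b = new boolean[10];
        randomize(b, 4, new Random(42L));
        System.out.println(Arrays.toString(b));
    }
}
```

Passing the generator explicitly keeps the method testable with a fixed seed, which is the kind of direct test the thread asks for.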
Regarding the RandomDataGenerator: I think this class should be deprecated and replaced by a Sampler interface as proposed by Gilles. Please consider keeping this class. Consider this a user request. I have quite a few applications that use this class for two reasons: ok, the reason why I thought the class should be deprecated is because it was not kept up-to-date with all the new discrete and continuous distributions that we added in the last 2-3 years. If you think it is useful, then we can keep it of course. 1. One object instance tied to one PRNG that generates data from multiple different distributions. This is convenient. Sure, I could refactor all of these apps to instantiate new objects for each type of generated data and hopefully still be able to peg them to one PRNG; but that is needless work that also complicates the code. 2. There are quite a few methods in this class that have nothing to do with sampling (nextPermutation, nextHexString, nextSecureXxx, etc) but which conveniently share the RandomGenerator. I guess the utility methods get moved out somewhere else. Again, I end up having to refactor all of my code that uses them and when I want simulations to be based on a single PRNG, I have to find a way to pass the RandomGenerator around to them. I don't yet see the need to refactor the sampling support in the distributions package; but as my own apps are not impacted by this, if everyone else sees the user impact of the refactoring as outweighed by the benefit, I won't stand in the way. Please let's just keep the RandomDataGenerator convenience class in the random package in any case. I will take care of whatever adjustments are needed to adapt to whatever we settle on for sampling in the distributions package. Well, it is not really necessary to do everything together and refactor the distributions. 
Probably it is better to start the other way round, and describe what I want to add, and see how other things fit in: * I want a generic Sampler interface, i.e. something like this: ** nextSample() ** nextSamples(int size) ** nextSamples(double[] samples) +1 there could be a DiscreteSampler and ContinuousSampler interface to handle the cases for int / double values. Perhaps the name should be IntegerSampler and DoubleSampler, to accommodate future needs (LongSampler, BooleanSampler (?)). I would prefer consistent names for distributions and corresponding samplers. The support of distributions must be of the same data type as the return values of the corresponding sampler. Therefore, I would call the samplers for RealDistribution and IntegerDistribution RealSampler and IntegerSampler, respectively. Ideally yes. But there is always the lingering question of what Real means: the mathematical abstraction or the numerical representation? The same real distributions could be implemented with floats. If we ever need to implement a discrete distribution that is able to use the long range, the Integer in
Re: [math] random boolean arrays
On 7/12/15 9:45 PM, Otmar Ertl wrote: On Sun, Jul 12, 2015 at 8:16 PM, Gilles gil...@harfang.homelinux.org wrote: On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote: On 07/12/2015 04:58 PM, Phil Steitz wrote: On 7/12/15 2:50 AM, Thomas Neidhart wrote: On 07/11/2015 09:43 PM, Phil Steitz wrote: On 7/11/15 12:29 PM, Thomas Neidhart wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Do you think it *should* be a private method of the K-S class? Right now, I do not see much uses outside the class, but if we decide to make it public then I would prefer a special util class in the random package to avoid cluttering the MathArrays class. OK, for now we can make it private and use the reflection method above to test it. ok, but I guess it is also fine to make it package private as sebb suggested. We did something similar recently for some of the improved sampling methods provided by Otmar. 
Regarding the RandomDataGenerator: I think this class should be deprecated and replaced by a Sampler interface as proposed by Gilles. Please consider keeping this class. Consider this a user request. I have quite a few applications that use this class for two reasons: ok, the reason why I thought the class should be deprecated is because it was not kept up-to-date with all the new discrete and continuous distributions that we added in the last 2-3 years. If you think it is useful, then we can keep it of course. 1. One object instance tied to one PRNG that generates data from multiple different distributions. This is convenient. Sure, I could refactor all of these apps to instantiate new objects for each type of generated data and hopefully still be able to peg them to one PRNG; but that is needless work that also complicates the code. 2. There are quite a few methods in this class that have nothing to do with sampling (nextPermutation, nextHexString, nextSecureXxx, etc) but which conveniently share the RandomGenerator. I guess the utility methods get moved out somewhere else. Again, I end up having to refactor all of my code that uses them and when I want simulations to be based on a single PRNG, I have to find a way to pass the RandomGenerator around to them. I don't yet see the need to refactor the sampling support in the distributions package; but as my own apps are not impacted by this, if everyone else sees the user impact of the refactoring as outweighed by the benefit, I won't stand in the way. Please let's just keep the RandomDataGenerator convenience class in the random package in any case. I will take care of whatever adjustments are needed to adapt to whatever we settle on for sampling in the distributions package. Well, it is not really necessary to do everything together and refactor the distributions. 
Probably it is better to start the other way round, and describe what I want to add, and see how other things fit in: * I want a generic Sampler interface, i.e. something like this: ** nextSample() ** nextSamples(int size) ** nextSamples(double[] samples) +1 there could be a DiscreteSampler and ContinuousSampler interface to handle the cases for int / double values. Perhaps the name should be IntegerSampler and DoubleSampler, to accommodate future needs (LongSampler, BooleanSampler (?)). I would prefer consistent names for distributions and corresponding samplers. The support of distributions must be of the same data type as the return values of the corresponding sampler. Therefore, I would call the samplers for RealDistribution and IntegerDistribution RealSampler and IntegerSampler, respectively. +1 Phil The distributions could either be changed to return such a sampler as Gilles proposed (with the advantage that no random instance is tied to the distribution itself), or implement the interface directly (with the advantage that we would not need to refactor too much). Unless I'm missing something, the refactoring would be fairly the
Re: [math] random boolean arrays
On Mon, Jul 13, 2015 at 3:51 PM, Gilles gil...@harfang.homelinux.org wrote: On Mon, 13 Jul 2015 06:30:44 -0700, Phil Steitz wrote: On 7/12/15 9:45 PM, Otmar Ertl wrote: On Sun, Jul 12, 2015 at 8:16 PM, Gilles gil...@harfang.homelinux.org wrote: On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote: On 07/12/2015 04:58 PM, Phil Steitz wrote: On 7/12/15 2:50 AM, Thomas Neidhart wrote: On 07/11/2015 09:43 PM, Phil Steitz wrote: On 7/11/15 12:29 PM, Thomas Neidhart wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Do you think it *should* be a private method of the K-S class? Right now, I do not see much uses outside the class, but if we decide to make it public then I would prefer a special util class in the random package to avoid cluttering the MathArrays class. OK, for now we can make it private and use the reflection method above to test it. ok, but I guess it is also fine to make it package private as sebb suggested. 
We did something similar recently for some of the improved sampling methods provided by Otmar. Regarding the RandomDataGenerator: I think this class should be deprecated and replaced by a Sampler interface as proposed by Gilles. Please consider keeping this class. Consider this a user request. I have quite a few applications that use this class for two reasons: ok, the reason why I thought the class should be deprecated is because it was not kept up-to-date with all the new discrete and continuous distributions that we added in the last 2-3 years. If you think it is useful, then we can keep it of course. 1. One object instance tied to one PRNG that generates data from multiple different distributions. This is convenient. Sure, I could refactor all of these apps to instantiate new objects for each type of generated data and hopefully still be able to peg them to one PRNG; but that is needless work that also complicates the code. 2. There are quite a few methods in this class that have nothing to do with sampling (nextPermutation, nextHexString, nextSecureXxx, etc) but which conveniently share the RandomGenerator. I guess the utility methods get moved out somewhere else. Again, I end up having to refactor all of my code that uses them and when I want simulations to be based on a single PRNG, I have to find a way to pass the RandomGenerator around to them. I don't yet see the need to refactor the sampling support in the distributions package; but as my own apps are not impacted by this, if everyone else sees the user impact of the refactoring as outweighed by the benefit, I won't stand in the way. Please let's just keep the RandomDataGenerator convenience class in the random package in any case. I will take care of whatever adjustments are needed to adapt to whatever we settle on for sampling in the distributions package. Well, it is not really necessary to do everything together and refactor the distributions. 
Probably it is better to start the other way round, and describe what I want to add, and see how other things fit in: * I want a generic Sampler interface, i.e. something like this: ** nextSample() ** nextSamples(int size) ** nextSamples(double[] samples) +1 Do we really need all those 3 methods? If the functionality provided by the two latter methods is essential for convenience reasons, we could also offer utility functions that are able to fill an array with random numbers from a given sampler, e.g. MathArrays.fill(array, sampler) or array=MathArrays.generate(sampler, size). there could be a DiscreteSampler and ContinuousSampler interface to handle the cases for int / double values. Perhaps the name should be IntegerSampler and DoubleSampler, to accommodate future needs (LongSampler, BooleanSampler (?)). I would prefer consistent names for distributions and corresponding samplers. The support of distributions must be of the same data type as the return values of the corresponding sampler. Therefore, I would
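Otmar's suggestion of a single-method sampler plus array utilities could look roughly like the sketch below. The interface name DoubleSampler follows one of the names floated in the thread. The class SamplerArrays is a hypothetical stand-in for the proposed MathArrays.fill and MathArrays.generate, neither of which is claimed to exist in Commons Math. java.util.Random is used only to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical single-method sampler, named after the thread's proposal. */
interface DoubleSampler {
    double nextSample();
}

/** Hypothetical stand-in for the proposed MathArrays.fill / MathArrays.generate. */
final class SamplerArrays {
    private SamplerArrays() {}

    /** Overwrites every element of array with a fresh sample. */
    static void fill(double[] array, DoubleSampler sampler) {
        for (int i = 0; i < array.length; i++) {
            array[i] = sampler.nextSample();
        }
    }

    /** Allocates an array of the given size and fills it from the sampler. */
    static double[] generate(DoubleSampler sampler, int size) {
        double[] out = new double[size];
        fill(out, sampler);
        return out;
    }

    public static void main(String[] args) {
        Random rng = new Random(7L);
        // Any source of doubles can act as a sampler via a method reference.
        DoubleSampler gaussian = rng::nextGaussian;
        System.out.println(Arrays.toString(generate(gaussian, 5)));
    }
}
```

The trade-off is that the utilities can only call nextSample() in a loop, so a sampler has no way to substitute a faster bulk algorithm.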
Re: [math] random boolean arrays
On 7/13/15 12:16 PM, Otmar Ertl wrote: On Mon, Jul 13, 2015 at 3:51 PM, Gilles gil...@harfang.homelinux.org wrote: On Mon, 13 Jul 2015 06:30:44 -0700, Phil Steitz wrote: On 7/12/15 9:45 PM, Otmar Ertl wrote: On Sun, Jul 12, 2015 at 8:16 PM, Gilles gil...@harfang.homelinux.org wrote: On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote: On 07/12/2015 04:58 PM, Phil Steitz wrote: On 7/12/15 2:50 AM, Thomas Neidhart wrote: On 07/11/2015 09:43 PM, Phil Steitz wrote: On 7/11/15 12:29 PM, Thomas Neidhart wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Do you think it *should* be a private method of the K-S class? Right now, I do not see much uses outside the class, but if we decide to make it public then I would prefer a special util class in the random package to avoid cluttering the MathArrays class. OK, for now we can make it private and use the reflection method above to test it. ok, but I guess it is also fine to make it package private as sebb suggested. 
We did something similar recently for some of the improved sampling methods provided by Otmar. Regarding the RandomDataGenerator: I think this class should be deprecated and replaced by a Sampler interface as proposed by Gilles. Please consider keeping this class. Consider this a user request. I have quite a few applications that use this class for two reasons: ok, the reason why I thought the class should be deprecated is because it was not kept up-to-date with all the new discrete and continuous distributions that we added in the last 2-3 years. If you think it is useful, then we can keep it of course. 1. One object instance tied to one PRNG that generates data from multiple different distributions. This is convenient. Sure, I could refactor all of these apps to instantiate new objects for each type of generated data and hopefully still be able to peg them to one PRNG; but that is needless work that also complicates the code. 2. There are quite a few methods in this class that have nothing to do with sampling (nextPermutation, nextHexString, nextSecureXxx, etc) but which conveniently share the RandomGenerator. I guess the utility methods get moved out somewhere else. Again, I end up having to refactor all of my code that uses them and when I want simulations to be based on a single PRNG, I have to find a way to pass the RandomGenerator around to them. I don't yet see the need to refactor the sampling support in the distributions package; but as my own apps are not impacted by this, if everyone else sees the user impact of the refactoring as outweighed by the benefit, I won't stand in the way. Please let's just keep the RandomDataGenerator convenience class in the random package in any case. I will take care of whatever adjustments are needed to adapt to whatever we settle on for sampling in the distributions package. Well, it is not really necessary to do everything together and refactor the distributions. 
Probably it is better to start the other way round, and describe what I want to add, and see how other things fit in: * I want a generic Sampler interface, i.e. something like this: ** nextSample() ** nextSamples(int size) ** nextSamples(double[] samples) +1 Do we really need all those 3 methods? If the functionality provided by the two latter methods is essential for convenience reasons, we could also offer utility functions that are able to fill an array with random numbers from a given sampler, e.g. MathArrays.fill(array, sampler) or array=MathArrays.generate(sampler, size). In some cases, there may be distribution-specific algorithms for generating sequences of values that are more efficient than just calling sample() repeatedly. So while providing a default impl in an abstract sampler that just calls sample() repeatedly would make sense; it might also make sense to allow the distribution to override with something more efficient. An example (not exposed now) is sampling from a
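Phil's point about a default bulk implementation that a distribution may override can be sketched as follows. The example he mentions is cut off in the archive, so the override shown here is a different illustration chosen for this sketch: a Box-Muller Gaussian sampler, where each transform yields two values and the bulk method keeps both. The names AbstractDoubleSampler and BoxMullerSampler are hypothetical, and java.util.Random stands in for the Commons Math RandomGenerator.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical base class: bulk methods default to repeated single draws. */
abstract class AbstractDoubleSampler {
    public abstract double nextSample();

    public double[] nextSamples(int size) {
        double[] out = new double[size];
        nextSamples(out);
        return out;
    }

    /** Default implementation: one nextSample() call per element. */
    public void nextSamples(double[] samples) {
        for (int i = 0; i < samples.length; i++) {
            samples[i] = nextSample();
        }
    }
}

/** Illustrative standard normal sampler using the Box-Muller transform. */
final class BoxMullerSampler extends AbstractDoubleSampler {
    private final Random rng;

    BoxMullerSampler(Random rng) {
        this.rng = rng;
    }

    @Override
    public double nextSample() {
        // A single draw uses one value of the pair and discards the other.
        double r = Math.sqrt(-2.0 * Math.log(1.0 - rng.nextDouble()));
        return r * Math.cos(2.0 * Math.PI * rng.nextDouble());
    }

    @Override
    public void nextSamples(double[] samples) {
        // The bulk override keeps both values of each pair, so it needs
        // about half as many uniform draws as the default loop.
        int i = 0;
        while (i < samples.length) {
            double r = Math.sqrt(-2.0 * Math.log(1.0 - rng.nextDouble()));
            double theta = 2.0 * Math.PI * rng.nextDouble();
            samples[i++] = r * Math.cos(theta);
            if (i < samples.length) {
                samples[i++] = r * Math.sin(theta);
            }
        }
    }

    public static void main(String[] args) {
        AbstractDoubleSampler s = new BoxMullerSampler(new Random(11L));
        System.out.println(Arrays.toString(s.nextSamples(6)));
    }
}
```

Because nextSamples(int) delegates to nextSamples(double[]), overriding the array version is enough for both bulk entry points to pick up the faster path.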
Re: [math] random boolean arrays
On 7/13/15 1:19 PM, Otmar Ertl wrote: On Mon, Jul 13, 2015 at 9:59 PM, Phil Steitz phil.ste...@gmail.com wrote: On 7/13/15 12:55 PM, Phil Steitz wrote: On 7/13/15 12:16 PM, Otmar Ertl wrote: On Mon, Jul 13, 2015 at 3:51 PM, Gilles gil...@harfang.homelinux.org wrote: On Mon, 13 Jul 2015 06:30:44 -0700, Phil Steitz wrote: On 7/12/15 9:45 PM, Otmar Ertl wrote: On Sun, Jul 12, 2015 at 8:16 PM, Gilles gil...@harfang.homelinux.org wrote: On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote: On 07/12/2015 04:58 PM, Phil Steitz wrote: On 7/12/15 2:50 AM, Thomas Neidhart wrote: On 07/11/2015 09:43 PM, Phil Steitz wrote: On 7/11/15 12:29 PM, Thomas Neidhart wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Do you think it *should* be a private method of the K-S class? Right now, I do not see much uses outside the class, but if we decide to make it public then I would prefer a special util class in the random package to avoid cluttering the MathArrays class. 
OK, for now we can make it private and use the reflection method above to test it. ok, but I guess it is also fine to make it package private as sebb suggested. We did something similar recently for some of the improved sampling methods provided by Otmar. Regarding the RandomDataGenerator: I think this class should be deprecated and replaced by a Sampler interface as proposed by Gilles. Please consider keeping this class. Consider this a user request. I have quite a few applications that use this class for two reasons: ok, the reason why I thought the class should be deprecated is because it was not kept up-to-date with all the new discrete and continuous distributions that we added in the last 2-3 years. If you think it is useful, then we can keep it of course. 1. One object instance tied to one PRNG that generates data from multiple different distributions. This is convenient. Sure, I could refactor all of these apps to instantiate new objects for each type of generated data and hopefully still be able to peg them to one PRNG; but that is needless work that also complicates the code. 2. There are quite a few methods in this class that have nothing to do with sampling (nextPermutation, nextHexString, nextSecureXxx, etc) but which conveniently share the RandomGenerator. I guess the utility methods get moved out somewhere else. Again, I end up having to refactor all of my code that uses them and when I want simulations to be based on a single PRNG, I have to find a way to pass the RandomGenerator around to them. I don't yet see the need to refactor the sampling support in the distributions package; but as my own apps are not impacted by this, if everyone else sees the user impact of the refactoring as outweighed by the benefit, I won't stand in the way. Please let's just keep the RandomDataGenerator convenience class in the random package in any case. 
I will take care of whatever adjustments are needed to adapt to whatever we settle on for sampling in the distributions package. Well, it is not really necessary to do everything together and refactor the distributions. Probably it is better to start the other way round, and describe what I want to add, and see how other things fit in: * I want a generic Sampler interface, i.e. something like this: ** nextSample() ** nextSamples(int size) ** nextSamples(double[] samples) +1 Do we really need all those 3 methods? If the functionality provided by the two latter methods is essential for convenience reasons, we could also offer utility functions that are able to fill an array with random numbers from a given sampler, e.g. MathArrays.fill(array, sampler) or array=MathArrays.generate(sampler, size). In some cases, there may be distribution-specific algorithms for generating sequences of values that are more efficient than just calling sample() repeatedly. So while providing a default impl in an abstract sampler that just calls sample() repeatedly would make
Re: [math] random boolean arrays
On 7/12/15 10:38 AM, Thomas Neidhart wrote: On 07/12/2015 04:58 PM, Phil Steitz wrote: On 7/12/15 2:50 AM, Thomas Neidhart wrote: On 07/11/2015 09:43 PM, Phil Steitz wrote: On 7/11/15 12:29 PM, Thomas Neidhart wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Do you think it *should* be a private method of the K-S class? Right now, I do not see much uses outside the class, but if we decide to make it public then I would prefer a special util class in the random package to avoid cluttering the MathArrays class. OK, for now we can make it private and use the reflection method above to test it. ok, but I guess it is also fine to make it package private as sebb suggested. We did something similar recently for some of the improved sampling methods provided by Otmar. Regarding the RandomDataGenerator: I think this class should be deprecated and replaced by a Sampler interface as proposed by Gilles. Please consider keeping this class. Consider this a user request. 
I have quite a few applications that use this class for two reasons: ok, the reason why I thought the class should be deprecated is because it was not kept up-to-date with all the new discrete and continuous distributions that we added in the last 2-3 years. If you think it is useful, then we can keep it of course. Thanks. I don't think it needs to include data generation methods for all distributions. Just the ones most commonly used. It is a convenience class providing common random data generation methods using a shared RandomGenerator. All the distributions that I need are there. We can add others if / when users request them / provide patches. Phil 1. One object instance tied to one PRNG that generates data from multiple different distributions. This is convenient. Sure, I could refactor all of these apps to instantiate new objects for each type of generated data and hopefully still be able to peg them to one PRNG; but that is needless work that also complicates the code. 2. There are quite a few methods in this class that have nothing to do with sampling (nextPermutation, nextHexString, nextSecureXxx, etc) but which conveniently share the RandomGenerator. I guess the utility methods get moved out somewhere else. Again, I end up having to refactor all of my code that uses them and when I want simulations to be based on a single PRNG, I have to find a way to pass the RandomGenerator around to them. I don't yet see the need to refactor the sampling support in the distributions package; but as my own apps are not impacted by this, if everyone else sees the user impact of the refactoring as outweighed by the benefit, I won't stand in the way. Please lets just keep the RandomDataGenerator convenience class in the random package in any case. I will take care of whatever adjustments are needed to adapt to whatever we settle on for sampling in the distributions package. Well, it is not really necessary to do everything together and refactor the distributions. 
Probably it is better to start the other way round, and describe what I want to add, and see how other things fit in: * I want a generic Sampler interface, i.e. something like this: ** nextSample() ** nextSamples(int size) ** nextSamples(double[] samples) there could be a DiscreteSampler and ContinuousSampler interface to handle the cases for int / double values. The distributions could either be changed to return such a sampler as Gilles proposed (with the advantage that no random instance is tied to the distribution itself), or implement the interface directly (with the advantage that we would not need to refactor too much). One can then create a sampler for any distribution or from other sources, e.g. when needing a fast and efficient sampler without replacement (see MATH-1239). +1 for sequential sampling. I don't follow exactly why that requires refactoring the distributions; but if it helps in a way I don't yet understand,
Re: [math] random boolean arrays
On Sun, Jul 12, 2015 at 8:16 PM, Gilles gil...@harfang.homelinux.org wrote: On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote: On 07/12/2015 04:58 PM, Phil Steitz wrote: On 7/12/15 2:50 AM, Thomas Neidhart wrote: On 07/11/2015 09:43 PM, Phil Steitz wrote: On 7/11/15 12:29 PM, Thomas Neidhart wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Do you think it *should* be a private method of the K-S class? Right now, I do not see much uses outside the class, but if we decide to make it public then I would prefer a special util class in the random package to avoid cluttering the MathArrays class. OK, for now we can make it private and use the reflection method above to test it. ok, but I guess it is also fine to make it package private as sebb suggested. We did something similar recently for some of the improved sampling methods provided by Otmar. Regarding the RandomDataGenerator: I think this class should be deprecated and replaced by a Sampler interface as proposed by Gilles. 
Please consider keeping this class. Consider this a user request. I have quite a few applications that use this class for two reasons: ok, the reason why I thought the class should be deprecated is because it was not kept up-to-date with all the new discrete and continuous distributions that we added in the last 2-3 years. If you think it is useful, then we can keep it of course. 1. One object instance tied to one PRNG that generates data from multiple different distributions. This is convenient. Sure, I could refactor all of these apps to instantiate new objects for each type of generated data and hopefully still be able to peg them to one PRNG; but that is needless work that also complicates the code. 2. There are quite a few methods in this class that have nothing to do with sampling (nextPermutation, nextHexString, nextSecureXxx, etc) but which conveniently share the RandomGenerator. I guess the utility methods get moved out somewhere else. Again, I end up having to refactor all of my code that uses them and when I want simulations to be based on a single PRNG, I have to find a way to pass the RandomGenerator around to them. I don't yet see the need to refactor the sampling support in the distributions package; but as my own apps are not impacted by this, if everyone else sees the user impact of the refactoring as outweighed by the benefit, I won't stand in the way. Please lets just keep the RandomDataGenerator convenience class in the random package in any case. I will take care of whatever adjustments are needed to adapt to whatever we settle on for sampling in the distributions package. Well, it is not really necessary to do everything together and refactor the distributions. Probably it is better to start the other way round, and describe what I want to add, and see how other things fit in: * I want a generic Sampler interface, i.e. 
something like this: ** nextSample() ** nextSamples(int size) ** nextSamples(double[] samples) +1 there could be a DiscreteSampler and ContinuousSampler interface to handle the cases for int / double values. Perhaps the names should be IntegerSampler and DoubleSampler, to accommodate future needs (LongSampler, BooleanSampler (?)). I would prefer consistent names for distributions and corresponding samplers. The support of a distribution must be of the same data type as the return values of the corresponding sampler. Therefore, I would call the samplers for RealDistribution and IntegerDistribution RealSampler and IntegerSampler, respectively. The distributions could either be changed to return such a sampler as Gilles proposed (with the advantage that no random instance is tied to the distribution itself), or implement the interface directly (with the advantage that we would not need to refactor too much). Unless I'm missing something, the refactoring would be much the same: The latter case needs
Re: [math] random boolean arrays
On 07/11/2015 09:43 PM, Phil Steitz wrote: On 7/11/15 12:29 PM, Thomas Neidhart wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Do you think it *should* be a private method of the K-S class? Right now, I do not see much uses outside the class, but if we decide to make it public then I would prefer a special util class in the random package to avoid cluttering the MathArrays class. Regarding the RandomDataGenerator: I think this class should be deprecated and replaced by a Sampler interface as proposed by Gilles. One can then create a sampler for any distribution or from other sources, e.g. when needing a fast and efficient sampler without replacement (see MATH-1239). I did not have time to complete a patch, but am working on it. Thomas - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
Re: [math] random boolean arrays
On 07/12/2015 04:58 PM, Phil Steitz wrote: On 7/12/15 2:50 AM, Thomas Neidhart wrote: On 07/11/2015 09:43 PM, Phil Steitz wrote: On 7/11/15 12:29 PM, Thomas Neidhart wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Do you think it *should* be a private method of the K-S class? Right now, I do not see much uses outside the class, but if we decide to make it public then I would prefer a special util class in the random package to avoid cluttering the MathArrays class. OK, for now we can make it private and use the reflection method above to test it. ok, but I guess it is also fine to make it package private as sebb suggested. We did something similar recently for some of the improved sampling methods provided by Otmar. Regarding the RandomDataGenerator: I think this class should be deprecated and replaced by a Sampler interface as proposed by Gilles. Please consider keeping this class. Consider this a user request. 
I have quite a few applications that use this class for two reasons: ok, the reason why I thought the class should be deprecated is because it was not kept up-to-date with all the new discrete and continuous distributions that we added in the last 2-3 years. If you think it is useful, then we can keep it of course. 1. One object instance tied to one PRNG that generates data from multiple different distributions. This is convenient. Sure, I could refactor all of these apps to instantiate new objects for each type of generated data and hopefully still be able to peg them to one PRNG; but that is needless work that also complicates the code. 2. There are quite a few methods in this class that have nothing to do with sampling (nextPermutation, nextHexString, nextSecureXxx, etc) but which conveniently share the RandomGenerator. I guess the utility methods get moved out somewhere else. Again, I end up having to refactor all of my code that uses them and when I want simulations to be based on a single PRNG, I have to find a way to pass the RandomGenerator around to them. I don't yet see the need to refactor the sampling support in the distributions package; but as my own apps are not impacted by this, if everyone else sees the user impact of the refactoring as outweighed by the benefit, I won't stand in the way. Please lets just keep the RandomDataGenerator convenience class in the random package in any case. I will take care of whatever adjustments are needed to adapt to whatever we settle on for sampling in the distributions package. Well, it is not really necessary to do everything together and refactor the distributions. Probably it is better to start the other way round, and describe what I want to add, and see how other things fit in: * I want a generic Sampler interface, i.e. 
something like this: ** nextSample() ** nextSamples(int size) ** nextSamples(double[] samples) there could be a DiscreteSampler and ContinuousSampler interface to handle the cases for int / double values. The distributions could either be changed to return such a sampler as Gilles proposed (with the advantage that no random instance is tied to the distribution itself), or implement the interface directly (with the advantage that we would not need to refactor too much). One can then create a sampler for any distribution or from other sources, e.g. when needing a fast and efficient sampler without replacement (see MATH-1239). +1 for sequential sampling. I don't follow exactly why that requires refactoring the distributions; but if it helps in a way I don't yet understand, that will help convince me that refactoring sampling in the distributions package is worth the user pain. as I said above, I wanted to combine two things in one step, maybe it is better to go step by step. Thomas - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
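The three-method Sampler interface proposed above can be sketched roughly as follows. This is only an illustration of the shape under discussion, not code from Commons Math: the name ContinuousSampler follows Thomas's suggestion, the void return type of nextSamples(double[]) and the default-method implementations (which need Java 8) are assumptions, and ExponentialSampler is a hypothetical example that uses java.util.Random in place of the library's RandomGenerator.

```java
import java.util.Random;

/** Hypothetical sketch of the proposed sampler for double-valued samples. */
interface ContinuousSampler {
    /** Returns one sample. */
    double nextSample();

    /** Returns a new array holding {@code size} samples. */
    default double[] nextSamples(int size) {
        double[] out = new double[size];
        nextSamples(out);
        return out;
    }

    /** Fills a caller-supplied array, so no allocation is needed. */
    default void nextSamples(double[] samples) {
        for (int i = 0; i < samples.length; i++) {
            samples[i] = nextSample();
        }
    }
}

/** Example implementation: exponential samples by inversion of the CDF. */
final class ExponentialSampler implements ContinuousSampler {
    private final double mean;
    private final Random rng; // stand-in for a shared RandomGenerator

    ExponentialSampler(double mean, Random rng) {
        this.mean = mean;
        this.rng = rng;
    }

    @Override
    public double nextSample() {
        // nextDouble() is in [0, 1), so 1 - u is in (0, 1] and the log is finite.
        return -mean * Math.log(1.0 - rng.nextDouble());
    }
}
```

A sampler with a faster bulk algorithm could override the two array methods; the defaults simply call nextSample() repeatedly.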
Re: [math] random boolean arrays
On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote: On 07/12/2015 04:58 PM, Phil Steitz wrote: On 7/12/15 2:50 AM, Thomas Neidhart wrote: On 07/11/2015 09:43 PM, Phil Steitz wrote: On 7/11/15 12:29 PM, Thomas Neidhart wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Do you think it *should* be a private method of the K-S class? Right now, I do not see much uses outside the class, but if we decide to make it public then I would prefer a special util class in the random package to avoid cluttering the MathArrays class. OK, for now we can make it private and use the reflection method above to test it. ok, but I guess it is also fine to make it package private as sebb suggested. We did something similar recently for some of the improved sampling methods provided by Otmar. Regarding the RandomDataGenerator: I think this class should be deprecated and replaced by a Sampler interface as proposed by Gilles. Please consider keeping this class. Consider this a user request. 
I have quite a few applications that use this class for two reasons: ok, the reason why I thought the class should be deprecated is because it was not kept up-to-date with all the new discrete and continuous distributions that we added in the last 2-3 years. If you think it is useful, then we can keep it of course. 1. One object instance tied to one PRNG that generates data from multiple different distributions. This is convenient. Sure, I could refactor all of these apps to instantiate new objects for each type of generated data and hopefully still be able to peg them to one PRNG; but that is needless work that also complicates the code. 2. There are quite a few methods in this class that have nothing to do with sampling (nextPermutation, nextHexString, nextSecureXxx, etc) but which conveniently share the RandomGenerator. I guess the utility methods get moved out somewhere else. Again, I end up having to refactor all of my code that uses them and when I want simulations to be based on a single PRNG, I have to find a way to pass the RandomGenerator around to them. I don't yet see the need to refactor the sampling support in the distributions package; but as my own apps are not impacted by this, if everyone else sees the user impact of the refactoring as outweighed by the benefit, I won't stand in the way. Please lets just keep the RandomDataGenerator convenience class in the random package in any case. I will take care of whatever adjustments are needed to adapt to whatever we settle on for sampling in the distributions package. Well, it is not really necessary to do everything together and refactor the distributions. Probably it is better to start the other way round, and describe what I want to add, and see how other things fit in: * I want a generic Sampler interface, i.e. 
something like this: ** nextSample() ** nextSamples(int size) ** nextSamples(double[] samples) +1 there could be a DiscreteSampler and ContinuousSampler interface to handle the cases for int / double values. Perhaps the names should be IntegerSampler and DoubleSampler, to accommodate future needs (LongSampler, BooleanSampler (?)). The distributions could either be changed to return such a sampler as Gilles proposed (with the advantage that no random instance is tied to the distribution itself), or implement the interface directly (with the advantage that we would not need to refactor too much). Unless I'm missing something, the refactoring would be much the same: The latter case needs implementing 3 methods (2 new ones, one with a name change). The former needs implementing the factory method proposed in MATH-1158, plus the same methods as above (wrapped in the object returned by the factory method). Gilles One can then create a sampler for any distribution or from other sources, e.g. when needing a fast and efficient sampler without replacement (see MATH-1239). +1 for sequential sampling. I don't follow exactly why
Re: [math] random boolean arrays
On 7/12/15 2:50 AM, Thomas Neidhart wrote: On 07/11/2015 09:43 PM, Phil Steitz wrote: On 7/11/15 12:29 PM, Thomas Neidhart wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Do you think it *should* be a private method of the K-S class? Right now, I do not see much uses outside the class, but if we decide to make it public then I would prefer a special util class in the random package to avoid cluttering the MathArrays class. OK, for now we can make it private and use the reflection method above to test it. Regarding the RandomDataGenerator: I think this class should be deprecated and replaced by a Sampler interface as proposed by Gilles. Please consider keeping this class. Consider this a user request. I have quite a few applications that use this class for two reasons: 1. One object instance tied to one PRNG that generates data from multiple different distributions. This is convenient. 
Sure, I could refactor all of these apps to instantiate new objects for each type of generated data and hopefully still be able to peg them to one PRNG; but that is needless work that also complicates the code. 2. There are quite a few methods in this class that have nothing to do with sampling (nextPermutation, nextHexString, nextSecureXxx, etc.) but which conveniently share the RandomGenerator. I guess the utility methods get moved out somewhere else. Again, I end up having to refactor all of my code that uses them and when I want simulations to be based on a single PRNG, I have to find a way to pass the RandomGenerator around to them. I don't yet see the need to refactor the sampling support in the distributions package; but as my own apps are not impacted by this, if everyone else sees the user impact of the refactoring as outweighed by the benefit, I won't stand in the way. Please let's just keep the RandomDataGenerator convenience class in the random package in any case. I will take care of whatever adjustments are needed to adapt to whatever we settle on for sampling in the distributions package. One can then create a sampler for any distribution or from other sources, e.g. when needing a fast and efficient sampler without replacement (see MATH-1239). +1 for sequential sampling. I don't follow exactly why that requires refactoring the distributions; but if it helps in a way I don't yet understand, that will help convince me that refactoring sampling in the distributions package is worth the user pain. Phil I did not have time to complete a patch, but am working on it. Thomas
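Phil's two reasons for keeping the convenience class (one object, one PRNG, several kinds of data) can be illustrated with a small sketch. SharedRandomData below is a hypothetical stand-in written for this note, not the real RandomDataGenerator, and it uses java.util.Random instead of the library's RandomGenerator. The method names mirror the ones mentioned in the thread only for readability.

```java
import java.util.Random;

/** Hypothetical facade: every method draws from the same underlying PRNG. */
final class SharedRandomData {
    private final Random rng; // the single shared source of randomness

    SharedRandomData(Random rng) {
        this.rng = rng;
    }

    /** Normal deviate with the given mean and standard deviation. */
    double nextGaussian(double mu, double sigma) {
        return mu + sigma * rng.nextGaussian();
    }

    /** Exponential deviate with the given mean, by inversion. */
    double nextExponential(double mean) {
        return -mean * Math.log(1.0 - rng.nextDouble());
    }

    /** Random lowercase hex string: a utility that is not a distribution sample. */
    String nextHexString(int len) {
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append(Character.forDigit(rng.nextInt(16), 16));
        }
        return sb.toString();
    }
}
```

Because all draws come from one generator, seeding that generator once makes a whole simulation repeatable without passing the generator to each component separately.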
Re: [math] random boolean arrays
On 7/11/15 12:29 PM, Thomas Neidhart wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Do you think it *should* be a private method of the K-S class? Phil Thomas
[math] random boolean arrays
The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? Phil
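For readers who want a concrete picture of the randomize(boolean[], nTrue) idea, here is a minimal sketch. It is not the MATH-1242 code: the class name BooleanArrays, the extra Random parameter, and the choice of a plain Fisher-Yates shuffle are assumptions made for illustration, and java.util.Random stands in for the library's RandomGenerator.

```java
import java.util.Random;

/** Hypothetical helper class; the name and signature are assumptions. */
final class BooleanArrays {
    private BooleanArrays() {}

    /**
     * Overwrites {@code b} in place so that exactly {@code nTrue} entries
     * are true, with every such arrangement equally likely.
     */
    static void randomize(boolean[] b, int nTrue, Random rng) {
        if (nTrue < 0 || nTrue > b.length) {
            throw new IllegalArgumentException("nTrue out of range: " + nTrue);
        }
        // Start from a fixed arrangement: the first nTrue entries are true.
        for (int i = 0; i < b.length; i++) {
            b[i] = i < nTrue;
        }
        // Fisher-Yates shuffle; no second array is allocated.
        for (int i = b.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1); // uniform in [0, i]
            boolean tmp = b[i];
            b[i] = b[j];
            b[j] = tmp;
        }
    }
}
```

A direct test of such a method would call it many times and compare how often each position is true against the expected frequency nTrue / b.length.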
Re: [math] random boolean arrays
On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Thomas
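The reflection approach Thomas mentions can be sketched as follows. KsLike and its private countTrue method are hypothetical stand-ins invented for this example (they are not the K-S test class or any method in it); only the java.lang.reflect calls are real API.

```java
import java.lang.reflect.Method;

/** Hypothetical class with a private helper that a test wants to reach. */
final class KsLike {
    private static int countTrue(boolean[] b) {
        int n = 0;
        for (boolean x : b) {
            if (x) {
                n++;
            }
        }
        return n;
    }
}

/** Test-side helper that calls the private method reflectively. */
final class ReflectionAccess {
    static int callCountTrue(boolean[] b) {
        try {
            // Look the method up by name and parameter type, then lift the access check.
            Method m = KsLike.class.getDeclaredMethod("countTrue", boolean[].class);
            m.setAccessible(true);
            // Static method: the receiver is null. The cast keeps the array as one argument.
            return (Integer) m.invoke(null, (Object) b);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed", e);
        }
    }
}
```

The cost of this approach is that the method name is a string, so a rename breaks the test at run time rather than at compile time.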
Re: [math] random boolean arrays
On 11 July 2015 at 20:29, Thomas Neidhart thomas.neidh...@gmail.com wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Or make it package-private (with a comment saying why this was done) and create the unit test in the same package. If the tests really need to go in a different package, then add a public access method in the test class in the same package. Thomas
Re: [math] random boolean arrays
On 7/11/15 3:38 PM, Gilles wrote: On Sat, 11 Jul 2015 15:12:29 -0700, Phil Steitz wrote: On 7/11/15 3:06 PM, Gilles wrote: On Sat, 11 Jul 2015 12:08:16 -0700, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). Are you sure that copying is an issue for speed? I mean, the randomization will anyway copy new values into the array... The algorithm is used in a tight loop that would be slowed down considerably by allocating and copying arrays to / from the stack. Numbers? IIRC, someone said: First make it correct, then make it faster (if necessary). [I.e. the computation may always be inlined later.] This is part of an optimization. It replaces a much slower implementation that did a lot more copying of data. [And I don't get the copying to / from the stack...] Yeah, that's not the issue. It's all the allocations and garbage collection that would slow things down. Phil Gilles It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? Isn't there some relationship with https://issues.apache.org/jira/browse/MATH-1158 ? No. Phil Gilles
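Phil's point about allocations can be made concrete with a sketch of a Monte Carlo loop that fills one caller-owned buffer on every iteration instead of creating a new array each time. Everything here is hypothetical and written for illustration: the class and method names are invented, java.util.Random stands in for RandomGenerator, and the fill uses selection sampling (choose each position with probability remaining / positions left), which is not claimed to be what MATH-1242 does. No timing claim is made; measuring the difference would need a proper benchmark.

```java
import java.util.Random;

/** Hypothetical demo of buffer reuse in a tight simulation loop. */
final class ReuseDemo {
    private ReuseDemo() {}

    /** Fills {@code b} in place with exactly {@code nTrue} true entries. */
    static void fill(boolean[] b, int nTrue, Random rng) {
        if (nTrue < 0 || nTrue > b.length) {
            throw new IllegalArgumentException("nTrue out of range: " + nTrue);
        }
        int remaining = nTrue;
        for (int i = 0; i < b.length; i++) {
            // Position i is chosen with probability remaining / (b.length - i).
            b[i] = rng.nextInt(b.length - i) < remaining;
            if (b[i]) {
                remaining--;
            }
        }
    }

    /** Estimates how often position 0 is true; expected value is nTrue / n. */
    static double fractionFirstTrue(int n, int nTrue, int iterations, Random rng) {
        boolean[] buffer = new boolean[n]; // allocated once, reused below
        long hits = 0;
        for (int it = 0; it < iterations; it++) {
            fill(buffer, nTrue, rng); // overwrites the same array each time
            if (buffer[0]) {
                hits++;
            }
        }
        return (double) hits / iterations;
    }
}
```

An API that returned a fresh boolean[] per call would force one allocation per iteration of this loop; the in-place signature leaves that decision to the caller.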
Re: [math] random boolean arrays
On Sat, 11 Jul 2015 15:12:29 -0700, Phil Steitz wrote:
> On 7/11/15 3:06 PM, Gilles wrote:
>> On Sat, 11 Jul 2015 12:08:16 -0700, Phil Steitz wrote:
>>> The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue).
>>
>> Are you sure that copying is an issue for speed? I mean, the randomization will anyways copy new values into the array...
>
> The algorithm is used in a tight loop that would be slowed down considerably by allocating and copying arrays to / from the stack.

Numbers? IIRC, someone said: First make it correct, then make it faster (if necessary). [I.e. the computation may always be inlined later.]

[And I don't get the copying to / from the stack...]

Gilles

>>> It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas?
>>
>> Isn't there some relationship with https://issues.apache.org/jira/browse/MATH-1158 ?
>
> No.
>
> Phil
>
>> Gilles
Re: [math] random boolean arrays
How about a Randomize utility class?

Cheers,
- Ole

On 07/11/2015 02:40 PM, sebb wrote:
> On 11 July 2015 at 20:29, Thomas Neidhart thomas.neidh...@gmail.com wrote:
>> On 07/11/2015 09:08 PM, Phil Steitz wrote:
>>> The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue).
>>>
>>> It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas?
>>
>> If it is just for testing purposes, you can also access the method in question via reflection, see an example here:
>>
>> http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
>
> Or make it package-protected (with a comment saying why this was done) and create the unit test in the same package. If the tests really need to go in a different package, then add a public access method in the test class in the same package
>
>> Thomas
Re: [math] random boolean arrays
On Sat, 11 Jul 2015 12:08:16 -0700, Phil Steitz wrote:
> The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue).

Are you sure that copying is an issue for speed? I mean, the randomization will anyways copy new values into the array...

> It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas?

Isn't there some relationship with https://issues.apache.org/jira/browse/MATH-1158 ?

Gilles
Re: [math] random boolean arrays
On 7/11/15 3:06 PM, Gilles wrote:
> On Sat, 11 Jul 2015 12:08:16 -0700, Phil Steitz wrote:
>> The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue).
>
> Are you sure that copying is an issue for speed? I mean, the randomization will anyways copy new values into the array...

The algorithm is used in a tight loop that would be slowed down considerably by allocating and copying arrays to / from the stack.

>> It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas?
>
> Isn't there some relationship with https://issues.apache.org/jira/browse/MATH-1158 ?

No.

Phil

> Gilles
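As for what testing the generation directly could look like once the method is reachable, one possible approach (an assumption, not something settled in this thread or on the ticket) is a frequency check on small arrays: with length 4 and 2 true values there are only 6 possible arrays, so each should turn up about equally often. The sketch below computes a chi-square statistic over those patterns. The `Filler` interface, the `SHUFFLE` reference filler and all other names are hypothetical.

```java
import java.util.Random;

// Hypothetical test helper; not part of Commons Math.
class UniformityCheck {
    // Assumed shape of the method under test.
    interface Filler {
        void fill(boolean[] b, int nTrue, Random rng);
    }

    // Reference filler: nTrue leading true values, then a Fisher-Yates shuffle.
    static final Filler SHUFFLE = (b, nTrue, rng) -> {
        for (int i = 0; i < b.length; i++) {
            b[i] = i < nTrue;
        }
        for (int i = b.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            boolean tmp = b[i];
            b[i] = b[j];
            b[j] = tmp;
        }
    };

    // Encodes a boolean array as an int bit pattern.
    static int encode(boolean[] b) {
        int code = 0;
        for (boolean v : b) {
            code = (code << 1) | (v ? 1 : 0);
        }
        return code;
    }

    // Binomial coefficient C(n, k); each intermediate value is an exact integer.
    static long binomial(int n, int k) {
        long r = 1;
        for (int i = 1; i <= k; i++) {
            r = r * (n - k + i) / i;
        }
        return r;
    }

    // Chi-square statistic of observed pattern counts against a uniform
    // distribution over all length-n arrays with exactly nTrue true values.
    // Returns infinity if any array has the wrong number of true values.
    static double chiSquare(Filler filler, int n, int nTrue, int samples, long seed) {
        int[] counts = new int[1 << n];
        boolean[] b = new boolean[n];
        Random rng = new Random(seed);
        for (int s = 0; s < samples; s++) {
            filler.fill(b, nTrue, rng);
            counts[encode(b)]++;
        }
        double expected = (double) samples / binomial(n, nTrue);
        double chi2 = 0;
        for (int code = 0; code < counts.length; code++) {
            if (Integer.bitCount(code) == nTrue) {
                double d = counts[code] - expected;
                chi2 += d * d / expected;
            } else if (counts[code] != 0) {
                return Double.POSITIVE_INFINITY;
            }
        }
        return chi2;
    }

    public static void main(String[] args) {
        System.out.println("chi-square: " + chiSquare(SHUFFLE, 4, 2, 60000, 123L));
    }
}
```

For 6 patterns the statistic has 5 degrees of freedom, so values well above roughly 20 would be suspicious for a generator that is supposed to be uniform. A check of this kind only catches gross bias on small arrays; it does not prove that an implementation is correct.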