Re: [math] random boolean arrays
On 7/12/15 10:38 AM, Thomas Neidhart wrote: On 07/12/2015 04:58 PM, Phil Steitz wrote: On 7/12/15 2:50 AM, Thomas Neidhart wrote: On 07/11/2015 09:43 PM, Phil Steitz wrote: On 7/11/15 12:29 PM, Thomas Neidhart wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Do you think it *should* be a private method of the K-S class? Right now, I do not see much uses outside the class, but if we decide to make it public then I would prefer a special util class in the random package to avoid cluttering the MathArrays class. OK, for now we can make it private and use the reflection method above to test it. ok, but I guess it is also fine to make it package private as sebb suggested. We did something similar recently for some of the improved sampling methods provided by Otmar. Regarding the RandomDataGenerator: I think this class should be deprecated and replaced by a Sampler interface as proposed by Gilles. Please consider keeping this class. Consider this a user request. 
I have quite a few applications that use this class for two reasons: ok, the reason why I thought the class should be deprecated is because it was not kept up-to-date with all the new discrete and continuous distributions that we added in the last 2-3 years. If you think it is useful, then we can keep it of course. Thanks. I don't think it needs to include data generation methods for all distributions. Just the ones most commonly used. It is a convenience class providing common random data generation methods using a shared RandomGenerator. All the distributions that I need are there. We can add others if / when users request them / provide patches. Phil 1. One object instance tied to one PRNG that generates data from multiple different distributions. This is convenient. Sure, I could refactor all of these apps to instantiate new objects for each type of generated data and hopefully still be able to peg them to one PRNG; but that is needless work that also complicates the code. 2. There are quite a few methods in this class that have nothing to do with sampling (nextPermutation, nextHexString, nextSecureXxx, etc) but which conveniently share the RandomGenerator. I guess the utility methods get moved out somewhere else. Again, I end up having to refactor all of my code that uses them and when I want simulations to be based on a single PRNG, I have to find a way to pass the RandomGenerator around to them. I don't yet see the need to refactor the sampling support in the distributions package; but as my own apps are not impacted by this, if everyone else sees the user impact of the refactoring as outweighed by the benefit, I won't stand in the way. Please lets just keep the RandomDataGenerator convenience class in the random package in any case. I will take care of whatever adjustments are needed to adapt to whatever we settle on for sampling in the distributions package. Well, it is not really necessary to do everything together and refactor the distributions. 
Probably it is better to start the other way round, and describe what I want to add, and see how other things fit in: * I want a generic Sampler interface, i.e. something like this: ** nextSample() ** nextSamples(int size) ** nextSamples(double[] samples) there could be a DiscreteSampler and ContinuousSampler interface to handle the cases for int / double values. The distributions could either be changed to return such a sampler as Gilles proposed (with the advantage that no random instance is tied to the distribution itself), or implement the interface directly (with the advantage that we would not need to refactor too much). One can then create a sampler for any distribution or from other sources, e.g. when needing a fast and efficient sampler without replacement (see MATH-1239). +1 for sequential sampling. I don't follow exactly why that requires refactoring the distributions; but if it helps in a way I don't yet understand,
[dbcp] 2.2 plan - feedback requested
There are a few decisions to make before we start rolling RCs for DBCP 2.2. I would appreciate feedback / patches for the following:

DBCP-436 Seems a reasonable request, but hard to implement and test. I would say either WONT_FIX or bump to 3.0. I am curious about how important this actually is with modern drivers (i.e., whether there actually is much cost in what we are doing in the 2.x line).

DBCP-438 I need help reviewing the code here. The null check fix makes the symptom go away, but I suspect there is a deeper problem here.

DBCP-427 I agree with Vladmir's comment on the re-open, but am hesitant to change default behavior in a . release. Interested in other opinions on this.

DBCP-388 Can move down the road, as [pool] does not yet support this. So bump to 2.3.

Thanks in advance!

Phil

- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-collections pull request: Improve ListUtils#longestCommonS...
GitHub user kaching88 opened a pull request:

https://github.com/apache/commons-collections/pull/13

Improve ListUtils#longestCommonSubsequence methods.

Improved version of ListUtils#longestCommonSubsequence that accepts three or more parameters. The lists are raw types because in Java 1.6 there is no way to prevent the "Type safety: A generic array is created for a varargs parameter" warning on every method call. I also deprecated the older two-argument versions of the method and wrote some new unit tests for the three-or-more-parameter version.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kaching88/commons-collections longestCommonSubsequence

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/commons-collections/pull/13.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #13

commit 5e49f80e17bc317c7e84d883a20c1b8ea1d54a43
Author: kaching88 wa...@o2.pl
Date: 2015-07-12T20:11:30Z

Improve ListUtils#longestCommonSubsequence to take third or more parameters.
Re: [VOTE][LAZY] Migrate Commons SCXML to Git
Hi Benedikt, I think the svn tree of scxml should go to _moved_to_git, leaving only SCXMLNowUsesGit.txt. [1] Would you or someone be able to do that? I don't think I have write access there. Thanks in advance, Woonsan [1] http://wiki.apache.org/commons/MovingToGit On Thu, Jul 9, 2015 at 2:50 AM, Benedikt Ritter brit...@apache.org wrote: Hello Woonson, one thing you will have to do after the migration is to create a ticket for the github mirror. Otherwise it will continue mirroring the SVN repo. Benedikt 2015-07-09 6:06 GMT+02:00 Woonsan Ko woon...@apache.org: Thanks for your support, Ate! :-) It was already migrated to git (INFRA-9952). I commented about my validations there. Everything seems very fine. I will announce it to user/dev community once it gets fully available. Cheers, Woonsan On Wed, Jul 8, 2015 at 7:26 PM, Ate Douma a...@douma.nu wrote: Sorry for the too late response, but I would have voted +1 too :) Ate On 2015-07-08 20:27, Woonsan Ko wrote: Apache Commons Developers, This VOTE has passed with the following votes: +1 Dave Brosius +1 James Carman (PMC) +1 Gary Gregory (PMC) +1 Woonsan Ko Thank you all for voting! I will create an INFRA ticket soon and keep you updated about the progress and availabilities. Regards, Woonsan Ko On Wed, Jul 1, 2015 at 9:50 PM, Woonsan Ko woon...@apache.org wrote: Hi there, I think the experiences in Commons Math and Commons Lang using git as primary VCS have been successful. Also, we received requests from some new people about using git instead (through mailing list and JIRA tickets). So, I'd like to call a vote to migrate Commons SCXML to git, assuming most Commons SCXML developers feel comfortable with switching to git. (Please see [1] for summarized info about using git in Apache Commons project.) This vote by lazy consensus will close no sooner than 72 hours from now, i.e. after 2015-07-07 18:00 EDT. [ ] +1 go for it [ ] +0 OK, but... [ ] -0 Not happy about this, because... [ ] -1 We should not do this Thanks!
Woonsan [1] https://wiki.apache.org/commons/UsingGIT -- http://people.apache.org/~britter/ http://www.systemoutprintln.de/ http://twitter.com/BenediktRitter http://github.com/britter
Re: [math] random boolean arrays
On Sun, Jul 12, 2015 at 8:16 PM, Gilles gil...@harfang.homelinux.org wrote: On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote: On 07/12/2015 04:58 PM, Phil Steitz wrote: On 7/12/15 2:50 AM, Thomas Neidhart wrote: On 07/11/2015 09:43 PM, Phil Steitz wrote: On 7/11/15 12:29 PM, Thomas Neidhart wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Do you think it *should* be a private method of the K-S class? Right now, I do not see much uses outside the class, but if we decide to make it public then I would prefer a special util class in the random package to avoid cluttering the MathArrays class. OK, for now we can make it private and use the reflection method above to test it. ok, but I guess it is also fine to make it package private as sebb suggested. We did something similar recently for some of the improved sampling methods provided by Otmar. Regarding the RandomDataGenerator: I think this class should be deprecated and replaced by a Sampler interface as proposed by Gilles. 
Please consider keeping this class. Consider this a user request. I have quite a few applications that use this class for two reasons: ok, the reason why I thought the class should be deprecated is because it was not kept up-to-date with all the new discrete and continuous distributions that we added in the last 2-3 years. If you think it is useful, then we can keep it of course. 1. One object instance tied to one PRNG that generates data from multiple different distributions. This is convenient. Sure, I could refactor all of these apps to instantiate new objects for each type of generated data and hopefully still be able to peg them to one PRNG; but that is needless work that also complicates the code. 2. There are quite a few methods in this class that have nothing to do with sampling (nextPermutation, nextHexString, nextSecureXxx, etc) but which conveniently share the RandomGenerator. I guess the utility methods get moved out somewhere else. Again, I end up having to refactor all of my code that uses them and when I want simulations to be based on a single PRNG, I have to find a way to pass the RandomGenerator around to them. I don't yet see the need to refactor the sampling support in the distributions package; but as my own apps are not impacted by this, if everyone else sees the user impact of the refactoring as outweighed by the benefit, I won't stand in the way. Please lets just keep the RandomDataGenerator convenience class in the random package in any case. I will take care of whatever adjustments are needed to adapt to whatever we settle on for sampling in the distributions package. Well, it is not really necessary to do everything together and refactor the distributions. Probably it is better to start the other way round, and describe what I want to add, and see how other things fit in: * I want a generic Sampler interface, i.e. 
something like this: ** nextSample() ** nextSamples(int size) ** nextSamples(double[] samples) +1 there could be a DiscreteSampler and ContinuousSampler interface to handle the cases for int / double values. Perhaps the names should be IntegerSampler and DoubleSampler, to accommodate future needs (LongSampler, BooleanSampler (?)). I would prefer consistent names for distributions and corresponding samplers. The support of distributions must be of the same data type as the return values of the corresponding sampler. Therefore, I would call the samplers for RealDistribution and IntegerDistribution RealSampler and IntegerSampler, respectively. The distributions could either be changed to return such a sampler as Gilles proposed (with the advantage that no random instance is tied to the distribution itself), or implement the interface directly (with the advantage that we would not need to refactor too much). Unless I'm missing something, the refactoring would be much the same: The latter case needs
Re: [math] random boolean arrays
On 07/11/2015 09:43 PM, Phil Steitz wrote: On 7/11/15 12:29 PM, Thomas Neidhart wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Do you think it *should* be a private method of the K-S class? Right now, I do not see many uses outside the class, but if we decide to make it public then I would prefer a special util class in the random package to avoid cluttering the MathArrays class. Regarding the RandomDataGenerator: I think this class should be deprecated and replaced by a Sampler interface as proposed by Gilles. One can then create a sampler for any distribution or from other sources, e.g. when needing a fast and efficient sampler without replacement (see MATH-1239). I did not have time to complete a patch, but am working on it. Thomas
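The randomize(boolean[], nTrue) signature floated above could look roughly like the following. This is a minimal sketch and not the MATH-1242 code: the class name BooleanArrays, the extra Random parameter, and the use of java.util.Random in place of a Commons Math RandomGenerator are assumptions made only to keep the example self-contained. It sets the first nTrue entries and then runs a full in-place Fisher-Yates shuffle, so no array copy is made.

```java
import java.util.Random;

/** Hypothetical helper class; neither the name nor the method exists in Commons Math. */
final class BooleanArrays {
    private BooleanArrays() {}

    /**
     * Overwrites b in place so that exactly nTrue entries are true.
     * Every arrangement is equally likely if rng.nextInt(n) is uniform.
     */
    static void randomize(boolean[] b, int nTrue, Random rng) {
        if (nTrue < 0 || nTrue > b.length) {
            throw new IllegalArgumentException("nTrue must be in [0, " + b.length + "]");
        }
        // Start from a fixed layout: the first nTrue slots are true.
        for (int i = 0; i < b.length; i++) {
            b[i] = i < nTrue;
        }
        // Fisher-Yates shuffle, performed in place.
        for (int i = b.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1); // uniform index in [0, i]
            boolean tmp = b[i];
            b[i] = b[j];
            b[j] = tmp;
        }
    }
}
```

Because the method is a plain static function over a caller-supplied array, a test can call it directly and check both the count of true values and, over many runs, the frequency of each position.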
Jenkins build is back to stable : commons-jcs #115
See https://builds.apache.org/job/commons-jcs/115/changes
Jenkins build is back to stable : commons-jcs » Apache Commons JCS :: Core #115
See https://builds.apache.org/job/commons-jcs/org.apache.commons$commons-jcs-core/115/
Jenkins build is still unstable: commons-jcs » Apache Commons JCS :: Core #114
See https://builds.apache.org/job/commons-jcs/org.apache.commons$commons-jcs-core/changes
Jenkins build is still unstable: commons-jcs #114
See https://builds.apache.org/job/commons-jcs/changes
Re: [VOTE][LAZY] Migrate Commons SCXML to Git
GitHub integration (https://github.com/apache/commons-scxml) was done successfully. I validated it with a github pull request and merging it in the git repo. It seems to take about 1 hr for the GitHub mirror to take the changes from the git repo. I'll also update the site documentation with this: - https://issues.apache.org/jira/browse/SCXML-235 By the way, could someone please tweet this move on @ApacheCommons? Enjoy! Regards, Woonsan On Thu, Jul 9, 2015 at 6:14 PM, Woonsan Ko woon...@apache.org wrote: Hi there, GitHub integration request was made: - https://issues.apache.org/jira/browse/INFRA-9961 Also, SCXML git repository is now writable: - https://git-wip-us.apache.org/repos/asf/commons-scxml.git Please enjoy working on the new git repo now! Cheers, Woonsan On Thu, Jul 9, 2015 at 2:50 AM, Benedikt Ritter brit...@apache.org wrote: Hello Woonson, one thing you will have to do after the migration is to create a ticket for the github mirror. Otherwise it will continue mirroring the SVN repo. Benedikt 2015-07-09 6:06 GMT+02:00 Woonsan Ko woon...@apache.org: Thanks for your support, Ate! :-) It was already migrated to git (INFRA-9952). I commented about my validations there. Everything seems very fine. I will announce it to user/dev community once it gets fully available. Cheers, Woonsan On Wed, Jul 8, 2015 at 7:26 PM, Ate Douma a...@douma.nu wrote: Sorry for the too late response, but I would have voted +1 too :) Ate On 2015-07-08 20:27, Woonsan Ko wrote: Apache Commons Developers, This VOTE has passed with the following votes: +1 Dave Brosius +1 James Carman (PMC) +1 Gary Gregory (PMC) +1 Woonsan Ko Thank you all for voting! I will create an INFRA ticket soon and keep you updated about the progress and availabilities. Regards, Woonsan Ko On Wed, Jul 1, 2015 at 9:50 PM, Woonsan Ko woon...@apache.org wrote: Hi there, I think the experiences in Commons Math and Commons Lang using git as primary VCS have been successful.
Also, we received requests from some new people about using git instead (through mailing list and JIRA tickets). So, I'd like to call a vote to migrate Commons SCXML to git, assuming most Commons SCXML developers feel comfortable with switching to git. (Please see [1] for summarized info about using git in Apache Commons project.) This vote by lazy consensus will close no sooner than 72 hours from now, i.e. after 2015-07-07 18:00 EDT. [ ] +1 go for it [ ] +0 OK, but... [ ] -0 Not happy about this, because... [ ] -1 We should not do this Thanks! Woonsan [1] https://wiki.apache.org/commons/UsingGIT -- http://people.apache.org/~britter/ http://www.systemoutprintln.de/ http://twitter.com/BenediktRitter http://github.com/britter
Re: [math] random boolean arrays
On 07/12/2015 04:58 PM, Phil Steitz wrote: On 7/12/15 2:50 AM, Thomas Neidhart wrote: On 07/11/2015 09:43 PM, Phil Steitz wrote: On 7/11/15 12:29 PM, Thomas Neidhart wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Do you think it *should* be a private method of the K-S class? Right now, I do not see much uses outside the class, but if we decide to make it public then I would prefer a special util class in the random package to avoid cluttering the MathArrays class. OK, for now we can make it private and use the reflection method above to test it. ok, but I guess it is also fine to make it package private as sebb suggested. We did something similar recently for some of the improved sampling methods provided by Otmar. Regarding the RandomDataGenerator: I think this class should be deprecated and replaced by a Sampler interface as proposed by Gilles. Please consider keeping this class. Consider this a user request. 
I have quite a few applications that use this class for two reasons: ok, the reason why I thought the class should be deprecated is because it was not kept up-to-date with all the new discrete and continuous distributions that we added in the last 2-3 years. If you think it is useful, then we can keep it of course. 1. One object instance tied to one PRNG that generates data from multiple different distributions. This is convenient. Sure, I could refactor all of these apps to instantiate new objects for each type of generated data and hopefully still be able to peg them to one PRNG; but that is needless work that also complicates the code. 2. There are quite a few methods in this class that have nothing to do with sampling (nextPermutation, nextHexString, nextSecureXxx, etc) but which conveniently share the RandomGenerator. I guess the utility methods get moved out somewhere else. Again, I end up having to refactor all of my code that uses them and when I want simulations to be based on a single PRNG, I have to find a way to pass the RandomGenerator around to them. I don't yet see the need to refactor the sampling support in the distributions package; but as my own apps are not impacted by this, if everyone else sees the user impact of the refactoring as outweighed by the benefit, I won't stand in the way. Please lets just keep the RandomDataGenerator convenience class in the random package in any case. I will take care of whatever adjustments are needed to adapt to whatever we settle on for sampling in the distributions package. Well, it is not really necessary to do everything together and refactor the distributions. Probably it is better to start the other way round, and describe what I want to add, and see how other things fit in: * I want a generic Sampler interface, i.e. 
something like this: ** nextSample() ** nextSamples(int size) ** nextSamples(double[] samples) there could be a DiscreteSampler and ContinuousSampler interface to handle the cases for int / double values. The distributions could either be changed to return such a sampler as Gilles proposed (with the advantage that no random instance is tied to the distribution itself), or implement the interface directly (with the advantage that we would not need to refactor too much). One can then create a sampler for any distribution or from other sources, e.g. when needing a fast and efficient sampler without replacement (see MATH-1239). +1 for sequential sampling. I don't follow exactly why that requires refactoring the distributions; but if it helps in a way I don't yet understand, that will help convince me that refactoring sampling in the distributions package is worth the user pain. As I said above, I wanted to combine two things in one step; maybe it is better to go step by step. Thomas
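To make the three proposed methods concrete, here is a minimal sketch of what a ContinuousSampler might look like. Only the method names come from the message above; the return types, the fill-in-place meaning of nextSamples(double[]), the UniformSampler example class, and the use of java.util.Random instead of a Commons Math RandomGenerator are all assumptions for illustration.

```java
import java.util.Random;

/** Hypothetical interface; method names follow the list in the message above. */
interface ContinuousSampler {
    /** Returns a single sample. */
    double nextSample();

    /** Returns a newly allocated array holding size samples. */
    double[] nextSamples(int size);

    /** Fills the caller's array in place (assumed semantics, avoids allocation). */
    void nextSamples(double[] samples);
}

/** Example source: uniform values between lo and hi, drawn from a caller-supplied PRNG. */
final class UniformSampler implements ContinuousSampler {
    private final Random rng;
    private final double lo;
    private final double hi;

    UniformSampler(Random rng, double lo, double hi) {
        if (!(lo < hi)) {
            throw new IllegalArgumentException("lo must be less than hi");
        }
        this.rng = rng;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    public double nextSample() {
        return lo + (hi - lo) * rng.nextDouble();
    }

    @Override
    public double[] nextSamples(int size) {
        double[] out = new double[size];
        nextSamples(out); // reuse the in-place variant
        return out;
    }

    @Override
    public void nextSamples(double[] samples) {
        for (int i = 0; i < samples.length; i++) {
            samples[i] = nextSample();
        }
    }
}
```

In this sketch the PRNG is passed to the sampler rather than stored in a distribution object, which corresponds to the first of the two options described above; several samplers can share one Random instance.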
Re: [math] random boolean arrays
On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote: On 07/12/2015 04:58 PM, Phil Steitz wrote: On 7/12/15 2:50 AM, Thomas Neidhart wrote: On 07/11/2015 09:43 PM, Phil Steitz wrote: On 7/11/15 12:29 PM, Thomas Neidhart wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Do you think it *should* be a private method of the K-S class? Right now, I do not see much uses outside the class, but if we decide to make it public then I would prefer a special util class in the random package to avoid cluttering the MathArrays class. OK, for now we can make it private and use the reflection method above to test it. ok, but I guess it is also fine to make it package private as sebb suggested. We did something similar recently for some of the improved sampling methods provided by Otmar. Regarding the RandomDataGenerator: I think this class should be deprecated and replaced by a Sampler interface as proposed by Gilles. Please consider keeping this class. Consider this a user request. 
I have quite a few applications that use this class for two reasons: ok, the reason why I thought the class should be deprecated is because it was not kept up-to-date with all the new discrete and continuous distributions that we added in the last 2-3 years. If you think it is useful, then we can keep it of course. 1. One object instance tied to one PRNG that generates data from multiple different distributions. This is convenient. Sure, I could refactor all of these apps to instantiate new objects for each type of generated data and hopefully still be able to peg them to one PRNG; but that is needless work that also complicates the code. 2. There are quite a few methods in this class that have nothing to do with sampling (nextPermutation, nextHexString, nextSecureXxx, etc) but which conveniently share the RandomGenerator. I guess the utility methods get moved out somewhere else. Again, I end up having to refactor all of my code that uses them and when I want simulations to be based on a single PRNG, I have to find a way to pass the RandomGenerator around to them. I don't yet see the need to refactor the sampling support in the distributions package; but as my own apps are not impacted by this, if everyone else sees the user impact of the refactoring as outweighed by the benefit, I won't stand in the way. Please lets just keep the RandomDataGenerator convenience class in the random package in any case. I will take care of whatever adjustments are needed to adapt to whatever we settle on for sampling in the distributions package. Well, it is not really necessary to do everything together and refactor the distributions. Probably it is better to start the other way round, and describe what I want to add, and see how other things fit in: * I want a generic Sampler interface, i.e. 
something like this: ** nextSample() ** nextSamples(int size) ** nextSamples(double[] samples) +1 there could be a DiscreteSampler and ContinuousSampler interface to handle the cases for int / double values. Perhaps the names should be IntegerSampler and DoubleSampler, to accommodate future needs (LongSampler, BooleanSampler (?)). The distributions could either be changed to return such a sampler as Gilles proposed (with the advantage that no random instance is tied to the distribution itself), or implement the interface directly (with the advantage that we would not need to refactor too much). Unless I'm missing something, the refactoring would be much the same: The latter case needs implementing 3 methods (2 new ones, one with a name change). The former needs implementing the factory method proposed in MATH-1158, plus the same methods as above (wrapped in the object returned by the factory method). Gilles One can then create a sampler for any distribution or from other sources, e.g. when needing a fast and efficient sampler without replacement (see MATH-1239). +1 for sequential sampling. I don't follow exactly why
Re: [math] random boolean arrays
On 7/12/15 2:50 AM, Thomas Neidhart wrote: On 07/11/2015 09:43 PM, Phil Steitz wrote: On 7/11/15 12:29 PM, Thomas Neidhart wrote: On 07/11/2015 09:08 PM, Phil Steitz wrote: The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too well. Any ideas? If it is just for testing purposes, you can also access the method in question via reflection, see an example here: http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes Do you think it *should* be a private method of the K-S class? Right now, I do not see much uses outside the class, but if we decide to make it public then I would prefer a special util class in the random package to avoid cluttering the MathArrays class. OK, for now we can make it private and use the reflection method above to test it. Regarding the RandomDataGenerator: I think this class should be deprecated and replaced by a Sampler interface as proposed by Gilles. Please consider keeping this class. Consider this a user request. I have quite a few applications that use this class for two reasons: 1. One object instance tied to one PRNG that generates data from multiple different distributions. This is convenient. 
Sure, I could refactor all of these apps to instantiate new objects for each type of generated data and hopefully still be able to peg them to one PRNG; but that is needless work that also complicates the code. 2. There are quite a few methods in this class that have nothing to do with sampling (nextPermutation, nextHexString, nextSecureXxx, etc) but which conveniently share the RandomGenerator. I guess the utility methods get moved out somewhere else. Again, I end up having to refactor all of my code that uses them and when I want simulations to be based on a single PRNG, I have to find a way to pass the RandomGenerator around to them. I don't yet see the need to refactor the sampling support in the distributions package; but as my own apps are not impacted by this, if everyone else sees the user impact of the refactoring as outweighed by the benefit, I won't stand in the way. Please let's just keep the RandomDataGenerator convenience class in the random package in any case. I will take care of whatever adjustments are needed to adapt to whatever we settle on for sampling in the distributions package. One can then create a sampler for any distribution or from other sources, e.g. when needing a fast and efficient sampler without replacement (see MATH-1239). +1 for sequential sampling. I don't follow exactly why that requires refactoring the distributions; but if it helps in a way I don't yet understand, that will help convince me that refactoring sampling in the distributions package is worth the user pain. Phil I did not have time to complete a patch, but am working on it. Thomas
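For the "keep it private and test via reflection" route agreed on above, the standard java.lang.reflect calls are enough. The sketch below is illustrative only: Target and its countTrue method are stand-ins invented for this example, not the real K-S test class or its private array-filling method, and PrivateAccess is a hypothetical test helper.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Stand-in for a class with a private static helper; not the real K-S class. */
final class Target {
    private static int countTrue(boolean[] flags) {
        int n = 0;
        for (boolean f : flags) {
            if (f) {
                n++;
            }
        }
        return n;
    }
}

/** Hypothetical test helper that calls a private static method by name. */
final class PrivateAccess {
    private PrivateAccess() {}

    static Object invokeStatic(Class<?> owner, String name,
                               Class<?>[] paramTypes, Object[] args) {
        try {
            Method m = owner.getDeclaredMethod(name, paramTypes);
            m.setAccessible(true);       // lift the private access check
            return m.invoke(null, args); // null receiver: the method is static
        } catch (InvocationTargetException e) {
            // The target method itself threw; surface its cause.
            throw new IllegalStateException("target method threw", e.getCause());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective lookup or call failed", e);
        }
    }
}
```

A test would then call PrivateAccess.invokeStatic(Target.class, "countTrue", new Class<?>[] { boolean[].class }, new Object[] { flags }) and cast the result to Integer. Making the method package-private and placing the test in the same package avoids the reflective call altogether.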