Re: Post Meetup Meetup was Re: Unit test lag?
On Mon, Jan 18, 2010 at 4:46 PM, Grant Ingersoll wrote: > > On Jan 18, 2010, at 12:34 PM, Benson Margulies wrote: > > > If it's SF on Thursday, someone will have to have a beer as my proxy. > > I volunteer ;-) > You're on. > Sounds like we have a post meetup meetup brewing. I'm not familiar with > the area, anyone know where we can go afterwards? Also, I'll need a ride > back to San Mateo if possible. > There are lots of places if we can build a caravan to get to Castro Street (Mountain View, 0.5-1 miles) or Murphy Street (Sunnyvale, 2 miles). My house could even be available, but is a bit more of a mess than guests are usually allowed to see and doesn't have a beer tap. Regarding the ride to San Mateo, I would be happy to help transport you to the train, which might satisfy your needs if there isn't somebody else headed that way. I am also happy to help transport anybody coming south on the train to the Dojo. Mountain View would probably be the better train stop to aim for if you want to take me up on my offer. -- Ted Dunning, CTO DeepDyve
Post Meetup Meetup was Re: Unit test lag?
On Jan 18, 2010, at 12:34 PM, Benson Margulies wrote: > If it's SF on Thursday, someone will have to have a beer as my proxy. I volunteer ;-) Sounds like we have a post meetup meetup brewing. I'm not familiar with the area; does anyone know where we can go afterwards? Also, I'll need a ride back to San Mateo if possible. -Grant
Re: Unit test lag?
Hmm, if all you guys are going to be there, I may need to push back my flight - I'm scheduled to fly *out* of SFO right around the time of the Meetup, but if I can push back that flight, I will. -jake On Mon, Jan 18, 2010 at 1:24 PM, Ted Dunning wrote: > I'll be there. > > Sean, are you really going to be there? That would be fantastic. > > On Mon, Jan 18, 2010 at 6:02 AM, Grant Ingersoll >wrote: > > > > > On Jan 17, 2010, at 8:35 PM, Ted Dunning wrote: > > > > > We should have a beer some time anyway and the beers we owe you for > > cleaning > > > up Colt more than cancel any potential beer on this issue so I will be > > happy > > > to buy (Sean, you are included for similar reasons if we ever see each > > > other). > > > > After the Meetup (http://www.meetup.com/SFBay-Lucene-Solr-Meetup/) on > > Thursday? Looks like Sean will be there. What other Mahouts are > planning > > on attending? > > > > -Grant > > > > > -- > Ted Dunning, CTO > DeepDyve >
Re: Unit test lag?
Yes, I'm on the west coast for a week from tomorrow for various reasons and so will certainly stop in. Looking forward to it. Sean On Mon, Jan 18, 2010 at 9:24 PM, Ted Dunning wrote: > I'll be there. > > Sean, are you really going to be there? That would be fantastic.
Re: Unit test lag?
I'll be there. Sean, are you really going to be there? That would be fantastic. On Mon, Jan 18, 2010 at 6:02 AM, Grant Ingersoll wrote: > > On Jan 17, 2010, at 8:35 PM, Ted Dunning wrote: > > > We should have a beer some time anyway and the beers we owe you for > cleaning > > up Colt more than cancel any potential beer on this issue so I will be > happy > > to buy (Sean, you are included for similar reasons if we ever see each > > other). > > After the Meetup (http://www.meetup.com/SFBay-Lucene-Solr-Meetup/) on > Thursday? Looks like Sean will be there. What other Mahouts are planning > on attending? > > -Grant -- Ted Dunning, CTO DeepDyve
Re: Unit test lag?
If it's SF on Thursday, someone will have to have a beer as my proxy. I'll be back here in the snow. On Mon, Jan 18, 2010 at 12:21 PM, Jeff Eastman wrote: > I'm planning on attending > Jeff > > > Grant Ingersoll wrote: >> >> On Jan 17, 2010, at 8:35 PM, Ted Dunning wrote: >> >> >>> >>> We should have a beer some time anyway and the beers we owe you for >>> cleaning >>> up Colt more than cancel any potential beer on this issue so I will be >>> happy >>> to buy (Sean, you are included for similar reasons if we ever see each >>> other). >>> >> >> After the Meetup (http://www.meetup.com/SFBay-Lucene-Solr-Meetup/) on >> Thursday? Looks like Sean will be there. What other Mahouts are planning >> on attending? >> >> -Grant >> > >
Re: Unit test lag?
I'm planning on attending Jeff Grant Ingersoll wrote: On Jan 17, 2010, at 8:35 PM, Ted Dunning wrote: We should have a beer some time anyway and the beers we owe you for cleaning up Colt more than cancel any potential beer on this issue so I will be happy to buy (Sean, you are included for similar reasons if we ever see each other). After the Meetup (http://www.meetup.com/SFBay-Lucene-Solr-Meetup/) on Thursday? Looks like Sean will be there. What other Mahouts are planning on attending? -Grant
Re: Unit test lag?
On Mon, Jan 18, 2010 at 9:42 AM, Sean Owen wrote: > You can punt the choice all the way up to fix that. Then regular > callers are forced to instantiate and supply the RNG in all cases, and > the API has Randoms all over the place, and I suppose I don't quite > like that aesthetically. Point taken. I suspect there may be ways around this ugliness in many cases, but you certainly know the code infinitely better than I do. >> RandomUtils.useTestSeed() is called once in a VM all other callers of >> RandomUtils.getRandom() will get a test seed. > > ... yep and I think this is cleaner than the option above. That may be > the only delta between what we're saying. Yes, this is the only difference. Thanks for taking the time to understand my point of view and taking steps to resolve the slowness issue. Both are very much appreciated. Drew
Re: Unit test lag?
On Mon, Jan 18, 2010 at 2:36 PM, Drew Farris wrote: > I'm suggesting that the instantiator/caller of the class choose > between a regular and test-friendly RNG. In some classes that creator > will be a unit test in other cases the creator will be another piece > of production code. In some cases the decision as to which type of RNG > to use will need to be made further up the object graph than the > immediate instantiator/caller, and generally it should be made as > close to main() or setUp() as possible. Yes, but the problem is the production code. The test code knows it's in test mode. The production code does not, since it's executed in test and non-test mode. It can't make this choice independently. You can punt the choice all the way up to fix that. Then regular callers are forced to instantiate and supply the RNG in all cases, and the API has Randoms all over the place, and I suppose I don't quite like that aesthetically. > RandomUtils essentially achieves the same thing in a static fashion. > The class itself decides that it will always delegate to RandomUtils > and RandomUtils provides the different strategies. Currently if > RandomUtils.useTestSeed() is called once in a VM all other callers of > RandomUtils.getRandom() will get a test seed. ... yep and I think this is cleaner than the option above. That may be the only delta between what we're saying. (Separately I'd like to hijack MAHOUT-260 now to talk about the still-existing repeatability problem, which is a different question. Any thoughts on that? It patches this up pretty well but isn't entirely pretty.)
Re: Unit test lag?
On Mon, Jan 18, 2010 at 9:23 AM, Sean Owen wrote: > You're suggesting the class choose between a regular and test-friendly > RNG, by calling one of two methods. Doesn't that put the decision with > the class instead of externally? Right now it's already external. > RandomUtils decides what to instantiate. I'm suggesting that the instantiator/caller of the class choose between a regular and test-friendly RNG. For some classes that creator will be a unit test; in other cases it will be another piece of production code. In some cases the decision as to which type of RNG to use will need to be made further up the object graph than the immediate instantiator/caller, and generally it should be made as close to main() or setUp() as possible. RandomUtils essentially achieves the same thing in a static fashion. The class itself decides that it will always delegate to RandomUtils, and RandomUtils provides the different strategies. Currently, if RandomUtils.useTestSeed() is called once in a VM, all other callers of RandomUtils.getRandom() will get a test seed. Drew
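For readers following along, the static switch Drew describes can be modeled roughly as below. This is a minimal sketch of the behavior discussed in the thread, not Mahout's actual RandomUtils source; the class name StaticRngSwitch and the seed value 42 are assumptions chosen for illustration.

```java
import java.util.Random;

// Hypothetical model of a RandomUtils-style static switch (not Mahout's real code).
final class StaticRngSwitch {
  private static final long TEST_SEED = 42L;   // assumed fixed test seed
  private static boolean testSeed = false;     // VM-wide flag shared by every caller

  private StaticRngSwitch() {}

  /** Flips every later getRandom() call in this JVM to the fixed test seed. */
  static synchronized void useTestSeed() {
    testSeed = true;
  }

  /** Fixed-seed RNG once useTestSeed() has been called, otherwise a randomly seeded one. */
  static synchronized Random getRandom() {
    return testSeed ? new Random(TEST_SEED) : new Random();
  }
}
```

Because the flag is static and never cleared, one test calling useTestSeed() changes what every later caller in the same JVM receives. That is exactly the cross-test coupling that appears once tests stop forking a VM per class.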
Re: Unit test lag?
You're suggesting the class choose between a regular and test-friendly RNG, by calling one of two methods. Doesn't that put the decision with the class instead of externally? Right now it's already external. RandomUtils decides what to instantiate. On Mon, Jan 18, 2010 at 2:21 PM, Drew Farris wrote: > You get it entirely. Moving around the injection in this case produces > more testable code in that you don't have a class-defined behavior for > the RNG. Instead it becomes an externally-defined behavior.
Re: Unit test lag?
On Mon, Jan 18, 2010 at 9:06 AM, Sean Owen wrote: > (Separately you could argue we're going about this all wrong, by > trying to depend on the exact output of the RNG.. No argument here. In practice I don't think we can really get around using a pre-seeded RNG for tests. > You've moved around the injection, but nothing else I think. Am I > misunderstanding because that seems to be why I'm not following > getTestRandom(). You get it entirely. Moving the injection around in this case produces more testable code, in that you don't have a class-defined behavior for the RNG. Instead it becomes an externally-defined behavior. > (Taking it as a constructor param is the conventional way to set up > for injecting, but from an API perspective I don't quite like it. I > understand why an evaluator necessarily needs a Recommender to exist, > but why do I need to give it a Random, conceptually?) It really depends on the evaluator implementation. In the case of GenericRecommenderIRStatsEvaluator, the evaluator happens to use randomness to perform the evaluation function. I agree that these sorts of injections should not be accommodated at the interface level and shouldn't pollute the API. Drew
Re: Unit test lag?
On Mon, Jan 18, 2010 at 2:00 PM, Drew Farris wrote: > In what cases would you want to reset them all remotely, at the > beginning of each test? You pretty much said it -- tests should start from a known, fixed state, so that the result is the same each time, and we can assert about the output. This means setting the entire library and test fixture state to a known state -- that's why there's a need to not just control the initial seed but reset it. (Separately you could argue we're going about this all wrong, by trying to depend on the exact output of the RNG, and should be writing tests that assert only what's true no matter what the outcome, or else assert things that should be true in 99.% of all RNG sequences. But let's resort to that argument later.) > In tests you call: > > Random r = RandomUtils.getTestRandom(); > ev = new GenericRecommenderIRStatsEvaluator(r); > > In production code you call: > > Random r = RandomUtils.getRandom(); > ev = new GenericRecommenderIRStatsEvaluator(r); And you're suggesting getRandom() returns a randomly-seeded RNG? Then this just returns to the original problem: the test is not repeatable. You've moved around the injection, but nothing else, I think. Am I misunderstanding? That seems to be why I'm not following getTestRandom(). (Taking it as a constructor param is the conventional way to set up for injecting, but from an API perspective I don't quite like it. I understand why an evaluator necessarily needs a Recommender to exist, but why do I need to give it a Random, conceptually?)
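Sean's parenthetical alternative (assert only what holds for almost any RNG sequence) might look like the sketch below. It is illustrative only: SamplingPropertyCheck and its methods are hypothetical, and the 10% sample over 10,000 items is just an example in the spirit of the sampling use case mentioned in the thread.

```java
import java.util.Random;

// Hypothetical seed-independent check: instead of asserting one seeded sequence's exact
// output, assert a property that holds for essentially every sequence.
final class SamplingPropertyCheck {
  private SamplingPropertyCheck() {}

  /** Counts how many of n items a Bernoulli(fraction) sampler keeps. */
  static int sampleCount(Random random, int n, double fraction) {
    int kept = 0;
    for (int i = 0; i < n; i++) {
      if (random.nextDouble() < fraction) {
        kept++;
      }
    }
    return kept;
  }

  /** True if a 10% sample of 10,000 items lands near the expected 1,000. */
  static boolean roughlyTenPercent(Random random) {
    int kept = sampleCount(random, 10_000, 0.10);
    // The standard deviation is about 30 here, so +/-300 is roughly a 10-sigma band.
    return Math.abs(kept - 1_000) <= 300;
  }
}
```

The trade-off is weaker assertions in exchange for tests that pass under any seed. That sidesteps the reset problem but cannot check exact outputs.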
Re: Unit test lag?
On Jan 17, 2010, at 8:35 PM, Ted Dunning wrote: > We should have a beer some time anyway and the beers we owe you for cleaning > up Colt more than cancel any potential beer on this issue so I will be happy > to buy (Sean, you are included for similar reasons if we ever see each > other). After the Meetup (http://www.meetup.com/SFBay-Lucene-Solr-Meetup/) on Thursday? Looks like Sean will be there. What other Mahouts are planning on attending? -Grant
Re: Unit test lag?
On Mon, Jan 18, 2010 at 3:58 AM, Sean Owen wrote: > The real fix is centralizing management of Random, tracking them, and > being able to reset them all "remotely". In what cases would you want to reset them all remotely, at the beginning of each test? > It is injected already -- that's the purpose of having "getRandom()" > and not "getTestRandom()". That's the means by which a different > fixed-seed RNG can be provided when run in a test harness. You > couldn't do that with two methods: they'd each return a normal or > fixed RNG, and the code could only call one. I was suggesting the RNG could be specified outside of the code instead of inside of the code. For example, instead of:

  GenericRecommenderIRStatsEvaluator() {
    random = RandomUtils.getRandom();
  }

You could have:

  GenericRecommenderIRStatsEvaluator(Random r) {
    random = r;
  }

In tests you call:

  Random r = RandomUtils.getTestRandom();
  ev = new GenericRecommenderIRStatsEvaluator(r);

In production code you call:

  Random r = RandomUtils.getRandom();
  ev = new GenericRecommenderIRStatsEvaluator(r);

The alternative would be to leave the constructor for GenericRecommenderIRStatsEvaluator as it is and provide a way to set the value of the field 'random' to something for testing, e.g. the return value of RandomUtils.getTestRandom(). For example, in tests:

  Random r = RandomUtils.getTestRandom();
  ev = new GenericRecommenderIRStatsEvaluator();
  ev.setRandomSeed(r);

Production code would remain unchanged. This won't necessarily work in all cases, depending upon what sort of things happen in the constructor. As Ted pointed out, doing the above would require a fair amount of thought and work; there's no single approach to introducing this type of injection that will work everywhere. Drew
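Drew's two options (constructor injection and a setter) can be shown side by side on a toy class. SampledEvaluator and pickIndex are hypothetical stand-ins, not GenericRecommenderIRStatsEvaluator; the point is only where the Random comes from.

```java
import java.util.Random;

// Hypothetical evaluator illustrating the two injection styles from the thread.
final class SampledEvaluator {
  private Random random;

  /** Constructor injection: whoever creates the evaluator decides which RNG it uses. */
  SampledEvaluator(Random random) {
    this.random = random;
  }

  /** Setter injection: lets a test swap the RNG after construction. */
  void setRandom(Random random) {
    this.random = random;
  }

  /** Picks a pseudo-random index in [0, size) from whichever RNG was injected. */
  int pickIndex(int size) {
    return random.nextInt(size);
  }
}
```

A test would construct it with `new Random(42L)` and production code with `new Random()`. As Drew notes, the setter route is fragile: if a constructor already consumed randomness, swapping the RNG afterwards comes too late.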
Re: Unit test lag?
Same here, I don't like Spring myself as it smells like overengineering -- certainly for this case. I'm otherwise a luddite though and could more broadly be convinced. On Mon, Jan 18, 2010 at 2:49 AM, Ted Dunning wrote: > I have had too many unpleasant experiences using Spring to be enthused about > jumping fully into it for this one use case.
Re: Unit test lag?
On Mon, Jan 18, 2010 at 2:24 AM, Drew Farris wrote: > On Sun, Jan 17, 2010 at 9:10 PM, Sean Owen wrote: >> There are already cases where code needs to control the seed (mostly >> to serialize/deserialize the exact state of an object). I don't think >> that's the issue per se? The issue is when an RNG lives beyond one >> test, and there are legitimate reasons that may be so. > > Ahh, ok, I wasn't really considering this. Would it be sufficient to > assign the RNG to a static field in the test class in this case? If it > needed to live across multiple classes, it could be public. > Nevertheless.. Well, that would create the problem rather than solve it, but since the problem already does (or will) exist legitimately in the main code, you could say sure, why not? After all, in some class that's instantiated a lot, it doesn't make sense from an efficiency standpoint to initialize an RNG per instance, and so it's static, and there you go, problem. (This wouldn't be a good change for tests though, since a statically-initialized RNG would be created before setUp() set the framework to test mode. But then again, that's another example of the actual issue at hand.) The real fix is centralizing management of Random, tracking them, and being able to reset them all "remotely". I know how that could be done. > I suspect I'm missing something here because I don't understand how > randomness is used in the non-test code or specifically how the RNG's > are managed. I was (falsely, likely) assuming that the non-test code > didn't obtain the RNG itself but rather had it provided/injected by an > external source. In the context of a test, something from > getTestRandom() which uses a fixed seed could be injected, while in > production code something else would be. > Randomness is used outside of tests, for example to sample 10% of a data set, or in k-means. It is injected already -- that's the purpose of having "getRandom()" and not "getTestRandom()". 
That's the means by which a different fixed-seed RNG can be provided when run in a test harness. You couldn't do that with two methods: they'd each return a normal or fixed RNG, and the code could only call one.
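Sean says he knows how centralized tracking could be done but doesn't spell it out. The following is only one possible sketch, with hypothetical names (RngRegistry, resetAll): every RNG handed out is remembered through a weak reference, so a test's setUp() can reseed all live instances, including static ones created before test mode was switched on.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical central registry: tracks every Random it creates so all can be reseeded at once.
final class RngRegistry {
  private static final List<WeakReference<Random>> INSTANCES = new ArrayList<>();

  private RngRegistry() {}

  /** Creates an RNG and tracks it weakly, so tracking alone does not keep it alive. */
  static synchronized Random getRandom() {
    Random random = new Random();
    INSTANCES.add(new WeakReference<>(random));
    return random;
  }

  /** Reseeds every live tracked RNG, e.g. from a test's setUp(). */
  static synchronized void resetAll(long seed) {
    Iterator<WeakReference<Random>> it = INSTANCES.iterator();
    while (it.hasNext()) {
      Random random = it.next().get();
      if (random == null) {
        it.remove(); // already garbage-collected; drop the stale entry
      } else {
        random.setSeed(seed); // java.util.Random.setSeed restarts the sequence
      }
    }
  }
}
```

After resetAll(seed), each tracked java.util.Random produces the same sequence as a fresh new Random(seed). Other Random subclasses may treat setSeed differently, so this assumes plain java.util.Random instances.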
Re: Unit test lag?
The Guice user guide is also very good at describing the benefits of injection. http://code.google.com/docreader/#p=google-guice&s=google-guice&t=Motivation I also like the level of complexity that Guice introduces (nearly zero). My major problem with Spring is that it introduces and mixes a bunch of different concepts at the same time. This makes it hard to take a small bite. Guice looks like a small bite and just defining constructors for hand-done injection is a still smaller bite. On Sun, Jan 17, 2010 at 6:59 PM, Drew Farris wrote: > However, we can support the concept of injection without having to > commit to using one framework or another. Every class is instantiated > somewhere, so manual injection can be performed sans framework at that > point. Speaking specifically for this case, the contract would be that > anything that requires a RNG gets it injected by the class that > instantiates it instead of obtaining one through some method of its > own. > > There's a great series of posts that describe the advantages to this > approach when it comes to testability that's reachable from: > http://misko.hevery.com/2008/09/10/where-have-all-the-new-operators-gone/ > -- Ted Dunning, CTO DeepDyve
Re: Unit test lag?
I prefer the injection method as well. On Sun, Jan 17, 2010 at 7:51 PM, Drew Farris wrote: > > If we want to go in Drew's suggested direction, we have to decide what > > to do about seeds. We either need to define an > > 'RandomNumberGeneratorFactory' interface which takes seeds and return > > generators, or we want to inject Random objects and expect the > > injector to worry about constructing and dealing with seeds. > > I vote for the latter. Those Random objects could be created via a > factory by whomever is injecting them. > > FWIW, RandomNumberGeneratorFactory pretty much exists today as > RandomUtils, I suspect we would just want to get rid of the static > boolean that determines whether a test seed or random seed is used for > getRandom(). -- Ted Dunning, CTO DeepDyve
Re: Unit test lag?
On Sun, Jan 17, 2010 at 10:31 PM, Benson Margulies wrote: > Have a look at the patch I posted to MAHOUT-260. It ducks the > injection question for now. This looks reasonable. > However, what's perhaps most interesting is that it makes tests fail! > Some tests get different answers with the stock JDK rng. Which tests are failing? I'm having some issues with non-patched head ATM. > If we want to go in Drew's suggested direction, we have to decide what > to do about seeds. We either need to define an > 'RandomNumberGeneratorFactory' interface which takes seeds and return > generators, or we want to inject Random objects and expect the > injector to worry about constructing and dealing with seeds. I vote for the latter. Those Random objects could be created via a factory by whomever is injecting them. FWIW, RandomNumberGeneratorFactory pretty much exists today as RandomUtils, I suspect we would just want to get rid of the static boolean that determines whether a test seed or random seed is used for getRandom().
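The option Drew votes for (inject Random objects and let the injector own the seeds, possibly through a factory) could be sketched as follows. RngFactory and KMeansSeeder are hypothetical names. The k-means flavor only echoes the thread's mention of k-means and is not Mahout's clustering code.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical factory used by the *injector*; components only ever receive a Random.
interface RngFactory {
  Random create();

  /** Production: each call yields an independently, randomly seeded RNG. */
  static RngFactory unseeded() {
    return Random::new;
  }

  /** Tests: every RNG starts from the same fixed seed, so runs repeat exactly. */
  static RngFactory fixedSeed(long seed) {
    return () -> new Random(seed);
  }
}

// Hypothetical component that needs randomness but knows nothing about seeds.
final class KMeansSeeder {
  private final Random random;

  KMeansSeeder(Random random) {
    this.random = random;
  }

  /** Chooses k distinct indices out of n (requires k <= n) via a partial Fisher-Yates shuffle. */
  int[] chooseInitialCentroids(int n, int k) {
    int[] idx = new int[n];
    for (int i = 0; i < n; i++) {
      idx[i] = i;
    }
    for (int i = 0; i < k; i++) {
      int j = i + random.nextInt(n - i); // uniform in [i, n)
      int tmp = idx[i];
      idx[i] = idx[j];
      idx[j] = tmp;
    }
    return Arrays.copyOf(idx, k);
  }
}
```

With this split, the seed policy lives at the point where objects are wired together (close to main() or setUp()), and the component's constructor takes a Random rather than a seed.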
Re: Unit test lag?
Have a look at the patch I posted to MAHOUT-260. It ducks the injection question for now. However, what's perhaps most interesting is that it makes tests fail! Some tests get different answers with the stock JDK RNG. If we want to go in Drew's suggested direction, we have to decide what to do about seeds. We either need to define a 'RandomNumberGeneratorFactory' interface which takes seeds and returns generators, or we want to inject Random objects and expect the injector to worry about constructing and dealing with seeds. On Sun, Jan 17, 2010 at 9:59 PM, Drew Farris wrote: > I've used Spring a great deal as well and generally look pretty > favorably upon it, but readily admit there are definite cons to it too. > > However, we can support the concept of injection without having to > commit to using one framework or another. Every class is instantiated > somewhere, so manual injection can be performed sans framework at that > point. Speaking specifically for this case, the contract would be that > anything that requires a RNG gets it injected by the class that > instantiates it instead of obtaining one through some method of its > own. > > There's a great series of posts that describe the advantages to this > approach when it comes to testability that's reachable from: > http://misko.hevery.com/2008/09/10/where-have-all-the-new-operators-gone/ > > This sort of injection strategy can be introduced in steps across the > codebase using manual injection techniques and then as/if needed a > dynamic injection framework can be folded in. It seems that plugging in > RNG's might be a good place to start. > > Drew > > On Sun, Jan 17, 2010 at 9:35 PM, Benson Margulies > wrote: >> One moral equivalent of Spring is a String property with a >> fully-qualified class name which RandomUtils instantiates to get its >> RNG. Another is to actually inject the RNG object. Spring would get >> really tempting here. 
>> >> I've had an extended immersion in Spring via CXF, so I have a low >> threshold for introducing it. >> >> >> >> On Sun, Jan 17, 2010 at 9:24 PM, Drew Farris wrote: >>> On Sun, Jan 17, 2010 at 9:10 PM, Sean Owen wrote: There are already cases where code needs to control the seed (mostly to serialize/deserialize the exact state of an object). I don't think that's the issue per se? The issue is when an RNG lives beyond one test, and there are legitimate reasons that may be so. >>> >>> Ahh, ok, I wasn't really considering this. Would it be sufficient to >>> assign the RNG to a static field in the test class in this case? If it >>> needed to live across multiple classes, it could be public. >>> Nevertheless.. >>> I don't see how a getTestRandom() method fixes something... I can't call this in my non-test code, and then tests can't control those RNGs. The non-test code can't make this decision which is why they don't. I don't think this is the problem/solution but rather having a way to globally reset all RNGs. >>> >>> I suspect I'm missing something here because I don't understand how >>> randomness is used in the non-test code or specifically how the RNG's >>> are managed. I was (falsely, likely) assuming that the non-test code >>> didn't obtain the RNG itself but rather had it provided/injected by an >>> external source. In the context of a test, something from >>> getTestRandom() which uses a fixed seed could be injected, while in >>> production code something else would be. >>> >> >
Re: Unit test lag?
I've used Spring a great deal as well and generally look pretty favorably upon it, but readily admit there are definite cons to it too. However, we can support the concept of injection without having to commit to using one framework or another. Every class is instantiated somewhere, so manual injection can be performed sans framework at that point. Speaking specifically for this case, the contract would be that anything that requires a RNG gets it injected by the class that instantiates it instead of obtaining one through some method of its own. There's a great series of posts that describe the advantages to this approach when it comes to testability that's reachable from: http://misko.hevery.com/2008/09/10/where-have-all-the-new-operators-gone/ This sort of injection strategy can be introduced in steps across the codebase using manual injection techniques, and then, as/if needed, a dynamic injection framework can be folded in. It seems that plugging in RNG's might be a good place to start. Drew On Sun, Jan 17, 2010 at 9:35 PM, Benson Margulies wrote: > One moral equivalent of Spring is a String property with a > fully-qualified class name which RandomUtils instantiates to get its > RNG. Another is to actually inject the RNG object. Spring would get > really tempting here. > > I've had an extended immersion in Spring via CXF, so I have a low > threshold for introducing it. > > > > On Sun, Jan 17, 2010 at 9:24 PM, Drew Farris wrote: >> On Sun, Jan 17, 2010 at 9:10 PM, Sean Owen wrote: >>> There are already cases where code needs to control the seed (mostly >>> to serialize/deserialize the exact state of an object). I don't think >>> that's the issue per se? The issue is when an RNG lives beyond one >>> test, and there are legitimate reasons that may be so. >> >> Ahh, ok, I wasn't really considering this. Would it be sufficient to >> assign the RNG to a static field in the test class in this case? If it >> needed to live across multiple classes, it could be public. 
>> Nevertheless.. >> >>> I don't see how a getTestRandom() method fixes something... I can't >>> call this in my non-test code, and then tests can't control those >>> RNGs. The non-test code can't make this decision which is why they >>> don't. I don't think this is the problem/solution but rather having a >>> way to globally reset all RNGs. >> >> I suspect I'm missing something here because I don't understand how >> randomness is used in the non-test code or specifically how the RNG's >> are managed. I was (falsely, likely) assuming that the non-test code >> didn't obtain the RNG itself but rather had it provided/injected by an >> external source. In the context of a test, something from >> getTestRandom() which uses a fixed seed could be injected, while in >> production code something else would be. >> >
Re: Unit test lag?
OK, then the class name appeals to me. I'll propose a patch. On Sun, Jan 17, 2010 at 9:49 PM, Ted Dunning wrote: > I have had too many unpleasant experiences using Spring to be enthused about > jumping fully into it for this one use case. > > On Sun, Jan 17, 2010 at 6:35 PM, Benson Margulies > wrote: > >> One moral equivalent of Spring is a String property with a >> fully-qualified class name which RandomUtils instantiates to get its >> RNG. Another is to actually inject the RNG object. Spring would get >> really tempting here. >> >> I've had an extended immersion in Spring via CXF, so I have a low >> threshold for introducing it. >> >> >
Re: Unit test lag?
I have had too many unpleasant experiences using Spring to be enthused about jumping fully into it for this one use case. On Sun, Jan 17, 2010 at 6:35 PM, Benson Margulies wrote: > One moral equivalent of Spring is a String property with a > fully-qualified class name which RandomUtils instantiates to get its > RNG. Another is to actually inject the RNG object. Spring would get > really tempting here. > > I've had an extended immersion in Spring via CXF, so I have a low > threshold for introducing it. > >
Re: Unit test lag?
One moral equivalent of Spring is a String property with a fully-qualified class name which RandomUtils instantiates to get its RNG. Another is to actually inject the RNG object. Spring would get really tempting here. I've had an extended immersion in Spring via CXF, so I have a low threshold for introducing it. On Sun, Jan 17, 2010 at 9:24 PM, Drew Farris wrote: > On Sun, Jan 17, 2010 at 9:10 PM, Sean Owen wrote: >> There are already cases where code needs to control the seed (mostly >> to serialize/deserialize the exact state of an object). I don't think >> that's the issue per se? The issue is when an RNG lives beyond one >> test, and there are legitimate reasons that may be so. > > Ahh, ok, I wasn't really considering this. Would it be sufficient to > assign the RNG to a static field in the test class in this case? If it > needed to live across multiple classes, it could be public. > Nevertheless.. > >> I don't see how a getTestRandom() method fixes something... I can't >> call this in my non-test code, and then tests can't control those >> RNGs. The non-test code can't make this decision which is why they >> don't. I don't think this is the problem/solution but rather having a >> way to globally reset all RNGs. > > I suspect I'm missing something here because I don't understand how > randomness is used in the non-test code or specifically how the RNG's > are managed. I was (falsely, likely) assuming that the non-test code > didn't obtain the RNG itself but rather had it provided/injected by an > external source. In the context of a test, something from > getTestRandom() which uses a fixed seed could be injected, while in > production code something else would be. >
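Benson's first option (a String property naming the RNG class, instantiated reflectively) could look roughly like this. The property key "example.rng.class" and the class ConfiguredRng are hypothetical; the thread does not name the property Benson had in mind.

```java
import java.util.Random;

// Hypothetical sketch: the RNG implementation is chosen by a system property holding a
// fully-qualified class name, defaulting to java.util.Random.
final class ConfiguredRng {
  static final String PROPERTY = "example.rng.class"; // assumed property name

  private ConfiguredRng() {}

  /** Instantiates the named Random subclass through its public no-arg constructor. */
  static Random create() throws ReflectiveOperationException {
    String className = System.getProperty(PROPERTY, Random.class.getName());
    // asSubclass fails fast with ClassCastException if the class is not a Random.
    Class<? extends Random> type = Class.forName(className).asSubclass(Random.class);
    return type.getDeclaredConstructor().newInstance();
  }
}
```

A test run could then pass something like `-Dexample.rng.class=...` to pick a deterministic implementation. Note that this selects *which* generator is used but, by itself, does not reset seeds for RNGs that already exist.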
Re: Unit test lag?
On Sun, Jan 17, 2010 at 9:10 PM, Sean Owen wrote: > There are already cases where code needs to control the seed (mostly > to serialize/deserialize the exact state of an object). I don't think > that's the issue per se? The issue is when an RNG lives beyond one > test, and there are legitimate reasons that may be so. Ahh, ok, I wasn't really considering this. Would it be sufficient to assign the RNG to a static field in the test class in this case? If it needed to live across multiple classes, it could be public. Nevertheless.. > I don't see how a getTestRandom() method fixes something... I can't > call this in my non-test code, and then tests can't control those > RNGs. The non-test code can't make this decision which is why they > don't. I don't think this is the problem/solution but rather having a > way to globally reset all RNGs. I suspect I'm missing something here because I don't understand how randomness is used in the non-test code or specifically how the RNG's are managed. I was (falsely, likely) assuming that the non-test code didn't obtain the RNG itself but rather had it provided/injected by an external source. In the context of a test, something from getTestRandom() which uses a fixed seed could be injected, while in production code something else would be.
Re: Unit test lag?
On Sun, Jan 17, 2010 at 6:10 PM, Sean Owen wrote: > There are already cases where code needs to control the seed (mostly > to serialize/deserialize the exact state of an object). > That is an important case, but it should be deterministic and thus not a problem for testing. Really the RNG is being used more as a good hash function in these cases. > I don't think > that's the issue per se? > I don't think that the serialization trick is a problem at all. > The issue is when an RNG lives beyond one > test, and there are legitimate reasons that may be so. > Hmm... I can't think of any off-hand. You probably have something in mind. Can you say what reasons there are for this? > I don't see how a getTestRandom() method fixes something... I can't > call this in my non-test code, and then tests can't control those > RNGs. The non-test code can't make this decision which is why they > don't. I don't think this is the problem/solution but rather having a > way to globally reset all RNGs. > I think that the problem that we are talking around here is whether we commit to having RNGs be injectable wherever they are used. Half measures are the problem here (IMHO). Real injection would solve all the questions by giving complete control to the test case. I don't think that we need Guice or Spring here, just a way to say "use this RNG, if you don't mind".
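One low-ceremony way to say "use this RNG, if you don't mind", while also addressing Sean's concern about Randoms showing up all over the public API, is an overloaded constructor. This is a sketch with a hypothetical class (HoldoutSplitter). The public constructor hides the RNG, while a package-private one lets same-package tests supply a seeded one.

```java
import java.util.Random;

// Hypothetical class: the public API stays free of Random, but tests in the same package
// can still inject a fixed-seed RNG through the package-private constructor.
final class HoldoutSplitter {
  private final Random random;

  /** Public constructor: callers never see a Random; a randomly seeded one is used. */
  public HoldoutSplitter() {
    this(new Random());
  }

  /** Package-private constructor for tests that need a repeatable sequence. */
  HoldoutSplitter(Random random) {
    this.random = random;
  }

  /** Returns true if an item should go to the held-out set (about `fraction` of items). */
  boolean holdOut(double fraction) {
    return random.nextDouble() < fraction;
  }
}
```

This only reaches RNGs owned by objects that a test constructs directly. RNGs created deeper in the object graph still need the decision pushed further up, as discussed in the thread.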
Re: Unit test lag?
This could be my fault, though my tests are passing. Let me look. On Jan 18, 2010 2:15 AM, "Drew Farris" wrote: Spoke too soon of course, some tests fail strangely locally: /u01/eclipse/eclipse-mahout-workspace/mahout-svn/core/src/test/java/org/apache/mahout/ga/watchmaker/EvalMapperTest.java:[48,25] type parameter org.apache.hadoop.io.LongWritable is not within its bound /u01/eclipse/eclipse-mahout-workspace/mahout-svn/core/src/test/java/org/apache/mahout/ga/watchmaker/EvalMapperTest.java:[48,92] type parameter org.apache.hadoop.io.LongWritable is not within its bound Looks like this was discussed way back in http://issues.apache.org/jira/browse/MAHOUT-127, but for the life of me I can't figure out why I'm running into it now. On Sun, Jan 17, 2010 at 8:38 PM, Drew Farris wrote: > On Sun, Jan 17, 2010 ...
Re: Unit test lag?
Spoke too soon of course, some tests fail strangely locally: /u01/eclipse/eclipse-mahout-workspace/mahout-svn/core/src/test/java/org/apache/mahout/ga/watchmaker/EvalMapperTest.java:[48,25] type parameter org.apache.hadoop.io.LongWritable is not within its bound /u01/eclipse/eclipse-mahout-workspace/mahout-svn/core/src/test/java/org/apache/mahout/ga/watchmaker/EvalMapperTest.java:[48,92] type parameter org.apache.hadoop.io.LongWritable is not within its bound Looks like this was discussed way back in http://issues.apache.org/jira/browse/MAHOUT-127, but for the life of me I can't figure out why I'm running into it now. On Sun, Jan 17, 2010 at 8:38 PM, Drew Farris wrote: > On Sun, Jan 17, 2010 at 2:55 PM, Sean Owen wrote: >> Am I right that running tests in 1 JVM instead of n JVMs helps >> mitigate this? because I just committed that change. >> > > I just updated to HEAD, and this seems to have fixed the problem. Unit > tests are completing in times in line with those reported by the tests > themselves. > > Since this was happening at class loading time, running all of the > tests in a single VM does mitigate this because fewer forks mean less > entropy drain and there is more time to collect entropy between forks. >
Re: Unit test lag?
There are already cases where code needs to control the seed (mostly to serialize/deserialize the exact state of an object). I don't think that's the issue per se? The issue is when an RNG lives beyond one test, and there are legitimate reasons that may be so. I don't see how a getTestRandom() method fixes something... I can't call this in my non-test code, and then tests can't control those RNGs. The non-test code can't make this decision, which is why it doesn't. I don't think this is the problem/solution, but rather having a way to globally reset all RNGs. On Mon, Jan 18, 2010 at 1:55 AM, Drew Farris wrote: > The potential issue I see is if any tests expected to run using a seed >>other< than the test seed. Now that we are no longer forking, calling > RandomUtils.useTestSeed() in test A will cause the test seed to be > used in B, C, D, E etc. In this case it makes sense to avoid > stateful static classes like RandomUtils, perhaps condensing this > down to RandomUtils.getTestRandom(). > > RandomUtils.getRandom() will reset the seed in any case, to a default > seed if useTestSeed() has ever been called, to something random if > useTestSeed() has never been called. >
Re: Unit test lag?
Ted, it depends on the test implementation itself. Generally, I believe the pattern that is followed is: RandomUtils.useTestSeed(); Random r = RandomUtils.getRandom(); The potential issue I see is if any tests expected to run using a seed >other< than the test seed. Now that we are no longer forking, calling RandomUtils.useTestSeed() in test A will cause the test seed to be used in B, C, D, E etc. In this case it makes sense to avoid stateful static classes like RandomUtils, perhaps condensing this down to RandomUtils.getTestRandom(). RandomUtils.getRandom() will reset the seed in any case, to a default seed if useTestSeed() has ever been called, to something random if useTestSeed() has never been called. On Sun, Jan 17, 2010 at 8:39 PM, Ted Dunning wrote: > Do the RandomUtils reset the seed for every test as desired? > > On Sun, Jan 17, 2010 at 5:38 PM, Drew Farris wrote: > >> On Sun, Jan 17, 2010 at 2:55 PM, Sean Owen wrote: >> > Am I right that running tests in 1 JVM instead of n JVMs helps >> > mitigate this? because I just committed that change. >> > >> >> I just updated to HEAD, and this seems to have fixed the problem. Unit >> tests are completing in times in line with those reported by the tests >> themselves. >> >> Since this was happening at class loading time, running all of the >> tests in a single VM does mitigate this because fewer forks mean less >> entropy drain and there is more time to collect entropy between forks. >> > > > > -- > Ted Dunning, CTO > DeepDyve >
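To make the leak concrete, here is a simplified, hypothetical stand-in for the pattern Drew describes (`StatefulRandomUtils` and its seed value are illustrative, not Mahout's actual `RandomUtils`). Once any test flips the static flag, every later test in the same JVM sees it; a stateless `getTestRandom()` has no flag to leak.

```java
import java.util.Random;

// Hypothetical, simplified stand-in for a RandomUtils-style helper.
final class StatefulRandomUtils {
  static final long STANDARD_SEED = 42L; // illustrative fixed seed
  private static boolean testSeed = false; // shared by every test in the JVM

  private StatefulRandomUtils() {}

  // Once test A calls this, the flag stays set for tests B, C, D, ...
  static void useTestSeed() {
    testSeed = true;
  }

  static Random getRandom() {
    return testSeed ? new Random(STANDARD_SEED) : new Random();
  }

  // Stateless alternative: nothing global to leak between tests.
  static Random getTestRandom() {
    return new Random(STANDARD_SEED);
  }

  // Exposed only so the leak can be observed in this sketch.
  static boolean isTestSeedEnabled() {
    return testSeed;
  }
}
```

Because the flag is never cleared, once all tests share one JVM the order in which they run decides which seed a given test gets.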
Re: Unit test lag?
I can imagine ways to nuke the problem as well. On Sun, Jan 17, 2010 at 5:46 PM, Sean Owen wrote: > I can imagine some semi-elaborate ways to actually explicitly manage > and address this with a wrapper class. > -- Ted Dunning, CTO DeepDyve
Re: Unit test lag?
Not quite, and you have a good point. Each instance of an RNG is seeded identically when testing. But if something holds an RNG open across tests, it won't be reset somehow. I could imagine that happening if there's a static RNG somewhere in a class, which would be reasonable. (Or if a test isn't quite using setUp() properly vis-a-vis RNGs, but that's fixable.) I can imagine some semi-elaborate ways to actually explicitly manage and address this with a wrapper class. On Mon, Jan 18, 2010 at 1:39 AM, Ted Dunning wrote: > Do the RandomUtils reset the seed for every test as desired?
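One possible shape for the wrapper class Sean alludes to, offered only as an illustration (`RandomRegistry` is a hypothetical name): every RNG it hands out is tracked through a weak reference, so a harness can reseed all live instances, including one parked in a static field, before each test.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical registry that can reseed every RNG it has handed out.
final class RandomRegistry {
  private static final List<WeakReference<Random>> ISSUED = new ArrayList<>();

  private RandomRegistry() {}

  // Hand out a new RNG and remember it weakly, so it can still be garbage collected.
  static synchronized Random newRandom(long seed) {
    Random r = new Random(seed);
    ISSUED.add(new WeakReference<>(r));
    return r;
  }

  // Reseed every still-reachable RNG (e.g. from a test's setUp()); returns how many were reset.
  static synchronized int resetAll(long seed) {
    int reset = 0;
    for (Iterator<WeakReference<Random>> it = ISSUED.iterator(); it.hasNext(); ) {
      Random r = it.next().get();
      if (r == null) {
        it.remove(); // the RNG was collected; drop the stale reference
      } else {
        r.setSeed(seed); // puts the RNG in the same state as new Random(seed)
        reset++;
      }
    }
    return reset;
  }
}
```

This only covers RNGs obtained through the registry; one created directly with `new Random()` is invisible to it, so the approach works only if all code goes through it.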
Re: Unit test lag?
Do the RandomUtils reset the seed for every test as desired? On Sun, Jan 17, 2010 at 5:38 PM, Drew Farris wrote: > On Sun, Jan 17, 2010 at 2:55 PM, Sean Owen wrote: > > Am I right that running tests in 1 JVM instead of n JVMs helps > > mitigate this? because I just committed that change. > > > > I just updated to HEAD, and this seems to have fixed the problem. Unit > tests are completing in times in line with those reported by the tests > themselves. > > Since this was happening at class loading time, running all of the > tests in a single VM does mitigate this because fewer forks mean less > entropy drain and there is more time to collect entropy between forks. > -- Ted Dunning, CTO DeepDyve
Re: Unit test lag?
And I think that we need to be robust in the face of either behavior. It should be fine to initialize once. On Sun, Jan 17, 2010 at 5:36 PM, Sean Owen wrote: > I think you are right in that JVMs are allowed to wait until first use > to load a class, but the one time I checked the Sun JVM it didn't work > that way. It actively loaded the class (which is also allowed). I > would bet dollars to donuts we'd find it doesn't wait. > -- Ted Dunning, CTO DeepDyve
Re: Unit test lag?
On Sun, Jan 17, 2010 at 2:55 PM, Sean Owen wrote: > Am I right that running tests in 1 JVM instead of n JVMs helps > mitigate this? because I just committed that change. > I just updated to HEAD, and this seems to have fixed the problem. Unit tests are completing in times in line with those reported by the tests themselves. Since this was happening at class loading time, running all of the tests in a single VM does mitigate this because fewer forks mean less entropy drain and there is more time to collect entropy between forks.
Re: Unit test lag?
I think you are right in that JVMs are allowed to wait until first use to load a class, but the one time I checked the Sun JVM it didn't work that way. It actively loaded the class (which is also allowed). I would bet dollars to donuts we'd find it doesn't wait. On Mon, Jan 18, 2010 at 1:22 AM, Benson Margulies wrote: > Sean, that's not how class loaders work AFAIK. The mere presence of an > import does not trigger the load. You have to touch it. > > HOWEVER, if I am wrong, I will (a) buy the beer, and (b) add the > reflective code to get rid of the import.
Re: Unit test lag?
We should have a beer some time anyway, and the beers we owe you for cleaning up Colt more than cancel any potential beer on this issue, so I will be happy to buy (Sean, you are included for similar reasons if we ever see each other). Does the difference here matter? If we have zero or one class load, we should be fine relative to bits consumed. The problem is n class loads. On Sun, Jan 17, 2010 at 5:22 PM, Benson Margulies wrote: > Sean, that's not how class loaders work AFAIK. The mere presence of an > import does not trigger the load. You have to touch it. > > HOWEVER, if I am wrong, I will (a) buy the beer, and (b) add the > reflective code to get rid of the import. > -- Ted Dunning, CTO DeepDyve
Re: Unit test lag?
No. We won't. The JDK RNG is fine for pretty much everything we do. I agree that we should use a better generator for production use, but for deterministic tests, there isn't an issue. And frankly, I try to use algorithms that are robust to the generator they use. Some applications are really good at exposing flaws and some are fine with anything better than ROT-13(n++). I think that all we have are the latter kind so far. On Sun, Jan 17, 2010 at 4:19 PM, Benson Margulies wrote: > So the question to me is whether we lose any test quality by using the JDK > RNG. -- Ted Dunning, CTO DeepDyve
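The property Ted relies on is easy to check: two `java.util.Random` instances created with the same seed yield identical sequences, and `java.util.Random` is a plain in-memory generator that never reads /dev/random. A minimal sketch (the helper name `SeededDraws` is illustrative):

```java
import java.util.Random;

// Illustrative helper: draw a fixed number of ints from a seeded JDK RNG.
final class SeededDraws {
  private SeededDraws() {}

  static int[] draw(long seed, int count) {
    Random random = new Random(seed); // pure in-memory generator, no entropy source involved
    int[] out = new int[count];
    for (int i = 0; i < count; i++) {
      out[i] = random.nextInt();
    }
    return out;
  }
}
```

Re-running a test with the same seed therefore replays exactly the same draws, which is all a deterministic test needs from its generator.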
Re: Unit test lag?
Sean, that's not how class loaders work AFAIK. The mere presence of an import does not trigger the load. You have to touch it. HOWEVER, if I am wrong, I will (a) buy the beer, and (b) add the reflective code to get rid of the import. On Sun, Jan 17, 2010 at 7:26 PM, Sean Owen wrote: > Nope, since it imports MersenneTwisterRNG, that class will be > initialized the moment RandomUtils is loaded. > > On Mon, Jan 18, 2010 at 12:19 AM, Benson Margulies > wrote: >> That would make a difference. If the code in RandomUtils never instantiates >> the Mersenne class, then its static blocks would never run. If >> necessary, the Mersenne class could be loaded explicitly, but I don't >> think we have to go that far. >> >> So the question to me is whether we lose any test quality by using the JDK >> RNG. >> >
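Benson's point can be shown in plain Java. An import is only a compile-time naming convenience, and under JLS §12.4.1 a class's static initializers run on first active use (calling a static method, creating an instance, reading a non-constant static field), not when the class is merely named. A JVM may load a class early, but running its static initializers is a separate step. The class names below are made up, and a boolean flag stands in for an expensive static block.

```java
// Illustrative demonstration of when static initializers run (JLS 12.4.1).
final class InitTracker {
  static boolean expensiveInitialized = false;

  private InitTracker() {}
}

final class ExpensiveStatics {
  // Stands in for a static block that would, say, read /dev/random.
  static {
    InitTracker.expensiveInitialized = true;
  }

  private ExpensiveStatics() {}

  static int answer() {
    return 42;
  }
}
```

Naming the type, for example through a class literal or a null-valued local of that type, leaves the flag unset; the first static method call flips it.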
Re: Unit test lag?
Nope, since it imports MersenneTwisterRNG, that class will be initialized the moment RandomUtils is loaded. On Mon, Jan 18, 2010 at 12:19 AM, Benson Margulies wrote: > That would make a difference. If the code in RandomUtils never instantiates > the Mersenne class, then its static blocks would never run. If > necessary, the Mersenne class could be loaded explicitly, but I don't > think we have to go that far. > > So the question to me is whether we lose any test quality by using the JDK > RNG. >
Re: Unit test lag?
That would make a difference. If the code in RandomUtils never instantiates the Mersenne class, then its static blocks would never run. If necessary, the Mersenne class could be loaded explicitly, but I don't think we have to go that far. So the question to me is whether we lose any test quality by using the JDK RNG. On Sun, Jan 17, 2010 at 7:07 PM, Sean Owen wrote: > It sounds like the slow code gets triggered at class-loading time, so > no I don't think this would make a difference. But with the change I > committed we should only have one class loader in play, I think. > > On Mon, Jan 18, 2010 at 12:00 AM, Benson Margulies > wrote: >> What if we used the plain old JDK rng when in test mode? >
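A sketch of Benson's suggestion combined with lazy construction: in test mode hand out a seeded `java.util.Random`, and only call the production RNG factory when test mode is off. `RandomFactory` is hypothetical, and the `Supplier` is an assumption standing in for whatever builds the production generator (for instance a `MersenneTwisterRNG`). In test mode that supplier is never invoked, so whatever seeding it would do never happens.

```java
import java.util.Random;
import java.util.function.Supplier;

// Hypothetical factory: cheap JDK RNG for tests, production RNG built only on demand.
final class RandomFactory {
  static final long TEST_SEED = 42L; // illustrative fixed seed

  private final boolean testMode;
  private final Supplier<? extends Random> productionRng;

  RandomFactory(boolean testMode, Supplier<? extends Random> productionRng) {
    this.testMode = testMode;
    this.productionRng = productionRng;
  }

  Random getRandom() {
    // In test mode the supplier is never called, so any expensive seeding
    // it would perform (e.g. reading /dev/random) is skipped entirely.
    return testMode ? new Random(TEST_SEED) : productionRng.get();
  }
}
```

Whether this avoids the cost in practice still depends on the production generator's class not being initialized some other way first.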
Re: Unit test lag?
It sounds like the slow code gets triggered at class-loading time, so no I don't think this would make a difference. But with the change I committed we should only have one class loader in play, I think. On Mon, Jan 18, 2010 at 12:00 AM, Benson Margulies wrote: > What if we used the plain old JDK rng when in test mode?
Re: Unit test lag?
What if we used the plain old JDK rng when in test mode? On Sun, Jan 17, 2010 at 3:16 PM, Olivier Grisel wrote: > 2010/1/17 Sean Owen : >> Am I right that running tests in 1 JVM instead of n JVMs helps >> mitigate this? because I just committed that change. > > I have the feeling it helps yes. I haven't timed the tests though. > > -- > Olivier > http://twitter.com/ogrisel - http://code.oliviergrisel.name >
Re: Unit test lag?
2010/1/17 Sean Owen : > Am I right that running tests in 1 JVM instead of n JVMs helps > mitigate this? because I just committed that change. I have the feeling it helps yes. I haven't timed the tests though. -- Olivier http://twitter.com/ogrisel - http://code.oliviergrisel.name
Re: Unit test lag?
This is a way of saying "I don't know". On Sun, Jan 17, 2010 at 12:02 PM, Ted Dunning wrote: > That might help if the random class is loaded only once. > > If the different tests each use a new class loader (seems unlikely) then > the static stuff will be executed multiply and the problem will be retained. > > > > On Sun, Jan 17, 2010 at 11:55 AM, Sean Owen wrote: > >> Am I right that running tests in 1 JVM instead of n JVMs helps >> mitigate this? because I just committed that change. >> >> On Sun, Jan 17, 2010 at 7:49 PM, Ted Dunning >> wrote: >> > It doesn't affect the random numbers being generated. >> > >> > But it does eat bits of entropy from /dev/random. That can then get >> starved >> > and block until more entropy is derived. Since the reading is done in a >> > static block instead of on construction, the cost can't be avoided. >> > >> > > > > -- > Ted Dunning, CTO > DeepDyve > > -- Ted Dunning, CTO DeepDyve
Re: Unit test lag?
That might help if the random class is loaded only once. If the different tests each use a new class loader (seems unlikely) then the static stuff will be executed multiply and the problem will be retained. On Sun, Jan 17, 2010 at 11:55 AM, Sean Owen wrote: > Am I right that running tests in 1 JVM instead of n JVMs helps > mitigate this? because I just committed that change. > > On Sun, Jan 17, 2010 at 7:49 PM, Ted Dunning > wrote: > > It doesn't affect the random numbers being generated. > > > > But it does eat bits of entropy from /dev/random. That can then get > starved > > and block until more entropy is derived. Since the reading is done in a > > static block instead of on construction, the cost can't be avoided. > > > -- Ted Dunning, CTO DeepDyve
Re: Unit test lag?
Am I right that running tests in 1 JVM instead of n JVMs helps mitigate this? because I just committed that change. On Sun, Jan 17, 2010 at 7:49 PM, Ted Dunning wrote: > It doesn't affect the random numbers being generated. > > But it does eat bits of entropy from /dev/random. That can then get starved > and block until more entropy is derived. Since the reading is done in a > static block instead of on construction, the cost can't be avoided. >
Re: Unit test lag?
It doesn't affect the random numbers being generated. But it does eat bits of entropy from /dev/random. That can then get starved and block until more entropy is derived. Since the reading is done in a static block instead of on construction, the cost can't be avoided. On Sun, Jan 17, 2010 at 4:31 AM, Sean Owen wrote: > But does that affect code which instantiates a MersenneTwisterRNG with > its own seed? > > On Sun, Jan 17, 2010 at 12:24 PM, Benson Margulies > wrote: > >> I don't know of any further issues with MersenneTwisterRNG though -- > >> what's the issue? Don't care what it does with /dev/random as long as > >> in test mode we are seeding it with the same seed, and that's what > >> > > > > Olivier and I found the Mersenne code touching the > > SecureRandomNumberGenerator, which goes and talks to /dev/random, all > > in static blocks before any seeds are used. > -- Ted Dunning, CTO DeepDyve
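Ted's distinction, a cost paid in a static block versus one paid on construction, is the classic use case for the initialization-on-demand holder idiom. The sketch below is illustrative only (all names are made up, and a counter stands in for the /dev/random read): touching the eager class at all pays the cost immediately, while the lazy class defers it until a seed is actually requested.

```java
// Counts how many times the "expensive" seed read has happened.
final class SeedReads {
  static int count = 0;

  private SeedReads() {}

  // Stand-in for reading entropy from /dev/random.
  static byte[] readSeed() {
    count++;
    return new byte[] {1, 2, 3, 4};
  }
}

// Eager: the read happens during class initialization, needed or not.
final class EagerSeedSource {
  private static final byte[] SEED = SeedReads.readSeed();

  private EagerSeedSource() {}

  static String name() {
    return "eager";
  }

  static byte[] seed() {
    return SEED.clone();
  }
}

// Lazy: the nested Holder (and so the read) is initialized only on the first seed() call.
final class LazySeedSource {
  private LazySeedSource() {}

  private static final class Holder {
    static final byte[] SEED = SeedReads.readSeed();
  }

  static String name() {
    return "lazy";
  }

  static byte[] seed() {
    return Holder.SEED.clone();
  }
}
```

The JVM guarantees `Holder` is initialized at most once and thread-safely, so the lazy version needs no explicit locking.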
Re: Unit test lag?
2010/1/17 Drew Farris : > Olivier, > > If you are still interested in trying to debug these, you could > configure the surefire-plugin to use the options for opening up a port > for remote debugging when it forks off the java process. > > See: > http://maven.apache.org/plugins/maven-surefire-plugin/examples/debugging.html > > The examples there will suspend the vm until you connect with the > debugger. If you don't know this already, mvn can be convinced to run > individual tests using the -Dtest=testname argument (sans package > name, e.g.: mvn test -Dtest=TransactionTreeTest) Thanks for the hint, I did not know about that one. However, I have added a log statement that dumps the stack trace in RandomUtils whenever useSeed is false. It will be less tedious than waiting for the breakpoints to fire in Eclipse. -- Olivier http://twitter.com/ogrisel - http://code.oliviergrisel.name
Re: Unit test lag?
Olivier, If you are still interested in trying to debug these, you could configure the surefire-plugin to use the options for opening up a port for remote debugging when it forks off the java process. See: http://maven.apache.org/plugins/maven-surefire-plugin/examples/debugging.html The examples there will suspend the vm until you connect with the debugger. If you don't know this already, mvn can be convinced to run individual tests using the -Dtest=testname argument (sans package name, e.g.: mvn test -Dtest=TransactionTreeTest) Hope this helps, Drew On Sun, Jan 17, 2010 at 8:49 AM, Olivier Grisel wrote: > OK, I have found three non-deterministic tests so far that actually > consume entropy by calling generateSeed: > > TransactionTreeTest > CacheTest > AverageAbsoluteDifferenceRecommenderEvaluatorTest > > But using eclipse is not really helpful since I am forced to set the > forkMode to "never" to make my debugger able to attach and then have > to manually introspect what's happening. I'll try again with a log > statement and setting the fork mode back to "always". > > -- > Olivier > http://twitter.com/ogrisel - http://code.oliviergrisel.name >
Re: Unit test lag?
On Sun, Jan 17, 2010 at 1:36 PM, Drew Farris wrote: > Using a fixed seed doesn't solve the problem due to the way > SecureRandomSeedGenerator is loaded by MersenneTwisterRNG OK, yeah, I understand now. I thought this thread was addressing the determinism issue, but you're talking about performance. My bad. That's why I was confused. Well, I'm keen to solve the determinism issue too; I'll try that.
Re: Unit test lag?
OK, I have found three non-deterministic tests so far that actually consume entropy by calling generateSeed: TransactionTreeTest CacheTest AverageAbsoluteDifferenceRecommenderEvaluatorTest But using eclipse is not really helpful since I am forced to set the forkMode to "never" to make my debugger able to attach and then have to manually introspect what's happening. I'll try again with a log statement and setting the fork mode back to "always". -- Olivier http://twitter.com/ogrisel - http://code.oliviergrisel.name
Re: Unit test lag?
The real problem I originally brought up was that the unit tests were horribly slow due to blocking on /dev/random. On Sun, Jan 17, 2010 at 8:21 AM, Sean Owen wrote: > I think I must be missing something -- > > We don't use SecureRandom directly, so what would these effects have > to do with slow unit tests in our project? SecureRandom is referenced from the uncommons-maths class SecureRandomSeedGenerator via a private static final field, so when SecureRandomSeedGenerator gets class loaded, we incur the penalty of the first SecureRandom constructor's read of /dev/random. Since we fork for each unit test, this happens rapidly and quickly consumes the available entropy on the system, leading to the blocking behavior we're seeing. Using a fixed seed doesn't solve the problem due to the way SecureRandomSeedGenerator is loaded by MersenneTwisterRNG. Eliminating forking from the unit tests will probably be acceptable because I believe that Olivier has shown that the read from /dev/random only happens once, either at SecureRandom class load time or upon the first call to its constructor. Drew
Re: Unit test lag?
I'm sorry, I really think I'm off on my own planet. What issue are you trying to solve? Performance, or deterministic tests? I'm concerned with the latter and still do not understand what this has to do with it. On Sun, Jan 17, 2010 at 1:31 PM, Olivier Grisel wrote: > 2010/1/17 Sean Owen : >> I think I must be missing something -- >> >> We don't use SecureRandom directly, so what would these effects have >> to do with slow unit tests in our project? > > Classloading MersenneTwisterRNG in turn class loads > DefaultSeedGenerator which has the following static block: > > private static final SeedGenerator[] GENERATORS = new SeedGenerator[] > { > new DevRandomSeedGenerator(), > new RandomDotOrgSeedGenerator(), > new SecureRandomSeedGenerator() > }; > > These in turn rely upon an instance of java.security.SecureRandom for each fork. > > I am currently tracing a complete maven surefire run with eclipse to > see if we actually call generateSeed in the tests. So far this is the > case only in TransactionTreeTest, which needs a fix to use the test > seed. > > -- > Olivier > http://twitter.com/ogrisel - http://code.oliviergrisel.name >
Re: Unit test lag?
2010/1/17 Sean Owen : > I think I must be missing something -- > > We don't use SecureRandom directly, so what would these effects have > to do with slow unit tests in our project? Classloading MersenneTwisterRNG in turn class loads DefaultSeedGenerator which has the following static block: private static final SeedGenerator[] GENERATORS = new SeedGenerator[] { new DevRandomSeedGenerator(), new RandomDotOrgSeedGenerator(), new SecureRandomSeedGenerator() }; These in turn rely upon an instance of java.security.SecureRandom for each fork. I am currently tracing a complete maven surefire run with eclipse to see if we actually call generateSeed in the tests. So far this is the case only in TransactionTreeTest, which needs a fix to use the test seed. -- Olivier http://twitter.com/ogrisel - http://code.oliviergrisel.name
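In the snippet above, all three generators are built in a static array initializer, so each constructor runs as soon as `DefaultSeedGenerator` is initialized, even when the first generator would have sufficed. As a hypothetical alternative (`SeedSource` and `LazySeedChain` are illustrative names, not the uncommons-maths API), a chain of suppliers constructs each source only after the previous one has failed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical seed source abstraction, for illustration only.
interface SeedSource {
  byte[] generateSeed(int length) throws Exception;
}

// Tries each source in order, constructing a source only when it is actually needed.
final class LazySeedChain {
  private final List<Supplier<SeedSource>> sources;

  LazySeedChain(List<Supplier<SeedSource>> sources) {
    this.sources = new ArrayList<>(sources);
  }

  byte[] generateSeed(int length) {
    for (Supplier<SeedSource> supplier : sources) {
      try {
        return supplier.get().generateSeed(length);
      } catch (Exception e) {
        // This source could not be built or failed to produce a seed; try the next one.
      }
    }
    throw new IllegalStateException("no seed source succeeded");
  }
}
```

The fallback order is unchanged; the difference is only that a later source's constructor, and any static state it drags in, is never touched unless an earlier source fails.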
Re: Unit test lag?
I think I must be missing something -- We don't use SecureRandom directly, so what would these effects have to do with slow unit tests in our project? And also am I right that, if we use our own seed in MersenneTwisterRNG, we still get deterministic behavior? I'm going to change all our tests to make sure we use a fixed seed, and I'm still not clear why this wouldn't address the randomness issue? I don't know about performance, why this would have a bearing or why it's recently slowed. Is that the issue you guys are looking at? On Sun, Jan 17, 2010 at 1:11 PM, Olivier Grisel wrote: > 2010/1/17 Benson Margulies : >> On Sun, Jan 17, 2010 at 7:31 AM, Sean Owen wrote: >>> But does that affect code which instantiates a MersenneTwisterRNG with >>> its own seed? >> >> That's what it looked like to me, but I may have been depending on >> Olivier's analysis. > > I confirm that the first call to the java.security.SecureRandom > constructor (which is in the static part of uncommons math init) makes > two system calls on /dev/random: > > $ strace -o /tmp/clj.strace.out -F -f java $JAVA_OPTS \ > -cp .:..:/usr/share/java/jline.jar:$LIBS \ > jline.ConsoleRunner clojure.lang.Repl > > user=> (java.security.SecureRandom.) > # > user=> (java.security.SecureRandom.) > # > > while in a separate console: > > $ tail -f /tmp/clj.strace.out | grep "/dev/random" > 18354 stat64("/dev/random", {st_mode=S_IFCHR|0666, st_rdev=makedev(1, > 8), ...}) = 0 > 18354 open("/dev/random", O_RDONLY|O_LARGEFILE) = 19 > > Further calls to the constructor or to generateSeed reuse the same > file descriptor (no further calls to open on /dev/random). > > I can instantiate many (10) SecureRandom instances without > blocking the process, while calling generateSeed actually consumes > entropy as expected and blocks the app after a couple of hundred > bytes. 
> > In our case it is possible that only the first call to the > SecureRandom constructor in each forked test is enough to slow > them all down even if we don't call generateSeed. > > -- > Olivier > http://twitter.com/ogrisel - http://code.oliviergrisel.name >
Re: Unit test lag?
2010/1/17 Benson Margulies : > On Sun, Jan 17, 2010 at 7:31 AM, Sean Owen wrote: >> But does that affect code which instantiates a MersenneTwisterRNG with >> its own seed? > > That's what it looked like to me, but I may have been depending on > Olivier's analysis. I confirm that the first call to the java.security.SecureRandom constructor (which is in the static part of uncommons math init) makes two system calls on /dev/random: $ strace -o /tmp/clj.strace.out -F -f java $JAVA_OPTS \ -cp .:..:/usr/share/java/jline.jar:$LIBS \ jline.ConsoleRunner clojure.lang.Repl user=> (java.security.SecureRandom.) # user=> (java.security.SecureRandom.) # while in a separate console: $ tail -f /tmp/clj.strace.out | grep "/dev/random" 18354 stat64("/dev/random", {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 8), ...}) = 0 18354 open("/dev/random", O_RDONLY|O_LARGEFILE) = 19 Further calls to the constructor or to generateSeed reuse the same file descriptor (no further calls to open on /dev/random). I can instantiate many (10) SecureRandom instances without blocking the process, while calling generateSeed actually consumes entropy as expected and blocks the app after a couple of hundred bytes. In our case it is possible that only the first call to the SecureRandom constructor in each forked test is enough to slow them all down even if we don't call generateSeed. -- Olivier http://twitter.com/ogrisel - http://code.oliviergrisel.name
Re: Unit test lag?
On Sun, Jan 17, 2010 at 7:31 AM, Sean Owen wrote: > But does that affect code which instantiates a MersenneTwisterRNG with > its own seed? That's what it looked like to me, but I may have been depending on Olivier's analysis. > > On Sun, Jan 17, 2010 at 12:24 PM, Benson Margulies > wrote: >>> I don't know of any further issues with MersenneTwisterRNG though -- >>> what's the issue? Don't care what it does with /dev/random as long as >>> in test mode we are seeding it with the same seed, and that's what >>> >> >> Olivier and I found the Mersenne code touching the >> SecureRandomNumberGenerator, which goes and talks to /dev/random, all >> in static blocks before any seeds are used. >> >
Re: Unit test lag?
2010/1/17 Benson Margulies : >> I don't know of any further issues with MersenneTwisterRNG though -- >> what's the issue? Don't care what it does with /dev/random as long as >> in test mode we are seeding it with the same seed, and that's what >> > > Olivier and I found the Mersenne code touching the > SecureRandomNumberGenerator, which goes and talks to /dev/random, all > in static blocks before any seeds are used. > I am not 100% sure that the SecureRandom() constructor that is called in the static part of the uncommons math package is actually making any blocking call to /dev/random on Linux. I would have thought that this is only the case for the generateSeed method, which the MersenneTwisterRNG constructor calls when RandomUtils.getRandom() is used throughout Mahout components without RandomUtils.useTestSeed() having been called first. Maybe this is not the case. I will try to connect the Eclipse debugger to investigate further. -- Olivier http://twitter.com/ogrisel - http://code.oliviergrisel.name
Re: Unit test lag?
But does that affect code which instantiates a MersenneTwisterRNG with its own seed? On Sun, Jan 17, 2010 at 12:24 PM, Benson Margulies wrote: >> I don't know of any further issues with MersenneTwisterRNG though -- >> what's the issue? Don't care what it does with /dev/random as long as >> in test mode we are seeding it with the same seed, and that's what >> > > Olivier and I found the Mersenne code touching the > SecureRandomNumberGenerator, which goes and talks to /dev/random, all > in static blocks before any seeds are used. >
Re: Unit test lag?
> I don't know of any further issues with MersenneTwisterRNG though -- > what's the issue? Don't care what it does with /dev/random as long as > in test mode we are seeding it with the same seed, and that's what > Olivier and I found the Mersenne code touching the SecureRandomNumberGenerator, which goes and talks to /dev/random, all in static blocks before any seeds are used.
Re: Unit test lag?
Not sure what's going on or why that revision would have anything to do with the slowdown... the only thing of substance it did was actually let the SamplingIterator test run, but it doesn't take long. I agree with not forking a JVM per test, so I will make that change. Also, yes, we need tests to be deterministic. This is the theory behind why all code should obtain a Random from RandomUtils, and all tests should configure RandomUtils to use a fixed seed in setUp(). This isn't 100% true. Mind if I indeed finally fix this? I don't know of any further issues with MersenneTwisterRNG though -- what's the issue? Don't care what it does with /dev/random as long as in test mode we are seeding it with the same seed, and that's what RandomUtils does. On Sun, Jan 17, 2010 at 7:29 AM, deneche abdelhakim wrote: > Removing the maven repository does not solve the problem, nor does a > fresh checkout of the trunk. > > But older revisions don't show any slowdown!!! I tried the following > revisions: > > Those old revisions seem OK: > > r896946 | srowen | 2010-01-07 19:02:41 +0100 (Thu, 07 Jan 2010) | 1 line > MAHOUT-238 > > r897134 | robinanil | 2010-01-08 09:23:22 +0100 (Fri, 08 Jan 2010) | 1 line > MAHOUT-221 Missed out two files while checking in FP-Bonsai > > r897405 | adeneche | 2010-01-09 11:02:49 +0100 (Sat, 09 Jan 2010) | 1 line > MAHOUT-216 > > The slowdowns start at this revision !!! > > r897440 | srowen | 2010-01-09 13:53:25 +0100 (Sat, 09 Jan 2010) | 1 line > Code style adjustments; enabled/fixed TestSamplingIterator >
Re: Unit test lag?
Removing the maven repository does not solve the problem, nor does a fresh checkout of the trunk. But older revisions don't show any slowdown!!! I tried the following revisions: Those old revisions seem OK: r896946 | srowen | 2010-01-07 19:02:41 +0100 (Thu, 07 Jan 2010) | 1 line MAHOUT-238 r897134 | robinanil | 2010-01-08 09:23:22 +0100 (Fri, 08 Jan 2010) | 1 line MAHOUT-221 Missed out two files while checking in FP-Bonsai r897405 | adeneche | 2010-01-09 11:02:49 +0100 (Sat, 09 Jan 2010) | 1 line MAHOUT-216 >>> The slowdowns start at this revision !!! r897440 | srowen | 2010-01-09 13:53:25 +0100 (Sat, 09 Jan 2010) | 1 line Code style adjustments; enabled/fixed TestSamplingIterator On Sun, Jan 17, 2010 at 5:47 AM, deneche abdelhakim wrote: > I'm getting similar slowdowns with my VirtualBox Ubuntu 9.04. > > I suspect that the problem is not -only- caused by RandomUtils because: > > 1. I'm familiar with MersenneTwisterRNG slowdowns (I use it a lot) but > the test time used to be reported accurately by maven. Now maven > reports that a test took less than a second but it actually took a lot > more! > > 2. Most of my tests actually call RandomUtils.useTestSeed() in setup() > (InMemInputSplitTest included) but the tests still take a lot of time, > and again it's not reported accurately by maven. > > 3. I generally launch a 'mvn clean install' every Thursday. I never > got these slowdowns until last Thursday (did we change anything that > could have caused these slowdowns?) > > On Sun, Jan 17, 2010 at 12:33 AM, Benson Margulies > wrote: >>> Unit tests should generally be using a fixed seed and not need to load a >>> secure seed from dev/random. I would say that RandomUtils is probably the >>> problem here. The secure seed should be loaded lazily only if the test seed >>> is not in use. >> >> The problem, as I see it, is that the uncommons-math package starts >> initializing a random seed as soon as you touch it, whether you need >> it or not.
RandomUtils can only avoid this by avoiding uncommons-math >> in unit test mode. >> >>> >>> >>> >>> -- >>> Ted Dunning, CTO >>> DeepDyve >>> >> >
Re: Unit test lag?
I'm getting similar slowdowns with my VirtualBox Ubuntu 9.04. I suspect that the problem is not -only- caused by RandomUtils because: 1. I'm familiar with MersenneTwisterRNG slowdowns (I use it a lot) but the test time used to be reported accurately by maven. Now maven reports that a test took less than a second but it actually took a lot more! 2. Most of my tests actually call RandomUtils.useTestSeed() in setup() (InMemInputSplitTest included) but the tests still take a lot of time, and again it's not reported accurately by maven. 3. I generally launch a 'mvn clean install' every Thursday. I never got these slowdowns until last Thursday (did we change anything that could have caused these slowdowns?) On Sun, Jan 17, 2010 at 12:33 AM, Benson Margulies wrote: >>> >> Unit tests should generally be using a fixed seed and not need to load a >> secure seed from dev/random. I would say that RandomUtils is probably the >> problem here. The secure seed should be loaded lazily only if the test seed >> is not in use. > > The problem, as I see it, is that the uncommons-math package starts > initializing a random seed as soon as you touch it, whether you need > it or not. RandomUtils can only avoid this by avoiding uncommons-math > in unit test mode. > >> >> >> >> -- >> Ted Dunning, CTO >> DeepDyve >> >
Re: Unit test lag?
> Unit tests should generally be using a fixed seed and not need to load a
> secure seed from /dev/random. I would say that RandomUtils is probably the
> problem here. The secure seed should be loaded lazily only if the test seed
> is not in use.

The problem, as I see it, is that the uncommons-math package starts initializing a random seed as soon as you touch it, whether you need it or not. RandomUtils can only avoid this by avoiding uncommons-math in unit test mode.

> --
> Ted Dunning, CTO
> DeepDyve
Re: Unit test lag?
On Sat, Jan 16, 2010 at 1:40 PM, Drew Farris wrote:

> Mahout does per-test forking, which means we're forking off a new JVM
> for each unit test execution; this adds overhead to tests that take
> 0.2s to complete. Is per-test forking strictly needed?

It shouldn't be. I would count it a bug if it were.

> ... wall time 30s (!) or so. ... attempting to read from /dev/random.

Unit tests should generally be using a fixed seed and not need to load a secure seed from /dev/random. I would say that RandomUtils is probably the problem here. The secure seed should be loaded lazily only if the test seed is not in use.

--
Ted Dunning, CTO
DeepDyve
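One way to read "loaded lazily only if the test seed is not in use" is the initialization-on-demand holder idiom: the secure seed source lives in a nested class that the JVM initializes only when its field is first read. A minimal sketch, assuming a RandomUtils-like helper; the class name LazySeedRandomUtils, the seed value 42 and the use of java.util.Random in place of MersenneTwisterRNG are all hypothetical choices for illustration:

```java
import java.security.SecureRandom;
import java.util.Random;

// Hypothetical RandomUtils-like helper: the secure seed source is created
// only when a non-test Random is actually requested.
final class LazySeedRandomUtils {
  // Hypothetical fixed test seed; Mahout's real STANDARD_SEED may differ.
  static final long STANDARD_SEED = 42L;

  private static volatile boolean testSeed;
  private static volatile boolean secureSeederCreated;

  private LazySeedRandomUtils() {}

  static void useTestSeed() {
    testSeed = true;
  }

  static boolean isSecureSeederCreated() {
    return secureSeederCreated;
  }

  // Initialization-on-demand holder: the JVM initializes this nested class,
  // and therefore constructs the SecureRandom, only when SEEDER is first read.
  private static final class SecureSeedHolder {
    static final SecureRandom SEEDER = new SecureRandom();

    static {
      secureSeederCreated = true;
    }
  }

  static Random getRandom() {
    if (testSeed) {
      // Test mode: fixed seed, SecureSeedHolder is never touched.
      return new Random(STANDARD_SEED);
    }
    return new Random(SecureSeedHolder.SEEDER.nextLong());
  }
}
```

Usage: call LazySeedRandomUtils.useTestSeed() in test setup; every later getRandom() returns a generator with the same sequence, and isSecureSeederCreated() stays false because the holder class is never initialized.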
Re: Unit test lag?
Some tests are probably not calling:

  RandomUtils.useTestSeed();

in a setUp() or static initializer. Maybe a MahoutTestCase base class with a default static initializer that calls it would do.

Otherwise, I confirm that setting forkMode to "once" in maven/pom.xml solves the issue (and all tests pass).

--
Olivier
http://twitter.com/ogrisel - http://code.oliviergrisel.name
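A sketch of the suggested base class, written without JUnit so it compiles on its own. StubRandomUtils is a hypothetical stand-in for Mahout's RandomUtils reduced to the calls discussed here, and the seed value is assumed. In a real tree, MahoutTestCase would presumably extend JUnit's TestCase. The point is the Java rule that a superclass is initialized before its subclass, so every test class inherits the useTestSeed() call without writing it:

```java
import java.util.Random;

// Hypothetical stand-in for Mahout's RandomUtils (not the real class).
final class StubRandomUtils {
  private static final long STANDARD_SEED = 42L; // assumed value, illustration only
  private static boolean testSeed;

  private StubRandomUtils() {}

  static void useTestSeed() {
    testSeed = true;
  }

  static boolean isTestSeed() {
    return testSeed;
  }

  static Random getRandom() {
    return testSeed ? new Random(STANDARD_SEED) : new Random();
  }
}

// Sketch of the proposed base class: the static initializer runs once, when
// the class is initialized, which happens before any subclass is initialized.
abstract class MahoutTestCase {
  static {
    StubRandomUtils.useTestSeed();
  }
}

// Example test class; it never calls useTestSeed() itself.
final class ExampleTest extends MahoutTestCase {
  long firstDraw() {
    return StubRandomUtils.getRandom().nextLong();
  }
}
```

Instantiating ExampleTest forces MahoutTestCase to initialize first, so the fixed seed is already in place for the first draw.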
Re: Unit test lag?
Oh, I see. We have to give up on the MersenneTwisterRNG in tests and just use the JRE. Is that OK?

On Sat, Jan 16, 2010 at 5:44 PM, Olivier Grisel wrote:
> 2010/1/16 Drew Farris :
>> On Sat, Jan 16, 2010 at 4:42 PM, Benson Margulies wrote:
>>>> ... Running through strace showed that something was attempting to read
>>>> from /dev/random. Sometimes it ran fine, but at least 25-30% of the time it
>>>> ended up blocking until the entropy pool is refilled. To test I moved
>>>> /dev/random, and created a link from /dev/urandom to /dev/random (the
>>>> former doesn't block, but isn't cryptographically secure). It looks as if
>>>> this could be related to the loading of the SecureRandomSeedGenerator class.
>>>
>>> Why not use a fixed random seed for unit tests? That would make them
>>> more repeatable and avoid this problem, no?
>>
>> It appears we are, in RandomUtils:
>>
>>   public static Random getRandom() {
>>     return testSeed ? new MersenneTwisterRNG(STANDARD_SEED) : new MersenneTwisterRNG();
>>   }
>>
>> But something somewhere is forcing SecureRandomSeedGenerator to get
>> loaded by the classloader, which in turn does a 'new SecureRandom()' in
>> a private static final field assignment. Trying to track down what is
>> causing the generator to get loaded in the first place.
>>
>> But something is forcing the SecureRandomSeedGenerator class to get
>> loaded, which I suspect
>
> The MersenneTwisterRNG no-argument constructor calls:
>
>   this(DefaultSeedGenerator.getInstance().generateSeed(SEED_SIZE_BYTES));
>
> Which in turn triggers this static field initialization in the
> definition of the class DefaultSeedGenerator:
>
>   private static final SeedGenerator[] GENERATORS = new SeedGenerator[] {
>     new DevRandomSeedGenerator(),
>     new RandomDotOrgSeedGenerator(),
>     new SecureRandomSeedGenerator()
>   };
>
> Unless per-test forking is disabled, I don't see how to prevent
> MersenneTwisterRNG from indirectly fetching entropy from /dev/random /
> SecureRandom.
> --
> Olivier
> http://twitter.com/ogrisel - http://code.oliviergrisel.name
Re: Unit test lag?
I see a way, but it involves loading this class explicitly with reflection. I'll make a patch.
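The patch itself is not in this thread, so the following is only one guess at what "loading this class explicitly with reflection" could look like: if the production RNG class is named only as a string and instantiated via Class.forName, the test path never references it, and the JVM never loads or initializes it there. The class name ReflectiveRandoms and the seed 42 are hypothetical; the string org.uncommons.maths.random.MersenneTwisterRNG is the Uncommons Maths class discussed above:

```java
import java.util.Random;

// Hypothetical factory: the production RNG is looked up by name, so the
// test branch has no compile-time reference to it.
final class ReflectiveRandoms {
  // Uncommons Maths class name, used only as a string here.
  static final String MERSENNE_TWISTER = "org.uncommons.maths.random.MersenneTwisterRNG";
  // Hypothetical fixed seed for tests.
  static final long STANDARD_SEED = 42L;

  private ReflectiveRandoms() {}

  static Random create(boolean testSeed, String rngClassName) {
    if (testSeed) {
      // Plain JRE generator with a fixed seed: no seed generator involved.
      return new Random(STANDARD_SEED);
    }
    try {
      // Class.forName loads and initializes the class only on this branch.
      return Class.forName(rngClassName)
          .asSubclass(Random.class)
          .getDeclaredConstructor()
          .newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate " + rngClassName, e);
    }
  }
}
```

In production one would call ReflectiveRandoms.create(false, ReflectiveRandoms.MERSENNE_TWISTER) with uncommons-math on the classpath; any java.util.Random subclass with a no-argument constructor works the same way.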
Re: Unit test lag?
2010/1/16 Drew Farris :
> On Sat, Jan 16, 2010 at 4:42 PM, Benson Margulies wrote:
>>> ... Running through strace showed
>>> that something was attempting to read from /dev/random. Sometimes
>>> it ran fine, but at least 25-30% of the time it ended up blocking until the
>>> entropy pool is refilled. To test I moved /dev/random, and created a
>>> link from /dev/urandom to /dev/random (the former doesn't block, but
>>> isn't cryptographically secure). It looks as if this could be related
>>> to the loading of the SecureRandomSeedGenerator class.
>>
>> Why not use a fixed random seed for unit tests? That would make them
>> more repeatable and avoid this problem, no?
>
> It appears we are, in RandomUtils:
>
>   public static Random getRandom() {
>     return testSeed ? new MersenneTwisterRNG(STANDARD_SEED) : new MersenneTwisterRNG();
>   }
>
> But something somewhere is forcing SecureRandomSeedGenerator to get
> loaded by the classloader, which in turn does a 'new SecureRandom()' in
> a private static final field assignment. Trying to track down what is
> causing the generator to get loaded in the first place.
>
> But something is forcing the SecureRandomSeedGenerator class to get
> loaded, which I suspect

The MersenneTwisterRNG no-argument constructor calls:

  this(DefaultSeedGenerator.getInstance().generateSeed(SEED_SIZE_BYTES));

Which in turn triggers this static field initialization in the definition of the class DefaultSeedGenerator:

  private static final SeedGenerator[] GENERATORS = new SeedGenerator[] {
    new DevRandomSeedGenerator(),
    new RandomDotOrgSeedGenerator(),
    new SecureRandomSeedGenerator()
  };

Unless per-test forking is disabled, I don't see how to prevent MersenneTwisterRNG from indirectly fetching entropy from /dev/random / SecureRandom.

--
Olivier
http://twitter.com/ogrisel - http://code.oliviergrisel.name
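To make the chain in the GENERATORS array concrete: assuming DefaultSeedGenerator tries its generators in array order and returns the first seed it obtains (the array suggests this; the Uncommons Maths source is the authority), /dev/random is consulted first on every fresh JVM, which is where the blocking would come from. A simplified model with hypothetical names (SeedSource, SeedUnavailableException, FallbackSeedGenerator), not the library's code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a seed source, standing in for SeedGenerator.
interface SeedSource {
  byte[] generateSeed(int length) throws SeedUnavailableException;
}

final class SeedUnavailableException extends Exception {
  SeedUnavailableException(String message) {
    super(message);
  }
}

// Tries each source in order; the first one that succeeds wins.
final class FallbackSeedGenerator {
  private final List<SeedSource> sources;

  FallbackSeedGenerator(List<SeedSource> sources) {
    this.sources = new ArrayList<>(sources);
  }

  byte[] generateSeed(int length) {
    for (SeedSource source : sources) {
      try {
        // A blocking source (e.g. one reading /dev/random) stalls right here.
        return source.generateSeed(length);
      } catch (SeedUnavailableException e) {
        // Fall through to the next source in the list.
      }
    }
    throw new IllegalStateException("No seed source available");
  }
}
```

Because the first source is tried before any fallback, a slow first source delays every no-argument MersenneTwisterRNG construction, while a fixed seed skips the whole chain.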
Re: Unit test lag?
This is going to be a lot of fun. That class is in uncommons-math, and the connection to it from Mahout is hardly obvious.

On Sat, Jan 16, 2010 at 5:34 PM, Benson Margulies wrote:
>>> It looks as if this could be related to the loading of the SecureRandomSeedGenerator class.
>
> Let's fix that class to defer until there's a good reason to make a seed.
Re: Unit test lag?
>> It looks as if this could be related
>> to the loading of the SecureRandomSeedGenerator class.

Let's fix that class to defer until there's a good reason to make a seed.
Re: Unit test lag?
2010/1/16 Benson Margulies :
>> ... Running through strace showed
>> that something was attempting to read from /dev/random. Sometimes
>> it ran fine, but at least 25-30% of the time it ended up blocking until the
>> entropy pool is refilled. To test I moved /dev/random, and created a
>> link from /dev/urandom to /dev/random (the former doesn't block, but
>> isn't cryptographically secure). It looks as if this could be related
>> to the loading of the SecureRandomSeedGenerator class.

I also experience the same slowdown Drew describes, on Ubuntu machines too.

> Why not use a fixed random seed for unit tests? That would make them
> more repeatable and avoid this problem, no?

+1 for the fixed seed (42 is my favorite seed).

--
Olivier
http://twitter.com/ogrisel - http://code.oliviergrisel.name
Re: Unit test lag?
On Sat, Jan 16, 2010 at 4:42 PM, Benson Margulies wrote:
>> ... Running through strace showed
>> that something was attempting to read from /dev/random. Sometimes
>> it ran fine, but at least 25-30% of the time it ended up blocking until the
>> entropy pool is refilled. To test I moved /dev/random, and created a
>> link from /dev/urandom to /dev/random (the former doesn't block, but
>> isn't cryptographically secure). It looks as if this could be related
>> to the loading of the SecureRandomSeedGenerator class.
>
> Why not use a fixed random seed for unit tests? That would make them
> more repeatable and avoid this problem, no?

It appears we are, in RandomUtils:

  public static Random getRandom() {
    return testSeed ? new MersenneTwisterRNG(STANDARD_SEED) : new MersenneTwisterRNG();
  }

But something somewhere is forcing SecureRandomSeedGenerator to get loaded by the classloader, which in turn does a 'new SecureRandom()' in a private static final field assignment. I'm trying to track down what is causing the generator to get loaded in the first place.

But something is forcing the SecureRandomSeedGenerator class to get loaded, which I suspect
Re: Unit test lag?
> ... Running through strace showed
> that something was attempting to read from /dev/random. Sometimes
> it ran fine, but at least 25-30% of the time it ended up blocking until the
> entropy pool is refilled. To test I moved /dev/random, and created a
> link from /dev/urandom to /dev/random (the former doesn't block, but
> isn't cryptographically secure). It looks as if this could be related
> to the loading of the SecureRandomSeedGenerator class.

Why not use a fixed random seed for unit tests? That would make them more repeatable and avoid this problem, no?
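The repeatability argument for a fixed seed can be seen with the JRE generator alone: two generators built from the same seed produce the same sequence on every run, so a failing test replays identically. A minimal sketch using java.util.Random rather than MersenneTwisterRNG so it needs no extra jars; the class name FixedSeedDemo and the seed 42 are just for illustration:

```java
import java.util.Arrays;
import java.util.Random;

// Illustration only: same seed, same sequence.
final class FixedSeedDemo {
  private FixedSeedDemo() {}

  // Draws `count` integers in [0, 100) from a generator seeded with `seed`.
  static int[] draws(long seed, int count) {
    Random random = new Random(seed);
    int[] out = new int[count];
    for (int i = 0; i < count; i++) {
      out[i] = random.nextInt(100);
    }
    return out;
  }

  // True when two independently built generators agree draw for draw.
  static boolean reproducible(long seed, int count) {
    return Arrays.equals(draws(seed, count), draws(seed, count));
  }
}
```

A seeded constructor also never consults a seed generator, so it cannot block on /dev/random.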
Unit test lag?
Recently I've noticed that Mahout's unit tests take a considerable amount of time to run, generally longer than what is reported in the individual test output. I looked into why this was the case and found a couple of things:

Mahout does per-test forking, which means we're forking off a new JVM for each unit test execution; this adds overhead to tests that take 0.2s to complete. Is per-test forking strictly needed?

I captured the command line used to execute one of the forked tests (InMemInputSplitTest) by running mvn -X, and executed it from the shell repeatedly using time to see what was going on. In one of every few invocations, the test in question would report completion in 3s, but time reported a wall time of 30s (!) or so. Running it through strace showed that something was attempting to read from /dev/random. Sometimes it ran fine, but at least 25-30% of the time it ended up blocking until the entropy pool was refilled. To test this, I moved /dev/random and created a link from /dev/urandom to /dev/random (the former doesn't block, but isn't cryptographically secure). It looks as if this could be related to the loading of the SecureRandomSeedGenerator class.

I'm running on Ubuntu 9.04, kernel 2.6.28-17-server with the latest patches. Is anyone else experiencing similar slowness?

Drew