Paul Rubin <http://[EMAIL PROTECTED]> wrote
(re <http://www.sailmaker.co.uk/newfiles/dice.py>) > I looked at this for a minute Thanks for taking the time to look over it; I greatly appreciate it! > and found it carefully written but somewhat hard to understand. Obviously, as the author, I have a blind spot about others understanding it, but I always strive for my code to be self-commenting and understandable at first sight, so I would genuinely welcome feedback on which parts are difficult to understand and need more commenting. Perhaps it is insufficient commenting of the implementations of the various subclasses of class Dice? I admit /they/ get a but funky in places. > I think it's maybe more OOP-y than > Python programs usually are; perhaps it's Java-like. That's interesting: while my background is certainly more OO than functional, Java is the language I use least. Certainly I relish the freedom to mix OO, functional and procedural paradigms that you get with Python, so if there are ways in which that mixture can improve dice.py, please enlighten me! On the other hand, is "being more OOP-y than Python programs usually are" a bad thing in itself, if the object analysis is cleanly designed? > There are some > efficiency issues that would be serious if the code were being used on > large problems, but probably no big deal for "3D6" or the like. In > particular, you call histogram.numSamples() in a bunch of places and > each call results in scanning through the dict. A good point which I did consider, however after practical tests I ruled in favour of code simplicity for the case in hand, while being careful to leave the Histogram class open to optimisations such as you describe via subclassing. An earlier draft of the Histogram class did indeed have each method do a single pass through the dict, accumulating what is now numSamples(), sigma() and sigmaSquared() in local variables. But in practice I found that for my problem domain, even a 'large' problem would have relatively few histogram buckets, and computing the summary statistics for the histogram took a /miniscule/ amount of time compared to the time taken by generating the data. On that basis I refactored the Histogram class to follow the "once and only once" coding principle, resulting in a much clearer and more concise Histogram class, with very simple and natural code that any high-school statisician can immediately see is correct. I found no measurable perfomance hit. > You might want to cache this value in the histogram instance. On the same basis as above, I decided against caching any results in the Histogram class, preferring simplicity of code to the additional logic that would be required to manage cache invalidation upon arrival of new samples. I've learned the hard way that caching, while a dramatic optimisation in the right place, introduces a lot of complexity and pitfalls (even more so in multi-threaded scenarios), so these days I only do it when performance analysis reveals the need. Of course, for histograms with huge numbers of buckets, there may well be mileage in subclassing the Histogram class to implement the above optimisations, and I think the class is sufficiently cleanly designed to facilitate that. > Making histogram a subclass of dict also seems a little bit peculiar. Why? Genuine question. Keep in mind that I might want to re-use the Histogram class for cases where the buckets may be floating point, so I can't just use a vector. I wanted a mapping of bucket->frequency with an easy way to add a new sample by incrementing the frequency, so dict seemed a natural choice. I am of course open to better ideas! > You might find it more in the functional programming spirit to use > gencomps rather than listcomps in a few places: > > return sum([freq * val for val, freq in self.items()]) > becomes > return sum((freq * val) for val, freq in self.iteritems()) Nice: thanks for making me aware of gencomps! I agree that they're better, but when at home I'm on Mac OS X and so stuck with Python 2.3 unless I engage in the necessary shenanigans to slap Python 2.4 onto Mac OS X 10.4.7. Yes, I know how to do it, I've done it before, but it's a PITA and I don't want to deal with it any more. Moreover I want my code to run on vanilla installations of Mac OS X 10.4.x, so I just put up with it and code to Python 2.3. I'm on Apple's developer program, so if anyone has a Radar bug number I can cite while requesting they pull their socks up and ship Python 2.4 with the OS, just let me know and I'll cite it to Apple. Richard. -- http://mail.python.org/mailman/listinfo/python-list