Devin Jeanpierre wrote:

> On Sat, Jan 4, 2014 at 6:27 PM, Steven D'Aprano
> <steve+comp.lang.pyt...@pearwood.info> wrote:
>> Fast is never more important than correct. It's just that sometimes you
>> might compromise a little (or a lot) on what counts as correct in order
>> for some speed.
>
> Is this statement even falsifiable? Can you conceive of a circumstance
> where someone has traded correctness for speed, but where one couldn't
> describe it that latter way? I can't.
Every time some programmer "optimises" a piece of code (or, more often, *thinks* they have optimised it) in a way which introduces bugs into the software, that's a case of somebody trading correctness for speed to which my statement doesn't apply. Sometimes the response to the subsequent bug report is "will not fix", and a retroactive change in the software requirements. ("Oh, did we say that indexing a string would return a character? We meant it would return a character, so long as the string contains no Unicode characters in the astral planes.") Sometimes it is to revert the optimisation or otherwise fix the bug.

I accept that there is sometimes a fine line here. I'm assuming that software applications have their requirements fully documented, which in the real world is hardly ever the case. Even when the requirements aren't written down, though, often they are implicitly understood. (It gets very interesting when the users' understanding and the developers' understanding differ.)

Take as an example this "torture test" for a mathematical sum function, where the built-in sum() gets the wrong answer but math.fsum() gets it right:

py> from math import fsum
py> values = [1e12, 0.0001, -1e12, 0.0001]*10000
py> fsum(values)
2.0
py> sum(values)
2.4413841796875

Here's another example of the same thing, just to prove it's not a fluke:

py> values = [1e17, 1, 1, -1e17]
py> fsum(values)
2.0
py> sum(values)
0.0

The reason for the different results is that fsum() tries hard to account for intermediate rounding errors and sum() does not. If you benchmark the two functions, you'll find that sum() is significantly faster than fsum().

So the question to be asked is, does sum() promise to calculate floating point sums accurately? If so, then this is a bug, probably introduced by the desire for speed. But in fact, sum() does not promise to calculate floating point sums accurately. What it promises to do is to calculate the equivalent of a + b + c + ...
for as many values as given, and that's exactly what it does. Conveniently, that's faster than fsum(), and usually accurate enough for most uses.

Is sum() buggy? No, of course not. It does what it promises, it's just that what it promises falls short of "calculate floating point summations to high accuracy".

Now, here's something which *would* be a bug, if sum() did it:

class MyInt(int):
    def __add__(self, other):
        return MyInt(super(MyInt, self).__add__(other))
    def __radd__(self, other):
        return MyInt(super(MyInt, self).__radd__(other))
    def __repr__(self):
        return "MyInt(%d)" % self

Adding a zero MyInt to an int gives a MyInt:

py> MyInt(0) + 23
MyInt(23)

so sum() should do the same thing. If it didn't, if it optimised away the actual addition because "adding zero to a number can't change anything", it would be buggy. But in fact, sum() does the right thing:

py> sum([MyInt(0), 23])
MyInt(23)

> I think by definition you can
> always describe it that way, you just make "what counts as
> correctness" be "what the customer wants given the resources
> available".

Not quite. "Correct" means "does what the customer wants". Or if there is no customer, it means "does what you say it will do". How do we tell when software is buggy? We compare what it actually does to the promised or expected behaviour, and if there is a discrepancy, we call it a bug. We don't compare it to some ideal that cannot be met. A bug report that math.pi does not have an infinite number of decimal places would be closed as "Will Not Fix".

Likewise, if your customer pays you to solve the Travelling Salesman Problem exactly, even if it takes a week to calculate, then nothing short of a program that solves the Travelling Salesman Problem exactly will satisfy their requirements. It's no good telling the customer that you can calculate a non-optimal answer twenty times faster if they want the actual optimal answer.
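As an aside, the zero-skipping "optimisation" described above is easy to sketch. Here buggy_sum is an invented name, not anything in the standard library; it skips terms equal to zero on the theory that "adding zero can't change anything", and that shortcut is exactly what makes it buggy:

```python
class MyInt(int):
    def __add__(self, other):
        return MyInt(super(MyInt, self).__add__(other))
    def __radd__(self, other):
        return MyInt(super(MyInt, self).__radd__(other))
    def __repr__(self):
        return "MyInt(%d)" % self

def buggy_sum(values, start=0):
    # Hypothetical "optimised" sum: skip any term that compares
    # equal to zero. For plain floats and ints this gives the same
    # answers as sum(), which is what makes the bug tempting.
    total = start
    for x in values:
        if x == 0:
            continue  # the shortcut that breaks the promise
        total = total + x
    return total

print(repr(sum([MyInt(0), 23])))        # MyInt(23) -- the promised behaviour
print(repr(buggy_sum([MyInt(0), 23])))  # 23 -- a plain int; __radd__ never ran
```

The skipped addition is observable: MyInt(0).__radd__ has a side effect on the result's type, so "can't change anything" is simply false for types that overload +.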
(Of course, you may try to persuade them that they don't really need the optimal solution, or that they cannot afford it, or that you cannot deliver and they need to compromise.)

> The conventional definition, however, is "what the
> customer wants, imagining that you have infinite resources".

I don't think the resources really come into it. At least, certainly not *infinite* resources. fsum() doesn't require infinite resources to calculate floating point summations to high accuracy. An even more accurate (but even slower) version would convert each float into a Fraction, then add the Fractions.

> With just
> a little redefinition that seems reasonable, you can be made never to
> be wrong!

I'm never wrong because I'm always right! *wink*

Let's bring this back to the claim made at the beginning. Someone (Mark?) made a facetious comment about preferring fast code to correct code. Someone else (I forget who, and am too lazy to look it up -- Roy Smith perhaps?) suggested that we quite often accept incorrect code if it is fast. But I maintain that we don't. If we did, we'd explicitly say:

"Sure, I know this program calculates the wrong answer, but gosh look how fast it is!"

much like the anecdote I gave about the roadie driving in the wrong direction who said "Who cares, we're making great time!"

I maintain that people don't, as a rule, justify incorrect code on the basis of it being fast. They claim the code isn't incorrect, that any compromises made are deliberate and not bugs:

- "sum() doesn't promise to calculate floats to high accuracy, it promises to give the same answer as if you repeatedly added them with the + operator."

- "We never promised 100% uptime, we promised four nines uptime."

- "Our anti-virus scanner is blindingly fast, while still identifying at least 99% of all known computer viruses!"

- "The Unix 'locate' command doesn't do a live search of the file system because that would be too slow, it uses a snapshot of the state of the file system."
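Incidentally, the slower-but-exact Fraction version mentioned earlier is only a few lines. This is a sketch with an invented name (exact_sum): every float converts to a Fraction exactly, Fraction arithmetic is exact rational arithmetic, so the only rounding happens in the single conversion back to float at the end:

```python
from fractions import Fraction
from math import fsum

def exact_sum(values):
    # Each float converts to a Fraction with no error; the additions
    # are exact; float() then rounds the true sum just once.
    return float(sum(Fraction(x) for x in values))

values = [1e17, 1, 1, -1e17]
print(sum(values))        # 0.0 -- naive left-to-right float addition
print(fsum(values))       # 2.0
print(exact_sum(values))  # 2.0
```

It agrees with fsum() on the torture tests above, at the cost of doing arbitrary-precision rational arithmetic for every term.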
Is locate buggy because it tells you what files existed the last time the updatedb command ran, instead of what files exist right now? No, of course not. locate does exactly what it promises to do.

--
Steven
--
https://mail.python.org/mailman/listinfo/python-list