On Wed, 05 Sep 2012 11:43:08 +0200, Johannes Bauer wrote: > On 05.09.2012 11:24, Johannes Bauer wrote: > >> Consider sorting a large dictionary. Sorting is in O(n log(n)), but >> this assumes O(1) comparisons! If the comparison operation itself were >> in O(n), the total sorting complexity would be O(n^2 log(n)),
This shows some significant confusion about Big Oh complexity analysis here. When you say that sorting the list of words is O(N log N), the N refers to the number of words in the dictionary. But when you speak about comparisons being O(N), the N doesn't refer to the size of the dict, but the size of the individual strings being compared. There is no reasonable comparison algorithm where the effort required to compare two strings depends on the size of the dictionary they have come from. Since these are *completely different Ns*, you can't combine them to get O(N**2 log N) as the overall algorithmic complexity. Doing so is sheer nonsense. If you state it like this: sorting the list of words is O(N log N) where N = length of the list comparing the words is O(M) where M = the average length of the words then the overall complexity is O(N log N)*O(M), but since M is a constant for any particular dictionary (its just the average length of the words) that reduces down to O(N log N). To say that sorting a dictionary is O(N log N) does *not* imply that string comparisons are O(1). It only implies that string comparisons don't depend on the size of the dictionary. Comparing: s1 = 'a'*30002 s2 = 'a'*30001 + 'b' takes the same amount of time whether they are in a dictionary of 100 words or one of 100000 words. For the purposes of calculating the algorithmic complexity of sorting a large list of words, we can treat comparisons as an elementary operation even though they actually aren't. >> which is >> definitely false. Most comparisons will abort *very* early (after the >> first character). You are making unjustified assumptions about the distribution of letters in the words. This might be a list of long chemical compounds where the words typically differ only in their suffix. It might be a list of people with titles: Herr Professor Frederick Schmidt Herr Professor Frederick Wagner ... But either way, it doesn't matter for the Big Oh behaviour of sorting the words. > I've just written a program to demonstrate this (below it's attached). I have no idea why you think this program demonstrates anything about string equality comparisons. What you are counting is not the average number of comparisons for string equality, but the number of comparisons when sorting. These are *completely different*, because sorting a list does not require testing each string against every other string. Whatever you have measured, it has nothing to do with the Big Oh complexity of string comparisons. It seems to me that you have misunderstood the purpose of Big Oh notation. It gives an asymptotic upper bound on the amount of work done, ignoring all lower terms and multiplicative factors, not an exact formula. If something is O(N), then this tells you: 1) the number of algorithmic steps is proportional to N, plus or minus some terms which are less than N (e.g. sqrt(N), or log(N), or 1/N, or some constant, etc.); 2) if you ignore those other terms, and keep all other influences equal, then doubling N will *at worst* double the number of algorithmic steps; 3) if all those algorithmic steps take exactly the same amount of time, then doubling N will take twice as much time. But of course *none* of those assumptions hold for measurements of actual elapsed time. In real life, operations rarely take constant time, you don't always see worst case behaviour, and the lower terms are not necessarily insignificant enough to ignore. [...] > So, interestingly, in this real-world case(tm), the complexity does not > scale with O(n). It actually goes down (which is probably due to the > limit amount of words of higher length). I would guess that it probably has more to do with the ability of the timsort algorithm to exploit local order within a list and avoid performing comparisons. > In summary, let's please not debate "feelings" or such. I have no idea what feelings you are referring to. -- Steven -- http://mail.python.org/mailman/listinfo/python-list