On Tue, May 01, 2018 at 05:13:13PM +0000, IntegratedDimensions via Digitalmars-d wrote:
[...]
> The point of O is for the most dominant rate of growth(asymptotic
> behavior). In mathematics, one only cares about n as it approaches
> infinity and so any constant term will eventually be dwarfed. So
> technically, in the theory, it is "incorrect" to have any extraneous
> terms.
Well, yes. Of course the whole idea behind big O is asymptotic behaviour, i.e., behaviour as n becomes arbitrarily large. Unfortunately, as you point out below, this is not an accurate depiction of the real world:

> In CS, it is used to approximate the time for large n to be able to
> compare different algorithms in the "long run". Since computers cannot
> process infinite n, n will be finite and generally relatively
> small(e.g., less than 10^100, which is quite small compared to
> infinity).

Exactly, and therefore the model does not quite match reality: when the scale of n matters in practice, the conclusions you draw from big O will be inaccurate at best, outright wrong at worst.

> So the failure you are pointing out is really because the application.
> In some cases the constant term may be applicable and same cases it
> isn't. Since it depends on n and we cannot use the simplification
> that n is infinity, it matters what n is. This is why it is also
> important to know which algorithms do better for small n because if n
> is small during program use one might be using the wrong algorithm.

Exactly. Yet this important point is often overlooked, or simply never mentioned, when big O is taught to CS students. I'm not saying big O is useless -- it has its uses, but people need to be aware of its limitations rather than blindly assuming (or worse, being told by an authority) that it's a magical pill that will solve everything.

> But once one goes down this road one then has to not count ideal
> cycles but real cycles and include all the other factors involved. Big
> O was not mean to give real world estimates because it is a
> mathematical domain. It may or may not work well depending on how
> poorly it is used, sorta like statistics. Generally though, they are
> such a great simplification tool that it works for many processes that
> are well behaved.

I agree that big O is a wonderful simplification in many cases.
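To put rough numbers on the small-n point, here's a toy cost model in Python (the constant factors 100 and 2 are invented purely for illustration, not measured from anything):

```python
import math

# Toy cost models: an O(n log n) algorithm with a large constant
# factor vs. an O(n^2) algorithm with a small one. The constants
# are made up; only the shape of the comparison matters.
def cost_nlogn(n, c=100):
    return c * n * math.log2(n)

def cost_n2(n, c=2):
    return c * n * n

# For small n, the asymptotically "worse" algorithm wins...
assert cost_n2(16) < cost_nlogn(16)          # 512 vs 6400
# ...but the asymptotics eventually dominate.
assert cost_n2(10_000) > cost_nlogn(10_000)
```

The crossover point depends entirely on those constants -- which is exactly the information big O throws away.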
But this comes with caveats, and my complaint was that said caveats are more often than not overlooked or neglected.

To use a concrete example: traditional big O analysis says a hashtable is fastest, being O(1), especially when the hash function minimizes collisions. Minimal collisions mean short linear search chains when multiple entries fall into the same bucket, i.e., we stay close to O(1) instead of the worst case of O(n) (or O(log n), depending on how you implement your buckets).

In certain situations, however, it may actually be advantageous to *increase* collisions with a locality-sensitive hash function, because it increases the likelihood that the next few lookups will already be in cache and therefore won't incur the cost of yet another cache miss and RAM/disk roundtrip. The buckets are bigger, and according to big O analysis "slower", because each lookup incurs the cost of an O(n) or O(log n) search within a bucket. In practice, however, it's faster, because it's expensive to load a bucket into the cache (or incur disk I/O to read a bucket from disk). If lookups are clustered around similar keys and end up in the same few buckets, then once those buckets are cached, any subsequent lookups become very cheap. Large buckets actually work better because instead of incurring the cost of loading k small buckets, you pay once for one large bucket that contains many entries you will access in the near future. (And larger buckets are more likely to contain entries you will need soon.)

Also, doing an O(n) linear search within small-ish buckets may actually be faster than fancy O(log n) binary trees, due to the CPU's cache predictor.
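The locality-sensitive bucketing idea can be sketched like this (the bucket width of 64 and the whole `LocalityTable` class are my own invention for illustration; a real implementation would also worry about the memory layout of each bucket, which Python can't express):

```python
BUCKET_WIDTH = 64  # arbitrary choice for this sketch

class LocalityTable:
    """Bucketed table whose "hash" deliberately maps nearby keys
    into the same bucket, so clustered lookups touch few buckets."""

    def __init__(self):
        self.buckets = {}  # bucket id -> list of (key, value) pairs

    def _bucket_id(self, key):
        return key // BUCKET_WIDTH  # locality-sensitive "hash"

    def put(self, key, value):
        self.buckets.setdefault(self._bucket_id(key), []).append((key, value))

    def get(self, key):
        # O(n) linear search within one bucket -- cheap once that
        # bucket is in cache, since the scan is sequential.
        for k, v in self.buckets.get(self._bucket_id(key), []):
            if k == key:
                return v
        return None

t = LocalityTable()
for k in range(100, 110):   # ten nearby keys
    t.put(k, k * k)
assert t.get(105) == 11025
assert len(t.buckets) == 1  # all ten clustered keys share one bucket
```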
A linear scan is easy for the cache predictor to recognize and load as a series of consecutive cache lines, thus amortizing away the RAM roundtrip costs, whereas with a fancy binary tree the subsequent memory access is hard to predict (or may have no predictable pattern), so the predictor can't help you, and you have to pay for the RAM roundtrips. When n gets large, of course, the binary tree will overtake the performance of the linear search. But the way big O is taught in CS courses gives the wrong impression that O(n) linear search is always inferior and therefore bad and to be avoided. Students need to be told that this is not always the case, and that there are times when O(n) is actually better than O(log n).

> Ideally it would be better to count the exact number of cycles used by
> an algorithm and have them normalized to some standard cycle that
> could be compared across different architectures. Machines can do the
> accounting easily. Even then, many anomalous behavior will generally
> be seen but it would be more accurate, although possibly not much more
> informative, than Big O.
[...]

Sorry, counting cycles does not solve the problem. That may have worked back in the days of the 8086, but CPU architectures have moved on a long way since then. These days, cache behaviour is arguably far more important than minimizing cycles, because your cycle-minimized algorithm will do you no good if every other cycle the instruction pipeline has to be stalled because of branch hazards, or because of a cache miss that entails a RAM roundtrip. Furthermore, given the large variety of cache structures out there, it's unrealistic to expect a single generalized cycle-counting model to work for all CPU architectures. You'd be drowning in nitty-gritty details instead of getting useful results from your analysis.
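(Going back to the linear-vs-binary point above: here are the two searches side by side. This is Python, so it only shows the logic; the cache effects under discussion would need a compiled language and real measurements to observe.)

```python
import bisect

def linear_search(bucket, key):
    # Sequential scan: consecutive memory accesses that hardware
    # prefetchers handle well; often fastest for small buckets.
    for i, k in enumerate(bucket):
        if k == key:
            return i
    return -1

def binary_search(bucket, key):
    # O(log n) comparisons, but the probe locations jump around,
    # which a prefetcher cannot anticipate; wins for large n.
    i = bisect.bisect_left(bucket, key)
    if i < len(bucket) and bucket[i] == key:
        return i
    return -1

bucket = list(range(0, 100, 2))  # sorted bucket of 50 keys
assert linear_search(bucket, 42) == binary_search(bucket, 42) == 21
assert linear_search(bucket, 43) == binary_search(bucket, 43) == -1
```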
CPU instruction pipelines, out-of-order execution, speculative execution, etc., complicate the analysis so much that a unified model that works across all CPUs would be pretty much impossible.

A more promising approach pursued in recent years is the cache-oblivious model, where the algorithm is designed to take advantage of a cache hierarchy, but *without* depending on any specific one. I.e., it is assumed that linear access of N elements sequentially across blocks of size B will be faster than N random accesses, but the algorithm is designed in such a way that it does not depend on specific values of B and N, and does not need to be tuned to any specific value of B and N. This model has shown a lot of promise for algorithms that may in theory have "poor" big O behaviour, but in practice run measurably faster because they take advantage of the modern CPU cache hierarchy.

As an added bonus, the cache hierarchy analysis naturally extends to include secondary storage like disk I/O, and the beauty of cache-oblivious algorithms is that they can automatically take advantage of this without needing massive redesign, unlike the earlier situation where you had to know beforehand whether your code would be CPU-bound or disk-bound, and might have to use completely different algorithms in either case. Or you might have to hand-tune the parameters of your algorithm (such as optimizing for specific disk block sizes). A cache-oblivious algorithm can Just Work(tm) without further ado, and without sudden performance hits.


T

-- 
One reason that few people are aware there are programs running the internet is that they never crash in any significant way: the free software underlying the internet is reliable to the point of invisibility. -- Glyn Moody, from the article "Giving it all away"
