Re: [Haskell-cafe] Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve)
Claus, you're absolutely right that my paper could be improved in a number of ways, and that there is actually a surprising amount of meat on these bones for further analysis. We'll see what happens. If it actually gets accepted for JFP, it'll undoubtedly appear as a revised version with less controversial language. In hindsight, using loaded words was a huge mistake, because of the way it distracts from my main points. But one thing I've learned is that it's better to do something imperfect and get it out there for people to read than to try to make it perfect and end up with a perpetual work in progress. I had a busy summer last summer, and the paper was as good as I could make it in the very short amount of time I had. It may have annoyed you in some ways, but hopefully it informed you in others.

Claus wrote:
> something in me balks at the idea that the folklore version of the
> sieve is not "the" sieve *algorithm*, in spite of your detailed
> cross-off footprint and performance curve arguments.

I don't think you're alone in that viewpoint. But it's clear to me that while some people are aware of the differences in crossing-off behaviour, the performance costs they entail, and the way the ``folklore sieve'' becomes a dual of the very-naive-primes algorithm (trial division by all the primes less than n) -- and, despite those differences, want to say it's all "okay" -- there are also plenty of people who are unaware of these subtleties. They believe that they have an honest-to-goodness functional implementation of one of the fastest prime-finding methods out there, are hugely disappointed at the performance they get, and ascribe it not to poor algorithmic choices but to the language they are working with. As I allude to in my paper, I've talked to various people locally who've seen the ``folklore sieve'', and it's fascinating to watch the light go on for them, and hear exclamations like ``Oh! So *that's* why it's so slow!''.
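For reference, the ``folklore sieve'' under discussion is the familiar one-liner; this is the textbook form (names here are mine, not code from the thread):

```haskell
-- The "folklore sieve": each prime p spawns a filter over the remaining
-- candidates. Every surviving number is then examined by the filter for
-- every prime below its smallest factor, which is why it behaves like
-- trial division rather than like Eratosthenes' crossing-off.
folklore :: [Integer]
folklore = sieve [2..]
  where
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
```

Note that the filter for 17, say, inspects every candidate that reaches it, not just the multiples of 17 -- exactly the examination cost the paper measures.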
(FWIW, I think it was those reactions, which for some included a sense of having been duped, that led me to feel okay with the rather strong terms I used in my paper.)

Claus also wrote:
> but we are in the grey area between "permute until sorted" and "sort
> like this", and thinking of the modulus of a number as a static
> property (that might be derived incrementally from that of its
> predecessor) rather than an explicit check

I think modulus is a bit of a red herring -- it's a symptom, not the disease. I would argue that if "the agent that crosses off 17s" has to look at 19 and say "that's okay, let that one pass through", then whether it passes it through based on modulus, or because it is less than 289, it's still an examination of 19 by the cross-off-17 agent, where that agent has the power to allow 19 through or not. Seen that way, I have a very hard time accepting the algorithm as a faithful implementation of the Sieve of Eratosthenes. Similarly, I have a hard time accepting the faithfulness of an algorithm where 70 doesn't get a visit from the cross-off agents for 5 and 7.

I agree, though, that it may be possible to perform progressive transformations from a ``folklore sieve'', which doesn't cross off according to the proper conventions, to something that does. Somewhere along the way there's going to be a grey area, and arguing about definitions in that grey area is likely to be a waste of time. At some point in the middle, you'll have something that can be viewed from both perspectives. But the existence of such a progression of algorithms between algorithm A and algorithm B doesn't mean that A and B are fundamentally the same.

Claus wrote:
> as it happens, even if we run folklore3 over [2..], 70 will be crossed
> off exactly once, by the sieve for 2. the sieves for 5 and 7 run over
> the gap left behind, as they do in folklore and folklore2 (one could
> move that gap jumping from sieve3 to insert, though..).
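The ``very-naive-primes'' dual mentioned above can be made concrete; here is one common sketch (my rendering, not code from the thread): instead of each prime filtering the candidate stream, each candidate is examined by the agents for all primes up to its square root.

```haskell
-- Trial division by previously found primes: the "dual" view of the
-- folklore sieve's examinations, seen from the candidate's side.
dualPrimes :: [Integer]
dualPrimes = 2 : [ x | x <- [3,5..]
                     , all (\p -> x `mod` p /= 0) (smallFactors x) ]
  where
    -- only primes p with p*p <= x can be the smallest factor of x
    smallFactors x = takeWhile (\p -> p*p <= x) dualPrimes
```

The `p*p <= x` bound is the same improvement as starting each sieve at p*p in the progression discussed later in the thread.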
> the sieves for 5 and 7 "know" about 70 the same way that (`mod`5) and
> (`mod`7) know about 70, but that knowledge isn't called upon for
> sieving, only to find greater numbers with modulus 0.

Perhaps we're saying the same thing and misunderstanding each other, but here's what I see from an instrumented version of your code that shows the state of affairs at each loop iteration:

  ...
  (Prime 67,[(2,68),(3,69),(5,70),(7,70),(11,121), ...]),
  (Loops 68,[(2,68),(3,69),(5,70),(7,70),(11,121), ...]),
  (Loops 69,[(3,69),(2,70),(5,70),(7,70),(11,121), ...]),
  (Loops 70,[(2,70),(5,70),(7,70),(3,72),(11,121), ...]),
  (Loops 71,[(5,70),(7,70),(2,72),(3,72),(11,121), ...]),
  (Loops 71,[(7,70),(2,72),(3,72),(5,75),(11,121), ...]),
  (Prime 71,[(2,72),(3,72),(5,75),(7,77),(11,121), ...]),
  (Loops 72,[(2,72),(3,72),(5,75),(7,77),(11,121), ...]),
  (Loops 73,[(3,72),(2,74),(5,75),(7,77),(11,121), ...]),
  (Prime 73,[(2,74),(3,75),(5,75),(7,77),(11,121), ...]),
Re: [Haskell-cafe] Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve)
> This isn't a variant of the "folklore sieve", it's actually the real
> one! :-)

well, one thing i was trying for (and what perhaps one might like to see in a paper) is a way of relating the various code alternatives to each other, with suitably similar intermediate stages. not quite as detailed as deriving one from the other by program transformation, and not always meaning-preserving, but steps small enough that it becomes easier to understand the differences, and how one version might grow out of another, based on particular performance concerns and implementation decisions. your paper does some of that, but the steps feel more like jumps to me. we don't usually have compiler optimisations that could effectively transform the folklore version of the sieve into any of the other prime enumerators we've seen.

and it is nice to see the sieve added to the infamous quicksort as an example of a specification well known to be beautiful, but little known to be inefficiently realised (Lennart provided a convincing alternative for that other one, too;) - they are advertised and accepted unquestioned all too often. but something in me balks at the idea that the folklore version of the sieve is not "the" sieve *algorithm*, in spite of your detailed cross-off footprint and performance curve arguments. i'm not saying you're wrong, and as one of those who prefer operational semantics over denotational semantics in many contexts, i realise that it may seem odd if i prefer to see the high-level versions as specifications of algorithms when the implementation behaviour might differ substantially from the intended algorithm. but we are in the grey area between "permute until sorted" and "sort like this", and thinking of the modulus of a number as a static property (that might be derived incrementally from that of its predecessor) rather than an explicit check for divisibility looks very similar to "this is quicksort, but i didn't really mean (++) to do appends".
if one has to use quicksort on binary lists, one had better use Augustsson's append-free qsort instead of the folklore version - but is that a better implementation, or a different algorithm? does it depend on how we look at the question?

if you could make that part of your argument in a less controversial style, without loaded words like "real", "phony", "fooling", focussing more on observations than on any agenda (however innocent), this reader at least would find it easier to focus on the really interesting issues you raise for discussion. also, the present thread (one of many incarnations) demonstrates that there are other issues to be exposed and explored when discussing styles of prime sieves. not to mention the modern-again issue of transforming executable specifications into efficient implementations.

the progression

    mod sieves (folklore)
    -> start (sieve p) at p*p
    -> incremental sieves (folklore2)
    -> merged incremental sieves (folklore3)
    -> more efficient representation of merged sieves (your priority queue version)
    -> other useful optimisations and tweaks that further hide the ideas (your fastest versions)

.. makes it easier for me to see how the initial program relates to the others, and the closeness of folklore3 to one of your versions is intentional, as are the differences. the versions are not quite equivalent (eg folklore2/3 go wrong earlier than folklore when using Int instead of Integer), but if there is any difference wrt the cross-out footprint (apart from the p*p thing), it is likely to be in the precise organisation of sieve merging after folklore2. otherwise, they seem to embody fairly natural improvement steps of the original specification-level code.
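For readers without the attachment, the ``incremental sieves'' step in this progression (drop the mod test, start sieving for p at p*p, but still stack one sieve per prime) can be sketched roughly like this - a guess at the shape of folklore2, not Claus's actual code:

```haskell
-- folklore2-style sketch: each prime p keeps a counter n of its next
-- multiple, advanced incrementally by p, instead of testing x `mod` p.
-- Sieving for p only begins at p*p; sieves are still stacked as nested
-- recursive calls, one layer per prime.
folklore2 :: [Integer]
folklore2 = 2 : sieve [3,5..]
  where
    sieve (p:xs) = p : sieve (cross (p*p) xs)
      where
        cross n (y:ys)
          | y < n     = y : cross n ys         -- below next multiple: keep
          | y == n    = cross (n+p) ys         -- y is the multiple: drop it
          | otherwise = cross (n+p) (y:ys)     -- n fell in a gap: advance n
```

The `otherwise` case matters because, over the odd candidates, a counter like 9+3=12 lands on a number that never appears in the stream and must be stepped past.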
> Let's take a look at ``pns'' at the 20th prime, having just found that
> 67 is prime (we're about to discover that 71 is prime):
>
>   [(3,69),(5,70),(7,70),(11,121),(13,169),(17,289),(19,361),(23,529),
>    (29,841),(31,961),(37,1369),(41,1681),(43,1849),(47,2209),(53,2809),
>    (59,3481),(61,3721),(67,4489)]
>
> As you can see, 70 will be crossed off twice, once by the 5 iterator
> and once by the iterator for 7. And we won't think about 11, 13, etc.
> until they're needed.

as it happens, even if we run folklore3 over [2..], 70 will be crossed off exactly once, by the sieve for 2. the sieves for 5 and 7 run over the gap left behind, as they do in folklore and folklore2 (one could move that gap jumping from sieve3 to insert, though..). the sieves for 5 and 7 "know" about 70 the same way that (`mod`5) and (`mod`7) know about 70, but that knowledge isn't called upon for sieving, only to find greater numbers with modulus 0.

in contrast, sieves in genuine will replace non-primes by zero, so later sieves can still run into the same position. whether we count the zero in that position as a gap, to be left intact, or as a ghost of the non-prime, to be zeroed again, would seem to be a matter of viewpoint? for me, the question is more on
Re: [Haskell-cafe] Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve)
Claus Reinke wrote:
> folklore3: merging the sieves, instead of stacking them as implicit
> thunks

Here's Claus's code:

  primes3 = 2 : folklore3 [] [3,5..]

  folklore3 pns xs = x : folklore3 (insert (x,x*x) pns') xs'
    where (x,pns',xs') = sieve3 pns xs

  sieve3 ((p,n):pns) (x:xs)
    | x < n     = (x,(p,n):pns,xs)
    | otherwise = sieve3 (insert (p,n+p) pns) ([x | x > n] ++ xs)
  sieve3 [] (x:xs) = (x,[],xs)

  insert (p,n) [] = [(p,n)]
  insert (p,n) ((p',n'):pns)
    | n <= n'   = (p,n):(p',n'):pns
    | otherwise = (p',n'):insert (p,n) pns

This isn't a variant of the "folklore sieve", it's actually the real one!

Let's take a look at ``pns'' at the 20th prime, having just found that 67 is prime (we're about to discover that 71 is prime):

  [(3,69),(5,70),(7,70),(11,121),(13,169),(17,289),(19,361),(23,529),
   (29,841),(31,961),(37,1369),(41,1681),(43,1849),(47,2209),(53,2809),
   (59,3481),(61,3721),(67,4489)]

As you can see, 70 will be crossed off twice, once by the 5 iterator and once by the iterator for 7. And we won't think about 11, 13, etc. until they're needed.

This algorithm is doing pretty much exactly what mine does, except that in mine I use a priority queue to represent this information. In fact, with my most recent code, I'd only store (3,69), (5,70), (7,70) in the priority queue and keep (11,121), (13,169), ..., (67,4489) in a feeder queue, so that I only use the power of the priority queue when I need it.

    Melissa.

P.S. Mine actually wouldn't have quite the same numbers, because I go to a bit of trouble to skip numbers like 70, which will never actually show up in the x:xs list.

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
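For comparison, the priority-queue idea Melissa describes can be sketched with Data.Map standing in for a real priority queue. This is the textbook map-based rendering of the technique, not her actual code, and it omits the feeder-queue refinement she mentions:

```haskell
import qualified Data.Map as Map

-- Map each pending composite to the list of primes whose iterators have
-- reached it. At each odd candidate x: if x is not in the map, x is a new
-- prime whose first crossing will be x*x; otherwise advance the parked
-- iterators by 2*p (skipping even multiples) and drop x as composite.
primesPQ :: [Integer]
primesPQ = 2 : sieve [3,5..] Map.empty
  where
    sieve (x:xs) table = case Map.lookup x table of
      Nothing -> x : sieve xs (Map.insert (x*x) [x] table)
      Just ps -> sieve xs (foldl bump (Map.delete x table) ps)
      where bump t p = Map.insertWith (++) (x + 2*p) [p] t
```

`insertWith (++)` handles the case where two iterators land on the same composite, which is precisely the "70 crossed off twice" situation from the pns trace above.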
Re: [Haskell-cafe] Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve)
> Algorithm   10^3   10^4   10^5   10^6   10^7   10^8
> Reinke      0.04   1.21   41.00  -      -      -
>
> Reinke is the ``applyAt'' algorithm Claus Reinke posted here

while I'm not unhappy that this code is performing reasonably well, it was mostly an attempt to get closer to what some perceived as the original sieve specification, with some minor modifications. for performance comparisons, that has the huge disadvantage of having to lug around an increasing ballast of crossed-out numbers. with a straightforward run-length encoding of those zeroes, the code is likely to scale rather better (one dash less, and closer to Naive;-). there are also some obvious improvements to the folklore sieve that bring it closer (in spirit, and partially in performance) to the faster versions.

attached is the code for my own experiments, for your entertainment, where

  folklore:    the folklore "sieve"
  folklore2:   without mod, starting (sieve p) from p*p
  folklore3:   merging the sieves, instead of stacking them as implicit thunks
  genuine:     the applyAt version, dubbed "Reinke"
  genuineRLC:  with run-length coding for all those zeroed-out positions
  dual:        the not so "naive", fairly fast one; for reference (speed limit for the above)

btw, given that the "naive" algorithm performs so well, perhaps it isn't all that naive after all? it does seem to encode the observed effects of nested sieves, without encoding the sieves themselves and their ballast. perhaps there is a corresponding, "dual" version of the wheels as well, eliminating their currently major disadvantage of managing the ever-growing wheels in explicit form? for instance, the current naive/dual algorithm computes (x `mod` p) from scratch for each x, instead of incrementally from ((x-1) `mod` p).

claus

[attachment: Sieve.hs]
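The closing suggestion - maintaining x `mod` p incrementally rather than dividing afresh for each candidate - can be illustrated with a toy version (my sketch, not code from the attachment; it is division-free but deliberately unoptimised, checking every known prime for every candidate):

```haskell
-- Carry, for each known prime p, the current remainder of the candidate
-- modulo p, and step all remainders by one per candidate. No division is
-- performed after the initial zero; the remainder wraps when it hits p.
primesInc :: [Integer]
primesInc = go [2..] []
  where
    go (x:xs) prs
      | any ((== 0) . snd) prs' = go xs prs'              -- some p divides x
      | otherwise               = x : go xs ((x,0):prs')  -- x is a new prime
      where prs' = [ (p, if r + 1 == p then 0 else r + 1) | (p,r) <- prs ]
```

An efficient version would presumably prune to primes below the square root and avoid rebuilding the list, but even this toy shows the remainder of x deriving from the remainder of x-1, as the message proposes.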