Re: [Haskell-cafe] Re: Code and Perf. Data for Prime Finders(was:Genuine Eratosthenes sieve)

2007-02-25 Thread Melissa O'Neill
Claus, you're absolutely right that my paper could be improved in a  
number of ways, and that there is actually a surprising amount of  
meat on these bones for further analysis.  We'll see what happens.   
If it actually gets accepted for JFP, it'll undoubtedly finally  
appear as a revised version with less controversial language.  In  
hindsight, using loaded words was a huge mistake,  because of the way  
it distracts from my main points.


But one thing I've learned is that it's better to do something  
imperfect and get it out there for people to read than to try to make  
it perfect and end up with a perpetual work in progress.  I had a  
busy summer last summer, and the paper was as good as I could make it  
in the very short amount of time I had.  It may have annoyed you in  
some ways, but hopefully it informed you in others.


Claus wrote:
something in me balks at the idea that the folklore version of the  
sieve is not "the" sieve *algorithm*, in spite of your detailed  
cross-off footprint and performance curve arguments.


I don't think you're alone in that viewpoint.

But it's clear to me that while some people are aware of the  
differences in crossing-off behavior, the performance costs those  
differences entail, and the way the ``folklore sieve'' becomes a dual  
of the very-naive-primes algorithm (trial division by all the primes  
less than n) -- and, despite all that, still want to say it's all  
"okay" -- there are also plenty of people who are unaware of these  
subtleties.  They believe that they have an honest-to-goodness  
functional implementation of one of the fastest prime-finding methods  
out there, are hugely disappointed at the performance they get, and  
ascribe it not to poor algorithmic choices but to the language they  
are working with.
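(For readers following along, the ``folklore sieve'' in question is the
familiar two-liner -- the standard formulation, not code quoted from
either of our messages:)

```haskell
-- the "folklore sieve" in its usual form: each prime p spawns a
-- filter that examines every later survivor with a mod test
primes :: [Integer]
primes = sieve [2..]
  where sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
```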


As I allude to in my paper, I've talked to various people locally  
who've seen the ``folklore sieve'', and it's fascinating to watch the  
light go on for them, and hear exclamations like ``Oh!  So *that's*  
why it's so slow!''.  (FWIW, I think it was those reactions, which  
included for some a sense of having been duped, that led me to feel  
okay with the rather strong terms I used in my paper.)


Claus also wrote:
but we are in the grey area between "permute until sorted" and  
"sort like this", and thinking of the modulus of a number as a  
static property (that might be derived incrementally from that of  
its predecessor) rather than an explicit check


I think modulus is a bit of a red herring -- it's a symptom, not the  
disease.  I would argue that if "the agent that crosses off 17s" has  
to look at 19 and say "that's okay, let that one pass through",  
whether it passes it through based on modulus, or because it is less  
than 289, it's still an examination of 19 by the cross-off-17 agent,  
where that agent has the power to allow 19 through or not.  Seen  
that way, I have a very hard time accepting the algorithm as a  
faithful implementation of the Sieve of Eratosthenes.  Similarly, I  
have a hard time accepting the faithfulness of an algorithm where 70  
doesn't get a visit from the cross-off agents for 5 and 7.
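(To spell out the dual: trial division by all the primes less than x
does exactly that kind of needless examination.  A sketch of mine,
rather than anything from the paper:)

```haskell
-- trial division by *all* smaller primes: every prime examines every
-- candidate, mirroring the folklore sieve's cross-off footprint
veryNaivePrimes :: [Integer]
veryNaivePrimes = filter isPrime [2..]
  where isPrime x = all (\p -> x `mod` p /= 0) (filter isPrime [2..x-1])
```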


But I agree that it may be possible to perform progressive  
transformations from a ``folklore sieve'', which doesn't cross off  
according to the proper conventions, to something that does.   
Somewhere along the way, there's going to be a grey area, and arguing  
about definitions in that grey area is likely to be a waste of time.   
At some point in the middle, you'll have something that can be viewed  
from both perspectives.   But the existence of such a progression of  
algorithms between algorithm A and algorithm B doesn't mean that A  
and B are fundamentally the same.


as it happens, even if we run folklore3 over [2..], 70 will be  
crossed off
exactly once, by the sieve for 2. the sieves for 5 and 7 run over  
the gap

left behind, as they do in folklore and folklore2 (one could move that
gap jumping from sieve3 to insert, though..). the sieves for 5 and  
7 "know" about 70 the same way that (`mod`5) and (`mod`7) know  
about 70, but that knowledge isn't called upon for sieving, only to  
find greater numbers with modulus 0.


Perhaps we're saying the same thing and misunderstanding each other,  
but here's what I see on an instrumented version of your code that  
shows the state of affairs at each loop iteration:

...
   (Prime 67,[(2,68),(3,69),(5,70),(7,70),(11,121), ...]),
   (Loops 68,[(2,68),(3,69),(5,70),(7,70),(11,121), ...]),
   (Loops 69,[(3,69),(2,70),(5,70),(7,70),(11,121), ...]),
   (Loops 70,[(2,70),(5,70),(7,70),(3,72),(11,121), ...]),
   (Loops 71,[(5,70),(7,70),(2,72),(3,72),(11,121), ...]),
   (Loops 71,[(7,70),(2,72),(3,72),(5,75),(11,121), ...]),
   (Prime 71,[(2,72),(3,72),(5,75),(7,77),(11,121), ...]),
   (Loops 72,[(2,72),(3,72),(5,75),(7,77),(11,121), ...]),
   (Loops 73,[(3,72),(2,74),(5,75),(7,77),(11,121), ...]),
   (Prime 73,[(2,74),(3,75),(5,75),(7,77),(11,121), ...]),
 

Re: [Haskell-cafe] Re: Code and Perf. Data for Prime Finders(was:Genuine Eratosthenes sieve)

2007-02-25 Thread Claus Reinke
This isn't a variant of the "folklore sieve", it's actually the real one!  

:-)

well, one thing i was trying for (and what perhaps one might like to see in a 
paper) is a way of relating the various code alternatives to each other, with 
suitably similar intermediate stages. not quite as detailed as deriving one from 
the other by program transformation, and not always meaning-preserving, but 
small enough steps that it becomes easier to understand the differences, and 
how one version might grow out of another, based on particular performance 
concerns and implementation decisions. your paper does some of that, but 
the steps feel more like jumps to me.


we don't usually have compiler optimisations that could effectively transform
the folklore version of the sieve into any of the other prime enumerators we've
seen. and it is nice to see sieve added to the infamous quicksort as an example 
of a specification well known to be beautiful, but little known to be inefficiently 
realised (Lennart provided a convincing alternative for that other one, too;) 
- they are advertised and accepted unquestioned all too often. but something in 
me balks at the idea that the folklore version of the sieve is not "the" sieve 
*algorithm*, in spite of your detailed cross-off footprint and performance 
curve arguments.


i'm not saying you're wrong, and as one of those who prefer operational 
semantics over denotational semantics in many contexts, i realise that it 
may be seen as odd if i prefer to see the high-level versions as 
specifications of algorithms, when the implementation behaviour might 
differ substantially from the intended algorithm.

but we are in the grey area between "permute until sorted" and "sort like this", 
and thinking of the modulus of a number as a static property (that might be 
derived incrementally from that of its predecessor) rather than an explicit check
for divisibility looks very similar to "this is quicksort, but i didn't really mean 
(++) to do appends". if one has to use quicksort on binary lists, one had better 
use Augustsson's append-free qsort instead of the folklore version. but is that
a better implementation, or a different algorithm? does it depend on how we
look at the question?

if you could make that part of your argument in a less controversial style, without 
loaded words like "real", "phony", "fooling", focussing more on observations than 
on any agenda (however innocent), this reader at least would find it easier to focus 
on the really interesting issues you raise for discussion. also, the present thread 
(one of many incarnations) demonstrates that there are other issues to be 
exposed and explored when discussing styles of prime sieves. not to mention 
the modern-again issue of transforming executable specifications to efficient 
implementations.


the progression 

   mod sieves (folklore) 
   -> start (sieve p) at p*p
   -> incremental sieves (folklore2)
   -> merged incremental sieves (folklore3)
   -> more efficient representation of merged sieves (your priority queue version)
   -> other useful optimisations and tweaks that further hide the ideas (your fastest versions) ..

makes it easier for me to see how the initial program relates to the others, and the 
closeness of folklore3 to one of your versions is intentional, as are the differences.


the versions are not quite equivalent (eg folklore2/3 go wrong earlier than 
folklore when using Int instead of Integer), but if there is any difference wrt
the cross-out footprint (apart from the p*p thing), it is likely to be in the precise 
organisation of sieve merging after folklore2. otherwise, they seem to embody 
fairly natural improvement steps of the original specification-level code.
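for instance, the mod-free incremental step might be sketched like this (my
reconstruction of the folklore2 idea from the description above, not the
attached code):

```haskell
-- each sieve carries its next multiple n and advances it by 2*p,
-- instead of testing x `mod` p; crossing off starts at p*p
primes2 :: [Integer]
primes2 = 2 : sieve [3,5..]
  where
    sieve (p:ys) = p : sieve (cross (p*p) ys)
      where
        cross n (y:ys') | y < n     = y : cross n ys'
                        | y == n    = cross (n + 2*p) ys'
                        | otherwise = cross (n + 2*p) (y:ys')
```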


Let's take a look at ``pns'' at the 20th prime, having just  
found that 67 is prime (we're about to discover that 71 is prime):


[(3,69),(5,70),(7,70),(11,121),(13,169),(17,289),(19,361),(23,529), 
(29,841),(31,961),(37,1369),(41,1681),(43,1849),(47,2209),(53,2809), 
(59,3481),(61,3721),(67,4489)]


As you can see, 70 will be crossed off twice, once by the 5 iterator  
and once by the iterator for 7.  And we won't think about 11, 13,  
etc. until they're needed.


as it happens, even if we run folklore3 over [2..], 70 will be crossed off
exactly once, by the sieve for 2. the sieves for 5 and 7 run over the gap
left behind, as they do in folklore and folklore2 (one could move that
gap jumping from sieve3 to insert, though..). the sieves for 5 and 7 
"know" about 70 the same way that (`mod`5) and (`mod`7) know 
about 70, but that knowledge isn't called upon for sieving, only to find 
greater numbers with modulus 0.


in contrast, sieves in genuine will replace non-primes by zero, so later 
sieves can still run into the same position. whether we count the zero in
that position as a gap, to be left intact, or as a ghost of the non-prime, to 
be zeroed again, would seem to be a matter of viewpoint?


for me, the question is more on

Re: [Haskell-cafe] Re: Code and Perf. Data for Prime Finders (was:Genuine Eratosthenes sieve)

2007-02-25 Thread Melissa O'Neill

Claus Reinke wrote:
   folklore3: merging the sieves, instead of stacking them as  
implicit thunks


Here's Claus's code:
primes3 = 2 : folklore3 [] [3,5..]


folklore3 pns xs = x : folklore3 (insert (x,x*x) pns') xs'
  where (x,pns',xs') = sieve3 pns xs

sieve3 ((p,n):pns) (x:xs) | x<n       = (x,(p,n):pns,xs)
                          | otherwise = sieve3 (insert (p,(n+p)) pns) ([x|x>n]++xs)

sieve3 []  (x:xs) = (x,[],xs)

insert (p,n) []                        = [(p,n)]
insert (p,n) ((p',n'):pns) | n<=n'     = (p,n):(p',n'):pns
                           | otherwise = (p',n'):insert (p,n) pns


This isn't a variant of the "folklore sieve", it's actually the real  
one!  Let's take a look at ``pns'' at the 20th prime, having just  
found that 67 is prime (we're about to discover that 71 is prime):


[(3,69),(5,70),(7,70),(11,121),(13,169),(17,289),(19,361),(23,529), 
(29,841),(31,961),(37,1369),(41,1681),(43,1849),(47,2209),(53,2809), 
(59,3481),(61,3721),(67,4489)]


As you can see, 70 will be crossed off twice, once by the 5 iterator  
and once by the iterator for 7.  And we won't think about 11, 13,  
etc. until they're needed.


This algorithm is doing pretty much exactly what mine does, except  
that in mine I use a priority queue to represent this information.   
In fact, with my most recent code, I'd only store (3,69), (5,70),  
(7,70) in the priority queue and keep (11,121), (13,169), ...,  
(67,4489) in a feeder queue so that I only use the power of the  
priority queue when I need it.
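(For readers without the paper to hand, the priority-queue idea can be
sketched with Data.Map standing in for a genuine priority queue -- an
illustration of the technique, not my actual code:)

```haskell
import qualified Data.Map as Map

-- keys are "next composite to cross off"; values are the step sizes of
-- the iterators waiting at that key (Data.Map as a stand-in priority queue)
primesPQ :: [Integer]
primesPQ = 2 : sieve [3,5..] Map.empty
  where
    sieve (x:xs) m = case Map.lookup x m of
      Nothing    -> x : sieve xs (Map.insert (x*x) [2*x] m)  -- x is prime
      Just steps -> sieve xs (foldl bump (Map.delete x m) steps)
        where bump m' s = Map.insertWith (++) (x + s) [s] m'
```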


Melissa.

P.S. Mine actually wouldn't have quite the same numbers, because I go  
to a bit of trouble to skip numbers like 70 which will never actually  
show up in the x:xs list.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Code and Perf. Data for Prime Finders (was:Genuine Eratosthenes sieve)

2007-02-25 Thread Claus Reinke

Algorithm   10^3   10^4    10^5   10^6   10^7   10^8
Reinke      0.04   1.21   41.00      -      -      -

- Reinke is the ``applyAt'' algorithm Claus Reinke posted here


while I'm not unhappy that this code is performing reasonably well, it was 
mostly an attempt to get closer to what some perceived as the original sieve 
specification, with some minor modifications. for performance comparisons, 
that has the huge disadvantage of having to lug around an increasing ballast
of crossed-out numbers. with a straightforward run-length encoding of those 
zeroes, the code is likely to scale rather better (one dash less, and closer to
Naive;-).

there are also some obvious improvements to the folklore sieve that bring it
closer (in spirit, and partially in performance) to the faster versions. 


attached is the code for my own experiments, for your entertainment, where

   folklore: the folklore "sieve"
   folklore2: without mod, starting (sieve p) from p*p
   folklore3: merging the sieves, instead of stacking them as implicit thunks

   genuine: the applyAt version, dubbed "Reinke"
   genuineRLC: with run-length coding for all those zeroed-out positions

   dual: the not so "naive", fairly fast one; for reference (speed limit for above)
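(the dual, for reference, is short enough to quote in spirit -- this is
the usual formulation from memory, not the attached Sieve.hs:)

```haskell
-- trial division by primes up to sqrt x only; the "speed limit"
-- reference version for the sieves above
dual :: [Integer]
dual = 2 : [x | x <- [3,5..], all (\p -> x `mod` p /= 0) (takeWhile (\p -> p*p <= x) dual)]
```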


btw, given that the "naive" algorithm performs so well, perhaps it isn't all that naive 
after all? it does seem to encode the observed effects of nested sieves, without 
encoding the sieves themselves and their ballast. perhaps there is a corresponding
"dual" version of the wheels as well, eliminating their currently major disadvantage 
of managing the ever growing wheels in explicit form? for instance, the current
naive/dual algorithm computes the (x `mod` p) from scratch for each x, instead
of incrementally from ((x-1) `mod` p).

claus


Sieve.hs
Description: Binary data