Re: [Haskell-cafe] Re: FASTER primes

Daniel Fischer Mon, 04 Jan 2010 15:34:51 -0800

Am Montag 04 Januar 2010 19:16:32 schrieb Will Ness:
> Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> > Am Montag 04 Januar 2010 13:25:47 schrieb Will Ness:
> > > Euler's sieve is
> > >
> > >  sieve (p:ps) xs = h ++ sieve ps (t `minus` map (p*) [p,p+2..])
> > >                       where (h,t) = span (< p*p) xs
> >
> > Not quite. That's a variant of the postponed filters, it crosses off e.g.
> > 45 twice, once as 3*15 and once as 5*9 (okay, it has already been removed
> > by the first, so let's say 45 appears in two lists of numbers to be
> > removed if present).
>
> there won't be any such. whatever is removed first, is just skipped second
> (and third, etc). 45 does appear twice on the two multiples ists (3- and
> 5-). But it is "removed" by first, and skipped by second. And there's no
> removal here in the first place. There is no pre-filled storage. All there
> is, is some lists comparison, and lists are just generators (I so keep
> hoping).


((45:(offer 47 when demanded)) `minus` (45:(next will be 51 when demanded))) 
`minus` (45:
(next will be 55 when demanded))

So there are two attempts to tell the generator to not output 45. To the 
second, it 
answers "I already knew that", but the request is made nevertheless.

Lists sometimes are data structures occupying physical storage (cons cell, 
point to value, 
pointer to next...), sometimes generators (when asked for the next value, I'll 
answer 
this). I say a value is removed from a list regardless of whether it takes 
relinking 
pointers in a list of cons cells or it involves telling the generator not to 
give out that 
value. I could also say 'eliminated'. There are two attempts to eliminate 45.

>
> I don't see any problem here. As Melissa (and yourself, I think) have
> shown, double hits on multiples are few and far between.

It's not a problem, it just means it's not Euler's sieve, because that attempts 
to 
eliminate each composite exactly once.

>
> Also, it uses no filters, i.e. no individual number divisibility testing.
> The "filters" refers first of all to testing an individual number to decide
> whether to keep it in or not.

Umm, the postponed filters says keep everything until p^2, then eliminate 
(filter out, 
remove) multiples of p in the remainder, after that, pick next prime.
That's precisely what the above does. It doesn't do the filtering out by 
divisibility 
testing but by minus (hence more efficiently). I would say that's a variant of 
the 
postponed filters.

> Euler's sieve removes multiples in advance,
> so there's no testing and no filtering, only comparison. It follows the
> _Postponed_ Filter's framework in postponing the action until the right
> moment; the action itself is two lists comparison and skipping of the
> equals (i.e. the "minus" action).
>
> > Euler's sieve is never attempting to remove a number more than once,
> > that's
>
> How's that possible?

http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes#Euler.27s_Sieve

"Euler in his Proof of the Euler product formula for the Riemann zeta function 
came up 
with a version of the sieve of Eratosthenes, *better in the sense that each 
number was 
eliminated exactly once*."

C) The number after the previous prime is also a prime. *Multiply each number 
in the list 
starting from this prime by this prime and discard the products*.

> On wikipedia is says, it removes multiples of 3; then
> multiples of 5; etc. So it just looks for each number it is removing, if it
> is there, and if so, removes it. I would argue that looking whether to
> remove or not, is part of attempting to remove. It can't have foresight,
> right?
>

But it has :) By only trying to eliminates products of the prime p currently 
under 
consideration *with numbers (>= p) which have not yet been eliminated from the 
list*, it 
is known in advance that all these products are still in the list.

When p is under consideration, the list contains (apart from the primes < p) 
precisely the 
numbers whose smallest prime factor is >= p.

> > the outstanding feature of it. Unfortunately, that also makes it hard to
> > implement efficiently. The problem is that you can't forget the primes
> > between p and p^2, you must remember them for the removal of multiples of
> > p later on.
>
> you're right, every formulation of these algorithms is always done in the
> imperative, mutable-storage setting. They always speak of "removing"
> numbers etc.

sed s/remove/eliminate/ ?

>
> The more pressing problem with that code is its linear structure of course
> which gets addressed by the tree-folding merge etc.
>

Which unfortunately introduces its own space problem :(

> >  An (inefficient but faithful) implementation would be
> >
> > euler ks@(p:rs) = p : euler (rs `minus` map (*p) ks)
>
> I think it does exactly the same thing, computationally,

No, in map (*p) ks, there are no numbers with a smaller prime factor than p, 
while in 
map (*p) [p, p+2 .. ] there are (for p > 3)

> except it does so,
> again,  _prematurely_  exactly in Turner's sieve's fashion - for each prime
> found, _as_ _soon_ as it is found. If it is faithful, so is my code. :)
>

It is faithful in that it tries to eliminate each composite exactly once.

Try a different minus:

xxs@(x:xs) `minus` yys@(y:ys)
   = case compare x y of
       LT -> x : xs `minus` yys
       EQ -> xs `minus` ys
       GT -> error ("trying to remove " ++ show y ++ " a second time")

Your code is not. It is, however, much faster.

> > Its performance is really horrible though.
>
> makes sense, as for Turner's.

It's far worse than Turner's. The "map (*p) ks" holds on to things really 
really long.
Measure it, just for teh lulz.

> See, the _when_ thing really helps to see
> what's going on here.
>
> > A much better implementation is
> >
> > primes = 2:3:euler hprs [] (scanl (+) 5 (cycle [2,4]))
> >   where
> >     hprs = 5:drop 3 primes
> >     euler (p:ps) acc cs = h ++ euler ps (tail (acc ++ h)) (t `minus`
> > comps) where
> >         (h,t) = span (< p*p) cs
> >         comps = map (*p) (acc ++ cs)
>
> this look like {2,3} wheel reimplemented and inlined.

The {2,3} wheel thing isn't important. It could equally well (well, a bit 
slower) be

primes = euler hprs [] [2 .. ]
   where
      hprs = 2:tail primes

It just occured to me that the accumulation list is just (takeWhile (< head cs) 
(p:ps)), 
so we could improve it as

euler (p:ps) cs = h ++ euler ps (t `minus` comps)
      where
        (h,t) = span (< p*p) cs
        comps = map (*p) (takeWhile (< head cs) (p:ps) ++ cs)

Still Bad.

> No point in improving
> anything until the linear structure isn't turned into a tree.
>
You're a tree hugger? :D
No, the point isn't linear vs. tree, it's that Euler's sieve is a) great for 
theoretical 
reasoning (look at a proof of Euler's product formula for the (Riemann) 
Zeta-function to 
appreciate it) b) reasonably good for sieving with a list of numbers written on 
paper, but 
c) not good to implement on a computer. For the computer, keeping track of 
which numbers 
are left is so much more complicated than just ticking off numbers a couple of 
times that 
it isn't even worth attempting it.

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: FASTER primes

Reply via email to