Send Beginners mailing list submissions to beginners@haskell.org To subscribe or unsubscribe via the World Wide Web, visit http://www.haskell.org/mailman/listinfo/beginners or, via email, send a message with subject or body 'help' to beginners-requ...@haskell.org
You can reach the person managing the list at beginners-ow...@haskell.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Beginners digest..." Today's Topics: 1. Re: Performance of Prime Generator (Zhi-Qiang Lei) 2. numerical integration over lists (Thomas Engel) 3. Re: Is foldr slow? (Zhi-Qiang Lei) ---------------------------------------------------------------------- Message: 1 Date: Sun, 5 Feb 2012 13:35:52 +0800 From: Zhi-Qiang Lei <zhiqiang....@gmail.com> Subject: Re: [Haskell-beginners] Performance of Prime Generator To: Haskell Beginer <beginners@haskell.org> Message-ID: <cabced26-fb06-400c-b7d2-044256ef9...@gmail.com> Content-Type: text/plain; charset=iso-8859-1 Hi Guys, I just find the STUArray you are using is strange for me. Is there any tutorial for that? Google didn't help. Thanks. On Feb 5, 2012, at 10:30 AM, Daniel Fischer wrote: > Sorry for the late reply, haven't checked my mail for a while. > > On Tuesday 24 January 2012, 13:50:32, Ertugrul S?ylemez wrote: >> Daniel Fischer <daniel.is.fisc...@googlemail.com> wrote: >>>>> Well, thanks, so far I have tried wheel only and Sieve of >>>>> Eratosthenes only approaches. They both failed. When the numbers >>>>> is between 999900000 and 1000000000, they can take more than a >>>>> hour on my laptop. >>> >>> They shouldn't. But you have to use the right data structures. >>> >>> For a prime sieve, you need unboxed mutable arrays if you want it to >>> be fast. You can use STUArrays from Data.Array.ST or mutable unboxed >>> vectors from Data.Vector.Mutable.Unboxed. >> >> That's what I've used. You find the code on hpaste [1]. It's a >> carefully optimized Sieve of Eratosthenes and needs around 20 seconds >> for the primes up to 10^9. See the refinement in the annotation, which >> I've added just now. Before that it took around 35 seconds. >> >> I considered that to be "too slow". > > Right you are. Unless you need the primes to perform some time-consuming > calculations with them after sieving, that is too slow. > > But let us compare it with a similar C implementation. > >> >> [1] http://hpaste.org/56866 > > The interesting parts are: > ======================== > soeST :: forall s. Int -> ST s (STUArray s Int Bool) > soeST n = do > arr <- newArray (0, n) True > mapM_ (\i -> writeArray arr i False) [0, 1] > let n2 = floor (sqrt (fromIntegral n)) > > let loop :: Int -> ST s () > loop i | i > n2 = return () > loop i = do > b <- readArray arr i > > let reset :: Int -> ST s () > reset j | j > n = return () > reset j = writeArray arr j False >> reset (j + i) > > when b (reset (i*i)) > > loop (succ i) > > loop 2 > return arr > > soeCount :: Int -> Int > soeCount = length . filter id . elems . soeA > ======================== > > With ghc-7.2.2 and -O2, that took 16.8 seconds here to count the 50847534 > primes up to 10^9. That's in the same ballpark as your time, maybe a bit > faster, depends on what 'around 20 seconds' means exactly. > > Let's be unfriendly first: > > Arracc.h: > ============ > #include <stdint.h> > > inline int readArray(uint64_t *arr, int n, int i); > inline void writeArray(uint64_t *arr, int n, int i, int b); > ============ > > Arracc.c: > ============ > #include <stdlib.h> > #include "Arracc.h" > > inline int readArray(uint64_t *arr, int n, int i){ > if (i < 0 || i > n){ > perror("Error in array index\n"); > exit(EXIT_FAILURE); > } > return (arr[i >> 6] >> (i & 63)) & 1; > } > > inline void writeArray(uint64_t *arr, int n, int i, int b){ > if (i < 0 || i > n){ > perror("Error in array index\n"); > exit(EXIT_FAILURE); > } > if (b) { > arr[i >> 6] |= (1ull << (i&63)); > } else { > arr[i >> 6] &= ~(1ull << (i&63)); > } > } > ============ > > main.c: > ============ > #include <stdlib.h> > #include <stdio.h> > #include <stdint.h> > #include <math.h> > #include "Arracc.h" > > uint64_t *soeA(int n); > int pCount(uint64_t *arr, int n); > > int main(int argc, char *argv[]){ > int limit = (argc > 1) ? strtoul(argv[1],NULL,0) : 100000000; > uint64_t *sieve = soeA(limit); > printf("%d\n",pCount(sieve,limit)); > free(sieve); > return EXIT_SUCCESS; > } > > uint64_t *soeA(int n){ > int s = (n >> 6)+1; > uint64_t *arr = malloc(s*sizeof *arr); > if (arr == NULL){ > perror("Allocation failed\n"); > exit(EXIT_FAILURE); > } > int i, j, n2 = (int)sqrt(n); > for(i = 0; i < s; ++i){ > arr[i] = -1; > } > writeArray(arr,n,0,0); > writeArray(arr,n,1,0); > for(i = 2; i <= n2; ++i){ > if (readArray(arr,n,i)){ > for(j = i*i; j <= n; j += i){ > writeArray(arr,n,j,0); > } > } > } > return arr; > } > > int pCount(uint64_t *arr, int n){ > int i, count = 0; > for(i = 0; i <= n; ++i){ > if (readArray(arr,n,i)){ > ++count; > } > } > return count; > } > ============ > > > $ gcc -c -O3 Arracc.c > $ gcc -c -O3 main.c > $ gcc -O3 main.o Arracc.o -lm > > Takes 18.3 seconds. Oops, slower than the Haskell programme. > > Okay, ghc does some cross-module inlining and that enables further > optimisations which we denied gcc here, so recompile everything with -flto > additionally. That brings the C version down to 12.8 seconds. > > Much better, but still not overwhelming. gcc can do a little better with > everything in one file, 12.72 seconds. > > We can squeeze out a little more by omitting the bounds checks for array > indexing, 12.63 seconds. > > But if we do the same for our Haskell programme, replace > readArray/writeArray with unsafeRead/unsafeWrite, that becomes faster too, > 14.3 seconds. > > I have to admit that I don't know why the bounds-checking costs very little > in C and a substantial amount in Haskell, but hey, who refuses free > optimisations. > > Lesson 1: Omit pointless array bounds checks. > An array bounds check is pointless if you have just checked the validity of > the index on the line above. > > Where are we? > > We have two not overwhelmingly fast programmes, Haskell (ghc-7.2.2) about > 13% slower than C (gcc-4.5.1). > > Not a bad result for ghc, but the algorithm is less than optimal. > > What's the deal? > > We cross off multiples of primes 2549489372 times, and we need (10^9+1)/8 > bytes of memory for the sieve. > > Both numbers are higher than is good for us. Since the sieve is so large > and we cross off the multiples of each prime sequentially until we reach > the sieve limit, we have lots of cache misses. Bad for performance. > And each crossing-off takes a couple of clock cycles, > > arr[i >> 6] &= ~(1ull << (i&63)); > > shift index, fetch value, (index & 63), shift 1, complement, and with > value, store value. > > I'm no CPU expert, so I don't know how many cycles that takes, but even > though some of the operations can be done in parallel on modern CPUs, it's > still several cycles. > > Lesson 2: Avoid duplicate work and redundant data. > It's easy to separate even and odd numbers, there aren't many even primes, > and it's unnecessary to mark even numbers as multiples of odd primes. > > Separating the crossing-off of even and odd numbers, crossing off only odd > multiples of odd primes, reduces the number of crossings-off by almost 40% > and the running time for the C programme to 8.63 seconds (I haven't done > that for the Haskell programme). > > If we go the small step further and eliminate the even numbers from the > sieve completely, we reduce the required memory by half and the crossings- > off by almost 60% (the fraction of eliminated crossings-off slowly > decreases to 50% for higher limits, but as far as today's RAM takes us, it > remains close to 60%). > > At the cost of very slightly more complicated code, we can thus reduce the > running time to 6.06 seconds (C) resp. 6.49 seconds (Haskell) (about 7% > slower than C, not bad at all). > > That's a pretty good start, for some purposes it's already good enough. > > If we need more, the next step is to eliminate multiples of 3 from the > sieve. That reduces the memory requirements by a further factor of 2/3, and > the number of crossings-off by a further (mumble, didn't do an exact > calculation, educated guess) roughly 40% (that fraction decreases to 1/3 > for higher limits, but again very slowly). > > The code becomes a bit more complicated again - more than in the previous > step, but still fairly straightforward. > > The running time reduces by another factor of 0.65-0.7 or so. We'd be in > the region of 4-5 seconds then. > > One can eliminate the multiples of further small primes, for each prime the > additional code complexity increases and the reduction in work/running time > decreases. At some point, the slowdown from the more complex code > annihilates the gain from the reduced work. I stopped after eliminating > multiples of 5, 7 might be worth it, but I'm not convinced. > > Using a segmented sieve provides better locality at the cost of more > complex code. If the sieve is small enough to fit in the L2 cache and large > enough that some significant work can be done per segment, it's a net gain. > > tl,dr: The naive algorithm we started from is too slow for higher limits, > to get goodish performance, it has to be tweaked heavily. But Haskell plays > in the same league as C (at least the C I've written). > >> >>>> I haven't tried it, but an equivalent C implementation can easily >>>> compute the first 10^9 prime numbers within seconds. >>> >>> You mean the primes up to 10^9 here? >> >> Yes, sorry. And I was referring to the Sieve of Atkin there, but you >> are right. That one is only slightly faster. > > Well, D.J. Bernstein's primegen is much faster. With the sieve size adapted > to my L1 cache, it counts the primes to 10^9 in 0.68 seconds. > But it's only heavily optimised for primes to 2^32. Above that, my sieve > catches up (and overtakes it around 5*10^11, yay). > >> >> >> Greets, >> Ertugrul > > Cheers, > Daniel > > _______________________________________________ > Beginners mailing list > Beginners@haskell.org > http://www.haskell.org/mailman/listinfo/beginners Best regards, Zhi-Qiang Lei zhiqiang....@gmail.com ------------------------------ Message: 2 Date: Sun, 5 Feb 2012 09:19:02 +0100 From: Thomas Engel <thomas.eng...@gmx.net> Subject: [Haskell-beginners] numerical integration over lists To: beginners@haskell.org Message-ID: <20120205081902.GA3461@siduxbox> Content-Type: text/plain; charset=us-ascii Hello, I have two list (one with x values, the other with y values) What I want to do is a numercial integration according the following formula: Result x2 = Result x1 + ((y(x1) + y(x2))/2) * (x2 -x1) and put the result in another list. below my first try: integriereListe::(a)->(a)->(a) integriereListe [][] = [0.0] integriereListe (x:xs) (y:ys) = ((y - y2) /2) * (x2 -x) where x2 = head xs y2 = head ys but I got the failure Couldn't match type `[t0]' with `[[t0]]' In the pattern: y : ys In an equation for `integriereListe': integriereListe (x : xs) (y : ys) = ((y - y2) / 2) * (x2 - x) where x2 = head xs y2 = head ys another problem is how get the result x1 (see above) any hints are welcome. Thomas ------------------------------ Message: 3 Date: Sun, 5 Feb 2012 18:33:35 +0800 From: Zhi-Qiang Lei <zhiqiang....@gmail.com> Subject: Re: [Haskell-beginners] Is foldr slow? To: Haskell Beginer <beginners@haskell.org> Message-ID: <5d37e09c-9d0e-4077-81fa-723a1ef75...@gmail.com> Content-Type: text/plain; charset=iso-8859-1 Thank you very much. You are right. foldl performs much better here. I thought foldr always has a performance advantage due to its tail recursion. On Feb 3, 2012, at 5:36 AM, Chadda? Fouch? wrote: > > let go a b = b `minus` (multiplies a c) > foldr go cs' (p:p':ps) ==> foldr go cs' (p':ps) `minus` multiplies p c > ==> (minus needs his first argument to start removing stuff) (foldr go > cs' ps `minus` multiplies p' c) `minus` multiplies p c > And so on and so forth... > In other words, contrary to your first version, this function must > develop the entire list before making its first suppression... > > Your version rather correspond to a foldl (be careful of strictness, > using foldl it's pretty easy to get big thunks). > > Foldr is very useful for functions that are lazy in their second > argument like (||) , (&&), (:) or others but if the function is > strict in its second argument like yours (strict in b)... > > -- > Jeda? Best regards, Zhi-Qiang Lei zhiqiang....@gmail.com ------------------------------ _______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners End of Beginners Digest, Vol 44, Issue 6 ****************************************