Am Montag 04 Januar 2010 18:08:42 schrieb Will Ness: > Daniel Fischer <daniel.is.fischer <at> web.de> writes: > > Am Sonntag 03 Januar 2010 09:54:37 schrieb Will Ness: > > > The quesion of a memory blowup with the treefolding merge still > > > remains. For some reason using its second copy for a feeder doesn't > > > reduce the memory (as reported by standalone compiled program, GHCi > > > reported values are useless) - it causes it to increase twice..... > > > > I have a partial solution. The big problem is that the feeder holds on to > > the beginning of comps while the main runner holds on to a later part. > > Thus the entire segment between these two points must be in memory. > > > > So have two lists of composites (yeah, you know that, but it didn't work > > so far). > > > > But you have to force the compiler not to share them: enter -fno-cse. > > The attached code does that (I've also expanded the wheel), it reduces > > the memory requirements much (a small part is due to the larger wheel, a > > factor of ~5 due to the non-sharing). > > I don't understand. What is there to be shared? Each multiples list is > consumed only at one point; there's nothing to be shared. Do you mean the > compiler still hangs on to them? If so, why??
Okay, let's look at the code. Your Primes.hs says {-# SPECIALIZE primes :: () -> [Int] #-} {-# SPECIALIZE primes :: () -> [Integer] #-} primes :: Integral a => () -> [a] primes () = 2:3:5:7:primes' where primes' = [11,13] ++ drop 2 (rollFrom 11) `minus` comps (comps,_) = tfold mergeSP (pairwise mergeSP mults) mults = map (\p-> fromList $ map (p*) $ rollFrom p) $ primes' fromList (x:xs) = ([x],xs) What's going on evaluating primes'? First, we know primes' is 11 : 13 : more. Then we can start looking at 'more'. For that, we need rollFrom 11 `minus` comps. rollFrom 11 is easy. What about comps? We need the start of primes' for that. Cool, we already know the first two elements, start tfolding, we know comps is 121:143:169:notYetKnown So the rollFrom 11 produces happily, minus doesn't veto anything for a while, then says no to 121, 143, 169. Now rollFrom 11 says "look at 173". Fine, what's next in comps? To find that out, we need the multiples of 17 and 19 (and 23 and 29, but those can still be deferred for a while). That gives ([289,323,361],whatever), then the mergeSP with the merged multiples of 11 and 13 produces (187:209:221:...) for notYetKnown. Everything's going on fine, but the problem is, when primes' produces primes in the region of n, it needs the comps in that region. Those contain the multiples of primes in the region of sqrt n, thus the small primes cannot yet be released. Bottom line, to produce p, all the primes from about sqrt p to p must be in memory. Oops. Enter the feeder. Say primes :: Integral a => () -> [a] primes () = 2:3:5:7:primes' where primes' = (rollFrom 11) `minus` comps primes'' = [11,13] ++ drop 2 (rollFrom 11) `minus` comps (comps,_) = tfold mergeSP (pairwise mergeSP mults) mults = map (\p-> fromList $ map (p*) $ rollFrom p) $ primes'' fromList (x:xs) = ([x],xs) Now primes'', the feeder of composites advances much more slowly and for primes' to produce p, primes'' must produce the next prime after sqrt p, thus holds on to all primes between sqrt (sqrt p) and sqrt p - not much of a problem. primes' doesn't hold on to smaller primes, great. But! primes' needs the part of comps around p. primes'' needs the part of comps around sqrt p. That means we must keep the part of the list of composites from sqrt p to p in memory. Even after removing the multiples of 2, 3, 5 and 7, it doesn't take long until there are far more composites between sqrt p and p than primes. Double oops. So we must make sure that the list of composites that primes' consumes is not the same as that which primes'' consumes. You can try primes :: Integral a => () -> [a] primes () = 2:3:5:7:primes' where primes' = (rollFrom 11) `minus` comps primes'' = [11,13] ++ drop 2 (rollFrom 11) `minus` comps2 (comps,_) = tfold mergeSP (pairwise mergeSP mults) mults = map (\p-> fromList $ map (p*) $ rollFrom p) $ primes'' (comps2,_) = tfold mergeSP (pairwise mergeSP mults2) mults2 = map (\p-> fromList $ map (p*) $ rollFrom p) $ primes'' fromList (x:xs) = ([x],xs) If you compile without optimisations, it works. It just is rather slow. Unfortunately, the optimiser sees that comps2 and comps are the same and thinks, "why should I produce it twice if they're the same?" - whoops, back to square two. Tell the compiler that you do *not* want it shared via -fno-cse and it isn't shared, you really have two distinct lists of composites, happiness ensues - almost. > > I used the switch; it didn't help at all. The only thing I can see is > different is that all my interim data which I named with inner vars you > moved out to the top level as functions. Is that what did the trick? Not really, the above works too, with inner local bindings. If you introduce separate names for the lists of multiples and for the lists of composites, -fno-cse ought to do the trick. Maybe it would even work with one list of multiples. Can I see what you did that -fno-cse didn't manage to separate the lists? > What would be the reason to hang on to the already consumed data that is > inaccessible to any active consumer? No reason. And that's not what happened. We always had two consumers gnawing on the same list, one proceeding fast, the other slow. The part the slow consumer was done with could be dropped (and probably was) - the part between the slow and the fast: Not. > Why not make the forgetful behaviour the norm - > especially where remembering is pointless?? It is. > > > It still uses much more memory than the PQ, and while the PQ's memory > > requirements grow very slowly, the tree-fold merge's still grow rather > > fast (don't go much beyond the 10,000,000th prime), I'm not sure why. > > You did it! It's now 7M for 1,000,000th prime, instead of 52M before. > Making the pattern lazy in mergeSP was probably an important fix too. :) Did I forget to remove them? (checks) Yes. No, I put them in without thinking, then realised "Hey, stupid, where-bindings *are* already lazy!" But somehow I have forgotten to remove them. Since let- and where-bindings are lazy, all those could do is make the compiler say "Sheesh". > > Unfortunately it grows, as you've said - 23MB for 2 mln. :| And I've found out why. Change the definition of tfold to tfold f (a: ~(b: ~(c:xs))) = (a `f` (b `f` c)) `f` tfold f xs and memory stays low (things are going much slower, though). By always doing a (pairwise f xs), you're getting the composites faster, but at the expense of pretty huge thunks (I think, if the merging isn't lazy enough, you get some pretty long lists). You can make a compromise by using the above tfold (which is no longer a tree-fold) and grouping (and merging) the multiples in a slower-growing manner, e.g. compos ps = fst (tfold mergeSP $ nwise 1 mergeSP (pairwise mergeSP (multip ps))) tfold f (a: ~(b: ~(c:xs))) = (a `f` (b `f` c)) `f` tfold f xs nwise k f xs = let (ys,zs) = splitAt k xs in rfold f ys : nwise (k+1) f zs rfold f [x] = x rfold f (x:xs) = x `f` rfold f xs memory still grows, but much slower, in my tests, due to the much smaller GC time, it's a bit faster than the version with the original tfold. > > PQ stays at just 2MB. > No doubling the size of merge trees, no big thunks, just occasionally add a filter to the PQ. Memory requirements are O(pi(sqrt n)) [off the top of my head, may be wrong again],
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe