Thomas Conway wrote: > On 6/22/07, Duncan Coutts <[EMAIL PROTECTED]> wrote: >> You might find that lazy IO is helpful in this case. The primitive that >> implements lazy IO is unsafeInterleaveIO :: IO a -> IO a > > Personally, unsafeInterleaveIO is so horribly evil, that even just > having typed the name, I'll have to put the keyboard through the > dishwasher (see http://www.coudal.com/keywasher.php).
:D :D Finally someone who fully understands the true meaning of the prefix "unsafe" ;) >> Note that using a Map will probably not help since it needs to >> read all the keys to be able to construct it so that'd pull >> in all the data from disk. > > Well, in the case I'm dealing with, the map can contain the current > key from each postings vector, and the closure for reading the > remainder of the vector. E.g. Map Key ([IO (Maybe Key)]). In any case, you have to store as many keys as you have lists to sort, but lazy mergesort will not hold on more than (length xs + 1) keys in memory at a single moment in time and only force one new key per retrieval. No lingering intermediate lists :) In this situation, unsafeInterleaveIO is an easy way to carry this behavior over to the IO-case: type Reader t = IO (Maybe t) type Writer t = t -> IO () readList :: Reader t -> IO [t] readList m = unsafeInterleaveIO $ do mx <- m case mx of Just x -> liftM (x:) $ readList m Nothing -> return [] mergesortIO :: Ord t => [Reader t] -> Writer t -> IO () mergesortIO xs f = do ys <- mapM readList xs mapM_ f $ mergesort ys Here, readList creates only as many list elements as you demand, similarly to getContents. Of course, it has the same problem as getContents, namely that you can accidentally close the file before having read all data. But this is applies to any on-demand approach be it with IO or without. Also, you can make the heap in mergesort explicit and obtain something similar to your current approach with Data.Map. The observation is that while mergesort does create a heap, its shape does not change and is determined solely by (length xs). -- convenient invariant: -- the smaller element comes from the left child data Ord b => Heap m b = Leaf m b | Branch b (Tree a b) (Tree a b) -- smart constructor branch :: Ord b => Tree m b -> Tree m b -> Tree m b branch x y | gx <= gy = Branch gx x y | otherwise = Branch gy y x where (gx,gy) = (getMin x, getMin y) -- fromList is the only way to "insert" elements into a heap fromList :: Ord b => [(m,b)] -> Heap m b fromList = foldtree1 branch . map (uncurry Leaf) getMin :: Heap m b -> b getMin (Leaf _ b) = b getMin (Branch b _ _ ) = b deleteMin :: Heap (Reader b) b -> IO (Maybe (Heap (Reader b) b)) deleteMin (Leaf m _) = m >>= return . fmap (Leaf m) deleteMin (Branch _ x y) = do mx' <- deleteMin x return . Just $ case mx' of Just x' -> branch x' y Nothing -> y mergesortIO :: Ord t => [Reader t] -> Writer t -> IO () mergesortIO xs f = ... > Also, I need to support concurrent querying and updates, > and trying to manage the locking is quite hard enough as it is, > without trying to keep track of which postings vectors have closures > pointing to them! I guess you have considered Software Transactional Memory for atomic operations? http://research.microsoft.com/~simonpj/papers/stm/index.htm Also, write-once-read-many data structures (like lazy evaluation uses them all the time) are probably very easy to get locked correctly. Regards, apfelmus _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe