[Haskell-cafe] Re: Lock-Free Data Structures using STMs in Haskell
Bryan O'Sullivan <[EMAIL PROTECTED]> writes: > Pete Kazmier wrote: > >> data ArrayBlockingQueueSTM e = ArrayBlockingQueueSTM { >> [...] >> sa :: Array Int (TVar e) >> } >> >> It's unclear to me why the Array's elements must be wrapped in TVars. > > To allow them to be modified. You can't otherwise modify the elements > of an array without going into the ST monad. Thanks! I forgot about the whole immutable thing :-) Haven't used arrays yet while learning Haskell! ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Lock-Free Data Structures using STMs in Haskell
I recently read the STM paper on lock-free data structures [1] which I found very informative in my quest to learn how to use STM. However, there are a few things I do not fully understand and was hoping someone might be able to explain further. In the STM version of the ArrayBlockingQueue, the following type is defined: data ArrayBlockingQueueSTM e = ArrayBlockingQueueSTM { shead :: TVar Int, stail :: TVar Int, sused :: TVar Int, slen :: Int, sa :: Array Int (TVar e) } It's unclear to me why the Array's elements must be wrapped in TVars. Why aren't the TVars on shead, stail, and sused sufficient? Here is the only function that reads from the queue: readHeadElementSTM :: ArrayBlockingQueueSTM e -> Bool -> Bool -> STM (Maybe e) readHeadElementSTM abq remove block = do u <- readTVar (sused abq) if u == 0 then if block then retry else return Nothing else do h <- readTVar (ihead abq) let tv = sa abq ! h -- Why are the array elements wrapped in TVars? e <- readTVar tv if remove then do let len = slen abq let newh = h `mod` len writeTVar (shead abq) $! newh writeTVar (sused abq) $! (u-1) else return () return (Just e) It is not immediately obvious to me why the elements need to be wrapped in TVars. Could someone help elaborate? The other question is in regards to section 2 where STM is introduced. The authors define the following: decT :: TVar Int -> IO () decT v = atomically (do x <- readTVar v if x == 0 then retry else return () writeTVar v (x-1)) And then go on to show how easy it is to compose STM types with this function: decPair v1 v1 :: TVar Int -> TVar Int -> IO () decPair v1 v2 = atomically (decT v1 `orElse` decT v2) Will this actually compile? I was under the impression that 'orElse' could only combine STM types, not IO () types. Thank you, Pete [1] Anthony Discolo, Tim Harris, Simon Marlow, Simon Peyton Jones, and Satnam Singh. Lock-free data structures using STMs in Haskell. In Eighth International Symposium on Functional and Logic Programming (FLOPS.06), April 2006. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Sneaking haskell in the workplace -- cleaning csv files
"Brandon S. Allbery KF8NH" <[EMAIL PROTECTED]> writes: > On Jun 15, 2007, at 18:37 , Jason Dagit wrote: > >> I love to see people using Haskell, especially professionally, but I >> have to wonder if the real tool for this job is sed? :-) > > Actually, while sed could do that, it'd be a nightmare. You really > want a parser to deal with general CSV like this, and while you can > write parsers in sed, you *really* don't want to. :) sed ':a /,$/!{N;s/\n//;ba}' somefile.csv ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Yaham - Yet Another HAskell Mode for GNU Emacs
Ian Zimmerman <[EMAIL PROTECTED]> writes: > It has a different focus than the Moss haskell-mode that's normally > used nowadays. Yaham strives, first and foremost, to integrate well > with the rest of Emacs and respect the Emacs ecosystem. Could you elaborate on this? Perhaps an itemized list of things that are different? Thanks, Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Network.HTTP+ByteStrings Interface--Or: How to shepherd handles and go with the flow at the same time?
As a newbie to Haskell, I found your thorough analysis very interesting. Thanks for the great read! I have a few questions regarding some of your comments, see below: Jules Bean <[EMAIL PROTECTED]> writes: > E,F. Progressive GET > pSynGET :: URL -> ((Bool,ByteString) -> IO ()) -> IO () > pAsynGET :: URL -> ((Bool,ByteString) -> IO ()) -> IO (MVar ()) > > (This is a particular simple case of Oleg's iteratees, I > think) Download the data at whatever speed is convenient. As data > arrives, feed it to the 'callback' provided. The ByteString is the > new chunk of data, the 'Bool' is just supposed to indicate whether > or not this is the final chunk. > Incidentally there are more complex options than (Bool,Bytestring) > -> IO (). A simple and obvious change is to add a return > value. Another is a 'state monad by hand', as in (Bool,Bytestring) > -> s -> s, and change the final return value of the type to IO s, > which allows the callback to accumulate summary information and > still be written as pure code. I want to be sure that I understand the implications of the callback function returning an IO action as originally proposed versus it being a pure function. It would seem to me that if it were a pure callback the usefulness would be limited as I would not be able to take the data read from the network and immediately write it out to a file. Is this correct? And if the above is correct, is there a way to define the callback such that one does not have to hardcode the IO monad in the return type so you can have the best of both worlds? > Other options allow the 'callback' to request early termination, > by layering in an 'Either' type in there. I believe the ability to request early termination is important, and was one of the nice features of Oleg's left-fold enumerators. It would be a shame if the API did not offer this capability. > Another more sophisticated option, I think, is the higher rank > > MonadTrans t => URL -> > ((forall m. Monad m) => (Bool,ByteString) -> t m) > -> t IO () > > ...which, unless I've made a mistake, allows you to write in 'any > monad which can be expressed as a transformer', by transforming it > over IO, but still contains the implicit promise that the > 'callback' does no IO. For example t = StateT reduces to the > earlier s -> s example, in effect, with a slightly different data > layout. I don't fully understand this, but would this prevent one from calling IO actions as it was receiving the chunks in the callback (such as writing it to a file immediately)? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Stack overflow with my Trie implementation
I've modified my Norvig spelling corrector to use a trie instead of Data.Map in the hopes of improving performance. Plus, this is fun and a great learning exercise for me. Unfortunately, when I load my trie with a large amount of data, I get a stack overflow. It's unclear to me why this is happening. I specifically use foldl' to avoid this situation when building my trie. Could someone shed some light on the situation for me? Here is the code: > module Main where > > import Data.List (foldl') > import Data.Maybe (maybe, fromMaybe) > import Prelude hiding (lookup) > import qualified Data.Map as M > > data Trie a = T (Maybe a) (M.Map Char (Trie a)) deriving (Show) > > main = do > -- big.txt is a large file of words: http://www.norvig.com/big.txt > c <- readFile "big.txt" > let freqTrie = foldl' incWordCount empty (words c) > print $ lookup "evening" freqTrie > where > incWordCount m w = insertWith (+) w 1 m > > empty :: Trie a > empty = T Nothing M.empty > > lookup :: String -> Trie a -> Maybe a > lookup ([]) (T Nothing m) = Nothing > lookup ([]) (T (Just v) m) = return v > lookup (k:ks) (T _m) = case M.lookup k m of > Nothing -> Nothing > Just trie -> lookup ks trie > > findWithDefault :: a -> String -> Trie a -> a > findWithDefault v k t = fromMaybe v (lookup k t) > > member :: String -> Trie a -> Bool > member k t = maybe False (const True) (lookup k t) > > insertWith :: (a -> a -> a) -> String -> a -> Trie a -> Trie a > insertWith fn ([]) v (T Nothing m) = T (Just v) m > insertWith fn ([]) v (T (Just v') m) = T (Just $ fn v v') m > insertWith fn (k:ks) v (T mvm) = T mv (M.insertWith const k newtrie m) > where > oldtrie = M.findWithDefault empty k m > newtrie = insertWith fn ks v oldtrie > > {-- I also tried to use this line instead of the one above to see > if this had any impact. Unfortunately, I obtained the same > results. > > insertWith fn ([]) v (T (Just v') m) = let x = fn v v' in seq x T (Just x) m > --} ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector
Derek Elkins <[EMAIL PROTECTED]> writes: >> After reading Bryan's post, I switched my right fold to a strict left >> fold and almost tripled my original speed. Could someone provide some >> guidelines on when to use each? I thought foldr should be used when >> building some large data structure such as the map I build. >> Bryan, out of curiosity, is a non bytestring version of your code >> faster? > > http://www.haskell.org/hawiki/StackOverflow I read the article and understand it, but I am having a hard time applying that specifically to my use of foldr. Here is how I was using foldr: type WordFreq = M.Map B.ByteString Int train:: [B.ByteString] -> WordFreq train words = frequencyMap where frequencyMap = foldr incWordCount M.empty words incWordCount w m = M.insertWith (+) w 1 m So is 'incWordCount' strict in its second argument? I'm still not sure exactly what that means. According to the wiki page, if it is strict in the second argument, I should have used foldl' instead of foldr. Thanks, Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector
Bryan O'Sullivan <[EMAIL PROTECTED]> writes: > After this switch, I found that spellchecking one word still took 4x > as long in Haskell as Norvig's Python program. Since I was checking > only one word in each case, essentially all of the runtime was taken > up by building the word frequency map. > > train = foldl' updateMap M.empty . map lower . mkWords > where updateMap model word = M.insertWith' (+) word 1 model > mkWords = filter (not . B.null) . X.splitWith isNotAlpha > lower !s = X.map toLower s > isNotAlpha !c = c < 0x41 || (c > 0x5a && c < 0x61) || c > 0x7a > toLower !c | c >= 0x41 && c <= 0x5a = c + 0x20 > | otherwise = c After reading Bryan's post, I switched my right fold to a strict left fold and almost tripled my original speed. Could someone provide some guidelines on when to use each? I thought foldr should be used when building some large data structure such as the map I build. Bryan, out of curiosity, is a non bytestring version of your code faster? Thanks, Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector
Ketil Malde <[EMAIL PROTECTED]> writes: > On Sun, 2007-04-22 at 00:25 -0400, Pete Kazmier wrote: >> Pete Kazmier <[EMAIL PROTECTED]> writes: >> >> > I'd love to see other Haskell implementations as well if anyone has a >> > few moments to spare. Admittedly, it took me several hours to get my >> > version working, but I'm a Haskell newbie. Unfortunately, I think it >> > runs as slow as it took me to write it! > > Hm - nobody suggested using ByteStrings yet? String is notoriously > wasteful, and replacing it with ByteString normally gives a quite > significant speedup. I actually have a ByteString version but it runs much slower. This part of the code is where all of the time is spent in the ByteString version: type WordFreq = M.Map B.ByteString Int train:: [B.ByteString] -> WordFreq train words = frequencyMap where frequencyMap = foldr incWordCount M.empty words incWordCount w m = M.insertWith (+) w 1 m > Worse - and this is true for ByteStrings, too - String comparisons are > O(n), which means lookups in Sets and Maps are expensive. A trie (i.e, > a search tree where each internal node corresponds to a word prefix, and > has one branch per letter in the alphabet) will give you lookup that > scales with word size (and possibly alphabet size). Right. My first version was just a direct translation of Norvig's code with an emphasis on trying to keep the complexity and size of code to a minimum. > Instead of generating the (huge) list of misspelled words, you could > calculate edit distance to each correctly spelled word? With a bound on > allowable mistakes, this is a linear time operation using the standard > dynamic programming algorithm. Could you provide additional information on this "standard dynamic programming algorithm?" I'm not familiar with dynamic programming. Thanks! Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector
Pete Kazmier <[EMAIL PROTECTED]> writes: > I'd love to see other Haskell implementations as well if anyone has a > few moments to spare. Admittedly, it took me several hours to get my > version working, but I'm a Haskell newbie. Unfortunately, I think it > runs as slow as it took me to write it! There is definitely something > wrong with it, a memory leak, because I can't correct more than a few > words without a great deal of memory being consumed. As monochrom pointed out on #haskell, I am using 'interact' incorrectly. For some reason I thought 'interact' applied its argument to each line of the input. I've replaced it as follows: interact $ unlines . map (show . (id &&& correct)) . lines The program is still terribly slow due to my use of lists. Is there a better way to write 'edits1'? Thanks, Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Haskell version of Norvig's Python Spelling Corrector
Recently I read an interesting article by Peter Norvig[1] on how to write a spelling corrector in 21-lines of Python. I wanted to try and implement it in Haskell. My implementation is terribly slow and I was hoping for some tips on how to improve it and make it idiomatic. I'd love to see other Haskell implementations as well if anyone has a few moments to spare. Admittedly, it took me several hours to get my version working, but I'm a Haskell newbie. Unfortunately, I think it runs as slow as it took me to write it! There is definitely something wrong with it, a memory leak, because I can't correct more than a few words without a great deal of memory being consumed. Thanks, Pete [1] http://norvig.com/spell-correct.html module Main where import Control.Arrow import Data.Char (toLower, isPunctuation) import Data.List (maximumBy) import qualified Data.Set as S import qualified Data.Map as M type WordSet= S.Set String type WordFreq = M.Map String Int main :: IO () main = do -- 'holmes.txt' can be found here: -- http://norvig.com/holmes.txt -- We should train it with a larger corpus, but my program is -- is already slow enough. c <- readFile "holmes.txt" let correct = train . tokens $ c interact $ (++"\n") . show . (id &&& correct) . init -- Returns a list of words lowercased and stripped of punctuation -- at the end of the word. tokens :: String -> [String] tokens = map (lower . nopunc) . words where lower = map toLower nopunc = strip isPunctuation strip p = reverse . dropWhile p . reverse -- Returns a closure (equivalent to the 'correct' function in the -- Python implementation). train:: [String] -> (String -> String) train words word = maximumBy freq . S.toList . head $ filter (not . S.null) [ known (S.singleton word), known (edits1 word), known (edits2 word), S.singleton word ] where freq c c'= compare (findfreq c) (findfreq c') findfreq c = M.findWithDefault 1 c frequencyMap known= S.filter (`M.member` frequencyMap) frequencyMap = foldr incWordCount M.empty words incWordCount w m = M.insertWith (+) w 1 m edits1 :: String -> WordSet edits1 word = S.fromList $ concat [[ t i ++ d (i+1)| i <- range n], -- deletion [ t i ++ [word!!(i+1)] ++ [word!!i] ++ d (i+2) | i <- range (n-1) ], -- transposition [ t i ++ [c] ++ d (i+1) | i <- range n, c <- alphabet ], -- alteration [ t i ++ [c] ++ d i | i <- range (n+1), c <- alphabet ]] -- insertion where n= length word t i = take i word d i = drop i word range x = [ 0..(x-1) ] alphabet = ['a'..'z'] edits2 :: String -> WordSet edits2 = S.unions . S.toList . S.map edits1 . edits1 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Tutorial on Haskell
"Evan Laforge" <[EMAIL PROTECTED]> writes: > It illustrates a few nice things about haskell: laziness for the > recursive defs and easy backtracking, low syntax overhead and custom > operators for DSLs, composability, etc. Although that is true, I somehow feel that showing a perl, ruby, or python programmer an alternate approach to regexps, a technique firmly ingrained into the roots of these languages, will not garner much interest in Haskell. I know this is the case for me. In fact, it was always a large negative for me that Haskell/GHC never had decent builtin support for regexps until recently (6.6). From a practical point of view, the tasks that I do frequently involve the use of regexps (for better or worse). Again, I'm not an academic, just an everyday python programmer trying to assist me in my day job. Upon thinking about this subject further, I think it would be very important that Simon somehow incorporates at least one use of the new regexp library. The target audience would by more likely to consider Haskell if it contains they're beloved tool of choice. Later, they can discover the elegance of parsec if needed. -Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Tutorial on Haskell
Simon Peyton-Jones <[EMAIL PROTECTED]> writes: > My guess is that they'll be Linux/Perl/Ruby types, and they'll be > practitioners rather than pointy-headed academics. > Suggest concrete examples of programs that are > * small > * useful > * demonstrate Haskell's power > * preferably something that might be a bit > tricky in another language > But there must be lots of others. As one of those python/ruby types trying to learn Haskell for the past year, here are my suggestions for small examples: - Tom Moertel's Haskell Port Scanner Why? Demonstrates concurrent haskell in a small amount of lines http://blog.moertel.com/articles/2004/03/13/concurrent-port-scanner-in-haskell - A web-server example of some sort that interfaces with a database and uses some interesting HTML combinator library. - Building a simple unit testing framework is always a good example (even though they already exist), and then introducing quickcheck perhaps. - A program to concurrently verify the links on an HTML page recursively. I'm sure there are lots of interesting idioms and techniques that could be used while keeping the code small and elegant. - Perhaps a Haskell version of Norvig's 20-line Python Spell Checker: Why? Maybe a Haskell version could be shorter and more elegant? http://norvig.com/spell-correct.html - Tom Moertel's Directory Tree Printing in Haskell: Why? Demonstrates all sorts of introductory techniques http://blog.moertel.com/articles/2007/03/28/directory-tree-printing-in-haskell-part-three-lazy-i-o Some thoughts on other topics suggested by others: - Parsec is not that interesting for those coming from perl, ruby, or python as they rely on regular expressions for everything and just expect that they are part of the language. The thought of writing one's own parser is not as "cool" as most Haskellers believe it is, regardless of how interesting the parsec library is. - STM may be too complex of a subject for an intro to Haskell tutorial. There are just too many concepts in there that may overwhelm some beginners. - Don's post on shell scripting was very interesting, but I'm still having a hard time understanding some parts of it, the error handling part, and I've been playing with Haskell on and off for the past year (I'm also a slow learner and not an academic). - Don's post on simple UNIX utilities was also quite nice. I believe he also wrote a simple IRC bot example that would prove interesting, can't seem to find the link at the moment though. Just my thoughts as a newbie desiring a book on how to use Haskell in a practical manner (such as Practical Common Lisp book). -Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: flip fix and iterate
"Claus Reinke" <[EMAIL PROTECTED]> writes: >> I won't try to understand fix just yet, but I'm still confused by >> the type of fix: >> fix :: (a -> a) -> a >> It appears to me that it takes a function as an argument, and that >> function takes a single argument. So how are you passing fix an >> anonymous function taking 2 arguments? Sorry if I have beaten this >> horse to death, but my pea-sized brain is working overtime here. > > fix takes a function as an argument, and that function takes a > single argument. that function also returns something of the same > type as its single argument. [snip] > and suddenly, fix does have two parameters, which flip can flip!-) > > no magic, just technology sufficiently advanced to be > indistinguishable from it: a function of one parameter, which > returns a function of one parameter, is a function of more than one > parameter. > > at which point this particular fixed-point combinator puts its > recursive unfoldings to rest for tonight. Claus, Thank you for the detailed explanation. I think I understand now! To be sure, I'll reread your post several times over the next few days. Thanks again, this was very helpful. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: fix
"Matthew Brecknell" <[EMAIL PROTECTED]> writes: > As others have pointed out, fix is polymorphic, so "a" can stand for any > type, including "(b -> c)". Removing redundant parentheses, this means > fix can directly specialise to: > >> fix :: ((b -> c) -> b -> c) -> b -> c I understand now. I think part of my problem was that I was trying to grasp one too many new things all at once. This makes perfect sense. Thanks, Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: flip fix and iterate
"Matthew Brecknell" <[EMAIL PROTECTED]> writes: > Pete Kazmier: >> I understand the intent of this code, but I am having a hard time >> understanding the implementation, specifically the combination of >> 'fix', 'flip', and 'interate'. I looked up 'fix' and I'm unsure how >> one can call 'flip' on a function that takes one argument. > > I threw that in there because I figured you were up for another > challenge. :-) Haskell has a way of making one feel dumb. This is by far the most challenging programming language I've ever used. > It took me ages to get some clue about how to use fix, quite apart from > combining it with flip. The concept of passing the output of a function > as one of its parameters ("tying the knot") can be difficult to accept, > particularly if you haven't studied lambda calculus. This is a bit mind boggling. Do you have any trivial examples that may help my understanding? > Note that I could have just written this: > >> let iterate a = do >> ... iterate a' ... >> iterate accum In the meantime, I'm more than happy to claim ignorance and stick with the above version which is very accessible to us mere mortals. > So with my use of "flip fix", I'm really just calling fix on the > anonymous function (\iterate accum -> ...), and then the parameter > ("accum") is passed to the function returned by fix. So now you just > need a couple of weeks (or months if you're as slow as me) to > understand what fix is all about... :-) I won't try to understand fix just yet, but I'm still confused by the type of fix: fix :: (a -> a) -> a It appears to me that it takes a function as an argument, and that function takes a single argument. So how are you passing fix an anonymous function taking 2 arguments? Sorry if I have beaten this horse to death, but my pea-sized brain is working overtime here. Thanks for all of the help. Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Lazy IO and closing of file handles
Bryan O'Sullivan <[EMAIL PROTECTED]> writes: > Pete Kazmier wrote: > >> I understand the intent of this code, but I am having a hard time >> understanding the implementation, specifically the combination of >> 'fix', 'flip', and 'interate'. I looked up 'fix' and I'm unsure how >> one can call 'flip' on a function that takes one argument. > As to why it's okay to call "flip" on "fix" at all, look at the types > involved. > > fix :: (a -> a) -> a > flip :: (a -> b -> c) -> b -> a -> c > > By substitution: > > flip fix :: a -> ((a -> b) -> a -> b) -> b Sadly, I'm still confused. I understand how 'flip' works in the case where its argument is a function that takes two arguments. I've started to use this in my own code lately. But my brain refuses to understand how 'flip' is applied to 'fix', a function that takes one argument only, which happens to be a function itself. What is 'flip' flipping when the function passed to it only takes one argument? Thanks, Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Lazy IO and closing of file handles
Pete Kazmier <[EMAIL PROTECTED]> writes: > I attempted to read Oleg's fold-stream implementation [1] as this > sounds quite appealing to me, but I was completely overwhelmed, > especially with all of the various type signatures used. It would be > great if one of the regular Haskell bloggers (Tom Moertel are you > reading this?) might write a blog entry or two interpreting his > implementation for those of us starting out in Haskell perhaps by > starting out with a non-polymorphic version so as to emphasize the > approach. > > [1] http://okmij.org/ftp/Haskell/fold-stream.lhs In the event any other Haskell newbie comes along someday and is just as overwhelmed as I was, I've found this post by Oleg to be a much easier to understand than the above paper because it is not as generic and thus the type signatures are a bit easier on the eyes: http://www.haskell.org/pipermail/haskell/2003-September/012741.html With that said, I have a question regarding Hal's response to the above email in which he states: > Just thought I'd mention that this is, in fact, my preferred method of > iterating over a file. It alleviates the pain associated with lazy file > IO, and simultaneously provides a useful abstraction. I actually have > 3*2 functions that I use which look like: > > > type Iteratee iter seed = seed -> iter -> Either seed seed > > hFoldChars :: FilePath -> Iteratee Char seed -> seed -> IO seed > > hFoldLines :: FilePath -> Iteratee String seed -> seed -> IO seed > > hFoldWords :: FilePath -> Iteratee [String] seed -> seed -> IO seed > > > type IterateeM iter seed = seed -> iter -> IO (Either seed seed) > > hFoldCharsM :: FilePath -> IterateeM Char seed -> seed -> IO seed > > hFoldLinesM :: FilePath -> IterateeM String seed -> seed -> IO seed > > hFoldWordsM :: FilePath -> IterateeM [String] seed -> seed -> IO seed > > Which perform as expected (hFoldWords(M) can be written in terms of > hFoldLinesM, but I find I use it sufficiently frequently to warrent > having it stand out). Also, of course, the only ones actually > implemented are the (M) variants; the non-M variants just throw a return > into the Iteratee. What does he mean by the very last sentence? Oleg's version seems more like the non-M versions. What is his implication? Thanks, Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Lazy IO and closing of file handles
"Matthew Brecknell" <[EMAIL PROTECTED]> writes: > Pete Kazmier: >> I attempted to read Oleg's fold-stream implementation [1] as this >> sounds quite appealing to me, but I was completely overwhelmed, >> especially with all of the various type signatures used. It would be >> great if one of the regular Haskell bloggers (Tom Moertel are you >> reading this?) might write a blog entry or two interpreting his >> implementation for those of us starting out in Haskell perhaps by >> starting out with a non-polymorphic version so as to emphasize the >> approach. >> >> [1] http://okmij.org/ftp/Haskell/fold-stream.lhs > > The basic idea of the paper is the use of a left-fold operator as the > primary interface for enumarating collections. The recursive version > (less general than the non-recursive version) of a left-fold operator > for enumerating the lines of a text file might look something like this: > >> import Control.Monad.Fix >> import Control.Exception >> import Data.List >> import qualified Data.Set as S >> import qualified Data.Map as M >> import System.IO >> >> enumLines :: (a -> String -> Either a a) -> a -> FilePath -> IO a >> enumLines iter accum filename = do >> h <- openFile filename ReadMode >> flip fix accum $ >> \iterate accum -> do >> try_line <- try (hGetLine h) >> case try_line of >> Left e -> hClose h >> return accum >> Right line -> do >> case iter accum line of >> Left accum -> hClose h >> return accum >> Right accum -> iterate accum I understand the intent of this code, but I am having a hard time understanding the implementation, specifically the combination of 'fix', 'flip', and 'interate'. I looked up 'fix' and I'm unsure how one can call 'flip' on a function that takes one argument. > To use this, you provide an "iteratee", a function which takes an > accumulator and a line from the file, and returns a new accumulator > embedded in an Either. Using the Left branch causes immediate > termination of the enumeration. For example, to search for the first > occurrence of each of a set of email headers: > >> getHeaders :: S.Set String -> FilePath -> IO (S.Set String, M.Map String >> String) >> getHeaders hdrs file = enumLines findHdrs (hdrs,M.empty) file where >> findHdrs accum@(wanted,found) line = >> if null line >> then Left accum >> else >> case headerLine line of >> Nothing -> Right accum >> Just hdr -> >> case findDelete hdr wanted of >> Nothing -> Right accum >> Just wanted -> >> let accum = (wanted, M.insert hdr line found) in >> if S.null wanted >> then Left accum >> else Right accum >> >> headerLine :: String -> Maybe String >> headerLine (':':xs) = Just [] >> headerLine (x:xs) = fmap (x:) (headerLine xs) >> headerLine [] = Nothing >> >> findDelete :: Ord a => a -> S.Set a -> Maybe (S.Set a) >> findDelete e s = if S.member e s >> then Just (S.delete e s) >> else Nothing > > It's a bit of a case-analysis nightmare, but when comparing this to > previous approaches, note that file traversal and processing are cleanly > separated, file handle closure is guaranteed to be timely, file > traversal stops as soon as all the required headers have been found, > memory usage is minimised. Very nice. I like the clean separation, but as you say, its one ugly bit of code compared to my original code, although much more elegant no doubt. > I hope that helps. Very much so. Thank you for you help. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Lazy IO and closing of file handles
"Matthew Brecknell" <[EMAIL PROTECTED]> writes: > So here's a test. I don't have any big maildirs handy, so this is based > on the simple exercise of printing the first line of each of a large > number of files. First, the preamble. > >> import Control.Exception (bracket) >> import System.Environment >> import System.IO > >> main = do >> t:n:fs <- getArgs >> ([test0,test1,test2,test3] !! read t) (take (read n) $ cycle fs) > > [snip] Thank you for summarizing the approaches presented by others. As a Haskell newbie, there seems to be quite a few esoteric concepts to conquer. Your concrete examples were helpful in my understanding of the ramifications associated with the various approaches. After reading the various threads you cited, I decided to avoid lazy IO altogether. By using 'readFile' without forcing the strict evaluation of my parser, I inadvertently relinquished control of the resource management--closing of the file handles was left to the GC. And although I could have used 'seq' to address the issue, why bother fixing a problem that could have been avoided altogther by using strict IO. With that said, I added the following function to my program and then replaced the invocation of 'readFile' with it: readEmailHeaders :: FilePath -> IO String readEmailHeaders file = bracket (openFile file ReadMode) (hClose) (headers []) where headers acc h = do line <- hGetLine h case line of -- Stop reading file once we hit the empty separator -- line, no need to read the rest of the file (body). "" -> return . concat . reverse $ acc _ -> headers ("\n":line:acc) h I'm not sure if this is the best implementation, but the speed is comparable to the lazy IO version without the annoying defect of running out of file handles. I also tried an implementation using 'hGetChar' but that was much slower. I attempted to read Oleg's fold-stream implementation [1] as this sounds quite appealing to me, but I was completely overwhelmed, especially with all of the various type signatures used. It would be great if one of the regular Haskell bloggers (Tom Moertel are you reading this?) might write a blog entry or two interpreting his implementation for those of us starting out in Haskell perhaps by starting out with a non-polymorphic version so as to emphasize the approach. Thanks, Pete [1] http://okmij.org/ftp/Haskell/fold-stream.lhs ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Lazy IO and closing of file handles
[EMAIL PROTECTED] (Donald Bruce Stewart) writes: > pete-expires-20070513: >> When using readFile to process a large number of files, I am exceeding >> the resource limits for the maximum number of open file descriptors on >> my system. How can I enhance my program to deal with this situation >> without making significant changes? > > Read in data strictly, and there are two obvious ways to do that: > > -- Via strings: > > readFileStrict f = do > s <- readFile f > length s `seq` return s > > -- Via ByteStrings > readFileStrict = Data.ByteString.readFile > readFileStrictString = liftM Data.ByteString.unpack > Data.ByteString.readFile > > If you're reading more than say, 100k of data, I'd use strict > ByteStrings without hesitation. More than 10M, and I'd use lazy > bytestrings. Correct me if I'm wrong, but isn't this exactly what I wanted to avoid? Reading the entire file into memory? In my previous email, I was trying to state that I wanted to lazily read the file because some of the files are quite large and there is no reason to read beyond the small set of headers. If I read the entire file into memory, this design goal is no longer met. Nevertheless, I was benchmarking with ByteStrings (both lazy and strict), and in both cases, the ByteString versions of readFile yield the same error regarding max open files. Incidentally, the lazy bytestring version of my program was by far the fastest and used the least amount of memory, but it still crapped out regarding max open files. So I'm back to square one. Any other ideas? Thanks, Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Lazy IO and closing of file handles
When using readFile to process a large number of files, I am exceeding the resource limits for the maximum number of open file descriptors on my system. How can I enhance my program to deal with this situation without making significant changes? My program takes one or more directories that contain email messages, stored one per file, and prints a list of all the email threads. Here is a snippet of output: New addition to the Kazmier family Casey Kazmier Memoization in Erlang? Thomas Johnsson Ulf Wiger \(AL/EAB\) As a newcomer to Haskell, I am intrigued by lazy evaluation and how it can influence one's designs. With that said, I wrote the program as a sequence of list manipulations which seemed quite natural to do in Haskell starting with reading the contents of the each file. Here is the algorithm at the high level: 1. Read contents of all files returning a list of Strings 2. Map over the list and parse each String as an Email 3. Sort the list of Emails 4. Group the list of Emails by Subject 5. Map over the grouped list to create a list of Threads 6. Finally, print the list of Threads It is my understanding that, as a result of lazy IO, the entire file does not need to be read into memory because parseEmail only inspects the topmost portion of the email (its headers), which is a key part of my design as some of the files can be quite large. Unfortunately, as soon as I run this program on a directory with more than 1024 files, GHC craps out on me due to resource limits. It seems that the handles opened by readFile remain open. Would this be common across all Haskell implementations? How do I go about fixing this without making a significant number of changes to my program? Did I make a mistake in steps 1 and 2 above? Should I have read and parsed a single file at a time, and then move on to the next? I'd appreciate any other comments on the program as well. I feel this is the best example of Haskell code that I have written. Compared to the first version of this program I wrote a few months ago, this is a hundred times better. Here is the program: > module Main where > > import Control.Monad (filterM, liftM) > import Data.List > import Data.Maybe > import System.Directory > import System.Environment > > type From= String > type Subject = String > data Email = Email {from :: From, subject :: Subject} deriving Show > data Thread = Thread [Email] > > instance Show Thread where > show (Thread emails@(e:es)) = title ++ senders > where > title = newline . bolder . subject $ e > sender = newline . indent . from > senders = concatMap sender emails > newline = (++ "\n") > indent = (" " ++) > bolder = ("\27[0;32;40m" ++) . (++ "\27[0m") > > main = > getArgs >>= > mapM fileContentsOfDirectory >>= > mapM_ print . threadEmails . map parseEmail . concat > > fileContentsOfDirectory :: FilePath -> IO [String] > fileContentsOfDirectory dir = > setCurrentDirectory dir >> > getDirectoryContents dir >>= > filterM doesFileExist>>= -- ignore directories > mapM readFile > > parseEmail :: String -> Email > parseEmail text = > Email (getHeader "From") (getHeader "Subject") > where > getHeader = fromMaybe "N/A" . flip lookup headers > headers = concatMap mkassoc . takeWhile (/="") $ lines text > mkassoc s = case findIndex (==':') s of > Just n -> [(take n s, drop (n+2) s)] > Nothing -> [] > > threadEmails :: [Email] -> [Thread] > threadEmails = > map Thread . groupBy (fuzzy (==)) . sortBy (fuzzy compare) > where > fuzzy fn e e' = stripReFwd (subject e) `fn` stripReFwd (subject e') > stripReFwd= stripSpaces . reverse . stripToColon . reverse > stripSpaces = dropWhile (==' ') > stripToColon = takeWhile (/=':') ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: How would you replace a field in a CSV file?
[EMAIL PROTECTED] writes: > For such a small self-contained task, I don't think Haskell > is any better than Python. I figured as much, but I thought with the new FPS lazy bytestrings it might have a chance in terms of raw speed. On the other side of the coin, in terms of elegance, I thought I'd ask as haskellers always amaze me with their one-liners :-) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] How would you replace a field in a CSV file?
The other day at work an opportunity arose where I was hoping to sneak some Haskell into the pipeline of tools used to process call detail records (CDRs). In the telecommunications industry, CDRs are used for billing. Each CDR is a single line record of 30 comma-separated values. Each line is approximately 240 characters in length. The task at hand is to replace field number 10 if a new value can be found in a hashmap which is keyed using the contents of the field. My colleague was going to write a C program (that's all he knows), but I whipped up a trivial python program instead. I was curious if a haskell version could be faster and more elegant , but I have not been able to beat my python version in either case. So, I'm curious as to how you would go about this task in Haskell. The input files are generally 300-400MB, and the hashmap will contain perhaps 20-30 items. For those that know python, here is a very simple implementation that happens to be very fast compared to my Haskell version and very short: for line in sys.stdin: fields = line.split(',') fields[9] = tgmap.get(fields[9], fields[9]) print ",".join(fields), For each line in standard input: - Splits the string on the comma: "field0,field1,...,field29" => ["field0", "field1", ..., "field29"] to obtain a list of strings. - Gets the value associated with the key of field9 from tgmap, if it does not exist, it returns a default value which is the original value. I.e., if it's not in the map, then don't replace the field. - Joins the list of fields with a comma to yield a string again which is printed out to standard output. The join method on the string is a bit odd: ",".join([1,2,3]) => "1,2,3" Here is my first Haskell attempt: import Data.ByteString.Lazy.Char8 as B hiding (map,foldr) import Data.List (map) import Data.Map as M hiding (map) -- This is just a placeholder until I actually populate the map tgmap = M.singleton (B.pack "Pete") (B.pack "Kazmier") main = B.interact $ B.unlines . map doline . B.lines where doline= B.join comma . mapIndex fixup . B.split ',' fixup i s = if i==9 then M.findWithDefault s s tgmap else s comma = B.pack "," -- f is supplied the index of the current element being processed mapIndex f xs = m f 0 xs where m f i [] = [] m f i (x:xs') = f i x : m f (i+1) xs' After talking with dons on #haskell, he cleaned my version up and produced this version which gets rid of 'if' statement and makes mapIndex stricter: import Data.ByteString.Lazy.Char8 as B hiding (map,foldr) import Data.List (map) import Data.Map as M hiding (map) -- This will be populated from a file dict = M.singleton (B.pack "Pete") (B.pack "Kazmier") main = B.interact $ B.unlines . map doline . B.lines where doline= B.join comma . mapIndex fixup . B.split ',' comma = B.singleton ',' fixup 3 s = M.findWithDefault s s dict fixup n s = s -- f is supplied the index of the current element being processed mapIndex :: (Int -> ByteString -> ByteString) -> [ByteString] -> [ByteString] mapIndex f xs = m xs 0 where m [] _ = [] m (x:xs') i = f i x : (m xs' $! i+1) That helped things a bit, but I must confess I don't understand how the strictness improved things as I had assumed things were going to be evaluated in a reasonable amount of time due to the printing of output. I thought IO was interlaced with the execution and thus I wasn't going to have to concern myself over laziness. In addition, the function is able to generate new elements of the list on demand so I thought it was a good citizen in the lazy world. Could anyone help explain? And then he came up with another version to avoid the 'unlines', but that did not that really speed things up significantly. So, with all that said, is there a better approach to this problem? Perhaps a more elegant Haskell solution? Thanks, Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Editors for Haskell
Pete Kazmier <[EMAIL PROTECTED]> writes: > As part of my learning experience, I think I want to see if I can > write a haskell pastebin that does proper syntax highlighting. > Someone in #haskell suggested that I use just a lexer because using a > parser is overkill. However, I can't make this assessment until I see > how to use the parser and the information it can supply. Thanks for the responses and pointers to the other projects. I'll investigate those after the day-job (the one that pays the bills). As for using the lexer vs the parser, I was hoping to do things such as folding and/or nifty mouse-overs of logical blocks of code, which is why I was interested in the parser. I'm not sure if I could do the same with only a lexer. I'm basically just looking for something concrete to tinker with as I learn Haskell and it seems that Haskell is missing a snazzy pastebin. Thanks, Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Editors for Haskell
"Simon Peyton-Jones" <[EMAIL PROTECTED]> writes: > You probably know this, but your kind of application is a big reason > that we now make GHC available as a library. (Just say 'import GHC'.) > > You shouldn't need to parse Haskell yourself: just call GHC's parser. > You get back a syntax tree with very precise location information that > can guide your editor (e.g. if you want to select a sub-exprssion). > Similarly, you can call the type checker. Are there any small examples of using GHC's parser? I'm a complete newbie so perhaps I'm not checking all of the relevant locations for docs, but I can't seem to find this parser that is being referred to. I checked out the source tree to GHC as well, but I have no idea where to look in there (not to mention it's a bit intimidating). Pointers would be appreciated! As part of my learning experience, I think I want to see if I can write a haskell pastebin that does proper syntax highlighting. Someone in #haskell suggested that I use just a lexer because using a parser is overkill. However, I can't make this assessment until I see how to use the parser and the information it can supply. Thanks, Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe