Send Beginners mailing list submissions to beginners@haskell.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners or, via email, send a message with subject or body 'help' to beginners-requ...@haskell.org
You can reach the person managing the list at beginners-ow...@haskell.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Beginners digest..." Today's Topics: 1. program not running lazily (Mason Lai) ---------------------------------------------------------------------- Message: 1 Date: Wed, 20 Jan 2016 16:53:02 -0800 From: Mason Lai <masonm...@gmail.com> To: beginners@haskell.org Subject: [Haskell-beginners] program not running lazily Message-ID: <cah1ivpf1x8qoak1ym+g4f5h3tvy3jkcoaaqesjn3bwy5pvt...@mail.gmail.com> Content-Type: text/plain; charset="utf-8" Hi, I'm teaching myself Haskell and attempting to use it for day-to-day tasks at work. I signed up for this list a few weeks ago, and this is my first post. Here's my problem: I'm given a list of 45 Strings, each nine Chars long. The only characters are 'A', 'C', 'G' and 'T'. (This is for a bioinformatics application.) I'll refer to any nine-length String composed of these Chars as a "9-mer". I need to generate a larger list of 9-mers such that no two 9-mers have a Levenshtein distance of less than 3. (The list I start with satisfies this requirement.) I can generate as many 9-mers as I possibly can, but this process is very slow. It's also not being computed lazily; calling head on the output list forces the entire list of 9-mers to be computed. *I'd like to understand why this list isn't being computed lazily, and how I can change my code to make it so. *My knowledge of monads is pretty poor as well, so a digression or a series of questions about why the line [9] >>= (`replicateM` "ACGT") works would also be helpful, as would general tips about writing clean code. This is an O(n^2) operation, so I'm not expecting it to be slow. However, I figured that I'd just be able to take the first N elements from the output list. Here's what I have: import Control.Monad main :: IO () main = interact processData processData :: String -> String processData = unlines . (`merge` ([9] >>= (`replicateM` "ACGT"))) . lines -- Merges two lists of things into a single list merge :: Eq a => [[a]] -> [[a]] -> [[a]] merge xs [] = xs merge xs ys = merge ((head ys) `addInto` xs) $ tail ys -- Adds a thing into a list if it is "different" enough addInto :: Eq a => [a] -> [[a]] -> [[a]] y `addInto` ys = if ((minimum $ map (dist y) ys) < 3) then ys else y:ys -- Calculate the Levenshtein distance -- Lloyd Allison algorithm. See Haskell wiki -- code omitted dist :: Eq a => [a] -> [a] -> Int My workaround to getting a smaller subset of the whole list is to replace [9] >>= (`replicateM` "ACGT") with take 10000 $ [9] >>= (`replicateM` "ACGT") Thanks! Mason Lai -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.haskell.org/pipermail/beginners/attachments/20160120/d3f38aa0/attachment-0001.html> ------------------------------ Subject: Digest Footer _______________________________________________ Beginners mailing list Beginners@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners ------------------------------ End of Beginners Digest, Vol 91, Issue 25 *****************************************