Christian Höner writes:
> welcome ;-)
Thanks!
> What you still need to do is to make @stats@ and @foldl@ more strict.
> For this you should check out @foldl'@ (note the prime) and bang
> patterns for @stats@. I'm intentionally not giving the solution!
Ah, thanks for the hints - especially the second one; I was suspecting
the first :-)
> Once you have done that the program should work with arbitrarily long
> inputs.
Indeed, this is what I changed it into:
{-# LANGUAGE BangPatterns #-}
import System.Environment
import Data.List
import Bio.Sequence.FastQ
import Bio.Core.Sequence
main = do
[f] <- getArgs
putStrLn . output . average . foldl' stats (0, 0) =<< readIllumina f
where stats (!count, !totalLength) s = (count+1,
totalLength+toInteger(seqlength s))
average (count, totallength) = (count, totallength, t/c)
where t = fromIntegral totallength :: Float
c = fromIntegral count :: Float
output (count, length, average) = "Count " ++ show count ++ "\n" ++ "Total "
++ show count ++ " records " ++ show length ++ " length " ++ show average ++ "
average"
And now it works fine on a fastq-file of 5.1 GB on my desktop with 16GB
RAM.
Let me see if I can find a bigger fastq-file to run it on... Yes, it
works fine on the 33 GB file I just located.
Thanks for the tips!
Do you gradually get an intuitive feeling for when strictness is
"necessary", is it something you'll handle when running into a problem,
or do you do measurements?
Best regards,
Adam
--
"A cat has nine lives, but a bullfrog croaks Adam Sjøgren
every day." [email protected]