I have to ask for your help on general approaches to writing Haskell
programs that use space more economically.
I hope to use Haskell for my daily programming next semester
so I'm trying to apply Haskell to some small practical applications.
The first try is a simple compression program. However, I found that
I need 25MB of RAM to compress a 31K file (total alloc = 25MB in
the time profile generated by GHC; I had to give it a 2MB
stack and an 8MB heap to get it to run).
Enabling the heap profiler, I found that the functions consuming the
most space were
main -- Read the file, generate the Huffman tree, and
then encode. In pseudo-code:
getArgs >>= \ [file] ->
readFile file >>= \ fdata ->
let tree = huff_tree fdata
in  output (huff_encode tree fdata)
count_freq -- given the file content, count the number of times each
character appears
encode -- given the file content and a table, generates the encoded
output.
First of all, I noticed that there are two references to fdata.
That may have forced the program to keep the entire file in
memory. So I rewrote 'main' to read the same file twice.
It seemed to work! The total allocated memory dropped by 1 MB.
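A minimal sketch of that two-pass 'main'; huff_tree and huff_encode
here are hypothetical stand-ins (the real Huffman code is assumed, not
shown), and putStr takes the place of 'output':

```haskell
import System.Environment (getArgs)

-- Hypothetical stand-ins, only so the sketch compiles; the real
-- program would plug in its own Huffman functions here.
huff_tree :: String -> [(Char, Int)]
huff_tree s = [ (c, length (filter (== c) s)) | c <- ['a' .. 'z'] ]

huff_encode :: [(Char, Int)] -> String -> String
huff_encode _ fdata = fdata        -- identity placeholder

main :: IO ()
main = do
  [file] <- getArgs
  -- Pass 1: this copy of the file is consumed only by huff_tree.
  tree <- fmap huff_tree (readFile file)
  -- In the real program the tree should be fully forced at this
  -- point, so the first copy of the file can be garbage-collected
  -- before pass 2; otherwise the unevaluated tree still holds on
  -- to the whole string.
  -- Pass 2: a fresh read streams through the encoder lazily.
  fdata <- readFile file
  putStr (huff_encode tree fdata)
```

The two readFile calls mean neither copy of the file needs to be
resident all at once, at the cost of reading the file twice.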
Then I turned to 'encode'. Encode was originally written in
a tail-recursive style, that is
encode [] result = result
encode (fd:fdata) result = encode fdata (do_something_about result)
Previously I thought tail recursion was good because it can be
transformed into a loop. However, in this style the final result
can't be extracted until all the computation has been done (am I
right?), and until then the entire data must be kept in memory.
So I switched to a non-tail-recursive version,
encode [] = ..
encode (fd:fdata) = (something : (encode fdata))
in the hope that this would save some space when the program is
evaluated lazily. It also worked! The total alloc dropped to around
7 MB, and the heap-usage graph of 'encode' went from a high mountain
to several small hills. But I still had to give it megabytes of
stack and heap space.
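The difference between the two styles can be made concrete; here
encodeChar is a hypothetical per-character encoder standing in for
the real Huffman table lookup:

```haskell
-- Accumulator version: nothing can be returned until the whole
-- input has been consumed, so the entire result (and the thunks
-- building it) stays live until the end.
encodeStrict :: (Char -> String) -> String -> String -> String
encodeStrict _          []     result = result
encodeStrict encodeChar (c:cs) result =
  encodeStrict encodeChar cs (result ++ encodeChar c)

-- Lazy producer: the head of the output is available as soon as the
-- first character is encoded, so consumed input and already-emitted
-- output can both be garbage-collected incrementally.
encodeLazy :: (Char -> String) -> String -> String
encodeLazy _          []     = []
encodeLazy encodeChar (c:cs) = encodeChar c ++ encodeLazy encodeChar cs
```

When the consumer (here, the file output) demands the result
incrementally, only the lazy producer lets the runtime discard each
piece of input as soon as its output has been written.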
Finally I came to 'count_freq', but I had no good method
to fix it:
count_freq :: [Char] -> [(Char, Freq)]
count_freq str = [ (chr i, freq_array ! i) | i <- range (0, 255),
                                             freq_array ! i /= 0 ]
    where freq_array = accumArray (+) 0 (0, 255) [ (ord ch, 1) | ch <- str ]
What should I do?
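One likely culprit is that a lazy accumArray can build a chain of
unevaluated (+) thunks in each cell, one per occurrence of the
character, which retains the input. A way to keep each count fully
evaluated as it accumulates is a mutable unboxed array; this is a
sketch that replaces accumArray with an STUArray, assuming Freq is
just Int:

```haskell
import Data.Array.ST (runSTUArray, newArray, readArray, writeArray)
import Data.Array.Unboxed (UArray, assocs)
import Control.Monad (forM_)
import Data.Char (chr, ord)

count_freq :: String -> [(Char, Int)]
count_freq str =
  [ (chr i, n) | (i, n) <- assocs counts, n /= 0 ]
  where
    counts :: UArray Int Int
    counts = runSTUArray $ do
      arr <- newArray (0, 255) 0
      forM_ str $ \ch -> do
        -- read-modify-write keeps each count a plain evaluated Int,
        -- so no thunk chain can build up per cell
        n <- readArray arr (ord ch)
        writeArray arr (ord ch) (n + 1)
      return arr
```

Because the array is unboxed, every cell is a machine integer rather
than a pointer to a possibly-unevaluated thunk, so the input string
can be consumed and collected as the fold proceeds.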
Even if the whole file were loaded into memory, I don't understand
how a 31K file could expand into megabytes. How should I write
Haskell programs so that they are more economical with memory when
processing large files?
--
plato. rene descartes.william somerset maugham. albert einstein.alan turing
pablo picasso.homer.nicolaus copernicus PALADIN MU georg
cantor. lau tzu . kurt goedel .johnann [EMAIL PROTECTED] seb
astian bach. alonzo church.claude monet DEP. OF CIS, NCTU. TAIWAN donald
knuth.charles babbage.john backus.von neumann.william shakespeare.aristotle