"Uwe Schmidt" <[EMAIL PROTECTED]> schrieb im Newsbeitrag 
news:[EMAIL PROTECTED]
it into HXT.
>
> This still does not solve the processing of "very very large"
> XML document. I doubt, whether we can do this with a DOM
> like approach, as in HXT or HaXml. Lazy input does not solve all problems.
> A SAX like parser could be a more useful choice for very large documents.
>
> Uwe

I think a step towards supporting medium-sized documents in HXT would be to 
store the tags and content more efficiently.
If I understand the code correctly, every tag is stored as a separate 
Haskell String. Since each character of a String under GHC takes about 12 
bytes, this alone leads to high memory usage. Tags tend to repeat, so they 
could be stored uniquely via a hash table (interning). Content could be 
stored in compressed byte strings.
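To make the interning idea concrete, here is a minimal sketch (not HXT's actual code) of a tag-interning table built on Data.Map: the first occurrence of each tag name is stored, and every later occurrence is replaced by a pointer to that single stored copy, so repeated tags cost one word instead of a fresh String. The names (`InternTable`, `intern`) are hypothetical.

```haskell
import qualified Data.Map.Strict as M
import Data.IORef

-- Hypothetical interning table: maps each tag name to one shared copy.
type InternTable = IORef (M.Map String String)

newInternTable :: IO InternTable
newInternTable = newIORef M.empty

-- Return the canonical copy of a tag name, storing it on first sight.
intern :: InternTable -> String -> IO String
intern ref s = do
  m <- readIORef ref
  case M.lookup s m of
    Just shared -> return shared            -- reuse the stored copy
    Nothing     -> do
      writeIORef ref (M.insert s s m)       -- first occurrence: remember it
      return s

main :: IO ()
main = do
  tbl <- newInternTable
  _ <- intern tbl "para"
  _ <- intern tbl ("pa" ++ "ra")            -- equal content, distinct heap object
  m <- readIORef tbl
  print (M.size m)                          -- only one copy is kept
```

A real implementation would want a proper hash table rather than a balanced tree, but the sharing effect is the same: a document with thousands of `<para>` tags stores the string "para" once.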

As I mentioned in an earlier post, 2 GB of memory is not enough to process a 
35 MB XML document in HXT, since we have roughly

30 x 2 x 12 = 720 MB for starters just to store the string data (once in the 
parser and once in the DOM).

(Well, a machine with 2 GB of memory; I guess I had somewhere around 1 GB 
free for the program. Other overheads most likely used up the remaining 300 MB.)
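The back-of-envelope estimate above can be written out directly; the figures (roughly 30 MB of character data, held twice, at about 12 bytes per character) are taken from the post:

```haskell
-- Rough memory estimate for storing the document's strings, in MB:
-- ~30 MB of characters * 2 copies (parser + DOM) * ~12 bytes per Char.
estimateMB :: Int
estimateMB = 30 * 2 * 12

main :: IO ()
main = print estimateMB   -- 720
```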

Rene. 



_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
