Schoenwaelder Oliver wrote:

> Hi list members,
> 
> I have some problems with a script using hashes.
> I have used hashes for years but never had this kind of problem before...
> I have an ASCII file with 6.5 MB of data. This file is tokenized by the
> Parse::Lex module.

I have never used this module, so I don't know how efficient it is or
what algorithm it uses.

> The tokens are then stored in a two-level hash:
> $TokenHash{$TokenType}->{$TokenID} = $TokenValue.

Not much of a problem in Perl. Perl's hash implementation is pretty efficient
in terms of both speed and storage. Yes, hashes waste a little memory (as in
most other languages), but they are not as expensive as you might think.
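
If you want to see for yourself what the structure really costs, Devel::Size 
from CPAN can report the deep size of a nested hash. A rough sketch (the 
79 x 6,500 dummy entries below are made-up numbers in the same ballpark as 
your token counts, not your real data):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Devel::Size qw(total_size);

    # build a two-level hash shaped like the one described above,
    # filled with dummy data just to get a feel for the overhead
    my %TokenHash;
    for my $type ( 1 .. 79 ) {
        for my $id ( 1 .. 6_500 ) {
            $TokenHash{"type$type"}{"id$id"} = "value$id";
        }
    }

    # total_size() walks the structure and adds up the memory held
    # by the keys, the values, and the hash bookkeeping itself
    printf "%d entries take about %.1f MB\n",
        79 * 6_500, total_size( \%TokenHash ) / ( 1024 * 1024 );

That gives you a baseline for what the two-level hash alone should cost,
independent of Parse::Lex and Tie::LLHash.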

> The file contains 536,332 tokens, which leads to 79 keys for
> %TokenHash. I'm evaluating the hash with two loops, one for each level.
> Because I need to move back and forth through the _sorted_ hash while
> inside the loops, I can't use the built-in constructs like "foreach $key1
> (keys %TokenHash)...". So I decided to use Tie::LLHash.

I have never used Tie::LLHash either. Did you contact the author of the module?
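
If the only reason for Tie::LLHash is being able to move back and forth
through the sorted keys, a plain hash plus an ordinary array of its sorted
keys may already do the job. Just a sketch, assuming the order you need is a
simple sort of the keys (the token types and values below are invented):

    use strict;
    use warnings;

    # plain two-level hash, no tie
    my %TokenHash = (
        KEYWORD => { 1 => 'if',  3 => 'else' },
        IDENT   => { 2 => 'foo', 4 => 'bar'  },
    );

    # keep the sorted keys in arrays; an index into the array lets you
    # step forward and backward freely while staying inside the loop
    my @types = sort keys %TokenHash;
    for my $i ( 0 .. $#types ) {
        my @ids = sort { $a <=> $b } keys %{ $TokenHash{ $types[$i] } };
        for my $j ( 0 .. $#ids ) {
            my $value = $TokenHash{ $types[$i] }{ $ids[$j] };

            # peek at the previous and next entries without leaving the loop
            my $prev = $j > 0     ? $TokenHash{ $types[$i] }{ $ids[ $j - 1 ] } : undef;
            my $next = $j < $#ids ? $TokenHash{ $types[$i] }{ $ids[ $j + 1 ] } : undef;

            print "$types[$i] $ids[$j] $value\n";
        }
    }

The key arrays add some memory on top of the hash, but tying a hash usually
costs more per entry and per access than a separate key array, though I have
not measured Tie::LLHash specifically.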

> Now I'm amazed by the memory consumption. The script uses up to 300MB for

It could be a combination of your parsing module and Tie::LLHash that causes
this. By the way, what platform are you using?

> processing this small file, which produces a 3.5 MB file at the end. I
> developed and tested my script with a 2K subset of the original file, so
> I didn't encounter the problem during tests.
> A simple "if (not exists $TokenHash{$TokenType}->{$TokenID}) {}" uses
> 110 MB of memory. I measured this with the code that stores the elements
> into the hash commented out. Tokenizing the file alone uses 4 MB of memory.
> So in my opinion it is related to the hash/hash operations.
> In production the files to be processed will be up to several hundred MB
> in size, so memory usage is really an issue for me.
> I also tried "simple/built-in" hashes just to be sure that the module
> isn't the problem, but I got the same strange results. I also tried
> multi-dimensional arrays, but they also use up to 50 MB of memory.
> 
> Anything I need to consider? Anybody with the same experience?

I regularly parse log files of up to 50 GB with pretty badly nested
hashes/refs but have never encountered the problem you are referring to. For a
6.5 MB text file to consume over 300 MB of memory is unbelievable. If the
problem came down to the modules you are using, they would be useless and many
users would have complained about them.
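
One small thing worth knowing about the check you quoted: a nested test like
exists $TokenHash{$TokenType}->{$TokenID} autovivifies $TokenHash{$TokenType}
as an empty hash ref whenever that type has not been stored yet, so the
structure still grows a little even with the actual store commented out. With
only 79 types that is almost certainly not where your 110 MB goes, but
checking in two steps rules it out:

    # test the outer key first so the inner lookup cannot autovivify
    # an empty hash ref for a $TokenType that was never stored
    unless (    exists $TokenHash{$TokenType}
            and exists $TokenHash{$TokenType}{$TokenID} )
    {
        # token not seen yet
    }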

david
