Re: Memory explodes loading CSV into hash

2002-05-02 Thread Ernest Lergon

Stas Bekman wrote:
 
 Ideally, when such a situation happens and you must load all the data
 into memory, which is in short supply, your best bet is to rewrite the
 data storage layer in XS/C and use a tie interface to make it
 transparent to your perl code. You will still use the hash, but the
 refs to arrays will actually be C arrays.
 
Sorry, I'm not familiar with C(hinese) - but perhaps someone could
develop an XS/Pascal interface ;-))

 Ernest Lergon wrote:

  Another thing I found is that Apache::Status does not always seem to
  report complete values. Therefore I recorded the sizes from top, too.
 
 Were you running a single process? If you aren't, Apache::Status could
 have shown you a different process.
 
Running httpd -X shows the same results.

I will use the %index structure described above for now. Thanks to
modular OO perl I can re-code my data package later if the memory
explosion hits me again ;-))

Ernest


--
VIRTUALITAS Inc.                      http://www.virtualitas.net
European Consultant Office            contact: Ernest Lergon
Internationales Handelszentrum        mailto:[EMAIL PROTECTED]
Friedrichstraße 95                    ums: +49180528132130266
10117 Berlin / Germany
PGP-Key: http://www.virtualitas.net/Ernest_Lergon.asc




Re: Memory explodes loading CSV into hash

2002-05-01 Thread Ernest Lergon

Hi Stas,

having a look at Apache::Status and playing around with your tips on

http://www.apacheweek.com/features/mod_perl11

I found some interesting results and a compromise solution:

In a module I loaded a CSV file as class data into different structures
and compared the output of Apache::Status with top.

Enclosed you'll find a test report.

The code below 'building' shows how the lines are put into the
structures.

The lines below 'perl-status' show the output of Apache::Status.
The line below 'top' shows the output of top.

Examples for the tested structures are:

$buffer = '1\tr1v1\tr1v2\tr1v3\n2\tr2v1\tr2v2\tr2v3\n' ...

@lines = (
'1\tr1v1\tr1v2\tr1v3',
'2\tr2v1\tr2v2\tr2v3',
... )

%data = (
    1 => [ 1, 'r1v1', 'r1v2', 'r1v3' ],
    2 => [ 2, 'r2v1', 'r2v2', 'r2v3' ],
    ... )

$pack = {
    1 => [ 1, 'r1v1', 'r1v2', 'r1v3' ],
    2 => [ 2, 'r2v1', 'r2v2', 'r2v3' ],
    ... }

%index = (
    1 => '1\tr1v1\tr1v2\tr1v3',
    2 => '2\tr2v1\tr2v2\tr2v3',
    ... )

One thing I realized using Devel::Peek is that with a hash of
array-refs, each item in each array carries the full-blown perl flags
etc. That seems to be the reason for the 'memory explosion'.

Another thing I found is that Apache::Status does not always seem to
report complete values. Therefore I recorded the sizes from top, too.

Especially for the hash of array-refs (%data) and the hash-ref of
array-refs ($pack), perl-status reports only part of the memory used:
for $pack only the pointer (16 bytes), for %data apparently only the
keys.

As a compromise I'll use the %index structure. It is small enough while
still providing fast access. A further optimization will be to remove
the redundant key field from each line.
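
A minimal sketch of that compromise (the sub names are made up and the
redundant key field is still kept in the stored line here; splitting
happens only when a record is actually looked up):

my %index;

sub ReadIndex
{
    my ( $filename, $ndx_field ) = @_;

    open ( INFILE, $filename ) or die "open $filename: $!";
    while ( <INFILE> )
    {
        chomp;
        my @record = split "\t";
        $index{ 0 + $record[$ndx_field] } = $_;   # one flat string per record
    }
    close ( INFILE );
}

sub find
{
    my ( $key ) = @_;
    return undef unless exists $index{$key};
    return [ split "\t", $index{$key} ];          # rebuild the array on access
}

Each lookup pays one split, but only one perl string per record stays
resident instead of 18 separate field scalars.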

Success: a reduction from 26 MB to 7 MB - which is what I estimated in
my first mail.

A last word from perldebguts.pod:

|| Perl is a profligate wastrel when it comes to memory use.  There is a
|| saying that to estimate memory usage of Perl, assume a reasonable
|| algorithm for memory allocation, multiply that estimate by 10, and
|| while you still may miss the mark, at least you won't be quite so
|| astonished.  This is not absolutely true, but may provide a good grasp
|| of what happens.
||
|| [...]
||
|| Anecdotal estimates of source-to-compiled code bloat suggest an
|| eightfold increase.

Perhaps my experiences could be added to the long list of anecdotes ;-))

Thank you all again for escorting me on this deep dive.

Ernest





TEST REPORT
===

CSV file:
14350 records
CSV 2151045 bytes = 2101 Kbytes
CSV_2   2136695 bytes = 2086 Kbytes (w/o CR)

1   all empty
=

building:
none

perl-status:
*buffer{SCALAR}   25 bytes
*lines{ARRAY} 56 bytes
*data{HASH}  228 bytes
*pack{SCALAR} 16 bytes
*index{HASH} 228 bytes

top:
12992  12M 12844   base

2   buffer
==

building:
$buffer .= $_ . "\n";

perl-status:
*buffer{SCALAR}  2151069 bytes = CSV + 24 bytes
*lines{ARRAY} 56 bytes
*data{HASH}  228 bytes
*pack{SCALAR} 16 bytes
*index{HASH} 228 bytes

top:
17200  16M 17040   base + 4208 Kbytes = CSV + 2107 KBytes

3   lines
=

building:
push @lines, $_;

perl-status:
*buffer{SCALAR}   25 bytes
*lines{ARRAY}2519860 bytes = CSV_2 + 383165 bytes
 (approx. 27 * 14350 )
*data{HASH}  228 bytes
*pack{SCALAR} 16 bytes
*index{HASH} 228 bytes

top:
18220  17M 18076   base + 5228 Kbytes = CSV_2 + 3142 Kbytes

4   data
========

building:
@record = split ( "\t", $_ );
$key = 0 + $record[0];
$data{$key} = [ @record ];

perl-status:
*buffer{SCALAR}   25 bytes
*lines{ARRAY} 56 bytes
*data{HASH}   723302 bytes = approx. 50 * 14350 ( key + ref )
 (where is the data?)
*pack{SCALAR} 16 bytes
*index{HASH} 228 bytes

top:
40488  38M 39208   base + 27566 Kbytes = CSV_2 + 25480 Kbytes

Re: Memory explodes loading CSV into hash

2002-05-01 Thread Stas Bekman

Ernest Lergon wrote:

 having a look at Apache::Status and playing around with your tips on
 
 http://www.apacheweek.com/features/mod_perl11
 
 I found some interesting results and a compromise solution:

Glad to hear that Apache::Status was of help to you. Ideally, when such
a situation happens and you must load all the data into memory, which is
in short supply, your best bet is to rewrite the data storage layer in
XS/C and use a tie interface to make it transparent to your perl code.
You will still use the hash, but the refs to arrays will actually be C
arrays.
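
For illustration, a rough pure-Perl sketch of that tie idea (the package
name CompactRecords is invented, and a real implementation would keep
the records in C via XS; this stand-in merely stores one flat string per
key and rebuilds the array ref on FETCH):

#!/usr/bin/perl -w

package CompactRecords;
use strict;
use Tie::Hash;
our @ISA = ('Tie::StdHash');

# store each record as a single joined string instead of an array ref
sub STORE {
    my ($self, $key, $record) = @_;       # $record is an array ref
    $self->{$key} = join "\t", @$record;
}

# rebuild the array ref only when a record is actually accessed
sub FETCH {
    my ($self, $key) = @_;
    return undef unless exists $self->{$key};
    return [ split /\t/, $self->{$key} ];
}

package main;

tie my %data, 'CompactRecords';
$data{42} = [ 42, 'r42v1', 'r42v2' ];
print $data{42}[1], "\n";                 # prints r42v1

The calling code keeps its plain hash syntax; only the storage
representation behind it changes.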

 Another thing I found is that Apache::Status does not always seem to
 report complete values. Therefore I recorded the sizes from top, too.

Were you running a single process? If you aren't, Apache::Status could
have shown you a different process. Also, you can use GTop, if you have
libgtop on your system, which gives you a perl interface to the
process's memory usage. See the guide for many examples.
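
A minimal sketch of the GTop idea (this assumes libgtop plus the GTop
perl module are installed; size/rss/share are the fields the guide's
examples use):

use GTop ();

my $gtop = GTop->new;
my $mem  = $gtop->proc_mem($$);    # memory info for the current process
printf "size=%d  rss=%d  share=%d\n",
    $mem->size, $mem->rss, $mem->share;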

 Success: A reduction from 26 MB to 7 MB - what I estimated in my first
 mail.

:)
__
Stas Bekman            JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide --- http://perl.apache.org
mailto:[EMAIL PROTECTED] http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com




Re: Memory explodes loading CSV into hash

2002-04-29 Thread Ernest Lergon

Kee Hinckley wrote:
 
 
  At 17:18 28.04.2002, Ernest Lergon wrote:
  Now I'm scared about the memory consumption:
  
  The CSV file has 14.000 records with 18 fields and a size of 2 MB
  (approx. 150 Bytes per record).
 
  Now a question I would like to ask: do you *need* to read the whole CSV
  info into memory? There are ways to overcome this. For example, looking at
 
 When I have a CSV to play with and it's not up to being transferred to
 a real database, I use the DBD::CSV module, which puts a nice SQL
 wrapper around it.

I've installed DBD::CSV and tested it with my data:

$dbh = DBI->connect("DBI:CSV:csv_sep_char=\t;csv_eol=\n;csv_escape_char=");
$dbh->{'csv_tables'}->{'foo'} = { 'file' => 'foo.data' };

3 MB memory used.

$sth = $dbh->prepare("SELECT * FROM foo");

3 MB memory used.

$sth->execute();

16 MB memory used!

If I do it record by record like

$sth = $dbh->prepare("SELECT * FROM foo WHERE id=?");

then memory usage grows query by query due to caching.

Moreover, it becomes VERY slow, because the whole file is read again
every time; an index can't be created or used.

No win :-(

Ernest






Re: Memory explodes loading CSV into hash

2002-04-29 Thread Ernest Lergon

Perrin Harkins wrote:
 
  $foo->{$i} = [ @record ];
 
 You're creating 14000 arrays, and references to them (refs take up space
 too!).  That's where the memory is going.
 
 See if you can use a more efficient data structure.  For example, it
 takes less space to make 4 arrays with 14000 entries in each than to
 make 14000 arrays with 4 entries each.


So I turned it around:

$col now holds 18 arrays with 14000 entries each, and the script prints
the correct results:

#!/usr/bin/perl -w

$col = {};

$line  = "AAAA\tBBBB\tCCCC\tDDDD";   #   4 string fields (4 chars)
$line .= "\t10.99" x 9;              #   9 float fields (5 chars)
$line .= "\t" . 'A' x 17;            #   5 string fields (rest)
$line .= "\t" . 'B' x 17;            #
$line .= "\t" . 'C' x 17;            #
$line .= "\t" . 'D' x 17;            #
$line .= "\t" . 'E' x 17;            #

@record = split "\t", $line;

foreach $j ( 0 .. $#record )
{
    $col->{$j} = [];
}

for ( $i = 0; $i < 14000; $i++ )
{
    map { $_++ } @record;

    foreach $j ( 0 .. $#record )
    {
        push @{ $col->{$j} }, $record[$j];
    }

    print "$i\t$col->{0}->[$i],$col->{5}->[$i]\n"   unless $i % 1000;
}

<STDIN>;   # wait here so memory usage can be read from top

1;

and gives:

SIZE   RSS  SHARE
12364  12M  1044

Wow, 2 MB saved ;-))

I think a reference is a pointer of 8 bytes, so:

14.000 * 8 = approx. 112 KBytes - right?

This doesn't explain the difference between the 7 MB calculated and the
14 MB measured.

Ernest






Re: Memory explodes loading CSV into hash

2002-04-29 Thread Ernest Lergon

Hi,

thank you all for your hints, BUT (with capital letters ;-)

I think it's a question of speed: if I hold my data in a hash in
memory, access should be faster than using any kind of external
database.

What puzzles me is the extremely blown-up size (mod_)perl uses for data
structures.

Ernest






Re: Memory explodes loading CSV into hash

2002-04-29 Thread Simon Oliver

Have you tried DBD::AnyData?  It's pure Perl so it might not be as fast
but you never know?

--
  Simon Oliver



Re: Memory explodes loading CSV into hash

2002-04-29 Thread Perrin Harkins

Ernest Lergon wrote:
 So I turned it around:
 
 $col now holds 18 arrays with 14000 entries each, and the script prints
 the correct results:
...
 and gives:
 
 SIZE   RSS  SHARE
 12364  12M  1044
 
 Wow, 2 MB saved ;-))

That's pretty good, but obviously not what you were after.

I tried using the pre-size array syntax ($#array = 14000), but it didn't 
help any.  Incidentally, that map statement in your script isn't doing 
anything that I can see.

 I think a reference is a pointer of 8 bytes, so:
 
 14.000 * 8 = approx. 112 KBytes - right?

Probably more.  Perl data types are complex.  They hold a lot of
metadata (whether the ref is blessed, for example).

 This doesn't explain the difference between the 7 MB calculated and the
 14 MB measured.

The explanation of this is that perl uses a lot of memory.  For one 
thing, it allocates RAM in buckets.  When you hit the limit of the 
allocated memory, it grabs more, and I believe it grabs an amount in 
proportion to what you've already used.  That means that as your 
structures get bigger, it grabs bigger chunks.  The whole 12MB may not 
be in use, although perl has reserved it for possible use.  (Grabbing 
memory byte by byte would be less wasteful, but much too slow.)

The stuff in perldebguts is the best reference on this, and you've 
already looked at that.  I think your original calculation failed to 
account for the fact that the minimum numbers given there for scalars 
are minimums (i.e. scalars with something in them won't be that small) 
and that you are accessing many of these in more than one way (i.e. as 
string, float, and integer), which increases their size.  You can try 
playing with compile options (your choice of malloc affects this a 
little), but at this point it's probably not worth it.  There's nothing 
wrong with 12MB of shared memory, as long as it stays shared.  If that 
doesn't work for you, your only choice will be to trade some speed for 
reduced memory usage by using a disk-based structure.
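
A quick way to see that effect is Devel::Peek, which ships with perl
(its output goes to STDERR and varies between versions):

use Devel::Peek;

my $price = "10.99";     # starts life as a plain string (PV only)
Dump($price);

my $total = $price * 2;  # numeric use upgrades the scalar to a PVNV:
Dump($price);            # it now holds the string *and* the cached double

Every such upgrade makes the individual scalar a bit larger, and that
adds up quickly over 14000 records of 18 fields each.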

At any rate, mod_perl doesn't seem to be at fault here.  It's just a 
general perl issue.

- Perrin




Re: Memory explodes loading CSV into hash

2002-04-29 Thread Ernest Lergon

Perrin Harkins wrote:
 
 [snip]

 Incidentally, that map statement in your script isn't doing
 anything that I can see.

It simulates different values for each record - e.g.:

$line = "AAAA\tBBBB\t1000\t10.99";

@record = split "\t", $line;

for ( $i = 0; $i < 14000; $i++ )
{
    map { $_++ } @record;

    #   $i=0    @record=('AAAB','BBBC',1001,11.99);
    #   $i=1    @record=('AAAC','BBBD',1002,12.99);
    #   $i=2    @record=('AAAD','BBBE',1003,13.99);
    #   etc.
}


 [snip]

Thanks for your explanations about perl's memory usage.

 At any rate, mod_perl doesn't seem to be at fault here.  It's just a
 general perl issue.
 
I think so, too.

Ernest







Re: Memory explodes loading CSV into hash

2002-04-29 Thread Stas Bekman

Ernest Lergon wrote:
 Hi,
 
 thank you all for your hints, BUT (with capital letters ;-)
 
 I think it's a question of speed: if I hold my data in a hash in
 memory, access should be faster than using any kind of external
 database.
 
 What puzzles me is the extremely blown-up size (mod_)perl uses for data
 structures.

Looks like you've skipped over my suggestion to use Apache::Status. It
uses B::Size and B::TerseSize to show you *exactly* how much memory each
variable, opcode and whatnot uses. No need to guess.

You can use the B:: modules directly, but since you say that outside of
mod_perl the memory usage pattern is different, I'd suggest using
Apache::Status.

__
Stas Bekman            JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide --- http://perl.apache.org
mailto:[EMAIL PROTECTED] http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com




Re: Memory explodes loading CSV into hash

2002-04-28 Thread Stas Bekman

Ernest Lergon wrote:
 Hi,
 
 in a mod_perl package I load a CSV file on apache startup into a simple
 hash as read-only class data to be shared by all children.
 
 A loading routine reads the file line by line and uses one numeric field
 as the hash key (error checks etc. omitted):
 
 package Data;
 
 my $class_data = {};
 
 ReadFile ( 'data.txt', $class_data, 4 );
 
 sub ReadFile
 {
   my $filename  = shift;  # path string
   my $data  = shift;  # ref to hash
   my $ndx_field = shift;  # field number
 
   my ( @record, $num_key );
   local $_;
 
   open ( INFILE, $filename );
   while ( <INFILE> )
   {
       chomp;
       @record = split "\t";
       $num_key = $record[$ndx_field];
       $data->{$num_key} = [ @record ];
   }
   close ( INFILE );
 }
 
 sub new...
   creates an object for searching the data, last result, errors etc.
 
 sub find...
   method with something like:
   if exists $class_data->{$key} ... return...
 etc.
 
 Now I'm scared about the memory consumption:
 
 The CSV file has 14.000 records with 18 fields and a size of 2 MB
 (approx. 150 Bytes per record).
 
 Omitting the loading, top shows that each httpd instance has 10 MB (all
 shared, as it should be).
 
 But reading in the file explodes the memory to 36 MB (OK, shared as
 well)!
 
 So how come 2 MB of data needs 26 MB of memory when it is stored as a
 hash?
 
 Reading perldebguts.pod I did not expect such an increase:
 
 Description (avg.)            CSV        (per field)     Perl
 4 string fields (4 chars)      16 bytes   (32 bytes)     128 bytes
 9 float fields (5 chars)       45 bytes   (24 bytes)     216 bytes
 5 string fields (rest)         89 bytes   (32 bytes)     160 bytes
 the integer key                           (20 bytes)      20 bytes
 total                         150 bytes                  524 bytes

 That will give 14.000 x 524 = approx. 7 MB, but not 26 MB !?
 
 Lost in space...

Use Apache::Status, which can show you exactly where all the bytes go. 
See the guide or its manpage for more info. I suggest that you 
experiment with a very small data set and look at how much memory each 
record takes.


__
Stas Bekman            JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide --- http://perl.apache.org
mailto:[EMAIL PROTECTED] http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com





Re: Memory explodes loading CSV into hash

2002-04-28 Thread Ernest Lergon

Jeffrey Baker wrote:
 
 I tried this program in Perl (outside of modperl) and the memory
 consumption is only 4.5MB:
 
 #!/usr/bin/perl -w
 
 $foo = {};
 
 for ($i = 0; $i < 14000; $i++) {
     $foo->{sprintf('%020d', $i)} = 'A' x 150;
 }
 
 <STDIN>;   # wait so memory usage can be read from top
 
 1;
 
 So I suggest something else might be going on causing your memory
 problems.
 
Hi Jeffrey,

good idea to boil it down.

Yes, your prog gave me only:

SIZE   RSS  SHARE
 4696  4696  964

Running my code snippet outside mod_perl (with real data) still gives:

SIZE   RSS  SHARE
14932  14M  1012

A simulation like this:

#!/usr/bin/perl -w

$foo = {};

$line  = "AAAA\tBBBB\tCCCC\tDDDD";   #   4 string fields (4 chars)
$line .= "\t10.99" x 9;              #   9 float fields (5 chars)
$line .= "\t" . 'A' x 17;            #   5 string fields (rest)
$line .= "\t" . 'B' x 17;            #
$line .= "\t" . 'C' x 17;            #
$line .= "\t" . 'D' x 17;            #
$line .= "\t" . 'E' x 17;            #

@record = split "\t", $line;

for ($i = 0; $i < 14000; $i++)
{
    map { $_++ } @record;

    $foo->{$i} = [ @record ];

    print "$i\t$foo->{$i}->[0],$foo->{$i}->[5]\n"   unless $i % 1000;
}

<STDIN>;   # wait so memory usage can be read from top

1;

prints:

0       AAAB,11.99
1000    ABMN,1011.99
2000    ACYZ,2011.99
3000    AELL,3011.99
4000    AFXX,4011.99
5000    AHKJ,5011.99
6000    AIWV,6011.99
7000    AKJH,7011.99
8000    ALVT,8011.99
9000    ANIF,9011.99
10000   AOUR,10011.99
11000   AQHD,11011.99
12000   ARTP,12011.99
13000   ATGB,13011.99

and gives:

SIZE   RSS  SHARE
14060  13M  1036

There is no difference between real and random data.

But I think there is an optimization mechanism in perl concerning
strings, so your code needs less memory.

So what is going on?  2 MB -> 14 MB ?

Still lost in space ;-))

Ernest







Re: Memory explodes loading CSV into hash

2002-04-28 Thread Perrin Harkins

 $foo->{$i} = [ @record ];

You're creating 14000 arrays, and references to them (refs take up space
too!).  That's where the memory is going.

See if you can use a more efficient data structure.  For example, it
takes less space to make 4 arrays with 14000 entries in each than to
make 14000 arrays with 4 entries each.  There is some discussion of this
in the Advanced Perl Programming book, and probably some CPAN modules
that can help.
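
One way to see the difference, assuming Devel::Size is available (the
record layout below is made up, and the absolute numbers depend on the
perl build):

use Devel::Size qw(total_size);

my ( %rows, %cols );

for my $i ( 0 .. 13_999 ) {
    my @record = ( $i, 'aaaa', '10.99', 'x' x 17 );
    $rows{$i} = [ @record ];                              # 14000 small arrays
    push @{ $cols{$_} }, $record[$_] for 0 .. $#record;   # 4 long arrays
}

printf "row-wise:    %d bytes\n", total_size(\%rows);
printf "column-wise: %d bytes\n", total_size(\%cols);

The column-wise layout saves one array head and one reference per
record; the per-field scalars themselves cost the same in both layouts.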

- Perrin




Re: Memory explodes loading CSV into hash

2002-04-28 Thread Per Einar Ellefsen

At 17:18 28.04.2002, Ernest Lergon wrote:
Now I'm scared about the memory consumption:

The CSV file has 14.000 records with 18 fields and a size of 2 MB
(approx. 150 Bytes per record).

Now a question I would like to ask: do you *need* to read the whole CSV
info into memory? There are ways to overcome this. For example, looking
at your data I suppose you might want to look up specific IDs; in that
case it would be much more efficient to read one line at a time and
check whether it's the correct one. Otherwise you might want to make the
move to a relational database; this is the kind of thing RDBMSes excel
at.

Just some tips.


-- 
Per Einar Ellefsen
[EMAIL PROTECTED]





Re: Memory explodes loading CSV into hash

2002-04-28 Thread Andrew McNaughton





On Sun, 28 Apr 2002, Per Einar Ellefsen wrote:

 At 17:18 28.04.2002, Ernest Lergon wrote:
 Now I'm scared about the memory consumption:
 
 The CSV file has 14.000 records with 18 fields and a size of 2 MB
 (approx. 150 Bytes per record).

 Now a question I would like to ask: do you *need* to read the whole CSV
 info into memory? There are ways to overcome this. For example, looking
 at your data I suppose you might want to look up specific IDs; in that
 case it would be much more efficient to read one line at a time and
 check whether it's the correct one. Otherwise you might want to make
 the move to a relational database; this is the kind of thing RDBMSes
 excel at.

You might also want to look at loading your CSV data into an MLDBM
file, and then having your apache processes access it from there. That
way most of your data stays on disk, and you access it in much the same
way as before, through a hash of arrays.
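
A hedged sketch of that approach (DB_File as the on-disk back end and
Storable as the serializer are just one possible combination; the file
names and the key field follow the ReadFile routine from the original
post):

use Fcntl qw(O_RDWR O_CREAT);
use MLDBM qw(DB_File Storable);

# one-off conversion, e.g. at server startup
tie my %db, 'MLDBM', 'data.db', O_RDWR|O_CREAT, 0644
    or die "tie data.db: $!";

open ( INFILE, 'data.txt' ) or die "open data.txt: $!";
while ( <INFILE> )
{
    chomp;
    my @record = split "\t";
    $db{ $record[4] } = \@record;    # whole record serialized to disk
}
close ( INFILE );

# later, in a handler, a record comes back as an array ref
my $record = $db{4711};              # 4711 is just an example key

Only the records that are actually fetched are thawed into the process,
at the price of a disk read and a Storable thaw per lookup.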

Andrew McNaughton




Re: Re: Memory explodes loading CSV into hash

2002-04-28 Thread Kee Hinckley

 
 
 At 17:18 28.04.2002, Ernest Lergon wrote:
 Now I'm scared about the memory consumption:
 
 The CSV file has 14.000 records with 18 fields and a size of 2 MB
 (approx. 150 Bytes per record).
 
 Now a question I would like to ask: do you *need* to read the whole CSV 
 info into memory? There are ways to overcome this. For example, looking at 

When I have a CSV to play with and it's not up to being transferred to
a real database, I use the DBD::CSV module, which puts a nice SQL
wrapper around it.



Re: Memory explodes loading CSV into hash

2002-04-28 Thread Jeffrey Baker

On Sun, Apr 28, 2002 at 05:18:24PM +0200, Ernest Lergon wrote:
 Hi,
 
 in a mod_perl package I load a CSV file on apache startup into a simple
 hash as read-only class data to be shared by all children.
 
 A loading routine reads the file line by line and uses one numeric field
 as the hash key (error checks etc. omitted):
 
 package Data;
 
 my $class_data = {};
 
 ReadFile ( 'data.txt', $class_data, 4 );
 
 sub ReadFile
 {
   my $filename  = shift;  # path string
   my $data  = shift;  # ref to hash
   my $ndx_field = shift;  # field number
 
   my ( @record, $num_key );
   local $_;
 
   open ( INFILE, $filename );
   while ( <INFILE> )
   {
       chomp;
       @record = split "\t";
       $num_key = $record[$ndx_field];
       $data->{$num_key} = [ @record ];
   }
   close ( INFILE );
 }
 
 sub new...
   creates an object for searching the data, last result, errors etc.
 
 sub find...
   method with something like:
   if exists $class_data->{$key} ... return...
 etc.
 
 Now I'm scared about the memory consumption:
 
 The CSV file has 14.000 records with 18 fields and a size of 2 MB
 (approx. 150 Bytes per record).
 
 Omitting the loading, top shows that each httpd instance has 10 MB (all
 shared, as it should be).
 
 But reading in the file explodes the memory to 36 MB (OK, shared as
 well)!
 
 So how come 2 MB of data needs 26 MB of memory when it is stored as a
 hash?
 
 Reading perldebguts.pod I did not expect such an increase:
 
 Description (avg.)            CSV        (per field)     Perl
 4 string fields (4 chars)      16 bytes   (32 bytes)     128 bytes
 9 float fields (5 chars)       45 bytes   (24 bytes)     216 bytes
 5 string fields (rest)         89 bytes   (32 bytes)     160 bytes
 the integer key                           (20 bytes)      20 bytes
 total                         150 bytes                  524 bytes
 
 That will give 14.000 x 524 = approx. 7 MB, but not 26 MB !?

I tried this program in Perl (outside of modperl) and the memory
consumption is only 4.5MB:

#!/usr/bin/perl -w

$foo = {};

for ($i = 0; $i < 14000; $i++) {
    $foo->{sprintf('%020d', $i)} = 'A' x 150;
}

<STDIN>;   # wait so memory usage can be read from top

1;

So I suggest something else might be going on causing your memory
problems.

-jwb