On Wednesday 11 June 2008, Chuck Swiger wrote:

> If your data files are small enough to fit into 2GB of address space,
> try using mmap() and then treat the file(s) as an array of records or
> memoblocks or whatever, and let the VM system deal with paging in the
> parts of the file you need.  Otherwise, don't fread() 1 record at a
> time, read in at least a (VM page / sizeof(record)) number of records
> at a time into a bigger buffer, and then process that in RAM rather
> than trying to fseek in little increments.

During a marathon session last night, I did just that.  I changed the
sequential reads in the "outer" file to fread() many records at a time.
Then I switched to mmap() for the random-access file.  The results were
much better, with good CPU usage and only 3 times the wall clock runtime:

[EMAIL PROTECTED] date; time /tmp/cdbf /tmp/invoice.dbf >/dev/null; date
Thu Jun 12 13:56:49 CDT 2008
/tmp/cdbf /tmp/invoice.dbf > /dev/null  29.00s user 11.16s system 56% cpu 1:11.03 total
Thu Jun 12 13:58:00 CDT 2008

[EMAIL PROTECTED] date; time /tmp/cdbf ~pgsql/data/frodumps/xbase/invoice.dbf invid ln >/dev/null; date
Thu Jun 12 14:10:57 CDT 2008
/tmp/cdbf ~pgsql/data/frodumps/xbase/invoice.dbf invid ln > /dev/null  38.14s user 6.21s system 23% cpu 3:05.13 total
Thu Jun 12 14:14:02 CDT 2008
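
For the curious, the new shape of the code is roughly this.  The record
size, struct layout, and memo-offset handling are placeholders, not the
actual cdbf source:

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

#define RECSZ 512                /* placeholder fixed record size */
#define BATCH (65536 / RECSZ)    /* a few VM pages' worth per fread() */

int main(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s outer.dbf memo.dbt\n", argv[0]);
        return 1;
    }

    /* Sequential "outer" file: fread() many records per call. */
    FILE *outer = fopen(argv[1], "rb");
    if (outer == NULL) { perror("fopen"); return 1; }

    /* Random-access file: mmap() it and let the VM do the paging. */
    int mfd = open(argv[2], O_RDONLY);
    if (mfd < 0) { perror("open"); return 1; }
    struct stat st;
    if (fstat(mfd, &st) < 0) { perror("fstat"); return 1; }
    char *memo = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, mfd, 0);
    if (memo == MAP_FAILED) { perror("mmap"); return 1; }

    static char buf[BATCH * RECSZ];    /* allocated once, reused every pass */
    size_t n;
    while ((n = fread(buf, RECSZ, BATCH, outer)) > 0) {
        for (size_t i = 0; i < n; i++) {
            char *rec = buf + i * RECSZ;
            /* Hypothetical record layout: a memo offset in the first
             * bytes.  Index straight into the mapping; no fseek(). */
            long off;
            memcpy(&off, rec, sizeof off);
            if (off >= 0 && off < st.st_size)
                putchar(memo[off]);    /* stand-in for the real work */
        }
    }

    munmap(memo, st.st_size);
    close(mfd);
    fclose(outer);
    return 0;
}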

> Also, if you're malloc'ing and freeing buf & memohead with every
> iteration of the loop, you're just thrashing the malloc system;
> instead, allocate your buffers once before the loop, and reuse them
> (zeroize or copy new data over the previous results) instead.

Also done.  I'd gotten some technical advice from Slashdot (which speaks
volumes about my cluelessness, granted) that made it sound like a good
idea.  I changed almost all of the mallocs into static buffers.
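
In sketch form (the sizes and the helper name here are invented), the
change amounts to:

#include <string.h>

#define BUFSZ  512
#define HEADSZ 8

/* Allocated once for the life of the program instead of per iteration. */
static char buf[BUFSZ];
static char memohead[HEADSZ];

/* Hypothetical per-record step: overwrite the previous contents rather
 * than free()ing and malloc()ing fresh buffers each time through. */
static void load_record(const char *src)
{
    memcpy(buf, src, BUFSZ);
    memset(memohead, 0, HEADSZ);
    /* ... real processing here ... */
}

int main(void)
{
    char record[BUFSZ] = {0};
    load_record(record);
    return 0;
}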

I'm still offering that shell account to anyone who wants to take a peek.  :-)
-- 
Kirk Strauser
