On Tue, Jun 11, 2002 at 10:53:44PM +0100, [EMAIL PROTECTED]
wrote:
> Thanks for your reply. I was wondering, if you can't get any details about
> the overall volume of data perl has in memory, can you find out information
> about an individual variable's usage?
Not really. "Individual" variables are actually complex conglomerations of
lots of internal variables. For example, a Perl hash is really a C struct
containing various integers, pointers and strings and a C array of SV
pointers to Perl scalar values. Plus some empty slots in the array.
In theory it is possible to calculate how much memory a hash is using by
examining its internal structure using XS, but to give you an idea:
$ perl5.8.0 -MDevel::Peek -wle '%h = (foo => 42, bar => 23, baz => 27); print Dump \%h'
SV = RV(0x10226420) at 0x101d2098
  REFCNT = 1
  FLAGS = (TEMP,ROK)
  RV = 0x102136b4
SV = PVHV(0x10211f50) at 0x102136b4
  REFCNT = 2
  FLAGS = (SHAREKEYS)
  IV = 3
  NV = 0
  ARRAY = 0x10217008  (0:5, 1:3)
  hash quality = 150.0%
  KEYS = 3
  FILL = 3
  MAX = 7
  RITER = -1
  EITER = 0x0
  Elt "bar" HASH = 0x80409109
  SV = IV(0x10216690) at 0x101d22a8
    REFCNT = 1
    FLAGS = (IOK,pIOK)
    IV = 23
  Elt "baz" HASH = 0xffb60ff2
  SV = IV(0x10216698) at 0x101d22cc
    REFCNT = 1
    FLAGS = (IOK,pIOK)
    IV = 27
  Elt "foo" HASH = 0x238678dd
  SV = IV(0x10216688) at 0x101d217c
    REFCNT = 1
    FLAGS = (IOK,pIOK)
    IV = 42
but this might get a little ridiculous. Just looking at the length() of
each individual key/value pair is close enough.
> This was the core loop within my code
> which has to constantly compute the length of each input line just to have
> some sort of handle on memory usage.
Fortunately for you, Perl's strings are Pascal style, with the length
pre-calculated.
$ perl5.8.0 -MDevel::Peek -wle '$foo = "wibble"; print Dump \$foo'
SV = RV(0x10226420) at 0x101d217c
  REFCNT = 1
  FLAGS = (TEMP,ROK)
  RV = 0x10212e90
SV = PV(0x101d2468) at 0x10212e90
  REFCNT = 2
  FLAGS = (POK,pPOK)
  PV = 0x101d8bc8 "wibble"\0
  CUR = 6
  LEN = 7
so length() is very cheap. Unlike strlen() in C, it doesn't have to walk
the string looking for a null byte.
> Once you write out to a temporary file the performance goes down enormously,
*mumble* Use an OS with real disk caching *mumble* Hmmm, what? I didn't say
anything. :)
> so I want to be able to use as much memory as possible. Do you have any
> suggestions as to how this could be improved?
I believe Programming Pearls has a chapter on this, or something similar,
under merge sort.
> If you're interested, there's a few oddities I discovered en route too:
> 1) The code:
> next if $seen{$_}; $seen{$_}=1;
> according to my benchmarking is faster than:
> unless ($seen{$_}++) {...}
> even though the former looks up $seen{$_} twice, and ++ is a pretty trivial
> operator. It's not where I'd have put my money.
Here's what I get on Linux.
$ perl5.6.1 ~/src/bench/next_vs_unless 200000
Benchmark: timing 200000 iterations of control, next, unless...
   control:  1 wallclock secs ( 0.18 usr +  0.00 sys =  0.18 CPU) @ 1111111.11/s (n=200000)
             (warning: too few iterations for a reliable count)
      next:  2 wallclock secs ( 1.74 usr +  0.00 sys =  1.74 CPU) @ 114942.53/s (n=200000)
    unless:  1 wallclock secs ( 1.17 usr +  0.02 sys =  1.19 CPU) @ 168067.23/s (n=200000)
               Rate    next  unless control
  next     114943/s      --    -32%    -90%
  unless   168067/s     46%      --    -85%
  control 1111111/s    867%    561%      --
When things are this small and this close, changes in OS, perl version, how
perl was compiled, etc... will all cause perl's "speed" to vary. (Benchmark
code attached).
> 2) There is a bug Activestate Perl 5.6.0 and onwards for Win32 where garbage
> collection of %seen=() or %seen=undef is slow. I reported the bug back in
> December, but nothing seems to have happened since.
> http://bugs.activestate.com//ActivePerl/show_bug.cgi?id=18559
> I'm surprised there's been no action on this one as garbage collecting
> hashes I would have thought was critical to object oriented programming.
According to Bugzilla, Guru was unable to reproduce the problem:
------- Comments from Gurusamy Sarathy 2001-12-06 20:02
I don't see a significant difference among any of the 6xx builds,
so I think you are mistaken about 5.6.0 being faster. The 5xx
builds (5.0050x based) could have been faster for subsequent runs,
since they used a different memory allocator.
Thanks for the interesting test case. We'll continue to
investigate this.
--
This sig file temporarily out of order.
#!/usr/bin/perl -w

use Benchmark qw(cmpthese);

cmpthese(shift || -3, {
    next    => sub { do { next if $hash{foo};  $hash{foo} = 1;
                          delete $hash{foo};
                        }
                   },
    unless  => sub { unless( $hash{foo}++ ) { delete $hash{foo} } },
    control => sub { delete $hash{foo} },
});