On Apr 9, 2009, at 12:06 PM, Martin v. Löwis wrote:
> Now that you brought up specific numbers, I tried to verify them,
> and found them correct (although a bit unfortunate); please see my
> test script below. Up to 21800 interned strings, the dict takes (only)
> 384 KiB. It then grows, requiring 1536 KiB. Whether or not having 22k
> interned strings is "typical", I still don't know.
>
> Wrt. your proposed change, I would be worried about maintainability,
> in particular if it would copy parts of the set implementation.
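The quoted figures are consistent with CPython 2.x's dict internals, assuming a 32-bit build (12-byte PyDictEntry), a resize once fill exceeds 2/3 of the table, and 4x growth while the dict holds fewer than 50000 entries. A quick sanity check under those assumptions:

```python
# Sanity-check the quoted 384 KiB / 1536 KiB figures against CPython 2.x
# dict behaviour (assumptions: 32-bit build, 12-byte PyDictEntry,
# resize past 2/3 fill, 4x growth while used < 50000).
ENTRY_SIZE = 12            # sizeof(PyDictEntry) on a 32-bit build
slots = 32768              # table size before the resize

# 32768 slots * 12 bytes = 384 KiB, as quoted.
assert slots * ENTRY_SIZE == 384 * 1024

# 21800 entries sits just under the 2/3 resize threshold (21845)...
threshold = (2 * slots) // 3
assert 21800 <= threshold

# ...and one 4x resize later the table occupies 1536 KiB, as quoted.
assert 4 * slots * ENTRY_SIZE == 1536 * 1024
```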


I attached gdb to one of our processes, chosen at random; it has been running for a typical amount of time and is currently at ~300MB RSS.

(gdb) p *(PyDictObject*)interned
$2 = {ob_refcnt = 1,
      ob_type = 0x8121240,
      ma_fill = 97239,
      ma_used = 95959,
      ma_mask = 262143,
      ma_table = 0xa493c008,
      ....}

Going from 3MB to 2.25MB isn't much, but it's not nothing, either.
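For reference, ma_mask = 262143 implies a 262144-slot table, and at 12 bytes per PyDictEntry on a 32-bit build that works out to the 3MB above; a set-style entry that drops the unused value pointer is where the saving would come from. A back-of-the-envelope check, assuming that entry layout:

```python
# Back-of-the-envelope for the gdb numbers above (assuming a 32-bit
# build with 12-byte dict entries: hash, key, and value fields).
ma_mask = 262143
slots = ma_mask + 1                    # table size is ma_mask + 1

dict_bytes = slots * 12                # hash + key + value per slot
assert dict_bytes == 3 * 1024 * 1024   # the 3MB cited above

# Load factor: ~96k interned strings in a 256k-slot table.
ma_used = 95959
load = ma_used / float(slots)          # roughly 0.37
assert 0.3 < load < 0.4
```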

I'd be skeptical of cache-performance arguments: the strings used in any particular bit of code should be spread pretty much evenly throughout the hash table, and 3MB is solidly bigger than any L2 cache I know of. You should be able to get meaningful numbers out of a C profiler, but I'd be surprised to see the act of interning take a noticeable amount of time.
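As a crude stand-in for the C profiler suggested above, one can at least microbenchmark the Python-visible cost of interning (written here with `sys.intern`, the Python 3 spelling of what Python 2 exposed as the builtin `intern()`); this is a sketch, not the C-level measurement:

```python
import sys
import timeit

# Microbenchmark re-interning an already-interned string: each call is
# essentially one lookup in the interned-strings table.
setup = "import sys; s = sys.intern('some_identifier_name')"
t_intern = timeit.timeit("sys.intern(s)", setup=setup, number=100000)
print("100k intern calls: %.4f s" % t_intern)
```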

-jake
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev