On Apr 9, 2009, at 12:06 PM, Martin v. Löwis wrote:
> Now that you brought up specific numbers, I tried to verify them,
> and found them correct (although a bit unfortunate); please see my
> test script below. Up to 21800 interned strings, the dict takes (only)
> 384 KiB. It then grows, requiring 1536 KiB. Whether or not having 22k
> interned strings is "typical", I still don't know.
>
> Wrt. your proposed change, I would be worried about maintainability,
> in particular if it would copy parts of the set implementation.
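The quoted figures are consistent with CPython 2.x's dict internals, assuming a 32-bit build (12-byte PyDictEntry), a resize once fill exceeds 2/3 of the table, and 4x growth while the dict holds fewer than 50000 entries. A quick sanity check under those assumptions:

```python
# Sanity-check the quoted 384 KiB / 1536 KiB figures against CPython 2.x
# dict behaviour (assumptions: 32-bit build, 12-byte PyDictEntry,
# resize past 2/3 fill, 4x growth while used < 50000).
ENTRY_SIZE = 12            # sizeof(PyDictEntry) on a 32-bit build
slots = 32768              # table size before the resize

# 32768 slots * 12 bytes = 384 KiB, as quoted.
assert slots * ENTRY_SIZE == 384 * 1024

# 21800 entries sits just under the 2/3 resize threshold (21845)...
threshold = (2 * slots) // 3
assert 21800 <= threshold

# ...and one 4x resize later the table occupies 1536 KiB, as quoted.
assert 4 * slots * ENTRY_SIZE == 1536 * 1024
```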


I attached gdb to one of our processes, chosen at random; it has been running for a typical amount of time and is currently at ~300MB RSS.

(gdb) p *(PyDictObject*)interned
$2 = {ob_refcnt = 1,
      ob_type = 0x8121240,
      ma_fill = 97239,
      ma_used = 95959,
      ma_mask = 262143,
      ma_table = 0xa493c008,
      ....}

Going from 3MB to 2.25MB isn't much, but it's not nothing, either.
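For reference, ma_mask = 262143 implies a 262144-slot table, and at 12 bytes per PyDictEntry on a 32-bit build that works out to the 3MB above; a set-style entry that drops the unused value pointer is where the saving would come from. A back-of-the-envelope check, assuming that entry layout:

```python
# Back-of-the-envelope for the gdb numbers above (assuming a 32-bit
# build with 12-byte dict entries: hash, key, and value fields).
ma_mask = 262143
slots = ma_mask + 1                    # table size is ma_mask + 1

dict_bytes = slots * 12                # hash + key + value per slot
assert dict_bytes == 3 * 1024 * 1024   # the 3MB cited above

# Load factor: ~96k interned strings in a 256k-slot table.
ma_used = 95959
load = ma_used / float(slots)          # roughly 0.37
assert 0.3 < load < 0.4
```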

I'd be skeptical of cache-performance arguments: the strings used in any particular bit of code should be spread pretty much evenly throughout the hash table, and 3MB is solidly bigger than any L2 cache I know of. You should be able to get meaningful numbers out of a C profiler, but I'd be surprised to see the act of interning take a noticeable amount of time.
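As a crude stand-in for the C profiler suggested above, one can at least microbenchmark the Python-visible cost of interning (written here with `sys.intern`, the Python 3 spelling of what Python 2 exposed as the builtin `intern()`); this is a sketch, not the C-level measurement:

```python
import sys
import timeit

# Microbenchmark re-interning an already-interned string: each call is
# essentially one lookup in the interned-strings table.
setup = "import sys; s = sys.intern('some_identifier_name')"
t_intern = timeit.timeit("sys.intern(s)", setup=setup, number=100000)
print("100k intern calls: %.4f s" % t_intern)
```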

-jake
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev