On 9-Apr-09, at 6:24 PM, John Arbash Meinel wrote:
> Greg Ewing wrote:
>> John Arbash Meinel wrote:
>>> And the way intern is currently written, there is a third cost when
>>> the item doesn't exist yet, which is another lookup to insert the
>>> object.
>> That's even rarer still, since it only happens the first time you
>> load a piece of code that uses a given variable name anywhere in any
>> module.
> Somewhat true, though I know it happens 25k times during startup of
> bzr... And I would be a *lot* happier if startup time was 100ms
> instead of 400ms.
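(For context, interning maps equal strings to a single shared object, so that later comparisons can be done by identity and attribute lookups share one key object. A minimal sketch; note that Python 3 spells the builtin as sys.intern, while in the Python 2 of this thread it was simply intern():)

```python
import sys

# Two equal strings built separately...
a = sys.intern("some_attribute_name")
b = sys.intern("some_" + "attribute_name")

# ...come back as the very same object after interning.
assert a is b
```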
I don't want to quash your idealism too severely, but it is extremely
unlikely that you will get anywhere near that kind of speedup by
tweaking string interning. Doing anything (computational) 25k times
just isn't all that much work.
$ python -mtimeit -s 'd=dict.fromkeys(xrange(10000000))' \
    'for x in xrange(25000): d.get(x)'
100 loops, best of 3: 8.28 msec per loop
Perhaps this isn't representative (int hashing is ridiculously cheap,
for instance), but this dict is far bigger than the one you are
dealing with, and so should have similar cache-busting properties.
And yet, 25k accesses (plus python->c dispatching costs which you are
paying with interning) consume only ~10ms. You could do more good by
eliminating a handful of disk seeks by reducing the number of imported
modules...
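(A rough, hedged reproduction of the benchmark above, but with string keys to be a bit closer to what intern() actually touches; the table size and key names here are illustrative, not from the original post, and a smaller dict is used than the 10M-entry one above:)

```python
import timeit

# A large dict of string keys, standing in for the interned-string table.
# 1M entries rather than the 10M above, to keep setup time reasonable.
d = {("name%d" % i): None for i in range(1_000_000)}
keys = ["name%d" % i for i in range(25_000)]

def lookup_all():
    # 25k dict lookups, analogous to 25k intern() hits at startup.
    get = d.get
    for k in keys:
        get(k)

# Best of five runs; on typical hardware this lands in the low
# single-digit milliseconds, supporting the point that the lookups
# alone cannot account for hundreds of milliseconds of startup time.
elapsed = min(timeit.repeat(lookup_all, number=1, repeat=5))
print("25k lookups: %.2f ms" % (elapsed * 1e3))
```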
-Mike
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev