On Jun 17, 12:28 am, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: > > My guess is that the two main memory allocate/deallocate cases are 1) > > appending a new item to the end, and 2) GC'ing the entire data > > structure. I would optimize these 2 at the expense of all others. > > Does that include dictionary lookups? > > Regards, > Martin
Well, I did (try to) qualify my guess as pertaining specifically to memory allocate/deallocate cases, the other topics in the nearby posts were talking about updates to the odict, so I hope I can be forgiven this oversight. But you're right, the keyed access is one of the main reasons this is being done with an odict instead of just a list of tuples. The point I was trying to make was that sometimes, the GC case occurs far more frequently than other update methods, yet is overlooked because it happens invisibly to most Python users. I worked on a project a while ago where there was a huge performance penalty in releasing a list structure, because each node was freed individually, causing surrounding freelist entries to be updated for every node. In typical worst-case scenario fashion, a second list was built at the same time as the first, and the default system memory allocator interleaved the list nodes. The truly frustrating part was that at this point in the program, we were trying to free *all* the nodes in *both* lists, and would have preferred to just release the whole chunk of memory, if it had just been allocated in a large chunk instead of a node at a time. I implemented a zone-based memory allocator, so that nodes were allocated within a memory zone, but when we were done, we just released the entire zone, with tremendous improvement in system performance. But what kind of keyed access is likely to be used on an odict? For those cases where all of the items are desired, then I would recommend that the odict iterator be preferred, for this type of retrieval: for k,v in odictvar.iteritems(): # do something with key k and value v and discourage this usage: for k in odictvar.keys(): # or iterkeys() # do something with key k and value odictvar[k] How about this as a simple first approach? Let's assume that the nominal usage pattern for an odict is 1) create all entries in the odict, probably using append, 2) access some or all of the entries, either by iterating over the keys or values, or by keyed access to specific values, and 3) delete the odict. Have the odict keep an internal dict representing the keyed lookups, and have this internal dict set to null whenever the odict is updated. When the odict is accessed by specific key, the internal dict is used - if it null, it is built using dict(odict.iteritems()). In the nominal usage, this dict will only be built once, since the keyed accesses wont begin until after all of the key-value pairs have been added to the odict. Or as a first-first approach (to avoid premature optimization), just implement the odict as a list of tuples, and do linear search for matching key if accessed by key. I would bias any performance choices about keyed access toward cases where the queried key exists in the odict - I think that in those cases where odicts are used, such as ORMs and XML, the keys for a particular odict are known and expected to exist. Those applications where the keys are not known are likely to be meta-type apps, like debuggers and introspection GUIs. I contend that the meat-and- potatoes production apps will be those like database queries - when I get an odict back after querying an EMPLOYEES table, I really should reasonably expect odictvar["EMPNO"] to exist. And what would be the conditions of an abusive form of odict? How about an application log kept in an odict, keyed by timestamp? This could have many 1000's of entries, and yet, I would guess that an odict would be the wrong data structure for this, and that a list of tuples or LogMessage objects would be more appropriate. But someone is likely to do it - if they do, what will happen? Perhaps trying to anticipate "abusive" or degenerate uses of an odict might shed some light on ways to avoid, discourage, or maybe even accommodate these uses in odict. Well, this has become something of a rant, and a speculative one at that, but I think the "delete the entire data structure" memory case should be given reasonably high design attention. (And maybe this is already the norm in Python development - I'm really quite ignorant of this process, not being in the Python development circle.) Always nice to hear from you Martin - cheers! -- Paul -- http://mail.python.org/mailman/listinfo/python-list