On Jan 23, 1:24 am, Terry Reedy <tjre...@udel.edu> wrote: > On 1/23/2010 3:23 AM, Steve Howell wrote: > > > On Jan 23, 12:13 am, Terry Reedy<tjre...@udel.edu> wrote: > > >> Challenge yes, mock no. > > >> Part of writing good basic data structures is not adding needless > >> complication from featuritis and not penalizing 99.99% of access to > >> satify a .01% need better satisfied another way. > > > I would like to challenge your assertion that advancing ob_item > > instead of doing memmove during list_ass_slice would impact the > > performance of list accesses in any way. It would only slow down > > operations that add/insert items into the list by, and then only by a > > single conditional statement, and those add/insert operations are > > already O(N) to begin with. > > If you try writing a full patch, as I believe someone did, or at least a > prototype thereof, when the idea was discussed, you might have a better > idea of what the tradeoffs are and why it was rejected. > > For instance, when you append to a full list, it is resized. I believe > it is now doubled, but the policy has varied over the years. If you turn > list from essentially a stack to a deque, you complicate the resizing > policy and have to consider the addition of a shift policy. Do you split > the over-allocated fore and aft or increase the overallocation from > double to, say, triple? If the former, then for normal usage that never > uses the fore part, the over-allocation has been effectively reduced > from 2x to 1.5x (which is about what it once was, I believe). This means > more frequent resizings and copyings, which means slower operation for > most use cases. Adding extra usually wasted space is also a nuisance. >
It looks like most of the complication would be in list_resize. I'm gonna oversimplify a bit, but tell me if this is the gist of it. I would have ob_item continue to always refer to first element of the list, and then I'd have to introduce another variable to refer to the start of our allocated memory, ob_start_memory, whenever you do a realloc/free/malloc. I'd have a notion of fore_wastage, which would either be a variable I maintain or something that I just calculate as needed from the difference of ob_item and ob_start_memory. In deciding whether I want to give memory back to the memory manager, I just need to adjust my calculations to account for fore and aft wastage to see if it's time to do a shrink, and before shrinking, I do the memmove. On growth, I would just always do a memmove right away if there is fore_wastage, and then do the normal procedure for aft wastage. For the most common scenario of append, append, append, the only penalty is having to skip over fore_wastage logic by checking for fore_wastage == 0 or ob_item == ob_start_memory. For the scenario of several appends followed by several pops, I get the big win of only doing log 2 N memmoves instead of N as I shrink the list down to zero. If I start alternating between pops and appends, it's basically a wash...instead of doing the memmove on the pop, I do it on the next append. If I were to pop the top element and then prepend a new element, to be truly efficient, I'd want to use reserved right away, but for simplicity, I would probably not complicate list_ass_slice and just do the normal resize() dance, which would give me memmove in one direction followed immediately by a memmove in the other direction when I come back to list_ass_slice. (But it would all still be a wash, since I would have to do the same number of memmoves in the current implementation.) A lot of the essential complexity here seems to come from the fact that realloc() isn't a very robust abstraction. It seems to be expensive to tell it you want to shrink, and it also does not have an interface to tell it to give you a little growing room. On the other hand, the code within list_resize() actually provides a nice framework for amortizing memmoves exponentially. -- http://mail.python.org/mailman/listinfo/python-list