Python recipes: list mixin, improved timeit, etc
>So mixins are just a sub-class [pun intended] of sub-classing? > >I've just found this: > >[quote] >A mixin class is a parent class that is inherited from - but not as >a means of specialization. Typically, the mixin will export services to a >child class, but no semantics will be implied about the child "being a >kind of" the parent. >[end quote] > >from http://c2.com/cgi/wiki?MixIn > >Is that all they are? > >It is amazing how you can take the simplest concept, and by using >appropriate terminology, make it as confusing and opaque as you want... > >*wink* > "A mixin is an atomic unit in an object-oriented language that adds functionality to another class." - http://www.macromedia.com/support/documentation/en/flex/1/mixin/ mixin2.html#118542 The only experience I've had with mixins is in Python, where UserDict has a class DictMixin that defines the full dictionary interface from a minimal subset of dictionary methods. The Python docs don't define mixin. I just assumed this was a case where a programmer needed a name more descriptive than foo, so he called it mixin, and that stuck. :) - Connelly Barnes -- http://mail.python.org/mailman/listinfo/python-list
Python recipes: list mixin, improved timeit, etc
I added some recipes to the Python Cookbook: - listmixin Use ListMixin to create custom list classes from a small subset of list methods: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/440656 - pytime Improves on timeit by allowing you to time a function directly (no need for confusing import and exec B.S.), and not requiring you to specify the number of iterations for the timing loop. http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/440657 - bitlist An example of listmixin: A memory compacted list of bits, uses 1 byte per every 8 elements. http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/440658 I hope you find these useful. You can find more Python stuff by myself (and other misc thoughts) at my blog (http://barnesc.blogspot.com/). - Connelly Barnes -- http://mail.python.org/mailman/listinfo/python-list
Builtin classes list, set, dict reimplemented via B-trees
Nifty, Tim Peters responded to my e-mail. I must've said something interesting. Cool, a PyCelebrity! >[barnesc at engr.orst.edu] >> ... >> I've gotten bored and went back to one of my other projects: >> reimplementing the Python builtin classes list(), set(), dict(), >> and frozenset() with balanced trees (specifically, counted B-trees >> stored in memory). >> >> In short, this allows list lookup, insertion, deletion in O(log(N)) >> time. It allows the set and dictionary types to be maintained in >> order, and insert/lookup/remove operations here take O(log(N)) time >> as well. Getting the k least or k greatest elements takes >> O(log(N)+k) time. > >Note that BTrees for Python have been part of ZODB for many years: > >Section 5.3, _BTrees Package_ >http://www.zope.org/Wikis/ZODB/FrontPage/guide/node6.html > >If you install ZODB, you can use its BTrees as in-memory data >structures without using any of the rest of ZODB. If you want to >persist them, great, that's what ZODB is for. > >Note that ZODB's are really B+ trees, so iterating over the smallest k >takes O(k) time. As an extension to Python's mapping protocol, the >keys/values/items/iterkeys/itervalues/iteritems methods also accept >optional lower and upper bounds on the keys to return. > >A gotcha: For scalability in multiprocess database apps, ZODB's >BTrees do not store their size. As a result, len(a_ZODB_BTree) takes >time linear in the number of elements. Thanks for the link! >> ... >> So my question is: are there any other *practical* applications of a >> B-tree based list/set/dict ? > >Yes. > >> In other words, is this module totally worth coding, > >That's a different question entirely ;-) > >> or is it just academic wankery and theoretical flim flam ? :) > >It's probably not a way to get rich quick. > Well, the Tim Peters PyCelebrity experience was fun, but not quite as exhilarating as advertised. I'm not sure what to say. Did I get my money's worth? There were no s, even a little sarcasm. I guess, overall, yeah it was cool, it was worthwhile, it was fun, man, it was a life-changing moment! Thanks for tuning in to PyCelebrity weekly. We're always glad to be part of your inexplicable existence in The Hostile And Indifferent Universe (Incorporated). Contributions come from hackers like you! Reporting live from Python-list, this is...Connelly Barnes. Next up: Startling photographs show that Guido van Rossum was KIDNAPPED BY ALIENS in 1990! Rumor has it that his superhuman language design skills are really due to neuroimplanted nanotube processors! -- http://mail.python.org/mailman/listinfo/python-list
Builtin classes list, set, dict reimplemented via B-trees
>barnesc at engr.orst.edu wrote: > > So my question is: are there any other *practical* applications of a > > B-tree based list/set/dict ? In other words, is this module totally > > worth coding, or is it just academic wankery and theoretical flim > > flam ? :) > >B-trees are specifically designed for disk storage. Seeking to a >page takes much longer than reading a page of several kilobytes. >B-trees read the keys for a many-way comparison in one seek. > >For in-memory comparison-based dictionaries, Red-Black trees, >AVL trees, 2-3 trees, or skip-lists, are a better choice. > > >-- >--Bryan - Memory Usage - Here overhead is compared to a C array of > 1 million PyObject *s. The B-tree is assumed to have a branch factor of ~1024. A B-tree has an average of 1.5x overhead (2x at worst, 1x at best, using malloc). This assumes that B-tree nodes contain a fixed size C array for storing values (which will be half-full on average), and that the child array is only allocated for internal nodes as needed. Any balanced binary tree with left and right child pointers has a 6x overhead (using malloc, measured via a gcc win32 test program), or with a free list, a 3x overhead (though memory will not be returned until the data structure's destructor is called). I haven't tried using PyObject_Malloc(), but it should be in between these two factors. A skip list has n values and n next pointers on the bottom level. In the ideal case (with respect to memory usage), we make the probability sufficiently small so that the higher levels take up a negligible amount of memory. Thus we have a 4x overhead (using malloc, measured via a gcc win32 test program), or with a free list, an overhead slightly greater than 2x (again, memory will only be returned when the destructor is called). I didn't test PyObject_Malloc(), but it should give an overhead between 2x and 4x. Thus, on average, a > 1 million element B-tree uses 25% less memory than any other balanced data structure which I am aware of, and 50% more memory than a raw C array. - Speed - I have no idea how B-trees compare to skip lists (the likely contender) in terms of speed. To do this, one would need two high performance C implementations. >From this, the Python performance can presumably be deduced (although caching causes unpredictable effects, the faster C algorithm should be faster in Python). You could start with optimized parallel C implementations, such as the ones by John-Mark Gurney: * http://resnet.uoregon.edu/~gurney_j/jmpc/btree.html * http://resnet.uoregon.edu/~gurney_j/jmpc/skiplist.html He indicates that the skip list is slower than the B-tree, but doesn't give any solid numbers. Of course you'd have to check his code and optimize it to death (Perhaps representing a sorted set of integers using each respective data structure would be a good test). Also note that my B-tree code currently has optimized special cases for block insertions, deletions, and retrievals, since this is easy (er, well, relatively speaking) to do in the B-tree case. In the skip list case, block retrievals are easy (though not as fast as B-tree block retrievals due to pointer indirection and cache problems). I have no idea how to do a fast block insert or delete on a skip list. -- Well, enough ivory tower silliness and academic blustering. Are there any *practical* applications for in-memory balanced data structures (e.g. skip list, AVL tree, RB tree, B tree) ? - Connelly Barnes -- http://mail.python.org/mailman/listinfo/python-list
Builtin classes list, set, dict reimplemented via B-trees
Hi again, Since my linear algebra library appears not to serve any practical need (I found cgkit, and that works better for me), I've gotten bored and went back to one of my other projects: reimplementing the Python builtin classes list(), set(), dict(), and frozenset() with balanced trees (specifically, counted B-trees stored in memory). In short, this allows list lookup, insertion, deletion in O(log(N)) time. It allows the set and dictionary types to be maintained in order, and insert/lookup/remove operations here take O(log(N)) time as well. Getting the k least or k greatest elements takes O(log(N)+k) time. I'm about 50% done with the implementation of this module. I thought I'd use this for Dijkstra's algorithm and some schedulers, but then I noticed that a binary heap can be retrofitted to have the DECREASE-KEY operation (see e.g. David Eppstein's priorityQueue). Thus my B-tree based classes aren't really necessary, since there are simpler heap implementations that do the same thing. So my question is: are there any other *practical* applications of a B-tree based list/set/dict ? In other words, is this module totally worth coding, or is it just academic wankery and theoretical flim flam ? :) Thanks for your comments/opinions/etc... - Connelly Barnes 'Y29ubmVsbHliYXJuZXNAeWFob28uY29t'.decode('base64') - Detailed overview of the bcollections module (from the docstring) - """ ... The bset() and bdict() classes improve upon the builtin set and dictionary types by maintaining the elements in sorted order. Generally, the b* classes are a factor of O(log(N)) slower[2] than the corresponding builtin classes, except for certain operations: - For a bset(), it takes O(log(N)) time to get the minimum, maximum, or ith element. Also, it takes O(log(N)+k) time to get the k first, k last, or any slice of k elements from the sorted set. - For a bdict(), dictionary keys can be retrieved in sorted order with the above time bounds. - For a blist(), items can be inserted and removed in O(log(N)) time. It takes O(log(N)+k) time to get, set, or delete a slice of k elements. ... blist - List implemented via B-tree. bset - Set implemented via B-tree. Additional methods: - s.min()- Minimum element - s.max()- Maximum element - s.index(x) - Index of x in sorted list of elements. - s.next(x) - Least element which is > x (x need not be in the set). - s.previous(x) - Greatest element which is < x (x need not be in the set). - s[i] - Get element i from sorted list. - s[i:j] - Get slice from sorted list. bfrozenset - Immutable set implemented via B-tree. bdict - Dict implemented via B-tree. Additional methods: - a.min()- Minimum key - a.max()- Maximum key - a.index(k) - Index of k in sorted list of keys. - s.next(x) - Least key which is > x (x need not be an existing key). - s.previous(x) - Greatest key which is < x (x need not be an existing key). - a.get_key_at(i)- Get key at index i from sorted list of keys. - a.get_key_at(i, j) - Get slice from sorted key list. """ -- http://mail.python.org/mailman/listinfo/python-list
Python linear algebra module -- requesting comments on interface
Thanks for your commentary. The module will be public domain. I fixed the broken link (epydoc was inserting backslashes in URLs), changed the default arguments to None as you suggested, and added a randfunc=None and randargs=() default argument for random_matrix() and random_vector() (the matrix is populated with randfunc(*randargs) entries if randfunc is not None). - Connelly Barnes E-mail address: 'Y29ubmVsbHliYXJuZXNAeWFob28uY29t\n'. decode('base64') >C. Barnes wrote: >> Hi, I'm in the process of writing a Python linear >> algebra module. >> >> The current targeted interface is: >> http://oregonstate.edu/~barnesc/temp/linalg/ > >Is this going to become free software. If yes, what license >will you use? > > >So my suggestions: > >In cases like these ones: > > random_matrix(m, n=-1) > zero_matrix(m, n=-1) > >.. I think it's better to set the default value to "None" >instead of a number: > > random_matrix(m, n=None) > zero_matrix(m, n=None) > >IMHO, this is more intuitive and more "pythonic". > >I also suggest to make the "random function" choosable: > > random_matrix(m, n=None, randfunc=random.random) > random_vector(n, randfunc=random.random) > >This way it's more easy for those who want another range >of numbers, or want another kind of distribution of the >random numbers. > > >At the top of your documentation, there is a link "overview", >which is broken: > > See _overview_ for a quick start. > > >Greets, > > Volker > >-- >Volker Grabsch -- http://mail.python.org/mailman/listinfo/python-list