Python recipes: list mixin, improved timeit, etc

2005-10-07 Thread barnesc

>So mixins are just a sub-class [pun intended] of sub-classing?
>
>I've just found this:
>
>[quote]
>A mixin class is a parent class that is inherited from - but not as
>a means of specialization. Typically, the mixin will export services to a
>child class, but no semantics will be implied about the child "being a
>kind of" the parent.
>[end quote]
>
>from http://c2.com/cgi/wiki?MixIn
>
>Is that all they are?
>
>It is amazing how you can take the simplest concept, and by using
>appropriate terminology, make it as confusing and opaque as you want...
>
>*wink*
>

"A mixin is an atomic unit in an object-oriented language that adds
functionality to another class."

 - http://www.macromedia.com/support/documentation/en/flex/1/mixin/
   mixin2.html#118542

The only experience I've had with mixins is in Python, where
UserDict has a class DictMixin that defines the full dictionary
interface from a minimal subset of dictionary methods.

The Python docs don't define mixin.  I just assumed this was a case
where a programmer needed a name more descriptive than foo, so he
called it mixin, and that stuck.  :)

 - Connelly Barnes
-- 
http://mail.python.org/mailman/listinfo/python-list


Python recipes: list mixin, improved timeit, etc

2005-10-06 Thread barnesc

I added some recipes to the Python Cookbook:

 - listmixin

   Use ListMixin to create custom list classes from a small subset of
   list methods:

   http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/440656

 - pytime

   Improves on timeit by allowing you to time a function directly
   (no need for confusing import and exec B.S.), and not requiring
   you to specify the number of iterations for the timing loop.

   http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/440657

 - bitlist

   An example of listmixin: A memory compacted list of bits, uses
   1 byte per every 8 elements.

   http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/440658


I hope you find these useful.

You can find more Python stuff by myself (and other misc thoughts) at
my blog (http://barnesc.blogspot.com/).

 - Connelly Barnes
-- 
http://mail.python.org/mailman/listinfo/python-list


Builtin classes list, set, dict reimplemented via B-trees

2005-09-14 Thread barnesc

Nifty, Tim Peters responded to my e-mail.  I must've said something
interesting.  Cool, a PyCelebrity!

>[barnesc at engr.orst.edu]
>> ...
>> I've gotten bored and went back to one of my other projects:
>> reimplementing the Python builtin classes list(), set(), dict(),
>> and frozenset() with balanced trees (specifically, counted B-trees
>> stored in memory).
>>
>> In short, this allows list lookup, insertion, deletion in O(log(N))
>> time.  It allows the set and dictionary types to be maintained in
>> order, and insert/lookup/remove operations here take O(log(N)) time
>> as well.  Getting the k least or k greatest elements takes
>> O(log(N)+k) time.
>
>Note that BTrees for Python have been part of ZODB for many years:
>
>Section 5.3, _BTrees Package_
>http://www.zope.org/Wikis/ZODB/FrontPage/guide/node6.html
>
>If you install ZODB, you can use its BTrees as in-memory data
>structures without using any of the rest of ZODB.  If you want to
>persist them, great, that's what ZODB is for.
>
>Note that ZODB's are really B+ trees, so iterating over the smallest k
>takes O(k) time.  As an extension to Python's mapping protocol, the
>keys/values/items/iterkeys/itervalues/iteritems methods also accept
>optional lower and upper bounds on the keys to return.
>
>A gotcha:  For scalability in multiprocess database apps, ZODB's
>BTrees do not store their size.  As a result, len(a_ZODB_BTree) takes
>time linear in the number of elements.

Thanks for the link!

>> ...
>> So my question is: are there any other *practical* applications of a
>> B-tree based list/set/dict ?
>
>Yes.
>
>> In other words, is this module totally worth coding,
>
>That's a different question entirely ;-)
>
>> or is it just academic wankery and theoretical flim flam ? :)
>
>It's probably not a way to get rich quick.
>

Well, the Tim Peters PyCelebrity experience was fun, but not quite
as exhilarating as advertised.  I'm not sure what to say.  Did I get
my money's worth?  There were no s, even a little sarcasm.  I
guess, overall, yeah it was cool, it was worthwhile, it was fun, man,
it was a life-changing moment!

Thanks for tuning in to PyCelebrity weekly.  We're always glad to
be part of your inexplicable existence in The Hostile And Indifferent
Universe (Incorporated).  Contributions come from hackers like you!

 Reporting live from Python-list, this is...Connelly Barnes.

 Next up: Startling photographs show that Guido van Rossum was
  KIDNAPPED BY ALIENS in 1990!  Rumor has it that his
  superhuman language design skills are really due to
  neuroimplanted nanotube processors!
-- 
http://mail.python.org/mailman/listinfo/python-list


Builtin classes list, set, dict reimplemented via B-trees

2005-09-14 Thread barnesc

>barnesc at engr.orst.edu wrote:
> > So my question is: are there any other *practical* applications of a
> > B-tree based list/set/dict ?  In other words, is this module totally
> > worth coding, or is it just academic wankery and theoretical flim
> > flam ? :)
>
>B-trees are specifically designed for disk storage. Seeking to a
>page takes much longer than reading a page of several kilobytes.
>B-trees read the keys for a many-way comparison in one seek.
>
>For in-memory comparison-based dictionaries, Red-Black trees,
>AVL trees, 2-3 trees, or skip-lists, are a better choice.
>
>
>--
>--Bryan

-
Memory Usage
-

Here overhead is compared to a C array of > 1 million PyObject *s.

The B-tree is assumed to have a branch factor of ~1024.

A B-tree has an average of 1.5x overhead (2x at worst, 1x at best,
  using malloc).  This assumes that B-tree nodes contain a fixed
  size C array for storing values (which will be half-full on
  average), and that the child array is only allocated for internal
  nodes as needed.
Any balanced binary tree with left and right child pointers has
  a 6x overhead (using malloc, measured via a gcc win32 test program),
  or with a free list, a 3x overhead (though memory will not
  be returned until the data structure's destructor is called).
  I haven't tried using PyObject_Malloc(), but it should be in
  between these two factors.
A skip list has n values and n next pointers on the bottom level.
  In the ideal case (with respect to memory usage), we make the
  probability sufficiently small so that the higher levels take up a
  negligible amount of memory.  Thus we have a 4x overhead (using
  malloc, measured via a gcc win32 test program), or with a free list,
  an overhead slightly greater than 2x (again, memory will only be
  returned when the destructor is called).  I didn't test
  PyObject_Malloc(), but it should give an overhead between 2x and 4x.

Thus, on average, a > 1 million element B-tree uses 25% less memory
than any other balanced data structure which I am aware of, and 50%
more memory than a raw C array.


-
Speed
-

I have no idea how B-trees compare to skip lists (the likely
contender) in terms of speed.

To do this, one would need two high performance C implementations.

>From this, the Python performance can presumably be deduced (although
caching causes unpredictable effects, the faster C algorithm should be
faster in Python).

You could start with optimized parallel C implementations, such as
the ones by John-Mark Gurney:

 * http://resnet.uoregon.edu/~gurney_j/jmpc/btree.html
 * http://resnet.uoregon.edu/~gurney_j/jmpc/skiplist.html

He indicates that the skip list is slower than the B-tree, but doesn't
give any solid numbers.  Of course you'd have to check his code and
optimize it to death (Perhaps representing a sorted set of integers
using each respective data structure would be a good test).

Also note that my B-tree code currently has optimized special cases
for block insertions, deletions, and retrievals, since this is easy
(er, well, relatively speaking) to do in the B-tree case.  In the skip
list case, block retrievals are easy (though not as fast as B-tree
block retrievals due to pointer indirection and cache problems).  I
have no idea how to do a fast block insert or delete on a skip list.

--

Well, enough ivory tower silliness and academic blustering.



Are there any *practical* applications for in-memory balanced data
structures (e.g. skip list, AVL tree, RB tree, B tree) ?



 - Connelly Barnes
-- 
http://mail.python.org/mailman/listinfo/python-list


Builtin classes list, set, dict reimplemented via B-trees

2005-09-14 Thread barnesc

Hi again,

Since my linear algebra library appears not to serve any practical
need (I found cgkit, and that works better for me), I've gotten bored
and went back to one of my other projects: reimplementing the Python
builtin classes list(), set(), dict(), and frozenset() with balanced
trees (specifically, counted B-trees stored in memory).

In short, this allows list lookup, insertion, deletion in O(log(N))
time.  It allows the set and dictionary types to be maintained in
order, and insert/lookup/remove operations here take O(log(N)) time
as well.  Getting the k least or k greatest elements takes
O(log(N)+k) time.

I'm about 50% done with the implementation of this module.

I thought I'd use this for Dijkstra's algorithm and some schedulers,
but then I noticed that a binary heap can be retrofitted to have the
DECREASE-KEY operation (see e.g. David Eppstein's priorityQueue).
Thus my B-tree based classes aren't really necessary, since there are
simpler heap implementations that do the same thing.

So my question is: are there any other *practical* applications of a
B-tree based list/set/dict ?  In other words, is this module totally
worth coding, or is it just academic wankery and theoretical flim
flam ? :)

Thanks for your comments/opinions/etc...

 - Connelly Barnes
   'Y29ubmVsbHliYXJuZXNAeWFob28uY29t'.decode('base64')


-
Detailed overview of the bcollections module (from the docstring)
-

"""
...

The bset() and bdict() classes improve upon the builtin set and
dictionary types by maintaining the elements in sorted order.

Generally, the b* classes are a factor of O(log(N)) slower[2] than the
corresponding builtin classes, except for certain operations:

 - For a bset(), it takes O(log(N)) time to get the minimum, maximum,
   or ith element.  Also, it takes O(log(N)+k) time to get the
   k first, k last, or any slice of k elements from the sorted set.
 - For a bdict(), dictionary keys can be retrieved in sorted order
   with the above time bounds.
 - For a blist(), items can be inserted and removed in O(log(N)) time.
   It takes O(log(N)+k) time to get, set, or delete a slice of k
   elements.

...

blist  - List implemented via B-tree.
bset   - Set implemented via B-tree.
 Additional methods:
   - s.min()- Minimum element
   - s.max()- Maximum element
   - s.index(x) - Index of x in sorted list of
  elements.
   - s.next(x)  - Least element which is > x
  (x need not be in the set).
   - s.previous(x)  - Greatest element which is < x
  (x need not be in the set).
   - s[i]   - Get element i from sorted list.
   - s[i:j] - Get slice from sorted list.
bfrozenset - Immutable set implemented via B-tree.
bdict  - Dict implemented via B-tree.
 Additional methods:
   - a.min()- Minimum key
   - a.max()- Maximum key
   - a.index(k) - Index of k in sorted list of
  keys.
   - s.next(x)  - Least key which is > x
  (x need not be an existing key).
   - s.previous(x)  - Greatest key which is < x
  (x need not be an existing key).
   - a.get_key_at(i)- Get key at index i from sorted
  list of keys.
   - a.get_key_at(i, j) - Get slice from sorted key list.
"""
-- 
http://mail.python.org/mailman/listinfo/python-list


Python linear algebra module -- requesting comments on interface

2005-09-09 Thread barnesc

Thanks for your commentary.

The module will be public domain.

I fixed the broken link (epydoc was inserting backslashes in URLs),
changed the default arguments to None as you suggested, and added
a randfunc=None and randargs=() default argument for random_matrix()
and random_vector() (the matrix is populated with randfunc(*randargs)
entries if randfunc is not None).

 - Connelly Barnes
   E-mail address: 'Y29ubmVsbHliYXJuZXNAeWFob28uY29t\n'.
   decode('base64')

>C. Barnes wrote:
>> Hi, I'm in the process of writing a Python linear
>> algebra module.
>>
>> The current targeted interface is:
>>   http://oregonstate.edu/~barnesc/temp/linalg/
>
>Is this going to become free software. If yes, what license
>will you use?
>
>
>So my suggestions:
>
>In cases like these ones:
>
>   random_matrix(m, n=-1)
>   zero_matrix(m, n=-1)
>
>.. I think it's better to set the default value to "None"
>instead of a number:
>
>   random_matrix(m, n=None)
>   zero_matrix(m, n=None)
>
>IMHO, this is more intuitive and more "pythonic".
>
>I also suggest to make the "random function" choosable:
>
>   random_matrix(m, n=None, randfunc=random.random)
>   random_vector(n, randfunc=random.random)
>
>This way it's more easy for those who want another range
>of numbers, or want another kind of distribution of the
>random numbers.
>
>
>At the top of your documentation, there is a link "overview",
>which is broken:
>
>   See _overview_ for a quick start.
>
>
>Greets,
>
>   Volker
>
>--
>Volker Grabsch
-- 
http://mail.python.org/mailman/listinfo/python-list