greg wrote:
João Valverde wrote:

What's lacking is an associative array that preserves ordering, doesn't require a hash function and has fast insertions and deletions in O(log(n)).

Careful here -- you can't get away from the need for
hashability just by using a tree. Even if you don't
need to actually hash the values, it's still important
that the criterion for ordering between objects doesn't
change while they're in the tree, otherwise they'll
be in the wrong place and won't be found by subsequent
lookups.

I'm aware :) Technically it's necessary to define a total ordering on the set of keys.

> I'm genuinely surprised to know
there are no data structures that efficiently support such a common need in Python.

Is it really all that common? If it truly were common,
there probably *would* be something for it in the
stdlib by now.
Obviously my experience differs, but those were my expectations.

What sort of things are you doing that you want such
a structure for? Maybe we can suggest a way of using
the existing data structures to achieve the same
goal.

To answer the question of what I need the BSTs for, without getting into too many boring details it is to merge and sort IP blocklists, that is, large datasets of ranges in the form of (IP address, IP address, string). Originally I was also serializing them in a binary format (but no more after a redesign). I kept the "merge and sort" part as a helper script, but that is considerably simpler to implement.

Please note that I'm happy with my code, it works well. I intended to implement it in C all along, even before trying Python. The python code was a side thing for testing/curiosity/fun. It prompted my original question but that was really about Python and the standard library itself, and I don't wish to waste anyone's time.

As an anecdotal data point (honestly not trying to raise the "Python is slow" strawman), I implemented the same algorithm in C and Python, using pyavl. Round numbers were 4 mins vs 4 seconds, against Python (plus pyavl). Even considering I'm a worse Python programmer than C programmer, it's a lot. I know many will probably think I tried to do "C in Python" but that's not the case, at least I don' t think so. Anyway like I said, not really relevant to this discussion.

--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to