On Fri, Jan 18, 2002 at 10:08:40PM +0200, Jarkko Hietaniemi wrote:
> ints, or 176 bytes. Searching for membership in an inversion list is
> O(N log N) (binary search).  "Encoding the whole range" is a non-issue
> bordering on a joke: two ints, or 8 bytes.

[Clarification from a noncombatant] You meant O(log N).

I like the inversion list idea. But its speed is proportional to the
toothiness of the character class, and while I have good intuition for
what that means in 7-bit US-ASCII, I have no idea how bad it gets for
other languages. "Vowels"? "Capital letters"? Would anyone ever want
to select all Chinese characters with a particular radical?

That's just lookup. We should also consider other character class
operations: union, subtraction, intersection. They're pretty
straightforward and fast (O(N)) for inversion lists. (Yes, all these
operations can be postponed until lookup time, regardless of the
underlying represention, in which case the time of union(C1,C2) is
just the time of C1 + time of C2 + time of an 'or'.)

Reply via email to