On Fri, Jan 18, 2002 at 01:40:26PM -0800, Steve Fink wrote: > On Fri, Jan 18, 2002 at 10:08:40PM +0200, Jarkko Hietaniemi wrote: > > ints, or 176 bytes. Searching for membership in an inversion list is > > O(N log N) (binary search). "Encoding the whole range" is a non-issue > > bordering on a joke: two ints, or 8 bytes. > > [Clarification from a noncombatant] You meant O(log N). > > I like the inversion list idea. But its speed is proportional to the > toothiness of the character class, and while I have good intuition for > what that means in 7-bit US-ASCII, I have no idea how bad it gets for > other languages. "Vowels"? "Capital letters"? Would anyone ever want > to select all Chinese characters with a particular radical? > > That's just lookup. We should also consider other character class > operations: union, subtraction, intersection. They're pretty
Complement of an inversion list is neat: insert 0 at the beginning (and append max+1), unless there already is one, in which case delete the 0 (and shift the list and delete the max+1). Again, O(N). (One could of course have a bit for a 'negative character class', but that would in turn complicate the computations.) -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen