Re: on parrot strings

Jarkko Hietaniemi Fri, 18 Jan 2002 14:08:07 -0800

On Fri, Jan 18, 2002 at 01:40:26PM -0800, Steve Fink wrote:
> On Fri, Jan 18, 2002 at 10:08:40PM +0200, Jarkko Hietaniemi wrote:
> > ints, or 176 bytes. Searching for membership in an inversion list is
> > O(N log N) (binary search).  "Encoding the whole range" is a non-issue
> > bordering on a joke: two ints, or 8 bytes.
> 
> [Clarification from a noncombatant] You meant O(log N).
> 
> I like the inversion list idea. But its speed is proportional to the
> toothiness of the character class, and while I have good intuition for
> what that means in 7-bit US-ASCII, I have no idea how bad it gets for
> other languages. "Vowels"? "Capital letters"? Would anyone ever want
> to select all Chinese characters with a particular radical?
> 
> That's just lookup. We should also consider other character class
> operations: union, subtraction, intersection. They're pretty


Complement of an inversion list is neat: insert 0 at the beginning
(and append max+1), unless there already is one, in which case delete
the 0 (and shift the list and delete the max+1).  Again, O(N). 
(One could of course have a bit for a 'negative character class',
but that would in turn complicate the computations.)

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen

Re: on parrot strings

Reply via email to