On Wed, Feb 20, 2002 at 11:13:44PM +0000, Nicholas Clark wrote:
> Sorry to be off topic.
> 
> If I have a string of some length, containing characters between 0 and 255,
> I'm wondering if there's a fast Perl way to make a 64 byte long bit vector
> from it, with bits set for any character present in the original string.

I suspect 32 bytes is sufficient, no?  256 possible characters at one bit
each comes to 256 bits.

> If I can do this quickly, then when I'm trying lots of things to see if $a is
> a substring of $b, then I think I can optimise it.
> I should be able to rapidly reject $b as a candidate by logical operations on
> the two bit vectors in constant time whatever the length of the strings - if
> $b doesn't contain all the characters in $a somewhere, then $a can't possibly
> be a substring, so I don't need to bother trying index.

Short of coding in C, I can't think of a way to do this without a Perl
level loop, which would kill efficiency, I imagine.

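The best I can come up with this early is something along the lines of
(taking $string as the input and $bits as the vector being built)

  vec($bits, ord($_), 1) = 1 for split //, $string;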
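
Fleshed out a little, that idea might look like the following. This is
only a sketch under a few assumptions: char_bits and may_contain are
made-up names, the strings are assumed to hold bytes (0..255) only, and
it still uses the Perl-level loop, so it says nothing about whether the
pre-check ends up cheaper than simply calling index.

```perl
use strict;
use warnings;

# Build a 256-bit (32-byte) vector with one bit set for each byte value
# that occurs in the string.  char_bits is a hypothetical helper name.
sub char_bits {
    my ($string) = @_;
    my $bits = "\0" x 32;    # fixed length, so all vectors are comparable
    vec($bits, ord($_), 1) = 1 for split //, $string;
    return $bits;
}

# True if every character of the needle also occurs in the haystack.
# False proves the needle cannot be a substring; true proves nothing,
# so index still has to be called in that case.
sub may_contain {
    my ($needle_bits, $haystack_bits) = @_;
    # & on two strings is a bitwise string AND (no 'bitwise' feature enabled)
    return ($needle_bits & $haystack_bits) eq $needle_bits;
}

my $needle_bits = char_bits("code");
for my $candidate ("Encode compile script", "bit vector") {
    next unless may_contain($needle_bits, char_bits($candidate));
    print "'$candidate' still needs a real index() check\n";
}
```

The comparison itself is constant time per pair, as the vectors are
always 32 bytes; the cost is in building each vector once per string.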

> [yes, I'm still playing with Encode's compile script.
> No, it's not a waste of time - I've got it taking about 37% less time
> than it did 2 days ago. Much cheaper than throwing hardware at the world]

On behalf of anyone who has ever compiled Encode, thanks!

-- 
Paul Johnson - [EMAIL PROTECTED]
http://www.pjcj.net
