Re: string encoding

nick Sat, 24 Mar 2001 11:37:47 -0800
Dan Sugalski <[EMAIL PROTECTED]> writes:
>> >    substr($foo, 233253, 14)
>> >
>> > is going to cost significantly more with variable sized characters than
>> > fixed sized ones.
>>
>>I don't believe so.
>
>Then you would be incorrect. To find the character at position 233253 in a 
>variable-length encoding requires scanning the string from the beginning, 
>and has a rather significant potential cost. You've got a test for every 
>character up to that point with a potential branch or two on each one. 
>You're guaranteed to blow the heck out of your processor's D-cache, since 
>you've just waded through between 200 and 800K of data that's essentially 
>meaningless for the operation in question.

If you are really doing that sort of processing then it would be better
to represent the data differently - say as a tree, or a list of 1000-ish char
blocks. That way you can find the block quickly and then do a short-ish
search for the actual character. (Like a text editor does.)
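
A rough sketch of the block idea, in Python rather than Perl only so it is
easy to run. It assumes the string is held as UTF-8 bytes. The names BLOCK,
build_index, byte_offset and substr are made up for illustration and are not
any existing Perl internal. It keeps a side table of byte offsets, one per
1000 characters, instead of physically splitting the buffer. An editor would
more likely split the buffer so that inserts stay cheap.

```python
BLOCK = 1000  # characters per block (the "1000-ish" above; an assumption)

def is_lead(byte):
    # UTF-8 continuation bytes look like 0b10xxxxxx; any other byte
    # starts a new character.
    return (byte & 0xC0) != 0x80

def build_index(data):
    """One pass over the bytes: note the byte offset of every BLOCK-th character."""
    index = []
    chars = 0
    for pos, byte in enumerate(data):
        if is_lead(byte):
            if chars % BLOCK == 0:
                index.append(pos)
            chars += 1
    return index, chars

def byte_offset(data, index, nchars, char_pos):
    """Byte offset of character char_pos, scanning at most BLOCK - 1 characters."""
    if char_pos >= nchars:
        return len(data)
    pos = index[char_pos // BLOCK]   # jump straight to the right block
    remaining = char_pos % BLOCK     # then a short scan inside it
    while remaining:
        pos += 1
        while not is_lead(data[pos]):
            pos += 1
        remaining -= 1
    return pos

def substr(data, index, nchars, start, length):
    """Character-based substring; non-negative start and length only."""
    begin = byte_offset(data, index, nchars, start)
    end = byte_offset(data, index, nchars, start + length)
    return data[begin:end].decode("utf-8")
```

With this, substr(data, index, nchars, 233253, 14) looks up block 233 and
scans 253 characters to find the start, rather than scanning all 233253
from the beginning. The price is one pass to build the table, plus keeping
it up to date whenever the string changes.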

-- 
Nick Ing-Simmons