On Sat, Jan 22, 2022 at 12:41:56AM +0000, Colin Watson wrote:
>> Technically, UTF-8 validation can be done at a few gigabytes per second
>> per core:
>>
>> https://lemire.me/blog/2018/05/16/validating-utf-8-strings-using-as-little-as-0-7-cycles-per-byte/
>> 
>> but that is probably overkill. :-)
> Quite :-)

It struck me that the validation can probably be folded into the lexer for
free. If you add symbols for all invalid UTF-8 sequences, I believe they
should just become part of the state machine. But I'm fine with that 20%;
the perfect need not be the enemy of the good here.
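
To illustrate why I'd expect this to work, here is a rough sketch (not taken
from the actual lexer; the state names and the functions step() and
is_valid_utf8() are made up for this example). Well-formed UTF-8 can be
recognised by a small byte-level state machine, so a generated lexer could in
principle absorb states like these into its own table:

```python
# Hypothetical sketch: a hand-written DFA that accepts exactly the
# well-formed UTF-8 byte sequences (no overlongs, no surrogates,
# nothing above U+10FFFF). All names here are illustrative.
(ACCEPT, TAIL1, TAIL2, TAIL3,
 AFTER_E0, AFTER_ED, AFTER_F0, AFTER_F4, REJECT) = range(9)


def step(state: int, b: int) -> int:
    """Return the next DFA state after consuming byte b."""
    if state == ACCEPT:
        if b <= 0x7F:
            return ACCEPT            # ASCII
        if 0xC2 <= b <= 0xDF:
            return TAIL1             # 2-byte lead (C0/C1 would be overlong)
        if b == 0xE0:
            return AFTER_E0          # 3-byte lead, restricted second byte
        if b == 0xED:
            return AFTER_ED          # 3-byte lead, must avoid surrogates
        if 0xE1 <= b <= 0xEF:
            return TAIL2             # other 3-byte leads
        if b == 0xF0:
            return AFTER_F0          # 4-byte lead, restricted second byte
        if 0xF1 <= b <= 0xF3:
            return TAIL3             # other 4-byte leads
        if b == 0xF4:
            return AFTER_F4          # 4-byte lead, capped at U+10FFFF
        return REJECT                # 80..C1 and F5..FF cannot start a sequence
    if state == REJECT:
        return REJECT
    # Every remaining state expects a continuation byte in some range.
    lo, hi, nxt = {
        TAIL1: (0x80, 0xBF, ACCEPT),
        TAIL2: (0x80, 0xBF, TAIL1),
        TAIL3: (0x80, 0xBF, TAIL2),
        AFTER_E0: (0xA0, 0xBF, TAIL1),
        AFTER_ED: (0x80, 0x9F, TAIL1),
        AFTER_F0: (0x90, 0xBF, TAIL2),
        AFTER_F4: (0x80, 0x8F, TAIL2),
    }[state]
    return nxt if lo <= b <= hi else REJECT


def is_valid_utf8(data: bytes) -> bool:
    """Run the DFA over data; valid only if we end between characters."""
    state = ACCEPT
    for b in data:
        state = step(state, b)
        if state == REJECT:
            return False
    return state == ACCEPT
```

With this, is_valid_utf8(b"\xc0\x80") is False (an overlong encoding), while
any ASCII-only input never leaves the ACCEPT state, which is why I'd guess the
extra cost inside an existing state machine is small.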

In general, I don't think I need to look at it again now, unless there are
any special questions. Thanks for taking care of this! Looking forward to
bookworm being faster (and of course sid before that), and then I'll happily
live with this on bullseye, knowing that it's transient.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
