> On Jul 3, 2023, at 11:43 AM, Mattias Gaertner via fpc-pascal 
> <fpc-pascal@lists.freepascal.org> wrote:
> 
> There is a header byte.
> 
> It depends on whether you want to check for invalid UTF-8 sequences.
> 
> From LazUTF8:
> 
> function UTF8CodepointSizeFast(p: PChar): integer;
> begin
>  case p^ of
>    #0..#191   : Result := 1;
>    #192..#223 : Result := 2;
>    #224..#239 : Result := 3;
>    #240..#247 : Result := 4;
>    // An optimization + prevents compiler warning about
>    // uninitialized Result.
>    else Result := 1;
>  end;
> end;

This is a header for the file? Does that mean the file itself must have uniform 
character sizes? I thought the idea was to read the file one byte at a time, but 
I don't understand how you would know whether a 1-byte character (like ASCII) 
was part of a 4-byte character or not.

Regards,
Ryan Joseph

_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
