Re: [exim-dev] UTF-8 and Exim string operations

Jasen Betts via Exim-dev Fri, 17 Aug 2018 03:40:43 -0700

On 2018-08-17, Phil Pennock via Exim-dev <exim-dev@exim.org> wrote:
> Anyone have strong feelings on how Exim should handle UTF-8 with
> operators such as ${length_1:STR} ?
>
> Document that the current operators work on bytes


Yeah stay with treating srings as nul terminated arrays of octets.
The same unit the RFCs use to define email and SMTP.

> and add ulength_1 for being UTF-8 aware?

Would also need utf8-aware also substr and strlen. 
is it going to count code-points or glyphs? 

> Look at the top-bit being set and assume UTF-8, or
> will that break too much with all the places which are still ISO-8859-1?

Just looking at that bit won't tell you enough to count code-points or
glyphs. you need to then group the octets together, and you need to do
something when you hit a non-valid octet....
parts of ${utf8clean can probably be re-used.

"${lc" "${uc" and "${if eqi" need consideraton too

-- 
     ت

-- 
## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim 
details at http://www.exim.org/ ##

Re: [exim-dev] UTF-8 and Exim string operations

Reply via email to