On 2018-08-17, Phil Pennock via Exim-dev <exim-dev@exim.org> wrote:
> Anyone have strong feelings on how Exim should handle UTF-8 with
> operators such as ${length_1:STR} ?
>
> Document that the current operators work on bytes

Yeah, stay with treating strings as NUL-terminated arrays of octets,
the same unit the RFCs use to define email and SMTP.

> and add ulength_1 for being UTF-8 aware?

That would also need UTF-8-aware substr and strlen. Is it going to
count code-points or glyphs?

> Look at the top-bit being set and assume UTF-8, or
> will that break too much with all the places which are still ISO-8859-1?

Just looking at that bit won't tell you enough to count code-points or
glyphs. You then need to group the octets together, and you need to do
something when you hit a non-valid octet. Parts of ${utf8clean can
probably be re-used.

"${lc", "${uc" and "${if eqi" need consideration too.

--
 ت
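
To illustrate the grouping step described above, here is a minimal sketch in Python (not Exim code, and not how ${utf8clean works). The function names are hypothetical, and the policy of counting each non-valid octet separately and resynchronising on the next octet is an assumption; a real operator might instead replace or reject such input. The check is deliberately simplified: it validates lead and continuation octets but does not reject every overlong three- or four-octet form or encoded surrogates.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the sequence length implied by a lead octet, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: plain ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2  # 110xxxxx (0xC0 and 0xC1 could only start overlong forms)
    if 0xE0 <= lead <= 0xEF:
        return 3  # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4  # 11110xxx, capped so the result stays within U+10FFFF
    return 0      # stray continuation octet or an octet never used in UTF-8


def count_code_points(data: bytes) -> tuple:
    """Return (code_points, invalid_octets) for a byte string.

    Assumed policy: a non-valid octet is counted on its own and skipped,
    so scanning resumes at the very next octet.
    """
    code_points = 0
    invalid = 0
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        tail = data[i + 1:i + n]  # the continuation octets, if any
        bad_tail = len(tail) != n - 1 or any((b & 0xC0) != 0x80 for b in tail)
        if n == 0 or bad_tail:
            invalid += 1
            i += 1
            continue
        code_points += 1
        i += n
    return code_points, invalid
```

With this sketch, the ISO-8859-1 bytes b"caf\xe9" give (3, 1): the top bit of 0xE9 is set, but no continuation octets follow it. The string "e" followed by U+0301 (combining acute accent) gives (2, 0) even though it renders as one glyph, which is why the code-point versus glyph question has to be settled first.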