Pádraig Brady <[email protected]> writes:

> On 27/11/2025 21:32, Collin Funk wrote:
>> Hi Pádraig,
>> Your coreutils i18n page mentions that 'tac' needs work to handle
>> multibyte characters [1]. Could you help me understand why it is listed
>> there?
>> My initial guess was that re_search does not work on multibyte
>> characters since it compares bytes. However, that still works for UTF-8:
>>     $ printf '1д2д3д' | tac --separator='д' && printf '\n'
>>     3д2д1д
>> I guess if we want it to work on other character sets, we can use the
>> fastmap in only unibyte locales or UTF-8. Does that sound correct? I am
>> not too familiar with the GNU regex functions.
>> Collin
>> [1] https://www.pixelbeat.org/docs/coreutils_i18n/
>
> Oh I was mistaken.
>
> I think I saw the --separator option and thought it would
> have similar issues as the join -t option.
>
> I've removed it from the page.

Cool, thanks!

I'll have a look at adding some multibyte tests, and at testing with
other character sets to see if the fastmap should be removed in those cases.
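
For reference, here is a rough sketch of the property those tests would
need to probe. It is Python rather than C, purely for illustration: the
helper names are made up, and nothing here is taken from tac.c or the
GNU regex code. It only shows that a byte-wise search for an encoded
separator cannot land in the middle of a character in UTF-8, while it
can in an encoding such as Shift_JIS, where a trail byte may equal an
ASCII byte. Whether tac is actually affected in such locales is exactly
what still needs testing.

```python
def byte_find_all(haystack, needle):
    """Return every byte offset at which needle occurs in haystack."""
    offsets = []
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


def char_boundaries(text, encoding):
    """Return the byte offsets at which each character of text starts."""
    offsets = set()
    pos = 0
    for ch in text:
        offsets.add(pos)
        pos += len(ch.encode(encoding))
    return offsets


def false_matches(text, sep, encoding):
    """Byte-level matches of sep that do not start on a character boundary."""
    data = text.encode(encoding)
    needle = sep.encode(encoding)
    boundaries = char_boundaries(text, encoding)
    return [i for i in byte_find_all(data, needle) if i not in boundaries]


if __name__ == "__main__":
    # UTF-8: lead bytes and continuation bytes never overlap, so every
    # byte-level match of a valid encoded character is a real match.
    print(false_matches("1д2д3д", "д", "utf-8"))
    # Shift_JIS: KATAKANA LETTER SO encodes as 0x83 0x5C, and 0x5C is
    # also the backslash byte, so a byte search matches inside it.
    print(false_matches("ソ\\", "\\", "shift_jis"))
```

The Shift_JIS input is the katakana letter followed by a real
backslash, so a byte search sees two candidate separators where a
character-aware search sees one.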

Collin
