Pádraig Brady <[email protected]> writes:

> On 27/11/2025 21:32, Collin Funk wrote:
>> Hi Pádraig,
>>
>> Your coreutils i18n page mentions that 'tac' needs work to handle
>> multibyte characters [1]. Could you help me understand why it is listed
>> there?
>>
>> My initial guess was that re_search does not work on multibyte
>> characters since it compares bytes. However, that still works for UTF-8:
>>
>>     $ printf '1д2д3д' | tac --separator='д' && printf '\n'
>>     3д2д1д
>>
>> I guess if we want it to work on other character sets, we can use the
>> fastmap in only unibyte locales or UTF-8. Does that sound correct? I am
>> not too familiar with the GNU regex functions.
>>
>> Collin
>>
>> [1] https://www.pixelbeat.org/docs/coreutils_i18n/
>
> Oh I was mistaken.
>
> I think I saw the --separator option and thought it would
> have similar issues as the join -t option.
>
> I've removed it from the page.

Cool, thanks! I'll have a look at adding some multibyte tests, and at
testing with other character sets to see if the fastmap should be
removed in those cases.

Collin
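
For reference, here is a minimal Python sketch of the kind of case such a test might cover. It is not tac's actual code: the helper names tac, tac_bytewise and tac_charwise are made up for illustration, and the choice of Shift_JIS is my own assumption about a charset worth trying. The idea is that byte-wise separator matching is safe in UTF-8, where a character's encoding never appears inside another character's, but can misfire in a charset where the second byte of a character may equal an ASCII byte.

```python
def tac(data, sep):
    """Reverse separator-terminated records; works on bytes or str.

    Hypothetical helper: each record keeps its trailing separator.
    """
    if not sep:
        raise ValueError("separator must be non-empty")
    records, start = [], 0
    while (idx := data.find(sep, start)) >= 0:
        end = idx + len(sep)
        records.append(data[start:end])
        start = end
    if start < len(data):
        records.append(data[start:])  # trailing record without a separator
    # data[:0] is an empty value of the same type (bytes or str).
    return data[:0].join(reversed(records))


def tac_bytewise(text, sep, encoding):
    """Match the separator on encoded bytes, ignoring character boundaries."""
    out = tac(text.encode(encoding), sep.encode(encoding))
    return out.decode(encoding, errors="replace")


def tac_charwise(text, sep):
    """Match the separator on decoded characters."""
    return tac(text, sep)


if __name__ == "__main__":
    # UTF-8: byte-wise matching agrees with character-wise matching.
    print(tac_bytewise("1д2д3д", "д", "utf-8"))

    # Shift_JIS: '表' is encoded as 0x95 0x5C, and 0x5C is also '\'.
    text, sep = "表a\\b\\", "\\"
    print(tac_charwise(text, sep))               # records: '表a\', 'b\'
    print(tac_bytewise(text, sep, "shift_jis"))  # '表' is split off from 'a\'
```

In the Shift_JIS case the byte-wise version finds a separator inside '表', so the record boundaries differ from the character-wise result even though the output still decodes cleanly.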
