"tr --complement --squeeze-repeats ..." makes sure that the replaced characters only appear once (that it doesn't immediately repeat). Say you have something like " " (two spaces) or "?$|" (three characters) which will be replaced by just an underscore.
In the case of "ASCII text", what should come out of it is "ASCII_text", not "ASCII_text_": no underscore at the end. That is the question I have.

I use constructs such as "[A-Za-z0-9.]" to make explicit to myself and other people what I mean. I work in corpora research dealing with text based on various alphabets, not just ASCII, so I avoid any kind of linguistic/cultural shortcuts and abbreviations.

 lbrtchx

On 12/11/23, to...@tuxteam.de <to...@tuxteam.de> wrote:
> On Mon, Dec 11, 2023 at 08:04:06AM +0000, Albretch Mueller wrote:
>> On 12/11/23, Greg Wooledge <g...@wooledge.org> wrote:
>> > Please tell us ...
>>
>> OK, here is what I did as a t-table
>
> [...]
>
> Your style is confusing, to say the least. Why not play with minimal
> examples and work your way up from that?
>
>> the two strings are not the same length even though your are just
>> replacing ASCII characters, why did:
>> echo "${ftype}" | tr --complement --squeeze-repeats '[A-Za-z0-9.]' '_'
>> place a character at the end?
>
> Two things stick out:
>
> 1. with --squeeze-repeats you are challenging tr to output less
>    characters than the input has:
>
>    trotzki:~$ echo -n "this is a # string ###" | tr -cs 'a-z' '_'
>    => this_is_a_string_
>
>    (I allowed myself to simplify things a bit) See? tr is squeezing
>    repeats (repeated matches), the space-plus-three-hashes at the
>    end gets squeezed to just one _, thus changing the length.
>    If your strings contain more than one non-alphanumeric (something
>    I don't feel like even trying a guess at), this is bound to happen.
>    You ordered it.
>
> 2. This is tr, not regexp, so '[A-Za-z0-9.]' isn't doing what you
>    think it does. It will match '[', 'A' to 'Z', 'a' to 'z','.' and
>    ']'. I guess you want to say 'A-Za-z0-9.'
>
> 3. As a convenience, tr has char classes. Perhaps [:alnum:] is for
>    you. No idea whether this is a GNU extension
>
> 4. In case of doubt, read the man page :)
>
> Cheers
> --
> t
>