On Fri, Dec 15, 2023, at 7:08 PM, Jacob Bachmeyer wrote:
> Zack Weinberg wrote:
>> [...]
>> Also, there’s a perl 5.14ism in one place (s///a) which I need
>> to figure out how to make 5.6-compatible before it can land.
...
>> +    $q_channel =~ s/([^\x20-\x7e])/"\\x".sprintf("%02x", ord($1))/aeg;
...
> If I am reading perlre correctly, you should be able to simply drop the
> /a modifier because it has no effect on the pattern you have written,
> since you are using an explicit character class and are *not* using the
> /i modifier.

Thanks, you've made me realize that /a wasn't even what I wanted in the
first place.  What I thought /a would do is force s/// to act byte by
byte -- or, in the terms of perlunitut, force the target string to be
treated as a binary string.  That might be clearer with a concrete
example:

$ perl -e '$_ = "\xE2\x88\x85"; s/([^\x20-\x7e])/sprintf("\\x%02x", ord($1))/eg; print "$_\n";'
\xe2\x88\x85
$ perl -e '$_ = "\N{EMPTY SET}"; s/([^\x20-\x7e])/sprintf("\\x%02x", ord($1))/eg; print "$_\n";'
\x2205

What change do I need to make to the second one-liner to make it also
print \xe2\x88\x85?  How do I express that in a way that is backward
compatible all the way to 5.6.0?  And finally, how do I ensure that
there is absolutely nothing I can put in the initial assignment to $_
that will cause the rest of the one-liner to crash?  For example over
in the Python universe it's very easy to get Unicode conversion to
crash:

$ python3 -c 'print("\uDC00".encode("utf-8"))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc00' in position 0: surrogates not allowed

Given that having non-ASCII here in the first place is pretty unlikely,
I am going to go ahead and land the patch with your suggested changes,
but I'd still appreciate any further advice you have.

zw