On Thu, Aug 31, 2023 at 03:10:58PM -0400, Chet Ramey wrote:
> On 8/31/23 11:35 AM, Eric Blake wrote:
> > In today's Austin Group call, we discussed the fact that printf(1) has
> > mandated behavior for %b (escape sequence processing similar to XSI
> > echo) that will eventually conflict with C2x's desire to introduce %b
> > to printf(3) (to produce 0b000... binary literals).
> >
> > For POSIX Issue 8, we plan to mark the current semantics of %b in
> > printf(1) as obsolescent (it would continue to work, because Issue 8
> > targets C17 where there is no conflict with C2x), but with a Future
> > Directions note that for Issue 9, we could remove %b entirely, or
> > (more likely) make %b output binary literals just like C.
>
> I doubt I'd ever remove %b, even in posix mode -- it's already been there
> for 25 years.

But the longer that printf(3) supports "%b" to output binary values,
the more surprised new shell coders will be that printf(1) %b does not
behave the same.  What's more, other languages have already started
using %b for binary output (Python, for example), so it is definitely
gaining in mindshare.

That said, I also agree with your desire to keep the functionality in
place.  The current POSIX says that %b was added so that on a non-XSI
system, you could do:

my_echo() {
  printf %b\\n "$*"
}

and then call my_echo everywhere that a script used to depend on XSI
echo (perhaps by 'alias echo=my_echo' with aliases enabled), for a
much quicker portability hack than a tedious search-and-replace of
every echo call that requires manual inspection of its arguments for
translation of any XSI escape sequences into printf format
specifications.  In particular, code like [var='...\c'; echo "$var"]
cannot be changed to use printf by a mere s/echo/printf %s\\n/.  Thus,
when printf was invented and standardized for the shell, the solution
at the time was to create [printf %b\\n "$var"] as a drop-in
replacement for XSI [echo "$var"], even for platforms without XSI
echo.

Nowadays, I personally have not seen very many scripts like this in
the wild (for example, autoconf scripts prefer to directly use printf,
rather than trying to shoehorn behavior into echo).  But assuming
such legacy scripts still exist, it is still much easier to rewrite
just the my_echo wrapper to now use %#s\\n instead of %b\\n, than it
would be to find every callsite of my_echo.

Bash already has shopt -s xpg_echo; I could easily see this being a
case where you toggle between the old or new behavior of %b (while
keeping %#s always at the old behavior) by either this or some other
shopt in bash, so that newer script writers that want binary output
for %b can do so with one setting, while scripts that must continue to
run under old semantics can likewise do so.

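
To make the my_echo point above concrete, here is a minimal sketch in
POSIX sh.  The variable names (var, naive, wrapped, cut) and the sample
strings are made up for illustration; only my_echo and the use of %b
come from the discussion above.  The format is written as '%b\n', which
the shell hands to printf exactly as it does the unquoted %b\\n.  The
sketch sticks to %b because %#s is only a proposal at this point and a
given printf may not accept it yet.

```shell
# Wrapper from the discussion above: %b makes printf interpret
# XSI-style escapes (\t, \n, \c, ...) found in the argument itself.
my_echo() {
    printf '%b\n' "$*"
}

# Hypothetical sample data containing an XSI escape sequence.
var='one\ttwo'

# Naive s/echo/printf %s\\n/ translation: %s leaves the backslash
# sequence untouched, so the result still contains a literal \t.
naive=$(printf '%s\n' "$var")

# Through the wrapper, \t becomes a real tab, as XSI echo would do.
wrapped=$(my_echo "$var")

# \c stops all further output from that printf call, including the
# newline in the format, so END lands directly after "visible".
cut=$(my_echo 'visible\chidden'; printf 'END')

printf '%s\n' "$naive" "$wrapped" "$cut"
```

If %#s does land, the only line that would need to change in a script
like this is the format inside my_echo; the callers stay as they are.
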
>
> > But that
> > raises the question of whether the escape-sequence processing
> > semantics of %b should still remain available under the standard,
> > under some other spelling, since relying on XSI echo is still not
> > portable.
> >
> > One of the observations made in the meeting was that currently, both
> > the POSIX spec for printf(1) as seen at [1], and the POSIX and C
> > standard (including the upcoming C2x standard) for printf(3) as seen
> > at [3] state that both the ' and # flag modifiers are currently
> > undefined when applied to %s.
>
> Neither one is a very good choice, but `#' is the better one. It at least
> has a passing resemblance to the desired functionality.

Indeed, that's what the Austin Group settled on today after I first
wrote my initial email, and what I wrote up in a patch to GNU
Coreutils (https://debbugs.gnu.org/65659).

>
> Why not standardize another character, like %B? I suppose I'll have to look
> at the etherpad for the discussion. I think that came up on the mailing
> list, but I can't remember the details.

Yes, https://austingroupbugs.net/view.php?id=1771 has a good
discussion of the various ideas.  %B is out for the same reason as %b:
although the current C2x draft wording says that %<capital> is
reserved for implementation use, other than [AEFGX] which already have
a history of use by C (as it was, when C99 added %A, that caused
problems for some folks), it goes on to _highly_ encourage any
implementation that adds %b for "0b0" binary output to also add %B for
"0B0" binary output (to match the x/X dichotomy).  Burning %B to
retain the old behavior while repurposing %b to output lower-case
binary values is thus a non-starter, while burning %#s (which C says
is undefined) felt nicer.

The Austin Group also felt that standardizing bash's behavior of %q/%Q
for outputting quoted text, while too late for Issue 8, has a good
chance of success, even though C says %q is reserved for
standardization by C.
Our reasoning there is that lots of libc over the years have used %qi
as a synonym for %lli, and C would be foolish to burn %q for anything
that does not match those semantics at the C language level; which
means it will likely never be claimed by C and thus free for use by
shell in the way that bash has already done.

>
> > Is there
> > any interest in a patch to coreutils or bash that would add such a
> > synonym, to make it easier to leave that functionality in place for
> > POSIX Issue 9 even when %b is repurposed to align with C2x?
>
> It's maybe a two or three line change at most.

Yeah, creating an alias proved to be pretty simple in coreutils; I
spent more time documenting it than I did writing the code changes.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org