Re: bug#65659: RFC: changing printf(1) behavior on %b

Eric Blake Thu, 31 Aug 2023 13:02:51 -0700

On Thu, Aug 31, 2023 at 03:10:58PM -0400, Chet Ramey wrote:
> On 8/31/23 11:35 AM, Eric Blake wrote:
> > In today's Austin Group call, we discussed the fact that printf(1) has
> > mandated behavior for %b (escape sequence processing similar to XSI
> > echo) that will eventually conflict with C2x's desire to introduce %b
> > to printf(3) (to produce 0b000... binary literals).
> > 
> > For POSIX Issue 8, we plan to mark the current semantics of %b in
> > printf(1) as obsolescent (it would continue to work, because Issue 8
> > targets C17 where there is no conflict with C2x), but with a Future
> > Directions note that for Issue 9, we could remove %b entirely, or
> > (more likely) make %b output binary literals just like C.
> 
> I doubt I'd ever remove %b, even in posix mode -- it's already been there
> for 25 years.


But the longer that printf(3) supports "%b" to output binary values,
the more surprised new shell coders will be that printf(1) %b does not
behave the same.  What's more, other languages have already started
using %b for binary output (python, for example), so it is definitely
gaining in mindshare.

That said, I also agree with your desire to keep the functionality in
place.  The current POSIX says that %b was added so that on a non-XSI
system, you could do:

my_echo() {
  printf %b\\n "$*"
}

and then call my_echo everywhere that a script used to depend on XSI
echo (perhaps by 'alias echo=my_echo' with aliases enabled), for a
much quicker portability hack than a tedious search-and-replace of
every echo call that requires manual inspection of its arguments for
translation of any XSI escape sequences into printf format
specifications.  In particular, code like [var='...\c'; echo "$var"]
cannot be changed to use printf by a mere s/echo/printf %s\\n/.  Thus,
when printf was invented and standardized for the shell, the solution
at the time was to create [printf %b\\n "$var"] as a drop-in
replacement for XSI [echo "$var"], even for platforms without XSI
echo.

Nowadays, I personally have not seen very many scripts like this in
the wild (for example, autoconf scripts prefer to directly use printf,
rather than trying to shoe-horn behavior into echo).  But assuming
such legacy scripts still exist, it is still much easier to rewrite
just the my_echo wrapper to now use %#s\\n instead of %b\\n, than it
would be to find every callsite of my_echo.

Bash already has shopt -s xpg_echo; I could easily see this being a
case where you toggle between the old or new behavior of %b (while
keeping %#s always at the old behavior) by either this or some other
shopt in bash, so that newer script writers that want binary output
for %b can do so with one setting, while scripts that must continue to
run under old semantics can likewise do so.

> 
> > But that
> > raises the question of whether the escape-sequence processing
> > semantics of %b should still remain available under the standard,
> > under some other spelling, since relying on XSI echo is still not
> > portable.
> > 
> > One of the observations made in the meeting was that currently, both
> > the POSIX spec for printf(1) as seen at [1], and the POSIX and C
> > standard (including the upcoming C2x standard) for printf(3) as seen
> > at [3] state that both the ' and # flag modifiers are currently
> > undefined when applied to %s.
> 
> Neither one is a very good choice, but `#' is the better one. It at least
> has a passing resemblence to the desired functionality.

Indeed, that's what the Austin Group settled on today after I first
wrote my initial email, and what I wrote up in a patch to GNU
Coreutils (https://debbugs.gnu.org/65659)

> 
> Why not standardize another character, like %B? I suppose I'll have to look
> at the etherpad for the discussion. I think that came up on the mailing
> list, but I can't remember the details.

Yes, https://austingroupbugs.net/view.php?id=1771 has a good
discussion of the various ideas.

%B is out for the same reason as %b: although the current C2x draft
wording says that %<capital> is reserved for implementation use, other
than [AEFGX] which already have a history of use by C (as it was, when
C99 added %A, that caused problems for some folks), it goes on to
_highly_ encourage any implementation that adds %b for "0b0" binary
output also add %B for "0B0" binary output (to match the x/X
dichotomy).  Burning %B to retain the old behavior while repurposing
%b to output lower-case binary values is thus a non-starter, while
burning %#s (which C says is undefined) felt nicer.

The Austin Group also felt that standardizing bash's behavior of %q/%Q
for outputting quoted text, while too late for Issue 8, has a good
chance of success, even though C says %q is reserved for
standardization by C. Our reasoning there is that lots of libc over
the years have used %qi as a synonym for %lli, and C would be foolish
to burn %q for anything that does not match those semantics at the C
language level; which means it will likely never be claimed by C and
thus free for use by shell in the way that bash has already done.

> 
> > Is there
> > any interest in a patch to coreutils or bash that would add such a
> > synonym, to make it easier to leave that functionality in place for
> > POSIX Issue 9 even when %b is repurposed to align with C2x?
> 
> It's maybe a two or three line change at most.

Yeah, creating an alias proved to be pretty simple in coreutils; I
spent more time documenting it than I did writing the code changes.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: bug#65659: RFC: changing printf(1) behavior on %b

Reply via email to