On Wed, Aug 13, 2025 at 12:42:33PM -0500, Eric Blake wrote:
> > But in the meantime of implementing, I have found a weird behavior of
> > `defn'. The document said:
> > > If name is a user-defined macro, the quoted definition is simply the 
> > > quoted expansion text. If, instead, there is only one name and it is a 
> > > builtin, the expansion is a special token, which points to the builtin’s 
> > > internal definition. This token is only meaningful as the second argument 
> > > to define (and pushdef), and is silently converted to an empty string in 
> > > most other contexts. Combining a builtin with anything else is not 
> > > supported; a warning is issued and the builtin is omitted from the final 
> > > expansion.
> 
> That text may be outdated for m4 1.6; part of the rework to allow
> faster shift($@) means that the parser can concatenate from more
> sources, including from defn of a builtin.

Well, that's the ultimate goal; but when I tested again today (in
preparation for releasing 1.4.21), I saw that branch-1.6 is not quite
where I want it yet for catenating arbitrary text, but DOES do mostly
what I want for safety's sake:

$ src/m4
define(`a',1)dnl
define(`b', defn(`a')defn(`divnum'))dnl
m4:stdin:2: warning: define: cannot concatenate builtins
define(`c', defn(`a',`divnum'))dnl
m4:stdin:5: warning: define: cannot concatenate builtins
define(`d', defn(`divnum')defn(`a'))dnl
m4:stdin:7: warning: define: cannot concatenate builtins
define(`e', defn(`divnum',`a'))dnl
m4:stdin:9: warning: define: cannot concatenate builtins
define(`f',defn(`divnum'))dnl
dumpdef(`b',`c',`d',`e',`f')
b:      1
c:      1
d:      1
e:      1
f:      <divnum>

until I tried this:

$ src/m4
eval(defn(`divnum')+0)
m4: macro.c:1152: arg_len: Assertion `flatten' failed.
src/m4: internal error detected; please report this bug to <bug-m4@gnu.org>:
Aborted

> 
> Ouch - you uncovered a genuine bug; the behavior is not matching the
> documentation (so one or the other, if not both, are wrong).  I'm
> trying to figure out how long that bug has been present...

So now I have one bug in branch-1.6, and a different one that I'm
also trying to fix in branch-1.4.  Ugh.

> 
> I found access to m4 1.4.13 built in 2009 and it still had the issue.
> But as you can also see:
> 
> define(`blah', `a'defn(`defn')`b')
> m4trace: -2- defn(`defn')
> m4trace: -1- define(`blah', `ab')
> 
> it looks like it's really a matter of whatever the parser encounters
> first: if it encounters literal text, then builtin functions are
> ignored with all further text still being used; if it encounters a
> builtin function first, then that is used and all further text is
> ignored.
>

My goal for m4 1.4.21: uniformly warn in any context that takes
builtin tokens (builtin, indir, define, pushdef), with the builtin
token flattened to the empty string at the time of warning regardless
of whether it was first or second in the concatenation; and uniformly
be silent in any other context.
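
A rough C sketch of that policy, to make the intent concrete.  This is
not m4's real code: the piece type, flatten_concat(), and
sample_builtin() are hypothetical names invented for illustration, and
the sketch only covers the case where an argument is a concatenation
(a lone builtin token passed to define is assumed to be handled
elsewhere).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one piece of a collected argument. */
typedef void builtin_func(void);
typedef enum { PIECE_TEXT, PIECE_FUNC } piece_type;
typedef struct {
    piece_type type;
    const char *text;    /* used when type == PIECE_TEXT */
    builtin_func *func;  /* used when type == PIECE_FUNC */
} piece;

/* Stand-in for a builtin such as divnum. */
void sample_builtin(void) { }

/* Flatten a concatenated argument into buf.  Every builtin piece
   becomes the empty string, whatever its position.  If the calling
   macro accepts builtin tokens, warn once; otherwise stay silent.
   Returns 1 if a warning was issued, 0 if not. */
int flatten_concat(const char *caller, bool takes_builtin,
                   const piece *pieces, size_t n,
                   char *buf, size_t bufsize)
{
    int warned = 0;
    size_t used = 0;

    assert(buf != NULL && bufsize > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (pieces[i].type == PIECE_FUNC) {
            if (takes_builtin && !warned) {
                fprintf(stderr,
                        "warning: %s: cannot concatenate builtins\n",
                        caller);
                warned = 1;
            }
            continue;  /* builtin contributes no text */
        }
        size_t len = strlen(pieces[i].text);
        if (len > bufsize - 1 - used)
            len = bufsize - 1 - used;  /* truncate to fit */
        memcpy(buf + used, pieces[i].text, len);
        used += len;
        buf[used] = '\0';
    }
    return warned;
}
```

With this model, text followed by a builtin and a builtin followed by
text both flatten to the same string, which is the order-independence
I'm after.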

> Ultimately, I _want_ m4 to be able to do stuff like:
> 
> # wrap(pre, macro, post)
> define(`wrap', `$1'defn(`$2')`$3')
> 
> and have that work for ANY macro (whether macro was builtin or
> user-defined), which implies that the desired correct behavior will be
> to construct `wrap' as an array of three sources { "text of $1",
> definition of $2, "text of $3" }.  In m4 1.4.x, where the definition
> MUST be either a single string or a single function pointer, you
> cannot usefully concatenate a builtin definition from $2 with other
> text; the alternatives are to concatenate an empty string instead
> (what the docs promised) or to noisily warn about the problem.  But m4
> 1.6 allows a definition of catenated contents (various implementations
> exist for that, whether you use escape sequences, or represent macro
> contents using wchar_t with special wchar_t values that can't be
> produced as normal characters for functions, and so forth).
> 
> But for 1.4.x, I'm most likely to change things to be noisy by default
> (any attempt to use both a builtin function and text in the same
> definition, regardless of which came first, is going to cause
> surprises if undiagnosed); and by adding a warning, I think it is
> unlikely to break backwards compatibility of any real script that may
> have been relying on getting the builtin's behavior with no trailing
> text to now get the trailing text and not the builtin function.
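
The "array of sources" idea in the quoted text above can be sketched
in C as follows.  This is only an illustration under simplifying
assumptions, not the branch-1.6 representation: the source type,
expand_definition(), and fake_divnum() are made-up names, and macro
arguments are ignored entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical builtin signature: write output into out (NUL
   terminated, at most outsize bytes) and return the length written. */
typedef size_t builtin_fn(char *out, size_t outsize);

typedef enum { SRC_TEXT, SRC_BUILTIN } source_kind;
typedef struct {
    source_kind kind;
    union {
        const char *text;  /* SRC_TEXT */
        builtin_fn *fn;    /* SRC_BUILTIN */
    } u;
} source;

/* Stand-in for divnum: pretend the current diversion is always 0. */
size_t fake_divnum(char *out, size_t outsize)
{
    if (outsize < 2)
        return 0;
    out[0] = '0';
    out[1] = '\0';
    return 1;
}

/* Expand a definition made of several sources, in order: text is
   copied, builtins are invoked.  Returns the total length written. */
size_t expand_definition(const source *src, size_t count,
                         char *out, size_t outsize)
{
    size_t used = 0;

    assert(out != NULL && outsize > 0);
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        size_t room = outsize - used;  /* includes the NUL byte */
        if (src[i].kind == SRC_BUILTIN) {
            used += src[i].u.fn(out + used, room);
        } else {
            size_t len = strlen(src[i].u.text);
            if (len > room - 1)
                len = room - 1;  /* truncate to fit */
            memcpy(out + used, src[i].u.text, len);
            used += len;
            out[used] = '\0';
        }
    }
    return used;
}
```

A definition built from { "<", fake_divnum, ">" } then expands by
walking the three sources in order, which a single-string or
single-function-pointer definition cannot express.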

I also compared what BSD m4 does.  There, it looks like builtin macros
have a defining text of "__builtin_NAME", which the engine then
short-circuits any time it encounters a recognized magic string during
macro expansion.  Which leads to weird effects - the above test
repeated in BSD produces:

dumpdef(`b',`c',`d',`e',`f')
`b'     `1__builtin_divnum'
`c'     `__builtin_divnum1'
`d'     `__builtin_divnumdefn(a)'
`e'     `1__builtin_divnum'
`f'     `divnum'

so I have to say I like the GNU behavior better, though anyone trying
to concatenate builtin macros is already in non-portable territory.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

