On Wed, Aug 13, 2025 at 12:42:33PM -0500, Eric Blake wrote:
> > But in the meantime of implementing, I have found a weird behavior of
> > `defn'. The document said:
> > > If name is a user-defined macro, the quoted definition is simply the
> > > quoted expansion text. If, instead, there is only one name and it is a
> > > builtin, the expansion is a special token, which points to the builtin’s
> > > internal definition. This token is only meaningful as the second argument
> > > to define (and pushdef), and is silently converted to an empty string in
> > > most other contexts. Combining a builtin with anything else is not
> > > supported; a warning is issued and the builtin is omitted from the final
> > > expansion.
>
> That text may be outdated for m4 1.6; part of the rework to allow
> faster shift($@) means that the parser can concatenate from more
> sources, including from defn of a builtin.

Well, that's the ultimate goal; but when I tested again today (in
preparation for releasing 1.4.21), I see that branch-1.6 is not quite
where I want it yet for catenating arbitrary text, but DOES do mostly
what I want for safety's sake:

$ src/m4
define(`a',1)dnl
define(`b', defn(`a')defn(`divnum'))dnl
m4:stdin:2: warning: define: cannot concatenate builtins
define(`c', defn(`a',`divnum'))dnl
m4:stdin:5: warning: define: cannot concatenate builtins
define(`d', defn(`divnum')defn(`a'))dnl
m4:stdin:7: warning: define: cannot concatenate builtins
define(`e', defn(`divnum',`a'))dnl
m4:stdin:9: warning: define: cannot concatenate builtins
define(`f',defn(`divnum'))dnl
dumpdef(`b',`c',`d',`e',`f')
b: 1
c: 1
d: 1
e: 1
f: <divnum>

until I tried this:

$ src/m4
eval(defn(`divnum')+0)
m4: macro.c:1152: arg_len: Assertion `flatten' failed.
src/m4: internal error detected; please report this bug to <[email protected]>:
Aborted

>
> Ouch - you uncovered a genuine bug; the behavior is not matching the
> documentation (so one or the other, if not both, are wrong). I'm
> trying to figure out how long that bug has been present...

So now I have a different bug in branch-1.6 from the one I'm also
trying to fix in branch-1.4. Ugh.

>
> I found access to m4 1.4.13 built in 2009 and it still had the issue.
> But as you can also see:
>
> define(`blah', `a'defn(`defn')`b')
> m4trace: -2- defn(`defn')
> m4trace: -1- define(`blah', `ab')
>
> it looks like it's really a matter of whatever the parser encounters
> first: if it encounters literal text, then builtin functions are
> ignored with all further text still being used; if it encounters a
> builtin function first, then that is used and all further text is
> ignored.
>

My goal for m4 1.4.21: uniformly warn in any context that takes
builtin tokens (builtin, indir, define, pushdef), with the builtin
token flattened to the empty string at the time of warning regardless
of whether it was first or second in the concatenation; and uniformly
be silent in any other context.

> Ultimately, I _want_ m4 to be able to do stuff like:
>
> # wrap(pre, macro, post)
> define(`wrap', `$1'defn(`$2')`$3')
>
> and have that work for ANY macro (whether macro was builtin or
> user-defined), which implies that the desired correct behavior will be
> to construct `wrap' as an array of three sources { "text of $1",
> definition of $2, "text of $3" }. In m4 1.4.x, where the definition
> MUST be either a single string or a single function pointer, you
> cannot usefully concatenate a builtin definition from $2 with other
> text; the alternatives are to concatenate an empty string instead
> (what the docs promised) or to noisily warn about the problem. But m4
> 1.6 allows a definition of catenated contents (various implementations
> exist for that, whether you use escape sequences, or represent macro
> contents using wchar_t with special wchar_t values that can't be
> produced as normal characters for functions, and so forth).
>
> But for 1.4.x, I'm most likely to change things to be noisy by default
> (any attempt to use both a builtin function and text in the same
> definition, regardless of which came first, is going to cause
> surprises if undiagnosed); and by adding a warning, I think it is
> unlikely to break backwards compatibility of any real script that may
> have been relying on getting the builtin's behavior with no trailing
> text to now get the trailing text and not the builtin function.

I also compared what BSD m4 does. There, it looks like builtin macros
have a defining text of "__builtin_NAME", which the engine then
short-circuits any time it encounters a recognized magic string during
macro expansion.
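
To make the magic-string idea concrete, here is a toy Python model of
it. This is only a sketch of the description above, not BSD m4's
actual code: the names MAGIC_PREFIX, BUILTINS, defn and expand are
hypothetical, and it assumes the engine dispatches to a builtin only
when the whole definition is a recognized magic string.

```python
# Toy model only; not taken from any real m4 implementation.
MAGIC_PREFIX = "__builtin_"

# Hypothetical builtin table; pretend the current diversion is 0.
BUILTINS = {
    "divnum": lambda: "0",
}


def defn(macros, name):
    """Return a macro's defining text; a builtin yields its magic string."""
    if name in macros:
        return macros[name]
    if name in BUILTINS:
        return MAGIC_PREFIX + name
    raise KeyError(name)


def expand(macros, name):
    """Expand a user macro, dispatching only on an exact magic string."""
    text = macros[name]
    if text.startswith(MAGIC_PREFIX):
        builtin = BUILTINS.get(text[len(MAGIC_PREFIX):])
        if builtin is not None:
            return builtin()
    # Anything else, including text glued to a magic string, is plain text.
    return text


macros = {"a": "1"}
macros["b"] = defn(macros, "a") + defn(macros, "divnum")  # text + builtin
macros["f"] = defn(macros, "divnum")                      # builtin alone

print(expand(macros, "b"))
print(expand(macros, "f"))
```

In this model, `f' still behaves like divnum, while `b' degrades to
ordinary text that happens to contain the magic string, because the
concatenation no longer matches any recognized builtin name.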

That design leads to weird effects: the above test repeated in BSD
produces:

dumpdef(`b',`c',`d',`e',`f')
`b' `1__builtin_divnum'
`c' `__builtin_divnum1'
`d' `__builtin_divnumdefn(a)'
`e' `1__builtin_divnum'
`f' `divnum'

So I have to say I like the GNU behavior better, but anyone trying to
concatenate builtin macros is already in non-portable territory.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
