On Wed, Aug 13, 2025 at 02:37:32PM +0800, Xavier Wang wrote:
> Hi list,
> 
> After looking into the discussion two months ago between Nicolaos
> and Eric, I became interested in writing an m4 implementation in
> Rust, both to understand m4 better and for fun. A naive version is
> easy to write (I studied the BSD and GNU implementations together to
> arrive at a better architecture). But it seems that neither
> implementation can handle recursive `$@' in O(1). So I started
> studying the implementation of GNU m4 1.6 and am writing a new
> implementation based on the ideas of that version.
> 
> But while implementing it, I found a weird behavior of
> `defn'. The documentation says:
> > If name is a user-defined macro, the quoted definition is simply the quoted 
> > expansion text. If, instead, there is only one name and it is a builtin, 
> > the expansion is a special token, which points to the builtin’s internal 
> > definition. This token is only meaningful as the second argument to define 
> > (and pushdef), and is silently converted to an empty string in most other 
> > contexts. Combining a builtin with anything else is not supported; a 
> > warning is issued and the builtin is omitted from the final expansion.

That text may be outdated for m4 1.6; part of the rework to allow
faster shift($@) means that the parser can concatenate from more
sources, including from defn of a builtin.

> 
> But this text:
> define(`foo',    defn(`defn')     bar)
> should be the same as:
> define(`foo', bar)

At first read, it is supposed to be the same as:

define(`foo',    `'     bar)

> as the defn(`defn') expands to empty, and the whitespace after `,'
> is ignored.

Not all the whitespace, only the whitespace up until the point of the
first unquoted macro invocation.  Even if the macro expands to nothing
(which it doesn't), whitespace after that point is no longer ignored
by the m4 parser.
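
As a rough illustration of that rule, here is a simplified model in
Rust (since that is what you are writing in). `Piece' and
`collect_argument' are names made up for this sketch, not anything in
the GNU m4 sources, and rescanning of expansion output is not modeled:

```rust
/// One piece of input seen while collecting a macro argument.
/// Hypothetical type for this sketch only.
#[derive(Debug, Clone, PartialEq)]
pub enum Piece {
    /// Unquoted whitespace.
    Space(String),
    /// Unquoted literal text, or the contents of a quoted string.
    Text(String),
    /// Output of an unquoted macro invocation (may be empty).
    Expansion(String),
}

/// Build the argument text: unquoted whitespace is dropped only until
/// the first non-whitespace piece is seen; after that, everything is
/// kept, even if that first piece contributed no text at all.
pub fn collect_argument(pieces: &[Piece]) -> String {
    let mut out = String::new();
    let mut skipping = true;
    for piece in pieces {
        match piece {
            Piece::Space(s) => {
                if !skipping {
                    out.push_str(s);
                }
            }
            Piece::Text(s) | Piece::Expansion(s) => {
                skipping = false;
                out.push_str(s);
            }
        }
    }
    out
}
```

Feeding it leading space, an empty expansion, more space, and then
`bar' keeps the whitespace before `bar'; feeding it only leading space
and `bar' yields plain `bar'.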

>  But I found it defined a weird macro:
> foo
> => foo
> foo()
> =>

Well, let's turn up the verbosity:

$ m4 -daeqt
define(`foo',   defn(`defn')   bar)
m4trace: -2- defn(`defn')
m4trace: -1- define(`foo', <defn>)

defn(`foo')
m4trace: -1- defn(`foo')

Ouch - you uncovered a genuine bug; the behavior does not match the
documentation (so one or the other, if not both, is wrong).  I'm
trying to figure out how long that bug has been present...

I found access to m4 1.4.13 built in 2009 and it still had the issue.
But as you can also see:

define(`blah', `a'defn(`defn')`b')
m4trace: -2- defn(`defn')
m4trace: -1- define(`blah', `ab')

it looks like it's really a matter of whatever the parser encounters
first: if it encounters literal text, then builtin functions are
ignored with all further text still being used; if it encounters a
builtin function first, then that is used and all further text is
ignored.
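
Modeled in Rust, the behavior seen in those two traces looks roughly
like the following.  This is only a sketch of what the traces show,
not of how the 1.4.x sources are structured; `Chunk', `Definition'
and `observed_definition' are hypothetical names, and builtins are
identified by name only:

```rust
/// One chunk of an argument, in the order the parser encountered it.
#[derive(Debug, Clone, PartialEq)]
pub enum Chunk {
    Text(String),
    /// A builtin token such as the result of defn(`defn').
    Builtin(&'static str),
}

/// A 1.4.x-style definition: exactly one string or one builtin.
#[derive(Debug, Clone, PartialEq)]
pub enum Definition {
    Text(String),
    Builtin(&'static str),
}

/// "Whatever comes first wins", as observed in the traces.
pub fn observed_definition(chunks: &[Chunk]) -> Definition {
    match chunks.first() {
        // Builtin first: it is used, and all later chunks are ignored.
        Some(Chunk::Builtin(name)) => Definition::Builtin(*name),
        // Text first (or nothing at all): later builtins are silently
        // dropped, while all text is still concatenated.
        _ => {
            let mut text = String::new();
            for chunk in chunks {
                if let Chunk::Text(s) = chunk {
                    text.push_str(s);
                }
            }
            Definition::Text(text)
        }
    }
}
```

With chunks `a', <defn>, `b' this gives the text `ab', matching the
`blah' trace; with <defn> followed by `   bar' it gives the builtin
alone, matching the `foo' trace.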

> 
> I noticed the statement: "This token is only meaningful as the second
> argument to define (and pushdef)", so I tried without `define':
> define(`foo', `[$1]-[$2]')
> =>
> foo(   first   , defn(`defn')   second)
> =>[first   ]-[]
> 
> I think it should expand as:
> => [first   ]-[second]
> or even:
> => [first   ]-[   second]
> 
> So which behavior is correct?

Ultimately, I _want_ m4 to be able to do stuff like:

# wrap(pre, macro, post)
define(`wrap', `$1'defn(`$2')`$3')

and have that work for ANY macro (whether macro was builtin or
user-defined), which implies that the desired correct behavior will be
to construct `wrap' as an array of three sources { "text of $1",
definition of $2, "text of $3" }.  In m4 1.4.x, where the definition
MUST be either a single string or a single function pointer, you
cannot usefully concatenate a builtin definition from $2 with other
text; the alternatives are to concatenate an empty string instead
(what the docs promised) or to noisily warn about the problem.  But m4
1.6 allows a definition made of concatenated contents (various
implementations exist for that, whether you use escape sequences, or
represent macro contents using wchar_t and reserve special wchar_t
values that can't be produced as normal characters to stand for
functions, and so forth).
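
For a Rust implementation, one possible shape for the "array of
sources" idea is sketched below.  This is an assumption about how it
could be done, not a description of the m4 1.6 code: `Source',
`Definition', `make_wrap' and `builtin_len' are invented for the
example, and substitution of $1 and friends inside the text parts is
left out:

```rust
/// Signature of a builtin in this sketch: arguments in, expansion out.
pub type BuiltinFn = fn(&[String]) -> String;

/// One source contributing to a macro definition.
pub enum Source {
    Text(String),
    Builtin(BuiltinFn),
}

/// A definition is an ordered list of sources, not a single string
/// or a single function pointer.
pub struct Definition {
    pub sources: Vec<Source>,
}

/// Build { text of pre, definition of inner, text of post }.
pub fn make_wrap(pre: &str, inner: Source, post: &str) -> Definition {
    Definition {
        sources: vec![
            Source::Text(pre.to_string()),
            inner,
            Source::Text(post.to_string()),
        ],
    }
}

impl Definition {
    /// Expand by concatenating each source in order; builtins are
    /// called with the arguments of the current invocation.
    pub fn expand(&self, args: &[String]) -> String {
        let mut out = String::new();
        for source in &self.sources {
            match source {
                Source::Text(t) => out.push_str(t),
                Source::Builtin(f) => out.push_str(&f(args)),
            }
        }
        out
    }
}

/// Stand-in builtin for the demo: length of the first argument.
pub fn builtin_len(args: &[String]) -> String {
    args.first().map_or(0, |a| a.len()).to_string()
}
```

Here make_wrap("<", Source::Builtin(builtin_len), ">") expanded with
the single argument `hello' produces `<5>', and the same call with a
Source::Text in the middle works without any special casing.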

But for 1.4.x, I'm most likely to change things to be noisy by
default: any attempt to use both a builtin function and text in the
same definition, regardless of which came first, is going to cause
surprises if undiagnosed.  With a warning added, I think the change is
unlikely to break backwards compatibility for any real script that may
have been relying on getting the builtin's behavior with the trailing
text dropped, even though such a script would now get the trailing
text and not the builtin function.
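
A sketch of that noisier rule, again in Rust and again with made-up
names (`Part', `Def', `noisy_definition'); the warning wording is
illustrative only, and this is just one reading of the plan rather
than a finished design:

```rust
/// One part of an argument, in the order encountered.
#[derive(Debug, Clone, PartialEq)]
pub enum Part {
    Text(String),
    Builtin(&'static str),
}

/// A 1.4.x-style definition: one string or one builtin.
#[derive(Debug, Clone, PartialEq)]
pub enum Def {
    Text(String),
    Builtin(&'static str),
}

/// Returns the definition plus any warnings to print.
pub fn noisy_definition(parts: &[Part]) -> (Def, Vec<String>) {
    // A lone builtin is the only case where the builtin survives.
    if let [Part::Builtin(name)] = parts {
        return (Def::Builtin(*name), Vec::new());
    }
    // Otherwise keep all the text, drop every builtin, and warn
    // about each one, regardless of whether it came first.
    let mut text = String::new();
    let mut warnings = Vec::new();
    for part in parts {
        match part {
            Part::Text(s) => text.push_str(s),
            Part::Builtin(name) => {
                warnings.push(format!("cannot concatenate builtin `{}'", name));
            }
        }
    }
    (Def::Text(text), warnings)
}
```

Under this rule the <defn> followed by `   bar' case produces the
text with one warning, instead of silently producing the builtin.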

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

