Hi Branden,

G. Branden Robinson wrote on Mon, Jan 19, 2026 at 12:03:25PM -0600:
> At 2026-01-19T11:55:49+0100, Ingo Schwarze wrote:

> [...]
>> So, here is the second problem report.  For now, i'm lazily quoting
>> my preliminary commit message in the WIP port:
>> 
>>  ----- 8< ----- schnipp ----- >8 ----- 8< ----- schnapp ----- >8 -----
>> commit 63d225ba7e1a90988645fbec0d2e599c4732bb22
>> Author: Ingo Schwarze <[email protected]>
>> Date:   Mon Jan 19 11:15:42 2026 +0100
>> 
>>     disable parallel builds for now
>>     
>>     Groff builds have a long history of blowing up in parallel mode,
>>     so let's first try to build at all before worrying about broken
>>     dependency rules and trying to get parallel building to work.

> We've fixed multiple bugs in this respect since groff 1.23.0, and builds
> routinely work for me using 10 to 12 cores in parallel.

I'm not trying to cast doubt on that - if i recall correctly, i fixed
various parallel build bugs in pre-1.23 times, and that is why i
said "long history".

>>     Right now, a parallel build dies instantly with:
>>     
>>     cp -f ./font/devpdf/SS.in font/devpdf/SS
>>     make: don't know how to make font/devpdf/symbolsl.afm
>>           (prerequisite of: font/devpdf/stamp)
>>     Stop in /usr/ports/pobj/groff-1.24.0rc1/groff-1.23.0.5077-7dcc8
>>     
>>     I hope to investigate later and report properly, this is clearly
>>     not a very useful report.  For now: as soon as i remove -j
>>     from the make flags, the instant crash goes away and the build
>>     progresses a bit further

> I saw a similar problem just the other day when using Debian's
> superannuated version of bmake.  Parallelism per se was not the problem.
> 
> https://lists.gnu.org/archive/html/groff/2026-01/msg00027.html

Hmm, having a closer look...

The top Makefile generated by ./configure in my build has:

  $(devpdf_builddir)/symbolsl.afm: \
    $(devpdf_srcdir)/symbolsl.afm.in
        $(AM_V_GEN)cp -f $(devpdf_srcdir)/symbolsl.afm.in $@

  devpdf_builddir = $(top_builddir)/font/devpdf
  top_builddir = .
  devpdf_srcdir = $(top_srcdir)/font/devpdf
  top_srcdir = .
  AM_V_GEN = $(am__v_GEN_$(V))
  V is undefined
  am__v_GEN_ = $(am__v_GEN_$(AM_DEFAULT_VERBOSITY))
  AM_DEFAULT_VERBOSITY = 1
  am__v_GEN_1 =

  substituting everything in:
  ./font/devpdf/symbolsl.afm: ./font/devpdf/symbolsl.afm.in
        cp -f ./font/devpdf/symbolsl.afm.in ./font/devpdf/symbolsl.afm

  Then we have:
  all: font/devpdf/stamp

  font/devpdf/stamp: $(devpdffontdata)
        $(AM_V_at)>$@

  devpdffontdata = ... font/devpdf/symbolsl.afm ...
  AM_V_at = $(am__v_at_$(V))
  am__v_at_ = $(am__v_at_$(AM_DEFAULT_VERBOSITY))
  am__v_at_1 =

  substituting everything in:
  font/devpdf/stamp: ... font/devpdf/symbolsl.afm ...
        > font/devpdf/stamp

So the problem is that font/devpdf/stamp requires font/devpdf/symbolsl.afm,
but there is no rule to build a target with exactly that name.
OpenBSD make(1) does *NOT* consider the rule to
build ./font/devpdf/symbolsl.afm because "font/devpdf/symbolsl.afm"
and "./font/devpdf/symbolsl.afm" are not the same string.

The reason it works anyway without -j is that the stamp rule
contains $(devpdffontdescriptions) in front of font/devpdf/symbolsl.afm,
and $(devpdffontdescriptions) starts
with $(devpdffont_descriptions_from_devps),
which starts with font/devpdf/S,
and there is this rule:

  $(devpdffont_descriptions_from_devps):
        $(AM_V_at)$(MKDIR_P) $(top_builddir)/font/devpdf
        ...

So if the build is not parallel, the order of dependencies enforces
that the dependencies of font/devpdf/S are resolved before the
dependencies of font/devpdf/symbolsl.afm, and doing so apparently
happens to make make(1) realize that "font" and "./font" are somehow
equivalent, but you cannot rely on that.

In a parallel build, the dependencies of font/devpdf/S
and font/devpdf/symbolsl.afm are apparently evaluated at the same
time, and at that time, the equivalence of "font" and "./font"
has not yet been noticed.

So the cleanest way out appears to be deciding on one particular
name for each target and then using that name consistently
throughout, rather than hoping that make(1) figures out magically
which names are equivalent with each other.
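
As a minimal sketch of what that could look like (the variable
name "fontdir" is made up for illustration and is not taken from
devpdf.am, and this fragment is untested against the real build):

```make
# Hypothetical fragment, not copied from devpdf.am.
# The directory is spelled exactly once, in fontdir, and both the
# prerequisite list and the rule target are written in terms of it.
# After expansion, the name required by the stamp rule and the name
# of the target that builds it are the same string, so make(1) does
# not have to guess that "font" and "./font" are equivalent.
fontdir = font/devpdf

all: $(fontdir)/stamp

$(fontdir)/stamp: $(fontdir)/symbolsl.afm
	> $@

$(fontdir)/symbolsl.afm: $(fontdir)/symbolsl.afm.in
	cp -f $(fontdir)/symbolsl.afm.in $@
```

Whether the common spelling carries a leading "./" or not should
not matter, as long as every mention of a given target uses the
same one.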


FWIW, i dislike the term "bmake".  I'm not a specialist in make(1),
you would have to ask Marc Espie for that, but i strongly suspect
the term may be misleading.  For all i know, OpenBSD and NetBSD are
not using the same make(1) program.  The code bases may share
common roots in the remote past.  But OpenBSD make has definitely
seen substantial improvements over many years, and i would be
slightly surprised if NetBSD make had been completely
stagnant (though i don't know what really went on there).
I'm not aware of any significant synching between both during the
last 15 years or so.  I have no idea which make(1) FreeBSD uses,
and i have no idea what "bmake" is supposed to mean.

> Because your build is failing on one of the exact same two files mine
> did, I suspect something other than parallelism is the culprit here, and
> that going single-core is masking the problem.

Well, isn't that always the case when bugs show up in parallel builds?
I mean, if a bug shows up in a parallel build but not in a serial
build, that means serialization is masking the bug - and the bug
is a defective dependency declaration in the vast majority of cases.
That's just what unreliable parallel build behaviour is:
dependency declarations outright wrong or not working reliably...

> Please review the "devpdf.am" file and advise if you see any
> mistakes.[1]

In that file, it appears relative pathnames starting with "font"
and "./font" are inconsistently specified.  I expect the problem
will go away once you pick either of these notations and then stick
to it.

>> The reason for not investigating properly at once is that i worry
>> that if i do everything properly, testing might end up taking weeks.
>> For 1.23, it ended up taking two years

> Unfortunate.  I haven't seen any follow-up from you on ... oh.
> 
> Well, I was going to point here:
> 
> https://github.com/ischwarze/groff-port/commits/1.23/?after=90725d710b9b2a12af22c7f8fdb8a35959297765+34
> 
> ...but it looks like you may have force-pushed since our earlier
> conversations.  Does that wipe out the comments we'd made?

I have no idea.  When i set up that temporary github repo, i did
not even consider that commenting on github is possible (though
admittedly, now that i think about it, i did know that commenting
on github is possible and not unusual in some repos).  On that repo,
the vast majority of all pushes have always been force-pushes, and
that's the main reason i chose to use git instead of CVS in the
first place: rebasing and force-pushing works much better with git
than with CVS, and bringing my large and chaotic heap of patches
into a logical order is the whole point of the repo, which *requires*
constant rebasing and force-pushing.  Avoiding rebasing would
totally defeat the purpose because it would make the patch heap
even more chaotic (due to the stepwise refinement that naturally
occurs in an unpredictable order) rather than more organized.

That actually is where this final paragraph of the README.md came from:

  Pull requests will not be accepted. If you have suggestions, send
  email to schwarze at openbsd dot org. Such mail can optionally
  contain in-line patch(1)es. Mail containing HTML, Markdown, or
  MIME attachments will be silently deleted.

In retrospect, it would have been better to say more clearly in that
paragraph that aggressive rebasing and force-pushing is the whole
point of the repo - but back then, it never occurred to me that
people might attach comments, nor that that might become a problem.

> I recall offering several back in September.  I guess since they
> were associated with Git hashes that are now orphaned, that they did.

That's unfortunate.  I acted on some of your comments, but not on all;
i disagreed with some, but not with all.

Oh wait.  I just used "git reflog" and identified
commit 564c3f9cd8accd242aee20a2c4ab49008e427e9b
as the state before the latest force-push, marked that commit
as 1.23old locally with "git branch", then re-pushed that obsolete
branch under that name, and voila, at least some of your
comments are back.  Since i *always* aggressively rebased
and force-pushed in that repo, it's not obvious to me whether
all of your comments are back...

Do you recall whether
  https://github.com/ischwarze/groff-port/commit/8bcceb92
was the first comment you wrote?
If so, likely nothing was lost, because you wrote that on July 31,
and i don't think i force-pushed after July 31.  Anything you wrote
on July 26 or earlier (if anything) may still be hidden in
orphaned temporary commits, though.

> Glad to see you made headway here, though!
> 
> Incidentally, your feedback on 1.23 prompted me to take a deep dive into
> the grammar of delimiters in *roff.  Not just GNU troff, but all *roffs.
> 
> Will it surprise anyone if I observe that they were underspecified?
> 
> I've now specified them.  Some mandoc test cases may feel the effect.

I already bumped into that while starting to update my WIP 1.23 port
to 1.24rc1 for testing purposes.  There are some substantial
merge conflicts, see for example

  https://github.com/ischwarze/groff-port/commit/e8737fe2

  drop do_name_test() and do_width() patches for now

  These two functions saw large changes that are intended to address
  the same behaviour changes as the patches, and the patches no
  longer apply.  These issues will likely have to be re-investigated
  from scratch.

> Again, bottom line, I suspect that what you've found is not a
> parallelism bug, but either a bug in bmake or a bug in "devpdf.am".

Or both - i did not check whether POSIX requires using the ./foo
target when "foo" appears as a dependency.  Maybe it does?

> Were you building in-tree or out-of-tree?

In-tree.

Not because i would prefer that.  When building bleeding edge in
the past, i usually built in build/.  But the point of building a
port is to keep changes minimal (i.e. only change what is required,
not what is merely a personal preference) and in-tree is still the
default for groff.  Since building in-tree is possible and the
default, minimalism in the port requires building in-tree.  Also,
release testing without using the port would not make any sense.
From a downstream perspective, the whole point of a release is that
it supports updating the port (occasionally, we also provide ports
based on arbitrary commits for software that never makes releases,
but we usually don't for software that does provide at least
occasional releases).  So the ports framework is the only natural
place where testing a release candidate makes any sense - as opposed
to bleeding-edge testing, which is easier to do, less time-consuming,
and almost as useful outside the ports framework.

I doubt that the current in-tree/out-of-tree situation makes
much sense.  IIUC, the maintainer (you) prefers out-of-tree
by a significant margin, and rarely tests in-tree.  Why, then,
is in-tree the default?  Doesn't that imply users get something
by default that is less well tested than it could easily be,
that it could even be without any additional effort?

I would even go one step further.  Given that you prefer out-of-tree,
and that it is objectively cleaner and better tested, why is in-tree
even supported at all?  Simply deleting the code supporting in-tree
and making all builds out-of-tree naively looks like a win for
everyone: less maintenance and testing effort for you and more
cleanliness and better testing for the benefit of users.  So?

Yours,
  Ingo
