Re: [Groff] Formatting algorithm

Peter Schaffter Sat, 03 May 2014 11:56:13 -0700

On Fri, Apr 25, 2014, Doug McIlroy wrote:
> Equations and displayed quotations are not a problem for the line-breaking
> algorithm; they'd all be handled in the macro packages, which would have
> the duty to delimit the blocks of text that are to be treated unitarily.
> 
> But that doesn't mean we're home free. Line-length changes within such
> blocks are still a problem.
<snip>
> Notionally there could be a phantom copy of groff running for each live
> partial solution that the dynamic program maintains.
<snip>
> A straightforward way to pull this off would be to actualize the notional
> copies of groff by forking. There would be one copy going forward from
> each line break. That would evaluate the cost of breaking at each word
> (or hyphenation point) on that line. At each line break the copies would
> rendezvous to see which process should be cloned to continue. Output
> of each process, both to standard output and standard error, would
> be treasured up and only the ultimate winner's output would finally
> be released.
> 
> This model is somewhat formidable.


No kidding.  And I fear it might break one of groff's greatest
strengths, which is minimal demand on system resources.  

> So far my imagination has failed to find a clean and efficient
> alternative that retains classic groff semantics in every way
> except line-breaking.  Who has a better idea?

"...that retains classic groff semantics" is where the challenge
lies.  Groff is built from the ground up to use a greedy algorithm,
making dynamic programming somewhat alien.  I can't help but wonder
whether any attempt to implement KP won't ultimately stumble over
this issue.  A hybrid groff could certainly go off in that
direction, but we want to avoid forking the program.

In the simplest of terms, KP aims to normalize grey.  It does so
by minimizing the differences in word spacing from line to line by
finding the optimal number of words per line based on an assessment
of the whole paragraph.

The goal of any new strategy for groff formatting should also be to
normalize grey, but that does not necessarily mean KP is the only
route.  At the risk of sounding like a Johnny-One-Note, I posit that
a more sophisticated greedy algorithm, based on what groff already
does, stands a higher chance of success.  It wouldn't be perfect,
but neither is KP.  If there's one thing typesetters know, it's that
quality typography inevitably involves user intervention.  A good
typesetting program aims to minimize, not obviate, the need.  The
principle of "least ugly", which so successfully guided lilypond
development, might prove a better way of approaching the groff
formatting problem.

I suspect I'm a voice crying in the wilderness here, but we need to
consider that a greedy algorithm is almost always faster than a
dynamic programming solution; furthermore, greedy algorithms
sometimes lead to optimal solutions.  "Optimal", in this case,
would entail both improved grey *and* preserving groff semantics.

Food for thought.

-- 
Peter Schaffter
http://www.schaffter.ca

Re: [Groff] Formatting algorithm

Reply via email to