Re: Merging the new gropdf

Deri Tue, 07 Nov 2023 14:51:45 -0800

On Tuesday, 7 November 2023 19:48:51 GMT G. Branden Robinson wrote:
> Hi Deri,
> 
> At 2023-11-07T17:33:06+0000, Deri wrote:
> > On Monday, 6 November 2023 17:47:13 GMT G. Branden Robinson wrote:
> > > Slowly.  I landed two small changes this weekend, but they're not
> > > things most folks are looking for.
> > 
> > Thanks for this.
> 
> Hoping to do more soon!
> 
> > > * The sheer magnitude of changes to gropdf.pl itself, and my own
> > > 
> > >   ignorance of details of PDF that the new functionality is
> > >   implementing.
> > 
> > There are two main areas of change. The first is rectifying my design
> > mistake in the original gropdf. It used the "t" command from groff as
> > the primary command as a series of input characters which would be
> > converted to postscript glyphs, all other text commands (for example
> > "c") were converted back to their input character and treated as a
> > single character "t" command. I was focussed on the groff font rather
> > than the postscript font.
> > 
> > While thinking about font subsetting it became clear it made more
> > sense to convert all input to postscript glyph names immediately, and
> > use them as the "common currency" rather than focus on words. This
> > particularly makes sense when dealing with non-latin input which has
> > been processed with preconv. It is also makes it much more natural
> > when dealing with font subsetting. Previously this was not necessary
> > because the whole font was embedded by gropdf.
> > 
> > The second major change is the addition of a type 1 font parser and
> > code to generate a font which only contains the glyphs required by the
> > document being processed. This is the area which needs the most
> > testing. I have tested with dozens of fonts that this parser is robust
> > enough, but there are thousands of fonts out there. It seems to be
> > happy with fonts produced by fontforge, which is promising.
> 
> This is super helpful!  I plan to quote a lot of this in the commit
> message when I land the changes to gropdf.pl (except for the bit you
> want me to hold off on).
> 
> So that I understand you, gropdf-ng basically emits _only_ "c" and "C"
> commands to render text, and never "t" anymore?  What happens if track
> kerning is required (the "u" command)?


Hi Branden,

Yes, in so far as the t-command is treated like multiple C-commands. The u-
command is the same as t but with a kernadjust flag set. The key routine is 
PutGlyph which all text commands end up using a single token at a time.

> > > * An uneasiness I feel about some of the solutions you adopted
> > > 
> > >   insofar as they have effect outside of gropdf.pl itself.  For
> > >   instance:
> > >   
> > >   1.  Changing the format of font description files to add yet
> > >   
> > >       another field, mapping character names to Unicode code points.
> > >       In the rest of groff, this is not necessary because we have
> > >       glyphuni.cpp.
> > > 
> > > https://git.savannah.gnu.org/cgit/groff.git/tree/src/libs/libgroff/glyph
> > > uni
> > > .cpp
> > > 
> > >       I'd like to honor the DRY principle here.  What's a good way
> > >       to achieve that?
> > 
> > I'm sure you have noticed that glyphuni.cpp has 433 mappings from
> > groff character names to unicodes, and afmtodit has 4089 mappings
> > between postscript glyph names and unicodes (the mappings are also
> > algorithmically generated rather than manually hard coded as in the
> > case of glyphuni.cpp).
> 
> I certainly had not counted them.
> 
> > The mapping from postscript glyph to unicode is more appropriate for
> > gropdf, the mapping from groff character names is meant for the input
> > side of groff, where "glyph_name_to_unicode" is used solely by
> > input.cpp. The use of the word glyph in the subroutine name is a bit
> > confusing since it gives the false impression it is concerned with
> > postscript glyphs.
> 
> Fair point.  I've had my own issues with groff's internal names for
> things...repeatedly.
> 
> > I could add the mapping tables from afmtodit to the gropdf code and
> > drop the new column from the groff fonts created by afmtodit, but my
> > understanding of DRY principles is to avoid such duplication, have I
> > got it wrong?
> > 
> > When I looked at the code for the rest of groff, adding an extra
> > column had no effect on code which processed the groff font files.
> 
> I don't have a fundamental objection; my concern at this point is
> understanding what you did well enough to document it.  If I succeed,
> then I usually feel I grasp what's going on well enough to have an
> opinion about the design.  It's hard for anyone but the implementor to
> go the other direction.
> 
> > >   2.  I don't know the provenance of a new font you have proposed
> > >   
> > >       for shipping with groff, StandardSymSL.pfb.  We need to make
> > >       sure it is freely licensed.  If it is mechanically generated
> > >       from a PostScript Type 1 font that we can expect to find on
> > >       the system, maybe we should perform that procedure during the
> > >       build.  (On the other hand, I'm not sure I love the idea of
> > >       adding a build-dependency on fontforge or similar.)
> > 
> > This is the font which you said:-
> > 
> > "Looks great! It's not led astray by the superscripts or anything."
> > 
> > In this message to the list:-
> > 
> > https://lists.gnu.org/archive/html/groff/2023-06/msg00114.html
> > 
> > I described the provenance of the font. I don't think it is a good
> > idea to generate the font and introduce a dependency on fontforge. I
> > would put it in a similar category to the Euro font we currently
> > distribute, i.e. a font to make everything work smoothly.
> 
> Thank you--that conversation had aged out of my cache.  I did remember
> being happy with improved equation typography.
> 
> > >   3.  The new `stringhex` request you've proposed for troff.  As
> > >   
> > >       noted elsewhere, I'd prefer to solve this a different way.
> > >       
> > >       https://savannah.gnu.org/bugs/index.php?63074
> > >       
> > >       ...but I haven't implemented my idea yet, so I don't object to
> > >       `stringhex` as a temporary measure.
> > 
> > I'm happy to drop stringhex for a better solution, if it handles the
> > problem in this bug and the problems in these bugs:-
> > 
> > https://savannah.gnu.org/bugs/?62264
> > https://savannah.gnu.org/bugs/?64576
> 
> I think that none of these issues are tightly coupled with the others in
> the design sense, but in practical application, one might well employ a
> string iterator to decide how to rewrite a string that one subsequently
> sends to the device with `.output` or `\X`.

Each of these issues, although disparate, were solved by using stringhex to 
"hide" the troff syntax in the string, so it could be passed with \X'' without 
complaint.

> I got fairly far along with my `for` request; I need to pick that back
> up.
> 
> > > * There were _tons_ of seemingly unrelated whitespace changes to
> > > gropdf.pl, which frustrates code review.  (This has happened before;
> > > I don't remember when, but Dave might.)  I went through the file and
> > > attempted to impose a consistent style on it, but I'm not sure how
> > > you'll feel about it.  More importantly, it would be helpful to get
> > > your text editor to do better here.
> > 
> > I have attached an html version of how gropdf looks on my system. Tabs
> > are spaced at 8 character intervals, indents are 4 characters.
> 
> Okay.  Unfortunately that didn't help me much.
> 
> The commit a320b1357a807988c720bf482f17901ee5a24baf replaced tabs with
> spaces with very high reliability, as if you'd reconfigured your text
> editor.  There are significant amounts of new code, and those used
> spaces for indentation as well, except in a few places.
> 
> I'll paste including the (UTF-8-encoded) Vim "listchars" that I use to
> try to keep myself from making whitespace errors.
> 
> +        if ($options & SUBSET)
> +        {
> +            $lenIV=$1 if $head=~m'/lenIV\s+(\d+)';
> +            my $l=length($body);
> +            my
> $b=($gotinline)?decrypt_exec_C($body,$l):decrypt_exec_P(\$body,$l);
> +»‥‥‥‥‥‥    $body=substr($body,$lenIV);
> +»‥‥‥‥‥‥    $body=~m/begin([\r\n]+)/;
> +»‥‥‥‥‥‥    $term=$1;
> +»‥‥‥‥‥‥    if (defined($term))
> +»‥‥‥‥‥‥    {
> +»‥‥‥‥‥‥»‥‥‥‥‥‥‥(@bl)=split("$term",$body);
> +»‥‥‥‥‥‥»‥‥‥‥‥‥‥map_subrs(\@bl);
> +»‥‥‥‥‥‥»‥‥‥‥‥‥‥Subset(\@bl,$glyphs);
> +            }
> +            else
> +            {
> +                Warn("Unable to parse font '$fnt->{internalname}' for
> subsetting") +            }
> +        }
> 
> +        my $miss=-1;
> +        my $CharSet=join('',@{$fnt->{CHARSET}->[$j]});
> +»‥‥‥‥‥‥push(@{$chars->[$j]},'u0020') if $j==0 and
> $fnt->{NAM}->{u0020}->[PSNAME];
> 
> +    if (exists($fnt->{fontfile}))
> +    {
> +»‥‥‥‥‥‥if ($options & SUBSET and !($options & NOFILE))
> +»‥‥‥‥‥‥{
> +»‥‥‥‥‥‥    if (defined($term))
> +»‥‥‥‥‥‥    {
> +»‥‥‥‥‥‥»‥‥‥‥‥‥‥$body=encrypt(\@bl);
> +»‥‥‥‥‥‥    }
> +»‥‥‥‥‥‥}
> +
> +»‥‥‥‥‥‥if (defined($fobj))
> +»‥‥‥‥‥‥{
> +»‥‥‥‥‥‥    $obj[$fobj]->{STREAM}=$head.$body.$tail;
> +»‥‥‥‥‥‥    $obj[$fobj]->{DATA}->{Length1}=length($head);
> +»‥‥‥‥‥‥    $obj[$fobj]->{DATA}->{Length2}=length($body);
> +»‥‥‥‥‥‥    $obj[$fobj]->{DATA}->{Length3}=length($tail);
> +        }
> +
> +        foreach my $o (@fontdesc)
> +        {
> 
> +    if (!exists($obj[$o]->{XREF}))
> +    {
> +»‥‥‥‥‥‥if ($PDFver==5 and !exists($obj[$o]->{STREAM}) and
> ref($obj[$o]->{DATA}) eq 'HASH') +»‥‥‥‥‥‥{
> +»‥‥‥‥‥‥    # This can be put into an ObjStm
> +»‥‥‥‥‥‥    my $maj=int(++$objidx/128);
> +»‥‥‥‥‥‥    my $min=$objidx % 128;
> +
> +»‥‥‥‥‥‥    if ($maj > $omaj)
> +»‥‥‥‥‥‥    {
> +»‥‥‥‥‥‥»‥‥‥‥‥‥‥$omaj=$maj;
> +»‥‥‥‥‥‥»‥‥‥‥‥‥‥BuildObj(++$tobjct,
> +»‥‥‥‥‥‥»‥‥‥‥‥‥‥{
> +»‥‥‥‥‥‥»‥‥‥‥‥‥‥    'Type' => '/ObjStm',
> +»‥‥‥‥‥‥»‥‥‥‥‥‥‥}
> +»‥‥‥‥‥‥»‥‥‥‥‥‥‥);
> +
> +»‥‥‥‥‥‥»‥‥‥‥‥‥‥$obji[$maj]=[$tobjct,0,'',''];
> +»‥‥‥‥‥‥»‥‥‥‥‥‥‥$obj[$tobjct]->{DATA}->{Extends}=($tobjct-1)." 0 R" if $maj
> > 0; +»‥‥‥‥‥‥    }
> +
> +»‥‥‥‥‥‥    $obj[$o]->{INDIRECT}=[$tobjct,$min];
> +»‥‥‥‥‥‥    $obji[$maj]->[1]++;
> +»‥‥‥‥‥‥    $obji[$maj]->[2].=' ' if $obji[$maj]->[2];
> +»‥‥‥‥‥‥    $obji[$maj]->[2].="$o ".length($obji[$maj]->[3]);
> +»‥‥‥‥‥‥    PutObj($o,\$obji[$maj]->[3]);
> +»‥‥‥‥‥‥}
> +»‥‥‥‥‥‥else
> +»‥‥‥‥‥‥{
> +»‥‥‥‥‥‥    PutObj($o);
> +»‥‥‥‥‥‥}
> +    }
> 
> +»‥‥‥‥‥‥»‥‥‥‥‥‥‥$fnt{NAM}->{$r[0]}=$fnt{NAM}->{$lastnm};
> 
> +        # Real font needs subsetting
> +»‥‥‥‥‥‥$fnt{fontfile}=$download{$fontkey};
> +#         my ($head,$body,$tail)=GetType1($download{$fontkey});
> +#         $head=~s/\/Encoding .*?readonly def\b/\/Encoding StandardEncoding
> def/s; +#         $fontlst{$fontno}->{HEAD}=$head;
> +#         $fontlst{$fontno}->{BODY}=$body;
> +#         $fontlst{$fontno}->{TAIL}=$tail;
> +        #         $fno=++$objct;
> +        #       EmbedFont($fontno,\%fnt);
> 
> You could fix this up in your branch--but you'll likely have to fix
> merge conflicts in subsequent commits--and then force-push.  Or I can
> apply the tab-restoring indentation fix-up commit I attached as part of
> my previous mail.
> 
> What would you like to do?

I'm happy for you to decide suitable formatting. I prefer { and } to appear on 
their own lines and the contents of the block to be indented appropriately. 
Once gropdf has been incorporated into master, with your formatting, I shall 
remove my branch and start using the "official" gropdf for further 
development.

> > > Also your most recent commit to your branch says that it's starting
> > > work on a new thing.  Should I omit that from my merge?
> > > 
> > > https://git.savannah.gnu.org/cgit/groff.git/commit/?h=deri-gropdf-ng&id=
> > > a2b5 541142a1571e9f9f5a8321c1e21c721469aa
> > 
> > Yes, please drop this. It is my next project, text decorations. Peter
> > asked for underlining a long time ago. Mom postscript has a nifty
> > piece of postscript code (courtesy of Tadziu I think) which underlines
> > text. In PDFs underline is one of the text decorations, so I'm hoping
> > to expose an API for text decoration as the next project.
> 
> Understood, will do.
> 
> > > I'm attaching a "git log -p --format=fuller" of my staging branch so
> > > you can see where I am.
> > > 
> > > I look forward to hearing your thoughts on next steps.
> > 
> > Keep going. :-)
> 
> I have another proposal: use *roff-compatible register format syntax
> when assigning page number identifiers to bookmarks.  If I understand
> your code correctly, no impedance matching with a PDF feature is
> necessary; the page number labels are arbitrary.
> 
> I refer to this:
> 
> +            elsif (lc($xprm[1]) eq 'pagenumbering')
> +            {
> +                # 2=type of [D=decimal,R=Roman,r=roman,A=Alpha
> (uppercase),a=alpha (lowercase)
> 
> I think it would be more comfortable for experienced *roff users to use
> _its_ convention for number formats.  See CSTR #54 §8, or:
> 
> https://www.gnu.org/software/groff/manual/groff.html.node/Assigning-Register
> -Formats.html
> 
> Popping the stack...

I'm not averse to this, I was following the mom documentation (loosely!), 
which says:-

PAGENUM_STYLE lets you tell mom what kind of page numbering you want.

DIGIT   Arabic digits (1, 2, 3...)
ROMAN   upper case roman numerals (I, II, III...)
roman   lower case roman numerals (i, ii, iii...)
ALPHA   upper case letters (A, B, C...)
alpha   lower case letters (a, b, c...)

> As weird as it may sound, I think fixing the code style confusion may
> make the rest easier to understand.  Let me know what tack you want to
> take.
> 
> Regards,
> Branden

Cheers

Deri

Re: Merging the new gropdf

Reply via email to