Hi folks, The following change could probably use some additional eyeballs. It's one of the things Deri and I disagreed about.
First, I'd like to know if it horrifies anyone, or if anyone has concerns about the performance hit. I'm willing to pay a little bit of performance if I can buy robustness with it, and I think that's what this does. But that's me. The other thing to ask is of Peter: assuming you are among the non-horrified, would you like me to prepare a patch to om.tmac to migrate it to this new `pdf:lookup` macro? If that is done (regardless of who does it), I can also chop out the `ie d PRINTSTYLE` branches from pdf.tmac shown below. It's becoming clear to me how to add hyperlink support to the ms, me, and mm packages. Regards, Branden ----- Forwarded message from "G. Branden Robinson" <g.branden.robin...@gmail.com> ----- Date: Mon, 4 Mar 2024 07:42:40 -0500 (EST) From: "G. Branden Robinson" <g.branden.robin...@gmail.com> To: groff-com...@gnu.org Subject: [groff] 28/28: [pdf]: Implement linear bookmark tag search. gbranden pushed a commit to branch master in repository groff. commit cd9fde325fd889f673f9603b43e1204b1fde42fc Author: G. Branden Robinson <g.branden.robin...@gmail.com> AuthorDate: Mon Mar 4 05:24:40 2024 -0600 [pdf]: Implement linear bookmark tag search. * tmac/pdf.tmac (pdf:lookup): Given an argument, search defined bookmark tags for a match and return one (if found) in the string `pdf:lookup-result`, which is defined but empty if there is no match. Previously (and, actually, still--see below), lookups were O(1) because strings named `pdf:look($TAG_NAME)` were defined. The speed was great but unfortunately, in practical use, tags often got *roff escape sequences stuck into them, which drew diagnostic messages from the formatter and could defeat the matches. (pdfbookmark, pdf*href-M): Use the new mechanism to record a bookmark tag if `PRINTSTYLE` (a mom(7) macro) is _not_ defined, so as to not regress documents using that package. Store the tag text in the string `pdf:bm\\n[pdf:bm.nr].tag`; every PDF bookmark in a groff document gets a serial number already. (pdf*href): Use the new mechanism to call `pdf:lookup` and locate a match for the desired tag. * doc/GMPfront.t.in (an*cln): Delete this macro. It iterated through a string, scrubbing it of `\%` escape sequences. Much more exotic things could be placed in bookmark tags; we were fortunate that man page cross references tend to stick to ASCII, plus `\%` to suppress hyphenation. But that's only suitable for man page references; if we want taggable (sub)section headings, like groff_mmse(7)'s "Se ocksÄ", we need strings, not just valid groff identifiers. (an*bookmark): Populate `an*page-ref-nm` without cleaning it first. So what's the performance hit? I built the tree before this commit, then ran the following simple script and saved the output. Applied the commit and did it again. target=doc/groff-man-pages.pdf date for n in $(seq 20) do rm -f $target time -v make $target done date This 381-page PDF's generation slowed from 5.99-6.71 seconds to 7.58-8.49 seconds. Scraping out the elapsed time and flinging these to GNU datamash: $ datamash mean 1 sstdev 1 < 20times-before.data 6.3565 0.22806451072587 $ datamash mean 1 sstdev 1 < 20times-after.data 8.0925 0.2556904995868 ...for a mean performance hit of 27.3%. If this is intolerable, then the priority for resolving Savannah #62264 could be increased. Once we have properly iterable strings (and a strings.tmac "standard library") it will be straightforward for people to write (and contribute!) string-hashing algorithms in *roff. --- ChangeLog | 32 ++++++++++++++++++++++++++++++++ doc/GMPfront.t.in | 13 +------------ tmac/pdf.tmac | 48 ++++++++++++++++++++++++++++++++++++++++++------ 3 files changed, 75 insertions(+), 18 deletions(-) diff --git a/ChangeLog b/ChangeLog index a58151907..f6e0587ea 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,35 @@ +2024-03-04 G. Branden Robinson <g.branden.robin...@gmail.com> + + [pdf]: Implement linear bookmark tag search. + + * tmac/pdf.tmac (pdf:lookup): Given an argument, search defined + bookmark tags for a match and return one (if found) in the + string `pdf:lookup-result`, which is defined but empty if there + is no match. Previously (and, actually, still--see below), + lookups were O(1) because strings named `pdf:look($TAG_NAME)` + were defined. The speed was great but unfortunately, in + practical use, tags often got *roff escape sequences stuck into + them, which drew diagnostic messages from the formatter and + could defeat the matches. + (pdfbookmark, pdf*href-M): Use the new mechanism to record a + bookmark tag if `PRINTSTYLE` (a mom(7) macro) is _not_ defined, + so as to not regress documents using that package. Store the + tag text in the string `pdf:bm\\n[pdf:bm.nr].tag`; every PDF + bookmark in a groff document gets a serial number already. + (pdf*href): Use the new mechanism to call `pdf:lookup` and + locate a match for the desired tag. + + * doc/GMPfront.t.in (an*cln): Delete this macro. It iterated + through a string, scrubbing it of `\%` escape sequences. Much + more exotic things could be placed in bookmark tags; we were + fortunate that man page cross references tend to stick to ASCII, + plus `\%` to suppress hyphenation. But that's only suitable for + man page references; if we want taggable (sub)section headings, + like groff_mmse(7)'s "Se ocksÄ", we need strings, not just valid + groff identifiers. + (an*bookmark): Populate `an*page-ref-nm` without cleaning it + first. + 2024-03-04 G. Branden Robinson <g.branden.robin...@gmail.com> * doc/GMPfront.t.in: Resync `MR` replacement with its diff --git a/doc/GMPfront.t.in b/doc/GMPfront.t.in index 367190478..4390ea5f3 100644 --- a/doc/GMPfront.t.in +++ b/doc/GMPfront.t.in @@ -7,17 +7,6 @@ .nr PDFOUTLINE.FOLDLEVEL 1 .defcolor pdf:href.colour rgb 0.00 0.25 0.75 .pdfinfo /Title "groff Collected Reference Pages" -.de an*cln -. ds \\$1 -. als an*cln:res \\$1 -. shift -. ds an*cln:res \\$*\" -. ds an*cln:chr \\$* -. substring an*cln:chr 0 0 -. if '\%'\\*[an*cln:chr]' \{\ -. substring an*cln:res 1 -. \} -.. . .de END .. @@ -25,7 +14,7 @@ .am reload-man END .de an*bookmark . ie (\\\\$1=1) \{\ -. an*cln an*page-ref-nm \\\\$2\" +. ds an*page-ref-nm \\\\$2\" . pdfbookmark -T "\\\\*[an*page-ref-nm]" \\\\$1 \\\\$2 . \} . el .pdfbookmark \\\\$1 \\\\$2 diff --git a/tmac/pdf.tmac b/tmac/pdf.tmac index e26cd1cd0..3824e7c6d 100644 --- a/tmac/pdf.tmac +++ b/tmac/pdf.tmac @@ -191,6 +191,24 @@ am solely responsible for any bugs I may have introduced into this file. .\" .nr PDFOUTLINE.FOLDLEVEL 10000 .\" +.\" Search defined bookmarks for a tag matching $1. This gets around +.\" problems with *roff escape sequences embedded in identifiers (which +.\" is not allowed by the language syntax), and the need to tediously +.\" scrub the strings of them (and throw diagnostics in the user's face +.\" if we don't do a good enough job) at the expense of an O(n) search +.\" every time we reference a bookmark instead of an O(1) one. +.\" +.de pdf:lookup +.nr pdf:index 0 1 +.ds pdf:lookup-result \" empty +.while d pdf:bm\\n+[pdf:index].tag \{\ +. if '\\$1'\\*[pdf:bm\\n[pdf:index].tag]' \{\ +. ds pdf:lookup-result \\*[pdf:bm\\n[pdf:index].tag]\" +. break +. \} +. \} +.. +. .\" The actual job of creating an outline reference .\" is performed by the "pdfbookmark" macro. .\" @@ -267,8 +285,14 @@ am solely responsible for any bugs I may have introduced into this file. . rm pdf:clean . ev . tr \[em]\[em] -. ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\*[pdf:cleaned] -. if dPDF.EXPORT .tm .ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\*[pdf:cleaned] +. ie d PRINTSTYLE \{\ +. ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\*[pdf:cleaned] +. if dPDF.EXPORT .tm .ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\*[pdf:cleaned] +. \} +. el \{\ +. ds pdf:bm\\n[pdf:bm.nr].tag \\*[PDFBOOKMARK.NAME] +. if d PDF.EXPORT .tm .ds pdf:bm\\n[pdf:bm.nr].tag \\*[PDFBOOKMARK.NAME] +. \} . pdfmark /Dest /\\*[PDFBOOKMARK.NAME] /View [\\*[PDFBOOKMARK.VIEW]] /DEST . nop \!x X ps:exec [/Dest /\\*[PDFBOOKMARK.NAME] /Title (\\*[pdf:cleaned]) /Level \\n[pdf:bm.lev] /OUT pdfmark . pdf:href.options.clear @@ -589,8 +613,14 @@ am solely responsible for any bugs I may have introduced into this file. .\" .ds PDFBOOKMARK.NAME "\\*[pdf:href-N]\\*[pdf:href-D] .pdf*href.set \\*[PDFBOOKMARK.NAME] \\$1 -.ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\$* -.if dPDF.EXPORT .tm .ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\$* +.ie d PRINTSTYLE \{\ +. ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\$* +. if dPDF.EXPORT .tm .ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\$* +. \} +.el \{\ +. ds pdf:bm\\n[pdf:bm.nr].tag \\*[PDFBOOKMARK.NAME] +. if d PDF.EXPORT .tm .ds pdf:bm\\n[pdf:bm.nr].tag \\*[PDFBOOKMARK.NAME] +. \} .\" .\" .\" Irrespective of whether this marker is created, or not, @@ -704,8 +734,14 @@ am solely responsible for any bugs I may have introduced into this file. . ds PDFHREF.DESC \\$* . \} . el \{\ -. ie dpdf:look(\\*[pdf:href-D]) .ds PDFHREF.DESC \\*[pdf:look(\\*[pdf:href-D])] -. el .ds PDFHREF.DESC Unknown +. ds PDFHREF.DESC Unknown +. ie d PRINTSTYLE \{\ +. if dpdf:look(\\*[pdf:href-D]) .ds PDFHREF.DESC \\*[pdf:look(\\*[pdf:href-D])] +. \} +. el \{\ +. pdf:lookup \\*[pdf:href-D] +. ie !'\\*[pdf:lookup-result]'' .ds PDFHREF.DESC \\*[\\*pdf:lookup-result] +. \} . \} . \" Apply border and colour specifications to the PDFMARK string . \" definition, as required. _______________________________________________ Groff-commit mailing list groff-com...@gnu.org https://lists.gnu.org/mailman/listinfo/groff-commit ----- End forwarded message -----
signature.asc
Description: PGP signature