[bug #65585] [gropdf] problems introduced by commit cd9fde325f

2024-04-12 Thread Deri James
Follow-up Comment #2, bug #65585 (group groff):

Hi Branden,

All the issues raised in this bug have now been fixed in the release I've just
done, one regression is outstanding, can't convert "rs" to unicode, but I will
fix that.

Alex is not screaming because the "special sauce" I gave him to create the
kernel book does not use the .MR macro at all, even when the particular man
page calls .MR! Most of them are using .BR name (n). Once he converts them all
to .MR, I'll let him know he can drop my sauce - then you may hear the
gnashing of teeth as well as the screams.

Your comments on glyphuni.cpp equally apply to afmtodit so when you integrate
it into afmtodit I can use it in gropdf.

Most of the features of my branch are now integrated in some way or another,
not always my code. :-)

The only thing I think which is still missing is the proper SS font for
devpdf, which produces "more perfect" typeset equations.

For the NEWS file perhaps you can use some of the information in the attached
pdf.

My condolences for your bereavement.

Regards

Deri

(file #55939)

___

Additional Item Attachment:

File name: NewGropdf.pdf  Size: 104KiB



AGPL NOTICE

These attachments are served by Savane. You can download the corresponding
source code of Savane at
https://git.savannah.nongnu.org/cgit/administration/savane.git/snapshot/savane-4448d8da181e3133acac16b178e98513aa6405cd.tar.gz


___

Reply to this item at:

  

___
Message sent via Savannah
https://savannah.gnu.org/




[bug #65585] [gropdf] problems introduced by commit cd9fde325f

2024-04-12 Thread G. Branden Robinson
Update of bug #65585 (group groff):

 Summary: Problems introduced by commit cd9fde325f => [gropdf]
problems introduced by commit cd9fde325f

___

Follow-up Comment #1:

Hi Deri,

Thanks for giving my changes a close look.  I'm obviously pretty inexperienced
with this part of the _groff_ code base.

[comment #0 original submission:]
> I have been looking at the current state of pdf.tmac and have found a few
issues. The changes which have been committed are:-
> 
> Introduction of a new flag, -S, which replaces my change which introduced
the concept of passing a single pipe character as a hotspot meant that the
actual text to form the actual hotspot would be "piped" to the document
stream, terminated by doing a markstop. This is what allowed Branden to
implement the man macros which place contents into a diversion before emitting
a hotspot. I have no objection to this change, although there were several
ways of setting up a hotspot containing a single pipe character before this
change, and most users are probably familiar with piping data. (Perhaps I
should have used "<" as the single character). I would be very annoyed if
someone sent me a pdf with a single "|" as a hotspot - would take me about 5
minutes to click it with the mouse!

I don't object to the _mnemonic_ of "|" here; what made me uneasy was the use
of in-band signaling to the macro file when faced with arbitrary user input. 
Someone somewhere would want to make a lone "|" hotspot and would, I predict,
be flabbergasted that they weren't allowed.  Yes, we could advise them to work
around the restriction by sticking a dummy character (or similar) adjacent to
it, but that doesn't feel to me like advice we could be proud of.

Also I don't think "pdf.tmac" is the right layer at which to have _groff_ warn
the user about accessibility flaws.  Maybe there isn't such a layer, apart
from excoriating the document author.

And if someone set a "|" in 36- or 48-point type, presumably it wouldn't be as
much of a problem.  So a preclusion of it (unadorned) as a hotspot seems to me
a bit of a severe measure.
 
> Another change was an attempt to only allow a restart after a preceding
suspend. This is a good idea, but it is in the wrong place, it should be
implemented in gropdf rather than pdf.tmac, since a user may use \X'pdf:
markrestart' as documented in the gropdf man page. Which bypasses the check in
pdf.tmac.

I'll need to understand this better, but I admit that it did feel to me like
only half of a fix.  My instinct is that it would be better to expose
"markstart" and "markend" directly with convenience macros, and update
_gropdf_(1) such that users could understand what they were for and how they
might write their own hyperlinking macros.

I'm pretty uncomfortable with "pdfmark.tmac"'s approach, which assumes that it
has anticipated every user's every need (but also stops documenting its
all-singing, all-dancing comprehensiveness in section 4 of its manual), and
all you need to do to use it is write a macro call resembling an _ffmpeg_
command to get results.  

There's even a place for that--but not, I think, via the means of going
straight to device control commands.  I think we should have a more granular
API for access to PDF features, and I think your approach in "pdf.tmac" has
taken several steps down that path already.

> Branden has implemented a looping construct to hold tags replacing an
associative array. It is not used if the user is using mom macros (it does not
work for mom documents).

I'm aware of it and it's on my to-do list.  I
[https://lists.gnu.org/archive/html/groff/2024-03/msg00022.html spoke of it on
the mailing list in March].

Long story short, I plan to get my new approach working for _mom_(7) but that
means I have to solve the danged old "node embedding in \X commands" problem
that has vexed us for a while.

> In fact it only works reliably for the man macros producing a "book" of man
pages, where it is used to differenciate between an internal link to another
page in the book or a "man:" URL to an external man page. The problem is that
the new code does not implement:-

> .pdfhref L -D name


> Which should emit the descriptive text associated with the named tag at the
time it was defined, but instead it just emits the "name". This is documented
in pdfmark.pdf, and used several times in pdfmark.ms, so it is reasonable to
assume users may be using this in their own documents if they use "pdfmom
--roff  -mspdf". So it is wrong to assume the new code is suitable for
everything except mom.

Okay.  That's a defect/implementation gap I was unaware of.  It might be a
good idea to spawn a ticket off for that and have this one depend on it.

> There is also an issue with the speed of the new code. One test file I used
went from an elapsed time of 0m3.08s to 11m31.62s. (About 670 times slower!).
This was producing a large document (LinuxManBook) from a single file, 

[bug #65585] Problems introduced by commit cd9fde325f

2024-04-12 Thread Deri James
URL:
  

 Summary: Problems introduced by commit cd9fde325f
   Group: GNU roff
   Submitter: deri
   Submitted: Fri 12 Apr 2024 03:51:08 PM UTC
Category: Driver gropdf
Severity: 3 - Normal
  Item Group: Incorrect behaviour
  Status: In Progress
 Privacy: Public
 Assigned to: deri
 Open/Closed: Open
 Discussion Lock: Any
 Planned Release: None


___

Follow-up Comments:


---
Date: Fri 12 Apr 2024 03:51:08 PM UTC By: Deri James 
I have been looking at the current state of pdf.tmac and have found a few
issues. The changes which have been committed are:-

Introduction of a new flag, -S, which replaces my change which introduced the
concept of passing a single pipe character as a hotspot meant that the actual
text to form the actual hotspot would be "piped" to the document stream,
terminated by doing a markstop. This is what allowed Branden to implement the
man macros which place contents into a diversion before emitting a hotspot. I
have no objection to this change, although there were several ways of setting
up a hotspot containing a single pipe character before this change, and most
users are probably familiar with piping data. (Perhaps I should have used "<"
as the single character). I would be very annoyed if someone sent me a pdf
with a single "|" as a hotspot - would take me about 5 minutes to click it
with the mouse!

Another change was an attempt to only allow a restart after a preceding
suspend. This is a good idea, but it is in the wrong place, it should be
implemented in gropdf rather than pdf.tmac, since a user may use \X'pdf:
markrestart' as documented in the gropdf man page. Which bypasses the check in
pdf.tmac. 

Branden has implemented a looping construct to hold tags replacing an
associative array. It is not used if the user is using mom macros (it does not
work for mom documents). In fact it only works reliably for the man macros
producing a "book" of man pages, where it is used to differenciate between an
internal link to another page in the book or a "man:" URL to an external man
page. The problem is that the new code does not implement:-

.pdfhref L -D name

Which should emit the descriptive text associated with the named tag at the
time it was defined, but instead it just emits the "name". This is documented
in pdfmark.pdf, and used several times in pdfmark.ms, so it is reasonable to
assume users may be using this in their own documents if they use "pdfmom
--roff  -mspdf". So it is wrong to assume the new code is suitable for
everything except mom.

There is also an issue with the speed of the new code. One test file I used
went from an elapsed time of 0m3.08s to 11m31.62s. (About 670 times slower!).
This was producing a large document (LinuxManBook) from a single file, using
the command:-

time ~/groff-git/groff/test-pdfmom  --roff -Tpdf -mandoc -petk
LinuxManBook.trf -z

(test-pdfmom is the same as test-groff but calling the pdfmom in the build
directory). I don't know whether this slow down in groff will be acceptable to
most users, I'm meant to be on a sabatical so I am not going to argue either
way, but be aware that is Branden's attempt to solve one issue which was
solved months ago in my branch waiting for merging. My use of stringhex was
described as "obfuscation", which is rather insulting if the intention was to
imply a deliberate attempt to obscure the purpose of my code. The correct
interpretation of its use is to protect the data from interpretation by groff
so that any byte sequence can be used as a string name in groff. This is
analagous to using base64 to protect binary data in emails.

Although Branden's loopy code solves some issues with 1.23.0 it fails in a
number of ways which are dealt with successfully by the pdf.tmac in my branch
(i.e. t.trf in #64576).

There is a bug in Branden's code. The attached file, pdf-L.trf, illustrates
the issue. According to the documentation in pdfmark.pdf, if you use .pdfhref
L with a destination name but no descriptive text, the descriptive text given
when the destination is named is used. With Branden's code, instead of using
the descriptive text it uses the name of the destination instead. The two pdfs
called pdf-L illustrate the problem.

Another bug results in entries in the array Branden loops over get
over-vritten in certain circumstances. This code illustrates the bug.

.ig
  groff -Tpdf -dPDF.EXPORT=1 -z pdf-M.trf

Results:-

.ds pdf:bm1.tag one
.ds pdf:bm1.tag two

(The bm1.tag has been overwritten!!!)

But,

  groff -Tpdf -dPDF.EXPORT=1 -dPRINTSTYLE=1 -z pdf-M.trf

Results:-

.ds pdf:look(one) Once upon...
.ds pdf:look(two) This is two

..
.pdfbookmark -T one 1 Once upon...
.pdfhref M -D two -E This is two