Re: gropdf-ng merge status (was: PDF outline not capturing Cyrillic text)

Deri Tue, 06 Feb 2024 13:37:28 -0800

On Tuesday, 6 February 2024 14:45:59 GMT G. Branden Robinson wrote:
> Hi Deri,
> 
> Now _does_ seem to be a good time to catch up on gropdf-ng merge status.
> There were two things I knew were still unmerged: the slanted symbol
> (SS) font support and the stringhex business, which has larger
> consequences than I understood at first.
> 
> At 2024-02-06T13:39:51+0000, Deri wrote:
> > The current gropdf (in the master branch) does support UTF-16BE
> > for pdf outlines (see attached pdf), but Branden has not released
> 
> At this point it's a merge (to the master branch), not a release, but
> true with that caveat.
> 
> So let me take a crack at a code review.


Hi Branden,

Many thanks for your thoughts on my code. I shall reply in general terms since 
your grasp of some of the issues is rather hazy, as you admit.

Huge AGL lookup table

My least favourite solution, but you made me do it! The most elegant and 
efficient solution was to make a one line amendment to afmtodit which added an 
extra column to the groff font files which would have the UTF-16 code for that 
glyph. This would only affect devpdf and devps and I checked the library code 
groff uses to read its font files was not affected by an extra column. I also 
checked the buffer used would not cause an overflow. Despite this, you didn't 
like this solution, without giving a cogent reason,  but suggesting a lookup 
table!

As to whether I should embed the table, or read it in, I deferred to the more 
efficient method use by afmtodit, embed it as part of make. I still would 
prefer the extra column solution, then there is no lookup at all.

use_charnames_in_special

Probably unnecessary once you complete the work to return .device to its 
1.23.0 condition, as you have stated.

pdfmomclean

Not quite sure how your work on #64484 will affect this, we will have to wait 
and see.

Stringhex

Clearly you are still misunderstanding the issue, because there are some 
incorrect statements.

In any lookup there is a key/value pair. If dealing with a document written in 
Japanese, both the key and the value will arrive as unicode. No problem for 
the value, but the key will be invalid if used as part of a register name. 
There are two obvious solutions. One is to encode the key into something, 
easily decoded, which is acceptable to be used as part of a register name, or 
do a loop traversal over two arrays, one holding the keys and one the values. 
I'm pretty sure my 9yr old grandson would come up with a looping solution. I 
really don't understand your opposition to the encoding solution, Ok, I accept 
you would have done it the childs way with the performance hit, but I prefer 
the more elegant encoding solution.

Uniqueness of keys is an issue for either strategy. In mom, a user supplied 
key name is only possible by using the NAMED parameter, and if a user uses the 
same name twice in the document nothing nasty will happen, the overview panel 
will be correct, since each of those is tagged with a safe generated name, and 
if they have used the same name for two different places in the document, when 
they are checking all the intra-document links they will find one of them will 
go to the wrong place. Of course this could be improved by warning when the 
same name is provided for a different destination. The man/mdoc macros 
currently have no named destinations, all generated, but this will change if 
the mdoc section referencing is implemented.

You mention a possible issue if a diversion is passed to stringhex, since this 
is 95% your own code for stringup/down, I'm pretty sure that whatever you do 
to solve the issue in your own code can be equally applied to stringhex, so 
this not an argument you can use to prevent its inclusion.

As regards your point 2, this is a non-issue, in 1.23.0 it works fine with 
.device. You ask what does:-

    \X'pdf: bizzarecmd \[u1234]'

Mean? Well, assuming you are writing in the ethiopic language and wrote:-

    \X'pdf: bizzarecmd ሴ'

And gropdf would do a "bizzarecmd" using the CHARACTER given (ETHIOPIC 
SYLLABLE SEE). Which could be setting a window title in the pdf viewer, I'm 
not sure, I have not written a handler for bizzarecmd. As you can see not 
"misleading to a novice" at all, the fact that preconv changed it to be a 
different form and gropdf changed it back to a character to use in pdf meta-
data is completely transparent to the user.

Your work on \X and .device is to put .device back to how it was in 1.23.0 and 
alter \X to be the same, this is what you said would happen.

The purpose of my patch was intended to give Robin a robust solution to what 
he wanted to do.

You wrote in another email:-

"But tparm(const char *str, long, long, long, long, long, long, long,
long, long) is one of the worst things I've ever seen in C code.

As I just got done saying (more or less) to Deri, when you have to
obfuscate your inputs to cram them into the data structure you're using,
that's a sign that you're using the wrong data structure."

I don't appreciate being conflated with "the worst things". The structure we 
were discussing was a simple key/value pair. I have noticed that you tend to 
call things obfuscated when you have difficulty understanding them. Encoding a 
key into a well known address space (the hex numbers) is not obfuscation. 
Consider BASE64 in mail systems, is that obfuscation or a valid method of 
protecting an essentially ascii environment, to my mind its a sensible 
compromise in systems designed before unicode (rings a bell?).

I'm afraid you give the impression that your ideas on how I should do my 
voluntary contribution to groff have more weight than my own, is that how you 
see it? I welcome people who can find issues with my code, and by issues I 
mean if it produces output other than intended, fails on edge cases I have not 
considered, falls over given valid input. I am quite sure there will be "bugs" 
in my code, it is fairly complex, but subjecting it to a "code review" without 
even running it to see if it does what it says on the box, is not helpful.

Cheers 

Deri

Re: gropdf-ng merge status (was: PDF outline not capturing Cyrillic text)

Reply via email to