On Tuesday, 6 February 2024 14:45:59 GMT G. Branden Robinson wrote: > Hi Deri, > > Now _does_ seem to be a good time to catch up on gropdf-ng merge status. > There were two things I knew were still unmerged: the slanted symbol > (SS) font support and the stringhex business, which has larger > consequences than I understood at first. > > At 2024-02-06T13:39:51+0000, Deri wrote: > > The current gropdf (in the master branch) does support UTF-16BE > > for pdf outlines (see attached pdf), but Branden has not released > > At this point it's a merge (to the master branch), not a release, but > true with that caveat. > > So let me take a crack at a code review.
Hi Branden, Many thanks for your thoughts on my code. I shall reply in general terms since your grasp of some of the issues is rather hazy, as you admit. Huge AGL lookup table My least favourite solution, but you made me do it! The most elegant and efficient solution was to make a one line amendment to afmtodit which added an extra column to the groff font files which would have the UTF-16 code for that glyph. This would only affect devpdf and devps and I checked the library code groff uses to read its font files was not affected by an extra column. I also checked the buffer used would not cause an overflow. Despite this, you didn't like this solution, without giving a cogent reason, but suggesting a lookup table! As to whether I should embed the table, or read it in, I deferred to the more efficient method use by afmtodit, embed it as part of make. I still would prefer the extra column solution, then there is no lookup at all. use_charnames_in_special Probably unnecessary once you complete the work to return .device to its 1.23.0 condition, as you have stated. pdfmomclean Not quite sure how your work on #64484 will affect this, we will have to wait and see. Stringhex Clearly you are still misunderstanding the issue, because there are some incorrect statements. In any lookup there is a key/value pair. If dealing with a document written in Japanese, both the key and the value will arrive as unicode. No problem for the value, but the key will be invalid if used as part of a register name. There are two obvious solutions. One is to encode the key into something, easily decoded, which is acceptable to be used as part of a register name, or do a loop traversal over two arrays, one holding the keys and one the values. I'm pretty sure my 9yr old grandson would come up with a looping solution. I really don't understand your opposition to the encoding solution, Ok, I accept you would have done it the childs way with the performance hit, but I prefer the more elegant encoding solution. Uniqueness of keys is an issue for either strategy. In mom, a user supplied key name is only possible by using the NAMED parameter, and if a user uses the same name twice in the document nothing nasty will happen, the overview panel will be correct, since each of those is tagged with a safe generated name, and if they have used the same name for two different places in the document, when they are checking all the intra-document links they will find one of them will go to the wrong place. Of course this could be improved by warning when the same name is provided for a different destination. The man/mdoc macros currently have no named destinations, all generated, but this will change if the mdoc section referencing is implemented. You mention a possible issue if a diversion is passed to stringhex, since this is 95% your own code for stringup/down, I'm pretty sure that whatever you do to solve the issue in your own code can be equally applied to stringhex, so this not an argument you can use to prevent its inclusion. As regards your point 2, this is a non-issue, in 1.23.0 it works fine with .device. You ask what does:- \X'pdf: bizzarecmd \[u1234]' Mean? Well, assuming you are writing in the ethiopic language and wrote:- \X'pdf: bizzarecmd ሴ' And gropdf would do a "bizzarecmd" using the CHARACTER given (ETHIOPIC SYLLABLE SEE). Which could be setting a window title in the pdf viewer, I'm not sure, I have not written a handler for bizzarecmd. As you can see not "misleading to a novice" at all, the fact that preconv changed it to be a different form and gropdf changed it back to a character to use in pdf meta- data is completely transparent to the user. Your work on \X and .device is to put .device back to how it was in 1.23.0 and alter \X to be the same, this is what you said would happen. The purpose of my patch was intended to give Robin a robust solution to what he wanted to do. You wrote in another email:- "But tparm(const char *str, long, long, long, long, long, long, long, long, long) is one of the worst things I've ever seen in C code. As I just got done saying (more or less) to Deri, when you have to obfuscate your inputs to cram them into the data structure you're using, that's a sign that you're using the wrong data structure." I don't appreciate being conflated with "the worst things". The structure we were discussing was a simple key/value pair. I have noticed that you tend to call things obfuscated when you have difficulty understanding them. Encoding a key into a well known address space (the hex numbers) is not obfuscation. Consider BASE64 in mail systems, is that obfuscation or a valid method of protecting an essentially ascii environment, to my mind its a sensible compromise in systems designed before unicode (rings a bell?). I'm afraid you give the impression that your ideas on how I should do my voluntary contribution to groff have more weight than my own, is that how you see it? I welcome people who can find issues with my code, and by issues I mean if it produces output other than intended, fails on edge cases I have not considered, falls over given valid input. I am quite sure there will be "bugs" in my code, it is fairly complex, but subjecting it to a "code review" without even running it to see if it does what it says on the box, is not helpful. Cheers Deri