Re: PDF outline not capturing Cyrillic text

2024-02-07 Thread Deri
On Wednesday, 7 February 2024 01:07:37 GMT Robin Haberkorn wrote: > Still, when using UTF-8 input, there are problems (missing letters) with > link texts autogenerated by .pdfhref L. [...] > > Best regards, > Robin > > PS: And to comment on some of the heated discussions on this list: > It's

Re: Re: PDF outline not capturing Cyrillic text

2024-02-06 Thread Robin Haberkorn
On Tue, Feb 06, 2024 at 01:39:51PM +, Deri wrote: > Hi Robin, > > The current gropdf (in the master branch) does support UTF-16BE for pdf > outlines (see attached pdf), but Branden has not released the other parts to > make it work! If you can compile and install the current git the

Re: PDF outline not capturing Cyrillic text

2024-02-03 Thread Robin Haberkorn
Regarding Cyrillic characters in PDF outlines, I think I got a few insights today. It turns out that the pdfmarks in the PostScript code are "text strings" according to the PDF specs, that is, either PDFDocEncoding or UTF-16BE with a leading byte-order marker (cf. PDF Reference 1.7). A
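
As a rough illustration of the UTF-16BE form of a PDF "text string" mentioned in this message, here is a minimal Python sketch. It is not taken from gropdf or the pdfmark macros; the function name pdf_text_string_hex is hypothetical, and writing the result as a hex string is only one of the ways a PDF string can be written.

```python
import codecs


def pdf_text_string_hex(title: str) -> str:
    """Encode a title as a PDF hex string: byte-order mark FE FF, then UTF-16BE.

    Hypothetical helper for illustration only; not part of groff.
    """
    data = codecs.BOM_UTF16_BE + title.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


if __name__ == "__main__":
    # Six Cyrillic letters become the BOM plus six 16-bit code units.
    print(pdf_text_string_hex("Привет"))
```

Run as a script, this prints <FEFF041F04400438043204350442>, i.e. the byte-order mark followed by one 16-bit unit per Cyrillic letter.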

Re: PDF outline not capturing Cyrillic text

2023-08-12 Thread Deri
On Saturday, 12 August 2023 07:35:20 BST G. Branden Robinson wrote: > Hi Deri, > > At 2023-06-23T22:40:42+0100, Deri wrote: > > On Friday, 23 June 2023 19:17:58 BST Robin Haberkorn wrote: > > > So it seems that the main problem really lies in grops and/or gropdf > > > which should ideally work

Re: PDF outline not capturing Cyrillic text

2023-08-12 Thread G. Branden Robinson
Hi Deri, At 2023-06-23T22:40:42+0100, Deri wrote: > On Friday, 23 June 2023 19:17:58 BST Robin Haberkorn wrote: > > So it seems that the main problem really lies in grops and/or gropdf > > which should ideally work with the Unicode escapes produced by > > preconv. I am not sure if we would still

Re: PDF outline not capturing Cyrillic text

2023-06-23 Thread Robin Haberkorn
Hello Peter, I am also now stumbling across Cyrillic-related issues with pdfmark. I am using ms for the time being. The bug also affects autogenerating link texts given via `.pdfhref L`. In the most simple case, preconv will turn your Cyrillic characters into escapes which are apparently not
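
For illustration, the kind of escape that preconv produces can be approximated with a few lines of Python. This is a simplified sketch, not preconv's actual implementation: it assumes the \[uXXXX] escape form for every non-ASCII character and ignores any special cases, and to_groff_escapes is a hypothetical name.

```python
def to_groff_escapes(text: str) -> str:
    """Replace each non-ASCII character with a groff-style \\[uXXXX] escape.

    Rough approximation for illustration; ASCII characters pass through
    unchanged, everything else becomes its code point in upper-case hex.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.append("\\[u%04X]" % ord(ch))
    return "".join(out)


if __name__ == "__main__":
    print(to_groff_escapes("Глава 1"))
```

A heading such as "Глава 1" therefore reaches the output driver as a run of escapes followed by plain ASCII, which is the form the later stages of the pipeline need to understand.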

Re: PDF outline not capturing Cyrillic text

2022-09-22 Thread G. Branden Robinson
[self-reply] At 2022-09-18T09:37:32-0500, G. Branden Robinson wrote: > GNU troff doesn't, as far as I can tell, ever write anything but the > 94 graphical code points in ASCII, spaces, and newlines to its output. I left out a sentence here; it should be clear from the rest of the message but it

Re: PDF outline not capturing Cyrillic text

2022-09-20 Thread Ralph Corderoy
Hi Branden, > A shorter pole might be to establish a protocol for communication of > Unicode code points within device control commands. Portability isn't > much of an issue here: as far as I know there has been no effort to > achieve interoperation of device control escape sequences among >

Re: PDF outline not capturing Cyrillic text

2022-09-18 Thread Oliver Corff
Dear Peter, Dear All, this problem is presumably not limited to groff. I remember the same issue when I was building LaTeX texts with foreign language elements (in my case, among others: Chinese) with the package to create internal links from (hyperref, iirc) table of contents to chapters, etc.

Re: PDF outline not capturing Cyrillic text

2022-09-18 Thread G. Branden Robinson
Hi Peter, At 2022-09-17T17:35:02-0400, Peter Schaffter wrote: > Source documents written in Cyrillic and processed with mom/pdfmom > break the PDF outline: > > 1. The text of titles and headings is not displayed in the outline. [...] > At a guess, it looks as if gropdf or pdfmark isn't