[
https://issues.apache.org/jira/browse/PDFBOX-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18073711#comment-18073711
]
Stefan Ziegler commented on PDFBOX-2838:
----------------------------------------
*Use Case 3 — Custom-encoded / raw-byte fonts*
PostScript text operators take raw character-code bytes, not Unicode strings.
The byte values in a PostScript string correspond directly to positions in the
font's {{Encoding}} array. The PDFBox {{showText(String text)}} path calls
{{font.encode(text)}}, which performs a Unicode → charcode lookup. However,
we have already resolved the correct charcode bytes from the PostScript
execution context. Decoding those bytes to Unicode and re-encoding them
is lossy and incorrect for custom encodings, where the mapping may not be
bijective. We need to write the pre-resolved charcode bytes directly as a PDF
string operand followed by {{Tj}}.
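To illustrate why the round trip loses information, here is a minimal sketch using a made-up encoding table. It does not use PDFBox; the class {{RoundTripDemo}}, its tables, and the charcode values are hypothetical and only model a font whose {{Encoding}} maps two charcodes to the same Unicode character.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class RoundTripDemo {
    // Hypothetical custom encoding: charcodes 0x01 and 0x02 both decode to 'A'.
    static final Map<Integer, Character> CODE_TO_UNICODE = new TreeMap<>();
    // Reverse lookup, as a Unicode -> charcode encoder would need it.
    static final Map<Character, Integer> UNICODE_TO_CODE = new HashMap<>();

    static {
        CODE_TO_UNICODE.put(0x01, 'A');
        CODE_TO_UNICODE.put(0x02, 'A');
        CODE_TO_UNICODE.put(0x03, 'B');
        // The reverse table can keep only one charcode per character;
        // here the lowest charcode wins (TreeMap iterates in key order).
        for (Map.Entry<Integer, Character> e : CODE_TO_UNICODE.entrySet()) {
            UNICODE_TO_CODE.putIfAbsent(e.getValue(), e.getKey());
        }
    }

    /** Decodes each charcode to Unicode, then encodes it back to a charcode. */
    static byte[] roundTrip(byte[] codes) {
        byte[] out = new byte[codes.length];
        for (int i = 0; i < codes.length; i++) {
            char unicode = CODE_TO_UNICODE.get(codes[i] & 0xFF);
            int code = UNICODE_TO_CODE.get(unicode);
            out[i] = (byte) code;
        }
        return out;
    }
}
```

With this table, the input bytes {{01 02 03}} come back as {{01 01 03}}: the second glyph reference is silently replaced, which is the failure mode described above.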
h3. Root cause
{{PDAbstractContentStream}} exposes:
* {{protected void write(String)}}: protected, so not callable from outside
the class hierarchy
* {{protected void writeBytes(byte[])}}: same
* {{public void showText(String)}}: always calls {{font.encode(text)}},
which throws for Type 3 and is incorrect for pre-encoded bytes
There is *no public method* to write a pre-encoded byte sequence as a PDF
string operand (hex {{<...>}} or literal {{(...)}}) followed by {{Tj}},
bypassing the Unicode encoding layer.
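For reference, the hex form of such an operand is straightforward to produce. The following is a minimal sketch in plain Java, not PDFBox code; the class and method names ({{HexTextOperand}}, {{showTextHex}}) are hypothetical.

```java
public class HexTextOperand {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /**
     * Formats pre-encoded charcode bytes as a PDF hex string operand
     * followed by the Tj operator, e.g. "<41FF00> Tj".
     */
    public static String showTextHex(byte[] charCodes) {
        StringBuilder sb = new StringBuilder(charCodes.length * 2 + 6);
        sb.append('<');
        for (byte b : charCodes) {
            sb.append(HEX[(b >> 4) & 0x0F]); // high nibble
            sb.append(HEX[b & 0x0F]);        // low nibble
        }
        sb.append("> Tj\n");
        return sb.toString();
    }
}
```

The hex form needs no escaping, so every byte value, including {{0x00}} and the parenthesis characters, passes through unchanged.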
h3. Feature Request
Please either:
# *Provide a public {{showEncodedText(byte[] encodedBytes)}} method* on
{{PDAbstractContentStream}} that writes the bytes directly as a PDF string
operand + {{Tj}}, bypassing {{font.encode()}}. This is similar to what
{{appendRawCommands}} currently enables, but with proper operand formatting and
without the deprecation stigma.
# *Or, at minimum, make {{writeBytes(byte[])}} and {{write(String)}} public*
(or package-private with a published escape hatch) so that callers who
understand the PDF content stream format can safely write pre-encoded text
without relying on the deprecated method.
Without one of these, removing the {{appendRawCommands}} calls would require
either reimplementing PDFBox's internal content stream writer or forking the
library — neither of which is acceptable.
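As a rough illustration of option 1, the sketch below shows what such a method might do, using the literal-string form. It is written against an in-memory stand-in class ({{SketchContentStream}}, hypothetical) rather than {{PDAbstractContentStream}}, and the method name {{showEncodedText}} is the one proposed above, not an existing PDFBox API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SketchContentStream {
    // In-memory stand-in for the real content stream output.
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    /** Writes the bytes as a PDF literal string operand followed by Tj. */
    public void showEncodedText(byte[] encodedBytes) {
        out.write('(');
        for (byte b : encodedBytes) {
            if (b == '(' || b == ')' || b == '\\') {
                out.write('\\');          // escape delimiters and backslash
                out.write(b);
            } else if (b == '\r') {
                out.write('\\');          // a raw CR would be read back as LF,
                out.write('r');           // so emit the \r escape instead
            } else {
                out.write(b);             // all other bytes pass through as-is
            }
        }
        byte[] tail = ") Tj\n".getBytes(StandardCharsets.US_ASCII);
        out.write(tail, 0, tail.length);
    }

    /** Returns everything written so far. */
    public byte[] toByteArray() {
        return out.toByteArray();
    }
}
```

No font lookup happens anywhere in this path; the caller is responsible for supplying bytes that are valid charcodes for the font currently selected with {{Tf}}.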
> Please make PDPageContentStream non-final
> -----------------------------------------
>
> Key: PDFBOX-2838
> URL: https://issues.apache.org/jira/browse/PDFBOX-2838
> Project: PDFBox
> Issue Type: Improvement
> Components: PDModel
> Affects Versions: 2.0.0
> Reporter: Philip Helger
> Assignee: John Hewson
> Priority: Major
> Fix For: 2.0.0
>
> Attachments: PDPageContentStreamWithCache.java
>
>
> Please make PDPageContentStream non-final as in certain cases it might be
> helpful to cache the last set data on a PDPageContentStream (such as the
> last used Font) to avoid bloating the created PDF. Therefore the methods must
> be overridable.