[
https://issues.apache.org/jira/browse/PDFBOX-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18066695#comment-18066695
]
Michael Klink commented on PDFBOX-6178:
---------------------------------------
I'm afraid this is just the next issue caused by PDFBox handling names as
strings instead of byte sequences. There have been multiple others, see e.g.
the list at the end of [this old Stack Overflow
answer|https://stackoverflow.com/a/48306517/1729265], but each time only the
symptoms were addressed, not the cause.
Essentially, names should only be converted to strings where the spec
explicitly requires such a conversion and states that it is possible.
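To illustrate the point, here is a minimal, self-contained sketch (plain Java, no PDFBox classes) of how a name can change when it takes a detour through a string. The class and method names are hypothetical, and the charset pair (decode as ISO-8859-1, encode as UTF-8) is an assumption chosen to reproduce the byte values reported in the issue, not a description of the actual PDFBox code paths.

```java
import java.nio.charset.StandardCharsets;

public class NameRoundTripSketch {

    // Printable ASCII characters that still need #xx escaping inside a PDF name.
    private static final String SPECIAL = "()<>[]{}/%#";

    /** Writes raw name bytes in PDF name syntax, escaping bytes as #xx where needed. */
    public static String escape(byte[] raw) {
        StringBuilder sb = new StringBuilder("/");
        for (byte b : raw) {
            int v = b & 0xFF;
            if (v >= 0x21 && v <= 0x7E && SPECIAL.indexOf(v) < 0) {
                sb.append((char) v);
            } else {
                sb.append(String.format("#%02x", v));
            }
        }
        return sb.toString();
    }

    /**
     * Hypothetical string-based round trip (an assumption for illustration):
     * decode the name bytes with one charset, then encode with another.
     */
    public static byte[] viaString(byte[] raw) {
        String name = new String(raw, StandardCharsets.ISO_8859_1);
        return name.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The name bytes as stored in the original file: 'ä' is the single byte 0xE4.
        byte[] original = "m\u00e4nnlich".getBytes(StandardCharsets.ISO_8859_1);

        // Byte-preserving path: the name is written back unchanged.
        System.out.println(escape(original));            // prints /m#e4nnlich

        // String-based path: the single byte 0xE4 becomes the two bytes 0xC3 0xA4.
        System.out.println(escape(viaString(original))); // prints /m#c3#a4nnlich
    }
}
```

If the name is kept as a byte sequence, the question of which charset to use never arises, and the name is written exactly as it was read.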
> PdfBox renames RadioButton with Umlaut
> --------------------------------------
>
> Key: PDFBOX-6178
> URL: https://issues.apache.org/jira/browse/PDFBOX-6178
> Project: PDFBox
> Issue Type: Bug
> Components: AcroForm
> Reporter: Maruan Sahyoun
> Priority: Major
>
> From the users mailing list:
> 1. Create a document that contains a radio button with an umlaut in its
> name. I can give you an example document.
> Let's say: A radio group "Geschlecht" with the buttons "männlich" and
> "weiblich".
> Do not use PdfBox for this step. I used Acrobat Pro 2020.
> The name/value of the "männlich" button is encoded as "/m#e4nnlich" in the
> PDF.
> 2. Update the value of the radio group with PdfBox to "männlich" and save it
> to a new document.
> {code}
> import java.io.File;
> import org.apache.pdfbox.Loader;
> import org.apache.pdfbox.pdmodel.PDDocument;
>
> public class UpdateRadioGroup {
>     private static final String INPUT_FILE = "form_empty.pdf";
>     private static final String OUTPUT_FILE = "form_selected.pdf";
>     private static final String FIELD_NAME = "Geschlecht";
>     private static final String FIELD_VALUE = "männlich";
>
>     public static void main(String[] args) throws Exception {
>         try (PDDocument document = Loader.loadPDF(new File(INPUT_FILE))) {
>             document.getDocumentCatalog()
>                     .getAcroForm(null)
>                     .getField(FIELD_NAME)
>                     .setValue(FIELD_VALUE);
>             document.save(new File(OUTPUT_FILE));
>         }
>     }
> }
> {code}
> 3. Inspect the name/value of the "männlich" button in the new document in a
> text editor. PdfBox encodes "männlich" as "/m#c3#a4nnlich" (see
> COSName.writePDF()).
> The Problem
> ===============
> PdfBox renames the radio button from "männlich" to "männlich", or
> from "/m#e4nnlich" to "/m#c3#a4nnlich" in PDF syntax.
> When you read the document again, PdfBox converts "#c3#a4" to "ä", but
> all other programs do not. I tested Acrobat Pro 2020, the current Acrobat
> Reader, and PDFXplorer from https://www.o2sol.com