[ 
https://issues.apache.org/jira/browse/PDFBOX-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18067011#comment-18067011
 ] 

Michael Klink commented on PDFBOX-6178:
---------------------------------------

For what it's worth, I took a quick glance and it looked good.

I'm a bit surprised in one regard, though: You strictly reject 0x00 bytes even 
when parsing existing PDFs (where there might be #00 sequences). While this is 
completely correct according to the specification, I'm more used to PDFBox not 
going with the spec in such situations but instead being more accepting. Have 
you checked what the known big players (Acrobat etc.) do when confronted with a 
name like {{/NameWithANull#00}}? If Acrobat also fails parsing such a file, 
it's fine. Otherwise, though, it would be more PDFBox-ish if names with {{#00}} 
were at least accepted when parsing PDFs.
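
To make the distinction concrete, here is a minimal sketch of strict versus accepting handling of {{#00}} while decoding a name. It is not the PDFBox parser code: the class {{NameUnescapeSketch}} and its {{lenient}} flag are hypothetical names chosen for illustration, and treating a malformed {{#}} escape as a literal character is just an assumption of this sketch.

```java
import java.io.ByteArrayOutputStream;

final class NameUnescapeSketch {

    /** Returns the value of a hex digit, or -1 if c is not a hex digit. */
    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /**
     * Decodes #xx escapes in a name token (given without the leading slash).
     * Strict mode rejects a decoded 0x00 byte; lenient mode keeps it.
     */
    static byte[] unescape(String token, boolean lenient) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c == '#' && i + 2 < token.length()
                    && hex(token.charAt(i + 1)) >= 0
                    && hex(token.charAt(i + 2)) >= 0) {
                int value = hex(token.charAt(i + 1)) * 16 + hex(token.charAt(i + 2));
                if (value == 0 && !lenient) {
                    throw new IllegalArgumentException("0x00 byte in name: " + token);
                }
                out.write(value);
                i += 2; // skip the two hex digits just consumed
            } else {
                // Assumption of this sketch: anything else is taken as a single byte.
                out.write((byte) c);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Accepting mode, as one might want when reading existing files.
        System.out.println(unescape("NameWithANull#00", true).length);
        // Strict mode, as one might want when creating new names.
        try {
            unescape("NameWithANull#00", false);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point is only that strictness could depend on the direction: strict when writing new names, accepting when reading names that already exist in a file.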

> PdfBox renames RadioButton with Umlaut
> --------------------------------------
>
>                 Key: PDFBOX-6178
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6178
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm
>            Reporter: Maruan Sahyoun
>            Assignee: Maruan Sahyoun
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: form_empty.pdf, form_selected_acrobat_pro.pdf, 
> form_selected_pdfbox.pdf
>
>
> From the users mailing list:
> 1. Create a document that contains a radio button with an umlaut in its name. 
> I can give you an example document.
> Let's say: A radio group "Geschlecht" with the buttons "männlich" and 
> "weiblich".
> Do not use PdfBox for this step. I used Acrobat Pro 2020.
> The name/value of the "männlich" button is encoded as "/m#e4nnlich" in the 
> PDF.
> 2. Update the value of the radio group with PdfBox to "männlich" and save it 
> to a new document.
> {code}
> import java.io.File;
> import org.apache.pdfbox.Loader;
> import org.apache.pdfbox.pdmodel.PDDocument;
>
> public class UpdateRadioGroup {
>     private static final String INPUT_FILE = "form_empty.pdf";
>     private static final String OUTPUT_FILE = "form_selected.pdf";
>     private static final String FIELD_NAME = "Geschlecht";
>     private static final String FIELD_VALUE = "männlich";
>
>     public static void main(String[] args) throws Exception {
>         try (PDDocument document = Loader.loadPDF(new File(INPUT_FILE))) {
>             document.getDocumentCatalog()
>                     .getAcroForm(null)
>                     .getField(FIELD_NAME)
>                     .setValue(FIELD_VALUE);
>             document.save(new File(OUTPUT_FILE));
>         }
>     }
> }
> {code}
> 3. Check the name/value of the "männlich" button in the new document in a 
> text editor. PdfBox encodes "männlich" as "/m#c3#a4nnlich" (see 
> COSName.writePDF()).
> The Problem
>  ===============
>  PdfBox renames the radio button from "männlich" to "mÃ¤nnlich", or
>  from "/m#e4nnlich" to "/m#c3#a4nnlich" in PDF syntax.
>  When you read the document again, PdfBox converts "#c3#a4" back to "ä", but
>  all other programs do not. I tested Acrobat Pro 2020, the current Acrobat
>  Reader, and PDFXplorer from https://www.o2sol.com
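
For reference, the two encodings from the report can be reproduced with plain Java. This is only a sketch and not {{COSName.writePDF()}} itself: the class {{NameEscapeSketch}}, its method names, and the exact set of characters it escapes are assumptions made for illustration, and ISO-8859-1 is used as a stand-in for whatever single-byte encoding the other viewers apply.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class NameEscapeSketch {

    /** Writes a name with every non-regular byte as a #xx escape. */
    static String escape(String name, Charset charset) {
        StringBuilder sb = new StringBuilder("/");
        for (byte b : name.getBytes(charset)) {
            int v = b & 0xFF;
            // Printable ASCII that is neither '#' nor a delimiter is written as is.
            if (v > 0x20 && v < 0x7F && "#/()<>[]{}%".indexOf(v) < 0) {
                sb.append((char) v);
            } else {
                sb.append('#').append(String.format("%02x", v));
            }
        }
        return sb.toString();
    }

    /** Shows what UTF-8 bytes look like to a reader that assumes one byte per character. */
    static String readUtf8BytesAsLatin1(String name) {
        return new String(name.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String value = "m\u00e4nnlich"; // "männlich", escaped to be independent of the source encoding
        System.out.println(escape(value, StandardCharsets.ISO_8859_1)); // /m#e4nnlich
        System.out.println(escape(value, StandardCharsets.UTF_8));      // /m#c3#a4nnlich
        System.out.println(readUtf8BytesAsLatin1(value));
    }
}
```

The first line matches what Acrobat Pro wrote, the second what the saved file contains, and the third shows why a reader that takes each byte as one character ends up with a different button name.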



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
