Brian, you state here that you've applied a patch by one Ken Glidden. I cannot find any post or submission from a person with that name on the PDFBox mailing lists. So I'm concerned about the legal trail here. Can you explain that, please? Thank you.
On 18.02.2009 22:36:01 Brian Carrier (JIRA) wrote:
>
>      [
> https://issues.apache.org/jira/browse/PDFBOX-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> ]
>
> Brian Carrier resolved PDFBOX-430.
> ----------------------------------
>
>     Resolution: Fixed
>
> Fixed with patch by Ken Glidden that merges a single diacritic text chunk
> into the previous text chunk if they overlap. Note that this will not solve
> problems where the diacritic comes much after the text chunk it overlays, but
> we have not observed PDF files like that.
>
> Sending        trunk/src/main/java/org/apache/pdfbox/util/PDFTextStripper.java
> Sending        trunk/src/main/java/org/apache/pdfbox/util/TextPosition.java
> Sending        trunk/test/input/Acrobat9.pdf-sorted.txt
> Sending        trunk/test/input/Acrobat9.pdf.txt
> Transmitting file data ....
> Committed revision 745665.
>
>
> > Incorrect diacritic placement in text extraction
> > ------------------------------------------------
> >
> >                 Key: PDFBOX-430
> >                 URL: https://issues.apache.org/jira/browse/PDFBOX-430
> >             Project: PDFBox
> >          Issue Type: Bug
> >            Reporter: Brian Carrier
> >
> > Some PDF files store diacritics (accents over characters) as separate text
> > elements. The PDF files essentially have a chunk of text and then backup
> > and place the diacritic over one of the characters in the chunk of text.
> > With text extraction, the current design does not allow the diacritic to be
> > placed over a character in the chunk and instead it is placed after the
> > chunk.
> > The debug-diac2.pdf file in PDFBOX-429 shows this problem.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.

Jeremias Maerki
