[jira] Updated: (PDFBOX-686) Invalid text rendering while printing a PDF
[ https://issues.apache.org/jira/browse/PDFBOX-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bertrand GILLIS updated PDFBOX-686:
-----------------------------------

    Attachment: sample2.xps

The sample PDF file printed to Microsoft XPS Document Writer (using Times New Roman).

> Invalid text rendering while printing a PDF
> -------------------------------------------
>
>                 Key: PDFBOX-686
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-686
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.0.0, 1.1.0
>         Environment: Windows XP SP3 32 bit
>                      Sun JDK 1.6.0_19
>            Reporter: Bertrand GILLIS
>             Fix For: 1.2.0
>
>         Attachments: sample.jpg, sample.pdf, sample.xps, sample2.jpg, sample2.pdf, sample2.xps
>
>
> The space between the last character and the previous character at the end of
> a line of text is expanded or shrunk by 2px depending on the printer selected.
> Steps to reproduce:
> - create a PDF with 1 page
> - add a phrase that wraps onto at least 2 lines
> - print the PDF page through org.apache.pdfbox.PrintPDF

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PDFBOX-686) Invalid text rendering while printing a PDF
[ https://issues.apache.org/jira/browse/PDFBOX-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bertrand GILLIS updated PDFBOX-686: --- Attachment: sample2.jpg A printscreen image with the text rendering issue (using Times New Roman).
[jira] Updated: (PDFBOX-686) Invalid text rendering while printing a PDF
[ https://issues.apache.org/jira/browse/PDFBOX-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bertrand GILLIS updated PDFBOX-686: --- Attachment: sample2.pdf A sample pdf file to reproduce the bug (using Times New Roman).
[jira] Commented: (PDFBOX-686) Invalid text rendering while printing a PDF
[ https://issues.apache.org/jira/browse/PDFBOX-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855283#action_12855283 ] Bertrand GILLIS commented on PDFBOX-686: Thanks, Maruan, for informing me about that related issue; I had missed PDFBOX-667. But this issue is not limited to Helvetica: I also have the same problem with Times New Roman.
[jira] Resolved: (PDFBOX-675) Upgrade .Net build to use IKVM version 0.42
[ https://issues.apache.org/jira/browse/PDFBOX-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Wilson resolved PDFBOX-675.
----------------------------------

    Resolution: Fixed

> Upgrade .Net build to use IKVM version 0.42
> -------------------------------------------
>
>                 Key: PDFBOX-675
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-675
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Daniel Wilson
>            Assignee: Daniel Wilson
>            Priority: Minor
>             Fix For: 1.2.0
>
>
> The current .Net build script (ant build.NET) is for IKVM 0.38, released 15
> months ago.
> Since that time, IKVM has grown to support a larger portion of the Java
> object model. I am currently investigating the possibility of improved font
> support, as our IKVM-compiled version crashes if PDType1CFont.prepareAWTFont
> is called.
> The downside of the upgrade will be loss of support for the .Net 1.1
> Framework. In my opinion, that is not a big deal as very few projects still
> rely on it.
> I welcome opinions before committing any changes.
[jira] Commented: (PDFBOX-675) Upgrade .Net build to use IKVM version 0.42
[ https://issues.apache.org/jira/browse/PDFBOX-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855119#action_12855119 ] Daniel Wilson commented on PDFBOX-675: -- Thanks! I bet I can get it right next time ...
[jira] Commented: (PDFBOX-686) Invalid text rendering while printing a PDF
[ https://issues.apache.org/jira/browse/PDFBOX-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855093#action_12855093 ] Maruan Sahyoun commented on PDFBOX-686: PDFStreamEngine.patch at PDFBOX-667 addresses that issue, as the last char within a line of text was treated as a separate loop termination condition. In addition, the misplacement occurs because of slightly different font metrics for Helvetica (used within the PDF) and ArialMT (used for drawing).
[jira] Commented: (PDFBOX-675) Upgrade .Net build to use IKVM version 0.42
[ https://issues.apache.org/jira/browse/PDFBOX-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855076#action_12855076 ]

Andreas Lehmkühler commented on PDFBOX-675:

It's quite easy:
- make your changes to the xml file(s)
- generate the documentation by running "mvn site install" in the root directory
- check in both the modified xml and html file(s)
- the update will be done automatically

I've made some minor changes to your xml file. There were some issues with the order and the nesting of some elements. I've committed both in revision 932055.
[jira] Commented: (PDFBOX-675) Upgrade .Net build to use IKVM version 0.42
[ https://issues.apache.org/jira/browse/PDFBOX-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855053#action_12855053 ] Daniel Wilson commented on PDFBOX-675: -- I deleted the patch since I figured out how to make the change to the source XML. That is committed in revision 932037.
[jira] Updated: (PDFBOX-675) Upgrade .Net build to use IKVM version 0.42
[ https://issues.apache.org/jira/browse/PDFBOX-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Wilson updated PDFBOX-675: - Attachment: (was: dot_net.patch)
[jira] Updated: (PDFBOX-675) Upgrade .Net build to use IKVM version 0.42
[ https://issues.apache.org/jira/browse/PDFBOX-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Wilson updated PDFBOX-675: - Fix Version/s: 1.2.0 Summary: Upgrade .Net build to use IKVM version 0.42 (was: Upgrade .Net build to use IKVM version 0.42 - Opinions wanted) Done in revision 932016.
[jira] Updated: (PDFBOX-675) Upgrade .Net build to use IKVM version 0.42
[ https://issues.apache.org/jira/browse/PDFBOX-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Wilson updated PDFBOX-675: - Attachment: dot_net.patch I have attached a patch for the .NET documentation. I'm kind of confused about how to update the documentation, which is why I've added it as a patch. Thanks to whoever knows how to apply this!
Re: [jira] Issue Comment Edited: (PDFBOX-686) Invalid text rendering while printing a PDF
I already made a patch for that for another reported bug, but it's not available in trunk. The issue is with PDFStreamEngine. I'll attach the patch to that issue later today.

Maruan

On 08.04.2010 at 13:56, "Bertrand GILLIS (JIRA)" wrote:

> [ https://issues.apache.org/jira/browse/PDFBOX-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854913#action_12854913 ]
>
> Bertrand GILLIS edited comment on PDFBOX-686 at 4/8/10 11:55 AM:
> A printscreen image with the text rendering issue.
> was (Author: bgillis): A printscreen image whith the text rendering issue.
[jira] Issue Comment Edited: (PDFBOX-686) Invalid text rendering while printing a PDF
[ https://issues.apache.org/jira/browse/PDFBOX-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854913#action_12854913 ] Bertrand GILLIS edited comment on PDFBOX-686 at 4/8/10 11:55 AM: - A printscreen image with the text rendering issue. was (Author: bgillis): A printscreen image whith the text rendering issue.
[jira] Updated: (PDFBOX-686) Invalid text rendering while printing a PDF
[ https://issues.apache.org/jira/browse/PDFBOX-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bertrand GILLIS updated PDFBOX-686: --- Attachment: sample.xps The sample pdf file printed to Microsoft XPS Document Writer
[jira] Updated: (PDFBOX-686) Invalid text rendering while printing a PDF
[ https://issues.apache.org/jira/browse/PDFBOX-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bertrand GILLIS updated PDFBOX-686: --- Attachment: sample.jpg A printscreen image with the text rendering issue.
[jira] Updated: (PDFBOX-686) Invalid text rendering while printing a PDF
[ https://issues.apache.org/jira/browse/PDFBOX-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bertrand GILLIS updated PDFBOX-686: --- Attachment: sample.pdf A sample pdf file to reproduce the bug.
[jira] Created: (PDFBOX-686) Invalid text rendering while printing a PDF
Invalid text rendering while printing a PDF
-------------------------------------------

                 Key: PDFBOX-686
                 URL: https://issues.apache.org/jira/browse/PDFBOX-686
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 1.1.0, 1.0.0
         Environment: Windows XP SP3 32 bit
                      Sun JDK 1.6.0_19
            Reporter: Bertrand GILLIS
             Fix For: 1.2.0
         Attachments: sample.pdf

The space between the last character and the previous character at the end of a line of text is expanded or shrunk by 2px depending on the printer selected.

Steps to reproduce:
- create a PDF with 1 page
- add a phrase that wraps onto at least 2 lines
- print the PDF page through org.apache.pdfbox.PrintPDF
[jira] Issue Comment Edited: (PDFBOX-684) Incorrect ordering of compound Arabic glyphs
[ https://issues.apache.org/jira/browse/PDFBOX-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854862#action_12854862 ]

Yigal Dayan edited comment on PDFBOX-684 at 4/8/10 8:15 AM:

Attaching sample pdf and two utf8 outputs (before and after fix)

was (Author: ydayan): Attaching sample pdf and two utf8 outputs (beore and after fix)

> Incorrect ordering of compound Arabic glyphs
> --------------------------------------------
>
>                 Key: PDFBOX-684
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-684
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.0.0, 1.1.0
>            Reporter: Yigal Dayan
>            Priority: Minor
>         Attachments: zzz.after_fix.txt, zzz.before_fix.txt, zzz.pdf
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> Some Arabic PDFs contain compound glyphs for stylistic reasons.
> Such glyphs encode two letters: FI, SI, LI, LJ, LM, etc.
> Before a line gets sent to the bidirectional algorithm, all characters have
> been sorted into a visual order, except for these pairs. This is because they
> are handled as one unit and maintain their original (logical) order. The bidi
> algorithm straightens out most characters, but reverses the glyph pairs.
> To fix this, the output of font.encode() should be examined and reversed on
> the spot if it contains pairs of Arabic characters. Possibly you need to add
> a stub method to PDFStreamEngine (in method processEncodedText) that
> PDFTextStripper can override (in sort mode only).
[jira] Created: (PDFBOX-685) inefficient implementation in org.apache.pdfbox.util.ICU4JImpl.normalizeDiac()
inefficient implementation in org.apache.pdfbox.util.ICU4JImpl.normalizeDiac()
------------------------------------------------------------------------------

                 Key: PDFBOX-685
                 URL: https://issues.apache.org/jira/browse/PDFBOX-685
             Project: PDFBox
          Issue Type: Bug
          Components: FontBox
    Affects Versions: 1.1.0, 1.0.0
            Reporter: Yigal Dayan
            Priority: Trivial

The method normalizeDiac in org.apache.pdfbox.util.ICU4JImpl constructs a long string from individual characters. It should use StringBuilder instead of String.
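To illustrate the point of PDFBOX-685, here is a minimal sketch of the two ways of building a string character by character. It is not the ICU4JImpl source: the class and method names (`StringBuilderSketch`, `joinWithConcat`, `joinWithBuilder`) are hypothetical and the per-character normalization step is left out; only the accumulation pattern is shown.

```java
// Hypothetical sketch; not the actual ICU4JImpl.normalizeDiac() code.
final class StringBuilderSketch {

    // Each += allocates a new String and copies everything accumulated so far,
    // so the total work grows quadratically with the input length.
    static String joinWithConcat(char[] chars) {
        String result = "";
        for (char c : chars) {
            result += c;
        }
        return result;
    }

    // StringBuilder appends into one growable buffer, so the total work
    // stays roughly linear in the input length.
    static String joinWithBuilder(char[] chars) {
        StringBuilder result = new StringBuilder(chars.length);
        for (char c : chars) {
            result.append(c);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        char[] input = "normalize".toCharArray();
        // Both variants produce the same text; only the cost differs.
        System.out.println(joinWithConcat(input).equals(joinWithBuilder(input)));
    }
}
```

The result is identical in both cases, so such a change would presumably be a drop-in replacement inside the loop.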
[jira] Updated: (PDFBOX-684) Incorrect ordering of compound Arabic glyphs
[ https://issues.apache.org/jira/browse/PDFBOX-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yigal Dayan updated PDFBOX-684: --- Attachment: zzz.after_fix.txt zzz.before_fix.txt zzz.pdf Attaching sample pdf and two utf8 outputs (before and after fix)
[jira] Created: (PDFBOX-684) Incorrect ordering of compound Arabic glyphs
Incorrect ordering of compound Arabic glyphs
--------------------------------------------

                 Key: PDFBOX-684
                 URL: https://issues.apache.org/jira/browse/PDFBOX-684
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 1.1.0, 1.0.0
            Reporter: Yigal Dayan
            Priority: Minor

Some Arabic PDFs contain compound glyphs for stylistic reasons. Such glyphs encode two letters: FI, SI, LI, LJ, LM, etc.

Before a line gets sent to the bidirectional algorithm, all characters have been sorted into a visual order, except for these pairs. This is because they are handled as one unit and maintain their original (logical) order. The bidi algorithm straightens out most characters, but reverses the glyph pairs.

To fix this, the output of font.encode() should be examined and reversed on the spot if it contains pairs of Arabic characters. Possibly you need to add a stub method to PDFStreamEngine (in method processEncodedText) that PDFTextStripper can override (in sort mode only).
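The proposed fix can be sketched roughly as follows. This is only an illustration under stated assumptions, not the PDFBox code or the reporter's patch: `ArabicPairSketch` and `fixCompoundGlyph` are hypothetical names, the input stands in for whatever font.encode() returns for one glyph, and "Arabic character" is assumed to mean the basic Arabic Unicode block (U+0600 to U+06FF).

```java
// Hypothetical sketch of the "reverse on the spot" idea from PDFBOX-684.
final class ArabicPairSketch {

    // Assumption: only the basic Arabic block counts as "Arabic" here.
    static boolean isArabic(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.ARABIC;
    }

    // If one glyph decoded to a pair of Arabic letters (logical order),
    // swap them so the pair is in visual order like the rest of the line;
    // any other decoded text is returned unchanged.
    static String fixCompoundGlyph(String decoded) {
        if (decoded.length() == 2
                && isArabic(decoded.charAt(0))
                && isArabic(decoded.charAt(1))) {
            return new StringBuilder(decoded).reverse().toString();
        }
        return decoded;
    }

    public static void main(String[] args) {
        // LAM + ALEF becomes ALEF + LAM; Latin text is left alone.
        System.out.println(fixCompoundGlyph("\u0644\u0627").equals("\u0627\u0644"));
        System.out.println(fixCompoundGlyph("fi"));
    }
}
```

In the layout suggested in the report, a hook of this kind would sit behind an overridable method so that it only runs when the text stripper sorts by position.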
[jira] Updated: (PDFBOX-186) NullPointerException in getAllKids with corrupted pdf
[ https://issues.apache.org/jira/browse/PDFBOX-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olivier Jaquemet updated PDFBOX-186:
------------------------------------

    Attachment: PDF-corrupted.pdf

I submitted the original bug report on SourceForge back then. You'll find the original corrupted PDF file attached to this issue, and here is the Java code to reproduce the bug:

{code}
// Imports assume the org.apache.pdfbox 1.x package layout.
import java.io.File;
import java.io.IOException;
import java.io.StringWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

public static void testPDFBOX186() throws IOException {
  File corruptedFile = new File("PDF-corrupted.pdf");
  PDDocument pdfDocument = PDDocument.load(corruptedFile);
  StringWriter writer = new StringWriter();
  PDFTextStripper stripper = new PDFTextStripper();
  stripper.writeText(pdfDocument, writer);
}
{code}

> NullPointerException in getAllKids with corrupted pdf
> -----------------------------------------------------
>
>                 Key: PDFBOX-186
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-186
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Priority: Minor
>         Attachments: PDF-corrupted.pdf, PwC-Tech-Forecast-Spring-2009.pdf
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1532246
> Originally submitted by ojaquemet on 2006-08-01 01:15.
> java.lang.NullPointerException
> at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
> at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
> at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226)
> at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
> at [...]
> Tested with PDFBox-0.7.2-log4j.jar and PDFBox-0.7.3-dev-20060731.jar
> Because the corrupted PDF is too big (7MB) to be attached here, you'll be
> able to find it there:
> http://olivier.jaquemet.free.fr/PDF-corrupted.pdf
> [comment on SourceForge]
> Originally sent by nobody.
> Logged In: NO
> I get this message too. How do you parse big PDFs?
[jira] Created: (PDFBOX-683) PDFStreamParser can't read "d0" and "d1" operators
PDFStreamParser can't read "d0" and "d1" operators
--------------------------------------------------

                 Key: PDFBOX-683
                 URL: https://issues.apache.org/jira/browse/PDFBOX-683
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 1.0.0
            Reporter: Eric Leleu
            Priority: Trivial
         Attachments: PDFStreamParser.patch

Hi,

I'm using PDFBox 1.0.0 and I encountered a problem with the parsing of the glyph-XObject of a Type3 font.

According to the PDF Reference, a glyph-XObject of a Type3 font must start with the d0 operator or the d1 operator.

During the glyph parsing, "d0" and "d1" are interpreted as the "d" operator due to the readOperator method in the PDFStreamParser.

Attached you can find a proposed patch to fix this problem.

Regards,
Eric
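The behaviour described in PDFBOX-683 can be sketched with a small standalone tokenizer. This is not the attached PDFStreamParser.patch and not the PDFBox readOperator method: `OperatorTokenSketch` and its `readOperator(String, int)` signature are hypothetical, and the set of stop characters is an assumption. The idea is that digits normally end an operator name, except for a `0` or `1` that directly follows a leading `d`.

```java
// Hypothetical sketch; the real parser reads from a stream, not a String.
final class OperatorTokenSketch {

    // Reads one operator name starting at index 'start'.
    static String readOperator(String src, int start) {
        StringBuilder buffer = new StringBuilder();
        int i = start;
        while (i < src.length()) {
            char c = src.charAt(i);
            // Assumed stop set: whitespace, a few delimiters, and digits.
            if (Character.isWhitespace(c) || c == '[' || c == '<'
                    || c == '(' || c == '/' || (c >= '0' && c <= '9')) {
                break;
            }
            buffer.append(c);
            i++;
            // Type 3 glyph operators d0 and d1 carry a digit in their name:
            // accept it only right after a leading 'd'.
            if (c == 'd' && buffer.length() == 1 && i < src.length()
                    && (src.charAt(i) == '0' || src.charAt(i) == '1')) {
                buffer.append(src.charAt(i));
                i++;
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(readOperator("1000 0 d0", 7)); // operator at index 7
        System.out.println(readOperator("[3] 0 d", 6));   // plain dash operator
    }
}
```

Without the special case, the loop would stop at the digit and report `d`, which matches the misinterpretation described in the report.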
[jira] Updated: (PDFBOX-683) PDFStreamParser can't read "d0" and "d1" operators
[ https://issues.apache.org/jira/browse/PDFBOX-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Leleu updated PDFBOX-683: -- Attachment: PDFStreamParser.patch