[jira] Commented: (PDFBOX-457) PDF to Image doesn't show correctly the document
[ https://issues.apache.org/jira/browse/PDFBOX-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853781#action_12853781 ]

Andreas Lehmkühler commented on PDFBOX-457:
---

The given example, 580505.PR3.03.PDF, uses the CCITTFaxDecode filter as its compression algorithm (this is common for PDFs created by a fax machine). PDFBox has no built-in support for that filter, which is why getRGBImage returns null. To read such files, the JAI ImageIO library [1] has to be added to the classpath.

[1] https://jai-imageio.dev.java.net/

PDF to Image doesn't show correctly the document

Key: PDFBOX-457
URL: https://issues.apache.org/jira/browse/PDFBOX-457
Project: PDFBox
Issue Type: Bug
Affects Versions: 0.8.0-incubator
Reporter: Marcelo Tavares
Assignee: Daniel Wilson
Attachments: 580505.PR3.03.PDF, pdfbox-457-as_fax.pdf, pdfbox-457-Scan_from_a_Xerox_WorkCentre_Pro.PDF, pdfbox-457.PNG, testPDFToImage1.png

I tried to convert the following document to an image, but I got the attached result. It parsed just the text. I also tried different formats like JPG. I ran it using the PDFToImage class, passing the document path as a parameter.

I've read that sometimes a document is not created in accordance with the PDF standard. But is there a possibility to ignore that? In fact, it's very important to me, so could I use PDFBox despite those errors?

Thank you
Marcelo

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
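Whether the classpath fix described in the comment above has taken effect can be checked before any PDF is opened. The following is a minimal sketch, assuming that the fax data ends up being decoded through a TIFF reader registered with javax.imageio (which is what the JAI ImageIO jars provide). The class name ImageReaderCheck and its method are hypothetical helpers, not part of PDFBox.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical helper, not part of PDFBox.
final class ImageReaderCheck {

    /** Returns true if javax.imageio has at least one reader registered for the format name. */
    static boolean hasReaderFor(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // Assumption: CCITT fax images need a TIFF reader to be decoded.
        System.out.println("TIFF reader available: " + hasReaderFor("tiff"));
        // PNG is listed only for comparison; the JDK ships a PNG reader.
        System.out.println("PNG reader available: " + hasReaderFor("png"));
    }
}
```

If the TIFF line reports false, a null result from getRGBImage on fax-compressed images is to be expected under the assumption above.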
[jira] Commented: (PDFBOX-615) shfill operator needs implementation
[ https://issues.apache.org/jira/browse/PDFBOX-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853794#action_12853794 ]

Andreas Lehmkühler commented on PDFBOX-615:
---

I'm not an expert, but AFAIU the shfill operator, you are not that far away from the solution. The shfill operator can be used much like the fill operator, under the following conditions:

- use Graphics2D.setPaint instead of Graphics2D.setColor; all needed information should be in the shading dictionary
- take the current clipping area into account
- don't use the current path
- use the path information from the shading dictionary (AFAIU that depends on the function used??)
- if there isn't any path information in the dictionary, just use the clipping path
- the current color in the graphics state isn't used and must not be altered

HTH

shfill operator needs implementation

Key: PDFBOX-615
URL: https://issues.apache.org/jira/browse/PDFBOX-615
Project: PDFBox
Issue Type: New Feature
Components: PDModel
Reporter: Daniel Wilson
Assignee: Daniel Wilson

I have a PDF file (for which I do not yet have release permission) that uses the sh operator, equivalent to PostScript's shfill (per PDF spec 1.7 page 987). Adobe provides implementation guidance in a 78-page document at http://www.adobe.com/devnet/postscript/pdfs/TN5600.SmoothShading.pdf#17

I will be trying to add this functionality this week, but if anyone has hints, suggestions, etc., they are most certainly welcome!
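The conditions listed in the comment above can be sketched with plain Java2D. This is only an illustration under stated assumptions, not the PDFBox implementation: the class ShadingFillSketch and its methods are hypothetical, and a java.awt.GradientPaint stands in for whatever Paint would be built from a real shading dictionary. The sketch fills the current clip (falling back to a caller-supplied area when no clip is set), ignores any current path, and restores the previous paint so the current colour is left untouched.

```java
import java.awt.Color;
import java.awt.GradientPaint;
import java.awt.Graphics2D;
import java.awt.Paint;
import java.awt.Rectangle;
import java.awt.Shape;
import java.awt.geom.Ellipse2D;
import java.awt.image.BufferedImage;

// Hypothetical sketch, not PDFBox code.
final class ShadingFillSketch {

    /** Paints the shading over the current clip and restores the previous paint afterwards. */
    static void shadingFill(Graphics2D g, Paint shading, Shape fallbackArea) {
        Paint savedPaint = g.getPaint();          // the current colour must not be altered
        Shape clip = g.getClip();                 // the current path is deliberately not used
        Shape area = (clip != null) ? clip : fallbackArea;
        try {
            g.setPaint(shading);                  // setPaint instead of setColor
            g.fill(area);
        } finally {
            g.setPaint(savedPaint);
        }
    }

    /** Renders a small demo image: a white page with a gradient inside an elliptical clip. */
    static BufferedImage demo() {
        BufferedImage img = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, 100, 100);
        g.setColor(Color.BLACK);                  // stands in for the current colour
        g.setClip(new Ellipse2D.Double(20, 20, 60, 60));
        // Assumption: a simple horizontal gradient stands in for an axial shading.
        Paint axial = new GradientPaint(0, 0, Color.RED, 100, 0, Color.BLUE);
        shadingFill(g, axial, new Rectangle(0, 0, 100, 100));
        g.dispose();
        return img;
    }

    public static void main(String[] args) {
        BufferedImage img = demo();
        System.out.println("centre pixel: 0x" + Integer.toHexString(img.getRGB(50, 50)));
    }
}
```

A real implementation would also have to honour any bounding box or background entry in the shading dictionary, which this sketch leaves out.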
[jira] Updated: (PDFBOX-615) shfill operator needs implementation
[ https://issues.apache.org/jira/browse/PDFBOX-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-615:
--
Attachment: Centerplan.pdf

Another PDF example using a shading dictionary.

shfill operator needs implementation

Key: PDFBOX-615
URL: https://issues.apache.org/jira/browse/PDFBOX-615
Project: PDFBox
Issue Type: New Feature
Components: PDModel
Reporter: Daniel Wilson
Assignee: Daniel Wilson
Attachments: Centerplan.pdf
[jira] Updated: (PDFBOX-441) remove CosName nameMap cache
[ https://issues.apache.org/jira/browse/PDFBOX-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank Nestel updated PDFBOX-441:
Attachment: COSName.java

Remarks:

- The cleanResources approach is a hack in a large environment, since it is not clear who should call it, or when.
- We used a ConcurrentHashMap here at one point. This gave a major speed improvement at the time (with an older PDFBox, anyway). However, we realized we could not live with the leak.
- What would really be great is a beast like http://www.stacksmash.com/jsr166y/ This would allow a ConcurrentHashMap using weak references. One could simply put all the statics in; since they are strongly referenced, they will never get cleared.
- In the meantime, attached is the beast we are currently relying upon, which is weak references done right (the PDFBox 1.1 version is still leaky, since each COSName keeps a strong reference to its key) and with (semi-)fast read/write locking.
- Note that we removed the hashCode field member, which had become a deoptimization, since common Java implementations have a hashCode field in their String class anyway (this wasn't true in earlier times, so for old environments this field might still be an optimization).

remove CosName nameMap cache

Key: PDFBOX-441
URL: https://issues.apache.org/jira/browse/PDFBOX-441
Project: PDFBox
Issue Type: Improvement
Affects Versions: 0.7.3
Reporter: Sean Bridges
Priority: Minor
Fix For: 1.2.0
Attachments: COSName.java

The CosName class keeps a cache of all instances created in a static synchronized map. I am guessing this is for performance reasons, to avoid creating objects, but in our system it is causing performance problems. We are running 7 threads extracting text from PDFs, and we can see a large number of conflicts reading from nameMap.

The CosName map is also a potential memory leak, which forces users to clear it periodically, as noted in PDFBOX-351.

Can nameMap be removed altogether?

At the least, if PDSimpleFont replaced COSName.getPDFName( "FontDescriptor" ) with COSName.FONT_DESC, it would reduce contention.
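The combination asked for in the remarks above (lock-free reads, no leak) can be sketched with standard JDK classes. This is not the attached COSName.java and not PDFBox code: the class WeakNameCache and its nested Name type are hypothetical, and the sketch swaps the read/write locking mentioned above for a ConcurrentHashMap whose values are weak references cleaned up through a ReferenceQueue. The map holds each Name only weakly, so a Name keeping a strong reference to its own key does not pin the entry.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a canonicalizing name cache; not the PDFBox COSName class.
final class WeakNameCache {

    /** Stand-in for a name object; it may safely keep a strong reference to its key. */
    static final class Name {
        private final String value;
        private Name(String value) { this.value = value; }
        String getValue() { return value; }
    }

    /** Weak reference that remembers its map key so the stale entry can be removed later. */
    private static final class NameRef extends WeakReference<Name> {
        final String key;
        NameRef(String key, Name referent, ReferenceQueue<Name> queue) {
            super(referent, queue);
            this.key = key;
        }
    }

    private final ConcurrentHashMap<String, NameRef> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<Name> queue = new ReferenceQueue<>();

    /** Returns the canonical Name for the given string, creating it if necessary. */
    Name get(String value) {
        purge();
        while (true) {
            NameRef ref = map.get(value);
            Name existing = (ref == null) ? null : ref.get();
            if (existing != null) {
                return existing;                       // common case: no locking at all
            }
            Name fresh = new Name(value);
            NameRef freshRef = new NameRef(value, fresh, queue);
            boolean installed = (ref == null)
                    ? map.putIfAbsent(value, freshRef) == null
                    : map.replace(value, ref, freshRef); // swap out a cleared reference
            if (installed) {
                return fresh;
            }
            // Another thread won the race; loop and pick up its instance.
        }
    }

    /** Removes entries whose Name has been garbage collected. */
    private void purge() {
        NameRef stale;
        while ((stale = (NameRef) queue.poll()) != null) {
            map.remove(stale.key, stale);              // only removes if still the same ref
        }
    }

    int size() {
        purge();
        return map.size();
    }

    public static void main(String[] args) {
        WeakNameCache cache = new WeakNameCache();
        Name a = cache.get("FontDescriptor");
        Name b = cache.get("FontDescriptor");
        System.out.println("same instance: " + (a == b));
    }
}
```

Frequently used names can still be held in static constants (as with COSName.FONT_DESC in the report); because those are strongly referenced elsewhere, their entries are never cleared, and callers that use the constant skip the map lookup entirely.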
[jira] Updated: (PDFBOX-679) Corruption of Arabic output due to Japanese bug fix
[ https://issues.apache.org/jira/browse/PDFBOX-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yigal Dayan updated PDFBOX-679:
---
Attachment: zzz.pdf

Hi Takashi,

I'm attaching an Arabic PDF used as a test case.

Yigal

Corruption of Arabic output due to Japanese bug fix
---
Key: PDFBOX-679
URL: https://issues.apache.org/jira/browse/PDFBOX-679
Project: PDFBox
Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Andreas Lehmkühler
Attachments: zzz.pdf

The recent Japanese bug fix in org.apache.pdfbox.pdmodel.font.PDFont defines a set of encoding names that are given special CJK treatment. This set is too broad. For example, it stipulates that the 'Identity-H' encoding should be processed as JIS. We have Arabic PDFs where compound Arabic glyphs use the 'Identity-H' encoding. In PDFBox 1.0.0 they used to output Arabic, but now they output garbage, because the Arabic Unicode data is sent to the CJK converter.

I've copied that description from the users mailing list [1].

[1] http://markmail.org/thread/w5iof5hr3yqhthsp
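One way to narrow the set described in the report above is to list only encoding names that are specific to Japanese and to leave language-neutral names such as Identity-H out. The following is a rough sketch, not the PDFont code: the class CjkEncodingSketch and its method are hypothetical, and the set holds just a few of the predefined Japanese CMap names from the PDF specification as examples.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; not the logic used in org.apache.pdfbox.pdmodel.font.PDFont.
final class CjkEncodingSketch {

    // Example subset of Japanese-specific predefined CMap names; a real list would be longer.
    private static final Set<String> JAPANESE_CMAPS = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList(
                    "90ms-RKSJ-H", "90ms-RKSJ-V",
                    "EUC-H", "EUC-V",
                    "UniJIS-UCS2-H", "UniJIS-UCS2-V")));

    /**
     * Returns true only for encodings known to carry Japanese data.
     * Identity-H and Identity-V are language-neutral and therefore never match.
     */
    static boolean needsJapaneseConversion(String encodingName) {
        return encodingName != null && JAPANESE_CMAPS.contains(encodingName);
    }

    public static void main(String[] args) {
        System.out.println("90ms-RKSJ-H: " + needsJapaneseConversion("90ms-RKSJ-H"));
        System.out.println("Identity-H: " + needsJapaneseConversion("Identity-H"));
    }
}
```

With an explicit allow-list like this, text from an Arabic font that uses Identity-H would bypass the CJK converter.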
[jira] Created: (PDFBOX-680) Adding XMP data to a PDF causes different kinds of mutilations of the original pdf.
Adding XMP data to a PDF causes different kinds of mutilations of the original pdf.

Key: PDFBOX-680
URL: https://issues.apache.org/jira/browse/PDFBOX-680
Project: PDFBox
Issue Type: Bug
Components: Writing
Affects Versions: 0.7.3
Environment: Windows XP
Reporter: Rene Smit
Priority: Blocker
Fix For: 0.7.3

We are using PDFBox for a Material Workflow application for one of the major newspaper publishers in the Netherlands. One of the things we use PDFBox for is adding XMP data to the PDF file. Doing this causes different kinds of mutilation of the original PDF. The way in which this occurs varies. Sometimes a character is altered, sometimes an element or a complete ad is mutilated, sometimes the color of or in an ad is changed.

These files also tend to crash Adobe Acrobat (Professional 9, with PitStop Professional), though not all files do. The files may also produce a "Failed to open PDF file" error when trying to place them in InDesign (again, not all files).

We use the following source in our application:

    InputStream pdfStream = Core.getFileDocumentContent(pdfFileDocument.getMendixObject());
    PDDocument pdfDoc = PDDocument.load(pdfStream);
    PDDocumentInformation pdfInfo = pdfDoc.getDocumentInformation();
    IMendixObject materiaalMetaMendixObject = xmpDocument.getMendixObject();
    Set<String> memberKeys = materiaalMetaMendixObject.getMembers().keySet();
    for (String memberKey : memberKeys) {
        Object member = materiaalMetaMendixObject.getMember(memberKey).getValue();
        if (member != null) {
            String memberString = member.toString();
            if (memberKey.startsWith(XMP))
                pdfInfo.setCustomMetadataValue(memberKey, memberString);
        }
    }
    pdfDoc.setDocumentInformation(pdfInfo);
    pdfDoc.save(pdfOutputPath + File.separator + fileName);
    pdfDoc.close();
    pdfStream.close();

Please HELP
[jira] Updated: (PDFBOX-680) Adding XMP data to a PDF causes different kinds of mutilations of the original pdf.
[ https://issues.apache.org/jira/browse/PDFBOX-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-680:
--
Fix Version/s: (was: 0.7.3)

Adding XMP data to a PDF causes different kinds of mutilations of the original pdf.
---
Key: PDFBOX-680
URL: https://issues.apache.org/jira/browse/PDFBOX-680
Project: PDFBox
Issue Type: Bug
Components: Writing
Affects Versions: 0.7.3
Environment: Windows XP
Reporter: Rene Smit
Priority: Blocker
Attachments: Examples Pdf Mutilations.jpg
[jira] Commented: (PDFBOX-680) Adding XMP data to a PDF causes different kinds of mutilations of the original pdf.
[ https://issues.apache.org/jira/browse/PDFBOX-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854070#action_12854070 ]

Andreas Lehmkühler commented on PDFBOX-680:
---

Am I right that you are using the quite old version 0.7.3? Have you tried a more recent version, e.g. 1.1.0?

If the PDF is somehow scrambled, what kind of metadata did you add to those PDFs? Is it possible to get hold of at least one of these files?

Adding XMP data to a PDF causes different kinds of mutilations of the original pdf.
---
Key: PDFBOX-680
URL: https://issues.apache.org/jira/browse/PDFBOX-680
Project: PDFBox
Issue Type: Bug
Components: Writing
Affects Versions: 0.7.3
Environment: Windows XP
Reporter: Rene Smit
Priority: Blocker
Attachments: Examples Pdf Mutilations.jpg
Text rendering modes in PDFBox (PDFBOX-678)
Would someone like to comment on PDFBOX-678, or shall I simply move forward and start implementing it as proposed?

Maruan Sahyoun
Re: Text rendering modes in PDFBox (PDFBOX-678)
Hi,

Maruan Sahyoun wrote:
> would someone like to comment on PDFBOX-678 or shall I simply move forward and start implementing it as proposed?

Sorry for the late answer. Move forward as proposed and attach a patch to PDFBOX-678. If you are not sure you are on the right track, just post some code along the way.

Thanks in advance.

Andreas Lehmkühler
[jira] Commented: (PDFBOX-616) Invalid Images Returned
[ https://issues.apache.org/jira/browse/PDFBOX-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854300#action_12854300 ]

James A. Thomas commented on PDFBOX-616:

Tom:

This was most certainly the problem. I added those libraries, and my code works fine now. Thank you! Seems like this should be a comment somewhere in the PDFBox documentation?

Alan

Invalid Images Returned
---
Key: PDFBOX-616
URL: https://issues.apache.org/jira/browse/PDFBOX-616
Project: PDFBox
Issue Type: Bug
Components: PDModel
Affects Versions: 0.8.0-incubator
Environment: Multiple (Windows)
Reporter: James A. Thomas
Attachments: TIFFimageProblem.pdf

When getting images from a PDF document using PDXObjectImage (code fragment below), it returns an image with invalid characteristics. The PDXObjectImage is not null, but attributes like .getColorSpace() return null. The image has a height and width, but the getRGBImage() method returns null. This happens on EVERY image of the attached file. The code fragment and its output are shown below.

If I use the write2file() method of PDXObjectImage to write the image out to a file, then I get a valid image. (At least, it displays fine.)
Code Fragment:

    // Get a list of pages from the input PDF document
    List pages = InputDoc.getDocumentCatalog().getAllPages();

    // Process each page
    int i = 0;
    for (Object obj : pages) {
        String Barcode = null;
        i++;
        PDPage page = (PDPage)obj;

        // Get the image on the page and process it
        PDResources resources = page.getResources();
        Map images = resources.getImages();
        if( images != null ) {
            // Report the count only after the null check
            System.out.println("Found " + images.size() + " images on Page " + i);
            Iterator imageIter = images.keySet().iterator();
            while ( imageIter.hasNext() ) {
                String key = (String)imageIter.next();
                System.out.println("key = " + key);
                PDXObjectImage image = (PDXObjectImage)images.get( key );
                if (image != null) {
                    System.out.println("Image subtype = " + image.SUB_TYPE.toString());
                    System.out.println("Image suffix = " + image.getSuffix());
                    System.out.println("PDX image has height = " + image.getHeight() + " and width " + image.getWidth());

                    // Convert image to a Buffered Image, so we can
                    // look for a barcode and decode it
                    BufferedImage RGBimage = image.getRGBImage();
                    if (RGBimage == null)
                        System.out.println("RGBimage is null");
                }
            }
        }
    }

Output:

    Found 1 images on Page 1
    key = Obj3
    Image subtype = Image
    Image suffix = tiff
    PDX image has height = 2335 and width 1651
    RGBimage is null
    Found 1 images on Page 2
    key = Obj8
    Image subtype = Image
    Image suffix = tiff
    PDX image has height = 2335 and width 1651
    RGBimage is null
    Found 1 images on Page 3
    key = Obj13
    Image subtype = Image
    Image suffix = tiff
    PDX image has height = 2335 and width 1651
    RGBimage is null
    Found 1 images on Page 4
    key = Obj18
    Image subtype = Image
    Image suffix = tiff
    PDX image has height = 2335 and width 1651
    RGBimage is null
    Found 1 images on Page 5
    key = Obj23
    Image subtype = Image
    Image suffix = tiff
    PDX image has height = 2335 and width 1651
    RGBimage is null
    Found 1 images on Page 6
    key = Obj28
    Image subtype = Image
    Image suffix = tiff
    PDX image has height = 2335 and width 1651
    RGBimage is null
    Found 1 images on Page 7
    key = Obj33
    Image subtype = Image
    Image suffix = tiff
    PDX image has height = 2335 and width 1651
    RGBimage is null
    Found 1 images on Page 8
    key = Obj38
    Image subtype = Image
    Image suffix = tiff
    PDX image has height = 2335 and width 1651
    RGBimage is null
    Found 1 images on Page 9
    key = Obj43
    Image subtype = Image
    Image suffix = tiff
    PDX image has height = 2335 and width 1651
    RGBimage is null
    Found 1 images on Page 10
    key = Obj48
    Image subtype = Image
    Image suffix = tiff
    PDX image has height = 2335 and width 1651
    RGBimage is null