[jira] Commented: (PDFBOX-457) PDF to Image doesn't show correctly the document

2010-04-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853781#action_12853781
 ] 

Andreas Lehmkühler commented on PDFBOX-457:
---

The example 580505.PR3.03.PDF uses the CCITTFaxDecode filter as its 
compression algorithm (common for PDFs created by a fax). PDFBox doesn't have 
built-in support for that filter, which is why getRGBImage returns null. To 
read such files, the JAI Image I/O library (jai-imageio) [1] has to be added to 
the classpath.


[1] https://jai-imageio.dev.java.net/
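Whether a missing Image I/O plug-in is the cause can be checked at runtime. The sketch below assumes that PDFBox hands CCITT-compressed images to an Image I/O TIFF reader, which is what jai-imageio supplies (JDK 9 and later also bundle one). ImageIoCheck and hasReaderFor are hypothetical names. The helper only asks javax.imageio whether any reader is registered for a format name.

```java
import javax.imageio.ImageIO;

final class ImageIoCheck {
    /** Returns true if at least one Image I/O reader is registered for the given format name. */
    static boolean hasReaderFor(String formatName) {
        // Scan the application class path for Image I/O plug-ins (such as jai-imageio)
        // that may not have been registered yet.
        ImageIO.scanForPlugins();
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    public static void main(String[] args) {
        // "tiff" is assumed to be the format name the TIFF reader registers under.
        System.out.println("TIFF reader available: " + hasReaderFor("tiff"));
    }
}
```

If this prints false, adding the jai-imageio jar to the classpath is the first thing to try.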

 PDF to Image doesn't show correctly the document
 

 Key: PDFBOX-457
 URL: https://issues.apache.org/jira/browse/PDFBOX-457
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 0.8.0-incubator
Reporter: Marcelo Tavares
Assignee: Daniel Wilson
 Attachments: 580505.PR3.03.PDF, pdfbox-457-as_fax.pdf, 
 pdfbox-457-Scan_from_a_Xerox_WorkCentre_Pro.PDF, pdfbox-457.PNG, 
 testPDFToImage1.png


 I tried to convert the following document to an image, but I got the attached 
 result. 
 Only the text was rendered. I also tried other formats such as JPG. I ran it 
 using the PDFToImage class, passing the document path as a parameter. 
 I've read that sometimes a document is not created in accordance with the PDF 
 standard. Is there a way to ignore such errors? It's very important to me: 
 could I use PDFBox despite those errors? 
 Thank you
 Marcelo

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PDFBOX-615) shfill operator needs implementation

2010-04-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853794#action_12853794
 ] 

Andreas Lehmkühler commented on PDFBOX-615:
---

I'm not an expert, but as far as I understand the shfill operator, you are not 
far from the solution. The shfill operator can be handled much like the fill 
operator, with the following differences:

- use Graphics2D.setPaint instead of Graphics2D.setColor; all the needed 
information should be in the shading dictionary
- take the current clipping area into account
- don't use the current path
- use the path information from the shading dictionary (as far as I 
understand, that depends on the function used?)
- if there is no path information in the dictionary, just use the clipping 
path
- the current color in the graphics state isn't used and must not be altered
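The rules above can be sketched in Java2D for the simplest case, an axial (type 2) shading with two end colors. Everything here is an assumption for illustration: ShFillSketch and shFill are hypothetical names, the optional bbox parameter stands in for the BBox entry of the shading dictionary, and java.awt.GradientPaint clamps to the end colors outside the axis, which corresponds to Extend [true true]. The sketch ignores the current path, fills only the clip (intersected with the bbox when present) and restores the previous paint, so the current color is left untouched.

```java
import java.awt.Color;
import java.awt.GradientPaint;
import java.awt.Graphics2D;
import java.awt.Paint;
import java.awt.Shape;
import java.awt.geom.Area;
import java.awt.geom.Point2D;

final class ShFillSketch {
    /**
     * Paints an axial shading from start to end.
     * bbox: optional area from the shading dictionary (may be null).
     * fallback: area to fill when the Graphics2D has no clip set (must not be null then).
     */
    static void shFill(Graphics2D g, Point2D start, Color startColor,
                       Point2D end, Color endColor, Shape bbox, Shape fallback) {
        Paint savedPaint = g.getPaint();   // sh must not alter the current color
        Shape clip = g.getClip();
        // The current path is deliberately ignored; start from the clipping area.
        Area area = new Area(clip != null ? clip : fallback);
        if (bbox != null) {
            area.intersect(new Area(bbox)); // restrict to the shading's own area
        }
        g.setPaint(new GradientPaint(start, startColor, end, endColor));
        g.fill(area);
        g.setPaint(savedPaint);            // restore the previous paint/color
    }
}
```

Other shading types (radial, function-based, mesh) would need their own Paint implementations. The clipping and color-preserving logic stays the same.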

HTH

 shfill operator needs implementation
 

 Key: PDFBOX-615
 URL: https://issues.apache.org/jira/browse/PDFBOX-615
 Project: PDFBox
  Issue Type: New Feature
  Components: PDModel
Reporter: Daniel Wilson
Assignee: Daniel Wilson

 I have a PDF file (for which I do not yet have release permission) that uses 
 the sh operator, equivalent to PostScript's shfill (per PDF spec 1.7 page 
 987).
 Adobe provides implementation guidance in a 78-page document at 
 http://www.adobe.com/devnet/postscript/pdfs/TN5600.SmoothShading.pdf#17
 I will be trying to add this functionality this week, but if anyone has 
 hints, suggestions, etc. they are most certainly welcome!




[jira] Updated: (PDFBOX-615) shfill operator needs implementation

2010-04-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PDFBOX-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-615:
--

Attachment: Centerplan.pdf

Another PDF example using a shading dictionary

 shfill operator needs implementation
 

 Key: PDFBOX-615
 URL: https://issues.apache.org/jira/browse/PDFBOX-615
 Project: PDFBox
  Issue Type: New Feature
  Components: PDModel
Reporter: Daniel Wilson
Assignee: Daniel Wilson
 Attachments: Centerplan.pdf






[jira] Updated: (PDFBOX-441) remove CosName nameMap cache

2010-04-06 Thread Frank Nestel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank Nestel updated PDFBOX-441:


Attachment: COSName.java

Remarks:
- The cleanResources method is a hack in a larger application, since it is not 
clear who should call it, and when. 
- We used a ConcurrentHashMap here at one point, which gave a major speed 
improvement at the time (with an older PDFBox, anyway). However, we realized we 
could not tolerate the memory leak.
- What would be really great is something like 
http://www.stacksmash.com/jsr166y/, which provides a ConcurrentHashMap using 
weak references. One could simply put all the static names in; since they are 
strongly referenced elsewhere, they would never get cleared.
- In the meantime, attached is the implementation we currently rely on, which 
does weak references right (the PDFBox 1.1 version is still leaky, since each 
COSName keeps a strong reference to its key) and uses (semi-)fast read/write 
locking.
- Note that we removed the hashCode field member: keeping it is a 
deoptimization, since common Java implementations already cache the hash code in 
a field of their String class (this wasn't true in earlier times, so in old 
environments the field might still be an optimization).
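One way to get "weak references done right" with read/write locking is to hold the cached values weakly, so that a name object's strong reference to its own key string no longer pins the entry. The class below is a hypothetical stand-in, not the attached COSName.java and not the PDFBox API. InternedName, of and the nested Ref class are illustrative names. Stale map entries are purged through a ReferenceQueue while the write lock is held.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for COSName: canonical instances held only weakly by the cache. */
final class InternedName {
    private final String value;

    private InternedName(String value) { this.value = value; }

    String getName() { return value; }

    /** Weak reference that remembers its key so the stale map entry can be removed. */
    private static final class Ref extends WeakReference<InternedName> {
        final String key;
        Ref(InternedName referent, ReferenceQueue<InternedName> queue) {
            super(referent, queue);
            this.key = referent.value;
        }
    }

    private static final Map<String, Ref> CACHE = new HashMap<>();
    private static final ReferenceQueue<InternedName> QUEUE = new ReferenceQueue<>();
    private static final ReadWriteLock LOCK = new ReentrantReadWriteLock();

    static InternedName of(String value) {
        LOCK.readLock().lock();          // fast path: many readers in parallel
        try {
            Ref ref = CACHE.get(value);
            InternedName name = ref == null ? null : ref.get();
            if (name != null) {
                return name;
            }
        } finally {
            LOCK.readLock().unlock();
        }
        LOCK.writeLock().lock();         // slow path: create and register
        try {
            purgeStaleEntries();
            Ref ref = CACHE.get(value);  // re-check: another thread may have won the race
            InternedName name = ref == null ? null : ref.get();
            if (name == null) {
                name = new InternedName(value);
                CACHE.put(value, new Ref(name, QUEUE));
            }
            return name;
        } finally {
            LOCK.writeLock().unlock();
        }
    }

    /** Drops entries whose name was garbage-collected; caller holds the write lock. */
    private static void purgeStaleEntries() {
        Ref stale;
        while ((stale = (Ref) QUEUE.poll()) != null) {
            CACHE.remove(stale.key, stale); // only if still mapped to this dead reference
        }
    }
}
```

Static constants such as a FONT_DESC field stay strongly reachable and are therefore never evicted, which matches the remark about putting all the statics in.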

 remove CosName nameMap cache
 

 Key: PDFBOX-441
 URL: https://issues.apache.org/jira/browse/PDFBOX-441
 Project: PDFBox
  Issue Type: Improvement
Affects Versions: 0.7.3
Reporter: Sean Bridges
Priority: Minor
 Fix For: 1.2.0

 Attachments: COSName.java


 The COSName class keeps a cache of all instances created in a static 
 synchronized map.  I am guessing this is for performance reasons to avoid 
 creating objects, but in our system it is causing performance problems.  We 
 are running 7 threads extracting text from PDFs, and we can see a large 
 number of conflicts reading from nameMap.
 The COSName map is also a potential memory leak, which forces users to 
 periodically clear it, as noted in PDFBOX-351.
 Can nameMap be removed altogether?
 At the least, if PDSimpleFont replaced
   COSName.getPDFName( "FontDescriptor" ) 
 with 
   COSName.FONT_DESC
 it would reduce contention.




[jira] Updated: (PDFBOX-679) Corruption of Arabic output due to Japanese bug fix

2010-04-06 Thread Yigal Dayan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yigal Dayan updated PDFBOX-679:
---

Attachment: zzz.pdf

Hi Takashi, 

I'm attaching an Arabic PDF used as a testcase.

Yigal


 Corruption of Arabic output due to Japanese bug fix
 ---

 Key: PDFBOX-679
 URL: https://issues.apache.org/jira/browse/PDFBOX-679
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Andreas Lehmkühler
 Attachments: zzz.pdf


 The recent Japanese bug fix in org.apache.pdfbox.pdmodel.font.PDFont
 defines a set of encoding names that are given special CJK treatment. This 
 set is too broad. For example, it stipulates that the 'Identity-H' encoding 
 should be processed as JIS.
 We have Arabic PDFs where compound Arabic glyphs use the 'Identity-H' 
 encoding. In PDFBox 1.0.0 they used to output Arabic, but now they output 
 garbage, because the Arabic Unicode data is sent to the CJK converter.
 I've copied that description from the users mailing list [1].
 [1] http://markmail.org/thread/w5iof5hr3yqhthsp
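A narrower check would route only encodings that actually imply JIS-family byte sequences to the Japanese converter. The sketch below is hypothetical (CMapClassifier and isJisEncodedCMap are not PDFBox names) and uses a subset of the predefined Japanese CMap names from the PDF specification: the RKSJ ones are Shift-JIS based, the EUC ones are EUC-JP. Identity-H and Identity-V map two-byte codes directly to CIDs and say nothing about the script, so they are deliberately excluded.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class CMapClassifier {
    // Subset of predefined Japanese CMaps whose input bytes are Shift-JIS (RKSJ) or EUC-JP (EUC).
    private static final Set<String> JIS_ENCODED_CMAPS = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList(
                    "83pv-RKSJ-H", "90ms-RKSJ-H", "90ms-RKSJ-V", "90msp-RKSJ-H", "90msp-RKSJ-V",
                    "90pv-RKSJ-H", "Add-RKSJ-H", "Add-RKSJ-V", "Ext-RKSJ-H", "Ext-RKSJ-V",
                    "EUC-H", "EUC-V")));

    /** True only for CMaps that really imply JIS-family byte sequences; Identity-H/V return false. */
    static boolean isJisEncodedCMap(String encodingName) {
        return JIS_ENCODED_CMAPS.contains(encodingName);
    }
}
```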




[jira] Created: (PDFBOX-680) Adding XMP data to a PDF causes different kinds of mutilations of the original pdf.

2010-04-06 Thread Rene Smit (JIRA)
Adding XMP data to a PDF causes different kinds of mutilations of the original 
pdf.


 Key: PDFBOX-680
 URL: https://issues.apache.org/jira/browse/PDFBOX-680
 Project: PDFBox
  Issue Type: Bug
  Components: Writing
Affects Versions: 0.7.3
 Environment: Windows XP
Reporter: Rene Smit
Priority: Blocker
 Fix For: 0.7.3


We are using PDFBox for a Material Workflow application for one of the major 
newspaper publishers in the Netherlands.
One of the things we use PDFBox for is adding XMP data to the PDF file.
Doing this causes different kinds of mutilation of the original PDF.

The way in which this occurs varies. Sometimes a character is altered, 
sometimes an element or a complete ad is mutilated, sometimes the color of/in an 
ad is changed.
These files also tend to crash Adobe Acrobat (Professional 9, with Pitstop 
Professional) (not all files).
Some files also produce a "Failed to open PDF file" error when we try to place 
them in InDesign (not all files). 


We use the following source in our application:

InputStream pdfStream =
        Core.getFileDocumentContent(pdfFileDocument.getMendixObject());
PDDocument pdfDoc = PDDocument.load(pdfStream);
PDDocumentInformation pdfInfo = pdfDoc.getDocumentInformation();

IMendixObject materiaalMetaMendixObject = xmpDocument.getMendixObject();
Set<String> memberKeys = materiaalMetaMendixObject.getMembers().keySet();
for (String memberKey : memberKeys) {
    Object member = materiaalMetaMendixObject.getMember(memberKey).getValue();
    if (member != null) {
        String memberString = member.toString();
        // Only members whose key starts with the XMP prefix are copied
        if (memberKey.startsWith(XMP)) {
            pdfInfo.setCustomMetadataValue(memberKey, memberString);
        }
    }
}
pdfDoc.setDocumentInformation(pdfInfo);

pdfDoc.save(pdfOutputPath + File.separator + fileName);
pdfDoc.close();
pdfStream.close();

Please HELP
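To narrow such a problem down, it can help to separate the selection of metadata entries from the PDFBox calls, so that exactly what gets written can be inspected or tested on its own. The sketch below is a hypothetical refactoring (XmpFields and selectByPrefix are illustrative names). The prefix is a parameter because it is not clear from the report whether XMP in the loop above is a constant or a string. Each resulting entry would then be passed to pdfInfo.setCustomMetadataValue(key, value) as in the original loop. Note that this call stores entries in the document information dictionary, not in an XMP metadata stream.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class XmpFields {
    /**
     * Keeps only entries whose key starts with the given prefix and whose value is non-null,
     * converting each value to a String. The input map's iteration order is preserved.
     */
    static Map<String, String> selectByPrefix(Map<String, ?> members, String prefix) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, ?> e : members.entrySet()) {
            Object value = e.getValue();
            if (value != null && e.getKey().startsWith(prefix)) {
                result.put(e.getKey(), value.toString());
            }
        }
        return result;
    }
}
```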




[jira] Updated: (PDFBOX-680) Adding XMP data to a PDF causes different kinds of mutilations of the original pdf.

2010-04-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PDFBOX-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-680:
--

Fix Version/s: (was: 0.7.3)

 Adding XMP data to a PDF causes different kinds of mutilations of the 
 original pdf.
 ---

 Key: PDFBOX-680
 URL: https://issues.apache.org/jira/browse/PDFBOX-680
 Project: PDFBox
  Issue Type: Bug
  Components: Writing
Affects Versions: 0.7.3
 Environment: Windows XP
Reporter: Rene Smit
Priority: Blocker
 Attachments: Examples Pdf Mutilations.jpg






[jira] Commented: (PDFBOX-680) Adding XMP data to a PDF causes different kinds of mutilations of the original pdf.

2010-04-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854070#action_12854070
 ] 

Andreas Lehmkühler commented on PDFBOX-680:
---

Am I right that you are using the rather old version 0.7.3? 
Have you tried a more recent version, e.g. 1.1.0?
If the PDF is scrambled, what kind of metadata did you add to those 
PDFs?
Is it possible to get hold of at least one of these files?

 Adding XMP data to a PDF causes different kinds of mutilations of the 
 original pdf.
 ---

 Key: PDFBOX-680
 URL: https://issues.apache.org/jira/browse/PDFBOX-680
 Project: PDFBox
  Issue Type: Bug
  Components: Writing
Affects Versions: 0.7.3
 Environment: Windows XP
Reporter: Rene Smit
Priority: Blocker
 Attachments: Examples Pdf Mutilations.jpg






Text rendering modes in PDFBox (PDFBOX-678)

2010-04-06 Thread Maruan Sahyoun
Would someone like to comment on PDFBOX-678, or shall I simply move forward and 
start implementing it as proposed?

Maruan Sahyoun



Re: Text rendering modes in PDFBox (PDFBOX-678)

2010-04-06 Thread Andreas Lehmkuehler

Hi,


Maruan Sahyoun wrote:

would someone like to comment on PDFBOX-678 or shall I simply move forward and 
start implementing it as proposed?

Sorry for answering so late. Go ahead as proposed and attach a patch to
PDFBOX-678. If you are not sure you are on the right track, just post some code
in the meantime.

Thanks in advance.
Andreas Lehmkühler


[jira] Commented: (PDFBOX-616) Invalid Images Returned

2010-04-06 Thread James A. Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854300#action_12854300
 ] 

James A. Thomas commented on PDFBOX-616:


Tom: This was most certainly the problem. I added those libraries, and my code 
works fine now. Thank you!  

Shouldn't this be mentioned somewhere in the PDFBox documentation?

Alan


 Invalid Images Returned
 ---

 Key: PDFBOX-616
 URL: https://issues.apache.org/jira/browse/PDFBOX-616
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 0.8.0-incubator
 Environment: Multiple (Windows)
Reporter: James A. Thomas
 Attachments: TIFFimageProblem.pdf


 When getting images from a PDF document using PDXObjectImage (code fragment 
 below), it returns an image with invalid characteristics. The PDXObjectImage 
 is not null, but attributes like getColorSpace() return null. The image has 
 a height and width, but the getRGBImage() method returns null.
 This happens on EVERY image of the attached file. The code fragment and output 
 are shown below.  
 If I use the write2file() method of PDXObjectImage to write the image to 
 a file, then I get a valid image. (At least, it displays fine.)
 Code Fragment:
 // Get a list of pages from the input PDF document
 List pages = InputDoc.getDocumentCatalog().getAllPages();
 // Process each page
 int i = 0;
 for (Object obj : pages)
 {
     String Barcode = null;
     i++;
     PDPage page = (PDPage)obj;
     // Get the images on the page and process them
     PDResources resources = page.getResources();
     Map images = resources.getImages();
     // Check for null before calling images.size()
     if( images != null )
     {
         System.out.println("Found " + images.size() + " images on Page " + i);
         Iterator imageIter = images.keySet().iterator();
         while ( imageIter.hasNext() )
         {
             String key = (String)imageIter.next();
             System.out.println("key = " + key);
             PDXObjectImage image = (PDXObjectImage)images.get( key );
             if (image != null)
             {
                 System.out.println("Image subtype = " + PDXObjectImage.SUB_TYPE);
                 System.out.println("Image suffix = " + image.getSuffix());
                 System.out.println("PDX image has height = " + image.getHeight()
                         + " and width " + image.getWidth());
                 // Convert image to a BufferedImage, so we can
                 // look for a barcode and decode it
                 BufferedImage RGBimage = image.getRGBImage();
                 if (RGBimage == null)
                     System.out.println("RGBimage is null");
             }
         }
     }
 }
 Output:
 Found 1 images on Page 1
 key = Obj3
 Image subtype = Image
 Image suffix = tiff
 PDX image has height = 2335 and width 1651
 RGBimage is null
 Found 1 images on Page 2
 key = Obj8
 Image subtype = Image
 Image suffix = tiff
 PDX image has height = 2335 and width 1651
 RGBimage is null
 Found 1 images on Page 3
 key = Obj13
 Image subtype = Image
 Image suffix = tiff
 PDX image has height = 2335 and width 1651
 RGBimage is null
 Found 1 images on Page 4
 key = Obj18
 Image subtype = Image
 Image suffix = tiff
 PDX image has height = 2335 and width 1651
 RGBimage is null
 Found 1 images on Page 5
 key = Obj23
 Image subtype = Image
 Image suffix = tiff
 PDX image has height = 2335 and width 1651
 RGBimage is null
 Found 1 images on Page 6
 key = Obj28
 Image subtype = Image
 Image suffix = tiff
 PDX image has height = 2335 and width 1651
 RGBimage is null
 Found 1 images on Page 7
 key = Obj33
 Image subtype = Image
 Image suffix = tiff
 PDX image has height = 2335 and width 1651
 RGBimage is null
 Found 1 images on Page 8
 key = Obj38
 Image subtype = Image
 Image suffix = tiff
 PDX image has height = 2335 and width 1651
 RGBimage is null
 Found 1 images on Page 9
 key = Obj43
 Image subtype = Image
 Image suffix = tiff
 PDX image has height = 2335 and width 1651
 RGBimage is null
 Found 1 images on Page 10
 key = Obj48
 Image subtype = Image
 Image suffix = tiff
 PDX image has height = 2335 and width 1651
 RGBimage is null
