Problem with processTextPosition

2014-05-16 Thread DImuthu Upeksha
Hi all,

I was trying to manually feed text position objects to the
processTextPosition method in the PDFTextStripper class. I created a
subclass of PDFTextStripper and overrode the processStream method. In
processStream I manually created two text position objects for the
letters "W" and "H". At the end I passed them to processTextPosition:

processTextPosition(textPosition1);
processTextPosition(textPosition2);

Then I tested it using

PDFTextStripper ocrStripper = new PDFOCRTextStripper();
PDDocument document = PDDocument.load("some pdf file");
String data = ocrStripper.getText(document);
System.out.println(data);

Output was : H W

Then I changed the sequence of passing TextPosition objects in [1]

processTextPosition(textPosition2);
processTextPosition(textPosition1);

Output was : WH

--

As far as I understood, processTextPosition works with the text
position metadata, like the x and y coordinates of the input text. It
should not depend on the order of the input sequence. But in this case it
seems that the processTextPosition method works according to the order of
input.
E.g. if I input W first, it prints W first without considering its
actual position.

Is this the normal behaviour? Or am I missing something here?

[1] https://gist.github.com/DImuthuUpe/5dcfa9758f017794c649
-- 
Regards

W.Dimuthu Upeksha
Undergraduate

Department of Computer Science And Engineering

University of Moratuwa, Sri Lanka
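
A minimal, library-free sketch of why input order can matter: unless glyphs are sorted by their coordinates before being written out, they come out in arrival order. The Glyph class and the render method below are hypothetical and are not PDFBox API. In PDFBox itself, PDFTextStripper.setSortByPosition(true) is the switch worth checking first, on the assumption that position sorting is off by default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a text position: one character plus its page coordinates.
final class Glyph {
    final String text;
    final float x;
    final float y;

    Glyph(String text, float x, float y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

class PositionOrderDemo {
    // Concatenates glyphs in arrival order, or ordered by (y, x) when sortByPosition is set.
    static String render(List<Glyph> glyphs, boolean sortByPosition) {
        List<Glyph> ordered = new ArrayList<>(glyphs);
        if (sortByPosition) {
            ordered.sort(Comparator.<Glyph>comparingDouble(g -> g.y)
                    .thenComparingDouble(g -> g.x));
        }
        StringBuilder out = new StringBuilder();
        for (Glyph g : ordered) {
            out.append(g.text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "W" sits to the left of "H" on the same line, but "H" is fed first.
        List<Glyph> fedInReverse = Arrays.asList(
                new Glyph("H", 20f, 100f),
                new Glyph("W", 10f, 100f));
        System.out.println(render(fedInReverse, false)); // arrival order
        System.out.println(render(fedInReverse, true));  // position order
    }
}
```

With sorting enabled the result no longer depends on the order in which the two glyphs were fed in.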


[jira] [Commented] (PDFBOX-1756) ClassCastException CosString cannot be cast to COSName

2014-05-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000649#comment-14000649
 ] 

Tilman Hausherr commented on PDFBOX-1756:
-

It doesn't happen with the non sequential parser, but it happens when saving 
that file:
{code}
Exception in thread "main" java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:519)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:449)
at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1099)
at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:555)
at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1364)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1238)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1220)
at pdfboxpageimageextraction.ExtractImages.doPdf(ExtractImages.java:455)
at pdfboxpageimageextraction.ExtractImages.main(ExtractImages.java:189)
{code}

> ClassCastException CosString cannot be cast to COSName
> --
>
> Key: PDFBOX-1756
> URL: https://issues.apache.org/jira/browse/PDFBOX-1756
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 1.8.2
> Environment: Ubuntu Linux & Windows 7 (both JDK6)
>Reporter: William Palmer
>Priority: Minor
> Attachments: testPDF_twoAuthors.pdf
>
>
> Opening and saving a PDF causes this exception in 1.8.2:
> Exception in thread "main" java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName
>   at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:507)
>   at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:435)
>   at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1122)
>   at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:552)
>   at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1501)
>   at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1324)
>   at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1305)
> The PDF is here: 
> http://digitalcorpora.org/corp/nps/files/govdocs1/008/008677.pdf
> Code to reproduce the exception:
> PDFParser parser = new PDFParser(new FileInputStream(new File("008677.pdf")));
> parser.parse();
> File temp = File.createTempFile("temp-", ".pdf");
> parser.getPDDocument().save(temp);
> parser.getDocument().close();



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PDFBOX-2079) Extra new line characters extracted in 1.8.5 for embedded files leading to ZipFile exception in Java 1.6

2014-05-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999265#comment-13999265
 ] 

Tilman Hausherr commented on PDFBOX-2079:
-

The bug happens since rev 1585781, which fixed PDFBOX-2016, which was another 
bug with lengths. I suspect that because of the sequential parsing, the correct 
length wasn't available when reading the PDF, so we were reading "endstream" 
(although the length is available downwards!). That length read was wrong 
because of what you mentioned in the beginning. 

I will need to find out why the sequential parser reads CR LF, whether this is 
correct or not, and whether it can be changed.

Anyway, it shows once again that you shouldn't use load(). There's a 
useNonSequentialParser config option in TIKA.
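
For context, the PDF specification states that the end-of-line marker before "endstream" is not part of the stream data, which matches the two extra "\r\n" bytes in the report. Below is a small sketch, not PDFBox's parser code, of trimming one trailing EOL from data that was collected by scanning for "endstream" instead of trusting /Length. The class and method names are made up for illustration.

```java
import java.util.Arrays;

class StreamEolTrim {
    // Removes at most one trailing end-of-line marker (CR LF, LF, or CR).
    static byte[] trimTrailingEol(byte[] data) {
        int end = data.length;
        if (end >= 2 && data[end - 2] == '\r' && data[end - 1] == '\n') {
            end -= 2;
        } else if (end >= 1 && (data[end - 1] == '\n' || data[end - 1] == '\r')) {
            end -= 1;
        }
        return Arrays.copyOf(data, end);
    }

    public static void main(String[] args) {
        // Four payload bytes followed by the CR LF that precedes "endstream".
        byte[] raw = {'P', 'K', 3, 4, '\r', '\n'};
        System.out.println(trimTrailingEol(raw).length);
    }
}
```

The obvious caveat is that binary data which genuinely ends in a CR or LF byte would be cut short by such a heuristic, which is one more reason to prefer a parser that knows the correct length.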

> Extra new line characters extracted in 1.8.5 for embedded files leading to 
> ZipFile exception in Java 1.6
> 
>
> Key: PDFBOX-2079
> URL: https://issues.apache.org/jira/browse/PDFBOX-2079
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.8.5, 1.8.6, 2.0.0
>Reporter: Tim Allison
>Assignee: Tilman Hausherr
>Priority: Minor
> Attachments: PDFBOX-2079-TEST_CASE.patch, embedded_zip.pdf
>
>
> For the test file I'll attach shortly, PDFBox 1.8.4 extracts 17660 bytes from 
> an embedded zip (well, docx) file.  PDFBox 1.8.5 extracts 17662 bytes -- 
> "\r\n" at the end of the stream.  This leads to a ZipException for ZipFile(s) 
> in Java 1.6, but not Java 1.7. 





[jira] [Commented] (PDFBOX-2081) Lines that exceeds clipping area are not drawn

2014-05-16 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000320#comment-14000320
 ] 

Juraj Lonc commented on PDFBOX-2081:


I have also tried to replace
{code}
graphics.setClip(getGraphicsState().getCurrentClippingPath());
{code}
by
{code}
Rectangle2D rc0 = getGraphicsState().getCurrentClippingPath().getBounds2D();
Rectangle2D rc1 = new Rectangle2D.Double(rc0.getMinX(), rc0.getMinY(),
        rc0.getWidth() + 1000, rc0.getHeight());
graphics.setClip(rc1);
{code}
so I made the clipping area wider. This "helped" too: the lines were rendered.

> Lines that exceeds clipping area are not drawn
> --
>
> Key: PDFBOX-2081
> URL: https://issues.apache.org/jira/browse/PDFBOX-2081
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: Juraj Lonc
> Attachments: Obyčajné zásielky.pdf, rendered_(missing_lines).png, 
> rendered_(with_null_clipping).png
>
>
> PDF contains shapes that are partly on the paper and partly outside (shape 
> overflows paper borders).
> Those shapes are not rendered to image.
> It is caused by clipping area.
> When I replace line in PDFDrawer.strokePath()
> {noformat}
> graphics.setClip(getGraphicsState().getCurrentClippingPath());
> {noformat}
> to
> {noformat}
> graphics.setClip(null);
> {noformat}
> then everything is rendered correctly.
> Possibly bug in Java?





[jira] [Comment Edited] (PDFBOX-2078) DPI always 96

2014-05-16 Thread proba (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998715#comment-13998715
 ] 

proba edited comment on PDFBOX-2078 at 5/15/14 1:55 PM:


Using ImageIOUtil fixed the DPI issue, thank you.

Now I have run into a colour problem in the barcode PDF-to-image 
conversion, but that's a different story.

If you happen to know the answer that would be lovely.

The barcode colours in the picture get inverted (black goes to white and white 
goes to black), which I saw was reported before on these forums.
Is there an easy known solution to this?


was (Author: proba):
Using ImageIOUtil fixed the DPI issue, thank you.

Now I figured out a font changing problem for myself in barcode pdf to image 
transformation, but thats a different story.

If you happen to know the answer that would be lovely.

The barcode colours on the picture get inverted (black goes to white and white 
goes to black) which i saw was reported before on these forums.
Is there an easy known solution to this?

> DPI always 96
> -
>
> Key: PDFBOX-2078
> URL: https://issues.apache.org/jira/browse/PDFBOX-2078
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.5
>Reporter: proba
>
> I'm trying to convert a one-page PDF report to an image using convertToImage.
> The call I use is:
>  BufferedImage bi=page.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
> No matter how much I change the resolution (300 in the example), the DPI 
> stays the same, even though the quality and the dimensions of the picture 
> change.
> I am adding a comparison between a 96 resolution picture and what should be a 
> 300 resolution picture (notice the DPI):
> http://i58.tinypic.com/9sv339.png
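
As a rough illustration of why the pixel dimensions change with the resolution argument while the DPI shown by an image viewer does not: the resolution decides how many pixels are rendered, whereas the DPI a viewer reports comes from metadata stored in the image file. The assumption here is that a plain ImageIO.write call does not store that metadata unless asked to. The helper below is hypothetical, is not PDFBox code, and only does the arithmetic, assuming the default 72 user-space units per inch.

```java
class RenderSize {
    // PDF user space defaults to 72 units per inch, so pixels = points * dpi / 72.
    static int pixels(float points, int dpi) {
        return Math.round(points * dpi / 72f);
    }

    public static void main(String[] args) {
        // An A4 page is roughly 595 x 842 points.
        float widthPt = 595f;
        float heightPt = 842f;
        for (int dpi : new int[] {96, 300}) {
            System.out.println(dpi + " dpi: "
                    + pixels(widthPt, dpi) + " x " + pixels(heightPt, dpi) + " px");
        }
    }
}
```

The same page therefore yields a much larger bitmap at 300 than at 96, even if the file written afterwards carries no DPI tag at all.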





[jira] [Created] (PDFBOX-2081) Lines that exceeds clipping area are not drawn

2014-05-16 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-2081:
--

 Summary: Lines that exceeds clipping area are not drawn
 Key: PDFBOX-2081
 URL: https://issues.apache.org/jira/browse/PDFBOX-2081
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: Obyčajné zásielky.pdf, rendered.png

The PDF contains shapes that are partly on the page and partly outside it (the 
shapes overflow the page borders).
Those shapes are not rendered to the image.

This is caused by the clipping area.
When I replace this line in PDFDrawer.strokePath()
{noformat}
graphics.setClip(getGraphicsState().getCurrentClippingPath());
{noformat}
with
{noformat}
graphics.setClip(null);
{noformat}
then everything is rendered correctly.

Possibly bug in Java?
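
One way to narrow down whether Java2D itself is at fault is a tiny check outside PDFBox. The sketch below does not reproduce the reported PDF; it only draws a line that extends past a simple rectangular clip on a BufferedImage and inspects two pixels. The class name, sizes and coordinates are arbitrary choices for illustration, and a real clipping path from a PDF may well behave differently.

```java
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

class ClipLineCheck {
    // Draws a horizontal line that extends past both sides of a rectangular clip.
    static BufferedImage drawClippedLine() {
        BufferedImage img = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, 100, 100);
            g.setClip(new Rectangle(10, 10, 80, 80));
            g.setColor(Color.BLACK);
            g.setStroke(new BasicStroke(3f));
            g.drawLine(-50, 50, 150, 50); // starts and ends outside the image
        } finally {
            g.dispose();
        }
        return img;
    }

    // True if the pixel at (x, y) is pure black.
    static boolean isBlack(BufferedImage img, int x, int y) {
        return (img.getRGB(x, y) & 0xFFFFFF) == 0;
    }

    public static void main(String[] args) {
        BufferedImage img = drawClippedLine();
        System.out.println("inside clip painted:  " + isBlack(img, 50, 50));
        System.out.println("outside clip painted: " + isBlack(img, 5, 50));
    }
}
```

If the part of the line inside the clip is painted here, the missing lines are more likely related to the clipping path handed to setClip than to line clipping in general.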





[jira] [Commented] (PDFBOX-2078) DPI always 96

2014-05-16 Thread proba (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998715#comment-13998715
 ] 

proba commented on PDFBOX-2078:
---

I am writing them out with ImageIO.write.
To be precise:

ImageIO.write(bi, "jpg", new File("d:\\pdfimageold"+count+".jpg"));

Tried other types as well naturally. 



[jira] [Commented] (PDFBOX-2079) Extra new line characters extracted in 1.8.5 for embedded files leading to ZipFile exception in Java 1.6

2014-05-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998936#comment-13998936
 ] 

Tilman Hausherr commented on PDFBOX-2079:
-

One piece of good news: it does not happen with loadNonSeq(), only with load().



[jira] [Closed] (PDFBOX-1994) PDDocument.load(filename.pdf) hangs for pdf files having size

2014-05-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler closed PDFBOX-1994.
--

Resolution: Not a Problem
  Assignee: Andreas Lehmkühler

Set to closed, as the issue seems to be about the environment used and not 
about PDFBox.

> PDDocument.load(filename.pdf) hangs for pdf files having size
> -
>
> Key: PDFBOX-1994
> URL: https://issues.apache.org/jira/browse/PDFBOX-1994
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4
>Reporter: brijesh
>Assignee: Andreas Lehmkühler
>
> I am using the code below to load my PDF. The file is not zero-sized, has 
> full permissions, and is not corrupt. I didn't get any error from the code; 
> it just hangs. 
> It works locally, but not on the server.
> (I created .jar files and then an .exe; the .exe is then executed on the server.)
> Java version used: 1.4
> PDDocument pdf=PDDocument.load("d:\\filename.pdf");
> pdf.print();
> Please tell me why the same code does not work on the server.





[jira] [Comment Edited] (PDFBOX-1463) Unreadable fonts on UNIX

2014-05-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998605#comment-13998605
 ] 

Andreas Lehmkühler edited comment on PDFBOX-1463 at 5/15/14 9:45 AM:
-

I ran into this problem recently as well. I am experiencing this issue on a 
Solaris machine as well as on an Ubuntu box. 
I am using Java 1.6 on both machines and it only happens with certain Arial 
Fonts e.g.: 

JFIGPU+Arial-BoldMT
KLSYIK+ArialMT

Normal Arial works just fine though and it appears to be rendered correctly. 
I am using PDFBox 2.0.0 and I am trying to create a PDF for testing purposes 
because the original PDF is again confidential. Before using PDFBox 2.0.0 this 
PDF caused a JVM crash just as described in PDFBOX-1426



was (Author: francesca.herpertz):
I ran into this problem recently as well. I am experiencing this issue on a 
Solaris machine as well as on an Ubuntu box. 
I am using Java 1.6 on both machines and it only happens with certain Arial 
Fonts e.g.: 

JFIGPU+Arial-BoldMT
KLSYIK+ArialMT

Normal Arial works just fine though and it appears to be rendered correctly. 
I am using PDFBox 2.0.0 and I am trying to create a PDF for testing purposes 
because the original PDF is again confidential. Before using PDFBox 2.0.0 this 
PDF caused a JVM crash just as described in this jira ticket - PDFBox-1426. 


> Unreadable fonts on UNIX
> 
>
> Key: PDFBOX-1463
> URL: https://issues.apache.org/jira/browse/PDFBOX-1463
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
> Environment: UNIX
>Reporter: Sindhu N Kashyap
> Attachments: screenshot-1.jpg
>
>
> I'm converting PDFs to TIFF. The conversion is fine when run on Windows. When 
> I run the same code on UNIX, it converts with a font that is unreadable. I 
> put some TTF font files in the classes path, but that has not made any 
> difference. Please help.





[jira] [Created] (PDFBOX-2070) Filter.decode() modifies PDF if there is a filter array

2014-05-16 Thread Tilman Hausherr (JIRA)
Tilman Hausherr created PDFBOX-2070:
---

 Summary: Filter.decode() modifies PDF if there is a filter array
 Key: PDFBOX-2070
 URL: https://issues.apache.org/jira/browse/PDFBOX-2070
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
 Attachments: after.pdf, before.pdf

If there are several filters (filter array) in an image, PDFBox is inserting an 
empty DecodeParms object, instead of either inserting an empty COSArray or 
(better) doing nothing. Saving such a PDF results in it not being displayable in 
Acrobat Reader.

Test code:
{code}
PDDocument d = PDDocument.load("before.pdf");
new PDFRenderer(d).renderImage(0);
d.save("after.pdf");
{code}
The rendering is important because without it, the filtered objects aren't 
decoded.
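
To make the expected shape concrete: as far as the PDF specification goes, when /Filter is an array and any filter has parameters, /DecodeParms is an array with one entry per filter (null for filters without parameters), and the entry can be left out when no filter has parameters. The sketch below models that rule with plain Java collections. It is not PDFBox code, and the class, method and parameter names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class DecodeParmsSketch {
    // Returns null when no filter has parameters (the entry can then be omitted),
    // otherwise a list with exactly one slot per filter.
    static List<Map<String, Integer>> decodeParmsFor(
            List<String> filters, Map<Integer, Map<String, Integer>> paramsByIndex) {
        if (paramsByIndex.isEmpty()) {
            return null;
        }
        List<Map<String, Integer>> parms = new ArrayList<>();
        for (int i = 0; i < filters.size(); i++) {
            parms.add(paramsByIndex.get(i)); // null for a filter without parameters
        }
        return parms;
    }

    public static void main(String[] args) {
        List<String> filters = Arrays.asList("ASCII85Decode", "FlateDecode");
        Map<String, Integer> flateParms = Collections.singletonMap("Predictor", 12);
        Map<Integer, Map<String, Integer>> byIndex = Collections.singletonMap(1, flateParms);

        System.out.println(decodeParmsFor(filters, byIndex));
        System.out.println(decodeParmsFor(filters,
                Collections.<Integer, Map<String, Integer>>emptyMap()));
    }
}
```

The point of the model is only that a single dictionary next to a filter array is the odd one out; either an array of matching length or no entry at all would fit the rule.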





[jira] [Updated] (PDFBOX-1756) ClassCastException CosString cannot be cast to COSName

2014-05-16 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated PDFBOX-1756:


Attachment: testPDF_twoAuthors.pdf

Shareable test document from TIKA-1252.  Same issue.



[jira] [Updated] (PDFBOX-2081) Lines that exceeds clipping area are not drawn

2014-05-16 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2081:
---

Attachment: rendered_(missing_lines).png

The previously uploaded file was not the one I wanted to upload. I have now 
attached the image that was actually rendered.



[jira] [Created] (PDFBOX-2080) Barcode getting color inverted in pdf to image conversion

2014-05-16 Thread proba (JIRA)
proba created PDFBOX-2080:
-

 Summary: Barcode getting color inverted in pdf to image conversion
 Key: PDFBOX-2080
 URL: https://issues.apache.org/jira/browse/PDFBOX-2080
 Project: PDFBox
  Issue Type: Bug
Reporter: proba
 Attachments: FPR0T9.pdf, slika2_3.jpg

While converting a one-page PDF to an image (both attached below), the page 
converts properly, but the barcode colours are inverted.

The code used to do the conversion looks like this right now:

  public static void convertPDFToJPG(String src){
    try{
      //load pdf file in the document object
      PDDocument doc=PDDocument.load(new FileInputStream(src));
      //Get all pages from document and store them in a list
      List<PDPage> pages=doc.getDocumentCatalog().getAllPages();
      //create iterator object so it is easy to access each page from the list
      Iterator<PDPage> i= pages.iterator();
      int count=1; //count variable used to separate each image file
      //Convert every page of the pdf document to a unique image file
      System.out.println("Please wait...");
      while(i.hasNext()){
        PDPage page=i.next();
        BufferedImage bi=page.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
        FileOutputStream fos = new FileOutputStream(new File("d:\\slika2_3.jpg"));
        //ImageIO.write(bi, "jpg", new File("d:\\pdfimageold.jpg"));
        boolean foundWriter = ImageIOUtil.writeImage(bi, "jpg", fos, 300);
        count++;
      }
      System.out.println("Conversion complete");
    }catch(IOException ie){ie.printStackTrace();}
  }






[jira] [Updated] (PDFBOX-2079) Extra new line characters extracted in 1.8.5 for embedded files leading to ZipFile exception in Java 1.6

2014-05-16 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated PDFBOX-2079:


Attachment: PDFBOX-2079-TEST_CASE.patch
embedded_zip.pdf

test file (from TIKA-1124) and test case attached



[jira] [Updated] (PDFBOX-2081) Lines that exceeds clipping area are not drawn

2014-05-16 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2081:
---

Attachment: rendered_(with_null_clipping).png



[jira] [Commented] (PDFBOX-1463) Unreadable fonts on UNIX

2014-05-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998641#comment-13998641
 ] 

Andreas Lehmkühler commented on PDFBOX-1463:


[~Francesca.Herpertz]: This ticket seems to be about type1 font issues and 
PDFBOX-1426 is about truetype font issues.

I'm going to close this one, as the original poster couldn't provide any 
additional information to help us solve this issue.

Please create a new ticket and provide as many details as possible about the 
issue (issue description, stack trace, version info etc.). A sample PDF would be 
a definite plus.



[jira] [Commented] (PDFBOX-2070) Filter.decode() modifies PDF if there is a filter array

2014-05-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998884#comment-13998884
 ] 

Tilman Hausherr commented on PDFBOX-2070:
-

Did some DRY refactoring in rev 1594969 by moving 3 searches for an imagereader 
into its own method. Btw I have no idea why JPXFilter.readJPX() is static, so I 
removed that too.

> Filter.decode() modifies PDF if there is a filter array
> ---
>
> Key: PDFBOX-2070
> URL: https://issues.apache.org/jira/browse/PDFBOX-2070
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Tilman Hausherr
> Fix For: 2.0.0
>
> Attachments: after.pdf, before.pdf
>
>
> If there are several filters (filter array) in an image, PDFBox is inserting 
> an empty DecodeParms object here
> {code}
> params.setItem(COSName.DECODE_PARMS, getDecodeParams(params, index));
> {code}
> instead of either inserting an empty COSArray, or (better) do nothing. Saving 
> such a PDF results in it not being displayable in the Acrobat Reader.
> Test code:
> {code}
> PDDocument d = PDDocument.load("before.pdf");
> new PDFRenderer(d).renderImage(0);
> d.save("after.pdf");
> {code}
> The rendering is important because without it, the filtered objects aren't 
> decoded.





[jira] [Assigned] (PDFBOX-2079) Extra new line characters extracted in 1.8.5 for embedded files leading to ZipFile exception in Java 1.6

2014-05-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reassigned PDFBOX-2079:
---

Assignee: Tilman Hausherr



[jira] [Closed] (PDFBOX-2078) DPI always 96

2014-05-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-2078.
---

Resolution: Fixed
  Assignee: Tilman Hausherr

I'm closing this one as it wasn't a problem. Please open a new issue about the 
other problem, and don't forget to attach the PDF and the image.



[jira] [Commented] (PDFBOX-1895) Modifying a damaged PDF damages it further

2014-05-16 Thread Pat Hickey (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993984#comment-13993984
 ] 

Pat Hickey commented on PDFBOX-1895:


The input is a file that Adobe Reader will display (reference above).
This code copies the input to the output *without* decrypting.
Should I expect that Adobe Reader will display the file now?
It does not. Everything is completely garbled.
And it *still* complains about missing fonts.
I'm beginning to suspect a conflation of font and decryption issues.
The trick will be how to debug this w/o writing another parser. :(
{code}
public static void main( String[] args ) throws Exception {
    PDDocument document = PDDocument.load( args[ 0 ] );
    document.save( args[ 1 ] );
    document.close();
    System.exit( 0 );
}
{code}


> Modifying a damaged PDF damages it further
> --
>
> Key: PDFBOX-1895
> URL: https://issues.apache.org/jira/browse/PDFBOX-1895
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 1.8.3, 1.8.4
>Reporter: Pat Hickey
>
> When re-writing a document with font descriptions, Adobe Reader is unable to 
> display the fonts in the document.  Reader can display the fonts in the 
> original document. The difference is that in the original document, the font 
> descriptions are in lower object numbers than the font references; in the 
> output document, the font descriptions are in higher object numbers than the 
> font references.  Is there a quick way to re-order them?
> Update: the PDF file in question is actually corrupt, but somehow modifying 
> it with PDFBox causes it to no longer be readable with Adobe Reader.





[jira] [Updated] (PDFBOX-958) convertToImage mangles images which were in the PDF

2014-05-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PDFBOX-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-958:
--

Attachment: (was: Wrycan® Lorem Ipsum Test.pdf)

> convertToImage mangles images which were in the PDF
> ---
>
> Key: PDFBOX-958
> URL: https://issues.apache.org/jira/browse/PDFBOX-958
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.2.1, 1.4.0, 1.5.0
> Environment: RHEL5 and WinXP, java version "1.6.0_23"
>Reporter: Eric Schwarzenbach
>Assignee: Andreas Lehmkühler
>Priority: Critical
> Fix For: 1.6.0
>
> Attachments: Image of Page 13.jpeg, Image of Page 13.png, 
> PDFBOX958-WrycanLoremIpsumTest.pdf
>
>
> Of the PDFs we've tried running through PDFBox and generating page images, a 
> number of them (coming from disparate sources and method of creation) seem to 
> produce images where an image that was embedded in the page of the PDF shows 
> somewhat mangled. It seems to be divided by horizontal stripes, where some 
> stripes look normal, others seem to have some kind of "smearing" effect going 
> on. See attached images and original PDF (image is of page 13).
> I marked this as critical as we are trying to use PDFBox in a project where 
> page images are crucial, and inability to produce reasonable looking page 
> images is pretty much a deal breaker. 
> The code we use to extract the images looks more or less like the following:
>   BufferedImage image = page.convertToImage();
>   SmartDeferredFileOutputStream outStream = new SmartDeferredFileOutputStream();
>   String[] writerFormatNames = ImageIO.getWriterFormatNames();
>   ImageIO.write(image, "jpeg", outStream);
>   outStream.close();
> We've also tried specifying "png". In both "jpg" and "png" cases we get an 
> image file that is indeed the correct format, and both images look exactly 
> the same. 





[jira] [Updated] (PDFBOX-2070) Filter.decode() modifies PDF if there is a filter array

2014-05-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2070:


Description: 
If there are several filters (filter array) in an image, PDFBox is inserting an 
empty DecodeParms object here
{code}
params.setItem(COSName.DECODE_PARMS, getDecodeParams(params, index));
{code}
instead of either inserting an empty COSArray, or (better) do nothing. Saving 
such a PDF results in it not being displayable in the Acrobat Reader.

Test code:
{code}
PDDocument d = PDDocument.load("before.pdf");
new PDFRenderer(d).renderImage(0);
d.save("after.pdf");
{code}
The rendering is important because without it, the filtered objects aren't 
decoded.

  was:
If there are several filters (filter array) in an image, PDFBox is inserting an 
empty DecodeParms object, instead of either inserting an empty COSAarray, or 
(better) do nothing. Saving such a PDF results in it not being displayable in 
the Acrobat Reader.

Test code:
{code}
PDDocument d = PDDocument.load("before.pdf");
new PDFRenderer(d).renderImage(0);
d.save("after.pdf");
{code}
The rendering is important because without it, the filtered objects aren't 
decoded.


> Filter.decode() modifies PDF if there is a filter array
> ---
>
> Key: PDFBOX-2070
> URL: https://issues.apache.org/jira/browse/PDFBOX-2070
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Tilman Hausherr
> Attachments: after.pdf, before.pdf
>
>
> If there are several filters (filter array) in an image, PDFBox is inserting 
> an empty DecodeParms object here
> {code}
> params.setItem(COSName.DECODE_PARMS, getDecodeParams(params, index));
> {code}
> instead of either inserting an empty COSArray, or (better) do nothing. Saving 
> such a PDF results in it not being displayable in the Acrobat Reader.
> Test code:
> {code}
> PDDocument d = PDDocument.load("before.pdf");
> new PDFRenderer(d).renderImage(0);
> d.save("after.pdf");
> {code}
> The rendering is important because without it, the filtered objects aren't 
> decoded.





[jira] [Commented] (PDFBOX-2079) Extra new line characters extracted in 1.8.5 for embedded files leading to ZipFile exception in Java 1.6

2014-05-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998925#comment-13998925
 ] 

Tilman Hausherr commented on PDFBOX-2079:
-

I can confirm the wrong length :-( and will investigate this.

> Extra new line characters extracted in 1.8.5 for embedded files leading to 
> ZipFile exception in Java 1.6
> 
>
> Key: PDFBOX-2079
> URL: https://issues.apache.org/jira/browse/PDFBOX-2079
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.8.5, 1.8.6, 2.0.0
>Reporter: Tim Allison
>Assignee: Tilman Hausherr
>Priority: Minor
> Attachments: PDFBOX-2079-TEST_CASE.patch, embedded_zip.pdf
>
>
> For the test file I'll attach shortly, PDFBox 1.8.4 extracts 17660 bytes from 
> an embedded zip (well, docx) file.  PDFBox 1.8.5 extracts 17662 bytes -- 
> "\r\n" at the end of the stream.  This leads to a ZipException for ZipFile(s) 
> in Java 1.6, but not Java 1.7. 
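
Until the extraction length is fixed, a caller could guard against the two trailing bytes before handing the data to a zip reader. The following is only a sketch under the assumption that the caller knows the expected byte count (for example from a size entry stored with the embedded file); `EmbeddedStreamTrim` and `trimToDeclaredLength` are hypothetical names and not part of PDFBox.

```java
import java.util.Arrays;

final class EmbeddedStreamTrim {

    /**
     * Returns the first declaredLength bytes of extracted, or extracted
     * itself when it is not longer than the declared length.
     */
    static byte[] trimToDeclaredLength(byte[] extracted, int declaredLength) {
        if (declaredLength < 0 || extracted.length <= declaredLength) {
            return extracted; // nothing to trim, or no usable length
        }
        return Arrays.copyOf(extracted, declaredLength);
    }

    public static void main(String[] args) {
        // Four payload bytes followed by the stray "\r\n" described in the report.
        byte[] withEol = {'P', 'K', 3, 4, '\r', '\n'};
        byte[] trimmed = trimToDeclaredLength(withEol, 4);
        System.out.println(withEol.length + " -> " + trimmed.length);
    }
}
```

This only hides the symptom on the caller's side; the wrong length reported in this issue still needs to be fixed in the library.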





[jira] [Comment Edited] (PDFBOX-1463) Unreadable fonts on UNIX

2014-05-16 Thread Francesca Nina Herpertz (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998605#comment-13998605
 ] 

Francesca Nina Herpertz edited comment on PDFBOX-1463 at 5/15/14 9:21 AM:
--

I ran into this problem recently as well. I am experiencing this issue on a 
Solaris machine as well as on an Ubuntu box. 
I am using Java 1.6 on both machines and it only happens with certain Arial 
Fonts e.g.: 

JFIGPU+Arial-BoldMT
KLSYIK+ArialMT

Normal Arial works just fine though and it appears to be rendered correctly. 
I am using PDFBox 2.0.0 and I am trying to create a PDF for testing purposes 
because the original PDF is again confidential. Before using PDFBox 2.0.0 this 
PDF caused a JVM crash just as described in this jira ticket - PDFBox-1426. 



was (Author: francesca.herpertz):
I ran into this problem recently as well. I am experiencing this issue on a 
Solaris machine as well as on an Ubuntu box. 
I am using Java 1.6 on both machines and it only happens with certain Arial 
Fonts e.g.: 

JFIGPU+Arial-BoldMT
KLSYIK+ArialMT

Normal Arial works just fine though and it appears to be rendered correctly. 


> Unreadable fonts on UNIX
> 
>
> Key: PDFBOX-1463
> URL: https://issues.apache.org/jira/browse/PDFBOX-1463
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
> Environment: UNIX
>Reporter: Sindhu N Kashyap
> Attachments: screenshot-1.jpg
>
>
> I'm converting PDFs to TIFF. The conversion is fine when run on Windows. When 
> I run the same code on UNIX, it converts with a font that is unreadable. I 
> put some TTF font files in the classes path, but that has not made any 
> difference. Please help.





[jira] [Commented] (PDFBOX-2080) Barcode getting color inverted in pdf to image conversion

2014-05-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1463#comment-1463
 ] 

Andreas Lehmkühler commented on PDFBOX-2080:


The barcode is inverted in 1.8.4, 1.8.5 and the 1.8-branch. It looks good in 
the current trunk (build 294), but the background of the page isn't white.

> Barcode getting color inverted in pdf to image conversion
> -
>
> Key: PDFBOX-2080
> URL: https://issues.apache.org/jira/browse/PDFBOX-2080
> Project: PDFBox
>  Issue Type: Bug
>Reporter: proba
> Attachments: FPR0T9.pdf, slika2_3.jpg
>
>
> While converting a 1 page pdf to an image (both attached below), the image 
> converts properly, however the barcodes colours invert.
> The code used to do the conversion looks like this right now:
>   public static void convertPDFToJPG(String src){
>     try{
>       //load pdf file in the document object
>       PDDocument doc=PDDocument.load(new FileInputStream(src));
>       //Get all pages from document and store them in a list
>       List pages=doc.getDocumentCatalog().getAllPages();
>       //create iterator object so it is easy to access each page from the list
>       Iterator i= pages.iterator();
>       int count=1; //count variable used to separate each image file
>       //Convert every page of the pdf document to a unique image file
>       System.out.println("Please wait...");
>       while(i.hasNext()){
>         //getAllPages() returns a raw List, so each element needs a cast
>         PDPage page=(PDPage) i.next();
>         BufferedImage bi=page.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
>         FileOutputStream fos = new FileOutputStream(new File("d:\\slika2_3.jpg"));
>         //ImageIO.write(bi, "jpg", new File("d:\\pdfimageold.jpg"));
>         boolean foundWriter = ImageIOUtil.writeImage(bi, "jpg", fos, 300);
>         fos.close();
>         count++;
>       }
>       doc.close();
>       System.out.println("Conversion complete");
>     }catch(IOException ie){ie.printStackTrace();}
>   }





[jira] [Commented] (PDFBOX-2081) Lines that exceeds clipping area are not drawn

2014-05-16 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000280#comment-14000280
 ] 

Juraj Lonc commented on PDFBOX-2081:


I know that line completely disables clipping and I know it is not a solution ;)
I have used it just for description of the problem.

> Lines that exceeds clipping area are not drawn
> --
>
> Key: PDFBOX-2081
> URL: https://issues.apache.org/jira/browse/PDFBOX-2081
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: Juraj Lonc
> Attachments: Obyčajné zásielky.pdf, rendered.png
>
>
> PDF contains shapes that are partly on the paper and partly outside (shape 
> overflows paper borders).
> Those shapes are not rendered to image.
> It is caused by clipping area.
> When I replace line in PDFDrawer.strokePath()
> {noformat}
> graphics.setClip(getGraphicsState().getCurrentClippingPath());
> {noformat}
> to
> {noformat}
> graphics.setClip(null);
> {noformat}
> then everything is rendered correctly.
> Possibly bug in Java?
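
To help separate a Java2D problem from a PDFBox one, a standalone check without any PDFBox code may be useful. The sketch below uses only plain Java2D; the image size, clip rectangle and line coordinates are arbitrary values chosen for illustration, and `ClipDemo` is a hypothetical name. It draws a line that starts and ends outside the clip and reports whether the part inside the clip was painted.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

final class ClipDemo {

    /** Draws a horizontal line that extends past both sides of the clip. */
    static BufferedImage drawClippedLine() {
        BufferedImage img = new BufferedImage(100, 100, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setClip(new Rectangle(10, 10, 80, 80)); // visible area: x and y in 10..89
            g.setColor(Color.BLACK);
            g.drawLine(-50, 50, 150, 50);             // starts and ends outside the image
        } finally {
            g.dispose();
        }
        return img;
    }

    public static void main(String[] args) {
        BufferedImage img = drawClippedLine();
        // A non-zero alpha value means the pixel was painted.
        System.out.println("inside clip painted:  " + ((img.getRGB(50, 50) >>> 24) != 0));
        System.out.println("outside clip painted: " + ((img.getRGB(5, 50) >>> 24) != 0));
    }
}
```

If the part inside the clip is painted here, the missing shapes are more likely related to the clipping path that is passed in than to Java2D itself.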





[jira] [Updated] (PDFBOX-2081) Lines that exceeds clipping area are not drawn

2014-05-16 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2081:
---

Attachment: (was: rendered.png)

> Lines that exceeds clipping area are not drawn
> --
>
> Key: PDFBOX-2081
> URL: https://issues.apache.org/jira/browse/PDFBOX-2081
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: Juraj Lonc
> Attachments: Obyčajné zásielky.pdf
>
>
> PDF contains shapes that are partly on the paper and partly outside (shape 
> overflows paper borders).
> Those shapes are not rendered to image.
> It is caused by clipping area.
> When I replace line in PDFDrawer.strokePath()
> {noformat}
> graphics.setClip(getGraphicsState().getCurrentClippingPath());
> {noformat}
> to
> {noformat}
> graphics.setClip(null);
> {noformat}
> then everything is rendered correctly.
> Possibly bug in Java?





[jira] [Updated] (PDFBOX-958) convertToImage mangles images which were in the PDF

2014-05-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PDFBOX-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-958:
--

Attachment: PDFBOX958-WrycanLoremIpsumTest.pdf

> convertToImage mangles images which were in the PDF
> ---
>
> Key: PDFBOX-958
> URL: https://issues.apache.org/jira/browse/PDFBOX-958
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.2.1, 1.4.0, 1.5.0
> Environment: RHEL5 and WinXP, java version "1.6.0_23"
>Reporter: Eric Schwarzenbach
>Assignee: Andreas Lehmkühler
>Priority: Critical
> Fix For: 1.6.0
>
> Attachments: Image of Page 13.jpeg, Image of Page 13.png, 
> PDFBOX958-WrycanLoremIpsumTest.pdf
>
>
> Of the PDFs we've tried running through PDFBox and generating page images, a 
> number of them (coming from disparate sources and method of creation) seem to 
> produce images where an image that was embedded in the page of the PDF shows 
> somewhat mangled. It seems to be divided by horizontal stripes, where some 
> stripes look normal, others seem to have some kind of "smearing" effect going 
> on. See attached images and original PDF (image is of page 13).
> I marked this as critical as we are trying to use PDFBox in a project where 
> page images are crucial, and inability to produce reasonable looking page 
> images is pretty much a deal breaker. 
> The code we use to extract the images looks more or less like the following:
>   BufferedImage image = page.convertToImage();
>   SmartDeferredFileOutputStream outStream = new SmartDeferredFileOutputStream();
>   String[] writerFormatNames = ImageIO.getWriterFormatNames();
>   ImageIO.write(image, "jpeg", outStream);
>   outStream.close();
> We've also tried specifying "png". In both "jpg" and "png" cases we get an 
> image file that is indeed the correct format, and both images look exactly 
> the same. 





[jira] [Commented] (PDFBOX-2074) 4-bytes CMap entry causes exception

2014-05-16 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999694#comment-13999694
 ] 

Juraj Lonc commented on PDFBOX-2074:


I am curious whether Adobe Reader ignores such entries (entries are invalid) or 
processes them (entries are valid).

> 4-bytes CMap entry causes exception
> ---
>
> Key: PDFBOX-2074
> URL: https://issues.apache.org/jira/browse/PDFBOX-2074
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Juraj Lonc
> Attachments: PDFBOX-2074_CMap.diff, pdf_with_4B_cmap_entry.pdf
>
>
> I have a PDF with a CMap entry consisting of 4 bytes. Only this one entry 
> has that size; the other entries are 2 bytes long.
> Adobe Reader has no problems with that, but PDFBox throws an Exception.
> I think this Exception should not be thrown. The entry should be skipped or 
> truncated to 2 bytes, with a warning written to the log.
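
The "skip and warn" behaviour suggested above could look roughly like the following. This is a sketch only, not PDFBox's CMap parser: `LenientCMapCode` and `toCode` are hypothetical names, and the 2-byte limit is simply taken from the description of this particular file.

```java
import java.util.OptionalInt;
import java.util.logging.Logger;

final class LenientCMapCode {

    private static final Logger LOG = Logger.getLogger(LenientCMapCode.class.getName());

    /** Assumed maximum code length, taken from the report above. */
    private static final int MAX_CODE_BYTES = 2;

    /**
     * Converts a big-endian code of 1 or 2 bytes to an int. Entries of any
     * other length are skipped with a warning instead of raising an exception.
     */
    static OptionalInt toCode(byte[] bytes) {
        if (bytes.length == 0 || bytes.length > MAX_CODE_BYTES) {
            LOG.warning("Skipping CMap entry with unsupported length " + bytes.length);
            return OptionalInt.empty();
        }
        int code = 0;
        for (byte b : bytes) {
            code = (code << 8) | (b & 0xFF);
        }
        return OptionalInt.of(code);
    }

    public static void main(String[] args) {
        System.out.println(toCode(new byte[] {0x01, 0x02}));
        System.out.println(toCode(new byte[] {0x00, 0x00, 0x01, 0x02}));
    }
}
```

Skipping is shown rather than truncating because it is not clear from the report which two of the four bytes should be kept.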





[jira] [Updated] (PDFBOX-2069) PDF's with Tc before Tm are getting incorrect spacing in PDFTextArea

2014-05-16 Thread Joel Hirsh (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Hirsh updated PDFBOX-2069:
---

Attachment: PDFBox-2609-patch.zip

Patch that addresses this problem

> PDF's with Tc before Tm are getting incorrect spacing in PDFTextArea
> 
>
> Key: PDFBOX-2069
> URL: https://issues.apache.org/jira/browse/PDFBOX-2069
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 1.8.5
> Environment: Windows
>Reporter: Joel Hirsh
>  Labels: pdfbox
> Fix For: 2.0.0
>
> Attachments: PDFBOX-2609.pdf, PDFBox-2609-patch.zip
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Attached PDF is getting incorrect spacing using example program 
> ExtractTextByArea.java as follows:
> Text in the area:java.awt.Rectangle[x=10,y=500,width=600,height=200]
> Transaction Activity
> Date D e s c r i p t i o n Deposits W i t h d r a w a l s
> 0 4 / 0 8  B E G I N N I N G  BALANCE
> 04 / 0 8  W I THDRAWAL - ATM  3 1 1 7 3 0 0 . 0 0 -
> 62 M I L L  H I L L  ROAD WOODSTOCK N Y
> 04 / 1 0  W I THDRAWAL - ACH 2 0 0 . 0 0 -
> HUMAN RIGHTS WAT-B I L L  PAYMT
> 04 / 12  C K #  1 2 7 3 11 0 . 0 0 -
> 0 4 / 1 5  W I THDRAWAL - ACH 2 0 2 . 5 7 -
> NEW SOUTH INSURA -B I LL PAYMT
> 04 / 1 5  W I THDRAWAL - ACH 3 6 . 2 6 -
> WASTE CONNECTION-BILL PAYMT
> 04 / 1 7  W I THDRAWAL - ACH 71 2 . 0 0 -
> N  PYMT T
> 04 / 1 8  W I THDRAWAL - ACH 2958 9 . 0 0 3
> N  PYMT T
> 04 / 1 9  W I THDRAWAL - ACH 76 8 . 1 2 -
> I believe this is because PDF streams with Tc before Tm have the matrix 
> applied to the Tc, which is contrary to my experience with graphics pipelines. 
> Most PDF streams seem to have Tc after Tm, and thus do not hit this 
> situation.
> I have attached a patch to two files that corrects the problem for this file 
> and also works correctly on my test suite of about 40 files from other 
> sources.
> The result for the attached file now becomes:
> Text in the area:java.awt.Rectangle[x=10,y=500,width=600,height=200]
> Transaction  Activity
> Date  Description Deposits  Withdrawals
> 04/08  BEGINNING  BALANCE
> 04/08  WITHDRAWAL-ATM  3 117 300.00-
> 62 MILL  HILL  ROAD  WOODSTOCK  NY
> 04/10  WITHDRAWAL-ACH 200.00-
> HUMAN RIGHTS  WAT-BILL  PAYMT
> 04/12  CK#  1273 110.00-
> 04/15  WITHDRAWAL-ACH 202.57-
> NEW SOUTH  INSURA-BILL  PAYMT
> 04/15  WITHDRAWAL-ACH 36.26-
> WASTE CONNECTION-BILL  PAYMT
> 04/17  WITHDRAWAL-ACH 712.00-
> N  PYMT T
> 04/18  WITHDRAWAL-ACH 29589.00 3
> N  PYMT T
> 04/19  WITHDRAWAL-ACH 768.12-





[jira] [Commented] (PDFBOX-2057) Importing BufferedImage into PDPixelMap is broken in 1.8.5

2014-05-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993667#comment-13993667
 ] 

Tilman Hausherr commented on PDFBOX-2057:
-

I added code to handle bitmask transparency properly. Please give feedback 
whether you can get rid of your workaround.

I also added a test to prevent this from breaking in the future.

I also removed double assignments from CCITTFactory and added a test to check 
that they are really there.

This was done in rev 1593569, which also added the modifications of PDFBOX-2068.

Next: will look whether the problem occurs for jpeg and in 1.8.

> Importing BufferedImage into PDPixelMap is broken in 1.8.5
> --
>
> Key: PDFBOX-2057
> URL: https://issues.apache.org/jira/browse/PDFBOX-2057
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.8.5, 1.8.6
> Environment: windows vista / jdk 1.7.0_45
>Reporter: Michaël Michaud
>Assignee: Tilman Hausherr
>  Labels: regression
> Fix For: 1.8.6, 2.0.0
>
> Attachments: CS-Convocation entretien signed.pdf, CS-Convocation 
> entretien-IText.pdf, CS-Convocation entretien-PDFBox-with-workarround.pdf, 
> CS-Convocation entretien-PDFBox.pdf, ImageFilterOp.java, 
> differentBufferedImages.pdf, renderTransparentImage.zip
>
>
> Try to import a BufferedImage in a PDDocument with PDPixelMap
> BufferedImage with TYPE_4BYTE_ABGR works fine with PDFBox 1.8.4 (though, the 
> pdf file contains instruction /ColorSpace /DeviceGray)
> BufferedImage with TYPE_4BYTE_ABGR produces an unreadable PDF with PDFBox 
> 1.8.5 (though, the pdf file contains instruction /ColorSpace /DeviceRGB).
> Code used to demonstrate the problem is as follows (image has also been 
> colored with some Graphics instructions to demonstrate that 1.8.4 is working) 
> :
> {code}
> try {
>     PDDocument doc = new PDDocument();
>     PDPage page = new PDPage();
>     doc.addPage(page);
>     BufferedImage awtImage = new BufferedImage(100, 100, BufferedImage.TYPE_4BYTE_ABGR);
>     PDPixelMap ximage = new PDPixelMap(doc, awtImage);
>     PDPageContentStream contentStream = new PDPageContentStream(doc, page);
>     contentStream.drawXObject(ximage, 200, 200, 100, 100);
>     contentStream.close();
>     doc.save("C:\\Temp\\PDF\\test185_4babgr.pdf");
> } catch(COSVisitorException|IOException e) {
>     e.printStackTrace();
> }
> {code}
> I also tried with a BufferedImage with TYPE_INT_ARGB but it throws an 
> exception with PDFBox 1.8.4 and 1.8.5 :
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException: Raster 
> IntegerInterleavedRaster: width = 100 height = 100 #Bands = 1 xOff = 0 yOff = 
> 0 dataOffset[0] 0 is incompatible with ColorModel ColorModel: #pixelBits = 8 
> numComponents = 1 color space = java.awt.color.ICC_ColorSpace@1dc80063 
> transparency = 1 has alpha = false isAlphaPre = false
>   at java.awt.image.BufferedImage.<init>(BufferedImage.java:630)
>   at 
> org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.createImageStream(PDPixelMap.java:107)
> {code}
> My main purpose was to use a BufferedImage with a CMYK ColorSpace, but 
> PDPixelMap seems to accept 1 component and 3 component ColorSpace only.
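
Since PDF stores transparency separately from the colour samples (as a mask on the image), an ARGB BufferedImage generally has to be split into a colour image and an alpha image before it can be embedded. The sketch below shows that split with plain java.awt only; it is not the PDPixelMap code, and `AlphaSplit` and `split` are hypothetical names.

```java
import java.awt.image.BufferedImage;

final class AlphaSplit {

    /**
     * Splits an image with alpha into { colour image, alpha image }.
     * The colour image is TYPE_INT_RGB, the alpha image is 8-bit gray
     * where 0 is fully transparent and 255 is fully opaque.
     */
    static BufferedImage[] split(BufferedImage argb) {
        int w = argb.getWidth();
        int h = argb.getHeight();
        BufferedImage rgb = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        BufferedImage alpha = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int pixel = argb.getRGB(x, y);               // 0xAARRGGBB
                rgb.setRGB(x, y, pixel & 0x00FFFFFF);        // colour samples only
                // Write the raw sample to avoid any gray colour conversion.
                alpha.getRaster().setSample(x, y, 0, pixel >>> 24);
            }
        }
        return new BufferedImage[] {rgb, alpha};
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(100, 100, BufferedImage.TYPE_4BYTE_ABGR);
        src.setRGB(0, 0, 0x80FF0000); // half-transparent red
        BufferedImage[] parts = split(src);
        System.out.printf("rgb=%06X alpha=%d%n",
                parts[0].getRGB(0, 0) & 0xFFFFFF,
                parts[1].getRaster().getSample(0, 0, 0));
    }
}
```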





[jira] [Comment Edited] (PDFBOX-1756) ClassCastException CosString cannot be cast to COSName

2014-05-16 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998856#comment-13998856
 ] 

Tim Allison edited comment on PDFBOX-1756 at 5/15/14 4:00 PM:
--

Shareable test document from TIKA-1252.  Same issue.

ClassCastException also now happens on initial loading/parsing.  This is caught 
and logged, and upon a quick review, it looks like text is being successfully 
extracted.

{noformat}
 WARN [main] (COSDocument.java:302) - java.lang.ClassCastException: 
org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName
java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to 
org.apache.pdfbox.cos.COSName
at 
org.apache.pdfbox.cos.COSDocument.getObjectsByType(COSDocument.java:294)
at 
org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:627)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:244)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1224)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1189)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:118)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
{noformat}


was (Author: talli...@mitre.org):
Shareable test document from TIKA-1252.  Same issue.

> ClassCastException CosString cannot be cast to COSName
> --
>
> Key: PDFBOX-1756
> URL: https://issues.apache.org/jira/browse/PDFBOX-1756
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 1.8.2
> Environment: Ubuntu Linux & Windows 7 (both JDK6)
>Reporter: William Palmer
>Priority: Minor
> Attachments: testPDF_twoAuthors.pdf
>
>
> Opening and saving a PDF causes this exception in 1.8.2:
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.pdfbox.cos.COSString cannot be cast to 
> org.apache.pdfbox.cos.COSName
>   at 
> org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:507)
>   at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:435)
>   at 
> org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1122)
>   at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:552)
>   at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1501)
>   at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1324)
>   at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1305)
> The PDF is here: 
> http://digitalcorpora.org/corp/nps/files/govdocs1/008/008677.pdf
> Code to reproduce the exception:
> PDFParser parser = new PDFParser(new FileInputStream(new File("008677.pdf")));
> parser.parse();
> File temp = File.createTempFile("temp-", ".pdf");
> parser.getPDDocument().save(temp);
> parser.getDocument().close();





Re: PDF file characters x and y coordinates

2014-05-16 Thread Alin Mazilu
I process about 2000 PDF files daily and I have never had an issue with the
coordinates. One piece of advice though: write your own
TextPositionComparator.

~Alin
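
A custom ordering of that kind might look roughly like this. The sketch deliberately does not use PDFBox types: `Glyph` is a hypothetical stand-in for a text position with x/y coordinates, y is assumed to grow downwards, and `lineHeight` is an assumed tolerance for treating glyphs as being on the same line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ReadingOrder {

    /** Hypothetical stand-in for a text position: text plus page coordinates. */
    static final class Glyph {
        final String text;
        final float x;
        final float y;

        Glyph(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Orders glyphs top to bottom, then left to right. Glyphs are grouped
     * into lines by rounding y / lineHeight, which keeps the ordering
     * consistent (a plain "within tolerance" test would not be transitive).
     */
    static Comparator<Glyph> comparator(float lineHeight) {
        return Comparator.<Glyph>comparingInt(g -> Math.round(g.y / lineHeight))
                .thenComparingDouble(g -> g.x);
    }

    /** Sorts a copy of the glyphs and concatenates their text. */
    static String join(List<Glyph> glyphs, float lineHeight) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(comparator(lineHeight));
        StringBuilder sb = new StringBuilder();
        for (Glyph g : sorted) {
            sb.append(g.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "H" arrives first but sits to the right of "W" on the same line.
        List<Glyph> glyphs = Arrays.asList(
                new Glyph("H", 20f, 100f),
                new Glyph("W", 10f, 100.4f));
        System.out.println(join(glyphs, 10f));
    }
}
```

Bucketing by line is a simplification; glyphs that sit close to a bucket boundary can still end up on different lines, so the tolerance needs tuning per document type.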


On Fri, May 16, 2014 at 8:39 AM, Simer P  wrote:

> I just needed to confirm this with you guys.
>
> Can the X and Y coordinates returned in the
> processTextPosition(TextPosition text) ever be incorrect ?
>
> Because it doesn't really matter in what order the text is extracted ... if
> the x and y coordinates are accurate then I can rearrange the characters
> based on the applications requirements.
>
> So can the X and Y coordinates ever be wrong?
>
> Cheers
>


PDF file characters x and y coordinates

2014-05-16 Thread Simer P
I just needed to confirm this with you guys.

Can the X and Y coordinates returned in the
processTextPosition(TextPosition text) ever be incorrect ?

Because it doesn't really matter in what order the text is extracted ... if
the x and y coordinates are accurate then I can rearrange the characters
based on the applications requirements.

So can the X and Y coordinates ever be wrong?

Cheers


[jira] [Commented] (PDFBOX-2080) Barcode getting color inverted in pdf to image conversion

2014-05-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000183#comment-14000183
 ] 

Tilman Hausherr commented on PDFBOX-2080:
-

My bet is on PDFBOX-1950. That was fixed in 2.0 only.

> Barcode getting color inverted in pdf to image conversion
> -
>
> Key: PDFBOX-2080
> URL: https://issues.apache.org/jira/browse/PDFBOX-2080
> Project: PDFBox
>  Issue Type: Bug
>Reporter: proba
> Attachments: FPR0T9.pdf, slika2_3.jpg
>
>
> While converting a 1 page pdf to an image (both attached below), the image 
> converts properly, however the barcodes colours invert.
> The code used to do the conversion looks like this right now:
>   public static void convertPDFToJPG(String src){
>     try{
>       //load pdf file in the document object
>       PDDocument doc=PDDocument.load(new FileInputStream(src));
>       //Get all pages from document and store them in a list
>       List pages=doc.getDocumentCatalog().getAllPages();
>       //create iterator object so it is easy to access each page from the list
>       Iterator i= pages.iterator();
>       int count=1; //count variable used to separate each image file
>       //Convert every page of the pdf document to a unique image file
>       System.out.println("Please wait...");
>       while(i.hasNext()){
>         //getAllPages() returns a raw List, so each element needs a cast
>         PDPage page=(PDPage) i.next();
>         BufferedImage bi=page.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
>         FileOutputStream fos = new FileOutputStream(new File("d:\\slika2_3.jpg"));
>         //ImageIO.write(bi, "jpg", new File("d:\\pdfimageold.jpg"));
>         boolean foundWriter = ImageIOUtil.writeImage(bi, "jpg", fos, 300);
>         fos.close();
>         count++;
>       }
>       doc.close();
>       System.out.println("Conversion complete");
>     }catch(IOException ie){ie.printStackTrace();}
>   }





[jira] [Created] (PDFBOX-2082) signing corrupts PDF when signature exactly fits allocated space

2014-05-16 Thread Koloom (JIRA)
Koloom created PDFBOX-2082:
--

 Summary: signing corrupts PDF when signature exactly fits 
allocated space
 Key: PDFBOX-2082
 URL: https://issues.apache.org/jira/browse/PDFBOX-2082
 Project: PDFBox
  Issue Type: Bug
  Components: Writing
Reporter: Koloom
Priority: Critical


The current check does not take "<>" into account, so if you are (un)lucky, the 
signature overwrites ">" and corrupts the PDF.

Fix for 1.8:

diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java 
b/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java
index 3165589..755e849 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java
@@ -779,12 +779,14 @@ public class COSWriter implements ICOSVisitor, Closeable
 SignatureInterface signatureInterface = 
doc.getSignatureInterface();
 byte[] sign = signatureInterface.sign(new 
ByteArrayInputStream(pdfContent));
 String signature = new COSString(sign).getHexString();
+++signaturePosition[0]; // move past "<"
+--signaturePosition[1]; // move in front of ">"
 int leftSignaturerange = 
signaturePosition[1]-signaturePosition[0]-signature.length();
 if(leftSignaturerange<0)
 {
 throw new IOException("Can't write signature, not enough 
space");
 }
-getStandardOutput().setPos(signaturePosition[0]+1);
+getStandardOutput().setPos(signaturePosition[0]);
 getStandardOutput().write(signature.getBytes());
 }
 }
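
The off-by-delimiters problem can be shown in isolation. The sketch below assumes, as the comments in the patch suggest, that the first position is the index of "<" and the second position is the index just past ">"; `SignatureSlot` and its methods are hypothetical names and not PDFBox code.

```java
final class SignatureSlot {

    /** Number of hex digits that fit strictly between "<" and ">". */
    static int hexCapacity(int openBracketPos, int endPosExclusive) {
        int firstDigit = openBracketPos + 1;   // move past "<"
        int closeBracket = endPosExclusive - 1; // move in front of ">"
        return closeBracket - firstDigit;
    }

    /** Check that leaves both delimiters intact. */
    static boolean fits(byte[] signature, int openBracketPos, int endPosExclusive) {
        return 2 * signature.length <= hexCapacity(openBracketPos, endPosExclusive);
    }

    /** The check as described in the report: delimiters counted as free space. */
    static boolean fitsIgnoringDelimiters(byte[] signature, int openBracketPos,
                                          int endPosExclusive) {
        return 2 * signature.length <= endPosExclusive - openBracketPos;
    }

    public static void main(String[] args) {
        // Placeholder "<0000>" at offsets 0..5: room for 4 hex digits, i.e. 2 bytes.
        byte[] threeBytes = new byte[3]; // needs 6 hex digits
        System.out.println("strict check:  " + fits(threeBytes, 0, 6));
        System.out.println("lenient check: " + fitsIgnoringDelimiters(threeBytes, 0, 6));
    }
}
```

With the lenient check a signature that is two hex digits too long is accepted, and writing it would run over the closing ">".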

Another thing is that pdfbox now allocates (2 * preferedSize + 2) for a 
signature. It quite confused me to see 16k+4 bytes allocated when I called 
setPreferedSignatureSize(4k) - it should have allocated 8k (each signature byte 
takes 2 bytes in the pdf). 
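
The numbers above can be reproduced with a few lines of arithmetic. The sketch assumes only what the report states, namely that every content byte is written as two hex characters; `SignatureAllocation` is a hypothetical name.

```java
final class SignatureAllocation {

    /** Every content byte is written as two hex characters in the PDF. */
    static int hexChars(int contentBytes) {
        return 2 * contentBytes;
    }

    /** Allocation as reported: a content array of (2 * size + 2) bytes. */
    static int reportedAllocation(int preferredSize) {
        return hexChars(2 * preferredSize + 2);
    }

    /** Allocation with the proposed change: a content array of size bytes. */
    static int proposedAllocation(int preferredSize) {
        return hexChars(preferredSize);
    }

    public static void main(String[] args) {
        int preferred = 4 * 1024;
        System.out.println("reported: " + reportedAllocation(preferred)); // 16k + 4
        System.out.println("proposed: " + proposedAllocation(preferred)); // 8k
    }
}
```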

Fix for 1.8:

diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java 
b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java
index 358364a..23dd3ab 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java
@@ -309,7 +309,7 @@ public class PDDocument implements Pageable, Closeable
 int preferedSignatureSize = options.getPreferedSignatureSize();
 if (preferedSignatureSize > 0)
 {
-sigObject.setContents(new byte[preferedSignatureSize * 2 + 2]);
+sigObject.setContents(new byte[preferedSignatureSize]);
 }
 else
 {






[jira] [Issue Comment Deleted] (PDFBOX-2080) Barcode getting color inverted in pdf to image conversion

2014-05-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2080:


Comment: was deleted

(was: My bet is on PDFBOX-1950. That was fixed in 2.0 only.)

> Barcode getting color inverted in pdf to image conversion
> -
>
> Key: PDFBOX-2080
> URL: https://issues.apache.org/jira/browse/PDFBOX-2080
> Project: PDFBox
>  Issue Type: Bug
>Reporter: proba
> Attachments: FPR0T9.pdf, slika2_3.jpg
>
>
> While converting a 1 page pdf to an image (both attached below), the image 
> converts properly, however the barcodes colours invert.
> The code used to do the conversion looks like this right now:
>   public static void convertPDFToJPG(String src){
>     try{
>       //load pdf file in the document object
>       PDDocument doc=PDDocument.load(new FileInputStream(src));
>       //Get all pages from document and store them in a list
>       List pages=doc.getDocumentCatalog().getAllPages();
>       //create iterator object so it is easy to access each page from the list
>       Iterator i= pages.iterator();
>       int count=1; //count variable used to separate each image file
>       //Convert every page of the pdf document to a unique image file
>       System.out.println("Please wait...");
>       while(i.hasNext()){
>         //getAllPages() returns a raw List, so each element needs a cast
>         PDPage page=(PDPage) i.next();
>         BufferedImage bi=page.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
>         FileOutputStream fos = new FileOutputStream(new File("d:\\slika2_3.jpg"));
>         //ImageIO.write(bi, "jpg", new File("d:\\pdfimageold.jpg"));
>         boolean foundWriter = ImageIOUtil.writeImage(bi, "jpg", fos, 300);
>         fos.close();
>         count++;
>       }
>       doc.close();
>       System.out.println("Conversion complete");
>     }catch(IOException ie){ie.printStackTrace();}
>   }





[jira] [Commented] (PDFBOX-2078) DPI always 96

2014-05-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1495#comment-1495
 ] 

Tilman Hausherr commented on PDFBOX-2078:
-

The dpi isn't part of the BufferedImage; it is only used to calculate a zoom 
factor for rendering. So you have to pass it as a parameter again when saving. 
It is metadata, and writing it is not properly supported by ImageIO (look at 
the source code of ImageIOUtil :-)  )
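
To illustrate the zoom factor: PDF page sizes are given in points (1/72 inch by default), so the requested resolution only determines how many pixels are rendered. The sketch below shows the arithmetic only, under that assumption; `DpiScale` is a hypothetical name and this is not the PDFBox rendering code.

```java
final class DpiScale {

    /** Default PDF user space: 72 units per inch. */
    static final float POINTS_PER_INCH = 72f;

    /** Zoom factor applied when rendering at the given resolution. */
    static float scale(int dpi) {
        return dpi / POINTS_PER_INCH;
    }

    /** Pixel size of a length given in points at the given resolution. */
    static int pixels(float points, int dpi) {
        return Math.round(points * dpi / POINTS_PER_INCH);
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points.
        System.out.println("96 dpi:  " + pixels(612f, 96) + " x " + pixels(792f, 96));
        System.out.println("300 dpi: " + pixels(612f, 300) + " x " + pixels(792f, 300));
    }
}
```

The resulting image only has a width and a height in pixels, which is why a viewer falls back to its own default DPI unless the resolution is written into the file's metadata when saving.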

> DPI always 96
> -
>
> Key: PDFBOX-2078
> URL: https://issues.apache.org/jira/browse/PDFBOX-2078
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.5
>Reporter: proba
>Assignee: Tilman Hausherr
>
> I'm trying to convert a 1 page pdf report to an image using convertToImage.
> The call I use is as follows:
>  BufferedImage bi=page.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
> No matter how much I change the resolution (300 in the example), the DPI 
> stays the same, even though the quality and the dimensions of the picture 
> change.
> I'm adding a comparison between a 96 resolution picture and what should be a 300 
> resolution picture (notice the DPI):
> http://i58.tinypic.com/9sv339.png





[jira] [Commented] (PDFBOX-2079) Extra new line characters extracted in 1.8.5 for embedded files leading to ZipFile exception in Java 1.6

2014-05-16 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999128#comment-13999128
 ] 

Tim Allison commented on PDFBOX-2079:
-

Good to know.  Thank you for confirming and taking a look so quickly!

> Extra new line characters extracted in 1.8.5 for embedded files leading to 
> ZipFile exception in Java 1.6
> 
>
> Key: PDFBOX-2079
> URL: https://issues.apache.org/jira/browse/PDFBOX-2079
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.8.5, 1.8.6, 2.0.0
>Reporter: Tim Allison
>Assignee: Tilman Hausherr
>Priority: Minor
> Attachments: PDFBOX-2079-TEST_CASE.patch, embedded_zip.pdf
>
>
> For the test file I'll attach shortly, PDFBox 1.8.4 extracts 17660 bytes from 
> an embedded zip (well, docx) file.  PDFBox 1.8.5 extracts 17662 bytes -- 
> "\r\n" at the end of the stream.  This leads to a ZipException for ZipFile(s) 
> in Java 1.6, but not Java 1.7. 





[jira] [Commented] (PDFBOX-1463) Unreadable fonts on UNIX

2014-05-16 Thread Francesca Nina Herpertz (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998605#comment-13998605
 ] 

Francesca Nina Herpertz commented on PDFBOX-1463:
-

I ran into this problem recently as well. I am experiencing this issue on a 
Solaris machine as well as on an Ubuntu box. 
I am using Java 1.6 on both machines and it only happens with certain Arial 
Fonts e.g.: 

JFIGPU+Arial-BoldMT
KLSYIK+ArialMT

Normal Arial works just fine though and it appears to be rendered correctly. 


> Unreadable fonts on UNIX
> 
>
> Key: PDFBOX-1463
> URL: https://issues.apache.org/jira/browse/PDFBOX-1463
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
> Environment: UNIX
>Reporter: Sindhu N Kashyap
> Attachments: screenshot-1.jpg
>
>
> I'm converting PDFs to tif. The conversion is fine when run in Windows. When 
> i run the same code in UNIX ,its converting with a font that is unreadable. I 
> put some font ttf files in the classes path but that has not made any 
> difference. Please help.





[jira] [Comment Edited] (PDFBOX-2078) DPI always 96

2014-05-16 Thread proba (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998715#comment-13998715
 ] 

proba edited comment on PDFBOX-2078 at 5/15/14 1:55 PM:


Using ImageIOUtil fixed the DPI issue, thank you.

Now I have found a colour-changing problem of my own in barcode PDF-to-image 
conversion, but that's a different story.

If you happen to know the answer, though, that would be lovely.

The barcode colours in the picture get inverted (black goes to white and white 
goes to black), which I saw was reported before on these forums.
Is there an easy known solution to this?


was (Author: proba):
Using ImageIOUtil fixed the DPI issue, thank you.

Now I figured out a colour changing problem for myself in barcode pdf to image 
transformation, but thats a different story.

If you happen to know the answer that would be lovely.

The barcode colours on the picture get inverted (black goes to white and white 
goes to black) which i saw was reported before on these forums.
Is there an easy known solution to this?

> DPI always 96
> -
>
> Key: PDFBOX-2078
> URL: https://issues.apache.org/jira/browse/PDFBOX-2078
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.5
>Reporter: proba
>
> I'm trying to convert a 1 page pdf report to an image using convertToImage.
> My used command goes as follows:
>  BufferedImage bi=page.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
> No matter how much i change the resolution (300 in the example), the DPI 
> stays the same, even though the quality and the dimensions of the picture 
> change.
> Adding a comparison between a 96 resolution picture and what should be a 300 
> resolution picture (notice the DPI)
> http://i58.tinypic.com/9sv339.png





[jira] [Resolved] (PDFBOX-2070) Filter.decode() modifies PDF if there is a filter array

2014-05-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2070.
-

Resolution: Fixed
  Assignee: Tilman Hausherr

I'm not happy that three classes (ccitt filter: sets DeviceGray if not set;
jbig2 filter: sets DeviceGray if not set; jpx filter: sets BPC, Decode, width, 
height, colorspace) alter the pdf (oops, that was my idea a few months ago), 
but I don't have a better idea. Correcting this will possibly require major 
changes. Thus setting to resolved for now, as the original bug is fixed.

> Filter.decode() modifies PDF if there is a filter array
> ---
>
> Key: PDFBOX-2070
> URL: https://issues.apache.org/jira/browse/PDFBOX-2070
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
> Fix For: 2.0.0
>
> Attachments: after.pdf, before.pdf
>
>
> If there are several filters (filter array) in an image, PDFBox is inserting 
> an empty DecodeParms object here
> {code}
> params.setItem(COSName.DECODE_PARMS, getDecodeParams(params, index));
> {code}
> instead of either inserting an empty COSArray, or (better) do nothing. Saving 
> such a PDF results in it not being displayable in the Acrobat Reader.
> Test code:
> {code}
> PDDocument d = PDDocument.load("before.pdf");
> new PDFRenderer(d).renderImage(0);
> d.save("after.pdf");
> {code}
> The rendering is important because without it, the filtered objects aren't 
> decoded.





[jira] [Commented] (PDFBOX-2081) Lines that exceeds clipping area are not drawn

2014-05-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1392#comment-1392
 ] 

Tilman Hausherr commented on PDFBOX-2081:
-

Getting better output by setting no clipping region (which is what you do) is 
too good to be true, although I was able to get improved rendering for two test 
files, the ones from PDFBOX-677 and PDFBOX-1288. On the other hand, the tiger 
test file now doesn't clip something (the chin) where it should have been 
clipped.

A look at the spec found this weird part:
{quote}
The initial clipping path includes the entire page. A clipping path operator (W 
or W*, shown in Table 4.11) may appear after the last path construction 
operator and before the path-painting operator that terminates a path object. 
Although the clipping path operator appears before the painting operator, it 
does not alter the clipping path at the point where it appears. Rather, it 
modifies the effect of the succeeding painting operator. After the path has 
been painted, the clipping path in the graphics state is set to the 
intersection of the current clipping path and the newly constructed path.
{quote}
A look at the code shows that the clipping path is set in EndPath(), and this 
is called by the "n" operator. My understanding of the weird spec text is that 
the clipping path must be set after a paint operator, so it should also be set 
after any of the fill and stroke operators.

I don't know if that is the cause of the problem; more analysis of PDFs needs 
to be done.
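
To illustrate my reading of the quoted spec text, here is a small sketch using java.awt.geom only. ClipState is a hypothetical stand-in for the graphics state, not the PDFBox class, and it assumes the clip is applied by whichever operator ends the path (n, S, f, B, ...), not only by "n".

```java
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;

// Sketch only: ClipState is hypothetical and does not mirror PDFBox internals.
final class ClipState {
    private final Area clip;          // current clipping path
    private Area path = new Area();   // path under construction
    private boolean clipPending;      // set by W / W*, consumed when the path ends

    ClipState(Rectangle2D page) {
        clip = new Area(page);        // the initial clipping path includes the entire page
    }

    /** Path construction, e.g. the "re" operator. */
    void appendRect(double x, double y, double w, double h) {
        path.add(new Area(new Rectangle2D.Double(x, y, w, h)));
    }

    /** "W" or "W*": only remembers the request, the clip is unchanged here. */
    void markClip() {
        clipPending = true;
    }

    /** Any operator that terminates the path: n, S, f, B, ... */
    void endPath() {
        // Painting, if any, would happen here, still under the old clip.
        if (clipPending) {
            clip.intersect(path);     // new clip = old clip intersected with the path
            clipPending = false;
        }
        path = new Area();
    }

    Rectangle2D clipBounds() {
        return clip.getBounds2D();
    }
}
```

With a 100 x 100 page and a rectangle at (50, 50) of size 100 x 100, the clip stays at the full page after markClip() and only shrinks to the 50 x 50 overlap once endPath() runs.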

> Lines that exceeds clipping area are not drawn
> --
>
> Key: PDFBOX-2081
> URL: https://issues.apache.org/jira/browse/PDFBOX-2081
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: Juraj Lonc
> Attachments: Obyčajné zásielky.pdf, rendered.png
>
>
> PDF contains shapes that are partly on the paper and partly outside (shape 
> overflows paper borders).
> Those shapes are not rendered to image.
> It is caused by clipping area.
> When I replace line in PDFDrawer.strokePath()
> {noformat}
> graphics.setClip(getGraphicsState().getCurrentClippingPath());
> {noformat}
> to
> {noformat}
> graphics.setClip(null);
> {noformat}
> then everything is rendered correctly.
> Possibly bug in Java?





[jira] [Comment Edited] (PDFBOX-2078) DPI always 96

2014-05-16 Thread proba (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998715#comment-13998715
 ] 

proba edited comment on PDFBOX-2078 at 5/15/14 1:09 PM:


Using ImageIOUtil fixed the DPI issue, thank you.

Now I have found a font-changing problem of my own in barcode PDF-to-image 
conversion, but that's a different story.

If you happen to know the answer, that would be lovely.

The barcode colours in the picture get inverted (black goes to white and white 
goes to black), which I saw was reported before on these forums.
Is there an easy known solution to this?


was (Author: proba):
Using ImageIOUtil fixed the DPI issue, thank you.

Now I figured out a font changing problem for myself in barcode pdf to image 
transformation, but thats a different story

> DPI always 96
> -
>
> Key: PDFBOX-2078
> URL: https://issues.apache.org/jira/browse/PDFBOX-2078
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.5
>Reporter: proba
>
> I'm trying to convert a 1 page pdf report to an image using convertToImage.
> My used command goes as follows:
>  BufferedImage bi=page.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
> No matter how much i change the resolution (300 in the example), the DPI 
> stays the same, even though the quality and the dimensions of the picture 
> change.
> Adding a comparison between a 96 resolution picture and what should be a 300 
> resolution picture (notice the DPI)
> http://i58.tinypic.com/9sv339.png





[jira] [Comment Edited] (PDFBOX-2078) DPI always 96

2014-05-16 Thread proba (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998715#comment-13998715
 ] 

proba edited comment on PDFBOX-2078 at 5/15/14 12:53 PM:
-

Using ImageIOUtil fixed the DPI issue, thank you.

Now I have found a font-changing problem of my own in barcode PDF-to-image 
conversion, but that's a different story.


was (Author: proba):
writing them down with imageIOwrite.
To be precise:

ImageIO.write(bi, "jpg", new File("d:\\pdfimageold"+count+".jpg"));

Tried other types as well naturally. 

> DPI always 96
> -
>
> Key: PDFBOX-2078
> URL: https://issues.apache.org/jira/browse/PDFBOX-2078
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.5
>Reporter: proba
>
> I'm trying to convert a 1 page pdf report to an image using convertToImage.
> My used command goes as follows:
>  BufferedImage bi=page.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
> No matter how much i change the resolution (300 in the example), the DPI 
> stays the same, even though the quality and the dimensions of the picture 
> change.
> Adding a comparison between a 96 resolution picture and what should be a 300 
> resolution picture (notice the DPI)
> http://i58.tinypic.com/9sv339.png





[jira] [Closed] (PDFBOX-1463) Unreadable fonts on UNIX

2014-05-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler closed PDFBOX-1463.
--

Resolution: Cannot Reproduce
  Assignee: Andreas Lehmkühler

Set to closed as we didn't get any additional input to solve the issue.

> Unreadable fonts on UNIX
> 
>
> Key: PDFBOX-1463
> URL: https://issues.apache.org/jira/browse/PDFBOX-1463
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
> Environment: UNIX
>Reporter: Sindhu N Kashyap
>Assignee: Andreas Lehmkühler
> Attachments: screenshot-1.jpg
>
>
> I'm converting PDFs to tif. The conversion is fine when run in Windows. When 
> i run the same code in UNIX ,its converting with a font that is unreadable. I 
> put some font ttf files in the classes path but that has not made any 
> difference. Please help.





[jira] [Updated] (PDFBOX-2080) Barcode getting color inverted in pdf to image conversion

2014-05-16 Thread proba (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

proba updated PDFBOX-2080:
--

Attachment: FPR0T9.pdf

> Barcode getting color inverted in pdf to image conversion
> -
>
> Key: PDFBOX-2080
> URL: https://issues.apache.org/jira/browse/PDFBOX-2080
> Project: PDFBox
>  Issue Type: Bug
>Reporter: proba
> Attachments: FPR0T9.pdf, slika2_3.jpg
>
>
> While converting a 1 page pdf to an image (both attached below), the image 
> converts properly, however the barcodes colours invert.
> The code used to do the conversion looks like this right now:
>   public static void convertPDFToJPG(String src){
> try{
>   //load pdf file in the document object
>   PDDocument doc=PDDocument.load(new FileInputStream(src));
>   //Get all pages from document and store them in a list
>   List pages=doc.getDocumentCatalog().getAllPages();
>   //create iterator object so it is easy to access each page 
> from the list
>   Iterator i= pages.iterator();
>   int count=1; //count variable used to separate each image 
> file
>   //Convert every page of the pdf document to a unique image 
> file
>   System.out.println("Please wait...");
>   while(i.hasNext()){
> PDPage page=(PDPage)i.next(); 
> BufferedImage bi=page.convertToImage( 
> BufferedImage.TYPE_INT_RGB,  300);
> FileOutputStream fos = new FileOutputStream(new 
> File("d:\\slika2_3.jpg"));
> //ImageIO.write(bi, "jpg", new 
> File("d:\\pdfimageold.jpg"));
> boolean foundWriter = ImageIOUtil.writeImage(bi, 
> "jpg", fos, 300);
> count++;
>   
>   }
>   System.out.println("Conversion complete");
> }catch(IOException ie){ie.printStackTrace();}
>   }





[jira] [Updated] (PDFBOX-2082) signing corrupts PDF when signature exactly fits allocated space

2014-05-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Štěpán Schejbal updated PDFBOX-2082:


Description: 
The current check does not take "<>" into account, so if you are (un)lucky, the 
signature overwrites ">" and corrupts the PDF.

Fix for 1.8:

{code}
diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java 
b/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java
index 3165589..755e849 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java
@@ -779,12 +779,14 @@ public class COSWriter implements ICOSVisitor, Closeable
 SignatureInterface signatureInterface = 
doc.getSignatureInterface();
 byte[] sign = signatureInterface.sign(new 
ByteArrayInputStream(pdfContent));
 String signature = new COSString(sign).getHexString();
+++signaturePosition[0]; // move past "<"
+--signaturePosition[1]; // move in front of ">"
 int leftSignaturerange = 
signaturePosition[1]-signaturePosition[0]-signature.length();
 if(leftSignaturerange<0)
 {
 throw new IOException("Can't write signature, not enough 
space");
 }
-getStandardOutput().setPos(signaturePosition[0]+1);
+getStandardOutput().setPos(signaturePosition[0]);
 getStandardOutput().write(signature.getBytes());
 }
 }
{code}

Another thing is that pdfbox now allocates (2 * preferedSize + 2) for a 
signature. It quite confused me to see 16k+4 bytes allocated when I called 
setPreferedSignatureSize(4k) - it should have allocated 8k (each signature byte 
takes 2 bytes in the pdf). 

Fix for 1.8:

{code}
diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java 
b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java
index 358364a..23dd3ab 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java
@@ -309,7 +309,7 @@ public class PDDocument implements Pageable, Closeable
 int preferedSignatureSize = options.getPreferedSignatureSize();
 if (preferedSignatureSize > 0)
 {
-sigObject.setContents(new byte[preferedSignatureSize * 2 + 2]);
+sigObject.setContents(new byte[preferedSignatureSize]);
 }
 else
 {
{code}
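
The arithmetic behind both patches can be sketched in isolation. This is only an illustration: SignatureSpace is a hypothetical helper, and it assumes that the reserved range starts at the offset of "<" and ends just past ">", which is how I read signaturePosition.

```java
// Sketch only: SignatureSpace is hypothetical; it mirrors the arithmetic of the
// patches above and is not part of COSWriter or PDDocument.
final class SignatureSpace {
    /** File bytes taken by a placeholder of n raw bytes: "<" + 2n hex digits + ">". */
    static int reservedLength(int rawBytes) {
        return 2 * rawBytes + 2;
    }

    /** Hex digits that fit between "<" (at start) and ">" (at end - 1). */
    static int hexCapacity(int start, int end) {
        int first = start + 1;   // move past "<"
        int limit = end - 1;     // move in front of ">"
        return limit - first;
    }

    /** True if a hex signature of the given length fits without touching ">". */
    static boolean fits(int start, int end, int hexLength) {
        return hexCapacity(start, end) - hexLength >= 0;
    }

    public static void main(String[] args) {
        // A 10-byte range "<........>" holds 8 hex digits, not 10.
        System.out.println(hexCapacity(100, 110));
        // The unpatched check (end - start - hexLength >= 0) would accept 9 here.
        System.out.println(fits(100, 110, 9));
    }
}
```

With this reading, a preferred size of 4096 raw bytes needs 2 * 4096 + 2 bytes in the file, which is the figure I would have expected to see allocated.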

  was:
The current check does not take "<>" into account, so if you are (un)lucky, the 
signature overwrites ">" and corrupts the PDF.

Fix for 1.8:

diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java 
b/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java
index 3165589..755e849 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java
@@ -779,12 +779,14 @@ public class COSWriter implements ICOSVisitor, Closeable
 SignatureInterface signatureInterface = 
doc.getSignatureInterface();
 byte[] sign = signatureInterface.sign(new 
ByteArrayInputStream(pdfContent));
 String signature = new COSString(sign).getHexString();
+++signaturePosition[0]; // move past "<"
+--signaturePosition[1]; // move in front of ">"
 int leftSignaturerange = 
signaturePosition[1]-signaturePosition[0]-signature.length();
 if(leftSignaturerange<0)
 {
 throw new IOException("Can't write signature, not enough 
space");
 }
-getStandardOutput().setPos(signaturePosition[0]+1);
+getStandardOutput().setPos(signaturePosition[0]);
 getStandardOutput().write(signature.getBytes());
 }
 }

Another thing is that pdfbox now allocates (2 * preferedSize + 2) for a 
signature. It quite confused me to see 16k+4 bytes allocated when I called 
setPreferedSignatureSize(4k) - it should have allocated 8k (each signature byte 
takes 2 bytes in the pdf). 

Fix for 1.8:

diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java 
b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java
index 358364a..23dd3ab 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java
@@ -309,7 +309,7 @@ public class PDDocument implements Pageable, Closeable
 int preferedSignatureSize = options.getPreferedSignatureSize();
 if (preferedSignatureSize > 0)
 {
-sigObject.setContents(new byte[preferedSignatureSize * 2 + 2]);
+sigObject.setContents(new byte[preferedSignatureSize]);
 }
 else
 {



> signing corrupts PDF when signature exactly fits allocated space
> ---

[jira] [Updated] (PDFBOX-2081) Lines that exceeds clipping area are not drawn

2014-05-16 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2081:
---

Attachment: rendered.png
Obyčajné zásielky.pdf

> Lines that exceeds clipping area are not drawn
> --
>
> Key: PDFBOX-2081
> URL: https://issues.apache.org/jira/browse/PDFBOX-2081
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: Juraj Lonc
> Attachments: Obyčajné zásielky.pdf, rendered.png
>
>
> PDF contains shapes that are partly on the paper and partly outside (shape 
> overflows paper borders).
> Those shapes are not rendered to image.
> It is caused by clipping area.
> When I replace line in PDFDrawer.strokePath()
> {noformat}
> graphics.setClip(getGraphicsState().getCurrentClippingPath());
> {noformat}
> to
> {noformat}
> graphics.setClip(null);
> {noformat}
> then everything is rendered correctly.
> Possibly bug in Java?





[jira] [Created] (PDFBOX-2079) Extra new line characters extracted in 1.8.5 for embedded files leading to ZipFile exception in Java 1.6

2014-05-16 Thread Tim Allison (JIRA)
Tim Allison created PDFBOX-2079:
---

 Summary: Extra new line characters extracted in 1.8.5 for embedded 
files leading to ZipFile exception in Java 1.6
 Key: PDFBOX-2079
 URL: https://issues.apache.org/jira/browse/PDFBOX-2079
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.5
Reporter: Tim Allison
Priority: Minor
 Attachments: PDFBOX-2079-TEST_CASE.patch, embedded_zip.pdf

For the test file I'll attach shortly, PDFBox 1.8.4 extracts 17660 bytes from 
an embedded zip (well, docx) file.  PDFBox 1.8.5 extracts 17662 bytes -- "\r\n" 
at the end of the stream.  This leads to a ZipException for ZipFile(s) in Java 
1.6, but not Java 1.7. 
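
For anyone stuck on Java 1.6 in the meantime, one possible workaround (my own suggestion, not something verified against the attached test file) is to read the extracted bytes sequentially with ZipInputStream instead of ZipFile, since sequential reading stops at the central directory and should never look at the trailing bytes. The sketch below builds its own in-memory zip with an appended "\r\n"; TrailingBytesZip is a made-up name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Sketch only: TrailingBytesZip is a hypothetical helper for demonstration.
final class TrailingBytesZip {
    /** Builds a one-entry zip in memory and appends "\r\n" after the archive. */
    static byte[] zipWithTrailingCrLf(String name, byte[] content) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ZipOutputStream zip = new ZipOutputStream(out);
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content);
            zip.closeEntry();
            zip.close();               // finishes the archive; the byte buffer stays usable
            out.write('\r');
            out.write('\n');
            return out.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the first entry sequentially, without seeking to the end of the data. */
    static byte[] firstEntry(byte[] zipBytes) {
        try {
            ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes));
            try {
                if (in.getNextEntry() == null) {
                    throw new IllegalStateException("no entries");
                }
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    content.write(buf, 0, n);
                }
                return content.toByteArray();
            } finally {
                in.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

This does not fix the extra bytes in the extracted stream; it only sidesteps the ZipFile check on the end of the file.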





[jira] [Commented] (PDFBOX-2078) DPI always 96

2014-05-16 Thread proba (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999684#comment-13999684
 ] 

proba commented on PDFBOX-2078:
---

I did, and thank you for the fast answers.

Might I just suggest changing (or slightly altering?) the description of the 
resolution parameter in the convertToImage Javadoc?

Parameters:
resolution - the resolution in dpi (dots per inch)

It's possible I'm in the wrong and misreading the description here, but as 
pointed out in the original post, the DPI doesn't actually change when changing 
the resolution.

> DPI always 96
> -
>
> Key: PDFBOX-2078
> URL: https://issues.apache.org/jira/browse/PDFBOX-2078
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.5
>Reporter: proba
>Assignee: Tilman Hausherr
>
> I'm trying to convert a 1 page pdf report to an image using convertToImage.
> My used command goes as follows:
>  BufferedImage bi=page.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
> No matter how much i change the resolution (300 in the example), the DPI 
> stays the same, even though the quality and the dimensions of the picture 
> change.
> Adding a comparison between a 96 resolution picture and what should be a 300 
> resolution picture (notice the DPI)
> http://i58.tinypic.com/9sv339.png





[jira] [Commented] (PDFBOX-1463) Unreadable fonts on UNIX

2014-05-16 Thread Francesca Nina Herpertz (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998720#comment-13998720
 ] 

Francesca Nina Herpertz commented on PDFBOX-1463:
-

I think it was resolved together with PDFBOX-1426. I cannot reproduce it with 
PDFBox 2.0.0. 

After redeploying the application, PDFs with the fonts described in my 
previous comment could also be rendered. It seems that it was a WebLogic caching 
issue and an old version of the application was still active. I will not open 
an additional ticket as it seems to be resolved with version 2.0.0.





> Unreadable fonts on UNIX
> 
>
> Key: PDFBOX-1463
> URL: https://issues.apache.org/jira/browse/PDFBOX-1463
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
> Environment: UNIX
>Reporter: Sindhu N Kashyap
>Assignee: Andreas Lehmkühler
> Attachments: screenshot-1.jpg
>
>
> I'm converting PDFs to tif. The conversion is fine when run in Windows. When 
> i run the same code in UNIX ,its converting with a font that is unreadable. I 
> put some font ttf files in the classes path but that has not made any 
> difference. Please help.





[jira] [Commented] (PDFBOX-2073) PDF files with unusual Japanese font can not be rewrite correctly

2014-05-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999069#comment-13999069
 ] 

Tilman Hausherr commented on PDFBOX-2073:
-

This will probably take several months, we usually have about 4 releases per 
year. 
https://archive.apache.org/dist/pdfbox/

You can get an intermediate version here:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/1.8.6-SNAPSHOT/


> PDF files with unusual Japanese font can not be rewrite correctly
> -
>
> Key: PDFBOX-2073
> URL: https://issues.apache.org/jira/browse/PDFBOX-2073
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.5, 1.8.6, 2.0.0
> Environment: Windows 7 32bit
>Reporter: May Yu
>Assignee: Tilman Hausherr
>Priority: Critical
>  Labels: encoding
> Fix For: 1.8.6, 2.0.0
>
> Attachments: font_screenshot1.png, landscape.pdf, pdf_property.png
>
>
> When rotating the attached PDF file, the Japanese characters are not displayed 
> in the output PDF file. 
> This problem can also occur when merging PDF files.
> We suspect that this is caused by the name of the font type.
> Environment
> -
> OS: Windows 7 (32bit)
> jvm   : 1.6
> pdfbox: 1.8.5
> -
> Code to reproduce the problem
> -
> public static void main(String[] args) {
> String filePath = "D:\\test\\landscape.pdf";
> String newPDFFile = "D:\\test\\new_landscape.pdf";
> try {
> PDDocument rotatedDocument = PDDocument.load(filePath);
> PDDocument document = new PDDocument();
> int pageNumber = document.getNumberOfPages();
> for (int i=0; i<pageNumber; i++) {
> PDPage page = 
> (PDPage)document.getDocumentCatalog().getAllPages().get(i);
> page.setRotation(-90);
> rotatedDocument.addPage(page);
> }
> rotatedDocument.save(newPDFFile);
> document.close();
> rotatedDocument.close();
> } catch (Exception e) {
> e.printStackTrace();
> }
> }
> -





[jira] [Closed] (PDFBOX-958) convertToImage mangles images which were in the PDF

2014-05-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PDFBOX-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler closed PDFBOX-958.
-

Resolution: Fixed

Reopened to replace a missing attachment

> convertToImage mangles images which were in the PDF
> ---
>
> Key: PDFBOX-958
> URL: https://issues.apache.org/jira/browse/PDFBOX-958
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.2.1, 1.4.0, 1.5.0
> Environment: RHEL5 and WinXP, java version "1.6.0_23"
>Reporter: Eric Schwarzenbach
>Assignee: Andreas Lehmkühler
>Priority: Critical
> Fix For: 1.6.0
>
> Attachments: Image of Page 13.jpeg, Image of Page 13.png, 
> PDFBOX958-WrycanLoremIpsumTest.pdf
>
>
> Of the PDFs we've tried running through PDFBox and generating page images, a 
> number of them (coming from disparate sources and method of creation) seem to 
> produce images where an image that was embedded in the page of the PDF shows 
> somewhat mangled. It seems to be divided by horizontal stripes, where some 
> stripes look normal, others seem to have some kind of "smearing" effect going 
> on. See attached images and original PDF (image is of page 13).
> I marked this as critical as we are trying to use PDFBox in a project where 
> page images are crucial, and inability to produce reasonable looking page 
> images is pretty much a deal breaker. 
> The code we use to extract the images looks more or less like the following:
>   BufferedImage image = 
> page.convertToImage();
>   
>   SmartDeferredFileOutputStream outStream 
> = new SmartDeferredFileOutputStream();
>   String[] writerFormatNames = 
> ImageIO.getWriterFormatNames();
>   ImageIO.write(image, "jpeg", outStream);
>   outStream.close()
> We've also tried specifying "png". In both "jpg" and "png" cases we get an 
> image file that is indeed the correct format, and both images look exactly 
> the same. 





[jira] [Updated] (PDFBOX-2080) Barcode getting color inverted in pdf to image conversion

2014-05-16 Thread proba (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

proba updated PDFBOX-2080:
--

Attachment: slika2_3.jpg

> Barcode getting color inverted in pdf to image conversion
> -
>
> Key: PDFBOX-2080
> URL: https://issues.apache.org/jira/browse/PDFBOX-2080
> Project: PDFBox
>  Issue Type: Bug
>Reporter: proba
> Attachments: FPR0T9.pdf, slika2_3.jpg
>
>
> While converting a 1 page pdf to an image (both attached below), the image 
> converts properly, however the barcodes colours invert.
> The code used to do the conversion looks like this right now:
>   public static void convertPDFToJPG(String src){
> try{
>   //load pdf file in the document object
>   PDDocument doc=PDDocument.load(new FileInputStream(src));
>   //Get all pages from document and store them in a list
>   List pages=doc.getDocumentCatalog().getAllPages();
>   //create iterator object so it is easy to access each page 
> from the list
>   Iterator i= pages.iterator();
>   int count=1; //count variable used to separate each image 
> file
>   //Convert every page of the pdf document to a unique image 
> file
>   System.out.println("Please wait...");
>   while(i.hasNext()){
> PDPage page=(PDPage)i.next(); 
> BufferedImage bi=page.convertToImage( 
> BufferedImage.TYPE_INT_RGB,  300);
> FileOutputStream fos = new FileOutputStream(new 
> File("d:\\slika2_3.jpg"));
> //ImageIO.write(bi, "jpg", new 
> File("d:\\pdfimageold.jpg"));
> boolean foundWriter = ImageIOUtil.writeImage(bi, 
> "jpg", fos, 300);
> count++;
>   
>   }
>   System.out.println("Conversion complete");
> }catch(IOException ie){ie.printStackTrace();}
>   }





[jira] [Comment Edited] (PDFBOX-1895) Modifying a damaged PDF damages it further

2014-05-16 Thread Pat Hickey (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993402#comment-13993402
 ] 

Pat Hickey edited comment on PDFBOX-1895 at 5/9/14 5:01 AM:


I finally found the missing object. 
It is the encryption object.
I have pasted its content below. 
The /U token is only 16 bytes long... 
doesn't the spec say it should be 32?
{{
270 0 obj
<<
/CF <<
/StdCF <<
/AuthEvent /DocOpen
/CFM /V2
/Length 16
>>
>>
/EncryptMetadata false
/Filter /Standard
/Length 128
/O <1C05A048615171E5D46A21726E33D63AB2FFD258E5D9745CC19FAFD8CBC8B086>
/P -3900
/R 4
/StmF /StdCF
/StrF /StdCF
/U <568E89D6FDE15C453FCD04E69160C5BD>
/V 4
>>
endobj
}}
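
The length claim is easy to check mechanically. The sketch below counts the bytes encoded by a PDF hex string; HexStringLength is a throwaway name, and the only spec rule it relies on is that an odd trailing hex digit is padded with 0.

```java
// Sketch only: HexStringLength is a throwaway helper, not PDFBox API.
final class HexStringLength {
    /** Number of bytes encoded by a PDF hex string such as "<568E...>". */
    static int byteLength(String hexString) {
        String s = hexString.trim();
        if (s.startsWith("<") && s.endsWith(">")) {
            s = s.substring(1, s.length() - 1);
        }
        int digits = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                continue;              // whitespace inside a hex string is ignored
            }
            if (Character.digit(c, 16) < 0) {
                throw new IllegalArgumentException("not a hex digit: " + c);
            }
            digits++;
        }
        return (digits + 1) / 2;       // an odd final digit is padded with 0
    }

    public static void main(String[] args) {
        System.out.println(byteLength("<568E89D6FDE15C453FCD04E69160C5BD>"));
        System.out.println(byteLength(
            "<1C05A048615171E5D46A21726E33D63AB2FFD258E5D9745CC19FAFD8CBC8B086>"));
    }
}
```

Run against the two strings from the object above, it gives 16 bytes for /U and 32 bytes for /O, which matches what I counted by hand.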


was (Author: brzrkr):
I finally found the missing object. 
It is the encryption object.
I have pasted its content below. 
The /U token is only 16 bytes long... 
doesn't the spec say it should be 32?
270 0 obj
<<
/CF <<
/StdCF <<
/AuthEvent /DocOpen
/CFM /V2
/Length 16
>>
>>
/EncryptMetadata false
/Filter /Standard
/Length 128
/O <1C05A048615171E5D46A21726E33D63AB2FFD258E5D9745CC19FAFD8CBC8B086>
/P -3900
/R 4
/StmF /StdCF
/StrF /StdCF
/U <568E89D6FDE15C453FCD04E69160C5BD>
/V 4
>>
endobj


> Modifying a damaged PDF damages it further
> --
>
> Key: PDFBOX-1895
> URL: https://issues.apache.org/jira/browse/PDFBOX-1895
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 1.8.3, 1.8.4
>Reporter: Pat Hickey
>
> When re-writing a document with font descriptions, Adobe Reader is unable to 
> display the fonts in the document.  Reader can display the fonts in the 
> original document. The difference is that in the original document, the font 
> descriptions are in lower object numbers than the font references; in the 
> output document, the font descriptions are in higher object numbers than the 
> font references.  Is there a quick way to re-order them?
> Update: the PDF file in question is actually corrupt, but somehow modifying 
> it with PDFBox causes it to no longer be readable with Adobe Reader.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PDFBOX-958) convertToImage mangles images which were in the PDF

2014-05-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998738#comment-13998738
 ] 

Andreas Lehmkühler commented on PDFBOX-958:
---

Hi Tilman,

I've sent the pdf via pm.

BR
Andreas Lehmkühler




> convertToImage mangles images which were in the PDF
> ---
>
> Key: PDFBOX-958
> URL: https://issues.apache.org/jira/browse/PDFBOX-958
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.2.1, 1.4.0, 1.5.0
> Environment: RHEL5 and WinXP, java version "1.6.0_23"
> Reporter: Eric Schwarzenbach
> Assignee: Andreas Lehmkühler
> Priority: Critical
> Fix For: 1.6.0
>
> Attachments: Image of Page 13.jpeg, Image of Page 13.png, Wrycan® 
> Lorem Ipsum Test.pdf
>
>
> Of the PDFs we've tried running through PDFBox and generating page images, a 
> number of them (coming from disparate sources and methods of creation) seem to 
> produce images where an image that was embedded in the page of the PDF shows 
> somewhat mangled. It seems to be divided by horizontal stripes, where some 
> stripes look normal, others seem to have some kind of "smearing" effect going 
> on. See attached images and original PDF (image is of page 13).
> I marked this as critical as we are trying to use PDFBox in a project where 
> page images are crucial, and inability to produce reasonable looking page 
> images is pretty much a deal breaker. 
> The code we use to extract the images looks more or less like the following:
>   BufferedImage image = page.convertToImage();
>   SmartDeferredFileOutputStream outStream = new SmartDeferredFileOutputStream();
>   String[] writerFormatNames = ImageIO.getWriterFormatNames();
>   ImageIO.write(image, "jpeg", outStream);
>   outStream.close();
> We've also tried specifying "png". In both "jpg" and "png" cases we get an 
> image file that is indeed the correct format, and both images look exactly 
> the same. 





[jira] [Updated] (PDFBOX-2079) Extra new line characters extracted in 1.8.5 for embedded files leading to ZipFile exception in Java 1.6

2014-05-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2079:


Affects Version/s: 2.0.0, 1.8.6

> Extra new line characters extracted in 1.8.5 for embedded files leading to 
> ZipFile exception in Java 1.6
> 
>
> Key: PDFBOX-2079
> URL: https://issues.apache.org/jira/browse/PDFBOX-2079
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 1.8.5, 1.8.6, 2.0.0
> Reporter: Tim Allison
> Assignee: Tilman Hausherr
> Priority: Minor
> Attachments: PDFBOX-2079-TEST_CASE.patch, embedded_zip.pdf
>
>
> For the test file I'll attach shortly, PDFBox 1.8.4 extracts 17660 bytes from 
> an embedded zip (well, docx) file.  PDFBox 1.8.5 extracts 17662 bytes; the two 
> extra bytes are a "\r\n" at the end of the stream.  This leads to a 
> ZipException for ZipFile(s) in Java 1.6, but not in Java 1.7. 
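
To make the two-byte difference concrete, here is a minimal sketch that detects a trailing "\r\n" on an extracted byte array and drops it. This is a diagnostic illustration only, not the PDFBox fix; the class and method names are hypothetical, and the 17662-byte array is a stand-in for the real extracted stream.

```java
import java.util.Arrays;

final class TrailingCrLf {
    // True if the data ends with the two bytes CR LF.
    static boolean endsWithCrLf(byte[] data) {
        int n = data.length;
        return n >= 2 && data[n - 2] == '\r' && data[n - 1] == '\n';
    }

    // Returns a copy without a trailing CR LF, or the input unchanged if there is none.
    static byte[] stripTrailingCrLf(byte[] data) {
        return endsWithCrLf(data) ? Arrays.copyOf(data, data.length - 2) : data;
    }

    public static void main(String[] args) {
        // Stand-in for the bytes extracted by 1.8.5: 17660 payload bytes plus "\r\n".
        byte[] extracted = new byte[17662];
        extracted[17660] = '\r';
        extracted[17661] = '\n';
        System.out.println(stripTrailingCrLf(extracted).length);
    }
}
```

Blindly stripping these bytes is not safe in general, since an embedded file may legitimately end with CR LF; the sketch is only meant for comparing the 1.8.4 and 1.8.5 output.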





[jira] [Reopened] (PDFBOX-958) convertToImage mangles images which were in the PDF

2014-05-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PDFBOX-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler reopened PDFBOX-958:
---


> convertToImage mangles images which were in the PDF
> ---
>
> Key: PDFBOX-958
> URL: https://issues.apache.org/jira/browse/PDFBOX-958
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.2.1, 1.4.0, 1.5.0
> Environment: RHEL5 and WinXP, java version "1.6.0_23"
> Reporter: Eric Schwarzenbach
> Assignee: Andreas Lehmkühler
> Priority: Critical
> Fix For: 1.6.0
>
> Attachments: Image of Page 13.jpeg, Image of Page 13.png, 
> PDFBOX958-WrycanLoremIpsumTest.pdf
>
>
> Of the PDFs we've tried running through PDFBox and generating page images, a 
> number of them (coming from disparate sources and methods of creation) seem to 
> produce images where an image that was embedded in the page of the PDF shows 
> somewhat mangled. It seems to be divided by horizontal stripes, where some 
> stripes look normal, others seem to have some kind of "smearing" effect going 
> on. See attached images and original PDF (image is of page 13).
> I marked this as critical as we are trying to use PDFBox in a project where 
> page images are crucial, and inability to produce reasonable looking page 
> images is pretty much a deal breaker. 
> The code we use to extract the images looks more or less like the following:
>   BufferedImage image = page.convertToImage();
>   SmartDeferredFileOutputStream outStream = new SmartDeferredFileOutputStream();
>   String[] writerFormatNames = ImageIO.getWriterFormatNames();
>   ImageIO.write(image, "jpeg", outStream);
>   outStream.close();
> We've also tried specifying "png". In both "jpg" and "png" cases we get an 
> image file that is indeed the correct format, and both images look exactly 
> the same. 


