[jira] [Commented] (PDFBOX-2053) Issue with PDFBox position reading

2014-05-02 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988602#comment-13988602
 ] 

Tilman Hausherr commented on PDFBOX-2053:
-

This is very similar to PDFBOX-62, although the fix I proposed there doesn't 
work here, for a reason that I don't know yet.

> Issue with PDFBox position reading
> --
>
> Key: PDFBOX-2053
> URL: https://issues.apache.org/jira/browse/PDFBOX-2053
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.3
>Reporter: Orbel Mkrtchyan
> Attachments: test.pdf
>
>
> Using PDFBox 1.8.4,
> bug #1:
>   PDDocument doc = new PDDocument();
>   doc.load("test-pcc7247.pdf");
>   doc.save("out.pdf");
>   doc.close();
> The resulting file is corrupted, contains 0 pages and cannot be viewed by 
> Acrobat Reader.
> bug #2: consider the following code snippet. The code runs like this:
>   Extractor extractor = new Extractor();
>   extractor.writeText(pdDoc, output);
> Using the code defined like this:
> public class Extractor extends PDFTextStripper {
> ...
> protected void writePage() throws IOException
> {
> for( int i = 0; i < charactersByArticle.size(); i++)
> {
> List textList = charactersByArticle.get( i );
> Iterator textIter = textList.iterator();
> while( textIter.hasNext() )
> {
> TextPosition position = (TextPosition)textIter.next();
> In the given piece of code, position variable correctly iterates through the 
> letters of the first line of the provided pdf document, but its coordinates 
> (x, y, widths, etc) are always the same. Just to be clear, 1 position always 
> relates to 1 letter, and its widths array's length always equals 1. So we get 
> the same coordinates for every letter in a line. Expected behaviour is either 
> having new coordinates per letter or having widths[] contain widths for the 
> characters of a whole line of text
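
A possible reading of bug #1, offered as a sketch rather than a diagnosis: in PDFBox 1.8.x, PDDocument.load(String) is a static factory method that returns the loaded document. Calling it through an instance, as the snippet above does, would discard the result and leave doc as the empty document created by new PDDocument(), which would fit the 0-page output. The class Doc below is a hypothetical stand-in that only imitates that API shape, so the example runs without PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

class StaticLoadPitfall {

    /** Hypothetical stand-in for a document class with a static load() factory. */
    static class Doc {
        private final List<String> pages = new ArrayList<>();

        /** Static factory: returns a NEW document and never modifies an existing one. */
        static Doc load(String name) {
            Doc loaded = new Doc();
            loaded.pages.add("page 1 of " + name);
            return loaded;
        }

        int pageCount() {
            return pages.size();
        }
    }

    /** Mirrors the reported snippet: the result of load() is thrown away. */
    static int pagesWhenCalledOnInstance() {
        Doc doc = new Doc();
        doc.load("test.pdf");   // compiles (static call via instance), result discarded
        return doc.pageCount(); // still the empty document
    }

    /** Uses the return value of the static factory. */
    static int pagesWhenResultIsUsed() {
        Doc doc = Doc.load("test.pdf");
        return doc.pageCount();
    }

    public static void main(String[] args) {
        System.out.println("called on instance: " + pagesWhenCalledOnInstance());
        System.out.println("result used:        " + pagesWhenResultIsUsed());
    }
}
```

With the stand-in, the first call reports 0 pages and the second reports 1. The equivalent change in the reported snippet would be to assign the result of the static call, i.e. PDDocument doc = PDDocument.load("test-pcc7247.pdf");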



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PDFBOX-2054) Remove System.out.println()

2014-05-02 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988598#comment-13988598
 ] 

Tilman Hausherr commented on PDFBOX-2054:
-

Committed a first round in rev 1592153 for the trunk and rev 1592155 for the 
1.8 branch.

> Remove System.out.println()
> ---
>
> Key: PDFBOX-2054
> URL: https://issues.apache.org/jira/browse/PDFBOX-2054
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4, 1.8.5, 1.8.6, 2.0.0
>Reporter: Hong-Thai Nguyen
>Assignee: Tilman Hausherr
>Priority: Minor
>
> For example at GlyfSimpleDescript.java
> {code}
> ...
> catch (ArrayIndexOutOfBoundsException e)
> {
> System.out.println("error: array index out of bounds");
> }
> {code}
> and also 'printStackTrace' like in PageDrawer.java:
> {code}
> ...
> catch( IOException io )
> {
> io.printStackTrace();
> }
> {code}
> Should forward exception or keep silence.
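
The replacement discussed in this issue (a logger call instead of System.out.println) can be sketched as follows. This is only an illustration under stated assumptions: PDFBox itself logs through Apache Commons Logging, while the sketch uses java.util.logging so that it runs with the JDK alone. The class GlyphLoggingSketch, the method flagAt and the handler CollectingHandler are hypothetical names, not PDFBox code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class GlyphLoggingSketch {

    private static final Logger LOG = Logger.getLogger(GlyphLoggingSketch.class.getName());

    /** Hypothetical helper: reads one entry and logs instead of printing on failure. */
    static int flagAt(int[] flags, int index) {
        try {
            return flags[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            // The record carries level, message and exception; handlers decide where it goes.
            LOG.log(Level.WARNING, "array index out of bounds: " + index, e);
            return -1;
        }
    }

    /** Collects log records in memory so callers can inspect what was logged. */
    static class CollectingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();

        @Override public void publish(LogRecord record) { records.add(record); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    public static void main(String[] args) {
        CollectingHandler handler = new CollectingHandler();
        LOG.addHandler(handler);
        LOG.setUseParentHandlers(false); // keep the console quiet for this demo

        int value = flagAt(new int[] {1, 2, 3}, 5);
        System.out.println("value=" + value + ", logged records=" + handler.records.size());
    }
}
```

Because the message goes through a logger, the application can silence it, route it to a file, or add timestamps through configuration, none of which is possible with a bare println.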





Re: 1.8.5 and JIRA

2014-05-02 Thread Tilman Hausherr

Hello Andreas,

Thanks for all your work; only one thing is missing, 1.8.5 is still 
listed as "unreleased version" in JIRA, e.g. here:

https://issues.apache.org/jira/browse/PDFBOX/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel

Tilman

On 02.05.2014 09:27, Andreas Lehmkühler wrote:

> Hi,
>
> due to the newest PDFBox 1.8.5 release I've closed all 1.8.5 related issues
> in a bulk operation. I've disabled the email notification to avoid an email
> flood.
> I've also added the all-new version 1.8.6 for our next bugfix release ...
>
> I'll update the download page once the mirrors have copied the version from
> our repository.
>
> BR
> Andreas Lehmkühler




[jira] [Resolved] (PDFBOX-2056) incomplete build tests

2014-05-02 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2056.
-

Resolution: Fixed

Added PDFCloneUtilityTest, PDLabTest, PDPixelMapTest, TestPDFText2HTML, 
PDColorStateTest in rev 1592149.

> incomplete build tests
> --
>
> Key: PDFBOX-2056
> URL: https://issues.apache.org/jira/browse/PDFBOX-2056
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4, 1.8.5, 1.8.6
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
> Fix For: 1.8.6
>
>
> In the 1.8 branch, tests need to be explicitly mentioned in the pom or in 
> TestAll.java. At least 5 tests are missing, among them 3 that I wrote. I am 
> adding these.





[jira] [Created] (PDFBOX-2056) incomplete build tests

2014-05-02 Thread Tilman Hausherr (JIRA)
Tilman Hausherr created PDFBOX-2056:
---

 Summary: incomplete build tests
 Key: PDFBOX-2056
 URL: https://issues.apache.org/jira/browse/PDFBOX-2056
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 1.8.4, 1.8.5, 1.8.6
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
 Fix For: 1.8.6


In the 1.8 branch, tests need to be explicitly mentioned in the pom or in 
TestAll.java. At least 5 tests are missing, among them 3 that I wrote. I am 
adding these.





[jira] [Resolved] (PDFBOX-2052) PDFCloneUtility does not handle COSStreamArray

2014-05-02 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2052.
-

   Resolution: Fixed
Fix Version/s: 1.8.6

> PDFCloneUtility does not handle COSStreamArray
> --
>
> Key: PDFBOX-2052
> URL: https://issues.apache.org/jira/browse/PDFBOX-2052
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 1.8.4, 1.8.5, 1.8.6, 2.0.0
>Reporter: Cornelis Hoeflake
>Assignee: Tilman Hausherr
> Fix For: 1.8.6, 2.0.0
>
> Attachments: clone-patch.diff, clone-test-patch.diff
>
>
> A document that contains COSStreamArray objects cannot be cloned. There is no 
> handling for COSStreamArray.





[jira] [Updated] (PDFBOX-2052) PDFCloneUtility does not handle COSStreamArray

2014-05-02 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2052:


Affects Version/s: 1.8.6
   1.8.5
   1.8.4



[jira] [Comment Edited] (PDFBOX-2052) PDFCloneUtility does not handle COSStreamArray

2014-05-02 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987663#comment-13987663
 ] 

Tilman Hausherr edited comment on PDFBOX-2052 at 5/2/14 11:44 PM:
--

Thank you, after seeing the test I now understand what it is all about. I took 
your code and added some of my own, because I wanted to check that a real PDF 
file was created and cloned. This was done in rev 1591899 for the trunk and in 
rev 1592122 for the 1.8 branch. Thanks for the contribution!


was (Author: tilman):
Thank you, after seeing the test I now understand what it is all about. I took 
your code and added some of my own, because I wanted to check that a real PDF 
file was created and cloned. This was done in rev 1591899 for the trunk. I will 
take care of the 1.8 branch later (if possible).



[jira] [Commented] (PDFBOX-2054) Remove System.out.println()

2014-05-02 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988311#comment-13988311
 ] 

Tilman Hausherr commented on PDFBOX-2054:
-

You can use 
{noformat}
  {code}
   ...
  {code}
{noformat}
for such text, but I got your point. I will replace these (and maybe others) 
with calls to the logging system.



[jira] [Assigned] (PDFBOX-2054) Remove System.out.println()

2014-05-02 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reassigned PDFBOX-2054:
---

Assignee: Tilman Hausherr



[jira] [Comment Edited] (PDFBOX-2054) Remove System.out.println()

2014-05-02 Thread Hong-Thai Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987830#comment-13987830
 ] 

Hong-Thai Nguyen edited comment on PDFBOX-2054 at 5/2/14 9:04 PM:
--

Yes, we use PDFBox via Tika in many connectors. We redirect stdout and stderr 
to separate log files, and log4j output to another file, with something like 
this:
java -Dlog4j.configuration=file:log4j.properties -cp CP_PATH runClass  
^>%LOG_FOLDER%\out.txt 2^>%LOG_FOLDER%\err.txt

Normally, we expect all logging to go through log4j (with configuration of 
logger, level ...); stdout (out.txt) and stderr (err.txt) are generally 
reserved for console output.

Besides, output from printStackTrace() and out.println() carries no date/time 
information and is more or less uncontrollable.



was (Author: thaichat04):
Yes, we use PDFBox via Tika in many connectors. We redirect stdout and stderr 
to separate log files, and log4j output to another file, with something like 
this:
java -Dlog4j.configuration=file:log4j.properties -cp CP_PATH runClass  
^>%LOG_FOLDER%\out.txt 2^>%LOG_FOLDER%\err.txt

Normally, we expect all logging to go through log4j (with configuration of 
logger, level ...); out.txt and err.txt are reserved for something else.

Besides, output from printStackTrace() and out.println() carries no date/time 
information and is less controllable.




Re: 1.8.5 and Website

2014-05-02 Thread Andreas Lehmkuehler

Hi,


On 02.05.2014 11:08, Maruan Sahyoun wrote:

> Hi,
>
> I’ve updated the PDFBox API docs to reflect 1.8.5 on the website.

Thanks, Maruan!

So, I guess everything's done for that release, let's continue with the next 
one ;-)

BR
Andreas Lehmkühler

> BR
> Maruan
>
> On 02.05.2014 at 09:27, Andreas Lehmkühler wrote:
>
>> Hi,
>>
>> due to the newest PDFBox 1.8.5 release I've closed all 1.8.5 related issues
>> in a bulk operation. I've disabled the email notification to avoid an email
>> flood.
>> I've also added the all-new version 1.8.6 for our next bugfix release ...
>>
>> I'll update the download page once the mirrors have copied the version from
>> our repository.
>>
>> BR
>> Andreas Lehmkühler






[jira] [Commented] (PDFBOX-2049) load PDF file throws WrappedIOException in v1.8.4 but not in v0.7.3

2014-05-02 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987871#comment-13987871
 ] 

Tilman Hausherr commented on PDFBOX-2049:
-

1.8.5 has been released:
https://pdfbox.apache.org/downloads.html

> load PDF file throws WrappedIOException in v1.8.4 but not in v0.7.3
> ---
>
> Key: PDFBOX-2049
> URL: https://issues.apache.org/jira/browse/PDFBOX-2049
> Project: PDFBox
>  Issue Type: Bug
>  Components: .NET
>Affects Versions: 1.8.4
> Environment: Visual Studio 2005
>Reporter: Venkatesan
>
> We are using the .NET version of PDFBox V1.8.4. It throws a 
> WrappedIOException for one PDF file at the line below:
> PDDocument doc = PDDocument.load("path of the PDF");
> The same PDF file is read successfully with PDFBox V0.7.3.





[ANNOUNCE] Apache PDFBox 1.8.5 released

2014-05-02 Thread Andreas Lehmkuehler

The Apache PDFBox community is pleased to announce the release of
Apache PDFBox version 1.8.5. The release is available for download at:

   http://pdfbox.apache.org/downloads.html

See the full release notes below for details about this release.


Release Notes -- Apache PDFBox -- Version 1.8.5

Introduction


The Apache PDFBox library is an open source Java tool for working with PDF
documents.

This is an incremental bugfix release based on the earlier 1.8.4 release. It
contains a couple of fixes and small improvements.

For more details on all fixes included in this release, please refer to the
following issues on the PDFBox issue tracker at
https://issues.apache.org/jira/browse/PDFBOX.

Bug

[PDFBOX-198] - Tiff image problems
[PDFBOX-205] - Miscellaneous errors on valid files
[PDFBOX-778] - OutOfMemory when extracting text from pdf
[PDFBOX-1069] - Ubuntu throws exceptions when fonts missing
[PDFBOX-1074] - TIFFFaxDecoder5 when using PDFImageWriter
[PDFBOX-1147] - Printing a PDF with an image inside show black.
[PDFBOX-1164] - Inline image parsing error causes RuntimeException + FIX
[PDFBOX-1664] - NullPointerException in PDType1Font.java
[PDFBOX-1708] - IndexOutOfBoundsException on convertToImage with an embedded
Fax-Image
[PDFBOX-1811] - java.io.IOException: Object at offset does not end with 'endobj'
[PDFBOX-1860] - HTML converter escapes formatting close tags
[PDFBOX-1870] - PDFunctionType0 incorrect
[PDFBOX-1876] - Incorrect color for DeviceN type 4 shading object
[PDFBOX-1877] - Radial Shading (type 3) fails Ghent Workgroup tests
[PDFBOX-1880] - [PATCH] Type 1 Shading must not ignore current transformation 
matrix
[PDFBOX-1882] - Negative array size exception when reading a string from a OTF 
font
[PDFBOX-1884] - Avoid NPE when encountering null PDComplexFileSpecification
[PDFBOX-1887] - Bugfixes + Optimization of Gouraud Shading
[PDFBOX-1888] - JBIG2Filter is creating an ImageInputStream (with temp file) and
not closing it
[PDFBOX-1896] - Support MMType1 (Multiple Master) Fonts
[PDFBOX-1901] - null check confusing
[PDFBOX-1917] - Rendering hangs
[PDFBOX-1924] - Gouraud shading: detect empty triangles
[PDFBOX-1966] - Type 1, 4 and 5 shadings for shFill()
[PDFBOX-1970] - 1.8 shadings are sometimes flipped
[PDFBOX-1977] - LZWFilter fails
[PDFBOX-1984] - PDFont documentation correction needed for getFontWidth and
getFontHeight
[PDFBOX-1988] - PDFBox ExtractText issue of PDF with no embedded fonts
[PDFBOX-1998] - PDF rendering with reversed colors
[PDFBOX-1999] - JBIG2Filter - FlateDecoded Globals Table
[PDFBOX-2004] - PDF2Image hangs/loops forever processing PDF
[PDFBOX-2008] - Off-by-one error in BaseParser.readGenerationNumber()
[PDFBOX-2016] - Stream parsing still incorrect if length value is wrong
[PDFBOX-2018] - Dashed line with incorrect line cap
[PDFBOX-2024] - /Rotate 180 PDF is not displayed correctly in PDFReader app
[PDFBOX-2026] - cannot load jpg into new pdf
[PDFBOX-2030] - Using new PDPixelMap() results in black image in PDF
[PDFBOX-2031] - GrayScale images become inverted
[PDFBOX-2032] - [PATCH] TTF Type12 IOException: Invalid Characters codes
[PDFBOX-2035] - Ignore badly formatted toUnicode CMaps
[PDFBOX-2036] - Add test with LZW fail sequence
[PDFBOX-2042] - ColorSpace with empty Range array

Improvement

[PDFBOX-52] - DCTFilter is not implemented yet
[PDFBOX-615] - shfill operator needs implementation
[PDFBOX-1734] - ImageIoUtil.WriteImage doesn't work with tiff images
[PDFBOX-1869] - Implementation for ShadingType 1
[PDFBOX-1897] - There are some errors within the source code documentation
(javadocs)
[PDFBOX-1902] - generics added to maputil
[PDFBOX-1909] - Close open streams
[PDFBOX-1914] - Shading package: Move "function" methods to base class and more
refactoring
[PDFBOX-1946] - Running within an Applet has many AccessControlException 's
[PDFBOX-1964] - PDFMergerUtility support merging using non sequential parser
[PDFBOX-1975] - Improve TestImageIOUtils unit tests to check image resolution
and compression
[PDFBOX-2010] - make "protected PDFont getDescendantFont()" public as it is in 
2.0.0
[PDFBOX-2039] - Class PDDocument should implement java.io.Closeable


Release Contents


This release consists of a single source archive packaged as a zip file.
The archive can be unpacked with the jar tool from your JDK installation.
See the README.txt file for instructions on how to build this release.

The source archive is accompanied by SHA1 and MD5 checksums and a PGP
signature that you can use to verify the authenticity of your download.
The public key used for the PGP signature can be found at
https://svn.apache.org/repos/asf/pdfbox/KEYS.

About Apache PDFBox
---

Apache PDFBox is an open source Java library for working with PDF documents.
This project allows creation of new PDF documents, manipulation of existing
documents and the ability to extract content from documents. Apach

[jira] [Closed] (PDFBOX-2055) IOException when converting PDF to image

2014-05-02 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-2055.
---

Resolution: Won't Fix

> IOException when converting PDF to image
> 
>
> Key: PDFBOX-2055
> URL: https://issues.apache.org/jira/browse/PDFBOX-2055
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4, 1.8.5, 1.8.6, 2.0.0
>Reporter: Hong-Thai Nguyen
>Priority: Critical
> Attachments: eu_competition_newsletter_04_june_-_10_june_2010.pdf
>
>
> With the attached PDF file, we get an IOException when using 
> PDPage.convertToImage():
> {code}
> java.io.IOException
>   at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:138)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:336)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:248)
>   at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:183)
>   at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:107)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>   at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>   at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:127)
>   at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
>   at com.polyspot.connector.imageservice.generators.PDFBoxImageGenerator.computeImage(PDFBoxImageGenerator.java:75)
> {code}





[jira] [Commented] (PDFBOX-2054) Remove System.out.println()

2014-05-02 Thread Hong-Thai Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987830#comment-13987830
 ] 

Hong-Thai Nguyen commented on PDFBOX-2054:
--

Yes, we use PDFBox via Tika in many connectors. We redirect stdout and stderr 
to separate log files, and log4j output to another file, with something like 
this:
java -Dlog4j.configuration=file:log4j.properties -cp CP_PATH runClass  
^>%LOG_FOLDER%\out.txt 2^>%LOG_FOLDER%\err.txt

Normally, we expect all logging to go through log4j (with configuration of 
logger, level ...); out.txt and err.txt are reserved for something else.

Besides, output from printStackTrace() and out.println() carries no date/time 
information and is less controllable.




[jira] [Commented] (PDFBOX-2054) Remove System.out.println()

2014-05-02 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987819#comment-13987819
 ] 

Tilman Hausherr commented on PDFBOX-2054:
-

Do you have a file or an application that produces this exception? I will 
remove the println and replace it with logging (which can be kept silent), but 
at this time I would not yet change the handling by rethrowing the exception.



[jira] [Commented] (PDFBOX-2055) IOException when converting PDF to image

2014-05-02 Thread Hong-Thai Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987818#comment-13987818
 ] 

Hong-Thai Nguyen commented on PDFBOX-2055:
--

Great. I have heard a lot about noSeq and its many improvements. This is the 
first time we have had a PDF that cannot be handled by the traditional load 
method but succeeds with noSeq :). The issue can be closed.

Thanks



[jira] [Updated] (PDFBOX-2054) Remove System.out.println()

2014-05-02 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2054:


Affects Version/s: 2.0.0
   1.8.6
   1.8.5



[jira] [Commented] (PDFBOX-2055) IOException when converting PDF to image

2014-05-02 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987807#comment-13987807
 ] 

Tilman Hausherr commented on PDFBOX-2055:
-

Use loadNonSeq(localFile, null) instead of load(localFile).



[jira] [Updated] (PDFBOX-2055) IOException when converting PDF to image

2014-05-02 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2055:


Affects Version/s: 2.0.0
   1.8.6
   1.8.5



[jira] [Commented] (PDFBOX-2055) IOException when converting PDF to image

2014-05-02 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987772#comment-13987772
 ] 

Tilman Hausherr commented on PDFBOX-2055:
-

I can confirm that it is happening on 1.8.6 :-( No idea yet why it didn't 
happen with the pdfbox command line app.

> IOException when converting PDF to image
> 
>
> Key: PDFBOX-2055
> URL: https://issues.apache.org/jira/browse/PDFBOX-2055
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4
>Reporter: Hong-Thai Nguyen
>Priority: Critical
> Attachments: eu_competition_newsletter_04_june_-_10_june_2010.pdf
>
>
> With the attached PDF file, we get an IOException when using PDPage.convertToImage():
> {code}
> java.io.IOException
>   at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:138)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:336)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:248)
>   at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:183)
>   at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:107)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>   at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>   at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:127)
>   at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
>   at com.polyspot.connector.imageservice.generators.PDFBoxImageGenerator.computeImage(PDFBoxImageGenerator.java:75)
> {code}





[jira] [Commented] (PDFBOX-2055) IOException when converting PDF to image

2014-05-02 Thread Hong-Thai Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987751#comment-13987751
 ] 

Hong-Thai Nguyen commented on PDFBOX-2055:
--

Thanks [~tilman]; as you guessed, I did indeed mean 'can't' ;)
For the test, I'm generating an image of the first page via the API:
{code}
  @VisibleForTesting
  ImageResult generateImage(File localFile, int width, int height) throws Exception {
    BufferedImage image;
    PDDocument document = PDDocument.load(localFile);
    try {
      image = computeImage(document);
    } finally {
      document.close();
      document = null;
    }

    byte[] bytes = ImageResizer.resize(image, width, height);
    if (bytes != null && image != null) {
      return new ImageResult(bytes, "image/png", image.getWidth(), image.getHeight());
    } else {
      return null;
    }
  }

  private BufferedImage computeImage(PDDocument document) throws IOException {
    PDPage page = (PDPage) document.getDocumentCatalog().getAllPages().get(0);
    try {
      BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, resolution);
      return image;
    } finally {
      page = null;
    }
  }
{code}

I confirm that this exception really does occur on 1.8.4.

> IOException when converting PDF to image
> 
>
> Key: PDFBOX-2055
> URL: https://issues.apache.org/jira/browse/PDFBOX-2055
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4
>Reporter: Hong-Thai Nguyen
>Priority: Critical
> Attachments: eu_competition_newsletter_04_june_-_10_june_2010.pdf
>
>
> With the attached PDF file, we get an IOException when using PDPage.convertToImage():
> {code}
> java.io.IOException
>   at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:138)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:336)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:248)
>   at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:183)
>   at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:107)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>   at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>   at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:127)
>   at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
>   at com.polyspot.connector.imageservice.generators.PDFBoxImageGenerator.computeImage(PDFBoxImageGenerator.java:75)
> {code}





[jira] [Commented] (PDFBOX-2055) IOException when converting PDF to image

2014-05-02 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987741#comment-13987741
 ] 

Tilman Hausherr commented on PDFBOX-2055:
-

I assume you meant to write "can't". The 1.8.5 version will be released very 
soon. "Very soon" as in "within a few hours or days". The cut has already been 
done. I mentioned both versions just in case you want to test them.

Btw I just tested with 1.8.4 and it works fine?! (I used the PDFReader command 
line application)

> IOException when converting PDF to image
> 
>
> Key: PDFBOX-2055
> URL: https://issues.apache.org/jira/browse/PDFBOX-2055
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4
>Reporter: Hong-Thai Nguyen
>Priority: Critical
> Attachments: eu_competition_newsletter_04_june_-_10_june_2010.pdf
>
>
> With the attached PDF file, we get an IOException when using PDPage.convertToImage():
> {code}
> java.io.IOException
>   at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:138)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:336)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:248)
>   at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:183)
>   at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:107)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>   at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>   at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:127)
>   at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
>   at com.polyspot.connector.imageservice.generators.PDFBoxImageGenerator.computeImage(PDFBoxImageGenerator.java:75)
> {code}





[jira] [Commented] (PDFBOX-2055) IOException when converting PDF to image

2014-05-02 Thread Hong-Thai Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987671#comment-13987671
 ] 

Hong-Thai Nguyen commented on PDFBOX-2055:
--

Thanks for your feedback. For some reasons, we can use snapshot versions.
I haven't followed the recent feeds, but why are there two concurrent versions, 
1.8.5 and 1.8.6? Do you have any idea when these versions will be released?

> IOException when converting PDF to image
> 
>
> Key: PDFBOX-2055
> URL: https://issues.apache.org/jira/browse/PDFBOX-2055
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4
>Reporter: Hong-Thai Nguyen
>Priority: Critical
> Attachments: eu_competition_newsletter_04_june_-_10_june_2010.pdf
>
>
> With the attached PDF file, we get an IOException when using PDPage.convertToImage():
> {code}
> java.io.IOException
>   at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:138)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:336)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:248)
>   at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:183)
>   at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:107)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>   at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>   at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:127)
>   at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
>   at com.polyspot.connector.imageservice.generators.PDFBoxImageGenerator.computeImage(PDFBoxImageGenerator.java:75)
> {code}





[jira] [Comment Edited] (PDFBOX-2052) PDFCloneUtility does not handle COSStreamArray

2014-05-02 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987663#comment-13987663
 ] 

Tilman Hausherr edited comment on PDFBOX-2052 at 5/2/14 1:17 PM:
-

Thank you; after seeing the test, I now understand what it is all about. I took 
your code and added some of my own, which I wrote because I wanted to check that 
a real PDF file was created and cloned. This was done in rev 1591899 for the 
trunk. I will take care of the 1.8 branch later (if possible).


was (Author: tilman):
Thank you, after seeing the test now I understand what it is all about. I took 
your code and added some of my own I wrote because I wanted to check that a 
real PDF file was created and cloned. This was done in rev 1591899 for the 
trunk.

> PDFCloneUtility does not handle COSStreamArray
> --
>
> Key: PDFBOX-2052
> URL: https://issues.apache.org/jira/browse/PDFBOX-2052
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 2.0.0
>Reporter: Cornelis Hoeflake
>Assignee: Tilman Hausherr
> Fix For: 2.0.0
>
> Attachments: clone-patch.diff, clone-test-patch.diff
>
>
> A document which has COSStreamArray's, cannot be cloned. There is no handling 
> for COSStreamArray.





[jira] [Commented] (PDFBOX-2052) PDFCloneUtility does not handle COSStreamArray

2014-05-02 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987663#comment-13987663
 ] 

Tilman Hausherr commented on PDFBOX-2052:
-

Thank you, after seeing the test now I understand what it is all about. I took 
your code and added some of my own I wrote because I wanted to check that a 
real PDF file was created and cloned. This was done in rev 1591899 for the 
trunk.

> PDFCloneUtility does not handle COSStreamArray
> --
>
> Key: PDFBOX-2052
> URL: https://issues.apache.org/jira/browse/PDFBOX-2052
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 2.0.0
>Reporter: Cornelis Hoeflake
>Assignee: Tilman Hausherr
> Fix For: 2.0.0
>
> Attachments: clone-patch.diff, clone-test-patch.diff
>
>
> A document which has COSStreamArray's, cannot be cloned. There is no handling 
> for COSStreamArray.





[jira] [Commented] (PDFBOX-2055) IOException when converting PDF to image

2014-05-02 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987661#comment-13987661
 ] 

Tilman Hausherr commented on PDFBOX-2055:
-

I can't reproduce this with 1.8.5, so maybe it has been fixed in that version. 
However you won't like the output. The output is fine with the unreleased 2.0 
version, get it here (where you can also find the 1.8.5 and 1.8.6 versions):
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/



> IOException when converting PDF to image
> 
>
> Key: PDFBOX-2055
> URL: https://issues.apache.org/jira/browse/PDFBOX-2055
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4
>Reporter: Hong-Thai Nguyen
>Priority: Critical
> Attachments: eu_competition_newsletter_04_june_-_10_june_2010.pdf
>
>
> With the attached PDF file, we get an IOException when using PDPage.convertToImage():
> {code}
> java.io.IOException
>   at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:138)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:336)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:248)
>   at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:183)
>   at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:107)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>   at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>   at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:127)
>   at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
>   at com.polyspot.connector.imageservice.generators.PDFBoxImageGenerator.computeImage(PDFBoxImageGenerator.java:75)
> {code}





[jira] [Updated] (PDFBOX-2054) Remove System.out.println()

2014-05-02 Thread Hong-Thai Nguyen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong-Thai Nguyen updated PDFBOX-2054:
-

Description: 
For example, in GlyfSimpleDescript.java:
{code}
...
catch (ArrayIndexOutOfBoundsException e)
{
    System.out.println("error: array index out of bounds");
}
{code}

and also 'printStackTrace', as in PageDrawer.java:
{code}
...
catch( IOException io )
{
    io.printStackTrace();
}
{code}

The code should either forward the exception or stay silent.

  was:
For example at GlyfSimpleDescript.java
{code}
...
catch (ArrayIndexOutOfBoundsException e)
{
System.out.println("error: array index out of bounds");
}
{code}

Should forward exception or keep silence.


> Remove System.out.println()
> ---
>
> Key: PDFBOX-2054
> URL: https://issues.apache.org/jira/browse/PDFBOX-2054
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4
>Reporter: Hong-Thai Nguyen
>Priority: Minor
>
> For example at GlyfSimpleDescript.java
> {code}
> ...
> catch (ArrayIndexOutOfBoundsException e)
> {
> System.out.println("error: array index out of bounds");
> }
> {code}
> and also 'printStackTrace' like in PageDrawer.java:
> {code}
> ...
> catch( IOException io )
> {
> io.printStackTrace();
> }
> {code}
> Should forward exception or keep silence.





[jira] [Updated] (PDFBOX-2055) IOException when converting PDF to image

2014-05-02 Thread Hong-Thai Nguyen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong-Thai Nguyen updated PDFBOX-2055:
-

Attachment: eu_competition_newsletter_04_june_-_10_june_2010.pdf

> IOException when converting PDF to image
> 
>
> Key: PDFBOX-2055
> URL: https://issues.apache.org/jira/browse/PDFBOX-2055
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4
>Reporter: Hong-Thai Nguyen
>Priority: Critical
> Attachments: eu_competition_newsletter_04_june_-_10_june_2010.pdf
>
>
> With the attached PDF file, we get an IOException when using PDPage.convertToImage():
> {code}
> java.io.IOException
>   at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:138)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:336)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:248)
>   at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:183)
>   at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:107)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>   at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>   at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:127)
>   at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
>   at com.polyspot.connector.imageservice.generators.PDFBoxImageGenerator.computeImage(PDFBoxImageGenerator.java:75)
> {code}





[jira] [Created] (PDFBOX-2055) IOException when converting PDF to image

2014-05-02 Thread Hong-Thai Nguyen (JIRA)
Hong-Thai Nguyen created PDFBOX-2055:


 Summary: IOException when converting PDF to image
 Key: PDFBOX-2055
 URL: https://issues.apache.org/jira/browse/PDFBOX-2055
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 1.8.4
Reporter: Hong-Thai Nguyen
Priority: Critical
 Attachments: eu_competition_newsletter_04_june_-_10_june_2010.pdf

With the attached PDF file, we get an IOException when using PDPage.convertToImage():
{code}
java.io.IOException
  at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:138)
  at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:336)
  at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:248)
  at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:183)
  at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:107)
  at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
  at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
  at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
  at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:127)
  at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
  at com.polyspot.connector.imageservice.generators.PDFBoxImageGenerator.computeImage(PDFBoxImageGenerator.java:75)
{code}





[jira] [Created] (PDFBOX-2054) Remove System.out.println()

2014-05-02 Thread Hong-Thai Nguyen (JIRA)
Hong-Thai Nguyen created PDFBOX-2054:


 Summary: Remove System.out.println()
 Key: PDFBOX-2054
 URL: https://issues.apache.org/jira/browse/PDFBOX-2054
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 1.8.4
Reporter: Hong-Thai Nguyen
Priority: Minor


For example, in GlyfSimpleDescript.java:
{code}
...
catch (ArrayIndexOutOfBoundsException e)
{
    System.out.println("error: array index out of bounds");
}
{code}

The code should either forward the exception or stay silent.
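
To illustrate the two options named in this report, here is a minimal sketch. It is not PDFBox code: the class GlyphReader and its methods are hypothetical, and java.util.logging is used only to keep the example self-contained (which logging facade the project would actually use is an assumption left open here).

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a parser class; not part of PDFBox.
class GlyphReader {
    private static final Logger LOG = Logger.getLogger(GlyphReader.class.getName());

    /** Option 1: report through a logger, so the application decides where the message goes. */
    static int readOrDefault(int[] data, int index, int fallback) {
        try {
            return data[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            LOG.log(Level.WARNING, "array index out of bounds: " + index, e);
            return fallback;
        }
    }

    /** Option 2: forward the failure to the caller, with context attached. */
    static int readOrThrow(int[] data, int index) throws IOException {
        try {
            return data[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            throw new IOException("glyph data truncated at index " + index, e);
        }
    }
}
```

Either variant keeps library output off System.out, which matters when the library runs inside a server or a command-line tool whose standard output is parsed by something else.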





[jira] [Assigned] (PDFBOX-2052) PDFCloneUtility does not handle COSStreamArray

2014-05-02 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reassigned PDFBOX-2052:
---

Assignee: Tilman Hausherr

> PDFCloneUtility does not handle COSStreamArray
> --
>
> Key: PDFBOX-2052
> URL: https://issues.apache.org/jira/browse/PDFBOX-2052
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 2.0.0
>Reporter: Cornelis Hoeflake
>Assignee: Tilman Hausherr
> Fix For: 2.0.0
>
> Attachments: clone-patch.diff, clone-test-patch.diff
>
>
> A document which has COSStreamArray's, cannot be cloned. There is no handling 
> for COSStreamArray.





[jira] [Updated] (PDFBOX-2053) Issue with PDFBox position reading

2014-05-02 Thread Orbel Mkrtchyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orbel Mkrtchyan updated PDFBOX-2053:


Attachment: test.pdf

> Issue with PDFBox position reading
> --
>
> Key: PDFBOX-2053
> URL: https://issues.apache.org/jira/browse/PDFBOX-2053
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.3
>Reporter: Orbel Mkrtchyan
> Attachments: test.pdf
>
>
> Using PDFBox 1.8.4,
> bug #1:
>   PDDocument doc = new PDDocument();
>   doc.load("test-pcc7247.pdf");
>   doc.save("out.pdf");
>   doc.close();
> The resulting file is corrupted, contains 0 pages and cannot be viewed by 
> Acrobat Reader.
> bug #2: consider the following code snippet. The code runs like this:
>   Extractor extractor = new Extractor();
>   extractor.writeText(pdDoc, output);
> Using the code defined like this:
> public class Extractor extends PDFTextStripper {
> ...
> protected void writePage() throws IOException
> {
> for( int i = 0; i < charactersByArticle.size(); i++)
> {
> List textList = charactersByArticle.get( i );
> Iterator textIter = textList.iterator();
> while( textIter.hasNext() )
> {
> TextPosition position = (TextPosition)textIter.next();
> In the given piece of code, position variable correctly iterates through the 
> letters of the first line of the provided pdf document, but its coordinates 
> (x, y, widths, etc) are always the same. Just to be clear, 1 position always 
> relates to 1 letter, and its widths array's length always equals 1. So we get 
> the same coordinates for every letter in a line. Expected behaviour is either 
> having new coordinates per letter or having widths[] contain widths for the 
> characters of a whole line of text





[jira] [Comment Edited] (PDFBOX-2052) PDFCloneUtility does not handle COSStreamArray

2014-05-02 Thread Cornelis Hoeflake (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987550#comment-13987550
 ] 

Cornelis Hoeflake edited comment on PDFBOX-2052 at 5/2/14 9:59 AM:
---

Please see patch file for COSStreamArray test.

The idea of calling cloneForNewDocument is from the PDFMergerUtility class. 
That class is my entry point.


was (Author: c.hoeflake):
Please see patch file for COSStreamArray test

> PDFCloneUtility does not handle COSStreamArray
> --
>
> Key: PDFBOX-2052
> URL: https://issues.apache.org/jira/browse/PDFBOX-2052
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 2.0.0
>Reporter: Cornelis Hoeflake
> Fix For: 2.0.0
>
> Attachments: clone-patch.diff, clone-test-patch.diff
>
>
> A document which has COSStreamArray's, cannot be cloned. There is no handling 
> for COSStreamArray.





[jira] [Updated] (PDFBOX-2052) PDFCloneUtility does not handle COSStreamArray

2014-05-02 Thread Cornelis Hoeflake (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cornelis Hoeflake updated PDFBOX-2052:
--

Attachment: clone-test-patch.diff

Please see patch file for COSStreamArray test

> PDFCloneUtility does not handle COSStreamArray
> --
>
> Key: PDFBOX-2052
> URL: https://issues.apache.org/jira/browse/PDFBOX-2052
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 2.0.0
>Reporter: Cornelis Hoeflake
> Fix For: 2.0.0
>
> Attachments: clone-patch.diff, clone-test-patch.diff
>
>
> A document which has COSStreamArray's, cannot be cloned. There is no handling 
> for COSStreamArray.





[jira] [Closed] (PDFBOX-1091) NPE in PDFont.getEncodingFromFont

2014-05-02 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-1091.
---

Resolution: Cannot Reproduce

Closing this one as no sample file was provided; the bug may have been fixed 
long ago, the getEncodingFromFont() call no longer exists in PDFont, and type1 
fonts are handled differently now, even in the 1.8 version.

> NPE in PDFont.getEncodingFromFont
> -
>
> Key: PDFBOX-1091
> URL: https://issues.apache.org/jira/browse/PDFBOX-1091
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Reporter: Robert Russo
>
> I'm using solr version 3.3 and am trying to index pdf documents.  I keep 
> getting the following error:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 199
>   at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
>   at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:130)
>   at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
>   at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
>   at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:617)
>   at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
>   at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
>   at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
>   at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
>   at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@6f340905
>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
>   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
>   at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
>   ... 8 more
> Caused by: java.lang.NullPointerException
>   at org.apache.pdfbox.pdmodel.font.PDFont.getEncodingFromFont(PDFont.java:832)
>   at org.apache.pdfbox.pdmodel.font.PDFont.determineEncoding(PDFont.java:293)
>   at org.apache.pdfbox.pdmodel.font.PDFont.<init>(PDFont.java:178)
>   at org.apache.pdfbox.pdmodel.font.PDSimpleFont.<init>(PDSimpleFont.java:79)
>   at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:139)
>   at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:109)
>   at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:76)
>   at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:243)
>   at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
>   at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:441)
>   at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:365)
>   at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:321)
>   at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
>   at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:89)
>   at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
>   ... 10 more





[jira] [Created] (PDFBOX-2053) Issue with PDFBox position reading

2014-05-02 Thread Orbel Mkrtchyan (JIRA)
Orbel Mkrtchyan created PDFBOX-2053:
---

 Summary: Issue with PDFBox position reading
 Key: PDFBOX-2053
 URL: https://issues.apache.org/jira/browse/PDFBOX-2053
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 1.8.3
Reporter: Orbel Mkrtchyan


Using PDFBox 1.8.4,
bug #1:
PDDocument doc = new PDDocument();
doc.load("test-pcc7247.pdf");
doc.save("out.pdf");
doc.close();

The resulting file is corrupted, contains 0 pages and cannot be viewed by 
Acrobat Reader.


bug #2: consider the following code snippet. The code runs like this:
  Extractor extractor = new Extractor();
  extractor.writeText(pdDoc, output);

Using the code defined like this:

public class Extractor extends PDFTextStripper {
    ...
    protected void writePage() throws IOException
    {
        for( int i = 0; i < charactersByArticle.size(); i++)
        {
            List textList = charactersByArticle.get( i );
            Iterator textIter = textList.iterator();
            while( textIter.hasNext() )
            {
                TextPosition position = (TextPosition)textIter.next();

In the given piece of code, position variable correctly iterates through the 
letters of the first line of the provided pdf document, but its coordinates (x, 
y, widths, etc) are always the same. Just to be clear, 1 position always 
relates to 1 letter, and its widths array's length always equals 1. So we get 
the same coordinates for every letter in a line. Expected behaviour is either 
having new coordinates per letter or having widths[] contain widths for the 
characters of a whole line of text
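
A possible explanation for bug #1 above, offered as an assumption rather than a confirmed diagnosis: in the 1.8.x API, PDDocument.load(String) is a static factory that returns the loaded document, so calling it on an instance and ignoring the result would leave doc as the empty document created by new PDDocument(). The sketch below reproduces that pattern with a hypothetical stand-in class Doc (not a PDFBox class), so that it runs without any library.

```java
// Hypothetical stand-in for a document class with a static load(...) factory.
class Doc {
    private int pages; // a freshly constructed document has zero pages

    /** Static factory: returns a NEW object and never touches an existing instance. */
    static Doc load(int pagesInFile) {
        Doc loaded = new Doc();
        loaded.pages = pagesInFile;
        return loaded;
    }

    int getNumberOfPages() {
        return pages;
    }
}

class LoadPitfall {
    /** Mirrors the pattern in the report: the loaded object is dropped, so 'doc' stays empty. */
    static int pagesWhenResultIgnored() {
        Doc doc = new Doc();
        doc.load(3); // compiles (static call through an instance), but the return value is discarded
        return doc.getNumberOfPages();
    }

    /** Keeps the object that the static factory returns. */
    static int pagesWhenResultKept() {
        Doc doc = Doc.load(3);
        return doc.getNumberOfPages();
    }
}
```

If this reading is right, the equivalent change in the reported snippet would be to write PDDocument doc = PDDocument.load("test-pcc7247.pdf"); and then save and close that object.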





[jira] [Resolved] (PDFBOX-2051) PDFPrinter does not use getPageable()

2014-05-02 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2051.
-

Resolution: Fixed
  Assignee: Tilman Hausherr

Thank you, this makes sense, I added it in rev 1591837.

> PDFPrinter does not use getPageable()
> -
>
> Key: PDFBOX-2051
> URL: https://issues.apache.org/jira/browse/PDFBOX-2051
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 2.0.0
>Reporter: Cornelis Hoeflake
>Assignee: Tilman Hausherr
> Fix For: 2.0.0
>
> Attachments: pageable-patch.diff
>
>
> The print method (print(PrinterJob job, boolean isSilent)) does not use the 
> getPageable() method, but constructs a PDFPageable directly.
> I think it is better to use that method and is very helpful when someone 
> wants to extend the PDFPrinter and PDFPageable to do some custom behaviour.
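
The point made in this issue can be sketched with a few hypothetical classes (SketchPrinter and SketchPageable are illustrative names, not the PDFBox classes): a print method that constructs its collaborator directly ignores an overridden factory method, while one that calls the factory method picks up the subclass's version.

```java
// Hypothetical stand-ins; the real PDFPrinter/PDFPageable have different signatures.
class SketchPageable {
    String describe() {
        return "default pageable";
    }
}

class SketchPrinter {
    /** Overridable factory method; subclasses may return a customised pageable. */
    protected SketchPageable getPageable() {
        return new SketchPageable();
    }

    /** Constructs the pageable directly, so an overridden getPageable() is ignored. */
    String printDirect() {
        return "printing " + new SketchPageable().describe();
    }

    /** Goes through getPageable(), so subclasses can plug in their own behaviour. */
    String printViaGetter() {
        return "printing " + getPageable().describe();
    }
}

class CustomSketchPrinter extends SketchPrinter {
    @Override
    protected SketchPageable getPageable() {
        return new SketchPageable() {
            @Override
            String describe() {
                return "custom pageable";
            }
        };
    }
}
```

With these classes, new CustomSketchPrinter().printDirect() still uses the default pageable, whereas printViaGetter() uses the custom one, which is the behaviour the reporter asks for.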





1.8.5 and Website

2014-05-02 Thread Maruan Sahyoun
Hi,

I’ve updated the PDFBox API docs to reflect 1.8.5 on the website.

BR
Maruan

On 02.05.2014 at 09:27, Andreas Lehmkühler wrote:

> Hi,
> 
> due to the newest PDFBox 1.8.5 release I've closed all 1.8.5 related issues
> in a bulk operation. I've disabled the email notification to avoid an email
> flood.
> I've also added the all new version 1.8.6 for our next bugfix release ...
> 
> I'll update the download page once the mirrors have copied the version from our
> repository.
> 
> BR
> Andreas Lehmkühler



[jira] [Commented] (PDFBOX-2052) PDFCloneUtility does not handle COSStreamArray

2014-05-02 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987513#comment-13987513
 ] 

Tilman Hausherr commented on PDFBOX-2052:
-

To help me understand what this is about, could you attach / include
- a sample PDF with a COSStreamArray
- sample code or a sample command line that fails because the clone is 
incomplete?


> PDFCloneUtility does not handle COSStreamArray
> --
>
> Key: PDFBOX-2052
> URL: https://issues.apache.org/jira/browse/PDFBOX-2052
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 2.0.0
>Reporter: Cornelis Hoeflake
> Fix For: 2.0.0
>
> Attachments: clone-patch.diff
>
>
> A document which has COSStreamArray's, cannot be cloned. There is no handling 
> for COSStreamArray.





[jira] [Resolved] (PDFBOX-2034) TestFilters is non-deterministic

2014-05-02 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2034.
-

   Resolution: Fixed
Fix Version/s: 2.0.0
   1.8.6

> TestFilters is non-deterministic
> 
>
> Key: PDFBOX-2034
> URL: https://issues.apache.org/jira/browse/PDFBOX-2034
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 1.8.5, 2.0.0
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
>  Labels: LZW
> Fix For: 1.8.6, 2.0.0
>
>
> This is a follow-up of PDFBOX-1977, which was created by John.
> 
> TestFilters uses Random().nextLong() to generate a seed for random data, 
> which means that it is non-deterministic. Depending on the seed value, the test 
> may fail or succeed.
> 
> So what we need is:
> - a set of [deterministic 
> tests|http://martinfowler.com/articles/nonDeterminism.html]
> - a set of non-deterministic tests
> To see why, see the discussion in PDFBOX-1977.
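
The split described above can be sketched as follows. This is not the TestFilters code: SeededData and its methods are hypothetical names, and the only assumption is that test input is derived from a java.util.Random seed, as the description says.

```java
import java.util.Random;

// Hypothetical helper showing seed handling for deterministic and exploratory tests.
class SeededData {
    /** Generates test input from an explicit seed, so any run can be replayed exactly. */
    static byte[] randomBytes(long seed, int length) {
        byte[] data = new byte[length];
        new Random(seed).nextBytes(data);
        return data;
    }

    /** Deterministic set: a fixed list of seeds, identical data on every run. */
    static long[] fixedSeeds() {
        return new long[] {0L, 1L, 42L};
    }

    /** Non-deterministic set: a fresh seed, printed so that a failing run can be reproduced. */
    static long freshSeed() {
        long seed = new Random().nextLong();
        System.out.println("random test seed: " + seed);
        return seed;
    }
}
```

A failing seed reported by the non-deterministic run can then be added to the fixed list, which turns a one-off failure into a repeatable regression test.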





[jira] [Assigned] (PDFBOX-2046) [PATCH] Can't read the embedded Type1 font

2014-05-02 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reassigned PDFBOX-2046:
---

Assignee: Tilman Hausherr

> [PATCH] Can't read the embedded Type1 font
> --
>
> Key: PDFBOX-2046
> URL: https://issues.apache.org/jira/browse/PDFBOX-2046
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: simon steiner
>Assignee: Tilman Hausherr
>  Labels: type1, type1font
> Fix For: 2.0.0
>
> Attachments: issue.pdf, type1parser.patch
>
>
> java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage 
> x.pdf
> SEVERE: Can't read the embedded Type1 font
> java.io.IOException: Found Token[kind=NAME, text=end] but expected LITERAL
> SEVERE: Can't read the embedded Type1 font
> java.io.IOException: Found Token[kind=NAME, text=currentdict] but expected 
> LITERAL





[jira] [Resolved] (PDFBOX-2046) [PATCH] Can't read the embedded Type1 font

2014-05-02 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2046.
-

   Resolution: Fixed
Fix Version/s: 2.0.0

No second opinion came in for 4 days, so I'd assume I'm not totally wrong. (I 
did read the Type 1 and PostScript specs before making the change.) I'm 
setting this to resolved.

> [PATCH] Can't read the embedded Type1 font
> --
>
> Key: PDFBOX-2046
> URL: https://issues.apache.org/jira/browse/PDFBOX-2046
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: simon steiner
>  Labels: type1, type1font
> Fix For: 2.0.0
>
> Attachments: issue.pdf, type1parser.patch
>
>
> java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage 
> x.pdf
> SEVERE: Can't read the embedded Type1 font
> java.io.IOException: Found Token[kind=NAME, text=end] but expected LITERAL
> SEVERE: Can't read the embedded Type1 font
> java.io.IOException: Found Token[kind=NAME, text=currentdict] but expected 
> LITERAL





[jira] [Resolved] (PDFBOX-2048) TextExtraction only working after uncompressing with pdftk

2014-05-02 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2048.
-

Resolution: Fixed

> TextExtraction only working after uncompressing with pdftk
> --
>
> Key: PDFBOX-2048
> URL: https://issues.apache.org/jira/browse/PDFBOX-2048
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing, Rendering, Text extraction
>Affects Versions: 2.0.0
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>
> From Jonas Karlsson on the user list:
> ===
> We have a user with PDFs generated by a commercial transcription service.
> When we try to extract text from these pdfs, pdfbox returns a few empty
> lines. We get this result both from our own code, and when using the
> ExtractText command line tool
> If I specify the non-sequential parser, with the -nonSeq flag, the
> following error is produced:
> Apr 28, 2014 10:35:11 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser
> validateStreamLength
> SEVERE: The end of the stream doesn't point to the correct offset, using
> workaround to read the stream
> If I uncompress the file with pdftk, pdfbox is able to successfully extract
> the text.
> ===
> I have been given permission to attach the file "committers only". So don't 
> pass it around, avoid quoting details from the file. The file is also not 
> rendering. The lengths of the streams are 0.
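
When a stream's /Length entry is wrong (here, 0), one possible fallback is to scan forward for the "endstream" keyword. The following is a minimal sketch of that idea only. It is not the workaround implemented in NonSequentialPDFParser, and the class and method names are hypothetical. A naive scan like this can also be fooled if the keyword happens to occur inside binary stream data.

```java
import java.nio.charset.StandardCharsets;

public class EndstreamScanSketch {
    private static final byte[] KEYWORD = "endstream".getBytes(StandardCharsets.US_ASCII);

    // Returns the index of the first "endstream" keyword at or after 'start',
    // or -1 if the keyword does not occur.
    static int findEndstream(byte[] data, int start) {
        for (int i = Math.max(start, 0); i + KEYWORD.length <= data.length; i++) {
            int j = 0;
            while (j < KEYWORD.length && data[i + j] == KEYWORD[j]) {
                j++;
            }
            if (j == KEYWORD.length) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up fragment for illustration; not taken from the attached file.
        byte[] pdfFragment = "stream\nABCDEF\nendstream".getBytes(StandardCharsets.US_ASCII);
        int dataStart = "stream\n".length();
        int end = findEndstream(pdfFragment, dataStart);
        System.out.println("stream data spans bytes " + dataStart + " to " + end);
    }
}
```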





[jira] [Updated] (PDFBOX-2046) [PATCH] Can't read the embedded Type1 font

2014-05-02 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2046:


Affects Version/s: 2.0.0

> [PATCH] Can't read the embedded Type1 font
> --
>
> Key: PDFBOX-2046
> URL: https://issues.apache.org/jira/browse/PDFBOX-2046
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: simon steiner
>  Labels: type1, type1font
> Attachments: issue.pdf, type1parser.patch
>
>
> java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage 
> x.pdf
> SEVERE: Can't read the embedded Type1 font
> java.io.IOException: Found Token[kind=NAME, text=end] but expected LITERAL
> SEVERE: Can't read the embedded Type1 font
> java.io.IOException: Found Token[kind=NAME, text=currentdict] but expected 
> LITERAL





[jira] [Resolved] (PDFBOX-2047) read operations alter PDLab object

2014-05-02 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2047.
-

   Resolution: Fixed
Fix Version/s: 2.0.0, 1.8.6

> read operations alter PDLab object
> --
>
> Key: PDFBOX-2047
> URL: https://issues.apache.org/jira/browse/PDFBOX-2047
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.8.4, 1.8.5, 2.0.0
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
> Fix For: 1.8.6, 2.0.0
>
>
> This is a follow-up to PDFBOX-2042 but for PDLab: "A read operation must not 
> alter the pdf." 
> The problem described in PDFBOX-2042 happened because the constructor called 
> loadICCProfile(), which called getRangeForComponent(c), which altered its own 
> object with (broken) default values. PDLab has no such constructor, so Juraj's 
> test won't show any problem, but this different test will:
> {code}
> PDLab pdLab = new PDLab();
> COSArray cosArray = (COSArray) pdLab.getCOSObject();
> COSDictionary dict = (COSDictionary)cosArray.getObject(1);
> pdLab.getBlackPoint();
> pdLab.getWhitepoint();
> pdLab.getARange();
> pdLab.getBRange();
> assertEquals("read operations should not change the size of /Lab objects", 0, dict.size());
> dict.toString(); // rev 1571125 does stack overflow here in 2.0
> {code}
> Removing the assert causes a stack overflow.
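
The principle "a read operation must not alter the pdf" can be sketched independently of PDFBox. The class below is a hypothetical stand-in for a dictionary-backed colour space and is not the PDLab implementation. The key name "Range" and the default of -100 to 100 are used for illustration. The getter returns a default when nothing is stored, but never writes that default back.

```java
import java.util.HashMap;
import java.util.Map;

public class ReadOnlyGetterSketch {
    // Hypothetical backing dictionary; stands in for the COS dictionary.
    private final Map<String, float[]> dict = new HashMap<>();

    // Returns the stored range, or a fresh default, without modifying the dictionary.
    float[] getARange() {
        float[] stored = dict.get("Range");
        if (stored == null) {
            // Default is created on the fly and deliberately NOT put into dict.
            return new float[] { -100f, 100f };
        }
        return stored.clone();
    }

    int dictSize() {
        return dict.size();
    }

    public static void main(String[] args) {
        ReadOnlyGetterSketch lab = new ReadOnlyGetterSketch();
        lab.getARange();
        System.out.println("entries after read: " + lab.dictSize());
    }
}
```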





[jira] [Comment Edited] (PDFBOX-2047) read operations alter PDLab object

2014-05-02 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984061#comment-13984061
 ] 

Tilman Hausherr edited comment on PDFBOX-2047 at 5/2/14 8:39 AM:
-

Done in rev 1590878 for the 1.8 branch.


was (Author: tilman):
Done in rev 1590878 for the 1.8 branch. Will set to resolve after the 1.8.5 
release (which will not include this change).

> read operations alter PDLab object
> --
>
> Key: PDFBOX-2047
> URL: https://issues.apache.org/jira/browse/PDFBOX-2047
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.8.4, 1.8.5, 2.0.0
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
> Fix For: 1.8.6, 2.0.0
>
>
> This is a follow-up to PDFBOX-2042 but for PDLab: "A read operation must not 
> alter the pdf." 
> The problem described in PDFBOX-2042 happened because the constructor called 
> loadICCProfile(), which called getRangeForComponent(c), which altered its own 
> object with (broken) default values. PDLab has no such constructor, so Juraj's 
> test won't show any problem, but this different test will:
> {code}
> PDLab pdLab = new PDLab();
> COSArray cosArray = (COSArray) pdLab.getCOSObject();
> COSDictionary dict = (COSDictionary)cosArray.getObject(1);
> pdLab.getBlackPoint();
> pdLab.getWhitepoint();
> pdLab.getARange();
> pdLab.getBRange();
> assertEquals("read operations should not change the size of /Lab objects", 0, dict.size());
> dict.toString(); // rev 1571125 does stack overflow here in 2.0
> {code}
> Removing the assert causes a stack overflow.





[jira] [Comment Edited] (PDFBOX-2048) TextExtraction only working after uncompressing with pdftk

2014-05-02 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984050#comment-13984050
 ] 

Tilman Hausherr edited comment on PDFBOX-2048 at 5/2/14 8:41 AM:
-

Change committed in the trunk in rev 1590873, and rev 1590874 in the 1.8 branch.

Jonas, you can find a new jar file at 
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/1.8.6-SNAPSHOT/
within a few hours. However, it will be a few months before this is 
released officially.


was (Author: tilman):
Change committed in the trunk in rev 1590873, and rev 1590874 in the 1.8 branch.

Jonas, you can find a new jar file at 
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/1.8.6-SNAPSHOT/
within a few hours. However it will be a few months before this will be 
released officially.

I will set to resolve after the release of 1.8.5 (which will not include this 
change, because the cut was already done).

> TextExtraction only working after uncompressing with pdftk
> --
>
> Key: PDFBOX-2048
> URL: https://issues.apache.org/jira/browse/PDFBOX-2048
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing, Rendering, Text extraction
>Affects Versions: 2.0.0
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>
> From Jonas Karlsson on the user list:
> ===
> We have a user with PDFs generated by a commercial transcription service.
> When we try to extract text from these pdfs, pdfbox returns a few empty
> lines. We get this result both from our own code, and when using the
> ExtractText command line tool
> If I specify the non-sequential parser, with the -nonSeq flag, the
> following error is produced:
> Apr 28, 2014 10:35:11 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser
> validateStreamLength
> SEVERE: The end of the stream doesn't point to the correct offset, using
> workaround to read the stream
> If I uncompress the file with pdftk, pdfbox is able to successfully extract
> the text.
> ===
> I have been given permission to attach the file "committers only". So don't 
> pass it around, avoid quoting details from the file. The file is also not 
> rendering. The lengths of the streams are 0.





[jira] [Resolved] (PDFBOX-2050) Add predictor to LZW filter

2014-05-02 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2050.
-

   Resolution: Fixed
Fix Version/s: 2.0.0, 1.8.6

> Add predictor to LZW filter
> ---
>
> Key: PDFBOX-2050
> URL: https://issues.apache.org/jira/browse/PDFBOX-2050
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4, 1.8.5, 2.0.0
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
>  Labels: LZW
> Fix For: 1.8.6, 2.0.0
>
>
> According to the PDF spec "LZW and Flate Predictor Functions", both can have 
> post processing with predictors. It is implemented for Flate but not for LZW. 
> I am adding this by using the existing code from the Flate filter, which I 
> will be moving into a helper class.
> While looking at the Flate filter, I noticed that its footprint is higher 
> than needed, i.e. there is too much buffering. I will test this separately 
> and commit only if I can measure an improvement.
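
To show what predictor post-processing involves, here is a minimal sketch of one predictor from the PDF spec: the TIFF-style horizontal differencing (Predictor 2), restricted to 8-bit samples. It is not the helper class mentioned above, and the names TiffPredictorSketch, decodeRow and colors are hypothetical. The PNG predictors (values 10 to 15) are not covered.

```java
import java.util.Arrays;

public class TiffPredictorSketch {
    // Undoes horizontal differencing in place for one row of 8-bit samples.
    // 'colors' is the number of components per pixel: each byte was stored as
    // the difference from the same component of the pixel to its left.
    static void decodeRow(byte[] row, int colors) {
        for (int i = colors; i < row.length; i++) {
            row[i] = (byte) (row[i] + row[i - colors]);
        }
    }

    public static void main(String[] args) {
        byte[] row = { 10, 1, 1, 1 }; // first sample, then differences (one component per pixel)
        decodeRow(row, 1);
        System.out.println(Arrays.toString(row));
    }
}
```

Because the step runs on the already-decompressed bytes, the same routine can in principle be shared by the Flate and LZW filters, which is the motivation for moving it into a helper.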





1.8.5 and JIRA

2014-05-02 Thread Andreas Lehmkühler
Hi,

Following the new PDFBox 1.8.5 release, I've closed all 1.8.5-related issues
in a bulk operation. I disabled the email notification to avoid an email
flood.
I've also added the new version 1.8.6 for our next bugfix release ...

I'll update the download page once the mirrors have copied the version from our
repository.

BR
Andreas Lehmkühler


[jira] [Updated] (PDFBOX-2052) PDFCloneUtility does not handle COSStreamArray

2014-05-02 Thread Cornelis Hoeflake (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cornelis Hoeflake updated PDFBOX-2052:
--

Attachment: clone-patch.diff

Please see the attached patch, which adds support for handling COSStreamArray.

> PDFCloneUtility does not handle COSStreamArray
> --
>
> Key: PDFBOX-2052
> URL: https://issues.apache.org/jira/browse/PDFBOX-2052
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 2.0.0
>Reporter: Cornelis Hoeflake
> Fix For: 2.0.0
>
> Attachments: clone-patch.diff
>
>
> A document which contains COSStreamArrays cannot be cloned. There is no handling 
> for COSStreamArray.





[jira] [Created] (PDFBOX-2052) PDFCloneUtility does not handle COSStreamArray

2014-05-02 Thread Cornelis Hoeflake (JIRA)
Cornelis Hoeflake created PDFBOX-2052:
-

 Summary: PDFCloneUtility does not handle COSStreamArray
 Key: PDFBOX-2052
 URL: https://issues.apache.org/jira/browse/PDFBOX-2052
 Project: PDFBox
  Issue Type: Improvement
  Components: Utilities
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
 Fix For: 2.0.0
 Attachments: clone-patch.diff

A document which contains COSStreamArrays cannot be cloned. There is no handling 
for COSStreamArray.





[jira] [Updated] (PDFBOX-2051) PDFPrinter does not use getPageable()

2014-05-02 Thread Cornelis Hoeflake (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cornelis Hoeflake updated PDFBOX-2051:
--

Attachment: pageable-patch.diff

Please see the attached patch file, which uses getPageable().

> PDFPrinter does not use getPageable()
> -
>
> Key: PDFBOX-2051
> URL: https://issues.apache.org/jira/browse/PDFBOX-2051
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 2.0.0
>Reporter: Cornelis Hoeflake
> Fix For: 2.0.0
>
> Attachments: pageable-patch.diff
>
>
> The print method (print(PrinterJob job, boolean isSilent)) does not use the 
> getPageable() method, but constructs a PDFPageable directly.
> I think it is better to use that method; doing so is very helpful when someone 
> wants to extend PDFPrinter and PDFPageable to add custom behaviour.
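
The extension point being asked for is essentially an overridable factory method. The sketch below uses hypothetical stand-in classes (Pageable, Printer, CustomPrinter), not the real PDFPrinter and PDFPageable, to show why routing print() through getPageable() lets a subclass substitute its own pageable.

```java
public class PageableHookSketch {
    // Hypothetical stand-in for PDFPageable.
    static class Pageable {
        String describe() {
            return "default pageable";
        }
    }

    // Hypothetical stand-in for PDFPrinter.
    static class Printer {
        // Overridable factory method, playing the role of getPageable().
        Pageable getPageable() {
            return new Pageable();
        }

        // Goes through the factory method instead of constructing a Pageable directly,
        // so subclasses can change what gets printed without overriding print().
        String print() {
            return "printing with " + getPageable().describe();
        }
    }

    static class CustomPrinter extends Printer {
        @Override
        Pageable getPageable() {
            return new Pageable() {
                @Override
                String describe() {
                    return "custom pageable";
                }
            };
        }
    }

    public static void main(String[] args) {
        System.out.println(new Printer().print());
        System.out.println(new CustomPrinter().print());
    }
}
```

If print() called the constructor directly, the override in CustomPrinter would be ignored, which is the behaviour the issue describes.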





[jira] [Created] (PDFBOX-2051) PDFPrinter does not use getPageable()

2014-05-02 Thread Cornelis Hoeflake (JIRA)
Cornelis Hoeflake created PDFBOX-2051:
-

 Summary: PDFPrinter does not use getPageable()
 Key: PDFBOX-2051
 URL: https://issues.apache.org/jira/browse/PDFBOX-2051
 Project: PDFBox
  Issue Type: Improvement
  Components: Utilities
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
 Fix For: 2.0.0


The print method (print(PrinterJob job, boolean isSilent)) does not use the 
getPageable() method, but constructs a PDFPageable directly.
I think it is better to use that method; doing so is very helpful when someone 
wants to extend PDFPrinter and PDFPageable to add custom behaviour.


