[jira] [Commented] (PDFBOX-2039) Class PDDocument should implement java.io.Closeable

2014-04-23 Thread Andrei Solntsev (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977879#comment-13977879
 ] 

Andrei Solntsev commented on PDFBOX-2039:
-

No problem: the interface java.io.Closeable has been available since Java 1.5.

 Class PDDocument should implement java.io.Closeable
 ---

 Key: PDFBOX-2039
 URL: https://issues.apache.org/jira/browse/PDFBOX-2039
 Project: PDFBox
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Andrei Solntsev
Priority: Minor
   Original Estimate: 1h
  Remaining Estimate: 1h

 It would make it possible to use the Java 7 try-with-resources feature:
 try (PDDocument doc = PDDocument.load(outputFile)) {
   // bla-bla
   // no need to call doc.close(); explicitly
 }
 P.S. Actually, all org.apache.pdfbox.* classes with a close() method could 
 implement java.io.Closeable.
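
 A minimal sketch of the idea in the request above, assuming a hypothetical
 stand-in class (DemoDocument is not PDFBox's PDDocument). It only shows that a
 class implementing java.io.Closeable is closed automatically at the end of a
 try-with-resources block:

```java
import java.io.Closeable;

// Hypothetical stand-in for a document class; not PDFBox's PDDocument.
class DemoDocument implements Closeable {
    private boolean closed = false;

    // An overriding close() may declare fewer exceptions than Closeable.close().
    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

class TryWithResourcesDemo {
    // Opens a document, uses it, and returns it so the caller can inspect its state.
    static DemoDocument useAndReturn() {
        DemoDocument ref = null;
        try (DemoDocument doc = new DemoDocument()) {
            ref = doc;
            // work with doc here; no explicit doc.close() is needed
        }
        return ref; // close() has already been called at this point
    }
}
```

 try-with-resources needs Java 7 to compile, but the Closeable interface itself
 only requires Java 5, so implementing it does not raise the library's minimum
 Java version.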



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PDFBOX-2039) Class PDDocument should implement java.io.Closeable

2014-04-23 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2039:


Affects Version/s: 1.8.5
   1.8.4

 Class PDDocument should implement java.io.Closeable
 ---

 Key: PDFBOX-2039
 URL: https://issues.apache.org/jira/browse/PDFBOX-2039
 Project: PDFBox
  Issue Type: Improvement
Affects Versions: 1.8.4, 1.8.5, 2.0.0
Reporter: Andrei Solntsev
Priority: Minor
   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Comment Edited] (PDFBOX-2039) Class PDDocument should implement java.io.Closeable

2014-04-23 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977504#comment-13977504
 ] 

Tilman Hausherr edited comment on PDFBOX-2039 at 4/23/14 6:12 AM:
--

-The 1.8 version must support JDK5, so it is not possible.- The 2.0 version has 
COSDocument and PDDocument that are Closeable.


was (Author: tilman):
The 1.8 version must support JDK5, so it is not possible. The 2.0 version has 
COSDocument and PDDocument that are Closeable.



[jira] [Updated] (PDFBOX-2039) Class PDDocument should implement java.io.Closeable

2014-04-23 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2039:


Fix Version/s: 2.0.0
   1.8.5

 Class PDDocument should implement java.io.Closeable
 ---

 Key: PDFBOX-2039
 URL: https://issues.apache.org/jira/browse/PDFBOX-2039
 Project: PDFBox
  Issue Type: Improvement
Affects Versions: 1.8.4, 1.8.5, 2.0.0
Reporter: Andrei Solntsev
Priority: Minor
 Fix For: 1.8.5, 2.0.0

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Commented] (PDFBOX-2039) Class PDDocument should implement java.io.Closeable

2014-04-23 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977890#comment-13977890
 ] 

Tilman Hausherr commented on PDFBOX-2039:
-

For a start, I added it to PDDocument and COSDocument in 1.8 in rev 1589346, 
because these are the classes that implement it in 2.0. I'll have a look 
at other classes later. You can get it immediately with svn if you want to test 
the improved code, or here in a few hours:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/1.8.5-SNAPSHOT/



[jira] [Updated] (PDFBOX-2038) Method VisualSignatureParser#parse does not close COSDocument

2014-04-23 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2038:


Affects Version/s: 2.0.0
   1.8.5

 Method VisualSignatureParser#parse does not close COSDocument
 -

 Key: PDFBOX-2038
 URL: https://issues.apache.org/jira/browse/PDFBOX-2038
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 1.8.4, 1.8.5, 2.0.0
Reporter: Andrei Solntsev
Priority: Minor
   Original Estimate: 1h
  Remaining Estimate: 1h

 I am adding a visual signature to my PDF:
 SignatureOptions options = new SignatureOptions();
 options.setVisualSignature( new FileInputStream("my.jpg") );
 After a while I get the following warning in the logs:
 Warning: COSDocument: You did not close a PDF Document
 The cause is probably the method 
 org.apache.pdfbox.pdfparser.VisualSignatureParser#parse, which creates 
 an instance of COSDocument but does not close it.





[jira] [Commented] (PDFBOX-2038) Method VisualSignatureParser#parse does not close COSDocument

2014-04-23 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977904#comment-13977904
 ] 

Tilman Hausherr commented on PDFBOX-2038:
-

setVisualSignature does create a COSDocument, which contains the visual 
signature that is to be used later. So it can't be closed immediately. 
Currently, all you can do is to call options.getVisualSignature() and close 
that object. An improvement might be to add a close() method to 
SignatureOptions, but I'm not one of the signature people here, so I'd rather 
wait for their opinion.
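
The suggested improvement could look roughly like the sketch below. It assumes
hypothetical classes (DemoSignatureOptions and TrackingStream are illustrative
and are not PDFBox's API): the options object keeps the resource it created
open while it is still needed, and releases it in its own close() method.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical options holder; the names are illustrative, not PDFBox's API.
class DemoSignatureOptions implements Closeable {
    // Resource created when the signature is set; needed until signing is done.
    private Closeable visualSignature;

    void setVisualSignature(InputStream in) {
        // A real implementation would parse the stream into a document; here the
        // stream itself stands in for the resource that must be closed later.
        this.visualSignature = in;
    }

    @Override
    public void close() throws IOException {
        if (visualSignature != null) {
            visualSignature.close();
            visualSignature = null;
        }
    }
}

// Test helper that records whether close() was called.
class TrackingStream extends ByteArrayInputStream {
    boolean closed = false;

    TrackingStream() {
        super(new byte[0]);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}

class SignatureOptionsDemo {
    static boolean resourceClosedAfterUse() {
        TrackingStream stream = new TrackingStream();
        try (DemoSignatureOptions options = new DemoSignatureOptions()) {
            options.setVisualSignature(stream);
            // ... sign the document while the resource is still open ...
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return stream.closed;
    }
}
```

With this shape the caller closes one object when signing is finished instead
of fetching the inner document and closing it by hand.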



[jira] [Created] (PDFBOX-2041) Convert PDF to Image (Strange Color)

2014-04-23 Thread ahfei (JIRA)
ahfei created PDFBOX-2041:
-

 Summary: Convert PDF to Image (Strange Color)
 Key: PDFBOX-2041
 URL: https://issues.apache.org/jira/browse/PDFBOX-2041
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.4
 Environment: Java(1.7.0_45),   OS (Ubuntu) 
Reporter: ahfei


Using PDFBox, I tried to convert a PDF to an image file (case1.pdf, case1.jpg).
Below is the code I'm using:

BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 200);
ImageIOUtil.writeImage(image, "jpg", imagePath, BufferedImage.TYPE_INT_RGB, 
200);

After conversion, the image doesn't look like the PDF: half of the page turns 
blue and black. Images and PDF are attached.





[jira] [Updated] (PDFBOX-2041) Convert PDF to Image (Strange Color)

2014-04-23 Thread ahfei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ahfei updated PDFBOX-2041:
--

Description: 
Using PDFBox, I tried to convert a PDF to an image file (case1.pdf, case1.jpg).
Below is the code I'm using:

BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 200);
ImageIOUtil.writeImage(image, "jpg", imagePath, BufferedImage.TYPE_INT_RGB, 
200);

After conversion, the image doesn't look like the PDF: half of the page turns 
blue and black.

Attached images and PDF: https://www.dropbox.com/sh/jevegc8bh09km1o/5XkVwPUxri 



[jira] [Commented] (PDFBOX-1241) Better handle of missing offset at the end of a file

2014-04-23 Thread Manuel Mahringer (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978008#comment-13978008
 ] 

Manuel Mahringer commented on PDFBOX-1241:
--

With the trunk version from 22.04.2014 the issue isn't reproducible anymore.

 Better handle of missing offset at the end of a file
 

 Key: PDFBOX-1241
 URL: https://issues.apache.org/jira/browse/PDFBOX-1241
 Project: PDFBox
  Issue Type: Improvement
  Components: Parsing, Text extraction
Affects Versions: 1.6.0
 Environment: All platforms affected
Reporter: Ernst Eibensteiner
 Attachments: On the Insert tab.pdf


 We came across PDF files that do not have an offset at the end of the file.
 This leads to the following exception:
 c:\tmp>java -jar pdfbox-app-1.6.0.jar ExtractText -endPage 1 "On the Insert tab.pdf"
 ExtractText failed with the following exception:
 java.io.IOException: Error: Expected an integer type, actual=''
 at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1384)
 at org.apache.pdfbox.pdfparser.PDFParser.parseStartXref(PDFParser.java:663)
 at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:464)
 at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
 at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1088)
 at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
 at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:978)
 at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:196)
 at org.apache.pdfbox.ExtractText.main(ExtractText.java:76)
 at org.apache.pdfbox.PDFBox.main(PDFBox.java:42)
 While these PDFs are non-conforming, it'd be an improvement to allow them to 
 be read and processed.
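
 One tolerant way to handle a missing offset is sketched below. This is an
 illustration only, not how PDFBox resolved the issue; StartXrefDemo and
 readStartXref are hypothetical names. Instead of throwing when no integer
 follows the startxref keyword, the method returns -1 so that a caller could
 fall back to another strategy, such as scanning the file for objects.

```java
import java.nio.charset.StandardCharsets;

class StartXrefDemo {
    private static final String KEYWORD = "startxref";

    // Returns the integer that follows the last "startxref" keyword, or -1 if
    // the keyword or the integer is missing.
    static long readStartXref(byte[] pdf) {
        // ISO-8859-1 maps every byte to one char, so positions stay aligned.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int pos = text.lastIndexOf(KEYWORD);
        if (pos < 0) {
            return -1;
        }
        int i = pos + KEYWORD.length();
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        if (i == start) {
            return -1; // keyword present, but no offset after it
        }
        try {
            return Long.parseLong(text.substring(start, i));
        } catch (NumberFormatException e) {
            return -1; // digits present but too large to be a valid offset
        }
    }
}
```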





[jira] [Created] (PDFBOX-2042) ColorSpace without Range

2014-04-23 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-2042:
--

 Summary: ColorSpace without Range
 Key: PDFBOX-2042
 URL: https://issues.apache.org/jira/browse/PDFBOX-2042
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 2.0.0
Reporter: Juraj Lonc


I have a PDF document in which I am modifying the PDPage content stream.
The saved document is invalid (Adobe Reader complains about it).

I have narrowed it down to ColorSpace. 

Original document has colorspace:
/ColorSpace 
/Cs6 [/ICCBased 
/Alternate /DeviceRGB
/Filter /FlateDecode
/Length 2597
/N 3
]

Modified document has colorspace:
/ColorSpace 
/Cs6 [/ICCBased 
/Alternate /DeviceRGB
/Filter /FlateDecode
/Length 2597
/N 3
/Range []
]

When I manually remove /Range [] from the PDF, Adobe Reader opens it without 
an error.

Obviously that range is added by calling PDICCBased.getRangeArray(0) somewhere.





[jira] [Updated] (PDFBOX-2042) ColorSpace without Range

2014-04-23 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2042:
---

Attachment: pdfbox18.pdf

Original (working) file.

 ColorSpace without Range
 

 Key: PDFBOX-2042
 URL: https://issues.apache.org/jira/browse/PDFBOX-2042
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: pdfbox18.pdf




[jira] [Updated] (PDFBOX-2042) ColorSpace without Range

2014-04-23 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2042:
---

Attachment: pdfbox20.pdf

Modified file in pdfbox 2.0.0 (error in Adobe Reader)

 ColorSpace without Range
 

 Key: PDFBOX-2042
 URL: https://issues.apache.org/jira/browse/PDFBOX-2042
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: pdfbox18.pdf, pdfbox20.pdf




community bonding period

2014-04-23 Thread Tilman Hausherr
Although I'm only mentoring Shaola, maybe some of it is useful for 
Dimuthu as well:


From the mentors list:
===
We now are in the community bonding period [1] which lasts until May 19. 
During this period students should learn about your project, your 
release processes, the Apache Way, how we do things around here, 
interact with the community and close any knowledge gaps they might 
have. [1] 
http://googlesummerofcode.blogspot.com/2007/04/so-what-is-this-community-bonding-all.html

===
Here's a FAQ about Apache:
https://www.apache.org/foundation/faq.html
IMHO the most important are "What is Apache about?" and "What is Apache not 
about?". (My personal addendum to that is "Apache is not like 
Wikipedia". If you've ever edited Wikipedia, you'll notice the 
difference after a few days.)


https://www.apache.org/foundation/how-it-works.html
The roles are simpler than in that text: all committers here are PMC 
members, and the PMC chair (Andreas) is also an ASF member.


Only committers and above have write access to the official PDFBOX 
repository. So the best would be to set up a copy on an open source 
repository.

https://en.wikipedia.org/wiki/Comparison_of_open-source_software_hosting_facilities

We're trying to be transparent. So stuff that deals with the 
implementation of the project should probably be in the ticket. To see 
what I mean, have a look at 
https://issues.apache.org/jira/browse/PDFBOX-615 and the related issues. 
PDFBOX-615 started with "I will be trying to add this functionality this 
week" but it became a huge effort by several people that ended 4 years 
later :-) See also John's remarks about my code. It annoyed me somewhat 
at the beginning, but at the end it resulted in much better code.


Note that you can edit in JIRA. See an example here
https://issues.apache.org/jira/browse/PDFBOX-2039
i.e. you can modify previous posts.

Stuff that deals with PDFBOX in general is best in this (publicly 
readable) mailing list. The advantage is that others might answer you 
(if they want) when I'm working, sleeping, or not on the internet for 
whatever reason. Stuff that deals with java, svn and maven - e-mail me 
if you don't get the answer within a few minutes from google or from 
stackoverflow, i.e. don't waste time searching.


Using other libraries: this is OK as long as they have an Apache license 
or a compatible license (GPL is not compatible). However, we don't use many 
libraries and everything is already big, so if you want one, ask first. (Sorry 
if you already mentioned a library; I will reread your proposal 
later.) Of course it is always OK to temporarily use whatever you want to 
just test a theory / strategy / algorithm.
Using other code: the code should rather be your own, but you can use 
small excerpts from stackoverflow.com etc. as long as you indicate it in your 
code with a link. Always comment in the code if you were inspired by other 
people's code, algorithms or research papers; just look at the existing 
shading code for how I did it.


Don't forget the Apache header in new modules.

Your code should work on JDK5, so that we can use it in the 1.8 version 
too. So don't use diamond operators, lambda expressions or even 
String.isEmpty().
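
As a rough illustration of the JDK5 constraint above (Jdk5StyleDemo is a
made-up example class, not project code), the following sketch uses only
language features and library calls available in Java 5:

```java
import java.util.ArrayList;
import java.util.List;

class Jdk5StyleDemo {
    // String.isEmpty() was added in Java 6; length() == 0 works on Java 5.
    static boolean isEmpty(String s) {
        return s.length() == 0;
    }

    static List<String> nonEmpty(String[] values) {
        // Explicit type arguments instead of the Java 7 diamond operator "<>".
        List<String> result = new ArrayList<String>();
        // Enhanced for loop (Java 5) instead of a Java 8 lambda or stream.
        for (String value : values) {
            if (!isEmpty(value)) {
                result.add(value);
            }
        }
        return result;
    }
}
```

Compiling with the compiler's source and target level set to 1.5 is one way to
catch accidental use of newer syntax, although it does not catch newer library
methods such as String.isEmpty().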


IDE: I recommend netbeans but you're free to use your own. Just make 
sure that svn (and whatever the hoster will use) and maven are 
integrated in it, this will make your life easier.


A personal recommendation from my student days in the 80ies: don't work 
all night. Such code was usually found to be poor/worthless after I had 
the much needed sleep.


Andreas: correct me if I forgot something.

Tilman



Re: community bonding period

2014-04-23 Thread John Hewson
Great advice!

-- John



[jira] [Closed] (PDFBOX-1241) Better handle of missing offset at the end of a file

2014-04-23 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-1241.
---

   Resolution: Fixed
Fix Version/s: 1.8.3

I tested with old versions; it failed in every version before 1.8.3. Since 
1.8.3 has already been released, I assume I should close it rather than just 
set it to resolved.

 Better handle of missing offset at the end of a file
 

 Key: PDFBOX-1241
 URL: https://issues.apache.org/jira/browse/PDFBOX-1241
 Project: PDFBox
  Issue Type: Improvement
  Components: Parsing, Text extraction
Affects Versions: 1.6.0
 Environment: All platforms affected
Reporter: Ernst Eibensteiner
 Fix For: 1.8.3

 Attachments: On the Insert tab.pdf




[jira] [Resolved] (PDFBOX-2039) Class PDDocument should implement java.io.Closeable

2014-04-23 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2039.
-

Resolution: Fixed
  Assignee: Tilman Hausherr

Done in rev 1589459 and 1589467 in the trunk and rev 1589465 in the 1.8 branch. 
Thanks for pointing me to this!

 Class PDDocument should implement java.io.Closeable
 ---

 Key: PDFBOX-2039
 URL: https://issues.apache.org/jira/browse/PDFBOX-2039
 Project: PDFBox
  Issue Type: Improvement
Affects Versions: 1.8.4, 1.8.5, 2.0.0
Reporter: Andrei Solntsev
Assignee: Tilman Hausherr
Priority: Minor
 Fix For: 1.8.5, 2.0.0

   Original Estimate: 1h
  Remaining Estimate: 1h



Re: community bonding period

2014-04-23 Thread DImuthu Upeksha
Hi Tilman,
Thanks for the information. That helped me a lot. I'll work accordingly.





-- 
Regards

W.Dimuthu Upeksha
Undergraduate

Department of Computer Science And Engineering

University of Moratuwa, Sri Lanka


[jira] [Comment Edited] (PDFBOX-2041) Convert PDF to Image (Strange Color)

2014-04-23 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978953#comment-13978953
 ] 

Tilman Hausherr edited comment on PDFBOX-2041 at 4/23/14 9:49 PM:
--

1. The PDF file is corrupt. A look at it with Notepad++ shows %%EOF and then 
trash characters. Deleting everything after %%EOF makes the file much smaller, 
518 KB instead of 4.85 MB. How did you get that file?!
2. I am able to render it. Your jpg file looks like it was cut off at some point.
3. The 2.0 version isn't able to open it with the non-sequential parser; the 
sequential parser can open it.
4. The 1.8 version renders it fine; the 2.0 version has many glyphs missing, 
maybe a duplicate of PDFBOX-2037. I was able to render it with a modified 2.0 
version that I use for myself.
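
The manual cleanup in point 1 can be approximated in code. The sketch below is
a hypothetical helper (TrimAfterEofDemo is not part of PDFBox) that drops
everything after the last %%EOF marker. It assumes the trailing trash does not
itself contain the bytes %%EOF; if it does, the cut would land in the wrong
place.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TrimAfterEofDemo {
    private static final byte[] MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    // Returns a copy of the data cut just after the last "%%EOF" marker,
    // or the data unchanged if no marker is found.
    static byte[] trimAfterLastEof(byte[] data) {
        for (int i = data.length - MARKER.length; i >= 0; i--) {
            if (matchesAt(data, i)) {
                return Arrays.copyOf(data, i + MARKER.length);
            }
        }
        return data;
    }

    private static boolean matchesAt(byte[] data, int offset) {
        for (int j = 0; j < MARKER.length; j++) {
            if (data[offset + j] != MARKER[j]) {
                return false;
            }
        }
        return true;
    }
}
```

The search runs from the end because an incrementally updated PDF can
legitimately contain several %%EOF markers, and only the last one ends the file.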



 Convert PDF to Image (Strange Color)
 

 Key: PDFBOX-2041
 URL: https://issues.apache.org/jira/browse/PDFBOX-2041
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.4
 Environment: Java(1.7.0_45),   OS (Ubuntu) 
Reporter: ahfei
 Attachments: PDFBOX-2041.pdf, PDFBOX-2041.pdf-1-bad.tif, 
 pdfbox-2041.pdf-1-good.png


 Using PDFBox, tried to convert PDF to Image file  (case1.pdf, case1.jpg)
 Below is code i'm using : 
 BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 200);   
  
 ImageIOUtil.writeImage(image, jpg, imagePath, BufferedImage.TYPE_INT_RGB, 
 200);
 After convert, this image isn't look like pdf. Half page of it become blue 
 and black color. 
 Attached images  PDF : https://www.dropbox.com/sh/jevegc8bh09km1o/5XkVwPUxri 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PDFBOX-2042) ColorSpace without Range

2014-04-23 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979011#comment-13979011
 ] 

Tilman Hausherr commented on PDFBOX-2042:
-

It would be helpful if you provide the code that modifies the content stream. I 
couldn't reproduce the problem by just opening the document and saving it. But 
your theory about PDICCBased.getRangeArray(0) makes sense, that one would 
return an empty range array for the 0 parameter.
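
To make the theory concrete, here is a simplified model of how a getter could
leave an empty /Range entry behind. It is an assumption-laden sketch:
RangeGetterDemo and its methods are hypothetical and do not reproduce
PDICCBased's actual code. The first getter stores a default array when the
entry is missing, so calling it with 0 components writes an empty array into
the dictionary; the second computes defaults without modifying anything.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a stream dictionary; not PDFBox's PDICCBased.
class RangeGetterDemo {
    final Map<String, Object> dictionary = new LinkedHashMap<String, Object>();

    // "Get or create" style: a missing Range entry is created and stored, so
    // merely reading it changes what would be written on save.
    @SuppressWarnings("unchecked")
    List<Float> getRangeCreating(int components) {
        List<Float> range = (List<Float>) dictionary.get("Range");
        if (range == null) {
            range = defaultRange(components);
            dictionary.put("Range", range);
        }
        return range;
    }

    // Side-effect-free alternative: defaults are returned but never stored.
    @SuppressWarnings("unchecked")
    List<Float> getRangeReadOnly(int components) {
        List<Float> range = (List<Float>) dictionary.get("Range");
        return range != null ? range : defaultRange(components);
    }

    // Default of [0 1] per component; empty when components is 0.
    private static List<Float> defaultRange(int components) {
        List<Float> defaults = new ArrayList<Float>();
        for (int i = 0; i < components; i++) {
            defaults.add(0f);
            defaults.add(1f);
        }
        return defaults;
    }
}
```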



[jira] [Commented] (PDFBOX-2041) Convert PDF to Image (Strange Color)

2014-04-23 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979239#comment-13979239
 ] 

Tilman Hausherr commented on PDFBOX-2041:
-

I didn't mean to remove %%EOF, just everything after it.

Could it be your Ubuntu disk is full?

I don't have Ubuntu, so someone else will have to answer that.
