Re: [VOTE] Release Apache PDFBox 1.8.0

2013-03-20 Thread Thomas Chojecki

+1

Looking at the changelog, IMO there should be more releases. ;-)

On 19.03.2013 at 20:07, Andreas Lehmkuehler wrote:

Hi,

a candidate for the PDFBox 1.8.0 release is available at:

http://people.apache.org/~lehmi/pdfbox/1.8.0/

The release candidate is a zip archive of the sources in:

http://svn.apache.org/repos/asf/pdfbox/tags/1.8.0/

The SHA1 checksum of the archive is f97856ac7187e86e460b4f14eb6462787ac89776.


Please vote on releasing this package as Apache PDFBox 1.8.0.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 1.8.0
[ ] -1 Do not release this package because...


Here is my +1

BR
Andreas Lehmkühler
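
For readers who want to check a downloaded archive against the SHA1 value quoted in the vote mail, the comparison could be sketched with the JDK's MessageDigest. This is only an illustration: the class name Sha1Check and its methods are hypothetical, not part of PDFBox or the Apache release tooling, and the whole file is read into memory for brevity.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of PDFBox or the Apache release tooling.
class Sha1Check {

    /** Returns the lowercase hex SHA-1 digest of the given bytes. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Compares a file's digest with the expected hex value, ignoring case and surrounding whitespace. */
    static boolean matches(Path file, String expectedHex) {
        try {
            // Reads the whole file at once; fine for a sketch, wasteful for very large archives.
            return sha1Hex(Files.readAllBytes(file)).equalsIgnoreCase(expectedHex.trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 2) {
            // args[0]: path to the downloaded archive, args[1]: checksum from the vote mail
            System.out.println(matches(Paths.get(args[0]), args[1]) ? "checksum OK" : "checksum MISMATCH");
        } else {
            // Without arguments, just print the digest of a small test input.
            System.out.println(sha1Hex("abc".getBytes(StandardCharsets.US_ASCII)));
        }
    }
}
```

Run with the archive path and the checksum from the mail as the two arguments; any local file name used here is the reader's own choice.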


Re: [VOTE] Release Apache PDFBox 1.8.0

2013-03-20 Thread Maruan Sahyoun
+1 

PDFBOX-1335 can be set to closed too.

Thank you Andreas for setting up the release!

Maruan Sahyoun




Re: [VOTE] Release Apache PDFBox 1.8.0

2013-03-20 Thread Glen Peterson
+1

Works great for me as a drop-in replacement for 1.7.  Nice work!




-- 
Glen K. Peterson
(828) 393-0081


Re: What's wrong with this font ?

2013-03-20 Thread Andreas Lehmkühler
Hi,


"Sébastien Dailly" wrote on 20 March 2013 at 11:45:
> Hello,
>
> I've got a problem while reading the attached document. (It has been
> deflated and anonymised, text has been removed, and characters shuffled.)
>
> The text extraction works fine with some PDF readers (I tried with
> Acrobat and Evince), but the text read by pdfbox is not the expected
> one, as if pdfbox were using a wrong font description for reading the
> text: instead of
>
>
> > 60CO L4PU7L
> > 03D4 DR DVGWEWNER5L STLERC
> > MLIPHOAP6 AE0TE
>
> I've got
>
> > UvIKGMuK6RuN0TN
> > 0 E4RREDRRRElPéNéOND5vRRrTvNDp
> > 60pMRRRv4KS7v
>
>
> I'm using pdfbox 1.6.0 for that.
Please update to a more recent version such as 1.7.1, or wait a few more days,
as the release process for the all-new 1.8.0 version just started yesterday.

> Is the document invalid? What can I do to read the document correctly?
If the issue still persists after upgrading to a more recent version, create an
issue on JIRA [1] and attach the PDF in question to it.

P.S.: Ensure that you are correctly subscribed to the mailing list [2];
otherwise you won't get any answers.

> Thanks !
>
> --
> Sébastien Dailly
> +33 1 56 29 78 67
> ELETTERMAIL

BR
Andreas Lehmkühler
[1] https://issues.apache.org/jira/browse/PDFBOX
[2] http://pdfbox.apache.org/mail-lists.html


Re: What's wrong with this font ?

2013-03-20 Thread Maruan Sahyoun
Hi,

Using the latest version of pdfbox (1.7.1), this is what I got:

MLIPHOAP6 AE0TE
03D4  DR   DVGWEWNER5L  STLERC
60CO   L4PU7L

Please give it a try.

Maruan Sahyoun





Re: What's wrong with this font ?

2013-03-20 Thread Sébastien Dailly

On 20/03/2013 at 11:57, Maruan Sahyoun wrote:

Hi,

using the latest version of pdfbox (1.7.1) that's what I got

MLIPHOAP6 AE0TE
03D4  DR   DVGWEWNER5L  STLERC
60CO   L4PU7L

Please give it a try.



Thanks for answering so quickly.

Sorry for the noise, I should have started with the latest pdfbox version. 
I'll upgrade and run some tests with the new library.



--
Sébastien Dailly
+33 1 56 29 78 67
ELETTERMAIL


[jira] [Commented] (PDFBOX-1176) Watermark

2013-03-20 Thread MH (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607598#comment-13607598
 ] 

MH commented on PDFBOX-1176:


Well, it would be at least a big step forward if adding a watermark text 
worked. I tried PDPageContentStream.drawString() to get at least some kind 
of workaround. (The examples I have found that add an image via 
PDFContent.addImage() are no option for us.) But the first problem is that this 
text overlaps the content and is always black, even though I set 
cs.setStrokingColor(Color.RED). My code:

---
final PDPageContentStream cs = new PDPageContentStream(doc, page, true, false,
        true); // last parameter = resetContext
try {

    // simple text above the page:
    cs.beginText();
    cs.setStrokingColor(fontColor); // doesn't work
    cs.setFont(font, fontSize);
    cs.moveTo(10, 10);
    //cs.moveTextPositionByAmount(10, 10);
    //cs.setTextTranslation((pageSize.getWidth() / 2.0) - (stringWidth / 2.0),
    //        (pageSize.getHeight() / 2.0) - (fontSize / 2.0));
    cs.setTextRotation((double) 0.2, (pageSize.getWidth() / 2.0) - (stringWidth / 2.0),
            (pageSize.getHeight() / 2.0) - (fontSize / 2.0));
    //cs.setTextScaling(10.0, 10.0, (pageSize.getWidth() / 2.0) - (stringWidth / 2.0),
    //        (pageSize.getHeight() / 2.0) - (fontSize / 2.0));
    cs.setStrokingColor(Color.BLUE); // doesn't work
    cs.drawString(text);
    cs.endText();
} finally {
    if (cs != null) {
        cs.close();
    }
}
---

> Watermark
> ---------
>
>                 Key: PDFBOX-1176
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1176
>             Project: PDFBox
>          Issue Type: Wish
>            Reporter: Rubesh MX
>              Labels: Watermark
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I am checking whether watermarks can be added to a PDF document and removed 
> again in the same way. So far I could not find any option to do that with 
> PDFBox; it would be good to have an option to add and remove a watermark.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PDFBOX-1176) Watermark

2013-03-20 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607654#comment-13607654
 ] 

Maruan Sahyoun commented on PDFBOX-1176:


Try this:

    PDDocument document = PDDocument.load(  );
    PDPage page = (PDPage) document.getDocumentCatalog().getAllPages().get(0);

    // The transparency/opacity of graphic objects cannot be set directly on
    // the drawing commands; it needs to be set in a graphics state, which
    // becomes part of the resources.

    /* Set up the graphics state */

    // Define a new extended graphics state
    PDExtendedGraphicsState extendedGraphicsState = new PDExtendedGraphicsState();
    // Set the transparency/opacity
    extendedGraphicsState.setNonStrokingAlphaConstant(0.5f);
    // Get the page resources.
    PDResources resources = page.findResources();

    // Get the defined graphics states.
    Map graphicsStateDictionary = resources.getGraphicsStates();

    graphicsStateDictionary.put("TransparentState", extendedGraphicsState);
    resources.setGraphicsStates(graphicsStateDictionary);

    /* End of setup */

    PDFont font = PDType1Font.HELVETICA;

    // Now we can reference the state definition before doing the drawing
    PDPageContentStream contentStream = new PDPageContentStream(document, page, true, true);
    contentStream.appendRawCommands("/TransparentState gs\n");
    contentStream.setNonStrokingColor(Color.yellow);
    contentStream.beginText();
    contentStream.setFont( PDType1Font.HELVETICA, 72 );
    contentStream.moveTextPositionByAmount( 10, 10 );
    contentStream.setTextRotation(1,100,100);
    contentStream.drawString( "Watermark" );
    contentStream.endText();
    contentStream.close();

    document.save("watermark.pdf");



[jira] [Commented] (PDFBOX-1176) Watermark

2013-03-20 Thread MH (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607704#comment-13607704
 ] 

MH commented on PDFBOX-1176:


This leads to a NullPointerException at 

graphicsStateDictionary.put("TransparentState", extendedGraphicsState);

because resources.getGraphicsStates(); returns null!



[jira] [Commented] (PDFBOX-1176) Watermark

2013-03-20 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607720#comment-13607720
 ] 

Maruan Sahyoun commented on PDFBOX-1176:


Well, I omitted the checks for simplicity. This is not production-strength code, but 
it should illustrate a potential approach!

    if (graphicsStateDictionary == null) {
        graphicsStateDictionary = new TreeMap();
    }

More checks are needed, e.g. page.findResources() could return null.

And if it's the same watermark on every page, doing an overlay/underlay would 
be better, as the 'object' is then defined only once and reused. Look at the 
OverlayPDF command line tool to see how this can be done.
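
The null check can also be folded into a small helper, so that the caller always gets a usable map back. The following is a minimal sketch using plain JDK types only: the class NullSafeStates and the method orEmpty are hypothetical names, and Object stands in for PDExtendedGraphicsState so that the snippet runs without PDFBox on the classpath.

```java
import java.util.Map;
import java.util.TreeMap;

// Stand-in sketch: plain JDK types only, no PDFBox classes involved.
class NullSafeStates {

    /** Returns the given map, or a new empty TreeMap if the argument is null. */
    static <V> Map<String, V> orEmpty(Map<String, V> maybeNull) {
        return maybeNull != null ? maybeNull : new TreeMap<String, V>();
    }

    public static void main(String[] args) {
        // Simulates a getter that returned null, as reported in this thread.
        Map<String, Object> fromResources = null;
        Map<String, Object> states = orEmpty(fromResources);
        states.put("TransparentState", new Object()); // no NullPointerException here
        System.out.println(states.keySet());
    }
}
```

In real code the resulting map would still have to be handed back to the resources, as the sample in this thread does with setGraphicsStates().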





[jira] [Comment Edited] (PDFBOX-1176) Watermark

2013-03-20 Thread MH (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607704#comment-13607704
 ] 

MH edited comment on PDFBOX-1176 at 3/20/13 3:31 PM:
-

This leads to a NullPointerException at 

graphicsStateDictionary.put("TransparentState", extendedGraphicsState);

because resources.getGraphicsStates(); returns null! I worked around this via

    Map graphicsStateDictionary =
            resources.getGraphicsStates(); // returns null !!!
    if (graphicsStateDictionary != null) {
        graphicsStateDictionary.put("TransparentState", extendedGraphicsState);
        resources.setGraphicsStates(graphicsStateDictionary);
    } else {
        Map m = new HashMap<>();
        m.put("TransparentState", extendedGraphicsState);
        resources.setGraphicsStates(m);
    }

and this seems to work: the text is drawn "behind", but still in black; 
cs.setStrokingColor(Color.RED) does nothing! 

  was (Author: mhilpert):
This leads to a NullPointerException at 

graphicsStateDictionary.put("TransparentState", extendedGraphicsState);

because resources.getGraphicsStates(); returns null!
  


[jira] [Commented] (PDFBOX-1176) Watermark

2013-03-20 Thread MH (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607722#comment-13607722
 ] 

MH commented on PDFBOX-1176:


Overlay doesn't work for my PDFs: UnsupportedOperationException: Layout pages 
with COSArray currently not supported.

So, the only problem left is the font color.



[jira] [Commented] (PDFBOX-1176) Watermark

2013-03-20 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607728#comment-13607728
 ] 

Maruan Sahyoun commented on PDFBOX-1176:


As you might have seen in the sample code, you need to use 
contentStream.setNonStrokingColor(Color.yellow);
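
As background on why the non-stroking colour is the relevant one: in PDF, text is painted as a fill by default (text rendering mode 0), and fills use the non-stroking colour, which the "rg" operator sets for RGB. The uppercase "RG" operator sets the stroking colour, which only affects outlines. The sketch below just builds such operators as a string to make the difference visible; the class TextColorOps and its method are hypothetical and do not use PDFBox.

```java
import java.util.Locale;

// Hypothetical illustration: builds raw PDF content-stream operators as text.
class TextColorOps {

    /**
     * Operators that set the fill (non-stroking) RGB colour and show a string.
     * The text is inserted as-is; parentheses and backslashes are not escaped.
     */
    static String filledText(double r, double g, double b, String fontName, int size, String text) {
        return String.format(Locale.ROOT,
                "BT\n/%s %d Tf\n%.1f %.1f %.1f rg\n10 10 Td\n(%s) Tj\nET\n",
                fontName, size, r, g, b, text);
    }

    public static void main(String[] args) {
        // Yellow fill: lowercase "rg" is the non-stroking colour. Uppercase "RG"
        // would only change stroked outlines, which plain text does not use by default.
        System.out.print(filledText(1, 1, 0, "F1", 72, "Watermark"));
    }
}
```

The font resource name F1 is an assumption for the example; a real page defines its own resource names.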



[jira] [Commented] (PDFBOX-1176) Watermark

2013-03-20 Thread MH (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607765#comment-13607765
 ] 

MH commented on PDFBOX-1176:


setNonStrokingColor() ... how intuitive! 

So, the visual output is like a watermark, but it's transparent text on each 
page. Better than nothing. I wonder if the same can be done by drawing the text to 
an "underlay"?



Re: [jira] [Commented] (PDFBOX-1176) Watermark

2013-03-20 Thread Maruan Sahyoun
Can we move the discussion to the us...@pdfbox.apache.org mailing list?

Maruan Sahyoun



[jira] [Created] (PDFBOX-1544) Not able to loadNonSeq document larger than 2GB

2013-03-20 Thread Pierre Huttin (JIRA)
Pierre Huttin created PDFBOX-1544:
-

             Summary: Not able to loadNonSeq document larger than 2GB
                 Key: PDFBOX-1544
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1544
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing, PDModel
    Affects Versions: 1.7.1
            Reporter: Pierre Huttin


When I try to open a document larger than 2GB (I have tested with a 21GB 
document) using the method PDDocument.loadNonSeq(), the PDFParser gives me 
the following error:

Exception in thread "main" java.io.IOException: Error: Expected an integer type, actual='22580639698'
    at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1608)
    at org.apache.pdfbox.pdfparser.PDFParser.parseStartXref(PDFParser.java:677)
    at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:237)
    at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:574)
    at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1124)
    at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1107)

The problem seems to come from BaseParser, which tries to return an int.
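
The value in the error message, 22580639698, is larger than Integer.MAX_VALUE (2147483647), which is consistent with the reporter's guess. The following sketch reproduces only that numeric limit with the JDK's own parsing methods. It is not the PDFBox parser, and the class name XrefOffsetDemo is made up for the example.

```java
// Illustration only: this is not the PDFBox parser, just the JDK's number parsing.
class XrefOffsetDemo {

    /** True if the decimal text fits into a 32-bit int. */
    static boolean fitsInInt(String text) {
        try {
            Integer.parseInt(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String offset = "22580639698"; // value taken from the error message in the report
        System.out.println("fits in int: " + fitsInInt(offset));
        System.out.println("parsed as long: " + Long.parseLong(offset));
        System.out.println("Integer.MAX_VALUE: " + Integer.MAX_VALUE);
    }
}
```

A 64-bit long holds the offset without trouble, so reading file offsets as long is one plausible direction for a fix; whether that is how PDFBox addresses the issue is not stated in this thread.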



[jira] [Updated] (PDFBOX-1545) ReplaceString fails to replace text, however RemoveText or TextExtraction works fine

2013-03-20 Thread MartinV (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MartinV updated PDFBOX-1545:


Priority: Major  (was: Minor)

> ReplaceString fails to replace text, however RemoveText or TextExtraction 
> works fine
> 
>
> Key: PDFBOX-1545
> URL: https://issues.apache.org/jira/browse/PDFBOX-1545
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.7.1
> Environment: ubuntu 32bit, Java 6
>Reporter: MartinV
>  Labels: patch
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>


[jira] [Created] (PDFBOX-1545) ReplaceString fails to replace text, however RemoveText or TextExtraction works fine

2013-03-20 Thread MartinV (JIRA)
MartinV created PDFBOX-1545:
---

             Summary: ReplaceString fails to replace text, however RemoveText or TextExtraction works fine
                 Key: PDFBOX-1545
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1545
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel
    Affects Versions: 1.7.1
         Environment: ubuntu 32bit, Java 6
            Reporter: MartinV
            Priority: Minor


org.apache.pdfbox.examples.pdmodel.ReplaceString does not replace any strings in 
this PDF:

https://docs.google.com/file/d/0B4SxNalgkoJ3VjRDTEN0VER6WGc/edit?usp=sharing
(anyone with the link can view and download it...)

As I found while iterating over the "Tj" and "TJ" operations:
     COSString previous = (COSString)tokens.get( j-1 );
     String string = previous.getString();
Those strings are either empty or of length 2 (whitespace only), so they 
cannot be replaced.

I tried this on version 1.7.1 and then downloaded the latest code from SVN (today); 
both versions showed the same behaviour. Is my PDF special in any way, or which 
objects should be explored next? I tried another two PDFs downloaded from 
Google Drive and both had the same issue (maybe Google formats PDFs in a special 
way?).

I am surprised that RemoveText works fine on this PDF and that text extraction 
also gives me good results, so there must be a way... Thank you

PS: I don't mind fixing the bug on my own, but I do not have any significant 
knowledge of the internal PDF structure. Hints welcome.




Re: [VOTE] Release Apache PDFBox 1.8.0

2013-03-20 Thread Guillaume Bailleul
+1

I tested preflight with my PDF/A set, it works as expected.

Guillaume



[jira] [Updated] (PDFBOX-1545) ReplaceString fails to replace text, however RemoveText or TextExtraction works fine

2013-03-20 Thread MartinV (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MartinV updated PDFBOX-1545:


Description: 
org.apache.pdfbox.examples.pdmodel.ReplaceString does not replace any strings in 
this PDF:

https://docs.google.com/file/d/0B4SxNalgkoJ3VjRDTEN0VER6WGc/edit?usp=sharing
(anyone with the link can view and download it...)

As I found while iterating over the "Tj" and "TJ" operations:
     COSString previous = (COSString)tokens.get( j-1 );
     String string = previous.getString();
Those strings are either empty or of length 2 (whitespace only) ... I 
would expect to get some separate groups of words from my PDF.

I tried this on version 1.7.1 and then downloaded the latest code from SVN (today); 
both versions showed the same behaviour. Is my PDF special in any way, or which 
objects should be explored next? I tried another two PDFs downloaded from 
Google Drive and both had the same issue (maybe Google formats PDFs in a special 
way?).

I am surprised that RemoveText works fine on this PDF and that text extraction 
also gives me good results, so there must be a way... Thank you

PS: I don't mind fixing the bug on my own, but I do not have any significant 
knowledge of the internal PDF structure. Hints welcome.




[jira] [Updated] (PDFBOX-1335) ArrayIndexOutOfBoundsException while loading ttf font

2013-03-20 Thread Maruan Sahyoun (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maruan Sahyoun updated PDFBOX-1335:
---

Fix Version/s: 1.8.0

> ArrayIndexOutOfBoundsException while loading ttf font
> -----------------------------------------------------
>
>                 Key: PDFBOX-1335
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1335
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.7.0
>            Reporter: Mirek Hankus
>             Fix For: 1.8.0
>
>
> While loading a TTF font I'm getting the exception below. The font is 
> OpenSans-Regular.ttf from http://www.google.com/webfonts
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 931
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadDescriptorDictionary(PDTrueTypeFont.java:360)
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadTTF(PDTrueTypeFont.java:166)
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadTTF(PDTrueTypeFont.java:142)
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadTTF(PDTrueTypeFont.java:129)



[jira] [Resolved] (PDFBOX-1335) ArrayIndexOutOfBoundsException while loading ttf font

2013-03-20 Thread Maruan Sahyoun (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maruan Sahyoun resolved PDFBOX-1335.


Resolution: Fixed

Testing 

    PDDocument document = new PDDocument();
    PDFont font = PDTrueTypeFont.loadTTF(document, "OpenSans-Regular.ttf" );
    System.out.println(font.getFontDescriptor().getFontFamily());

with pdfbox-1.7.0, I was able to replicate the error message.

Doing the same with pdfbox-1.8.0, the error message is gone and the font family 
is correctly printed as 'Open Sans'.

Closing the issue.
