PDFBox 3.0.1 renderer fails on certain files

2023-12-15 Thread John Lussmyer
I have a customer that uses a LOT of PDF files.  They currently have 2 
files that are failing when we try to render them.
The same files can be viewed with Acrobat Reader or Foxit PDF with no 
errors reported.


From Acrobat Reader file info:
PDF Producer: PDFOut V3.8 – build 201 – Oct 28 2022
PDF Version: 1.6 (Acrobat 7.x)

The stacktrace makes me suspect that the file has an error in its image 
compression data - which other readers somehow ignore.


Any suggestions?

This is the exception trace from PDFBox 3.0.1

java.io.IOException: negative array index: -1 near offset 1
   at org.apache.pdfbox.filter.LZWFilter.checkIndexBounds(LZWFilter.java:136) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.filter.LZWFilter.doLZWDecode(LZWFilter.java:110) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.filter.LZWFilter.decode(LZWFilter.java:70) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.filter.Filter.decode(Filter.java:96) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.filter.Filter.decode(Filter.java:238) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:73) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:172) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:166) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:188) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.pdmodel.common.PDStream.toByteArray(PDStream.java:407) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.pdmodel.common.function.PDFunctionType4.<init>(PDFunctionType4.java:51) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.pdmodel.common.function.PDFunction.create(PDFunction.java:143) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.pdmodel.graphics.color.PDDeviceN.<init>(PDDeviceN.java:93) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.pdmodel.PDResources.getColorSpace(PDResources.java:223) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.pdmodel.PDResources.getColorSpace(PDResources.java:193) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColorSpace.process(SetNonStrokingColorSpace.java:56) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:892) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:530) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:505) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:152) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:282) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:330) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:247) ~[pdfbox-3.0.1.jar:3.0.1]
   at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:233) ~[pdfbox-3.0.1.jar:3.0.1]
   at com.metrixsoftware.preview.PDFBoxRenderer.render(PDFBoxRenderer.java:79) [bin/:?]




-
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org



Re: Odd OCG error

2023-11-22 Thread John Lussmyer
Thanks, that really helps.  Since we are too close to release to try a 
newer PDFBox jar,
I just added this little bit of code to our system so these PDFs will 
work (the if statement before creating the "PDOptionalContentGroup").



    if (!dict.getItem(COSName.TYPE).equals(COSName.OCG)) {
        dict.setItem(COSName.TYPE, COSName.OCG);
    }
    PDOptionalContentGroup grp = new PDOptionalContentGroup(dict);


On 11/21/2023 10:52 PM, Andreas Lehmkühler wrote:


Am 21.11.23 um 21:26 schrieb John Lussmyer:

Ugh, formatting mess.
For more info, this is the "addOCGs:OCG" log line just before the 
error message:


10:53:09.765 [etrix SwingWorker[0]] DEBUG ImposedPDFEngine - addOCGs: 
OCG 
COSDictionary{COSName{Name}:COSObject{COSNull{}};COSName{Type}:COSObject{COSName{OCG}};}
The value for the type is an indirect object. Usually such values are 
direct objects. The type check fails as it expects a direct object as 
the type value.
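
The difference between a direct and an indirect type value can be sketched with plain Java stand-ins. Everything here is hypothetical: `Ref` and the two check methods are illustration only and are not PDFBox classes or PDFBox's actual check. The sketch only shows why a comparison that expects a direct value fails when the same value sits behind a reference.

```java
import java.util.HashMap;
import java.util.Map;

final class IndirectTypeDemo {
    // Hypothetical stand-in for an indirect object reference; not a PDFBox class.
    static final class Ref {
        final Object target;

        Ref(Object target) {
            this.target = target;
        }
    }

    // Strict check: passes only when the Type entry is stored directly.
    static boolean strictIsOcg(Map<String, Object> dict) {
        return "OCG".equals(dict.get("Type"));
    }

    // Lenient check: follows one level of indirection before comparing.
    static boolean lenientIsOcg(Map<String, Object> dict) {
        Object value = dict.get("Type");
        if (value instanceof Ref) {
            value = ((Ref) value).target;
        }
        return "OCG".equals(value);
    }

    public static void main(String[] args) {
        Map<String, Object> dict = new HashMap<>();
        dict.put("Type", new Ref("OCG")); // type stored behind a reference
        System.out.println("strict:  " + strictIsOcg(dict));
        System.out.println("lenient: " + lenientIsOcg(dict));
    }
}
```

Overwriting the entry with a direct value, as the workaround in this thread does, makes the strict form of the check pass as well.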







Re: Odd OCG error

2023-11-21 Thread John Lussmyer

Ugh, formatting mess.
For more info, this is the "addOCGs:OCG" log line just before the error 
message:


10:53:09.765 [etrix SwingWorker[0]] DEBUG ImposedPDFEngine - addOCGs: 
OCG 
COSDictionary{COSName{Name}:COSObject{COSNull{}};COSName{Type}:COSObject{COSName{OCG}};}


On 11/21/2023 10:56 AM, John Lussmyer wrote:
I'm using PDFBox 3.0.0 to combine some PDF files.  One of the files 
uses an Optional Content Group.
Note that this code has been working just fine for many other files 
both with and without OCG's.


For this file, I get this exception:

java.lang.IllegalArgumentException: Provided dictionary is not of type 'COSName{OCG}'
    at org.apache.pdfbox.pdmodel.graphics.optionalcontent.PDOptionalContentGroup.<init>(PDOptionalContentGroup.java:48) ~[pdfbox-3.0.0.jar:3.0.0]


Code:

    if (obj instanceof COSDictionary) {
        COSDictionary dict = (COSDictionary) obj;
        COSName dType = dict.getCOSName(COSName.TYPE);
        if (dType == null) {
            continue;
        }
        if (dType.equals(COSName.OCG)) {
            log.debug("addOCGs: OCG {}", dict);
            PDOptionalContentGroup grp = new PDOptionalContentGroup(dict);
            ocProps.addGroup(grp);
            ocProps.setGroupEnabled(grp, layersON.contains(grp.getName()));
            changed = true;
        }
    }

 It's failing on the "new PDOptionalContentGroup(dict)" call.
Any ideas on why?


Odd OCG error

2023-11-21 Thread John Lussmyer
I'm using PDFBox 3.0.0 to combine some PDF files.  One of the files uses 
an Optional Content Group.
Note that this code has been working just fine for many other files both 
with and without OCG's.


For this file, I get this exception:

java.lang.IllegalArgumentException: Provided dictionary is not of type 'COSName{OCG}'
    at org.apache.pdfbox.pdmodel.graphics.optionalcontent.PDOptionalContentGroup.<init>(PDOptionalContentGroup.java:48) ~[pdfbox-3.0.0.jar:3.0.0]


Code:

    if (obj instanceof COSDictionary) {
        COSDictionary dict = (COSDictionary) obj;
        COSName dType = dict.getCOSName(COSName.TYPE);
        if (dType == null) {
            continue;
        }
        if (dType.equals(COSName.OCG)) {
            log.debug("addOCGs: OCG {}", dict);
            PDOptionalContentGroup grp = new PDOptionalContentGroup(dict);
            ocProps.addGroup(grp);
            ocProps.setGroupEnabled(grp, layersON.contains(grp.getName()));
            changed = true;
        }
    }

 It's failing on the "new PDOptionalContentGroup(dict)" call.
Any ideas on why?


Re: PDF 2.0, PDF/A-4 support

2023-11-08 Thread John Lussmyer

On 11/8/2023 5:28 PM, Peter Wyatt wrote:

I would think supporting the following PDF 2.0 features is highly relevant, 
given that other implementations are already generating PDF 2.0 files today 
(see https://pdfa.org/supporting-pdf20/)


A bunch of useful suggestions elided..

What I REALLY REALLY need is support for Overprint and Knockout when 
Rendering to an image. I run into too many PDF's that are unrecognizable 
when generating an image due to this.


Re: Looking for a Debugger that can show which incremental save an object belongs to

2023-10-06 Thread John Lussmyer

I doubt there is a way.
It's most likely that the signing code makes a MD5 checksum (or similar) 
of the file when it is signed.
If the file is changed, checking the signing will re-calculate the 
checksum and find that it is different.  There isn't any info on what 
changed, just that SOMETHING changed.
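
A minimal sketch of that idea, assuming SHA-256 as the "or similar" digest. The class and method names are made up for illustration, and this is not how any particular signing library is implemented. It only shows that comparing digests answers "did something change?" and nothing more.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestCheckDemo {
    // Digest of the bytes as they were when "signed" (SHA-256 is an assumption).
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // True when the current bytes still produce the stored digest.
    static boolean unchanged(byte[] digestAtSigning, byte[] currentBytes) {
        return MessageDigest.isEqual(digestAtSigning, sha256(currentBytes));
    }

    public static void main(String[] args) {
        byte[] original = "1 0 obj << /Type /Catalog >> endobj".getBytes(StandardCharsets.US_ASCII);
        byte[] stored = sha256(original);
        byte[] edited = "1 0 obj << /Type /Catalog  >> endobj".getBytes(StandardCharsets.US_ASCII);
        System.out.println("original unchanged: " + unchanged(stored, original));
        System.out.println("edited unchanged:   " + unchanged(stored, edited));
    }
}
```

The boolean result is all the check yields; locating the changed object still needs a separate comparison of the revisions.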


On 10/6/2023 8:50 PM, Tilman Hausherr wrote:

On 06.10.2023 19:50, Marc Kaufman wrote:
I find myself debugging PDF files where Acrobat claims "Document has 
been altered or corrupted since it was signed." I would dearly love 
to see which objects belong to the last xref (color code is OK). Has 
anyone added that feature to PDF Debugger, or know where I can find 
one? Just comparing revisions is not enough, since sometimes the 
"changed" object is identical to the same object in the previous 
revision. 


I don't know of any. I research such questions the hard way, with 
NOTEPAD++.







Re: how to replace MemoryUsageSetting.setupMixed(100mb) ?

2023-10-05 Thread John Lussmyer
Thanks, that does help.  Having an example means I'll find the relevant 
classes to use MUCH faster.


On 10/5/2023 3:07 PM, Pados Attila wrote:

I am using something like this:

PDDocument a1doc = Loader.loadPDF(new
RandomAccessReadBuffer(resourceAsStream), () -> new
ScratchFile(MemoryUsageSetting.setupMixed(100)));

(I use it with tempFileOnly, but the rest are the same)

On Thu, Oct 5, 2023 at 9:50 PM John Lussmyer  wrote:

I'm trying to update to the latest PDFBox 3.0.0.
The code was using a call to
loadPDF(file, MemoryUsageSetting.setupMixed(MB100)); // 100 MB

I see that that no longer exists, but the only mention of it doesn't
seem to provide any info on how to configure an equivalent replacement?

Any suggestions?










how to replace MemoryUsageSetting.setupMixed(100mb) ?

2023-10-05 Thread John Lussmyer

I'm trying to update to the latest PDFBox 3.0.0.
The code was using a call to 
loadPDF(file, MemoryUsageSetting.setupMixed(MB100)); // 100 MB


I see that that no longer exists, but the only mention of it doesn't 
seem to provide any info on how to configure an equivalent replacement?


Any suggestions?





RE: Optional Content Groups

2023-01-04 Thread John Lussmyer
Ah, thanks.  I hadn't noticed the "show internal structure" choice.
The tool I used to use just did that normally.  (I thought things looked a bit 
odd...)

From: Tilman Hausherr 
Sent: Wednesday, January 4, 2023 10:29 AM
To: users@pdfbox.apache.org
Subject: Re: Optional Content Groups

On 04.01.2023 19:22, John Lussmyer wrote:

I have a pdf with several Optional Content groups.

I can find their definitions in the Page/Resources/Properties dictionary, but I 
don't see how they are enabled or disabled.

Where is that controlled?

This is below the document root, use PDFDebugger to look at it (first click 
"view", "show internal structure"). To learn more, you'll have to read the PDF 
specification, although some can be understood by looking at the structure 
below. This is from the file at 
https://issues.apache.org/jira/browse/PDFBOX-5524 which has two "off" OCGs

[Two screenshots were attached to the original message: image001.png, image002.png]

Tilman

Confidentiality notice: This message may contain confidential information. It 
is intended only for the person to whom it is addressed. If you are not that 
person, you should not use this message. We request that you notify us by 
replying to this message, and then delete all copies including any contained in 
your reply. Thank you.


Optional Content Groups

2023-01-04 Thread John Lussmyer
I have a pdf with several Optional Content groups.
I can find their definitions in the Page/Resources/Properties dictionary, but I 
don't see how they are enabled or disabled.
Where is that controlled?


Re: Possible bug with FunctionType3?

2022-06-16 Thread John Lussmyer
I was able to get ahold of the customer's PDF file - but it (of course) works 
just FINE for me on my system.
I have logs showing multiple identical failures for the customer - and lots of 
other files succeeding.
I'd really like to test your possible fix - but first I have to figure out how 
to reproduce the problem.

On Tue Jun 14 21:06:02 PDT 2022 thaush...@t-online.de said:
>Am 15.06.2022 um 05:42 schrieb Tilman Hausherr:
>> float[] functionResult = function.eval(functionValues);
>>
>> eval is an abstract method, but I don't see how any of its
>> implementation would return null :-(   (but I just woke up)
>
>oops, the return of eval() is irrelevant here.
>
>Anyway, I fixed the possible bug below in
>https://issues.apache.org/jira/browse/PDFBOX-5459 , try a snapshot in an
>hour or two
>
>https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.0-SNAPSHOT/


--

Tigers prowl and Dragons soar in my dreams...


Possible bug with FunctionType3?

2022-06-14 Thread John Lussmyer
We are using PDFBox to render various PDF files in our product.
One customer is having issues due to PDFBox throwing a NullPointerException 
when certain files are rendered. (No, I don't have copies of the files - yet)
Any ideas on what could cause this?

java.lang.NullPointerException: null
    at org.apache.pdfbox.pdmodel.common.function.PDFunctionType3.eval(PDFunctionType3.java:123) ~[pdfbox.jar:?]
    at org.apache.pdfbox.pdmodel.graphics.shading.PDShading.evalFunction(PDShading.java:410) ~[pdfbox.jar:?]
    at org.apache.pdfbox.pdmodel.graphics.shading.PDShading.evalFunction(PDShading.java:393) ~[pdfbox.jar:?]
    at org.apache.pdfbox.pdmodel.graphics.shading.AxialShadingContext.calcColorTable(AxialShadingContext.java:151) ~[pdfbox.jar:?]
    at org.apache.pdfbox.pdmodel.graphics.shading.AxialShadingContext.<init>(AxialShadingContext.java:128) ~[pdfbox.jar:?]
    at org.apache.pdfbox.pdmodel.graphics.shading.AxialShadingPaint.createContext(AxialShadingPaint.java:62) ~[pdfbox.jar:?]
    at sun.java2d.pipe.AlphaPaintPipe.startSequence(Unknown Source) ~[?:?]
    at sun.java2d.pipe.SpanShapeRenderer$Composite.startSequence(Unknown Source) ~[?:?]
    at sun.java2d.pipe.SpanShapeRenderer.renderSpans(Unknown Source) ~[?:?]
    at sun.java2d.pipe.SpanShapeRenderer.fill(Unknown Source) ~[?:?]
    at sun.java2d.pipe.ValidatePipe.fill(Unknown Source) ~[?:?]
    at sun.java2d.SunGraphics2D.fill(Unknown Source) ~[?:?]
    at org.apache.pdfbox.rendering.PageDrawer.shadingFill(PageDrawer.java:1234) ~[pdfbox.jar:?]



I believe the version we are using is the 3.0.0-alpha2.



Possible PDFBox bug?

2022-03-17 Thread John Lussmyer
We have an app that can generate multi-page PDF Files.  We recently ran into a 
problem where the library we were using would keep ALL the pages in  memory.  
For a quick workaround we have it write out single-page PDF files, then use 
PDFBox to combine them.

We recently found a bug in the way that the pages get modified when combined 
into a single PDF.
When we generate the pages, sometimes the MediaBox starts at negative 
coordinates.  When PDFBox adds that page to a document, it offsets it by that 
negative amount - which moves the page content up and to the right.

Our page combining code looks like this.

    try (PDDocument doc = new PDDocument(MemoryUsageSetting.setupTempFileOnly())) {
        for (File pagFile : srcPages) {
            log.debug("make: page {}", pagFile.getAbsolutePath());
            PDPage page = new PDPage();
            doc.addPage(page);

            try (PDPageContentStream contents = new PDPageContentStream(doc, page)) {

                try (PDDocument sourceDoc = Loader.loadPDF(pagFile, MemoryUsageSetting.setupTempFileOnly())) {
                    PDPage srcPage = sourceDoc.getPage(0);

                    page.setUserUnit(srcPage.getUserUnit());
                    page.setMediaBox(srcPage.getMediaBox());
                    page.setCropBox(srcPage.getCropBox());
                    page.setTrimBox(srcPage.getTrimBox());

                    // Create a Form XObject from the source document using LayerUtility
                    LayerUtility layerUtility = new LayerUtility(doc);
                    PDFormXObject form = layerUtility.importPageAsForm(sourceDoc, 0);
                    // draw the full form
                    contents.drawForm(form);
                }
            }
        }

        doc.save(outPDF);
    }

The original Page pdf has a TrimBox[0,0,1296,864], MediaBox[-72,-72,1368,936]
The page in the PDFBox combined output has the same TrimBox and MediaBox, BUT 
the /Form1 it uses to place the contents has a BBox[-72,-72,1368,936] and a 
Matrix[1,0,0,1,72,72].
I'm not sure why it's adding a Matrix to offset the content.
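
The reported numbers can be checked with a little arithmetic, sketched here with java.awt.geom alone. Reading the Matrix as compensation for a box that does not start at 0,0 is an assumption, not something confirmed in this thread. The class and method names are made up for illustration.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

final class FormMatrixDemo {
    // A PDF matrix [a b c d e f] maps (x, y) to (a*x + c*y + e, b*x + d*y + f),
    // which is the same argument order AffineTransform's six-value constructor uses.
    static AffineTransform fromPdfMatrix(double a, double b, double c,
                                         double d, double e, double f) {
        return new AffineTransform(a, b, c, d, e, f);
    }

    static Point2D map(AffineTransform at, double x, double y) {
        return at.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        // Matrix[1,0,0,1,72,72] as found on /Form1.
        AffineTransform formMatrix = fromPdfMatrix(1, 0, 0, 1, 72, 72);

        // Lower-left corner of BBox[-72,-72,1368,936] and of TrimBox[0,0,1296,864].
        System.out.println("BBox corner    -> " + map(formMatrix, -72, -72));
        System.out.println("TrimBox corner -> " + map(formMatrix, 0, 0));

        // Translating by the MediaBox lower-left (-72,-72) cancels the offset.
        AffineTransform compensated = new AffineTransform(formMatrix);
        compensated.translate(-72, -72);
        System.out.println("identity after compensation: " + compensated.isIdentity());
    }
}
```

If that reading holds, drawing the form under an extra translation by the MediaBox lower-left corner would put the content back where the source page had it. That is a possible workaround to try, not a verified fix.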


Problem with text extraction

2022-01-23 Thread John Lussmyer
On Sun Jan 23 10:02:08 PST 2022 rc...@pobox.com said:
>I am using PDFBox's PDFTextStripper.getText() for a particular kind of
>PDF file generated by a government agency, and the text I'm getting does
>not match that displayed by Acrobat Reader for the same files. The
>getText() calls occasionally get characters Reader does not display, and
>in one case getText() gets an "O" instead of the "U" displayed by
>Reader. I would like to know if there's some way I can get same text as
>Reader displays.

Have you checked for embedded Fonts in the PDF?  It's quite possible to have 
fonts where the code for "A" is NOT the same as the ASCII "A".
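
A minimal sketch of what that means for extraction, using a made-up code-to-Unicode table. The table contents and the class name are hypothetical and are not taken from any real font. The same bytes read as ASCII give one string, while the font's own mapping gives another.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class CustomEncodingDemo {
    // Decodes single-byte character codes through a font-specific table.
    // Codes missing from the table become U+FFFD (replacement character).
    static String decode(byte[] codes, Map<Integer, String> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (byte b : codes) {
            String mapped = toUnicode.get(b & 0xFF);
            sb.append(mapped != null ? mapped : "\uFFFD");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical embedded font: code 0x41 draws "U", code 0x42 draws "S".
        Map<Integer, String> table = new HashMap<>();
        table.put(0x41, "U");
        table.put(0x42, "S");

        byte[] codes = {0x41, 0x42};
        System.out.println("as ASCII:       " + new String(codes, StandardCharsets.US_ASCII));
        System.out.println("via font table: " + decode(codes, table));
    }
}
```

A viewer draws the glyphs the font assigns to each code, so it can show the right letters even when the text recovered from the codes is different.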


--

Worlds only All Electric F-250 truck! 
http://john.casadelgato.com/Electric-Vehicles/1995-Ford-F-250




Re: memory requirements when merging PDF files?

2022-01-07 Thread John Lussmyer
On Fri Jan 07 08:55:38 PST 2022 ke...@trumpetinc.com said:
>If you use the temporary file memory storage, it should be possible to work
>with very large files.

Thanks, I was hoping there was some way to deal with this case.

I just ran a quick test, generating a 2000 page PDF by placing a 1 page PDF on 
each output page.
Using LayerUtility & PDFormXObject, as the real usage will involve placing 
multiple small PDFs on a large page, for many large pages.
The 1 page PDF was 291K, the resulting 2000 page PDF was 168MB.
(I was doing gc() just before reporting the usage.)
Doing it all in memory:
7m 38s, and peaked at 424MB in use. 
With setupTempFileOnly on the output document:
7m 1s, 292MB.



--

Try my Sensible Email package!  https://sourceforge.net/projects/sensibleemail/


memory requirements when merging PDF files?

2022-01-06 Thread John Lussmyer
I have a need to merge a couple thousand PDF's into one humongous PDF.
The old tool we use for PDF manipulation runs out of memory as it builds the 
result PDF in memory, and only writes it out when done.

Can PDFBox do something more like streaming the output as it's built?  or even 
not load all the source pdf content streams until needed for output?



Re: Rending text in thumbnail images

2021-09-13 Thread John Lussmyer
On Thu Sep 09 10:10:52 PDT 2021 thaush...@t-online.de said:
>In theory one could make separate rendering hints for fonts and for
>ordinary vectors, but that would be messy and hard to understand. (And
>who knows whether it will work for your file)
>
>I recommend that you try doing this yourself by downloading the source
>code and changing PageDrawer and put some hard-coded modifications.
>Search for "graphics.".

I was able to find a bit of time to take a look at this.
I experimented in PageDrawer.drawGlyph and found that I get pretty close to the 
old renderer if I changed the RenderingMode to STROKE instead of FILL.
As far as I can tell, the FILL comes from the PDF interpreting code.  For my 
use of generating tiny thumbnails, I added the following code.  Not sure if it 
would ever be useful to anyone else.  If it might, I'd need to clean it up a 
bit and do whatever is needed to fit with the project style and conventions.

-- I added this to my PageDrawer.java

    public static class BoxKey extends RenderingHints.Key {

        public static final BoxKey KEY_TEXTHINT = new BoxKey(1984);

        private BoxKey(final int privatekey) {
            super(privatekey);
        }

        // Accept only Strings that name a RenderingMode constant.
        @Override
        public boolean isCompatibleValue(final Object val) {
            if (!(val instanceof String)) {
                return false;
            }
            try {
                RenderingMode.valueOf((String) val);
                return true;
            } catch (IllegalArgumentException e) {
                return false;
            }
        }

    }

    private RenderingMode textRenderModeHint = null;

-- then in the constructor, added

    if (renderingHints.containsKey(BoxKey.KEY_TEXTHINT)) {
        textRenderModeHint = RenderingMode.valueOf((String) renderingHints.get(BoxKey.KEY_TEXTHINT));
    }

-- and in drawGlyph, changed renderingMode to be:

    RenderingMode renderingMode = (textRenderModeHint != null) ?
            textRenderModeHint : state.getTextState().getRenderingMode();

-- and finally, where I actually use the renderer, I added:

    hintlist.put(BoxKey.KEY_TEXTHINT, "STROKE");
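
For anyone who wants to try the custom-hint idea without patching PDFBox first, here is a self-contained sketch that runs on the JDK alone. `Mode` is a stand-in for PDFBox's RenderingMode, and `TextHintKey` and `resolve` are hypothetical names. Only java.awt.RenderingHints and its Key class are real API.

```java
import java.awt.RenderingHints;

final class TextHintDemo {
    // Stand-in for PDFBox's RenderingMode so the sketch has no dependencies.
    enum Mode { FILL, STROKE }

    static final class TextHintKey extends RenderingHints.Key {
        static final TextHintKey KEY_TEXTHINT = new TextHintKey(1984);

        private TextHintKey(int privateKey) {
            super(privateKey);
        }

        // Only Strings naming a Mode constant are valid values for this key.
        @Override
        public boolean isCompatibleValue(Object val) {
            if (!(val instanceof String)) {
                return false;
            }
            try {
                Mode.valueOf((String) val);
                return true;
            } catch (IllegalArgumentException e) {
                return false;
            }
        }
    }

    // Uses the hint when present, otherwise the mode the PDF asked for.
    static Mode resolve(RenderingHints hints, Mode fromPdf) {
        Object hint = hints.get(TextHintKey.KEY_TEXTHINT);
        return hint != null ? Mode.valueOf((String) hint) : fromPdf;
    }

    public static void main(String[] args) {
        RenderingHints hints = new RenderingHints(null);
        System.out.println("no hint:   " + resolve(hints, Mode.FILL));
        hints.put(TextHintKey.KEY_TEXTHINT, "STROKE");
        System.out.println("with hint: " + resolve(hints, Mode.FILL));
    }
}
```

RenderingHints.put rejects a value the key reports as incompatible, so a misspelled mode name fails when the hint is set rather than later during drawing.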



--

Try my Sensible Email package!  https://sourceforge.net/projects/sensibleemail/


Re: Rending text in thumbnail images

2021-09-09 Thread John Lussmyer
On Wed Sep 08 20:31:47 PDT 2021 thaush...@t-online.de said:
>Ooops, you didn't mention that you turned antialiasing off. The image
>looks as if interpolation was also turned off. If you set rendering
>hints you always have to set all the hints you need. Here's the default:
>
>    private RenderingHints createDefaultRenderingHints(Graphics2D graphics)
>    {
>    RenderingHints r = new RenderingHints(null);
>    r.put(RenderingHints.KEY_INTERPOLATION, isBitonal(graphics) ?
>RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR :
>    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
>    r.put(RenderingHints.KEY_RENDERING,
>RenderingHints.VALUE_RENDER_QUALITY);
>    r.put(RenderingHints.KEY_ANTIALIASING, isBitonal(graphics) ?
>RenderingHints.VALUE_ANTIALIAS_OFF :
>RenderingHints.VALUE_ANTIALIAS_ON);
>    return r;
>    }

So, setting one Rendering Hint discards all default values?  What does it use 
for those others then?

Just tried with this set:
    hintlist.put(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_OFF);
    hintlist.put(RenderingHints.KEY_TEXT_ANTIALIASING, RenderingHints.VALUE_TEXT_ANTIALIAS_OFF);
    hintlist.put(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BICUBIC);
    hintlist.put(RenderingHints.KEY_RENDERING, RenderingHints.VALUE_RENDER_QUALITY);
    hintlist.put(RenderingHints.KEY_FRACTIONALMETRICS, RenderingHints.VALUE_FRACTIONALMETRICS_ON);

I also tried a variation with VALUE_INTERPOLATION_NEAREST_NEIGHBOR.

No change.  Still looks like random pixels scattered on the page.
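
One way to follow the "set all the hints you need" advice is to start from a complete set and override single entries. The sketch below copies only the non-bitonal branch of the defaults quoted above. The class and method names are made up, and treating a caller-supplied set as a full replacement for the defaults is taken from the quoted reply rather than verified here.

```java
import java.awt.RenderingHints;

final class HintDefaultsDemo {
    // Non-bitonal defaults as quoted in the reply above.
    static RenderingHints defaults() {
        RenderingHints r = new RenderingHints(null);
        r.put(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        r.put(RenderingHints.KEY_RENDERING, RenderingHints.VALUE_RENDER_QUALITY);
        r.put(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
        return r;
    }

    // Full default set plus one changed or added entry.
    static RenderingHints withOverride(RenderingHints.Key key, Object value) {
        RenderingHints r = defaults();
        r.put(key, value);
        return r;
    }

    public static void main(String[] args) {
        RenderingHints hints = withOverride(
                RenderingHints.KEY_TEXT_ANTIALIASING,
                RenderingHints.VALUE_TEXT_ANTIALIAS_OFF);
        System.out.println("hints in the set: " + hints.size());
    }
}
```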


--

Tigers prowl and Dragons soar in my dreams...


Re: Rending text in thumbnail images

2021-09-08 Thread John Lussmyer
You can see the difference in the results in the images I used in a 
StackOverflow posting. (Before I remembered this email list.)

https://stackoverflow.com/questions/69107975/how-to-improve-text-contrast-in-pdfbox-rendered-thumbnail-image

On Wed Sep 08 12:40:14 PDT 2021 cou...@casadelgato.com said:
>On Wed Sep 08 12:20:59 PDT 2021 thaush...@t-online.de said:
>>Am 08.09.2021 um 21:16 schrieb John Lussmyer:
>>> Ok, just tried that - no change.
>>>
>>> We are currently trying PDFBox 3.0.0-RC1 - is that a problem?
>>
>>No, this is excellent; there will be a new release of another beta in a
>>few days. You can try it here
>>
>>https://dist.apache.org/repos/dist/dev/pdfbox/3.0.0-alpha2/
>>
>>Is the PDFBox code creating the same image size as the old code? What
>>code are you using? Can you share a file and the result (upload to
>>sharehoster)?
>
>Image size is within a couple of pixels.  Same format (ARGB).  The older 
>renderer's image file is about twice as many bytes.
>Can't really share the original as it has some names and account numbers.
>The main feature of the PDF is that it is pure text, no images.  8.5 x 11.
>
>--
>
>Try my Sensible Email package!  https://sourceforge.net/projects/sensibleemail/


--

Worlds only All Electric F-250 truck! 
http://john.casadelgato.com/Electric-Vehicles/1995-Ford-F-250




Re: Rending text in thumbnail images

2021-09-08 Thread John Lussmyer
On Wed Sep 08 12:20:59 PDT 2021 thaush...@t-online.de said:
>Am 08.09.2021 um 21:16 schrieb John Lussmyer:
>> Ok, just tried that - no change.
>>
>> We are currently trying PDFBox 3.0.0-RC1 - is that a problem?
>
>No, this is excellent; there will be a new release of another beta in a
>few days. You can try it here
>
>https://dist.apache.org/repos/dist/dev/pdfbox/3.0.0-alpha2/
>
>Is the PDFBox code creating the same image size as the old code? What
>code are you using? Can you share a file and the result (upload to
>sharehoster)?

Image size is within a couple of pixels.  Same format (ARGB).  The older 
renderer's image file is about twice as many bytes.
Can't really share the original as it has some names and account numbers.
The main feature of the PDF is that it is pure text, no images.  8.5 x 11.


--

Try my Sensible Email package!  https://sourceforge.net/projects/sensibleemail/


Re: Rending text in thumbnail images

2021-09-08 Thread John Lussmyer
Ok, just tried that - no change.

We are currently trying PDFBox 3.0.0-RC1 - is that a problem?

On Wed Sep 08 11:55:56 PDT 2021 thaush...@t-online.de said:
>The default rendering is high quality over speed, although there is one
>obscure option you could try,
>PDFRenderer.setImageDownscalingOptimizationThreshold(0). And make sure
>you're using the latest version (2.0.24).
>No there is no option to prioritize text pixels.


--

Bobcats and Cougars, oh my!  http://john.casadelgato.com/Pets


Rending text in thumbnail images

2021-09-08 Thread John Lussmyer
We are trying to switch to using PDFBox to create the thumbnail images of PDF 
Pages in our application. (The older product we currently use fails on OS 11).

I'm running into a problem if there is text on the page: the thumbnail image 
makes it hard to make any sense at all of the text. (Yes, these are thumbnails, 
and don't need to be readable - but should be recognizable.)

The older renderer created thumbnails where, while the text was not legible, it 
was definitely visible, and you could tell the shapes of the words.

PDFBox creates thumbnails where the text is more of a scattering of random 
pixels, and you have to guess that it might be text.

Is there any way when generating the raster to have it treat Text pixels as the 
higher priority for the color of a pixel?  It seems to be only allowing the 
text color to control the pixel if the entire pixel is part of the character.




Re: Parsing huge PDF (400Mb, 2700 pages)

2019-11-14 Thread John Lussmyer
On Thu Nov 14 08:32:20 PST 2019 sahy...@fileaffairs.de said:
>well - PDF is not really easily streamable as
>
>- it's organized as a random access format
>- the reference table about the objects forming the PDF is at the end of the 
>file so you have to read the last parts first and
>then move back

While the PDF file itself can't be usefully streamed, the CONTENT streams can 
be.
Those are usually 99.99% of the file size.
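
The "read the last parts first" step can be sketched with the JDK alone. The class and method names below are made up, the 1024-byte tail length is an arbitrary assumption, and real parsers (PDFBox included) do far more validation than this.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

final class StartXrefDemo {
    // Reads up to maxBytes from the end of the file.
    static byte[] readTail(Path pdf, int maxBytes) {
        try (RandomAccessFile raf = new RandomAccessFile(pdf.toFile(), "r")) {
            int len = (int) Math.min(maxBytes, raf.length());
            byte[] tail = new byte[len];
            raf.seek(raf.length() - len);
            raf.readFully(tail);
            return tail;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the byte offset that follows the last "startxref" keyword,
    // or -1 when the keyword or its number is missing.
    static long parseStartXref(byte[] tail) {
        String text = new String(tail, StandardCharsets.ISO_8859_1);
        int idx = text.lastIndexOf("startxref");
        if (idx < 0) {
            return -1;
        }
        int pos = idx + "startxref".length();
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        int start = pos;
        while (pos < text.length() && text.charAt(pos) >= '0' && text.charAt(pos) <= '9') {
            pos++;
        }
        return pos == start ? -1 : Long.parseLong(text.substring(start, pos));
    }

    public static void main(String[] args) {
        if (args.length == 1) {
            System.out.println(parseStartXref(readTail(Path.of(args[0]), 1024)));
        }
    }
}
```

Only after this offset is known can a reader jump back to the cross-reference data and from there to individual objects, which is why the file as a whole has to be seekable.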


--

Try my Sensible Email package!  https://sourceforge.net/projects/sensibleemail/


Re: Exact PDF text - add it back as an annotation

2019-11-01 Thread John Lussmyer
On Tue Oct 29 21:59:57 PDT 2019 thaush...@t-online.de said:
>IIRC tesseract can do this. Not as annotation, but as invisible font.

As far as I can tell, it does it the same way that other programs do.
It's added to the content stream, mixed with all the commands for positioning, 
font size, etc...  Words are often broken up.
I'm looking for something that just embeds the plain text, with NO markup.


--

Bobcats and Cougars, oh my!  http://john.casadelgato.com/Pets


Exact PDF text - add it back as an annotation

2019-10-29 Thread John Lussmyer
I have a bunch of PDF files that have had an OCR package run against them.
The problem is that it adds the text to the normal Page content, and tries to 
position the recognized text at the location in the image it was found.
So the text is mixed with lots of positioning, etc..  information.
I'd like to extract all the text as a block of text, and just add it all as a 
single item.  Probably an annotation.
There are lots of tools to extract text from a PDF - but they are all web 
based, or use a GUI to do one file at a time.
I want to just run this against a directory full of PDF's and have it do all of 
them.

Anyone know of such a tool?  Have one written?



Re: PDFRendering

2016-06-27 Thread John Lussmyer
On Mon Jun 27 14:34:03 PDT 2016 j...@jahewson.com said:
>Right, and if it was a leak then system.gc would not have fixed it. 

That is only SOMETIMES true.  I've run into "memory leaks" where the leak was 
uncleared references to objects.  So the old objects just hung around forever.
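
A minimal sketch of that kind of "leak", with made-up names. Nothing here is taken from PDFBox. Every buffer stays reachable through a static list, so a garbage collection cannot reclaim any of them until the list itself is cleared.

```java
import java.util.ArrayList;
import java.util.List;

final class LeakDemo {
    // Long-lived collection that is only ever added to: the "uncleared references".
    private static final List<byte[]> HISTORY = new ArrayList<>();

    // Simulates one unit of work that forgets to release its buffer.
    static void renderPage(int sizeBytes) {
        HISTORY.add(new byte[sizeBytes]);
    }

    static int retainedBuffers() {
        return HISTORY.size();
    }

    // Dropping the references is what finally makes the buffers collectable.
    static void clearHistory() {
        HISTORY.clear();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            renderPage(1024);
        }
        System.gc(); // the buffers are still reachable, so they cannot be reclaimed
        System.out.println("still retained: " + retainedBuffers());
        clearHistory();
        System.out.println("after clearing: " + retainedBuffers());
    }
}
```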


--

Bobcats and Cougars, oh my!  http://john.casadelgato.com/Pets


Re: Call java file from PDF

2016-02-14 Thread John Lussmyer
So, install a Java app that starts when the system boots.
It listens on a port, then your PDF can use http://localhost:789/didlysquat to 
submit the form to your java app.
Your java app doesn't need to be a full web server, just listen on the 
appropriate port and catch the data.
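
A rough sketch of such a listener, using the HttpServer class that ships with the JDK. The class name, the 204/404 responses, and the choice to bind to 127.0.0.1 only are assumptions for illustration. The port and path are parameters and nothing here is specific to PDF forms.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

final class FormCatcher {
    // Starts a loopback-only listener; pass port 0 to let the OS pick a free one.
    static HttpServer start(int port, String path, Consumer<String> onSubmit) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
            server.createContext(path, exchange -> {
                // Ignore anything that is not exactly the expected request.
                boolean expected = "POST".equals(exchange.getRequestMethod())
                        && path.equals(exchange.getRequestURI().getPath());
                if (expected) {
                    try (InputStream in = exchange.getRequestBody()) {
                        onSubmit.accept(new String(in.readAllBytes(), StandardCharsets.UTF_8));
                    }
                }
                exchange.sendResponseHeaders(expected ? 204 : 404, -1); // no response body
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = start(0, "/submit", body -> System.out.println("got: " + body));
        System.out.println("listening on port " + server.getAddress().getPort());
    }
}
```

The form in the PDF would then submit to http://localhost:PORT/submit with whatever port the app was started on.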

On Sun Feb 14 13:34:20 PST 2016 bigal...@gmail.com said:
>Yeah, all the Java runtimes and virtual machines are installed; I tested
>it.
>On 15/02/2016 10:23 am, "Olaf Drümmer"  wrote:
>
>> But you have the rights to install a Java program?
>>
>> Olaf
>>
>> > On 14.02.2016, at 21:40, Al Grant  wrote:
>> >
>> > I would not have the permission rights to install a web server :(
>> > On 15/02/2016 9:27 am, "John Lussmyer"  wrote:
>> >
>> >> On Sun Feb 14 12:15:12 PST 2016 bigal...@gmail.com said:
>> >>> Thank you for both your answers.
>> >>>
>> >>> The html is very appealing,  but what I did not mention is I'm working
>> >>> within a rather rigid IT environment.
>> >>>
>> >>> I won't be able to install a html server. So back to Java executable
>> >> (which
>> >>> I can use) unless there is a better way?
>> >>
>> >> You can't even have your app running on the local machine?
>> >> IT can be your html server.  (Use a non-standard port, and ignore any
>> >> requests that aren't EXACTLY what you are expecting.)


--

Tigers prowl and Dragons soar in my dreams...


Re: Call java file from PDF

2016-02-14 Thread John Lussmyer
On Sun Feb 14 12:15:12 PST 2016 bigal...@gmail.com said:
>Thank you for both your answers.
>
>The html is very appealing,  but what I did not mention is I'm working
>within a rather rigid IT environment.
>
>I won't be able to install a html server. So back to Java executable (which
>I can use) unless there is a better way?

You can't even have your app running on the local machine?
IT can be your html server.  (Use a non-standard port, and ignore any requests 
that aren't EXACTLY what you are expecting.)



--

Bobcats and Cougars, oh my!  http://john.casadelgato.com/Pets


Re: Creating a page from a block of CCITTG42D data?

2015-02-19 Thread John Lussmyer
In this case I'm converting some proprietary image files from a program I wrote 
18 years ago.

On Thu Feb 19 13:41:30 PST 2015 thaush...@t-online.de said:
>Glad it works. Where did you get the raw G4 files from / is this
>something that you think might be useful to many, or was it just
>something unique for you? I'm just wondering if I should add such code
>to the 2.0 or 2.1 version.
>
>Tilman
>
>Am 19.02.2015 um 19:09 schrieb John Lussmyer:
>> On Wed Feb 18 23:34:09 PST 2015 thaush...@t-online.de said:
>>> Assuming you are using 1.8.8, put the ccitt stream into a PDStream
>>> object, then call the PDCcitt constructor with that PDStream.
>>>
>>> PDStream pd = new PDStream(doc, new ByteArrayInputStream(data), true);
>> 
>>
>> Thanks, that worked! (with a few tweaks and typo corrections of course!)


--

Worlds only All Electric F-250 truck! 
http://john.casadelgato.com/Electric-Vehicles/1995-Ford-F-250




Re: Creating a page from a block of CCITTG42D data?

2015-02-19 Thread John Lussmyer
On Wed Feb 18 23:34:09 PST 2015 thaush...@t-online.de said:
>Assuming you are using 1.8.8, put the ccitt stream into a PDStream
>object, then call the PDCcitt constructor with that PDStream.
>
>PDStream pd = new PDStream(doc, new ByteArrayInputStream(data), true);


Thanks, that worked! (with a few tweaks and typo corrections of course!)

--

Try my Sensible Email package!  https://sourceforge.net/projects/sensibleemail/


Creating a page from a block of CCITTG42D data?

2015-02-18 Thread John Lussmyer
So, I have a block of data (byte[]) that represents a scanned image, compressed 
using CCITTG4.
I'm new to PDFBox. (of course)

So far, I haven't been able to figure out how I can create a page that consists 
of just that image.
All the examples want to read the image from a file, and decompress it.
Since I already have it as a compressed block, I'd prefer to just use it as-is.
The last time I did much work with PDFs, I was working directly with the 
dictionaries.  I don't see a simple way of even getting to the Page dictionary 
in PDFBox.
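
For reference, this is roughly what I mean at the dictionary level.  It is a
hand-rolled sketch that does not use PDFBox at all: it wraps the compressed
block, untouched, in an image XObject with /Filter /CCITTFaxDecode and draws
it across a single page.  The class name G4PagePdf is invented, and it
assumes the pixel width, height and scan resolution are known from somewhere
else (raw G4 data carries no header), and that the data uses the default
CCITT settings (no byte alignment, BlackIs1 false).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

final class G4PagePdf {

    /** Wraps already-compressed CCITT Group 4 data in a one-page PDF without re-encoding it. */
    static byte[] build(byte[] g4, int widthPx, int heightPx, int dpi) {
        // Page size in points (1/72 inch), so the scan keeps its physical size.
        long pageW = Math.round(widthPx * 72.0 / dpi);
        long pageH = Math.round(heightPx * 72.0 / dpi);
        // Scale the unit square the image is drawn in up to the full page.
        String content = "q " + pageW + " 0 0 " + pageH + " 0 0 cm /Im0 Do Q";

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int[] offsets = new int[6]; // byte offset of each object, for the xref table
        put(out, "%PDF-1.4\n");
        offsets[1] = out.size();
        put(out, "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n");
        offsets[2] = out.size();
        put(out, "2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n");
        offsets[3] = out.size();
        put(out, "3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 " + pageW + " " + pageH + "]"
                + " /Resources << /XObject << /Im0 4 0 R >> >> /Contents 5 0 R >>\nendobj\n");
        offsets[4] = out.size();
        // K -1 selects pure two-dimensional (Group 4) encoding.
        put(out, "4 0 obj\n<< /Type /XObject /Subtype /Image /Width " + widthPx
                + " /Height " + heightPx + " /ColorSpace /DeviceGray /BitsPerComponent 1"
                + " /Filter /CCITTFaxDecode /DecodeParms << /K -1 /Columns " + widthPx
                + " /Rows " + heightPx + " >> /Length " + g4.length + " >>\nstream\n");
        out.write(g4, 0, g4.length);
        put(out, "\nendstream\nendobj\n");
        offsets[5] = out.size();
        put(out, "5 0 obj\n<< /Length " + content.length() + " >>\nstream\n"
                + content + "\nendstream\nendobj\n");

        int xref = out.size();
        StringBuilder table = new StringBuilder("xref\n0 6\n0000000000 65535 f \n");
        for (int i = 1; i <= 5; i++) {
            // Each xref entry must be exactly 20 bytes long.
            table.append(String.format(Locale.ROOT, "%010d 00000 n \n", offsets[i]));
        }
        put(out, table + "trailer\n<< /Size 6 /Root 1 0 R >>\nstartxref\n" + xref + "\n%%EOF\n");
        return out.toByteArray();
    }

    private static void put(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 5) {
            System.out.println("usage: G4PagePdf <in.g4> <widthPx> <heightPx> <dpi> <out.pdf>");
            return;
        }
        byte[] g4 = Files.readAllBytes(Paths.get(args[0]));
        byte[] pdf = build(g4, Integer.parseInt(args[1]), Integer.parseInt(args[2]),
                Integer.parseInt(args[3]));
        Files.write(Paths.get(args[4]), pdf);
    }
}
```

If the page comes out as a negative image, the data was probably written
with the opposite bit sense; adding /BlackIs1 true to /DecodeParms is the
usual switch for that.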

Anyone have a suggestion on how to do this?

--
Worlds only All Electric F-250 truck! 
http://john.casadelgato.com/Electric-Vehicles/1995-Ford-F-250

