Re: AW: Question about commit "PDFBOX-5660: add warning / exception, as suggested by mkl in SO 78307200"

2024-05-18 Thread Tilman Hausherr
-Original Message- From: Tilman Hausherr Sent: Friday, 17 May 2024 11:37 To: users@pdfbox.apache.org Subject: Re: Question about commit "PDFBOX-5660: add warning / exception, as suggested by mkl in SO 78307200" I've created https://issues.apache.org/jira/browse/P

Re: Question about commit "PDFBOX-5660: add warning / exception, as suggested by mkl in SO 78307200"

2024-05-17 Thread Tilman Hausherr
I've created https://issues.apache.org/jira/browse/PDFBOX-5822 . This isn't in the released versions, correct? Then we have to thank you even more for discovering this. Tilman

Re: Question about commit "PDFBOX-5660: add warning / exception, as suggested by mkl in SO 78307200"

2024-05-17 Thread Tilman Hausherr
Hi, Thanks for finding this. Sadly there were no tests. I'll investigate. Tilman On 17.05.2024 10:25, pascal.schumac...@t-systems.com wrote: Hi, concerning commit: "PDFBOX-5660: add warning / exception, as suggested by mkl in SO 78307200"

Re: Error Log related question

2024-05-15 Thread Tilman Hausherr
Hi, Isn't it possible to do this with commons-logging or with the actual logging that you're using? If you tell us which error messages you get we may be able to tell you what the problem is. Tilman On 15.05.2024 15:28, Tony Pilote wrote: Hello, We have been using PDFBox and would have

Re: Question Regarding Removal of PositionWrapper and TextNormalize Classes

2024-05-14 Thread Tilman Hausherr
wrote: Hi , Do we have any alternative for TextNormalize. As we are using it in our code for a long and we are upgrading to PdfBox 2.0.30. Thanks, Krishna On Tue, 14 May 2024 at 12:42 PM, Tilman Hausherr wrote: Hi, PositionWrapper is still in the code. However it's private. This and the removal

Re: Question Regarding Removal of PositionWrapper and TextNormalize Classes

2024-05-14 Thread Tilman Hausherr
Hi, PositionWrapper is still in the code. However it's private. This and the removal of TextNormalize was done 10 years ago in https://issues.apache.org/jira/browse/PDFBOX-2384 before 2.0.0, so you're a bit late. Nobody can tell you what to do because we don't know why you needed it. Tilman

Re: IllegalStateException are thrown by surrogate pair character.

2024-05-07 Thread Tilman Hausherr
"3.0.3-20240505.072852-59" and got the expected results! I also tried a few other Kanji characters besides "𩸽" and none of them had any problems! I am glad I could contribute :) On Sun, 5 May 2024 at 16:32, Tilman Hausherr wrote: Hello Toshiaki, It's been committed and available as a snapsh

Re: Radio Button not set correctly

2024-05-06 Thread Tilman Hausherr
On 06.05.2024 15:53, Martin Resch wrote: sorry, PDF attached You need to upload to a sharehoster Tilman

Re: IllegalStateException are thrown by surrogate pair character.

2024-05-05 Thread Tilman Hausherr
, Toshiaki Ito wrote: Hi, Tilman. Thank you for checking and correcting the attached code. I look forward to it being committed! On Sun, 5 May 2024 at 2:05, Tilman Hausherr wrote: Hello, I can confirm that your proposed change works, it also passes the "private" tests that aren't in the reposit

Re: IllegalStateException are thrown by surrogate pair character.

2024-05-04 Thread Tilman Hausherr
for tests (ipafont) has the glyph, I have prepared a small test also based on your code. Tilman On 04.05.2024 16:39, Tilman Hausherr wrote: On 04.05.2024 15:21, Toshiaki Ito wrote: By the way, with pdfbox 2.0.31, the same code produces the expected output. Ouch, I can confirm that. I ha

Re: IllegalStateException are thrown by surrogate pair character.

2024-05-04 Thread Tilman Hausherr
On 04.05.2024 15:21, Toshiaki Ito wrote: By the way, with pdfbox 2.0.31, the same code produces the expected output. Ouch, I can confirm that. I have created a new ticket: https://issues.apache.org/jira/browse/PDFBOX-5812 Tilman

Re: IllegalStateException are thrown by surrogate pair character.

2024-05-04 Thread Tilman Hausherr
Hi, Is it this one? 𩸽 According to my understanding of https://www.compart.com/en/unicode/U+29E3D you should use \u29E3D or 𩸽 directly. However I tried this with your font and with MingLiU and MS Mincho and it didn't work either. Is this a very standard glyph? Or something unusual? So I
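For readers following the surrogate pair discussion: U+29E3D lies outside the Basic Multilingual Plane, so a Java String stores it as two UTF-16 code units. A Java `\u` escape takes exactly four hex digits, so one way to write this code point in source is as the pair `\uD867\uDE3D` or via `Character.toChars`. The following is only a minimal sketch to illustrate that; the class and method names are made up and it does not touch PDFBox at all.

```java
class SurrogatePairDemo {
    /** Builds a String from one Unicode code point (BMP or supplementary). */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x29E3D);
        // Number of UTF-16 code units: 2 for a supplementary character.
        System.out.println(s.length());
        // Number of Unicode code points: 1.
        System.out.println(s.codePointCount(0, s.length()));
        // The two surrogates, printed as hex code units.
        System.out.printf("%04X %04X%n", (int) s.charAt(0), (int) s.charAt(1));
    }
}
```

Code that walks a string char by char sees two separate values here, which is the kind of situation where surrogate handling bugs tend to show up.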

Re: possible regression in PDFBox 3.0.2

2024-05-03 Thread Tilman Hausherr
On 03.05.2024 10:40, Kai Keggenhoff wrote: Since we switched to 3.0.2 (from 3.0.0, we skipped 3.0.1) we encountered several PDFs which produce an IOException when saved: Please try with a 3.0.3 snapshot

Re: Problem finding an AcroForm field

2024-05-02 Thread Tilman Hausherr
It's a radio button but without the radio flag?! Tilman On 02.05.2024 12:42, Ulf Dittmer wrote: Yes, that's the one for the "pro Stunde" option. But the one for the "pro Monat" option is missing. They're both connected, in that checking one manually will uncheck the other. But setting *any*

Re: How to remove an image resource from a PDF form

2024-04-26 Thread Tilman Hausherr
tResources( resources );     } And now the PDF doesn't seem to be growing any more. Thanks, Jurgen On Fri, 26 Apr 2024 14:58:26 +0200, Tilman Hausherr wrote: Do you save directly or incrementally? If directly then the old one should be gone. If not, please share the PDF (upload to s

Re: How to remove an image resource from a PDF form

2024-04-26 Thread Tilman Hausherr
Do you save directly or incrementally? If directly then the old one should be gone. If not, please share the PDF (upload to sharehoster) and tell us why you think it's still there. Tilman On 26.04.2024 12:37, Jurgen Doll wrote: Hi I would like to know how to remove an image resource from a

Re: Performance advice

2024-04-06 Thread Tilman Hausherr
Is the image already compressed, e.g. PNG, JPEG and b/w TIFF? Then use the image directly because PDFBox can use these formats without doing a compression, if you use the static methods from PDImageXObject. Or is the image in memory, or from a different format (e.g. color CCITT, GIF)? Then

Re: Blank page generation issue - Unknown code in Huffman RLE stream

2024-04-04 Thread Tilman Hausherr
2.0.19 is 4 years old, why are you using it? Please retry with 2.0.31. I tried and your file works, even though it is broken. Tilman On 04.04.2024 06:13, Himanshu Jain wrote: Hello Team, We are using pdf-box to generate images of each page of the pdf. While generating images we are getting

Re: Text extraction from a certain PDF does not seem to terminate

2024-04-03 Thread Tilman Hausherr
The document has been extracted while I had dinner, so there is no endless loop. I've created https://issues.apache.org/jira/browse/PDFBOX-5799 Tilman On 03.04.2024 18:12, Tilman Hausherr wrote: Rendering page 230 with PDFBox 2.0: 50 seconds Rendering page 230 with PDFBox trunk: 2990 seconds

Re: Text extraction from a certain PDF does not seem to terminate

2024-04-03 Thread Tilman Hausherr
Rendering page 230 with PDFBox 2.0: 50 seconds Rendering page 230 with PDFBox trunk: 2990 seconds Rendering page 231 with PDFBox trunk: 4798 seconds while I write this, page 230 has been extracted, it is now working on page 231 Tilman

Re: Not able to to watermark on PDFs with PDF version 1.7

2024-04-03 Thread Tilman Hausherr
Hi, Please use the 5-parameter constructor of PDPageContentStream. If it still doesn't work, please share the file and the result (upload to sharehoster). There is no version 2.3.1, maybe you meant 2.0.31? Tilman On 03.04.2024 09:26, Palaniappan RM wrote: Hi team,   I am using *pdfbox*

Re: Type 0 font - Text extraction X PDF Debugger

2024-03-25 Thread Tilman Hausherr
Tilman Hausherr wrote: Here they are, remove the XXX https://corpora.tika.apache.org/XXXbase/docs/govdocs1/433/433525.pdf https://corpora.tika.apache.org/XXXbase/docs/commoncrawl3/O2/O226ORR4SMIKRGPWC6PXUYAYMSBB6FVP https://corpora.tika.apache.org/XXXbase/docs/commoncrawl3/R4

Re: Type 0 font - Text extraction X PDF Debugger

2024-03-24 Thread Tilman Hausherr
The extension p1 / p3 means I split these files and used only one page for my own tests. Tilman On 24.03.2024 16:19, Andreas Lehmkühler wrote: On 15.03.24 at 05:35, Tilman Hausherr wrote: You are correct that it's the "fb" parts that are missing. (And some of the other tools you tried al

Re: split a password protected file

2024-03-21 Thread Tilman Hausherr
On 21.03.2024 18:59, Robert Rodini wrote: Does this mean that splitting a password protected PDF effectively disables password protection? On the result files, yes. I've never thought about it. To fix this, we'd need the user and the owner password (only one of the two is needed to

Re: split a password protected file

2024-03-20 Thread Tilman Hausherr
On 20.03.2024 16:24, Robert Rodini wrote: Can PDFSplit split up a password-protected file? It seems that it cannot, but there is no error message. P.S. I am using v. 2.x of PDFBox. I will upgrade soon. According to the usage, it should be able to (although it won't encrypt when saving):

Re: Flatten using PDFBOX3

2024-03-19 Thread Tilman Hausherr
Hi, If this happened with 3.0.0 or 3.0.1 please retry with 3.0.2. If not, then please find a non-confidential file where that happens. Also make sure that src and dst are different files. Tilman On 19.03.2024 15:26, Frédéric Ravetier wrote: Hello, I am trying to Flatten a PDF using

Re: Help with NullPointerException org.apache.io.IOUtils.LOG

2024-03-15 Thread Tilman Hausherr
Searching for the error message I found this in a comment: https://stackoverflow.com/questions/69151291/java-16-modularisation-illegalaccessexception-java-nio-spring-boot --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/jdk.internal.ref=ALL-UNNAMED Tilman On 15.03.2024

Re: AFMParser optimization

2024-03-15 Thread Tilman Hausherr
Hi, Thank you, done. Tilman On 15.03.2024 14:49, Guillaume Maillrd wrote: Hi, During a profiling session of my application, I found something that could interest you. To speed up the AFMParser (50% gain), the "equals" in parseCharMetric should be written in this order (order of top 5

Re: Type 0 font - Text extraction X PDF Debugger

2024-03-15 Thread Tilman Hausherr
"<0100> 256" could be a problem... PS.: The use of "true" was just a quick and dirty way to do a quick test, as the beginbfchar/endbfchar block suggested to me an identity mapping. On Fri, 15 Mar 2024 at 01:35, Tilman Hausherr wrote: You are correct that

Re: Bugfix for FileSystemFontProvider

2024-03-15 Thread Tilman Hausherr
Hi, Yeah, "never happens" is a red flag. That part has been changed to use CRC32: https://svn.apache.org/viewvc/pdfbox/branches/2.0/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/font/FileSystemFontProvider.java?revision=1916176=markup#l923 https://issues.apache.org/jira/browse/PDFBOX-5727

Re: Type 0 font - Text extraction X PDF Debugger

2024-03-14 Thread Tilman Hausherr
think about? PS.: I've read some pieces from ISO 32000-2:2020 but it is quite long. Maybe I'm missing something... I'm sorry if this is the case... On Thu, 14 Mar 2024 at 10:30, Luiz Marcelo Modesto < lmodesto.w...@gmail.com> wrote: Ok! I'll read PDFBOX-5540 and related issues. Thank you v

Re: Type 0 font - Text extraction X PDF Debugger

2024-03-14 Thread Tilman Hausherr
shows some output from PDF Debugger and others. I'm sorry, I sent the pdf file as an attachment in my first message, but I didn't know that it wouldn't work. On Thu, 14 Mar 2024 at 07:16, Tilman Hausherr wrote: Hi, please upload your file to a sharehoster. Tilman On 13

Re: Type 0 font - Text extraction X PDF Debugger

2024-03-14 Thread Tilman Hausherr
Hi, please upload your file to a sharehoster. Tilman On 13.03.2024 20:03, Luiz Marcelo Modesto wrote: Hi everyone,     I'm not sure if this is the same as FAQ "How come I am getting gibberish(G38G43G36G51G5) when extracting text?"...     I'm using PDFBox version 3.0.1 and OpenJDK Runtime

Re: pdfbox 3.0.2 release ?

2024-03-08 Thread Tilman Hausherr
Around the end of next week if there are no last minute surprises. Tilman On 08.03.2024 16:07, Frédéric Ravetier wrote: Hello, Do you have an idea of when 3.0.2 will be released? Have a good day, Fred

Re: OutOfMemoryException in FileSystemFontProvider (pdfbox v2.0.30)

2024-03-07 Thread Tilman Hausherr
Hello Kim, You're welcome, please open a ticket and include your proposed solution. I have approved your registration. (I initially denied it because your text had no details whatsoever) Tilman On 07.03.2024 14:08, Kim Hagedorn wrote: Hello I originally wanted to submit a defect to the

Re: Feature request for filtering TextPosition in PDFTextStripperByArea and PDFTextStripper

2024-03-05 Thread Tilman Hausherr
I think I did something similar in 2018 that you might use, see the FilteredTextStripper class in ExtractText.java . That one only extracts text with angle 0. /**  * TextStripper that only processes glyphs that have angle 0.  */ class FilteredTextStripper extends PDFTextStripper {    

Re: Adding Annotations to Signed PDF Causes Signatures To Appear Invalid

2024-02-27 Thread Tilman Hausherr
Hi, You're using an ordinary save(). The signature will no longer work because the signed file segment has changed. You need to use saveIncremental(). Use the method that takes a list of COSDictionaries. And remove the showPageNo() part, I assume Adobe will not like that because you're

Re: Issue with PDFBox 3.0.0 - Unable to Extract and Add Pages

2024-02-27 Thread Tilman Hausherr
Hi, It's like Fabian said. Btw neither the code here nor the different(!) code in https://stackoverflow.com/questions/78065676/ would enable anybody to reproduce such a bug because it's incomplete. Until we get this fixed, please stay with 2.0.* (2.0.30 is the current version), and also

Re: AW: Importing landscape format and portrait format oriented pages into the same PDF causes PDF corruption

2024-02-23 Thread Tilman Hausherr
On 21.02.2024 16:07, Fabian Zünd SI-Solutions Gmbh wrote: Hello I managed to try it all out with the most current build pdfbox-app-3.0.2-20240221.085334-88.jar The issue persists. Maybe I'm doing the copying of the page completely wrong? Hi, You did nothing wrong. Sadly, this is the problem

Re: How to find coordonnates of word and apply a mask

2024-02-12 Thread Tilman Hausherr
this text. On Mon, 12 Feb 2024 at 19:14, Tilman Hausherr wrote: It depends what you want to get. See the DrawPrintTextLocations.java example which shows several strategies to get the bounding boxes of individual glyphs and draw them on the screen (not in a PDF, so the Y coordinate

Re: How to find coordonnates of word and apply a mask

2024-02-12 Thread Tilman Hausherr
It depends what you want to get. See the DrawPrintTextLocations.java example which shows several strategies to get the bounding boxes of individual glyphs and draw them on the screen (not in a PDF, so the Y coordinate is different). You would have to adjust the "Rectangle2D.Float" code to

Re: Encountered a bug that I cannot solve

2024-02-05 Thread Tilman Hausherr
Hello, Please explain your problem in English and mention what PDFBox version you are using. Apparently it's about text extraction, read this first: https://pdfbox.apache.org/3.0/faq.html#how-come-i-am-getting-gibberish(g38g43g36g51g5)-when-extracting-text%3F Try extracting your test with

Re: JUnit5 Compile Dependency

2024-02-02 Thread Tilman Hausherr
Hi, Sorry about that, this has already been reported and 3.0.2 won't have this problem. https://issues.apache.org/jira/browse/PDFBOX-5722 Tilman On 02.02.2024 15:26, Willy Mwangi wrote: Hello there, We have experienced a bug with version 3.0.1 of PDFBOX whereby it comes with a compile

Re: Loading a PDF using InputStream

2024-02-01 Thread Tilman Hausherr
P.S.: thank you for having investigated and reported this! Tilman On 01.02.2024 16:06, Tilman Hausherr wrote: Oh. I had looked at the trunk and not at 3.0. That was likely a mistake in refactoring. Fixed in https://issues.apache.org/jira/browse/PDFBOX-5757 and you can get a snapshot here

Re: Loading a PDF using InputStream

2024-02-01 Thread Tilman Hausherr
On 01.02.2024 15:25, Lars Juel Jensen wrote: That is weird.. The source file I am looking at for version 3.0.1 does not pass it: --> https://github.com/apache/pdfbox/blob/3.0.1/pdfbox/src/main/java/org/apache/pdfbox/pdfparser/PDFParser.java#L91 On Wed, Jan 31, 2024 at 4:57 PM Tilman Hausherr wr

Re: Modifying the order of AcroForm Fields and/or associated Widget Annotations...

2024-01-31 Thread Tilman Hausherr
On 31.01.2024 16:50, Dwayne Parks wrote: I'll post them on a shared file site and provide the links here, if it would be helpful.  Do you have any recommendations for such a site?  Thanks! In the past I used filedropper.com but it doesn't seem to work anymore. Try google drive if you have a

Re: Loading a PDF using InputStream

2024-01-31 Thread Tilman Hausherr
create a scenario to reproduce this? Preferably without using a container. Tilman On Wed, Jan 31, 2024 at 3:46 PM Tilman Hausherr wrote: On 31.01.2024 14:48, Lars Juel Jensen wrote: This creates another problem for me. I am running PDFBox in a kubernetes cluster on premises with limited

Re: Filling a form advice

2024-01-31 Thread Tilman Hausherr
Hello Nicola, Please upload your PDF to a sharehoster, attachments are removed. showTextWithPositioning is for horizontal positioning of individual glyphs, it is the "way to specify a string with some, I don't know, offset between the chars". (or vertical, if it is a vertical font) it might

Re: pdfbox 3.x, is it recommended to include jai-imageio when I am already using twelvemonkeys?

2024-01-31 Thread Tilman Hausherr
You should use all of these (including the jai-imageio-core which is required for jpeg2000) except the one for tiff. That one isn't needed but you can include it if you are creating TIFF files. It is not needed for decoding CCITT content in PDF files. (However our CCITT encoder / decoder is copied from

Re: Loading a PDF using InputStream

2024-01-31 Thread Tilman Hausherr
then it should work with PDFBox 3. Tilman On Wed, Jan 31, 2024 at 10:10 AM Tilman Hausherr wrote: On 31.01.2024 09:50, Lars Juel Jensen wrote: In PDFBox2 I could do: PDDocument.load(inputStream, MemoryUsageSetting.setupTempFileOnly()) But there is no equivalent to this in PDFBox3. How do I read

Re: Loading a PDF using InputStream

2024-01-31 Thread Tilman Hausherr
On 31.01.2024 09:50, Lars Juel Jensen wrote: In PDFBox2 I could do: PDDocument.load(inputStream, MemoryUsageSetting.setupTempFileOnly()) But there is no equivalent to this in PDFBox3. How do I read a PDF from an inputstream? |Loader.loadPDF(new RandomAccessReadBuffer(inputStream),

Re: Fwd: Help with Incorrect Identity-H Mapping

2024-01-30 Thread Tilman Hausherr
d my issue, that would be amazing, but unfortunately I can't get it to work. On 2024/01/30 16:01:56 Tilman Hausherr wrote: Also try changing the line cs.showText("äöüß"); to String s = "äöüß"; System.out.println(s.length()); cs.showText(s); the output on the c

Re: Fwd: Help with Incorrect Identity-H Mapping

2024-01-30 Thread Tilman Hausherr
Also try changing the line cs.showText("äöüß"); to String s = "äöüß"; System.out.println(s.length()); cs.showText(s); The output on the console should be 4. I suspect your output will be 8 if my theory is correct. Tilman
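The length check suggested above can be reproduced without PDFBox. Below is a minimal sketch of the suspected effect, assuming the source file was saved as UTF-8 and then decoded with a single-byte charset; ISO-8859-1 is used here purely as a stand-in for whatever single-byte encoding the compiler actually assumed, and the class and method names are invented for the illustration.

```java
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    /** Simulates text stored as UTF-8 bytes but decoded as ISO-8859-1. */
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "äöüß" written with escapes, so this file's own encoding cannot interfere.
        String s = "\u00e4\u00f6\u00fc\u00df";
        System.out.println(s.length());            // 4: decoded correctly
        System.out.println(misdecode(s).length()); // 8: each character became two
    }
}
```

Each of the four characters takes two bytes in UTF-8, so the mis-decoded string has eight chars. If that is what happens in a build, passing the matching `-encoding` option to javac is the usual remedy.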

Re: Fwd: Help with Incorrect Identity-H Mapping

2024-01-30 Thread Tilman Hausherr
Hello Gino, Please tell whether it happens with every font or only with that one. And check whether the encoding in the source code is the same passed to the javac compiler. I suspect your file is UTF8 but the java compiler expects a single byte font. It works for me, I just tested it:    

Splitting PDF while keeping document structural information

2024-01-29 Thread Tilman Hausherr
The trunk now supports splitting the structure tree. Please test it and report any problems. https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/4.0.0-SNAPSHOT/ If you're a JIRA user, you can also make your comments here:

Re: The annotations generated by PDFBOX cannot be displayed in the browser, but they can be displayed in adobe pdf reader

2024-01-25 Thread Tilman Hausherr
Hi, Please include more of your code. It does not show how this PDAnnotationFreeText is created, and whether you called *constructAppearances()* on it. Also upload your PDF to a sharehoster, and mention what PDFBox version you're using. Tilman On 26.01.2024 07:39, Tam chilun wrote: Dear

Re: potential issue in fontbox component CmapSubtable

2024-01-17 Thread Tilman Hausherr
Hi, I hope I'm not wrong on this, but if the second element is true (glyphIdToCharacterCode == null) then the third one wouldn't be evaluated, because there's no need. (short circuit evaluation) Look at https://issues.apache.org/jira/browse/PDFBOX-5465 , the stack trace looks just like
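The short-circuit behaviour described above can be checked in isolation. This is a minimal sketch only: the method name `isUnmapped`, its parameters and the exact condition are made up for the illustration and are not the actual CmapSubtable code.

```java
class ShortCircuitDemo {
    /**
     * Hypothetical three-part condition. Because || short-circuits,
     * the third operand (which dereferences the array) is only
     * evaluated when the array is known to be non-null.
     */
    static boolean isUnmapped(int gid, int[] glyphIdToCharacterCode) {
        return gid < 0
                || glyphIdToCharacterCode == null
                || gid >= glyphIdToCharacterCode.length;
    }

    public static void main(String[] args) {
        // No NullPointerException: evaluation stops at the null check.
        System.out.println(isUnmapped(5, null));
        System.out.println(isUnmapped(1, new int[3]));
        System.out.println(isUnmapped(3, new int[3]));
    }
}
```

With the non-short-circuit operator `|` instead of `||`, the first call would throw, because all operands would be evaluated.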

Re: Cannot get overlaypdf working on command line interface

2024-01-15 Thread Tilman Hausherr
Hi, Sorry, it turns out there was a second bug, which has now been fixed. And this time I tested myself and it works. Please test again with the latest snapshot. https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.2-SNAPSHOT/

Re: merging a pre-existing file with a new page

2024-01-10 Thread Tilman Hausherr
Hi, Please retry with 2.0.* (there use PDDocument.load()) and with a snapshot version of 3.0.2 because we fixed bugs related to what you mention: https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.2-SNAPSHOT/ If it doesn't work, please try with the command

Re: FW: PDFBox 3.0.1 Font changes when rendering PDF to Image

2024-01-10 Thread Tilman Hausherr
, 0, img1.getHeight() + offset); g2.dispose(); return newImage; } } From: Tilman Hausherr Sent: Wednesday, January 10, 2024 10:17 AM To: users@pdfbox.apache.org Subject: Re: FW: PDFBox 3.0.1 Font changes when rendering PDF to Image Hi, I teste

Re: FW: PDFBox 3.0.1 Font changes when rendering PDF to Image

2024-01-10 Thread Tilman Hausherr
: A sample PDF file can be seen here: https://www.dropbox.com/scl/fi/w5zgfrqbulungxd4dpq37/MuseTest.pdf?rlkey=jskisldanhoxf3pvcqqy6nk7b=0 -Original Message- From: Tilman Hausherr Sent: Wednesday, January 10, 2024 8:09 AM To: users@pdfbox.apache.org Subject: Re: FW: PDFBox 3.0.1 Font change

Re: java.io.IOException: Unknown dir object c='>' cInt=62 peek='>' peekInt=62

2024-01-10 Thread Tilman Hausherr
Hi, This is a syntax error in the PDF. There should be another token after "/N". Tilman On 10.01.2024 13:19, John, Ines wrote: Hello PdfBox-Team, we have the following problem in our project: When merging documents we get an exception for a certain document. That’s why we updated the

Re: FW: PDFBox 3.0.1 Font changes when rendering PDF to Image

2024-01-10 Thread Tilman Hausherr
Hi, We'd need the PDF file, please upload to a sharehoster. Your attachments (all of them) didn't get through. Also try to use the latest snapshot https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.2-SNAPSHOT/ and look at the log messages. Tilman On

Re: Inquiry on Filling Chinese Characters in AcroForm with PDFBox 3.0.1

2024-01-05 Thread Tilman Hausherr
Hi, I only remember that we always advise to never embed font subsets in AcroForm fields. Your subsetted file doesn't have the actual subset fonts. Does this effect also happen when you don't flatten? And if you save first, then reload and flatten? Tilman On 05.01.2024 08:41, Congwei Ni

Re: Cannot get overlaypdf working on command line interface

2024-01-05 Thread Tilman Hausherr
Hi, The bug I found and fixed ( https://issues.apache.org/jira/browse/PDFBOX-5748 ) is only in the command line interface. Please try with a snapshot build: https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.2-SNAPSHOT/ *(at the bottom)* and tell

Re: Cannot get overlaypdf working on command line interface

2024-01-04 Thread Tilman Hausherr
Sorry, seems I read part 1, 2 and 4 but not part 3. I suspect a bug in OverlayPDF.java that has been there since the end of 2020 (!), but only in 3.0.* and the trunk, "infile" is never assigned. Tilman On 05.01.2024 08:00, Lukas Jans wrote: *Tilman Hausherr* - Thursday, 4 Janu

Re: Cannot get overlaypdf working on command line interface

2024-01-04 Thread Tilman Hausherr
Please use "overlay" instead of "OverlayPDF". This is a documentation bug. (See also the "did you mean" line in the error message) Tilman On 04.01.2024 12:00, Lukas Jans wrote: Hello I am having troubles using the pdfbox command line interface. I have downloaded the pdfbox-app-3.0.1.jar

Re: Importing landscape format and portrait format oriented pages into the same PDF causes PDF corruption

2024-01-03 Thread Tilman Hausherr
Please retry with 3.0.1 and if it still doesn't work, with the current snapshot version, because there have been several bugs related to including "foreign" pages in PDFs. https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.2-SNAPSHOT/ Tilman On 03.01.2024

Re: Splitter creates corrupted PDFs

2023-12-27 Thread Tilman Hausherr
account. regards robert On 2023/12/21 12:08:01 Tilman Hausherr wrote: Hi, I remember your name, you tried to create a JIRA account with the text "submitting a bug", which was a meaningless text unlike your subject now which is a meaningful text. I was able to reproduce the problem and ha

Re: Splitter creates corrupted PDFs

2023-12-21 Thread Tilman Hausherr
Hi, I remember your name, you tried to create a JIRA account with the text "submitting a bug", which was a meaningless text unlike your subject now which is a meaningful text. I was able to reproduce the problem and have created a ticket: https://issues.apache.org/jira/browse/PDFBOX-5742 If

Re: Blank pages when splitting PDF with version 3.0.1

2023-12-19 Thread Tilman Hausherr
Hi, Please retry with the current snapshot: https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.2-SNAPSHOT/ if it doesn't work, please upload your file to a sharehoster that doesn't require login. Tilman On 19.12.2023 15:53, Marco Philipp GRAF wrote:

Re: PDF to PDF/A conversion on java

2023-12-19 Thread Tilman Hausherr
On 19.12.2023 00:24, CowwoC wrote: I'm going to need to do something like this in the near future. Are there any good samples or documentation I can look at for this use case? import java.io.ByteArrayOutputStream; import java.io.File; import java.io.IOException; import java.io.InputStream;

Re: PDF to PDF/A conversion on java

2023-12-16 Thread Tilman Hausherr
On 21.11.2023 11:31, Kirandas vakkil wrote: Hi All, Can you please share if there is any resource on converting EXISTING PDF to PDF/A in java. There are commercial tools for this. PDFBox doesn't offer anything, however you can still do it if there are very few errors and you know how to fix

Re: PDFBox 3.0.1 renderer fails on certain files

2023-12-16 Thread Tilman Hausherr
The file you mention likely has an almost empty stream. The other viewers don't fail, that's the difference. There might also be a different problem (object reference mismatch), so it would be nice to have the file. Despite the LZW compression, the part that fails isn't an image in this stack

Re: Could not load font file

2023-12-14 Thread Tilman Hausherr
Hi, The "SubstFormat" bug is not really important because it doesn't abort, the "Format 14 cmap table" isn't really a bug, there are usually several tables. Please try a snapshot version, the "SubstFormat" bug has been fixed:

Re: Regarding CMap invalid query

2023-12-13 Thread Tilman Hausherr
On 13.12.2023 17:26, Tmy Hub wrote: I have a PDF that uses the Veranda Bold font with Identity-H encoding. We are not able to read that font's text correctly; it shows an invalid CMap. I will attach the PDF file. What do I have to do about that? Please let us know, it would be greatly helpful for us. Yes the

Re: Fetch the background color for text in PDF

2023-12-05 Thread Tilman Hausherr
There is no such thing as "the background color". The background is whatever you have at the area when you're putting out the glyphs. It can be several colors if you're overwriting an image. Tilman On 06.12.2023 03:23, Jeffrey Matthew wrote: Hey Team, I'm new to pdfbox and working on

Re: Font operation takes a long time with 3.0.1

2023-12-05 Thread Tilman Hausherr
➜ ~ grep -i NotoSansKannada .pdfbox.cache *skipexception*|TTF||0|0|0|0|0||/System/Library/Fonts/NotoSansKannada.ttc||1700331239000 Thanks for the quick response, great work! BR Kjetil On Tue, 5 Dec 2023 at 09:55, Tilman Hausherr wrote: Thanks, new snapshot build here: https://repository.

Re: Font operation takes a long time with 3.0.1

2023-12-05 Thread Tilman Hausherr
. Dec 2023 at 05:03, Tilman Hausherr wrote: Please do also post the full (for pdfbox / fontbox) stack trace. I have a theory why it happens, which is that addTrueTypeCollection() does not add the font as "*skipexception*" to the cache file because it's not done in the exception handle

Re: Font operation takes a long time with 3.0.1

2023-12-04 Thread Tilman Hausherr
Please do also post the full (for pdfbox / fontbox) stack trace. I have a theory why it happens, which is that addTrueTypeCollection() does not add the font as "*skipexception*" to the cache file because it's not done in the exception handler. Tilman On 04.12.2023 21:17, Tilma

Re: Font operation takes a long time with 3.0.1

2023-12-04 Thread Tilman Hausherr
Does the stack trace appear at every start? If yes then it's a bug. The intent of the current code is that bad fonts aren't retried. The font cache file should contain a line with "*skipexception*" for that font. Can you look at it for the two font files? I could change SHA512 to CRC32. It
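To make the SHA512 versus CRC32 trade-off mentioned above concrete, here is a minimal sketch using only JDK classes. It is not the FileSystemFontProvider code; the class and method names are invented, and the input is a short test string rather than a font file.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

class ChecksumDemo {
    /** 32-bit CRC of the data, returned as an unsigned value in a long. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    /** 64-byte SHA-512 digest of the data. */
    static byte[] sha512(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-512").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        // Standard CRC-32 check value for this input: cbf43926
        System.out.println(Long.toHexString(crc32(data)));
        System.out.println(sha512(data).length + " bytes");
    }
}
```

CRC32 is much cheaper to compute but is not collision-resistant. For merely detecting that a local font file has changed that is probably acceptable, though this is a general observation and not a statement about how PDFBox weighs it.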

AW: Font operation takes a long time with 3.0.1

2023-12-04 Thread Tilman Hausherr
This should happen only once in 3.0.1, unless you're working with a container without font cache file in the image. SHA512 checksum is done only if the file modification date of a font file has changed, then we check whether the content has changed. Tilman -- Original Message -- From: Kjetil

Re: Odd OCG error

2023-11-22 Thread Tilman Hausherr
Great. The problem mentioned by Andreas will be fixed in the next version. Tilman On 22.11.2023 17:59, John Lussmyer wrote: Thanks, that really helps.  Since we are too close to release to try a newer PDFBox jar, I just added this little bit of code to our system so these PDF's will work. (the

Re: Odd OCG error

2023-11-21 Thread Tilman Hausherr
Please retry with the 3.0.1 snapshot, there were bugs fixed related to combining files. If the bug is still there please create a ticket in JIRA. Tilman On 21.11.2023 19:56, John Lussmyer wrote: I'm using PDFBox 3.0.0 to combine some PDF files.  One of the files uses an Optional Content

Re: Fwd: PDF/A convertion

2023-11-20 Thread Tilman Hausherr
Hi, There isn't. You could write something yourself if the defects are minor and the files are all from the same source. A typical example is PDF scans, then you'd need only to fix the XML, and add an RGB output intent. Tilman On 20.11.2023 14:48, Kirandas vakkil wrote: looped --

Re: 2 errors on PDF Splitting

2023-11-17 Thread Tilman Hausherr
nasty bug. Tilman Thanks! On Fri, 17 Nov 2023 at 05:07, Tilman Hausherr wrote: Hi Joan, There's now a snapshot release with the recent change https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.1-SNAPSHOT/ so please try if it works with that one

Re: 2 errors on PDF Splitting

2023-11-16 Thread Tilman Hausherr
Hi Joan, There's now a snapshot release with the recent change https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.1-SNAPSHOT/ so please try if it works with that one. It worked for me. Tilman On 15.11.2023 10:29, Joan Fisbein wrote: I have 2 errors when

Re: Slight rendering issues Apache FOP document

2023-11-14 Thread Tilman Hausherr
On 14.11.2023 18:46, Tres Finocchiaro wrote: It's mostly in PageDrawer.java, search for things like line width and clip. Attempts to affect the line width are ineffective. For example, there's a function in PageDrawer which ensures that the threshold for lineWidth is at least 0.25. I've changed

Re: PDF 2.0, PDF/A-4 support

2023-11-12 Thread Tilman Hausherr
On 11.11.2023 07:35, Maruan Sahyoun wrote: Let's create tickets for each point and we include them in our release planning. WDYT +1 Tilman Maruan On 11.11.2023 at 05:32, Tilman Hausherr wrote: It turns out that the Colorburn / Colordodge change was done 5 years ago: https

Re: Slight rendering issues Apache FOP document

2023-11-12 Thread Tilman Hausherr
On 11.11.2023 15:50, Tilman Hausherr wrote: Should this be enough for a bug report? I don't know if they'll bother with a file that needs the PDFBox viewer, they'll claim that we're at fault. When I wrote "they" I meant the java folks. This isn't a PDFBox bug. Tilman

Re: Bouncy Castle dependency on Android

2023-11-11 Thread Tilman Hausherr
Hi, You need to ask this on github, we're not doing support for the Android project. The best would be to try. Open an encrypted file and don't include BouncyCastle and see if it works. Our own code does not use BC in StandardSecurityHandler which is why it's optional dependency. It is used

Re: Slight rendering issues Apache FOP document

2023-11-11 Thread Tilman Hausherr
On 09.11.2023 22:30, Tres Finocchiaro wrote: I can reproduce with an example provided with Apache FOP: ./fop -fo examples/fo/basic/border.fo -pdf foo.pdf It seems to use the same trapezoidal vector borders as the originating PDF. When printing to a 4x6 label printer, some of the borders

Re: PDF 2.0, PDF/A-4 support

2023-11-10 Thread Tilman Hausherr
, Tilman Hausherr wrote: Hi Peter, That's a lot... I'll create issues for some of these topics. The negative dash phase thing has been fixed in the latest release : https://issues.apache.org/jira/browse/PDFBOX-5636 Things that might be possible: - UTF-8 - Encryption - Colorburn / Colordodge

Re: Error splitting PDF

2023-11-09 Thread Tilman Hausherr
Joan, Yes but this can be done only within 24 hours and I missed that :-( Can you try to register again now? I assume that the username should now be available. Tilman Thanks! Joan On Thu, 9 Nov 2023 at 05:44, Tilman Hausherr wrote: Hello Joan, Sorry for the rejection, this was a

Re: PDF 2.0, PDF/A-4 support

2023-11-09 Thread Tilman Hausherr
On 09.11.2023 03:39, John Lussmyer wrote: On 11/8/2023 5:28 PM, Peter Wyatt wrote: I would think supporting the following PDF 2.0 features are highly relevant, given that other implementations are already generating PDF 2.0 files today (see https://pdfa.org/supporting-pdf20/) A bunch of

Re: PDF 2.0, PDF/A-4 support

2023-11-09 Thread Tilman Hausherr
isible differences. See also https://pdfa.org/how-to-get-started-with-pdf-2-0/ since reporting a simple PDF version is unlikely to withstand the test of time... Of course I am also biased  - and I'm not a Java expert! -Original Message- From: Tilman Hausherr Sent: Thursday, Novemb

Re: Any pdfbox 2 vs pdfbox 3 memory usage benchmark ?

2023-11-09 Thread Tilman Hausherr
On 09.11.2023 12:26, Olivier Masseau wrote: Hello, I've read in the release notes that pdfbox 3 has better memory usage. But is there somewhere some benchmark that compares pdfbox 2 and pdfbox 3 memory usage in given scenarios ? No... but it should be better by design for most of the classic

Re: Error splitting PDF

2023-11-08 Thread Tilman Hausherr
Hello Joan, Sorry for the rejection, this was a close call, the description didn't mention what happened (a stack overflow). Feel free to register again so you can follow the issue I created https://issues.apache.org/jira/browse/PDFBOX-5712 Tilman On 08.11.2023 21:36, Joan Fisbein wrote:

Re: PDF 2.0, PDF/A-4 support

2023-11-08 Thread Tilman Hausherr
We don't have roadmaps. If you need a PDF 2.0 feature, tell us which one and why. PDF/A-4 isn't a topic because preflight isn't developed further. Use VeraPDF instead. You can create PDF/A-4 files like you can create PDF/A-1b files. Tilman On 08.11.2023 00:15, Gili Tzabari wrote: Hi, I

Re: Little CMS

2023-11-07 Thread Tilman Hausherr
Maybe a JPEG / JPEG2000 within the PDF? Or some XMP data within the PDF? Tilman On 07.11.2023 16:59, Florian Schlittgen wrote: Thanks for your feedback. The Java version I am currently using is corretto-11.0.21, so this is the up-to-date version of Java 11. Is the assumption correct that the
