Re: [iText-questions] Can iText replace images etc.

Leonard Rosenthol Sat, 25 Feb 2006 17:32:50 -0800

At 03:18 PM 2/25/2006, Petter Nyström wrote:

You could use iText to extract the images, though you'd also need a VERY detailed understanding of image handling and color management in order to make sure that the extracted data was in the correct form.
That sounds problematic. I assume I have been wrong in my assumption that the stream data of the PDF holds raw image data in some format, be it JPEG or TIFF or other?


        Yes, you are wrong in that assumption - at least partially.

Image data in PDF is either in JPEG/JFIF format (which can just be written out to a file) - OR it is simply an array of "pixels" in the specified colorspace. So in the latter case (which is probably the more common), you would need to transform the data into something usable in JPEG, TIFF, etc. This may include not only file format, but also colorspace handling since PDF supports 11 colorspaces while JPEG (for example) only does 2.
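
A minimal sketch of the two cases described above, in plain Java with no iText calls. The class and method names (RawPdfImage, isJpegStream, toImage) are hypothetical. It assumes the stream has already been decoded (for example, un-Flated), that samples are 8 bits per component, and that there is no /Decode array, mask, or ICC profile. The CMYK branch is a naive conversion with no color management, so it only hints at the colorspace work involved.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of iText: turns already-decoded PDF image
// samples into a BufferedImage that javax.imageio can write as PNG, JPEG, etc.
class RawPdfImage {

    // A stream whose filter is /DCTDecode already holds JPEG data,
    // so its raw bytes can be saved to a .jpg file unchanged.
    static boolean isJpegStream(String filterName) {
        return "DCTDecode".equals(filterName);
    }

    // Assumes 8 bits per component and one of three device colorspaces.
    static BufferedImage toImage(byte[] samples, int width, int height, String colorSpace) {
        int components;
        switch (colorSpace) {
            case "DeviceGray": components = 1; break;
            case "DeviceRGB":  components = 3; break;
            case "DeviceCMYK": components = 4; break;
            default:
                throw new IllegalArgumentException("Unsupported colorspace: " + colorSpace);
        }
        if (samples.length < width * height * components) {
            throw new IllegalArgumentException("Not enough sample data");
        }
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int offset = (y * width + x) * components;
                image.setRGB(x, y, toRgb(samples, offset, components));
            }
        }
        return image;
    }

    // Converts one pixel starting at 'offset' to a packed 0xRRGGBB value.
    private static int toRgb(byte[] s, int offset, int components) {
        int r, g, b;
        if (components == 1) {
            r = g = b = s[offset] & 0xFF;
        } else if (components == 3) {
            r = s[offset] & 0xFF;
            g = s[offset + 1] & 0xFF;
            b = s[offset + 2] & 0xFF;
        } else {
            // Naive CMYK -> RGB; real output needs proper color management.
            int c = s[offset] & 0xFF;
            int m = s[offset + 1] & 0xFF;
            int y = s[offset + 2] & 0xFF;
            int k = s[offset + 3] & 0xFF;
            r = (255 - c) * (255 - k) / 255;
            g = (255 - m) * (255 - k) / 255;
            b = (255 - y) * (255 - k) / 255;
        }
        return (r << 16) | (g << 8) | b;
    }
}
```

The resulting BufferedImage could then be handed to javax.imageio.ImageIO.write(image, "png", file). Indexed, Separation, ICCBased and the other colorspaces are deliberately left out; handling them is where most of the real work lies.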

It is not as simple as taking this data and writing it to a file, and voila there's the image?


        Correct, it is not that simple.

Depending on what types of modifications you are going to allow the 3rd party tools to do, it MIGHT be possible to use iText, but you'd need to work at a very low level of PDF functionality to find, modify and replace the relevant objects.
But does iText have support for working at this low level, or will I need to write my own routines for hacking into the PDF syntax?

No, all the PDF syntax stuff is done for you. HOWEVER, you will need to understand WHAT PDF "objects" you need to add/modify, etc.

When I set out on my search for PDF libraries, my highest goal was really to find a PDF parser. I would love to find code that takes a PDF document and turns it into a data structure representing the elements in the PDF - i.e. a parse tree. Then I could traverse this tree and do whatever modifications I'd like to the nodes therein. When finished, I'd need some code to turn the parse tree back into a flat string - a PDF document.
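
To make the parse-tree idea concrete, here is a rough sketch of what such a data structure and its "write back" step might look like. All type names (PdfNode, PdfNameNode, PdfNumberNode, PdfArrayNode, PdfDictNode) are invented for illustration and are not the API of iText or of any library mentioned in this thread. Parsing, streams, strings, indirect references and the cross-reference table are all omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mini object model; every name here is illustrative only.
interface PdfNode {
    String write(); // serialize this node back to PDF syntax
}

class PdfNameNode implements PdfNode {
    final String name;
    PdfNameNode(String name) { this.name = name; }
    public String write() { return "/" + name; }
}

class PdfNumberNode implements PdfNode {
    final int value;
    PdfNumberNode(int value) { this.value = value; }
    public String write() { return Integer.toString(value); }
}

class PdfArrayNode implements PdfNode {
    final List<PdfNode> items = new ArrayList<>();
    public String write() {
        StringBuilder sb = new StringBuilder("[");
        for (PdfNode item : items) {
            sb.append(' ').append(item.write());
        }
        return sb.append(" ]").toString();
    }
}

class PdfDictNode implements PdfNode {
    // LinkedHashMap keeps keys in insertion order, so output is stable.
    final Map<String, PdfNode> entries = new LinkedHashMap<>();
    public String write() {
        StringBuilder sb = new StringBuilder("<<");
        for (Map.Entry<String, PdfNode> e : entries.entrySet()) {
            sb.append(" /").append(e.getKey()).append(' ').append(e.getValue().write());
        }
        return sb.append(" >>").toString();
    }
}
```

With a tree like this, a modification is just a change to a node (for example, entries.put("Width", new PdfNumberNode(200)) on an image dictionary) followed by a call to write() on the root. A real document also needs its object offsets and cross-reference table regenerated when it is written out.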
       There are a couple of commercial libraries that offer this feature.
Alright, I regret my statement that non-open source solutions were out of the picture. Please share the names of these libraries! =)

Adobe's PDFLibrary and PDF.NET (http://www.pdftron.com/) both offer this.

Specifically, I think I can get my hands on the official Adobe SDK for PDFs. Does anyone know what sort of support that library could give me?

Well, there are two different aspects to the Acrobat SDK. First are the tools for building plugins to Adobe Acrobat, and the second is the Adobe PDFLibrary for stand-alone applications. Both offer what you need, though only the second could be used for server-side solutions - but the first is FREE (minus the cost of Acrobat, of course) and the second is quite expensive.

Also, would anyone have an opinion of how feasible this sort of approach is?

The approach you describe is that taken by a number of commercial solutions - including those from my company. So it's quite feasible and is the right approach.

Are there for example official formal grammars available for the PDF syntax? Something you could feed to lex/yacc or similar. (I think not, because I spent quite some time looking.)


        It's been tried, but the PDF syntax doesn't fit well to BNF.
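
The message does not say why, but one commonly cited obstacle is that a stream object's payload is delimited by the /Length value in its dictionary rather than by a terminating token, so a reader must interpret a value before it knows where the data ends. The sketch below illustrates that with a hypothetical helper (StreamSlice.readStreamData is an invented name). It assumes the length has already been resolved to an integer, whereas in a real file /Length may itself be an indirect reference to another object, and it assumes the first occurrence of "stream" in the buffer is the keyword.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper illustrating length-delimited stream data.
class StreamSlice {

    // Returns the stream payload by trusting the /Length value instead of
    // scanning for "endstream", which may legitimately occur inside the data.
    static byte[] readStreamData(byte[] object, int length) {
        byte[] keyword = "stream".getBytes(StandardCharsets.ISO_8859_1);
        int start = indexOf(object, keyword);
        if (start < 0) {
            throw new IllegalArgumentException("No stream keyword found");
        }
        start += keyword.length;
        // The keyword is followed by CRLF or a lone LF before the data begins.
        if (start < object.length && object[start] == '\r') start++;
        if (start < object.length && object[start] == '\n') start++;
        if (length < 0 || start + length > object.length) {
            throw new IllegalArgumentException("Length runs past end of data");
        }
        return Arrays.copyOfRange(object, start, start + length);
    }

    // Plain byte-array search; returns -1 when the pattern is absent.
    private static int indexOf(byte[] data, byte[] pattern) {
        for (int i = 0; i + pattern.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) { match = false; break; }
            }
            if (match) return i;
        }
        return -1;
    }
}
```

A tokenizer that simply looked for the next "endstream" would cut short a payload that happens to contain those bytes, which is the kind of context dependence that lex/yacc-style grammars handle poorly.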


Leonard

---------------------------------------------------------------------------
Leonard Rosenthol                            <mailto:[EMAIL PROTECTED]>
Chief Technical Officer                      <http://www.pdfsages.com>
PDF Sages, Inc.                              215-938-7080 (voice)
                                             215-938-0880 (fax)


