On 04.02.2009 20:11:35 Andreas Lehmkühler wrote:
> Jeremias Maerki wrote:
>
> >> But it could be an alternative to modify ExtractImages as follows:
> >>
> >> - use resources.getXObjects() instead of resources.getImages()
> >> - iterate through the XObjects, filtering for the subtype "Form"
> >> - create PDXObjectForm objects
> >> - save the stream of the XObject to a file
> >
> > OK, but what would saving the stream to a file accomplish? It would not
> > be a valid PDF file and you'd still have to write some sort of
> > interpreter. I'm not sure if ExtractImages should be enhanced at all. If
> > functionality could be added to extract Form XObjects, some people will
> > want to extract them as bitmaps. Others will want vectors. But in what
> > format? Some will want PDF, others EPS or SVG. How this should be done
> > will probably be subject to discussion. Anyway, the first step as I see
> > it would be extending PageDrawer to be able to draw Form XObjects, too.
> > That way, people can convert those Form XObjects to any output format
> > they want.
>
> First of all, there was a misunderstanding on my side. I thought that a
> Form XObject supports several vector formats like SVG etc. and that its
> handling is similar to Image XObjects. But after your post and a few
> minutes of reading the PDF spec I realized it's different. Form XObjects
> are embedded mini-PDFs within a PDF. In the end we "simply" have to parse
> the stream of the Form XObject and that's it. As you can see in
> org.apache.pdfbox.util.operator.pagedrawer.Invoke, that is already part
> of PDFBox. So displaying such a document shouldn't be a problem. Saving
> an isolated Form XObject as a bitmap or similar isn't possible yet, but
> it shouldn't be that difficult.
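To illustrate the two ideas quoted above (filtering the XObjects for subtype "Form", and drawing a Form XObject by interpreting its own content stream), here is a minimal, self-contained sketch. It does NOT use the PDFBox API: the `XObject` record, `formNames` and `run` are hypothetical stand-ins. The content-stream "parser" only splits on whitespace and only reacts to "/Name Do"; graphics state, /Matrix, /BBox and resource inheritance are ignored, so treat it as a toy model of what a PageDrawer-style Do handler does, not as an implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of /XObject resources; NOT the PDFBox API.
public class FormXObjectSketch {

    // Stand-in for an XObject: its /Subtype, its content stream (forms only)
    // and the named resources that content stream may refer to.
    record XObject(String subtype, String content, Map<String, XObject> resources) {}

    // Guards against forms that (directly or indirectly) invoke themselves.
    static final int MAX_DEPTH = 16;

    // Steps 1-2 of the quoted proposal: look at all XObjects, keep the /Form ones.
    static List<String> formNames(Map<String, XObject> xobjects) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, XObject> e : xobjects.entrySet()) {
            if ("Form".equals(e.getValue().subtype())) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    // Toy "Do" handling: scan a content stream for "/Name Do", record images
    // directly and, for forms, interpret the form's own content stream.
    // Assumes whitespace-separated tokens (no strings or inline images) and
    // ignores graphics state, /Matrix and /BBox; it only records what is hit.
    static void run(String content, Map<String, XObject> resources,
                    List<String> trace, int depth) {
        if (depth > MAX_DEPTH) {
            return;
        }
        String[] tokens = content.trim().split("\\s+");
        for (int i = 1; i < tokens.length; i++) {
            if (!tokens[i].equals("Do") || !tokens[i - 1].startsWith("/")) {
                continue;
            }
            String name = tokens[i - 1].substring(1);
            XObject x = resources.get(name);
            if (x == null) {
                continue; // unknown resource name: skip it
            }
            if ("Image".equals(x.subtype())) {
                trace.add("image " + name);
            } else if ("Form".equals(x.subtype())) {
                trace.add("form " + name);
                run(x.content(), x.resources(), trace, depth + 1);
            }
        }
    }

    public static void main(String[] args) {
        XObject img = new XObject("Image", "", Map.of());
        XObject form = new XObject("Form", "q 1 0 0 1 10 10 cm /Im1 Do Q",
                Map.of("Im1", img));
        Map<String, XObject> page = new LinkedHashMap<>();
        page.put("Im0", img);
        page.put("Fm1", form);

        System.out.println(formNames(page)); // [Fm1]
        List<String> trace = new ArrayList<>();
        run("/Im0 Do /Fm1 Do", page, trace, 0);
        System.out.println(trace); // [image Im0, form Fm1, image Im1]
    }
}
```

The recursion is the point: a Form XObject is not a separate image format but another content stream, so the same operator loop that draws a page can draw the form, which is why extending the page drawer is the natural first step.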
Cool. I didn't think it could be that easy.

> > But then, we still don't know if Graeme Kidd's PDF actually contains
> > images in the form of Form XObjects or not.
> Until now the whole discussion was theoretical, but perhaps someone
> could provide us with an example...

Nothing easier than that:
http://people.apache.org/~jeremias/fop/tiger-as-form-xobject.pdf

1. fop -imagein tiger.svg -pdf tiger.pdf
   (I used FOP Trunk, but the latest release would also work.)
2. Create a small FO file which includes the generated PDF using an
   fo:external-graphic.
3. fop -fo tiger-as-form-object.fo -pdf tiger-as-form-xobject.pdf
   (This requires my PDF-in-PDF plugin for FOP on the classpath, which,
   by the way, uses PDFBox to parse the PDF.)

Have fun! :-)

Jeremias Maerki
