On 03.02.2009 18:40:25 Andreas Lehmkühler wrote:
> Jeremias Maerki wrote:
> > On 03.02.2009 18:05:14 Andreas Lehmkühler wrote:
> >>> Well Adobe Acrobat was able to detect the images with its "Export
> >>> images" functionality so I assume they are embedded somehow by an
> >>> XObject.
> >>>
> >>> I noticed you had an ExtractImages class, would I be able to modify this
> >>> to extract vectors?
> >>> Would I need it to give me a list of Fill/Stroke/Path data points in
> >>> order for it to extract correctly?
> >> I suggest to give it a try. If the images are embedded as XObjects
> >> ExtractImages should do it.
> >
> > No, I've just checked: ExtractImages can only handle PDXObjectImage (i.e.
> > bitmap images), not PDXObject of which PDXObjectForm is a subclass.
> Sorry, my fault, I didn't realize that little detail...

No need to apologize. We're all in the same boat: discovering what
wonders PDFBox can already do.

> But it could be an alternative to modify ExtractImages as follows:
>
> - use resources.getXObjects() instead of resources.getImages()
> - iterate through the XObjects filtering with the subtype "Form"
> - create PDXObjectForm-objects
> - save the stream of the XObject to a file

Ok, but what would saving the stream to a file accomplish? It would not
be a valid PDF file and you'd still have to write some sort of
interpreter.

I'm not sure if ExtractImages should be enhanced at all. If
functionality could be added to extract Form XObjects, some people will
want to extract them as bitmaps. Others will want vectors. But in what
format? Some will want PDF, others EPS or SVG. I guess how this should
be done will be subject to discussion.

Anyway, the first step as I see it would be extending PageDrawer to be
able to draw Form XObjects, too. That way, people can convert those Form
XObjects to any output format they want.

But then, we still don't know if Graeme Kidd's PDF actually contains
images in the form of Form XObjects or not.

Jeremias Maerki
