On 03.02.2009 17:48:14 Graeme Kidd wrote:
>
> > FYI: there's also an EPSDocumentGraphics2D in Apache XML Graphics
> > Commons [1], i.e. as open source under the same license as PDFBox.
> >
> > [1] http://xmlgraphics.apache.org/commons/
> Thanks, I will look into that as well.
>
> > Usually you can't identify an isolated vector image inside a PDF as it
> > may be interleaved with normal text. Only if the images are embedded as
> > Form XObjects can you isolate them reliably. Or if the PDF is tagged, but
> > PDFBox can't help you in that case, yet. Even if you can isolate it,
> > PDFBox will need to be able to paint just the selected part of a page.
> Well, Adobe Acrobat was able to detect the images with its "Export
> images" functionality, so I assume they are embedded somehow as an
> XObject.
Yes, but that's only for bitmap images, right? Or does Acrobat extract
Form XObjects as PDF files with that function?

> I noticed you had an ExtractImages class, would I be able to modify this to
> extract vectors?
> Would I need it to give me a list of Fill/Stroke/Path data points in order
> for it to extract correctly?

Basically, besides normal image XObjects (Type XObject, Subtype Image), you'd
have to add support for XObjects of Type XObject, Subtype Form. Once you've
identified such an object, you have a content stream just like the one for
a page. It should be relatively easy to extend the PageDrawer to paint Form
XObjects on a Graphics2D object. But again, your images need to be embedded
in your PDF as Form XObjects in the first place.

If they are unmarked inline images, the only thing you can do is try to
render just the relevant area with the PageDrawer. I don't know enough about
PDFBox to say how difficult that would be. You'd have to identify the
relevant area to begin with.

> ----------------------------------------
> > Date: Tue, 3 Feb 2009 17:23:18 +0100
> > From: [email protected]
> > To: [email protected]
> > Subject: Re: Extract vectors
> >
> > On 03.02.2009 17:07:29 Graeme Kidd wrote:
> >>
> >> Thanks for the suggestion,
> >> I am a total beginner at this so any helpful advice is greatly
> >> appreciated.
> >>
> >> I suppose I could use something like this
> >> http://www.jibble.org/epsgraphics/ to save it as an EPS file.
> >
> > FYI: there's also an EPSDocumentGraphics2D in Apache XML Graphics
> > Commons [1], i.e. as open source under the same license as PDFBox.
> >
> > [1] http://xmlgraphics.apache.org/commons/
> >
> >> The only problem I have so far is how to detect if the image is a
> >> vector graphic in which case I can draw it then save it. Otherwise at the
> >> moment I will just be saving the entire page as an EPS file.
> >
> > Usually you can't identify an isolated vector image inside a PDF as it
> > may be interleaved with normal text.
> > Only if the images are embedded as
> > Form XObjects can you isolate them reliably. Or if the PDF is tagged, but
> > PDFBox can't help you in that case, yet. Even if you can isolate it,
> > PDFBox will need to be able to paint just the selected part of a page.
> >
> >> Thanks again for your help so far.
> >>
> >>
> >> ----------------------------------------
> >>> Date: Tue, 3 Feb 2009 09:04:33 -0500
> >>> Subject: Re: Extract vectors
> >>> From: [email protected]
> >>> To: [email protected]; [email protected]
> >>>
> >>> You can extend the PageDrawer class and have it do something other than
> >>> actually drawing ...
> >>>
> >>> I've extended it to draw a little differently and in .Net ... it's not a
> >>> small undertaking, but is possible.
> >>>
> >>> On 2/3/09, Graeme Kidd wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I was just wondering if I could use PDFBox to extract vector graphics?
> >>>>
> >>>> Thanks.
> >
> >
> > Jeremias Maerki
>

Jeremias Maerki
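
P.S. To make the "render just the relevant area" idea a bit more concrete, here is a minimal sketch. It uses only java.awt and no PDFBox classes, so treat it as an illustration of the Graphics2D side, not as the way PDFBox does it. The names RegionRenderer, renderRegion and the pagePainter callback (standing in for whatever paints the full page, e.g. something PageDrawer-like) are hypothetical, and the region is assumed to be known already, which is the hard part mentioned above.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.function.Consumer;

public class RegionRenderer {

    /**
     * Paints only the given region of a page into a new image.
     * pagePainter is a hypothetical stand-in for the code that draws the
     * whole page; it is not a PDFBox API.
     */
    public static BufferedImage renderRegion(Consumer<Graphics2D> pagePainter, Rectangle region) {
        BufferedImage image =
                new BufferedImage(region.width, region.height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = image.createGraphics();
        try {
            // Shift the coordinate system so the region's top-left corner lands at (0, 0).
            g.translate(-region.x, -region.y);
            // Discard everything the painter draws outside the region.
            g.setClip(region);
            pagePainter.accept(g);
        } finally {
            g.dispose();
        }
        return image;
    }

    public static void main(String[] args) {
        // Fake "page": a red square standing in for the vector graphic,
        // and a blue bar standing in for surrounding text.
        Consumer<Graphics2D> page = g -> {
            g.setColor(Color.RED);
            g.fillRect(100, 100, 50, 50);
            g.setColor(Color.BLUE);
            g.fillRect(0, 0, 300, 20);
        };
        BufferedImage out = renderRegion(page, new Rectangle(100, 100, 50, 50));
        System.out.println(out.getWidth() + "x" + out.getHeight());
    }
}
```

The same translate-and-clip approach should work with any Graphics2D implementation, so the painter could in principle draw into an EPS-producing Graphics2D instead of a BufferedImage. Note that clipping only hides what falls outside the rectangle; text or other content overlapping the region would still be painted.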
