On Tue, Dec 29, 2009 at 5:49 PM, Shashwat Anand <[email protected]> wrote:

> How can we retrieve images from PDFs? I need both the images and the text
> beneath each image to build a database. I was able to parse the text with
> PDFMiner but got stuck when it came to images.



Searching my apt cache for "python pdf" turns up a lot of libraries, some of
which claim to handle the entire contents of a PDF file. I have also come
across a tool that breaks a PDF down into HTML + image files (I don't
remember its name anymore). It was free software, so I'm sure it's doable.
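
To give a rough idea of why it should be doable: many PDFs store JPEG images
as unmodified DCTDecode streams, so a crude scan of the raw bytes can often
pull them out. The sketch below is only an illustration under that
assumption, not how any of the libraries mentioned above work. The names
extract_jpegs and save_jpegs are made up for this example, and it will miss
images stored with other filters or inside compressed object streams.

```python
import re

# Matches the body of every PDF stream object (non-greedy, across newlines).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)

JPEG_SOI = b"\xff\xd8"  # JPEG start-of-image marker
JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker


def extract_jpegs(pdf_bytes):
    """Return the raw bytes of each stream that looks like a JPEG.

    Assumption: the image is embedded as a plain DCTDecode stream, so the
    stream body is itself a complete JPEG file.
    """
    images = []
    for match in STREAM_RE.finditer(pdf_bytes):
        body = match.group(1)
        if body.startswith(JPEG_SOI) and body.rstrip().endswith(JPEG_EOI):
            images.append(body)
    return images


def save_jpegs(pdf_path, prefix="image"):
    """Write each JPEG found in pdf_path to prefix-N.jpg; return the names."""
    with open(pdf_path, "rb") as handle:
        data = handle.read()
    names = []
    for index, image in enumerate(extract_jpegs(data)):
        name = "%s-%d.jpg" % (prefix, index)
        with open(name, "wb") as out:
            out.write(image)
        names.append(name)
    return names
```

Calling save_jpegs("some.pdf") would write whatever JPEGs it finds next to
the script. Matching each image to the text beneath it would still need
layout information from something like PDFMiner.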



-- 
~noufal
http://nibrahim.net.in
_______________________________________________
BangPypers mailing list
[email protected]
http://mail.python.org/mailman/listinfo/bangpypers
