On Sat, Apr 16, 2011 at 10:33 AM, muthukumar swamy <cmksw...@gmail.com> wrote:
> I am trying to convert the tables in pdf to Excel. I am using CAM::Pdf
> module for reading the text from Pdf. please suggest me anyone for
> other way for converting PDF to Excel.
>

you're going to have to try really hard to make it accurate (especially
if the pdf wasn't written very well). however, consider this:
http://efreedom.com/Question/1-745138/Get-Text-Orientation-Text-String-PDF-Page-Using-CAM-PDF
then take the x/y coordinates of each piece of text and check its
boundaries (i.e., see whether it runs into anything else and whether
anything else is on the same line), then store it somehow (hash, array,
reference, whatever). then use Excel::Writer::XLSX (or the older
Spreadsheet::WriteExcel) to create your spreadsheet file however you
want it. that module is so well documented that i don't think i need
to say much more about it.
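here's a minimal sketch of the "same line" check and the storage step. it assumes you've already pulled each piece of text out as a hash ref with x, y and text keys; those keys are hypothetical (they're not CAM::PDF's own data structure), and the 2-point tolerance is a guess you'd tune for your files. fragments whose y values are close to the first fragment of a row go into that row, and each row is then sorted by x:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# group text fragments into rows. each fragment is a hash ref with
# hypothetical keys x, y and text (not a CAM::PDF data structure).
# fragments whose y lies within $tol of the row's first fragment are
# treated as being on the same line.
sub group_rows {
    my ($frags, $tol) = @_;
    $tol //= 2;    # assumed tolerance, in pdf points

    # pdf y coordinates grow upward, so sort by descending y to go from
    # the top of the page down; the order in the file is not trusted.
    my @sorted = sort { $b->{y} <=> $a->{y} } @$frags;

    my @rows;
    for my $f (@sorted) {
        if (@rows && abs($rows[-1][0]{y} - $f->{y}) <= $tol) {
            push @{ $rows[-1] }, $f;    # same line as the current row
        }
        else {
            push @rows, [$f];           # start a new row
        }
    }

    # within each row, order fragments left to right and keep only the text.
    return [ map { [ map { $_->{text} } sort { $a->{x} <=> $b->{x} } @$_ ] } @rows ];
}

# hypothetical fragments with slightly uneven y values, out of order.
my @frags = (
    { x => 200.1, y => 700.2, text => 'Price' },
    { x => 72.0,  y => 699.5, text => 'Item' },
    { x => 72.3,  y => 685.0, text => 'Widget' },
    { x => 199.8, y => 684.1, text => '9.99' },
);

my $rows = group_rows(\@frags, 2);
print join("\t", @$_), "\n" for @$rows;
```

comparing against the row's first fragment (rather than the previous one) keeps a row from slowly drifting down the page. each inner array ref is then in a shape you could hand to Excel::Writer::XLSX's write_row method, e.g. $worksheet->write_row($i, 0, $rows->[$i]).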

now, you didn't say how these pdfs were made, whether you'll have to
ocr them, or whether your information is available in any other
format. as you'll notice when you take that example and parse a file
with it, what you get are objects (generally words or lines) and
their positions on the page. these positions aren't nice round numbers:
you might have y values of 3.43, 4.15, 5.67, etc., and the same goes for
x values (your columns). i haven't read the pdf spec, but it doesn't
seem that these objects have to appear in any particular order in the
file (so 5 might come before 3). as you might also notice from that
example, your text can appear at any angle you want; how do you plan
to deal with that? if you're a 'beginner', you aren't in kansas
anymore :)
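two more pieces, sketched under the same assumptions. if your extraction gives you a fragment's text matrix [a b c d e f], its rotation is atan2(b, a) (assuming no skew), so you can drop or separately handle anything that isn't horizontal. the a/b keys below are hypothetical inputs, so check what your extraction actually gives you. columns can be snapped much like rows: sort the x values and start a new column whenever one is more than a tolerance past the current column's start. the 1-degree and 5-point tolerances are guesses:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant PI => 4 * atan2(1, 1);

# rotation of a fragment in degrees, from the a and b entries of its text
# matrix [a b c d e f]; assumes the matrix has no skew.
sub text_angle {
    my ($ma, $mb) = @_;
    return atan2($mb, $ma) * 180 / PI;
}

# cluster x values into column start positions: sort them, and start a new
# column whenever a value is more than $tol points right of the current start.
sub column_starts {
    my ($xs, $tol) = @_;
    my @starts;
    for my $x (sort { $a <=> $b } @$xs) {
        push @starts, $x if !@starts || $x - $starts[-1] > $tol;
    }
    return \@starts;
}

# index of the column start closest to $x.
sub column_of {
    my ($x, $starts) = @_;
    my $best = 0;
    for my $i (1 .. $#$starts) {
        $best = $i if abs($starts->[$i] - $x) < abs($starts->[$best] - $x);
    }
    return $best;
}

# hypothetical fragments: x position, text matrix entries a and b, text.
my @frags = (
    { x => 72.0,  a => 1, b => 0, text => 'Item' },
    { x => 200.1, a => 1, b => 0, text => 'Price' },
    { x => 72.3,  a => 1, b => 0, text => 'Widget' },
    { x => 199.8, a => 1, b => 0, text => '9.99' },
    { x => 300.0, a => 0, b => 1, text => 'Draft' },    # rotated 90 degrees
);

# keep only (roughly) horizontal text; rotated text needs separate handling.
my @flat = grep { abs(text_angle($_->{a}, $_->{b})) < 1 } @frags;

my $starts = column_starts([ map { $_->{x} } @flat ], 5);
print "$_->{text}: column ", column_of($_->{x}, $starts), "\n" for @flat;
```

here the rotated 'Draft' fragment is simply dropped; whether that's the right call depends entirely on your files.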

i'm almost positive that you're not going to like doing this, and that
if you're not making money with it, you'll probably fail. now that i've
given those words of wisdom: if you do succeed and are allowed to share
it, i'd really enjoy seeing the end result (or the function or module
that makes this work).

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/


  • Pdf to Excel muthukumar swamy
    • Re: Pdf to Excel shawn wilson
