Well, you can always call a system command from .NET or Java... ugly, but
doable.
Do you have any control over the invoices coming in? If so, there are a
number of options.
Given that you are just looking to split the PDF into smaller PDFs, you
really don't need iText. A simple .bat file using tools like Pdf2Text and
PdfSAM would work.
Jason
From: Tom Malia [mailto:tomma...@tandtdatasolutions.com]
Sent: Friday, July 06, 2012 3:05 PM
To: 'Post all your questions about iText here'
Subject: Re: [iText-questions] reading text at a particular place on a
page?
Thanks, I'm a little confused though.
First, that looks like a command-line utility. I was looking for a
programming library I could use in, ideally, .NET, but Java could be OK
too.
Second, I don't actually need or want to extract the text from a PDF
file. I want to read the text in the PDF file so that I can "split" the
PDF file into separate PDF files.
So for example, let's say that my import file contains 5 separate
invoices. Let's say that the invoices are like this:
Invoice 1 has 2 pages (pages 1-2 of the import file)
Invoice 2 has 1 page (page 3 of the import file)
Invoice 3 has 3 pages (pages 4-6 of the import file)
Invoice 4 has 1 page (page 7 of the import file)
Invoice 5 has 1 page (page 8 of the import file)
The first page and only the first page of each invoice has the string
pattern:
Invoice #: 9999999999
in the top-right 1-inch-square section of the page.
So I want to write a program that scans through the import file and each
time it finds the pattern:
Invoice #: 9999999999
on a page, it treats that page as the first page of an invoice and
exports a PDF file that contains that page plus all subsequent pages,
up to but not including the next page that it encounters with that
pattern.
Is there a way I could leverage either of these tools for that purpose?
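
The grouping step described above could be sketched roughly as follows. This
is only an illustration in Python (the same logic ports to .NET or Java), and
it assumes the text from the top-right corner of each page has already been
extracted by some other tool. The names invoice_page_ranges and corner_texts
are hypothetical, and the marker format is taken from the example pattern
above.

```python
import re
from typing import List, Tuple

# Assumed marker format: "Invoice #:" followed by digits, as in the example.
INVOICE_MARKER = re.compile(r"Invoice #:\s*(\d+)")


def invoice_page_ranges(corner_texts: List[str]) -> List[Tuple[str, int, int]]:
    """Return (invoice_number, first_page, last_page) tuples.

    corner_texts[i] is the text found in the top-right corner of page i + 1.
    Page numbers are 1-based and inclusive. Pages that appear before the
    first marker are ignored.
    """
    ranges = []
    current_number = None
    start = None
    for page_no, text in enumerate(corner_texts, start=1):
        match = INVOICE_MARKER.search(text)
        if match:
            # A new marker closes the invoice that was open, if any.
            if current_number is not None:
                ranges.append((current_number, start, page_no - 1))
            current_number = match.group(1)
            start = page_no
    # The last invoice runs to the end of the file.
    if current_number is not None:
        ranges.append((current_number, start, len(corner_texts)))
    return ranges


# The 8-page example from this message, with made-up invoice numbers.
sample = [
    "Invoice #: 1000000001", "",
    "Invoice #: 1000000002",
    "Invoice #: 1000000003", "", "",
    "Invoice #: 1000000004",
    "Invoice #: 1000000005",
]
for number, first, last in invoice_page_ranges(sample):
    print(number, first, last)
```

Each resulting range (1-2, 3, 4-6, 7, 8 for the sample) would then be handed
to whatever tool or library does the actual page extraction.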
From: Jason Berk [mailto:jb...@purduefed.com]
Sent: Friday, July 06, 2012 2:40 PM
To: Post all your questions about iText here
Subject: Re: [iText-questions] reading text at a particular place on a
page?
I've used this one: http://linux.die.net/man/1/pdftotext
I pulled account numbers off IRS forms by clipping the bottom-left
corner so I could load them into a database... worked really well.
There's also: http://www.colorpilot.com/pdf2text-command-line.html
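
For the upper-right corner that the original question asks about, the
clipping approach might look something like the sketch below. It uses
pdftotext's crop options (-x, -y, -W, -H, with -r for the resolution) and
calls the tool as a system command. The page width of 612 points (US Letter),
the 72-point box, and the function names are assumptions for illustration,
and pdftotext is assumed to be on the PATH.

```python
import subprocess
from typing import List


def pdftotext_corner_args(pdf_path: str, page: int,
                          page_width_pt: int = 612,
                          box_pt: int = 72) -> List[str]:
    """Build a pdftotext command that crops the top-right box of one page.

    At -r 72 one crop unit equals one PDF point, so a 72-unit box is one
    inch square. The crop origin is the top-left corner of the page.
    """
    x = page_width_pt - box_pt  # left edge of the top-right box
    return [
        "pdftotext",
        "-r", "72",
        "-f", str(page), "-l", str(page),  # first and last page to convert
        "-x", str(x), "-y", "0",
        "-W", str(box_pt), "-H", str(box_pt),
        pdf_path,
        "-",  # write the extracted text to stdout
    ]


def read_corner_text(pdf_path: str, page: int) -> str:
    """Run pdftotext and return the text found in the top-right box."""
    result = subprocess.run(
        pdftotext_corner_args(pdf_path, page),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling read_corner_text once per page gives the per-page corner text to
search for the invoice pattern. A different page size would need a different
page_width_pt.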
From: Tom Malia [mailto:tomma...@tandtdatasolutions.com]
Sent: Friday, July 06, 2012 2:08 PM
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] reading text at a particular place on a page?
I need to programmatically split large PDF files into separate files
based on the presence or absence of particular text patterns in
particular position on pages.
For example, I might have PDF files that result from running an
"invoices" report from an accounting system. In this report, lots of
invoices for different customers are generated in the single report and
subsequently end up in one big PDF file.
So I want to be able to scan that file, looking for pages that might
have a text pattern like:
Invoice #: 99999999
Maybe in the upper right hand corner of the page and then split the PDF
file based on finding the pages that have this pattern.
I've Googled like crazy to try to find examples or explanations of ways
to read text from particular rectangular regions of a PDF page, but have
had little or no luck.
If the iText in Action book covers this topic, I'd be happy to purchase
it. However, if it doesn't cover this specific topic, then, though the
book would still be "nice to have", I don't really have a need for
anything else right now.
Thanks in advance,
Tom Malia
This is a transmission from Purdue Federal Credit Union (Purdue Federal)
and is intended solely for its authorized recipient(s), and may contain
information that is confidential and or legally privileged. If you are
not an addressee, or the employee or agent responsible for delivering it
to an addressee, you are hereby notified that any use, dissemination,
distribution, publication or copying of the information contained in
this email is strictly prohibited. If you have received this
transmission in error, please notify us by telephoning (765)497-3328 or
returning the email. You are then instructed to delete the information
from your computer. Thank you for your cooperation.
_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions
iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference
to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples:
http://itextpdf.com/themes/keywords.php