Well, you can always call a system command from .NET or Java... ugly, but
doable.

 

Do you have any control over the invoices coming in?  If so, there are a
number of options.

 

Given that you are just looking to split the PDF into smaller PDFs, you
really don't need iText.  A simple .bat file using tools like Pdf2Text and
PdfSAM would work.

 

Jason

 

From: Tom Malia [mailto:tomma...@tandtdatasolutions.com] 
Sent: Friday, July 06, 2012 3:05 PM
To: 'Post all your questions about iText here'
Subject: Re: [iText-questions] reading text at a particular place on a
page?

 

Thanks, I'm a little confused though.

First, that looks like a command-line utility.  I was looking for a
programming library I could use, ideally in .NET, but Java could be OK
too.

 

Second, I don't actually need or want to extract the text from a PDF
file.  I want to read the text in the PDF file so that I can "split" the
PDF file into separate PDF files.

 

So for example, let's say that my import file contains 5 separate
invoices.  Let's say that the invoices are like this:

 

Invoice 1 has 2 pages (pages 1-2 of the import file)

Invoice 2 has 1 page (page 3 of the import file)

Invoice 3 has 3 pages (pages 4-6 of the import file)

Invoice 4 has 1 page (page 7 of the import file)

Invoice 5 has 1 page (page 8 of the import file)

 

The first page and only the first page of each invoice has the string
pattern:

Invoice #: 9999999999

 

in the top-right 1-inch square of the page.

 

So I want to write a program that scans through the import file and each
time it finds the pattern:

Invoice #: 9999999999

On a page, it determines that that is the first page of an invoice and
then it should export a PDF file that contains that page plus all
subsequent pages, up to but not including the next page that it
encounters with that pattern.
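
The grouping rule described above can be sketched independently of any PDF library. The following is a minimal Python sketch, not a definitive implementation: it assumes the text of each page's top-right region has already been extracted into a list of strings by some other tool, and the names `INVOICE_RE` and `invoice_page_ranges` are hypothetical. It only computes which page ranges belong to which invoice; writing the output PDFs is left to whatever splitting tool or library is used.

```python
import re

# Assumed pattern: the literal label followed by a run of digits.
INVOICE_RE = re.compile(r"Invoice #:\s*(\d+)")

def invoice_page_ranges(page_texts):
    """Return (invoice_number, first_page, last_page) tuples, pages 1-based.

    page_texts[i] is the text already extracted from the top-right region
    of page i + 1. Pages before the first match are ignored.
    """
    ranges = []
    for page_no, text in enumerate(page_texts, start=1):
        match = INVOICE_RE.search(text)
        if match:
            # A new invoice starts on this page.
            ranges.append([match.group(1), page_no, page_no])
        elif ranges:
            # Continuation page: extend the current invoice.
            ranges[-1][2] = page_no
    return [tuple(r) for r in ranges]

# The 8-page example from this message (the invoice numbers are made up).
pages = [
    "Invoice #: 1000000001", "",
    "Invoice #: 1000000002",
    "Invoice #: 1000000003", "", "",
    "Invoice #: 1000000004",
    "Invoice #: 1000000005",
]
for number, first, last in invoice_page_ranges(pages):
    print(number, first, last)
```

Each resulting range maps directly to one output file, so the same logic should port to .NET or Java with little change.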

 

Is there a way I could leverage either of these tools for that purpose?

 

 

From: Jason Berk [mailto:jb...@purduefed.com] 
Sent: Friday, July 06, 2012 2:40 PM
To: Post all your questions about iText here
Subject: Re: [iText-questions] reading text at a particular place on a
page?

 

I've used this one: http://linux.die.net/man/1/pdftotext 

 

I pulled account numbers off IRS forms by clipping the bottom-left
corner, so I could load them into a database... worked really well.
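
Clipping a region with pdftotext comes down to its crop options (`-x`, `-y`, `-W`, `-H`), combined with `-f`/`-l` to select a page and `-r` to set the resolution. Below is a rough Python sketch of building such a command; the helper names, the file name `invoices.pdf`, and the US Letter page size are assumptions for illustration, and `read_region` requires pdftotext to be installed and on the PATH.

```python
import subprocess

def pdftotext_crop_args(pdf_path, page, x, y, width, height, resolution=72):
    """Build a pdftotext command that prints the text inside one rectangle.

    x, y, width and height are in pixels at `resolution` DPI, measured from
    the top-left corner of the page; at 72 DPI one pixel is one PDF point.
    """
    return [
        "pdftotext",
        "-f", str(page), "-l", str(page),  # restrict to a single page
        "-r", str(resolution),
        "-x", str(x), "-y", str(y),
        "-W", str(width), "-H", str(height),
        pdf_path,
        "-",                               # write the text to stdout
    ]

def read_region(pdf_path, page, x, y, width, height):
    """Run pdftotext (must be on PATH) and return the cropped text."""
    args = pdftotext_crop_args(pdf_path, page, x, y, width, height)
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout

# Top-right 1-inch square of page 1, assuming a US Letter page (612 x 792 pt).
print(pdftotext_crop_args("invoices.pdf", 1, 612 - 72, 0, 72, 72))
```

For a bottom-left corner, `x` would be 0 and `y` would be the page height minus the height of the clipped region.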

 

There's also: http://www.colorpilot.com/pdf2text-command-line.html

 

From: Tom Malia [mailto:tomma...@tandtdatasolutions.com] 
Sent: Friday, July 06, 2012 2:08 PM
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] reading text at a particular place on a page?

 

I need to programmatically split large PDF files into separate files
based on the presence or absence of particular text patterns in
particular positions on pages.

 

For example,  I might have PDF files that result from running an
"invoices" report from an accounting system.  In this report, lots of
invoices for different customers are generated in the single report and
subsequently end up in one big PDF file.

 

So I want to be able to scan that file, looking for pages that might
have a text pattern like:

 

Invoice #: 99999999

 

maybe in the upper right-hand corner of the page, and then split the PDF
file based on finding the pages that have this pattern.

 

I've Googled like crazy to try to find examples or explanations of ways
to read text from particular rectangular regions of a PDF page, but have
had little or no luck.

 

If the iText in Action book covers this topic, I'd be happy to purchase
it.  However, if it doesn't cover this specific topic, then though it
would still be "nice to have" the book, I don't really have a need for
anything else right now.

 

Thanks in advance,

Tom Malia

 

This is a transmission from Purdue Federal Credit Union (Purdue Federal)
and is intended solely for its authorized recipient(s), and may contain
information that is confidential and or legally privileged. If you are
not an addressee, or the employee or agent responsible for delivering it
to an addressee, you are hereby notified that any use, dissemination,
distribution, publication or copying of the information contained in
this email is strictly prohibited. If you have received this
transmission in error, please notify us by telephoning (765)497-3328 or
returning the email. You are then instructed to delete the information
from your computer. Thank you for your cooperation.
 


_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference 
to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: 
http://itextpdf.com/themes/keywords.php
