Check out OpenRefine. It's a tool for designing data cleanup workflows.
http://openrefine.org/documentation

On 08/02/2013 04:12 AM, G G wrote:
I agree that if your invoice numbers have a consistent pattern or patterns,
regex is likely your best bet.
Furthermore, if the invoice "file" has a format something like this:

text text text
inv #: 1234345TUY
more text more text

then you can be specific and pull the number from a particular part of the
text with something like this (in multiline mode, so that ^ matches at the
start of each line)
(^inv #:\s*)(.*?)(\n|\r) and grab group 2 from the regex.
Otherwise, if your inv # is just floating in free text, then you will have
something like this
(?:.*?)([0-9]{7}[A-Z]{3})(?:.*?) and grab all the matches from group 1
(this regex is based on my fake inv# above).
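
As a rough illustration of both ideas in Python: the sample text, the
variable names, and the exact patterns below are made up for this sketch and
are tuned only to the fake invoice number in this message, so treat them as
a starting point rather than a recipe.

```python
import re

# Hypothetical sample text, modelled on the fake invoice in this message.
text = "text text text\ninv #: 1234345TUY\nmore text more text\n"

# Anchored pattern: the label at the start of a line, then the value.
# re.MULTILINE lets ^ match at the start of every line, not just the first.
anchored = re.compile(r"(^inv #:\s*)(.*?)(\n|\r)", re.MULTILINE)
m = anchored.search(text)
labelled = m.group(2) if m else None

# Free-text pattern: seven digits followed by three capital letters.
# With no capture groups, findall returns the full matches.
floating = re.compile(r"[0-9]{7}[A-Z]{3}")
found = floating.findall("please pay 1234345TUY and 7654321ABC soon")
```

Here `labelled` holds the value after the label and `found` holds every
candidate number in the free text.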
Don't take these examples literally; I'm just offering some ideas.
Typically, for stuff like this I will construct a prioritized list of
regexes, run them in order, and the first match wins.
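
A minimal sketch of that "first match wins" approach in Python, assuming two
hypothetical patterns ordered from most to least specific (the patterns and
the function name find_invoice_number are illustrative only):

```python
import re

# Hypothetical patterns, most specific first.
# Each one captures the invoice number in group 1.
PATTERNS = [
    re.compile(r"^inv #:\s*(\S+)", re.MULTILINE),  # labelled line
    re.compile(r"\b([0-9]{7}[A-Z]{3})\b"),         # number floating in text
]

def find_invoice_number(text):
    """Return the capture from the first pattern that matches, else None."""
    for pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None
```

Putting the strictest pattern first means a loose pattern only gets a chance
when nothing more reliable matched.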
Mark G


On Thu, Aug 1, 2013 at 9:08 AM, Jim <[email protected]> wrote:

If your invoices are all from the same source (and therefore in the same
format), then maybe OpenNLP is a bit of overkill. A simple regex should
do the job :)

Jim



On 01/08/13 11:47, Umang wrote:

Hi Team,

I want to use OpenNLP to detect the invoice number in invoices. For
example, please find attached an invoice from which I need to extract the
invoice number. How can I do this?

Regards,

*Umang Sand*

Newgen Software Technologies Ltd.

www.newgensoft.com <http://www.newgensoft.com/>


Phone:- +91-120-6761000  Ext. :931

Mobile No. : +91- 9711135529





