Re: regexp help

Ian Sheridan Mon, 30 Aug 2004 22:38:50 -0700

Heh, after talking out of my butt, this should work for you:

([[:alnum:] ]+)\r\s*([0-9]{7})\s*([0-9]+)\s*([0-3][0-9]/[0-1][0-9]/[0-9]{4})\s*([A-Z]{2,3})\s*([0-9]+\.[0-9]{2})

You will have to add some of your own business logic so that you do not
parse the "this is just a confirmation email" emails. I do not know all
the currency codes you are using besides USD, so I left that part like
this: [A-Z]{2,3}
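
For anyone who wants to try the expression outside ColdFusion, here is a
minimal sketch in Python. It is an adaptation, not the exact pattern
above: Python's re module has no POSIX classes, so [[:alnum:] ] is written
as [A-Za-z0-9 ], and \r is loosened to \r?\n so the sample works with
either line ending. The sample text, parse_items, and the field names are
made up for illustration.

```python
import re

# Adapted from the expression above (assumptions: [A-Za-z0-9 ] stands in
# for [[:alnum:] ], and \r?\n stands in for \r).
ITEM_RE = re.compile(
    r"([A-Za-z0-9 ]+)\r?\n\s*"                # description line
    r"([0-9]{7})\s*"                          # product number
    r"([0-9]+)\s*"                            # quantity
    r"([0-3][0-9]/[0-1][0-9]/[0-9]{4})\s*"    # ship date
    r"([A-Z]{2,3})\s*"                        # currency code
    r"([0-9]+\.[0-9]{2})"                     # extended price
)

# Hypothetical body text: one item with the price wrapped onto its own
# line, one item with the price on the same line as the currency code.
SAMPLE = (
    "this is just a confirmation email.\n"
    "\n"
    "description of some item1\n"
    "0344437                     1           03/12/2004                 USD\n"
    "335.75\n"
    "\n"
    "another description of some item1\n"
    "0344734                     1           03/12/2004                 USD 335.75\n"
)

FIELDS = ("description", "productnumber", "qty", "shipdate", "currency", "price")


def parse_items(body):
    """Return one dict per item block found in the body text."""
    return [
        dict(zip(FIELDS, (group.strip() for group in match.groups())))
        for match in ITEM_RE.finditer(body)
    ]


for item in parse_items(SAMPLE):
    print(item["productnumber"], item["description"], item["price"])
```

The confirmation sentence is skipped in this sample only because it ends
in a period and is not followed by a product-number line, so real mail
will still need the extra filtering mentioned above.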

I just hope that I have helped somewhere.

Ian

----- Original Message -----
From: chris porter <[EMAIL PROTECTED]>
Date: Mon, 30 Aug 2004 21:40:53 -0400
Subject: Re: regexp help
To: CF-Talk <[EMAIL PROTECTED]>

that was an option i explored, however these emails aren't coming from
just one source, in fact there are hundreds of different company
emails coming in, so my options were either 1) define the data i
need, and specify a regex to grab it, then database the expressions,
or 2) write a custom script in code that can identify each one of
those, possibly by regex. personally i opted for option 1 'cause i can
always fine tune an expression, but changing code gets tedious.
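
Option 1 could look roughly like the Python sketch below. Everything in
it is hypothetical: the PATTERNS dict stands in for the database table of
expressions, the sender domains and the second pattern are invented, and
extract is just an illustrative name. Only the first pattern comes from
this thread.

```python
import re

# Hypothetical stand-in for a database table of (source, expression) rows.
PATTERNS = {
    # The single-line expression quoted further down in this thread.
    "vendor-a.example": r"([0-9]+)[ ]+([0-9]+)[ ]+([^ ]+)[ ]+([A-Z]{3})[ ]+([0-9.]+)",
    # Invented format for a second sender, for illustration only.
    "vendor-b.example": r"SKU=([0-9]+);QTY=([0-9]+)",
}


def extract(sender_domain, body):
    """Look up the expression stored for a sender and return its captured
    groups, or None when the sender is unknown or nothing matches."""
    pattern = PATTERNS.get(sender_domain)
    if pattern is None:
        return None
    match = re.search(pattern, body)
    return match.groups() if match else None


print(extract("vendor-a.example", "0344437   1   03/12/2004   USD 335.75"))
```

Tuning a sender then means editing one stored expression, with no code
change.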

on another note, what distinguishes the item description line during
a line by line scan from something like

"this is just a confirmation email."

get my drift?
-chris

>I understand that, what I meant was, wouldn't it be easier to parse
>each line separately? This way you do not need to have such a highly
>complicated RegEx. You can make it much more simple and a little bit
>more flexible. Not to mention easier to maintain. (i.e. two separate
>regular expressions or more)
>
><cfscript>
>   email = emailFromServer; // focused down to the body content only
>   delims = chr(13) & chr(10); // split on line breaks (CR/LF), not on spaces
>   n = listlen(email, delims);
>   output = arraynew(1);
>   output[1] = structnew();
>   a = arraynew(2);
>   a[1][1] = "RegEx"; // placeholder: pattern for the description line
>   a[1][2] = "description";
>   a[2][1] = "RegEx"; // placeholder: pattern for the product number
>   a[2][2] = "productnumber";
>   nn = arraylen(a); // regex count
>   nnn = 1; // record count
>   c = 0; // field count
>   for (i=1; i LTE n; i=i+1) {
>      currentitem = listgetat(email, i, delims);
>      for (ii=1; ii LTE nn; ii=ii+1) {
>         if (refind(a[ii][1], currentitem)) {
>            output[nnn][a[ii][2]] = trim(currentitem);
>            c = c + 1;
>            if (c GTE 6) { // six fields per record: start the next record
>               c = 0;
>               nnn = nnn + 1;
>               output[nnn] = structnew();
>            }
>         }
>      }
>   }
></cfscript>
>
>Heh, I went a little crazy here, but I'm pretty sure this would work. Any
>future changes to what each field holds would be easy to make.
>(Beware: I did not test it.)
>
>Ian
>
>
>----- Original Message -----
>From: Michael Dinowitz <[EMAIL PROTECTED]>
>Date: Mon, 30 Aug 2004 18:23:04 -0400
>Subject: RE: regexp help
>To: CF-Talk <[EMAIL PROTECTED]>
>
>You can get the first and second lines based on a new line delimited list
> and you can get the items in the second line using a space delimited list. I
> just like to be specific when parsing data from a source like email. My
> preference is to pop 2 messages, save the first in a DB (raw) and use the
> second as a flag to rerun the page. Once all the mail is down and stored in
> the DB (one at a time, there's a reason), I have a second process parse each
> message in the tightest way possible. I'm paranoid (as all programmers
> should be) about data from outside sources and I want to be 100% sure of
> what I'm getting and how. If there's a problem, then I want to know exactly
> what's up.
>
>   _____
>
>
>
> Why don't you just go through the text as a list with CR as the
> delimiter? This way you can have much more focused regular
> expressions.
>
> Just a thought,
>
> Ian
>
> ----- Original Message -----
> From: Michael Dinowitz <[EMAIL PROTECTED]>
> Date: Mon, 30 Aug 2004 16:49:48 -0400
> Subject: RE: regexp help
> To: CF-Talk <[EMAIL PROTECTED]>
>
> Really fast (using the multi-line mode of CFMX)
>
> ^([^#chr(13)#]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]{2}/[0-9]{2}/[0-9]{4})[[:space:]]+(USD)?[[:space:]]*([0-9.]+)$
>
>    _____
>
> From: chris porter [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 30, 2004 4:41 PM
> To: CF-Talk
> Subject: Re: regexp help
>
> and one last time....
>
> DATA:
>
> Product Name
> Product Number            Qty      Est. Ship Date                 Your Ext. Price
> [dashes go here all the way across; PITA email parser]
>
> description of some item1
> 0344437                     1           03/12/2004                 USD 335.75
>
> another description of some item1
> 0344734                     1           03/12/2004                 USD 335.75
>
> and one last description of some item
> 0433447                     1           03/12/2004                 USD 335.75
>
> part i need parsed by a regex:
>
> "description of some item1
> 0344437                     1           03/12/2004                 USD 335.75"
>
> current REGEX:
> ([0-9]+)[ ]+([0-9]+)[ ]+([^ ]+)[ ]+([A-Z]{3})[ ]+([0-9.]+)
>
> that regex matches everything on the 2nd line correctly, but nothing i add
> to the beginning will match the first line. any thoughts?
>
> Thanks!
> -Chris
