I understand that. What I meant was, wouldn't it be easier to parse
each line separately? That way you do not need such a highly
complicated RegEx. You can make it much simpler and a little more
flexible, not to mention easier to maintain (i.e. two or more separate
regular expressions).

<cfscript>
   email = emailFromServer; // focused down to the body content only
   delims = chr(13) & chr(10); // split on line breaks (CR and LF), not on spaces
   n = listlen(email,delims);
   output = arraynew(1);
   output[1] = structnew();
   // each row of "a" pairs a regex with the field name it fills
   a = arraynew(2);
   a[1][1] = "RegEx"; // placeholder pattern
   a[1][2] = "description";
   a[2][1] = "RegEx"; // placeholder pattern
   a[2][2] = "productnumber";
   nn = arraylen(a); // regex count
   nnn = 1; // record count
   c = 0; // field count
   for (i=1;i LTE n;i=i+1) {
      currentitem = listgetat(email,i,delims);
      // try every regex against the current line
      for (ii=1;ii LTE nn;ii=ii+1) {
         if (refind(a[ii][1],currentitem)) {
            output[nnn][a[ii][2]] = trim(currentitem);
            // once the field counter passes 5, start a new record
            if (c GT 5) {
               c = 0;
               nnn = nnn + 1;
               output[nnn] = structnew();
            } else {
               c = c + 1;
            }
         }
      }
   }
</cfscript>

Heh, I went a little crazy here, but I'm pretty sure this would work. Any
future changes to what each field holds would be easy to make.
(Beware, I did not test it.)

Ian

----- Original Message -----
From: Michael Dinowitz <[EMAIL PROTECTED]>
Date: Mon, 30 Aug 2004 18:23:04 -0400
Subject: RE: regexp help
To: CF-Talk <[EMAIL PROTECTED]>

You can get the first and second lines based on a new line delimited list
and you can get the items in the second line using a space delimited list. I
just like to be specific when parsing data from a source like email. My
preference is to pop 2 messages, save the first in a DB (raw) and use the
second as a flag to rerun the page. Once all the mail is down and stored in
the DB (one at a time, there's a reason), I have a second process parse each
message in the tightest way possible. I'm paranoid (as all programmers
should be) about data from outside sources and I want to be 100% sure of
what I'm getting and how. If there's a problem, then I want to know exactly
what's up.
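
A rough, untested sketch of the list-based approach described above, assuming
one two-line record with CR/LF line breaks. The variable names (record,
detail, item and its keys) are made up for illustration and are not from the
original message.

```cfml
<cfscript>
   // Assumption: "record" is one two-line item taken from the email body.
   crlf = chr(13) & chr(10);
   record = "description of some item1" & crlf & "0344437                     1           03/12/2004                 USD 335.75";

   // First pass: new line delimited list (each character in crlf is a delimiter).
   description = trim(listgetat(record, 1, crlf));
   detail = listgetat(record, 2, crlf);

   // Second pass: space delimited list. CF list functions skip empty
   // elements, so a run of spaces behaves like a single delimiter.
   item = structnew();
   item.description = description;
   item.productnumber = listgetat(detail, 1, " ");
   item.qty = listgetat(detail, 2, " ");
   item.shipdate = listgetat(detail, 3, " ");
   item.currencycode = listgetat(detail, 4, " ");
   item.price = listgetat(detail, 5, " ");
</cfscript>
```

This is looser than a single anchored regex: it trusts the field order and
would misread a record whose currency column is missing.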


   _____  

Why don't you just go through the text as a list with CR as the
delimiter? This way you can have much more focused regular
expressions.

Just a thought,

Ian

----- Original Message -----
From: Michael Dinowitz <[EMAIL PROTECTED]>
Date: Mon, 30 Aug 2004 16:49:48 -0400
Subject: RE: regexp help
To: CF-Talk <[EMAIL PROTECTED]>

Really fast (using the multi-line mode of CFMX):

^([^#chr(13)#]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]{2}/[0-9]{2}/[0-9]{4})[[:space:]]+(USD)?[[:space:]]*([0-9.]+)$
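
One way the pattern above might be applied, as an untested sketch. It assumes
the record uses CR/LF line breaks and that multi-line mode is switched on with
a leading (?m). The names block, re, m and fields are illustrative only.

```cfml
<cfscript>
   // Assumption: "block" is one two-line record with CR/LF line breaks.
   crlf = chr(13) & chr(10);
   block = "description of some item1" & crlf & "0344437                     1           03/12/2004                 USD 335.75";

   // (?m) switches on multi-line mode, so ^ and $ also match at line breaks.
   re = "(?m)^([^#chr(13)#]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]{2}/[0-9]{2}/[0-9]{4})[[:space:]]+(USD)?[[:space:]]*([0-9.]+)$";

   // With the fourth argument true, refind returns a struct of pos/len arrays:
   // element 1 is the whole match, elements 2..7 are the six subexpressions.
   m = refind(re, block, 1, true);
   fields = arraynew(1);
   if (m.pos[1] GT 0) {
      for (i=2; i LTE arraylen(m.pos); i=i+1) {
         if (m.pos[i] GT 0) {
            fields[i-1] = mid(block, m.pos[i], m.len[i]);
         } else {
            fields[i-1] = ""; // optional group (USD) took no part in the match
         }
      }
   }
</cfscript>
```

If the match succeeds, fields should hold the description, product number,
quantity, ship date, currency and price, in that order.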


    _____  

From: chris porter [mailto:[EMAIL PROTECTED]
Sent: Monday, August 30, 2004 4:41 PM
To: CF-Talk
Subject: Re: regexp help

and one last time....

DATA:

Product Name
Product Number            Qty      Est. Ship Date                 Your Ext. Price
[dashes go here all the way across PITA email parser]

description of some item1
0344437                     1           03/12/2004                 USD 335.75

another description of some item1
0344734                     1           03/12/2004                 USD 335.75

and one last description of some item
0433447                     1           03/12/2004                 USD 335.75

part i need parsed by a regex

"description of some item1
0344437                     1           03/12/2004                 USD 335.75"

current REGEX:
([0-9]+)[ ]+([0-9]+)[ ]+([^ ]+)[ ]+([A-Z]{3})[ ]+([0-9.]+)

that regex matches everything on the 2nd line correctly, but nothing i add
to the beginning will match the first line. any thoughts?

Thanks!
-Chris