Re: regexp help

chris porter Mon, 30 Aug 2004 18:32:46 -0700

Hmm, that didn't provide a match in either the example I sent or the actual data. Did you verify it first? Also, how are you loading the regex? I was just using reFindNoCase(); maybe there's a better way. I do like where you are going with the first part of your regex, though. It's similar to what I originally tried:

^([^\n\r]+)[\n\r]+([0-9]+)[ ]+([0-9]+)[ ]+([^ ]+)[ ]+([A-Z]{3})[ ]+([0-9.]+)
and
^([^\n\r]+)([0-9]+)[ ]+([0-9]+)[ ]+([^ ]+)[ ]+([A-Z]{3})[ ]+([0-9.]+)

I would think that the ^ would anchor to the beginning of a line (like it's supposed to), ([^\n\r]+) would find an entire line of text (since the expression reads "as many characters as possible that are not newline or return"), and [\n\r]+ would match the CRLF at the end; then the rest of the working regex (it's a mess, so I won't paste that part in) would kick in.

It then occurred to me that [^\n\r] might itself be consuming those characters, so that the following [\n\r]+ wouldn't find a match because the parser had already passed them during the previous find. So I removed the [\n\r]+ (the second pattern above), but still no love. In fact, that theory is wrong: the parser stops at the excluded characters and tests them against the next part of the expression. So there is something not happening that I'm missing.
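One thing worth ruling out is the anchor itself. The sketch below is in Python's re module, not ColdFusion, so it only illustrates the general behavior; the sample text, its CRLF line endings, and the single space before the amount are assumptions modeled on the data quoted in this thread. In Python, ^ matches only at the very start of the string unless multi-line mode is switched on, so the first pattern fails on the full text but succeeds once the flag is set. Whether reFindNoCase() treats ^ the same way should be checked against the ColdFusion documentation.

```python
import re

# Assumed sample: header line, blank line, then one description/detail pair.
data = (
    "Product Name\r\n"
    "\r\n"
    "description of some item1\r\n"
    "0344437                     1           03/12/2004                 USD 335.75\r\n"
)

# First pattern from the message above, unchanged.
pattern = (
    r"^([^\n\r]+)[\n\r]+"
    r"([0-9]+)[ ]+([0-9]+)[ ]+([^ ]+)[ ]+([A-Z]{3})[ ]+([0-9.]+)"
)

# Without multi-line mode, ^ can only match at offset 0 ("Product Name"),
# and the line after it is not a product line, so the search finds nothing.
no_flag = re.search(pattern, data)

# With multi-line mode, ^ also matches after each newline, so the search
# can start at "description of some item1".
with_flag = re.search(pattern, data, re.MULTILINE)

# The negated class stops in front of the CR/LF; it does not consume them.
first_line = re.match(r"[^\n\r]+", "description of some item1\r\n0344437")

print(no_flag)
print(with_flag.groups())
print(repr(first_line.group()))
```

If the engine in use behaves like this, the pattern itself is fine and only the anchoring mode needs to change.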

Also, I appreciate the changes to the working regex, but a couple of things to note here (and also some wise info for newbies out there):
1) USD is the currency code. It is not optional and won't always be USD, so hard-referencing it with "(USD)?" doesn't make the regex work in all cases. In fact, if the code were GBP, the optional group would match nothing (since USD was not found), and the pattern would then fail because nothing handles the 3 characters that are actually there.
2) While the regex used for the date is great, the requirement here isn't date validation, just field delimitation. This is mainly because we cannot control the format the data is received in, so we won't know whether the date is 10-06-78 or Oct-06-78, etc. I'd need to change the regex again to accommodate that. A much simpler and more reliable way to parse that section is to just match up to the next space with ([^ ]+) and let the SQL date parser or createODBCdate() take care of it.
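Both points can be checked with a small sketch. It uses Python's re module rather than ColdFusion, with \s standing in for [[:space:]]; the GBP and Oct-06-78 lines are made-up inputs for illustration, not data from the real feed.

```python
import re

# Tail of the suggested pattern: validated date plus optional hard-coded USD.
hard_coded = r"([0-9]{2}/[0-9]{2}/[0-9]{4})\s+(USD)?\s*([0-9.]+)$"

# Tail of the working pattern: any non-space run, then any 3-letter code.
generic = r"([^ ]+)[ ]+([A-Z]{3})[ ]+([0-9.]+)$"

usd_line = "03/12/2004                 USD 335.75"
gbp_line = "03/12/2004                 GBP 335.75"       # hypothetical
text_date_line = "Oct-06-78                  GBP 12.50"  # hypothetical

# The hard-coded version works only while the code really is USD.
hard_usd = re.search(hard_coded, usd_line)
hard_gbp = re.search(hard_coded, gbp_line)  # None: nothing consumes "GBP"

# The generic version just delimits fields, so both variants match.
gen_gbp = re.search(generic, gbp_line)
gen_text_date = re.search(generic, text_date_line)

print(hard_usd.groups())
print(hard_gbp)
print(gen_gbp.groups())
print(gen_text_date.groups())
```

The captured date string can then be handed to whatever parses dates downstream.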

Anyway, back to the topic at hand: what else am I missing at the beginning? Any thoughts, anyone?
Thanks
-Chris

>Really fast (using the multi-line mode of CFMX)
>
>^([^#chr(13)#]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]{2}/[0-9]{2}/[0-9]{4})[[:space:]]+(USD)?[[:space:]]*([0-9.]+)$
>
>  _____
>
>From: chris porter [mailto:[EMAIL PROTECTED]
>Sent: Monday, August 30, 2004 4:41 PM
>To: CF-Talk
>Subject: Re: regexp help
>
>
>and one last time....
>
>DATA:
>
>Product Name
>Product Number            Qty      Est. Ship Date                 Your Ext. Price
>[dashes go here all the way across PITA email parser]
>
>description of some item1
>0344437                     1           03/12/2004                 USD 335.75
>
>another description of some item1
>0344734                     1           03/12/2004                 USD 335.75
>
>and one last description of some item
>0433447                     1           03/12/2004                 USD 335.75
>
>
>part i need parsed by a regex
>
>"description of some item1
>0344437                     1           03/12/2004                 USD 335.75"
>
>current REGEX:
>([0-9]+)[ ]+([0-9]+)[ ]+([^ ]+)[ ]+([A-Z]{3})[ ]+([0-9.]+)
>
>That regex matches everything on the 2nd line correctly, but nothing I add
>to the beginning will match the first line. Any thoughts?
>
>Thanks!
>-Chris
>  _____
