Re: I need some help with a regexp please
Dennis Lee Bieber wrote:
> On 25 Sep 2006 10:25:01 -0700, codefire [EMAIL PROTECTED] declaimed the following in comp.lang.python:
> > Yes, I didn't make it clear in my original post - the purpose of the code was to learn something about regexps (I only started coding Python last week). In terms of learning a little more the example was successful. However, creating a full email validator is way beyond me - the rules are far too complex!! :)
>
> I've been doing small things in Python for over a decade now (starting with the Amiga port)... I still don't touch regular expressions... They may be fast, but to me they are just as much line noise as PERL... I can usually code a partial parser faster than try to figure out an RE.

If I may add another thought along the same line: regular expressions seem to tend towards an art form, or an intellectual game. Many discussions revolving around regular expressions convey the impression that the challenge being pursued is finding a magic formula much more than solving a problem. In addition there seems to exist some code of honor which dictates that the magic formula must consist of one single expression that does it all.

I suspect that the complexity of a single expression grows more or less exponentially with the number of functions it has to perform, and at some point enters a gray zone of impending conceptual intractability where the quest for the magic formula becomes particularly fascinating. I also suspect that some problems are impossible to solve with a single expression and that no test of intractability exists other than giving up after so many hours of trying.

With reference to the OP's question, what speaks against passing his texts through several simple expressions in succession? Speed of execution? Hardly. The speed penalty would not be perceptible.

Conversely, what speaks in favor of multiple expressions is that they can be kept simple, and that the performance of the whole set can be improved incrementally by adding another simple expression whenever an unexpected contingency occurs, as may happen at any time with informal systems. One may not win a coding contest this way, but saving time isn't bad either, and may even be better.

Frederic

-- http://mail.python.org/mailman/listinfo/python-list
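A minimal sketch of the "several simple expressions in succession" idea, for illustration only. The rule set, the function name `plausible_addresses` and the sample addresses are all made up here (the addresses in the thread are redacted), and the result is a rough filter, not an email validator.

```python
import re

# Step 1: one loose expression that only pulls out whitespace-delimited
# candidates containing an '@'.
CANDIDATE = re.compile(r'\S+@\S+')

# Step 2: a list of small reject rules (hypothetical, easy to extend).
REJECTS = [
    re.compile(r'@.*@'),      # more than one '@'
    re.compile(r'\.\.'),      # two dots in a row
    re.compile(r'@[^.]*$'),   # no dot anywhere after the '@'
]

def plausible_addresses(text):
    """Return candidates that survive every reject rule, in order of appearance."""
    found = []
    for candidate in CANDIDATE.findall(text):
        if not any(rule.search(candidate) for rule in REJECTS):
            found.append(candidate)
    return found

sample = ('Hi alice@example.com bob..smith@example.org @@not '
          'carol@example.net dave@localhost tree')
print(plausible_addresses(sample))
# ['alice@example.com', 'carol@example.net']
```

When an unexpected case turns up, the fix is one more short entry in `REJECTS` rather than surgery on a single large pattern.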
Re: I need some help with a regexp please
Frederic Rentsch wrote:
> If I may add another thought along the same line: regular expressions seem to tend towards an art form, or an intellectual game. Many discussions revolving around regular expressions convey the impression that the challenge being pursued is finding a magic formula much more than solving a problem. In addition there seems to exist some code of honor which dictates that the magic formula must consist of one single expression that does it all.

hear! hear!

for dense guys like myself, regular expressions work best if you use them as simple tokenizers, and they suck pretty badly if you're trying to use them as parsers.

and using a few RE:s per problem (or none at all) is a perfectly good way to get things done.

/F
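To make the tokenizer-versus-parser distinction concrete, here is a small sketch in which the regular expression only chops text into tokens and leaves any structure to ordinary code. The token names and the tiny arithmetic syntax are assumptions chosen for the example; nothing in the thread specifies them.

```python
import re

# Hypothetical token set for a tiny arithmetic language.
TOKEN = re.compile(r'''
    (?P<NUMBER>\d+(?:\.\d+)?)   # integer or decimal literal
  | (?P<NAME>[A-Za-z_]\w*)      # identifier
  | (?P<OP>[-+*/()=])           # single-character operator
  | (?P<SKIP>\s+)               # whitespace, discarded
  | (?P<ERROR>.)                # anything else
''', re.VERBOSE)

def tokenize(text):
    """Yield (kind, value) pairs; a parser would consume these in plain code."""
    for match in TOKEN.finditer(text):
        kind = match.lastgroup          # name of the alternative that matched
        if kind == 'SKIP':
            continue
        if kind == 'ERROR':
            raise ValueError('unexpected character %r at offset %d'
                             % (match.group(), match.start()))
        yield kind, match.group()

print(list(tokenize('total = price * 1.2')))
# [('NAME', 'total'), ('OP', '='), ('NAME', 'price'), ('OP', '*'), ('NUMBER', '1.2')]
```

Each alternative stays trivial to read; nesting, precedence and error recovery would live in the code that consumes the token stream.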
Re: I need some help with a regexp please
> for dense guys like myself, regular expressions work best if you use them as simple tokenizers, and they suck pretty badly if you're trying to use them as parsers.

:) Well, I'm with you on that one Fredrik! :)
Re: I need some help with a regexp please
> I still don't touch regular expressions... They may be fast, but to me they are just as much line noise as PERL... I can usually code a partial parser faster than try to figure out an RE.

Yes, it seems to me that REs are a bit hit and miss - the only way to tell if you've got an RE right is by testing exhaustively - but you can never be sure. They are fine for simple pattern matching though.
Re: I need some help with a regexp please
Yes, I didn't make it clear in my original post - the purpose of the code was to learn something about regexps (I only started coding Python last week). In terms of learning a little more the example was successful. However, creating a full email validator is way beyond me - the rules are far too complex!! :)
Re: I need some help with a regexp please
John Machin wrote:
> ... A little more is unfortunately not enough. The best advice you got was to use an existing e-mail address validator.

We got bitten by this at the last place I worked - we were using a regex email validator (from Microsoft IIRC), and we kept having problems with specific email addresses from Ireland. There are stacks of Irish email addresses out there of the form paddy.o'[EMAIL PROTECTED] - a perfectly valid email address, but one that doesn't satisfy the usual naive versions of regex validators.

We use an even worse validator at my current job, but the feeling the management have (not one I agree with) is that unusual email addresses, whilst perhaps valid, are uncommon enough not to worry about.
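For illustration, a sketch of how a typical "naive" pattern trips over an apostrophe in the local part. The pattern below is an assumption standing in for the validator described (its actual text is not given), and the address is made up in the same shape, since the real ones are redacted in the archive.

```python
import re

# Assumed stand-in for a naive validator: letters, digits, underscore,
# dots and hyphens only, on both sides of the '@'.
NAIVE = re.compile(r'^[\w.-]+@[\w.-]+\.\w+$')

# Made-up address with an apostrophe in the local part.
address = "paddy.o'example@example.ie"

print(bool(NAIVE.match(address)))                      # False
print(bool(NAIVE.match("paddy.oexample@example.ie")))  # True
```

The apostrophe is simply missing from the character class, so the address is rejected even though the apostrophe is a permitted character in a local part.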
Re: Don't use regular expressions to validate email addresses (was: I need some help with a regexp please)
Ben Finney wrote:
> ... The best advice I've seen when people ask "How do I validate whether an email address is valid?" was "Try sending mail to it."

There are advantages to the regex method. It is faster than sending an email and getting a positive or negative return code. The delay may not be acceptable in many applications. Secondly, the false negatives found by a reasonable regex will be few compared to the number you'd get if the SMTP server went down, or a remote relay was having problems delivering the message, etc.

From a business point of view, it is probably more important to reduce the number of false negatives than to reduce the number of false positives - every false negative is a potential loss of a customer. False positives? Who cares really as long as they are paying ;-)
Re: I need some help with a regexp please
Ant wrote:
> John Machin wrote:
> > ... A little more is unfortunately not enough. The best advice you got was to use an existing e-mail address validator.
>
> We got bitten by this at the last place I worked - we were using a regex email validator (from Microsoft IIRC), and we kept having problems with specific email addresses from Ireland. There are stacks of Irish email addresses out there of the form paddy.o'[EMAIL PROTECTED] - a perfectly valid email address, but one that doesn't satisfy the usual naive versions of regex validators.
>
> We use an even worse validator at my current job, but the feeling the management have (not one I agree with) is that unusual email addresses, whilst perhaps valid, are uncommon enough not to worry about.

Oh, sorry for the abbreviation. "use" implies "source from believedly reliable s/w source; test; then deploy" :-)
Re: I need some help with a regexp please
John Machin wrote:
> Ant wrote:
> > John Machin wrote:
> > > ... A little more is unfortunately not enough. The best advice you got was to use an existing e-mail address validator.
> >
> > We got bitten by this at the last place I worked - we were using a regex email validator (from Microsoft IIRC) ...
>
> Oh, sorry for the abbreviation. "use" implies "source from believedly reliable s/w source; test; then deploy" :-)

I actually meant that we got bitten by using a regex validator, not by using an existing one. Though we did get bitten by an existing one, and it being from Microsoft we should have known better ;-)
I need some help with a regexp please
Hi,

My $0.02:

re.compile('^\w+([\.-]?\w+)[EMAIL PROTECTED]([\.-]?\w+)*\.(\w{2}|(com|net|org|edu|intl|mil|gov|arpa|biz|aero|name|coop|info|pro|museum))$')

I picked it up from the Net, and while it may not be perfect (you've got lots of replies telling you why), it's good enough for me.

Good luck,
Sorin
I need some help with a regexp please
Hi,

I am trying to get a regexp to validate email addresses but can't get it quite right. The problem is I can't quite find the regexp to deal with ignoring the case [EMAIL PROTECTED], which is not valid. Here's my attempt, neither of my regexps work quite how I want:

[code]
import os
import re

s = 'Hi [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] @@not [EMAIL PROTECTED] partridge in a pear tree'
r = re.compile(r'[EMAIL PROTECTED]@\s]+\.\w+')
#r = re.compile(r'[EMAIL PROTECTED]')

addys = set()
for a in r.findall(s):
    addys.add(a)

for a in sorted(addys):
    print a
[/code]

This gives:

[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED] -- shouldn't be here :(
[EMAIL PROTECTED]

Nearly there but no cigar :)

I can't see the wood for the trees now :)

Can anyone suggest a fix please?

Thanks,
Tony
Re: I need some help with a regexp please
codefire wrote:
> Hi,
>
> I am trying to get a regexp to validate email addresses but can't get it quite right. The problem is I can't quite find the regexp to deal with ignoring the case [EMAIL PROTECTED], which is not valid. Here's my attempt, neither of my regexps work quite how I want:
>
> [code]
> import os
> import re
>
> s = 'Hi [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] @@not [EMAIL PROTECTED] partridge in a pear tree'
> r = re.compile(r'[EMAIL PROTECTED]@\s]+\.\w+')
> #r = re.compile(r'[EMAIL PROTECTED]')
>
> addys = set()
> for a in r.findall(s):
>     addys.add(a)
>
> for a in sorted(addys):
>     print a
> [/code]
>
> This gives:
>
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> [EMAIL PROTECTED] -- shouldn't be here :(
> [EMAIL PROTECTED]
>
> Nearly there but no cigar :)
>
> I can't see the wood for the trees now :)
>
> Can anyone suggest a fix please?
>
> Thanks,
> Tony

'[EMAIL PROTECTED](\.\w+)*'

Works for me, and SHOULD for you, but I haven't tested it all that much.

Good luck.
Re: I need some help with a regexp please
On 2006-09-21, codefire [EMAIL PROTECTED] wrote:
> I am trying to get a regexp to validate email addresses but can't get it quite right. The problem is I can't quite find the regexp to deal with ignoring the case [EMAIL PROTECTED], which is not valid. Here's my attempt, neither of my regexps work quite how I want:

I suggest a websearch for email address validators instead of writing your own.

Here's a hit that looks useful:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/66439

--
Neil Cerutti
Next Sunday Mrs. Vinson will be soloist for the morning service. The pastor will then speak on "It's a Terrible Experience." --Church Bulletin Blooper
Re: I need some help with a regexp please
codefire wrote:
> Hi,
>
> I am trying to get a regexp to validate email addresses but can't get it quite right. The problem is I can't quite find the regexp to deal with ignoring the case [EMAIL PROTECTED], which is not valid. Here's my attempt, neither of my regexps work quite how I want:
>
> [code]
> import os
> import re
>
> s = 'Hi [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] @@not [EMAIL PROTECTED] partridge in a pear tree'
> r = re.compile(r'[EMAIL PROTECTED]@\s]+\.\w+')
> #r = re.compile(r'[EMAIL PROTECTED]')
>
> addys = set()
> for a in r.findall(s):
>     addys.add(a)
>
> for a in sorted(addys):
>     print a
> [/code]
>
> This gives:
>
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> [EMAIL PROTECTED] -- shouldn't be here :(
> [EMAIL PROTECTED]
>
> Nearly there but no cigar :)
>
> I can't see the wood for the trees now :)
>
> Can anyone suggest a fix please?

The problem is that your pattern doesn't start out by confirming that it's either at the start of a line or after whitespace. You could do this with a look-behind assertion if you wanted.

regards
Steve

--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden
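A rough sketch of the look-behind suggestion. Since the pattern and sample string in the original post are redacted, both the pattern and the addresses below are stand-ins assumed for the example, not the poster's own. Python's `re` needs a fixed-width look-behind, so "start of string or after whitespace" is written here as the negative form `(?<!\S)`, meaning "not preceded by a non-whitespace character".

```python
import re

# Assumed stand-in pattern, with and without the look-behind.
PLAIN = re.compile(r'[^@\s]+@[^@\s]+\.\w+')
ANCHORED = re.compile(r'(?<!\S)[^@\s]+@[^@\s]+\.\w+')

# Made-up sample text; 'x@@bob@example.org' is the deliberately bad case.
sample = 'Hi alice@example.com x@@bob@example.org carol@example.net tree'

print(PLAIN.findall(sample))
# ['alice@example.com', 'bob@example.org', 'carol@example.net']

print(ANCHORED.findall(sample))
# ['alice@example.com', 'carol@example.net']
```

Without the assertion the search is free to start in the middle of a bad token and pick out a tail that looks valid; with it, a match may only begin where a whitespace-delimited word begins.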
Re: I need some help with a regexp please
Hi, thanks for the advice guys.

Well took the kids swimming, watched some TV, read your hints and within a few minutes had this:

    r = re.compile(r'[EMAIL PROTECTED]@\s]+\.\w+')

This works for me. That is if you have an invalid email such as tony..bATblah.com it will reject it (note the double dots).

Anyway, now know a little more about regexps :)

Thanks again for the hints,
Tony
Re: I need some help with a regexp please
codefire wrote:
> Hi, thanks for the advice guys.
>
> Well took the kids swimming, watched some TV, read your hints and within a few minutes had this:
>
>     r = re.compile(r'[EMAIL PROTECTED]@\s]+\.\w+')
>
> This works for me. That is if you have an invalid email such as tony..bATblah.com it will reject it (note the double dots).
>
> Anyway, now know a little more about regexps :)

A little more is unfortunately not enough. The best advice you got was to use an existing e-mail address validator. The definition of a valid e-mail address is complicated. You may care to check out "Mastering Regular Expressions" by Jeffrey Friedl. In the first edition, at least (I haven't looked at the 2nd), he works through assembling a 4700+ byte regex for validating e-mail addresses. Yes, that's 4KB. It's the best advertisement for *not* using regexes for a task like that that I've ever seen.

Cheers,
John
Don't use regular expressions to validate email addresses (was: I need some help with a regexp please)
John Machin [EMAIL PROTECTED] writes:
> A little more is unfortunately not enough. The best advice you got was to use an existing e-mail address validator. The definition of a valid e-mail address is complicated. You may care to check out "Mastering Regular Expressions" by Jeffrey Friedl. In the first edition, at least (I haven't looked at the 2nd), he works through assembling a 4700+ byte regex for validating e-mail addresses. Yes, that's 4KB. It's the best advertisement for *not* using regexes for a task like that that I've ever seen.

The best advice I've seen when people ask "How do I validate whether an email address is valid?" was "Try sending mail to it."

It's both Pythonic, and truly the best way. If you actually want to confirm, don't try to validate it statically; *use* the email address, and check the result. Send an email to that address, and don't use it any further unless you get a reply from the recipient saying "yes, this is the right address to use".

The sending system's mail transport agent, not regular expressions, determines which part is the domain to send the mail to.

The domain name system, not regular expressions, determines what domains are valid, and what host should receive mail for that domain.

Most especially, the receiving mail system, not regular expressions, determines what local-parts are valid.

--
 \      "I believe in making the world safe for our children, but not |
  `\   our children's children, because I don't think children should |
_o__)                          be having sex." -- Jack Handey |
Ben Finney
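One way the "send mail and wait for confirmation" approach might look in code, as a sketch only. The function name `build_confirmation`, the sender address, the wording of the message and the token scheme are all assumptions for illustration; the message is built here but not sent, and storing and checking the token is left out.

```python
import secrets
from email.message import EmailMessage

def build_confirmation(address, sender='noreply@example.com'):
    """Return (token, message) for a hypothetical confirmation round-trip.

    The address should only be trusted once the token comes back.
    """
    # The only static check: something on each side of the last '@'.
    local, at, domain = address.rpartition('@')
    if not (local and at and domain):
        raise ValueError('cannot split into local-part@domain: %r' % address)

    token = secrets.token_urlsafe(16)
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = address
    msg['Subject'] = 'Please confirm your address'
    msg.set_content('Reply with this code to confirm: %s\n' % token)
    return token, msg

token, msg = build_confirmation('alice@example.com')
print(msg['To'])
```

Delivery itself would be handed to a real mail server, for instance with `smtplib.SMTP(...).send_message(msg)`; which host to use is site-specific and not something this sketch can know.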