Re: I need some help with a regexp please

2006-09-26 Thread Frederic Rentsch
Dennis Lee Bieber wrote:
 On 25 Sep 2006 10:25:01 -0700, codefire [EMAIL PROTECTED]
 declaimed the following in comp.lang.python:

   
 Yes, I didn't make it clear in my original post - the purpose of the
 code was to learn something about regexps (I only started coding Python
 last week). In terms of learning a little more the example was
 successful. However, creating a full email validator is way beyond me -
 the rules are far too complex!! :)
 

   I've been doing small things in Python for over a decade now
 (starting with the Amiga port)...

   I still don't touch regular expressions... They may be fast, but to
 me they are just as much line noise as PERL... I can usually code a
 partial parser faster than try to figure out an RE.
   
If I may add another thought along the same line: regular expressions 
seem to tend towards an art form, or an intellectual game. Many 
discussions revolving around regular expressions convey the impression 
that the challenge being pursued is finding a magic formula much more 
than solving a problem. In addition there seems to exist some code of 
honor which dictates that the magic formula must consist of one single 
expression that does it all. I suspect that the complexity of one single 
expression grows somehow exponentially with the number of 
functionalities it has to perform and at some point enters a gray zone 
of impending conceptual intractability where the quest for the magic 
formula becomes particularly fascinating. I also suspect that some 
problems are impossible to solve with a single expression and that no 
test of intractability exists other than giving up after so many hours 
of trying.
With reference to the OP's question, what speaks against passing his 
texts through several simple expressions in succession? Speed of 
execution? Hardly. The speed penalty would not be perceptible. 
In favor of multiple expressions, on the other hand, each one can be 
kept simple, and the performance of the entire set can be 
incrementally improved by adding another simple expression whenever an 
unexpected contingency occurs, as one may at any time with 
informal systems. One may not win a coding contest this way, but saving 
time isn't bad either, and may even be better.
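
A minimal sketch of the "several simple expressions in succession" idea, 
for illustration only. The rule lists, the function name find_addresses 
and the sample text are assumptions made for this example; they are not 
the OP's code and not a complete validator.

```python
import re

# Each rule is one small expression that is easy to read on its own.
# Every REQUIRED pattern must match; no FORBIDDEN pattern may match.
REQUIRED = [
    re.compile(r'^[^@\s]+@[^@\s]+$'),   # exactly one '@' with text on both sides
    re.compile(r'\.\w+$'),              # ends in '.something'
]
FORBIDDEN = [
    re.compile(r'\.\.'),                # two dots in a row
]

def find_addresses(text):
    """Return the whitespace-separated words that pass every rule."""
    found = []
    for word in text.split():
        if (all(rule.search(word) for rule in REQUIRED)
                and not any(rule.search(word) for rule in FORBIDDEN)):
            found.append(word)
    return found

# Hypothetical sample text (the addresses in the thread are redacted).
sample = 'Hi tony@example.com tony..b@example.com @@not@example.org x@y partridge'
print(find_addresses(sample))   # ['tony@example.com']
```

When an unexpected contingency turns up, the fix is one more line in 
REQUIRED or FORBIDDEN rather than a rewrite of a single large expression.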

Frederic

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: I need some help with a regexp please

2006-09-26 Thread Fredrik Lundh
Frederic Rentsch wrote:

 If I may add another thought along the same line: regular expressions 
 seem to tend towards an art form, or an intellectual game. Many 
 discussions revolving around regular expressions convey the impression 
 that the challenge being pursued is finding a magic formula much more 
 than solving a problem. In addition there seems to exist some code of 
 honor which dictates that the magic formula must consist of one single 
 expression that does it all.

hear! hear!

for dense guys like myself, regular expressions work best if you use 
them as simple tokenizers, and they suck pretty badly if you're trying 
to use them as parsers.

and using a few RE:s per problem (or none at all) is a perfectly good 
way to get things done.
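
As an illustration of the tokenizer-versus-parser split, here is a small 
sketch. The token names, the helper functions and the rule "atoms joined 
by dots, with one @" are assumptions chosen for the example; real address 
syntax is far richer.

```python
import re

# One small expression that only recognises tokens. It makes no attempt
# to decide whether the sequence of tokens is a sensible address.
TOKEN = re.compile(r'(?P<atom>[^\s@.]+)|(?P<dot>\.)|(?P<at>@)')

def tokenize(word):
    """Split a whitespace-free word into (kind, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(word):
        m = TOKEN.match(word, pos)
        if m is None:
            raise ValueError('unexpected character at position %d' % pos)
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def is_dotted(kinds):
    """True for the sequence atom (dot atom)*, checked in plain Python."""
    return (len(kinds) % 2 == 1 and
            all(kind == ('atom' if i % 2 == 0 else 'dot')
                for i, kind in enumerate(kinds)))

def looks_like_address(word):
    """Hand-written 'parser' working on token kinds, not on characters."""
    kinds = [kind for kind, _ in tokenize(word)]
    if kinds.count('at') != 1:
        return False
    split = kinds.index('at')
    local, domain = kinds[:split], kinds[split + 1:]
    # The domain needs at least atom, dot, atom.
    return is_dotted(local) and is_dotted(domain) and len(domain) >= 3

print(looks_like_address('tony@example.com'))     # True
print(looks_like_address('tony..b@example.com'))  # False
```

The expression stays trivial, and the structural rules live in ordinary 
code where they can be read and changed one at a time.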

/F



Re: I need some help with a regexp please

2006-09-26 Thread codefire
 for dense guys like myself, regular expressions work best if you use
 them as simple tokenizers, and they suck pretty badly if you're trying
 to use them as parsers.

:) Well, I'm with you on that one Fredrik! :)



Re: I need some help with a regexp please

2006-09-26 Thread codefire

 I still don't touch regular expressions... They may be fast, but to
 me they are just as much line noise as PERL... I can usually code a
 partial parser faster than try to figure out an RE.

Yes, it seems to me that REs are a bit hit and miss - the only way to
tell if you've got a RE right is by testing exhaustively - but you
can never be sure. They are fine for simple pattern matching though.



Re: I need some help with a regexp please

2006-09-25 Thread codefire
Yes, I didn't make it clear in my original post - the purpose of the
code was to learn something about regexps (I only started coding Python
last week). In terms of learning a little more the example was
successful. However, creating a full email validator is way beyond me -
the rules are far too complex!! :)



Re: I need some help with a regexp please

2006-09-22 Thread Ant

John Machin wrote:
...
 A little more is unfortunately not enough. The best advice you got was
 to use an existing e-mail address validator.

We got bitten by this at the last place I worked - we were using a
regex email validator (from Microsoft IIRC), and we kept having
problems with specific email addresses from Ireland. There are stacks of
Irish email addresses out there of the form paddy.o'[EMAIL PROTECTED] -
perfectly valid email address, but doesn't satisfy the usual naive
versions of regex validators.
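
A short sketch of the failure described above. The pattern is a typical 
naive one written for this example (the validator in question is not 
shown in the thread), and the address is a hypothetical one of the same 
shape.

```python
import re

# Assumed naive pattern: word characters, dots and hyphens only.
NAIVE = re.compile(r'^[\w.-]+@[\w.-]+\.\w+$')

# Hypothetical address; the apostrophe is a legal local-part character.
irish = "paddy.o'brien@example.ie"
plain = 'paddy@example.ie'

print(bool(NAIVE.match(irish)))   # False: "'" is not in [\w.-]
print(bool(NAIVE.match(plain)))   # True
```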

We use an even worse validator at my current job, but the feeling the
management have (not one I agree with) is that unusual email addresses,
whilst perhaps valid, are uncommon enough not to worry about.



Re: Don't use regular expressions to validate email addresses (was: I need some help with a regexp please)

2006-09-22 Thread Ant

Ben Finney wrote:
...
 The best advice I've seen when people ask "How do I validate whether
 an email address is valid?" was "Try sending mail to it."

There are advantages to the regex method. It is faster than sending an
email and getting a positive or negative return code. The delay may not
be acceptable in many applications. Secondly, the false negatives found
by a reasonable regex will be few compared to the number you'd get if
the smtp server went down, or a remote relay was having problems
delivering the message etc etc.

From a business point of view, it is probably more important to reduce
the number of false negatives than to reduce the number of false
positives - every false negative is a potential loss of a customer.
False positives? Who cares really as long as they are paying ;-)



Re: I need some help with a regexp please

2006-09-22 Thread John Machin

Ant wrote:
 John Machin wrote:
 ...
  A little more is unfortunately not enough. The best advice you got was
  to use an existing e-mail address validator.

 We got bitten by this at the last place I worked - we were using a
 regex email validator (from Microsoft IIRC), and we kept having
 problems with specific email addresses from Ireland. There are stacks of
 Irish email addresses out there of the form paddy.o'[EMAIL PROTECTED] -
 perfectly valid email address, but doesn't satisfy the usual naive
 versions of regex validators.

 We use an even worse validator at my current job, but the feeling the
 management have (not one I agree with) is that unusual email addresses,
 whilst perhaps valid, are uncommon enough not to worry about.

Oh, sorry for the abbreviation. "use" implies "source from believedly
reliable s/w source; test; then deploy" :-)



Re: I need some help with a regexp please

2006-09-22 Thread Ant

John Machin wrote:
 Ant wrote:
  John Machin wrote:
  ...
   A little more is unfortunately not enough. The best advice you got was
   to use an existing e-mail address validator.
 
  We got bitten by this at the last place I worked - we were using a
  regex email validator (from Microsoft IIRC)
...
 Oh, sorry for the abbreviation. "use" implies "source from believedly
 reliable s/w source; test; then deploy" :-)

I actually meant that we got bitten by using a regex validator, not by
using an existing one. Though we did get bitten by an existing one, and
it being from Microsoft we should have known better ;-)



I need some help with a regexp please

2006-09-22 Thread Sorin Schwimmer
Hi,

My $0.02:

re.compile('^\w+([\.-]?\w+)[EMAIL PROTECTED]([\.-]?\w+)*\.(\w{2}|(com|net|org|edu|intl|mil|gov|arpa|biz|aero|name|coop|info|pro|museum))$')

I picked it up from the Net, and while it may not be perfect (you've got
lots of replies telling you why), it's good enough for me.

Good luck,
Sorin

I need some help with a regexp please

2006-09-21 Thread codefire
Hi,

I am trying to get a regexp to validate email addresses but can't get
it quite right. The problem is I can't quite find the regexp to deal
with ignoring the case [EMAIL PROTECTED], which is not valid. Here's
my attempt, neither of my regexps work quite how I want:

[code]
import os
import re

s = 'Hi [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] @@not [EMAIL PROTECTED] partridge in a pear tree'
r = re.compile(r'[EMAIL PROTECTED]@\s]+\.\w+')
#r = re.compile(r'[EMAIL PROTECTED]')

# collect each distinct match once
addys = set()
for a in r.findall(s):
    addys.add(a)

for a in sorted(addys):
    print(a)
[/code]

This gives:
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]   -- shouldn't be here :(
[EMAIL PROTECTED]

Nearly there but no cigar :)

I can't see the wood for the trees now :) Can anyone suggest a fix
please?

Thanks,
Tony



Re: I need some help with a regexp please

2006-09-21 Thread [EMAIL PROTECTED]

codefire wrote:
 Hi,

 I am trying to get a regexp to validate email addresses but can't get
 it quite right. The problem is I can't quite find the regexp to deal
 with ignoring the case [EMAIL PROTECTED], which is not valid. Here's
 my attempt, neither of my regexps work quite how I want:

 [code]
 import os
 import re

 s = 'Hi [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] @@not [EMAIL PROTECTED] partridge in a pear tree'
 r = re.compile(r'[EMAIL PROTECTED]@\s]+\.\w+')
 #r = re.compile(r'[EMAIL PROTECTED]')

 addys = set()
 for a in r.findall(s):
     addys.add(a)

 for a in sorted(addys):
     print(a)
 [/code]

 This gives:
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]   -- shouldn't be here :(
 [EMAIL PROTECTED]

 Nearly there but no cigar :)

 I can't see the wood for the trees now :) Can anyone suggest a fix
 please?

 Thanks,
 Tony

'[EMAIL PROTECTED](\.\w+)*'
Works for me, and SHOULD for you, but I haven't tested it all that
much.
Good luck.



Re: I need some help with a regexp please

2006-09-21 Thread Neil Cerutti
On 2006-09-21, codefire [EMAIL PROTECTED] wrote:
 I am trying to get a regexp to validate email addresses but
 can't get it quite right. The problem is I can't quite find the
 regexp to deal with ignoring the case [EMAIL PROTECTED],
 which is not valid. Here's my attempt, neither of my regexps
 work quite how I want:

I suggest a websearch for email address validators instead of
writing your own.

Here's a hit that looks useful:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/66439

-- 
Neil Cerutti
Next Sunday Mrs. Vinson will be soloist for the morning service.
The pastor will then speak on It's a Terrible Experience.
--Church Bulletin Blooper 


Re: I need some help with a regexp please

2006-09-21 Thread Steve Holden
codefire wrote:
 Hi,
 
 I am trying to get a regexp to validate email addresses but can't get
 it quite right. The problem is I can't quite find the regexp to deal
 with ignoring the case [EMAIL PROTECTED], which is not valid. Here's
 my attempt, neither of my regexps work quite how I want:
 
 [code]
 import os
 import re
 
 s = 'Hi [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] @@not [EMAIL PROTECTED] partridge in a pear tree'
 r = re.compile(r'[EMAIL PROTECTED]@\s]+\.\w+')
 #r = re.compile(r'[EMAIL PROTECTED]')
 
 addys = set()
 for a in r.findall(s):
     addys.add(a)
 
 for a in sorted(addys):
     print(a)
 [/code]
 
 This gives:
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]   -- shouldn't be here :(
 [EMAIL PROTECTED]
 
 Nearly there but no cigar :)
 
 I can't see the wood for the trees now :) Can anyone suggest a fix
 please?
 
The problem is that your pattern doesn't start out by confirming that 
it's either at the start of a line or after whitespace. You could do 
this with a look-behind assertion if you wanted.
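
A minimal sketch of the look-behind idea. The two patterns and the sample 
text are assumptions for illustration, since the original addresses and 
expressions are redacted in the archive. (?<!\S) means "not preceded by a 
non-whitespace character", which holds at the start of the string or 
right after whitespace.

```python
import re

# Hypothetical sample text.
text = 'Hi tony@example.com @@not@example.org bob@example.net'

# Without an anchor, a match may begin in the middle of a word.
loose = re.compile(r'[^@\s]+@[^@\s]+\.\w+')

# With a negative look-behind, a match may only begin at a word start.
anchored = re.compile(r'(?<!\S)[^@\s]+@[^@\s]+\.\w+')

print(loose.findall(text))
# ['tony@example.com', 'not@example.org', 'bob@example.net']
print(anchored.findall(text))
# ['tony@example.com', 'bob@example.net']
```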

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden



Re: I need some help with a regexp please

2006-09-21 Thread codefire
Hi,

thanks for the advice guys.

Well took the kids swimming, watched some TV, read your hints and
within a few minutes had this:

r = re.compile(r'[EMAIL PROTECTED]@\s]+\.\w+')

This works for me. That is if you have an invalid email such as
tony..bATblah.com it will reject it (note the double dots).

Anyway, now know a little more about regexps :)

Thanks again for the hints,

Tony



Re: I need some help with a regexp please

2006-09-21 Thread John Machin
codefire wrote:
 Hi,

 thanks for the advice guys.

 Well took the kids swimming, watched some TV, read your hints and
 within a few minutes had this:

 r = re.compile(r'[EMAIL PROTECTED]@\s]+\.\w+')

 This works for me. That is if you have an invalid email such as
 tony..bATblah.com it will reject it (note the double dots).

 Anyway, now know a little more about regexps :)

A little more is unfortunately not enough. The best advice you got was
to use an existing e-mail address validator. The definition of a valid
e-mail address is complicated. You may care to check out Mastering
Regular Expressions by Jeffrey Friedl. In the first edition, at least
(I haven't looked at the 2nd), he works through assembling a 4700+ byte
regex for validating e-mail addresses. Yes, that's 4KB.  It's the best
advertisement for *not* using regexes for a task like that that I've
ever seen.

Cheers,
John



Don't use regular expressions to validate email addresses (was: I need some help with a regexp please)

2006-09-21 Thread Ben Finney
John Machin [EMAIL PROTECTED] writes:

 A little more is unfortunately not enough. The best advice you got was
 to use an existing e-mail address validator. The definition of a valid
 e-mail address is complicated. You may care to check out Mastering
 Regular Expressions by Jeffrey Friedl. In the first edition, at least
 (I haven't looked at the 2nd), he works through assembling a 4700+ byte
 regex for validating e-mail addresses. Yes, that's 4KB.  It's the best
 advertisement for *not* using regexes for a task like that that I've
 ever seen.

The best advice I've seen when people ask "How do I validate whether
an email address is valid?" was "Try sending mail to it."

It's both Pythonic, and truly the best way. If you actually want to
confirm, don't try to validate it statically; *use* the email address,
and check the result.  Send an email to that address, and don't use it
any further unless you get a reply saying "yes, this is the right
address to use" from the recipient.

The sending system's mail transport agent, not regular expressions,
determines which part is the domain to send the mail to.

The domain name system, not regular expressions, determines what
domains are valid, and what host should receive mail for that domain.

Most especially, the receiving mail system, not regular expressions,
determines what local-parts are valid.
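
A rough sketch of the confirm-by-mail round trip, using only the standard 
library. The function names, the sender address, the in-memory pending 
table and the message wording are all assumptions made for this example. 
Actually sending the message is left out so that the sketch runs without 
a network.

```python
import secrets
from email.message import EmailMessage

pending = {}   # token -> address still awaiting confirmation

def build_confirmation(address, sender='noreply@example.com'):
    """Return (token, message) for one confirmation request."""
    token = secrets.token_urlsafe(16)
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = address
    msg['Subject'] = 'Please confirm your address'
    msg.set_content('To confirm this address, quote this code: %s\n' % token)
    return token, msg

def request_confirmation(address):
    """Record the address as pending and return the token and message."""
    token, msg = build_confirmation(address)
    pending[token] = address
    # A real system would now hand msg to its mail transport, for
    # example with smtplib.SMTP(host).send_message(msg).
    return token, msg

def confirm(token):
    """Return the confirmed address, or None if the token is unknown."""
    return pending.pop(token, None)
```

No pattern is consulted anywhere: the address counts as good only once 
its owner has sent the code back.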

-- 
 \   I believe in making the world safe for our children, but not |
  `\our children's children, because I don't think children should |
_o__)  be having sex.  -- Jack Handey |
Ben Finney
