Peng Yu <pengyu...@gmail.com> writes:

> Hi,
>
> I would like to extract "a...@efg.hij.xyz". But it only shows ".hij".
Others have addressed this question. I'll answer a separate one:

> Does anybody see what is wrong with it? Thanks.

One thing that's wrong with it is that it is far too restrictive.

> email_regex = re.compile('[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)')

This excludes a great many valid email addresses. Please don't write a
match for email addresses that will reject actual email addresses.

For an authoritative guide to matching email addresses, see RFC 3696 §3
<URL:https://tools.ietf.org/html/rfc3696#section-3>.

A more correct match would boil down to:

* Match any printable Unicode characters (not just ASCII).

* Locate the *last* ‘@’ character. (An email address may contain more
  than one ‘@’ character; you should allow any printable ASCII
  character in the local part.)

* Match the domain part as the text after the last ‘@’ character. Match
  the local part as anything before that character. Reject an address
  that has either of these empty.

* Validate the domain by DNS request. Your program is not an authority
  for what domains are valid; the only authority for that is the DNS.

* Don't validate the local part at all. Your program is not an
  authority for what local parts are accepted by the destination host;
  the only authority for that is the destination mail host.

-- 
 \     “Jealousy: The theory that some other fellow has just as little |
  `\                                    taste.” - Henry L. Mencken |
_o__)                                                                  |
Ben Finney
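The steps listed above could be sketched in Python roughly as follows.
This is only an illustration under stated assumptions: the names
split_address and domain_resolves are made up for this example, and the
DNS check uses an ordinary address lookup from the standard library.
Real mail delivery consults MX records first, which the standard
library cannot query, so treat that function as a stand-in.

```python
import socket


def split_address(address):
    """Split at the last '@'; return (local, domain), or None if rejected."""
    # Accept only printable characters (str.isprintable covers Unicode).
    if not address.isprintable():
        return None
    # rpartition splits at the LAST '@', so the local part may contain '@'.
    local, at, domain = address.rpartition("@")
    # Reject if there is no '@' at all, or if either part is empty.
    if not at or not local or not domain:
        return None
    # The local part is deliberately not validated any further.
    return local, domain


def domain_resolves(domain):
    """Hypothetical stand-in for a DNS check: does the name resolve at all?

    Assumption: an address lookup is used here; a proper check would
    look for MX records with a DNS library.
    """
    try:
        socket.getaddrinfo(domain, None)
    except (socket.gaierror, UnicodeError):
        return False
    return True
```

For instance, split_address("a@b@example.org") yields the local part
"a@b" and the domain "example.org", while "user@" and "@example.org"
are both rejected. Only the domain would then be passed to
domain_resolves; whether the local part is deliverable is left for the
destination mail host to decide.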