On Mon, Dec 18, 2017, at 02:01, Chris Angelico wrote:
> Hmm, is that true? I was under the impression that the quoting rules
> were impossible to match with a regex. Or maybe it's just that they're
> impossible to match with a *standard* regex, but the extended
> implementations (including Python's, possibly) are able to match them?
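For illustration, here is a rough Python sketch of the distinction (my own, not taken from any library; it assumes a simplified RFC 5322 grammar that ignores folding whitespace, obsolete syntax and non-ASCII, and the names `QUOTED_STRING` and `strip_comments` are hypothetical). The quoted-string rule has no nesting, so it is a regular language and the stdlib `re` module matches it fine. Nested comments need a depth counter, which a plain regex cannot express.

```python
import re

# Simplified RFC 5322 quoted-string (assumption: folding whitespace, obsolete
# syntax and non-ASCII are ignored). qtext is printable ASCII except '"' and
# '\'; a quoted-pair is '\' followed by one printable character or space.
# No nesting is involved, so this is a regular language and `re` copes.
QUOTED_STRING = re.compile(r'"(?:[\x20\x21\x23-\x5b\x5d-\x7e]|\\[\x20-\x7e])*"')


def strip_comments(s: str) -> str:
    """Remove (possibly nested) parenthesised comments from s.

    Hypothetical helper. Nesting needs a depth counter, which a classic
    regular expression cannot express. Quoted strings are not treated
    specially here, so parentheses inside them would be stripped too.
    """
    out = []
    depth = 0
    escaped = False
    for ch in s:
        if escaped:
            # The character after a backslash is taken literally.
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == "(":
            depth += 1
            continue
        elif ch == ")" and depth > 0:
            depth -= 1
            continue
        if depth == 0:
            out.append(ch)
    return "".join(out)


sample = "john(outer (inner) comment)@example.com"
print(bool(QUOTED_STRING.fullmatch(r'"john \"jd\" doe"')))  # True
print(strip_comments(sample))  # john@example.com
# A single non-recursive pattern only peels off the innermost level:
print(re.sub(r"\([^()]*\)", "", sample))  # john(outer  comment)@example.com
```

The counter is just the smallest possible stack; anything that can count arbitrarily deep nesting (Perl's recursive patterns, a hand-written parser) goes beyond what a regular language can describe.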
What's impossible to match with a regex are the comments permitted by
RFC 822, which are delimited by balanced (and nestable) parentheses;
AIUI Perl can do it, Python can't. And those comments are, according
to my argument, not part of the address anyway.

> Anyhow, it is FAR from simple; and also, for the purpose of "detect
> email addresses in text documents", not desirable. Same as with URL
> detection - it's better to have a handful of weird cases that don't
> autolink correctly than to mis-detect any address that's at the end of
> a sentence, for instance. For that purpose, it's better to ignore the
> RFC and just craft a regex that matches *common* email address
> formats.

Email addresses don't, according to the formal spec, allow a dot at
the end of the domain part. I was half-seriously proposing allowing it
as an extension (since DNS names *do*: a fully qualified name such as
"example.com." ends in a dot).

-- 
https://mail.python.org/mailman/listinfo/python-list