Attn: sed(1) regular expression gurus

D J Hawkey Jr Mon, 14 Jul 2003 07:09:19 -0700

Hi all.

I'm getting really frustrated by a seemingly simple problem. I'm doing
this under FreeBSD 4.5.


Given these portions of an e-mail's multi-line Received header as tests:

  by some.host.at.a.com (Postfix) with ESMTP id 3A4E07B03
  by some.host.at.a.com (8.11.6) ESMTP;
  by some.host.at.a.different.com (8.11.6p2/8.11.6) ESMTP;
  by some.host.at.another.com ([123.4.56.789]) id 3A4E07B03
  by some.host.at.yet.another.com (123.4.56.789) id 3A4E07B03

I want to isolate the addresses (one for the 1st through 3rd, two for
the 4th and 5th). Here's the sed(1) command I'm playing with:

  echo "by nospam.mc.mpls.visi.com (Postfix) with ESMTP id 3A4E07B03" \
      |sed -E \
        -e "s/by[[:space:]]+//" \
        -e "s/(\((\[?([0-9]{1,3}\.){3}[0-9]{1,3}\]?){0}\)|id|with|E?SMTP).*//"

In all cases, the parenthetical word is returned, when only the last
two should return the parenthetical word. The idea behind the first
branch of the second sed(1) command is to match anything that isn't a
"digits.digits.digits.digits" pattern. I've tried simpler expressions
like "\(\[?[^0-9.]+\]?\)", but they fail on the third example.

What the devil am I doing wrong?? Am I exercising known bugs in GNU's
sed(1)? Can anyone dream up a different solution - please, no Perl, but
awk(1) is fine.

Thanks,
Dave

-- 
  ______________________                         ______________________
  \__________________   \    D. J. HAWKEY JR.   /   __________________/
     \________________/\     [EMAIL PROTECTED]    /\________________/
                      http://www.visi.com/~hawkeyd/
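
One possible reading of the problem, offered as a sketch rather than a
definitive answer: in an extended regular expression, "{0}" is an interval
meaning "exactly zero repetitions", not a negation. The first branch can
then only match a literal "()", so the match always starts at "id", "with"
or "ESMTP", which come after the parenthetical word, and that word
survives. EREs have no "not this pattern" operator, but sed(1) can negate
an address with "!". The function name extract_addrs below is made up for
illustration, and the code assumes a sed(1) that accepts -E, as the command
above does. It strips the parenthetical word only on lines that do not
contain a parenthesized dotted quad:

```shell
#!/bin/sh
# Hypothetical helper (name not from the original post).
# 1st expression: drop the leading "by" and surrounding whitespace.
# 2nd expression: on lines WITHOUT a parenthesized dotted quad (optionally
#                 wrapped in [ ]), delete from the "(" to the end of line.
# 3rd expression: delete from the first " id", " with" or " (E)SMTP" onward.
extract_addrs() {
    sed -E \
        -e 's/^[[:space:]]*by[[:space:]]+//' \
        -e '/\(\[?([0-9]{1,3}\.){3}[0-9]{1,3}\]?\)/!s/[[:space:]]*\(.*//' \
        -e 's/[[:space:]]+(id|with|E?SMTP).*//'
}

# Run it over the five test lines from the message above.
printf '%s\n' \
    '  by some.host.at.a.com (Postfix) with ESMTP id 3A4E07B03' \
    '  by some.host.at.a.com (8.11.6) ESMTP;' \
    '  by some.host.at.a.different.com (8.11.6p2/8.11.6) ESMTP;' \
    '  by some.host.at.another.com ([123.4.56.789]) id 3A4E07B03' \
    '  by some.host.at.yet.another.com (123.4.56.789) id 3A4E07B03' \
    | extract_addrs
```

For the first three lines this should print only the host name, and for
the last two the host name followed by the parenthetical address. One
caveat: a four-part version string such as "(1.2.3.4)" is indistinguishable
from an address to this pattern and would be kept as well.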
