So I'm going through all of my UNPARSEABLE_RELAY hits to see what I can do about them. The question is: how much data do we need to get out of a header to make it worth parsing? Conversely, when should we just ignore a header?
For example:

from (1.2.3.4) by host.example.com via smtp id 03d4_20d285fe_3c6f_11db_828f_0013725b2d50; Mon, 04 Sep 2006 19:43:12 -0400

from ([172.16.1.78]) by email2.codeworksonline.com with Microsoft SMTPSVC(5.0.2195.6713); Wed, 6 Sep 2006 21:14:29 -0400

Judging by "with HTTP", ip and by is enough.

(from [EMAIL PROTECTED]) by m06.lax.untd.com (jqueuemail) id LRVB3JAJ; Fri, 02 Jun 2006 08:15:21 PDT

from CNNIMAIL12.CNN.COM by CNNIMAIL12.CNN.COM (LISTSERV-TCP/IP release 1.8d) with spool id 35469828 for [email protected]; Tue, 23 May 2006 11:01:27 -0400

from EXCL.hq.corp.pbs.org (mail1.hq.corp.pbs.org) by listserv.pbs.org (LSMTP for Windows NT v1.1b) with SMTP id <[EMAIL PROTECTED]>; Mon, 22 May 2006 9:43:25 -0400

from PRODWEB02LA by fireball.treehousei.com (Merak 8.5.0-2) with SMTP id FTM10106 for <[EMAIL PROTECTED]>; Tue, 27 Jun 2006 06:11:06 -0700

from mailer by www.ennmagazine.com with HTTP (Mail); Fri, 1 Sep 2006 10:50:01 -0400

Without an IP I think these are useless...?

from FNCLISTSRV (10.6.147.53:1028) by listserv.foxnews.com (LSMTP for Windows NT v1.1b) with SMTP id <[EMAIL PROTECTED]>; Tue, 8 Aug 2006 13:45:26 -0400

This is actually parseable. I think it's helo, ip, by, and id.

Thoughts?

(happily, I was able to shrink the list of 3500 unparsed relays down to ~50 unique formats)

--
Randomly Selected Tagline:
"Communist revolutionaries taking over the server room and demanding all the computers in the building or they shoot the sysadmin." - Today's BOFH Excuse
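To make the "helo, ip, by, and id" idea concrete, here is a rough sketch in Python. It is not the parser's actual code. The pattern `RELAY_RE` and the function `parse_relay` are hypothetical names invented for illustration, and the rule "no IP or no by means skip the header" is just the working assumption from the examples above, not a settled policy.

```python
import re

# Hypothetical pattern for headers shaped like
#   from HELO (IP:port) by HOST ... id ID; date
# HELO, the square brackets, the port, and the id are all optional.
RELAY_RE = re.compile(
    r"""^from\s+
        (?:(?P<helo>[^\s()]+)\s+)?            # optional HELO name
        \(\[?(?P<ip>\d{1,3}(?:\.\d{1,3}){3})  # IPv4 address, maybe bracketed
        \]?(?::\d+)?\)\s+                     # optional source port
        by\s+(?P<by>\S+)                      # receiving host
        (?:.*?\bid\s+(?P<id>[^\s;]+))?        # optional queue id
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_relay(header):
    """Return the fields found, or None if the header has no usable IP/by pair."""
    m = RELAY_RE.match(header.strip())
    if m is None:
        return None
    # Drop optional fields that did not match.
    return {k: v for k, v in m.groupdict().items() if v is not None}


if __name__ == "__main__":
    samples = [
        "from (1.2.3.4) by host.example.com via smtp id "
        "03d4_20d285fe_3c6f_11db_828f_0013725b2d50; Mon, 04 Sep 2006 19:43:12 -0400",
        "from mailer by www.ennmagazine.com with HTTP (Mail); "
        "Fri, 1 Sep 2006 10:50:01 -0400",
    ]
    for s in samples:
        print(parse_relay(s))
```

With this sketch the two IP-only headers and the FNCLISTSRV header yield at least ip and by, while the headers without an IP (jqueuemail, LISTSERV, Merak, "with HTTP") return None and would simply be ignored. It only handles dotted-quad IPv4, and an id that contains whitespace (as the obfuscated ones in the archive do) gets cut at the first space.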
