Hi,

I've written a milter with an extra facility: it gathers the TCP SYN
and the sender's first ACK from MTA connections to port 25, with a view
to feeding this extra info into an anti-spam learning system.

From initial observation, I think it has an outstanding chance of
great success.  For example, I observed that 100% of the 4000 emails
I received this weekend with the following TCP window size and options:
    win 64240 <mss 1460,nop,nop,sackOK>
were spam.  "Common sense" tells us that some things are never
supposed to be sending email (e.g. hacked routers), other things are
very unlikely to be running a legitimate MTA (e.g. a Windows XP PC), and
others are highly likely to be legit (e.g. a RedHat Enterprise server).
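
To make the idea of a "TCP fingerprint" concrete, here is a minimal
sketch of how one tcpdump-style SYN line could be reduced to a few
comparable fields.  It is in Python rather than the Perl my tools use,
the function name syn_fingerprint is made up for illustration, and the
choice of which fields to keep is an assumption, not a tested recipe.

```python
import re

# Matches the window size, the TCP option list and the IP-level
# details as they appear in tcpdump-style lines like the ones in this post.
SYN_RE = re.compile(
    r"win (?P<win>\d+) <(?P<opts>[^>]*)>"
    r".*?\(ttl (?P<ttl>\d+), id \d+, len (?P<len>\d+)\)"
)

def syn_fingerprint(line):
    """Return a dict of coarse SYN features, or None if the line does not match.

    Hypothetical helper: per-connection values such as timestamps are
    dropped, while the mss and wscale values are kept because they tend
    to be set by the sending stack or its network path.
    """
    m = SYN_RE.search(line)
    if m is None:
        return None
    opts = []
    for opt in m.group("opts").split(","):
        parts = opt.split()
        if not parts:
            continue
        if parts[0] in ("mss", "wscale"):
            opts.append(" ".join(parts[:2]))  # keep the option value
        else:
            opts.append(parts[0])             # keep only the option name
    return {
        "win": int(m.group("win")),
        "opts": ",".join(opts),               # option order is part of the fingerprint
        "df": "(DF)" in line,
        "ttl": int(m.group("ttl")),
        "len": int(m.group("len")),
    }
```

The option order is kept deliberately, since different TCP stacks
tend to emit the same options in different orders.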

Is anyone interested in connecting this passive info-feed up to
SpamAssassin?

I presently have two bits of Perl code: the passive connection cache
daemon, and the milter that, upon connection, gets the SYN/ACK from
the daemon.  (My daemon also records the final FIN, although
this would only be useful to sites that filter mail after having
accepted it already.)  I also have a growing "database" of passive
data that can be fed to a learning system, with each email already
having been classified as "definite spam", "very very unlikely to be
spam", or "unknown, but very likely to be spam".
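
For anyone wondering how the two pieces fit together, the following
is a rough sketch of the cache idea only, written in Python rather
than Perl.  The class name HandshakeCache, the max_age parameter and
the in-memory dict are all assumptions made for illustration; the
real daemon captures packets off the wire, which is not shown here.

```python
import time

class HandshakeCache:
    """Hypothetical in-memory cache of handshake packets, keyed by (ip, port)."""

    def __init__(self, max_age=300.0):
        self.max_age = max_age   # seconds before an entry is considered stale
        self._entries = {}

    def record(self, ip, port, kind, packet, now=None):
        """Store one captured packet ("SYN", "ACK" or "FIN") for a connection."""
        now = time.time() if now is None else now
        entry = self._entries.setdefault((ip, port), {"seen": now})
        entry[kind] = packet
        entry["seen"] = now

    def lookup(self, ip, port, now=None):
        """Return the packets seen for (ip, port), or None if unknown or stale."""
        now = time.time() if now is None else now
        entry = self._entries.get((ip, port))
        if entry is None or now - entry["seen"] > self.max_age:
            self._entries.pop((ip, port), None)  # forget stale entries
            return None
        return {k: v for k, v in entry.items() if k != "seen"}
```

The milter side would then call lookup() with the peer address and
port it is given when the connection arrives.  Entries have to expire
because source ports get reused by later connections.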

Here are some examples:

1. Definite spam:  (caught by "Brightmail", not sent by our customers)

03/06 01:37.27 30855 pm PaSv refused-spam:Yes <> => [EMAIL PROTECTED] HOST=smtp01.bluespree.com [67.91.146.227:54790] HELO=smtp01.BlueSpree.com TLS=//// Cache(67.91.146.227.54790)=ASYN:0:17:cb:19:f2:b9 0:8:2:a0:0:da 0800 62: 67.91.146.227.54790 > xx.xx.xx.xx.25: S [tcp sum ok] 928359215:928359215(0) win 16384 <mss 1460,nop,nop,sackOK> (DF) (ttl 115, id 51201, len 48) ACK:0:17:cb:19:f2:b9 0:8:2:a0:0:da 0800 60: 67.91.146.227.54790 > xx.xx.xx.xx.25: . [tcp sum ok] 928359216:928359216(0) ack 587049612 win 17520 (DF) (ttl 115, id 51209, len 40)
03/06 01:38.59 30855 pm PaSv refused-spam:Blocked Sender <[EMAIL PROTECTED]> => [EMAIL PROTECTED] HOST=[201.67.216.25] [201.67.216.25:3287] HELO=bluemail.ch TLS=//// Cache(201.67.216.25.3287)=ASYN:0:17:cb:19:f2:b9 0:8:2:a0:0:da 0800 62: 201.67.216.25.3287 > xx.xx.xx.xx.25: S [tcp sum ok] 3241967323:3241967323(0) win 65535 <mss 1452,nop,nop,sackOK> (DF) (ttl 116, id 21335, len 48) ACK:0:17:cb:19:f2:b9 0:8:2:a0:0:da 0800 60: 201.67.216.25.3287 > xx.xx.xx.xx.25: . [tcp sum ok] 3241967324:3241967324(0) ack 686135845 win 65535 (DF) (ttl 116, id 21355, len 40)
03/06 01:39.20 30855 pm PaSv refused-spam:Blocked Sender <[EMAIL PROTECTED]> => [EMAIL PROTECTED] HOST=142-217-112-233.telebecinternet.net [142.217.112.233:57953] HELO=knuddlteddy.de TLS=//// Cache(142.217.112.233.57953)=ASYN:0:14:f6:dc:b0:c0 0:8:2:a0:0:da 0800 62: 142.217.112.233.57953 > xx.xx.xx.xx.25: S [tcp sum ok] 902385560:902385560(0) win 64240 <mss 1460,nop,nop,sackOK> (DF) (ttl 119, id 44063, len 48) ACK:0:14:f6:dc:b0:c0 0:8:2:a0:0:da 0800 60: 142.217.112.233.57953 > xx.xx.xx.xx.25: . [tcp sum ok] 902385561:902385561(0) ack 696775812 win 64240 (DF) (ttl 119, id 44069, len 40)
03/06 01:39.08 30855 pm PaSv refused-spam:Yes <[EMAIL PROTECTED]> => [EMAIL PROTECTED] HOST=[87.69.103.68] [87.69.103.68:4811] HELO=xx.xx.xx.xx TLS=//// Cache(87.69.103.68.4811)=ASYN:0:17:cb:19:f2:b9 0:8:2:a0:0:da 0800 62: 87.69.103.68.4811 > xx.xx.xx.xx.25: S [tcp sum ok] 472097603:472097603(0) win 65535 <mss 1360,nop,nop,sackOK> (DF) (ttl 110, id 53309, len 48) ACK:0:17:cb:19:f2:b9 0:8:2:a0:0:da 0800 60: 87.69.103.68.4811 > xx.xx.xx.xx.25: . [tcp sum ok] 472097604:472097604(0) ack 695762229 win 65535 (DF) (ttl 110, id 53341, len 40)

2. Very probably spam:  (not caught by "Brightmail", but also not a
                         customer with permission to use our mail
                         system)

03/06 01:40.46 30855 pm PaSv noncust:nonspam [EMAIL PROTECTED] => [EMAIL PROTECTED] HOST=maildana.danareksa.com [202.158.10.99:19584] HELO=mail-gw.danareksa.com TLS=//// Cache(202.158.10.99.19584)=SSYN:0:14:f6:dc:b0:c0 0:8:2:a0:0:da 0800 78: 202.158.10.99.19584 > xx.xx.xx.xx.25: S [tcp sum ok] 212145248:212145248(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale 0,nop,nop,timestamp 1508937671 0> (DF) (ttl 115, id 1987, len 64)
03/06 01:46.21 30855 pm PaSv noncust:nonspam [EMAIL PROTECTED] => [EMAIL PROTECTED] HOST=mlnyb902er.ml.com [199.43.54.100:56981] HELO=mlnyb902er.ml.com TLS=//// Cache(199.43.54.100.56981)=SSYN:0:14:f6:dc:b0:c0 0:8:2:a0:0:da 0800 74: 199.43.54.100.56981 > xx.xx.xx.xx.25: S [tcp sum ok] 672545219:672545219(0) win 5840 <mss 1460,sackOK,timestamp 1375514422 0,nop,wscale 2> (DF) (ttl 50, id 38645, len 60)

3. Very unlikely to be spam:  (Our customers, or DSN receipts
                               relating to email we originated
                               earlier, neither of which was caught
                               by Brightmail)

03/06 01:46.52 30855 pm PaSv cust:nonspam [EMAIL PROTECTED] => [EMAIL PROTECTED] HOST=list.opisnet.com [198.6.95.10:13985] HELO=ucgsmtpfw1.ucg.com TLS=//// Cache(198.6.95.10.13985)=ASYN:0:17:cb:19:f2:b9 0:8:2:a0:0:da 0800 62: 198.6.95.10.13985 > xx.xx.xx.xx.25: S [tcp sum ok] 4066464733:4066464733(0) win 16384 <mss 1460,nop,nop,sackOK> (DF) (ttl 116, id 5604, len 48) ACK:0:17:cb:19:f2:b9 0:8:2:a0:0:da 0800 60: 198.6.95.10.13985 > xx.xx.xx.xx.25: . [tcp sum ok] 4066464734:4066464734(0) ack 1175379631 win 17520 (DF) (ttl 116, id 5617, len 40)
03/06 01:47.42 30855 pm PaSv cust:nonspam [EMAIL PROTECTED] => [EMAIL PROTECTED] HOST=sccrmhc15.comcast.net [204.127.200.85:46523] HELO=sccrmhc15.comcast.net TLS=//// Cache(204.127.200.85.46523)=SSYN:0:14:f6:dc:b0:c0 0:8:2:a0:0:da 0800 78: 204.127.200.85.46523 > xx.xx.xx.xx.25: S [tcp sum ok] 2091700940:2091700940(0) win 32850 <nop,wscale 1,nop,nop,timestamp 709867016 0,nop,nop,sackOK,mss 1460> (DF) (ttl 53, id 6350, len 64)
03/06 01:47.53 30855 pm PaSv cust:nonspam [EMAIL PROTECTED] => [EMAIL PROTECTED] HOST=rwcrmhc15.comcast.net [204.127.192.85:36708] HELO=rwcrmhc15.comcast.net TLS=//// Cache(204.127.192.85.36708)=SSYN:0:14:f6:dc:b0:c0 0:8:2:a0:0:da 0800 78: 204.127.192.85.36708 > xx.xx.xx.xx.25: S [tcp sum ok] 2924519936:2924519936(0) win 32850 <nop,wscale 1,nop,nop,timestamp 407323383 0,nop,nop,sackOK,mss 1460> (DF) (ttl 55, id 39701, len 64)


Hopefully nothing wrapped those long lines above!!
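
If you want to pull the class and connection details out of records
like the ones above, a sketch along these lines might be a starting
point.  It is Python rather than Perl, it assumes each record has
been joined back into a single line, and the names parse_record and
CLASSES are invented for illustration.  The mapping from the verdict
prefix to a class simply follows the three headings above.

```python
import re

# Field layout inferred from the sample records in this post.
RECORD_RE = re.compile(
    r"PaSv (?P<verdict>\S+) .*?"
    r"\[(?P<ip>\d+\.\d+\.\d+\.\d+):(?P<port>\d+)\] "
    r"HELO=(?P<helo>\S+) .*?"
    r"Cache\([^)]*\)=(?P<packets>.*)$"
)

# Verdict prefix -> class, following the three example groups above.
CLASSES = {
    "refused-spam": "definite spam",
    "noncust": "very probably spam",
    "cust": "very unlikely to be spam",
}

def parse_record(record):
    """Split one unwrapped log record into its label and connection fields."""
    m = RECORD_RE.search(record.strip())
    if m is None:
        return None
    prefix = m.group("verdict").split(":")[0]
    return {
        "label": CLASSES.get(prefix, "unknown"),
        "ip": m.group("ip"),
        "port": int(m.group("port")),
        "helo": m.group("helo"),
        "packets": m.group("packets"),  # raw cached packet text, left unparsed
    }
```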

The reason I'm posting this here is that I know email, and I know
TCP/IP, but I don't know neural networks or Bayesian tech...
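
To show the sort of thing I imagine a learning system doing with this
data, here is a deliberately simple sketch in Python: count how often
each fingerprint string turns up in spam and in non-spam, and turn
the counts into a smoothed estimate.  The class name FingerprintTally
and the add-one smoothing are my assumptions; a real Bayesian filter
would combine this with its other evidence rather than use it alone.

```python
from collections import defaultdict

class FingerprintTally:
    """Hypothetical per-fingerprint spam/ham counter with add-one smoothing."""

    def __init__(self):
        # fingerprint string -> [spam_count, ham_count]
        self.counts = defaultdict(lambda: [0, 0])

    def add(self, fingerprint, is_spam):
        """Record one classified email that arrived with this fingerprint."""
        self.counts[fingerprint][0 if is_spam else 1] += 1

    def spam_probability(self, fingerprint):
        """Estimate P(spam | fingerprint); unseen fingerprints score 0.5."""
        spam, ham = self.counts.get(fingerprint, (0, 0))
        return (spam + 1) / (spam + ham + 2)
```

With add-one smoothing, a fingerprint seen only a handful of times
stays near 0.5, while one seen 4000 times in spam and never in
legitimate mail ends up very close to 1.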

Let me know if you want a copy of the code or data to work on.  It's
in Perl, and runs on any Linux machine (or Unix with some small
mods).

(-; Chris.
