https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6105





--- Comment #2 from Matt Kettler <mkettler...@verizon.net>  2009-04-29 19:43:22 PST ---
OK, this follows some off-list discussions (done off-list mostly because I was
too tired to decide whether posting abusable information was a bad idea or not,
but I've now had time to think about it).

The existing algorithm tries to find the public IP closest to the originating
mail client, looking for the "source IP". I contend that this algorithm is
fundamentally flawed when compared to using the trust boundary. The arguments
I've seen supporting it so far are readily shown to have complementary problems
that are worse, and the problems they cite are readily worked around.
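To make the comparison concrete, here is a minimal sketch of the two selection
strategies, in Python for illustration only (this is not SpamAssassin code).
The relay list layout, the "trusted" flag, and the function names are all
assumptions made up for this example, and the public/non-public test is
simplified to RFC 1918 space plus loopback.

```python
from ipaddress import ip_address, ip_network

# Simplified stand-in for "not a public IP": RFC 1918 space plus loopback.
NON_PUBLIC = [ip_network(n) for n in
              ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8")]

def is_public(ip):
    addr = ip_address(ip)
    return not any(addr in net for net in NON_PUBLIC)

def source_ip(relays):
    """Current approach: the public IP closest to the originating client."""
    for relay in reversed(relays):      # walk from the oldest hop
        if is_public(relay["ip"]):
            return relay["ip"]
    return None

def trust_boundary_ip(relays):
    """Trust-boundary approach: the first untrusted relay, seen from our side."""
    for relay in relays:                # walk from the newest hop
        if not relay["trusted"]:
            return relay["ip"]
    return None

# Hypothetical relay chain, newest hop first (documentation addresses).
relays = [
    {"ip": "10.0.0.5",      "trusted": True},   # our own internal relay
    {"ip": "198.51.100.25", "trusted": False},  # ISP smarthost that connected to us
    {"ip": "203.0.113.77",  "trusted": False},  # hotel network, from an untrusted header
    {"ip": "192.168.1.20",  "trusted": False},  # laptop behind NAT
]

print(source_ip(relays))          # taken from a header we cannot verify
print(trust_boundary_ip(relays))  # the host that actually handed us the mail
```

In this sketch the "source IP" changes every time the sender moves to a new
hotel, while the trust-boundary IP stays with the ISP's smarthost.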

I propose a few basic points of comparison between using the current algorithm,
and one based on the trust-border. (numbered for referencing)

1) Using the current algorithm "fixes" ISPs with multiple smarthosts in
different /16's, but it also breaks road warriors who get their internet
connectivity from hotels, internet cafes, etc.

2) An ISP, even Google, only has a finite number of different /16's they
control. Thus, there is a finite number of different IP sets a gmail user will
be categorized as, and eventually they will repeat. This may expand over time,
but the expansion is slow. Repeating yourself is quite likely.

3) Road warriors expand their IP space rapidly, and the odds of repeating a /16
are pretty close to 1 in 2^16, i.e. a close-to-random distribution across all
the /16s possible, minus a few that aren't used, reserved, etc. Repeating is
very unlikely.

4) Admittedly, there are more gmail users than road warriors; however, the
gmail problem can be readily fixed. (see below)

5) Using the "source" means using data that is easily forged, as it appears in
untrusted headers. Arguing that the implications of forgery are small does not
change the fact that it's foolhardy to rely on readily spoofed information. We
may as well use the From address only. Besides, there is a reason
whitelist_from_rcvd uses the trust boundary. Shouldn't the AWL follow suit?

6) Situations like gmail are readily resolved by changing the AWL to not be
strictly IP based. If we change to the trust border, we can start using the
domain part of the reverse DNS where one exists (and fall back to IP where no
reverse DNS exists). This feature should reuse the registrar boundaries, 2tld,
etc. from the URIBL code (i.e. attempt to work like whitelist_from_rcvd where
possible).

7) Situations like gmail can also be improved when SPF or DKIM is enabled. If
results are available, the "source" part can be collapsed to a simple
"spf_pass", "spf_fail", etc.
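As a rough illustration of points 6 and 7, the "source" part of an AWL entry
could be derived along these lines. This is a Python sketch under stated
assumptions, not existing SpamAssassin behavior: the function names, the key
format, the order in which SPF and reverse DNS are consulted, and the tiny
two-level-TLD set (a stand-in for the registrar boundaries data) are all
invented for the example.

```python
# Tiny illustrative stand-in for a two-level-TLD list; not the real data.
TWO_LEVEL_TLDS = {"co.uk", "com.au"}

def registered_domain(hostname):
    """Reduce a reverse-DNS hostname to its registered domain."""
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LEVEL_TLDS:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])

def slash16(ip):
    """First two octets of an IPv4 address, i.e. its /16."""
    return ".".join(ip.split(".")[:2])

def awl_source_key(ip, rdns=None, spf=None):
    """Pick the 'source' part of an AWL key for the trust-boundary relay."""
    if spf:                      # point 7: collapse to the SPF result
        return "spf_" + spf
    if rdns:                     # point 6: domain part of the reverse DNS
        return registered_domain(rdns)
    return slash16(ip)           # fallback: IP-based, as today

# Two smarthosts in different /16's map to the same key via reverse DNS.
print(awl_source_key("198.51.100.25", rdns="mail-out7.example.com"))
print(awl_source_key("203.0.113.9", rdns="mail-out2.example.com"))
print(awl_source_key("198.51.100.25"))
print(awl_source_key("198.51.100.25", rdns="mail-out7.example.com", spf="pass"))
```

With a key like this, an ISP with smarthosts spread over several /16's is
still tracked as a single source, which addresses the case the current
algorithm was meant to fix.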


Now, admittedly, this is more-or-less a heavy revamp of the AWL, possibly
suited to being a separate plugin designed to replace the AWL with a new
"AWL+". (would also be a good opportunity to introduce working expiry)

I'm willing to start learning Perl to code this, but does anyone see holes in
any of my postulates above?


