Thanks for the feedback! I'm carbon copying this response to
[EMAIL PROTECTED] (www.mail-archive.com's discussion list, consisting
mainly of me) so it gets archived, OK? I have become something of an
archivist lately, and this discussion is relevant for others who use
the service.

>You mentioned you've seen the bots drudging through your
>site so you must know how they are getting addresses of the lists.

Yes, robots do trudge through the site; however, only the legitimate
ones are easily detectable (say, AltaVista and friends) since they
follow proper protocols, such as checking robots.txt before
proceeding.
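
For anyone unfamiliar with that check, here is a minimal sketch of what
a well-behaved robot does before fetching a page. The robots.txt
content, the bot name, and the example.com URLs are made up for
illustration; they are not taken from this site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption, not this site's real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_fetch("ExampleBot", "http://example.com/index.html"))
    print(may_fetch("ExampleBot", "http://example.com/private/list.html"))
```

A spambot simply skips this step, which is why it never shows up as a
polite robots.txt request in the logs.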

Spambots are generally not so well behaved, and are thus harder to
detect. The ones that get through my permission screens are likely to
be especially sneaky (or possibly human).

I only find evidence of problems when I notice the same spam hitting
multiple lists. And I don't really know where they are getting the
list addresses from - they aren't recorded anywhere except inside
URLs.

I wonder if the list addresses were harvested many moons ago, before
all the precautions were put in place. I really don't know.

>What you might think of doing, if you haven't already thought of this, is
>to make a host name to list 'ID' mapping, and never use the originating host
>on your webserver. Use its 'number' instead. The number would be useless
>to spammers.

OK, let me think on it a bit, and then I'll probably go do it. Yuck. I
guess I can switch to rot13-encoded machine names or something. I
really didn't think spambots would be smart enough to pull list names
out of URLs, and I thought humans would be way too lazy.
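
To make the two options concrete, here is a rough sketch of both: an
opaque number per list, and rot13 on the host name. The ListIdMap class
and the example addresses are hypothetical, not how the archive is
actually built. Note that rot13 is trivially reversible, so it only
defeats naive pattern matching, while a number reveals nothing about
the list address.

```python
import codecs


class ListIdMap:
    """Assign each list address an opaque number for use in archive URLs.

    Hypothetical helper for illustration; IDs are handed out sequentially.
    """

    def __init__(self):
        self._id_by_address = {}
        self._address_by_id = {}

    def id_for(self, address: str) -> int:
        """Return the existing ID for address, or allocate the next one."""
        key = address.lower()
        if key not in self._id_by_address:
            new_id = len(self._id_by_address) + 1
            self._id_by_address[key] = new_id
            self._address_by_id[new_id] = key
        return self._id_by_address[key]

    def address_for(self, list_id: int) -> str:
        """Look the address back up; this stays on the server side only."""
        return self._address_by_id[list_id]


def rot13(text: str) -> str:
    """Rotate letters by 13 places; applying it twice restores the input."""
    return codecs.encode(text, "rot_13")


if __name__ == "__main__":
    ids = ListIdMap()
    print(ids.id_for("users@lists.example.org"))
    print(rot13("lists.example.org"))
```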

>I haven't noticed if you are stripping the email addresses sometimes attached
>to the bottom of messages containing subscribe and unsubscribe instruction
>footers.

That's difficult from a computational load standpoint (I have high loads
and would like to avoid post-processing if possible) and from an
automation standpoint. What if I accidentally delete the wrong stuff?
What if people don't want that stuff deleted?

Do you find that gina-users-request is getting spammed a lot? Those
lines at the bottom almost never have the list address itself, just
administrative addresses, so they shouldn't result in (direct) spam.

>Another idea would be to set up a way for the list admins to delete spam
>messages if they ever get out to the archives.

At that point much of the damage is done. Ideally the issue of spam
shielding should be addressed before the lists hit the archives.

Also, this is a lot of work, which seems to scale with the total
number of lists archived. Since I don't control the lists, there's no
obvious way to know whom to give deletion privileges to.

I'm not trying to sound negative - these are good suggestions; however,
I do need to be careful to limit the amount of work required to
administer the archives, especially since they could potentially grow
to include many more lists.

>I think I might work out a way for me to moderate my lists to keep their 
>content free of spam, without my list users needing to know about it.

Hmm, or potentially setting things so only subscribers + people on an
authorization list could post. Some of the lists that use
mail-archive.com do that (xmame, mhonarc).
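
The rule itself is simple. A minimal sketch, assuming the list software
can hand over the From header and the two address sets (the function
name and the sample addresses are invented for illustration):

```python
from email.utils import parseaddr


def may_post(from_header: str, subscribers: set, authorized: set) -> bool:
    """Allow a post only from a subscriber or an explicitly authorized sender."""
    # parseaddr splits 'Name <addr>' into (name, addr); compare case-insensitively.
    _, address = parseaddr(from_header)
    address = address.lower()
    allowed = {a.lower() for a in subscribers} | {a.lower() for a in authorized}
    return bool(address) and address in allowed


if __name__ == "__main__":
    subs = {"alice@example.org"}
    extra = {"announce-bot@example.net"}
    print(may_post("Alice <Alice@Example.org>", subs, extra))
    print(may_post("spammer@example.com", subs, extra))
```

A From header is easy to forge, so this only raises the bar; it does
not stop a spammer who knows a subscriber's address.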

>Another idea would be to start compiling, or adding to an existing spammers
>database, that your system could go through and delete messages from known
>spam addresses.

I've been thinking about this as well. Do a web search on "Realtime
Blackhole List" and/or see http://www.jab.org/cheb/
Again, it's best to stop spam before it gets relayed to the list.
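
Blackhole lists of this kind are conventionally queried over DNS: the
octets of the sending host's IP address are reversed, the list's zone
is appended, and a successful lookup means the host is listed. A small
sketch, assuming an IPv4 sender; the zone bl.example.org is a
placeholder, not a real service.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets plus the list zone."""
    # IPv4Address() raises ValueError on anything that is not a valid IPv4 address.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blackhole list has an entry for ip.

    Any resolution failure is treated as "not listed" in this sketch.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True


if __name__ == "__main__":
    # Placeholder zone (an assumption); substitute the list actually in use.
    print(dnsbl_query_name("192.0.2.10", "bl.example.org"))
```

Doing this check in the mail server at connection time is what lets
the spam be refused before it is ever relayed to the list.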

>Let me know if I can help with anything.

What are your comments on Chebyshev? Is it relevant? Should I (or
someone else) set it up as a service? I don't know how effective it
might be.

Jeff
