Your message dated Thu, 23 Apr 2009 22:12:15 +0000
with message-id <[email protected]>
and subject line listarchives: Please remove automatically unreadable spam 
mails from i18n/l10n lists
has caused the Debian Bug report #344886,
regarding listarchives: Please remove automatically unreadable spam mails from 
i18n/l10n lists 
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
344886: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=344886
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: listarchives
Severity: wishlist

(Note: CC'ing listmasters as this might make sense to be applied as a
global rule for mailing lists from now on too)

For a while I have been reporting spam in the mailing list archives that is
sent in a language foreign to the list (that is, Russian, Korean and Chinese
messages sent to the Spanish i18n/l10n list). I think it would be good if
some mailing lists were filtered automatically for this spam: if possible
when mail is received but, at the very least, in the list archives, since a
delivery-time rule will not apply to old messages.

Reviewing the lists at http://lists.debian.org/i18n.html, I see that most of
them should only accept charsets that belong to their national encodings. In
the case of European languages, those encodings do *not* include any of
these:
big5|iso-2022-jp|ISO-2022-KR|euc-kr|gb2312|ks_c_5601-1987|iso-2022-jp|
KS_C_5601-1987|BIG5|koi8-r|GB2312|windows-1251

The European language mailing lists are:
- debian-l10n-catalan   - debian-l10n-czech
- debian-l10n-danish    - debian-l10n-dutch
- debian-l10n-english   - debian-l10n-esperanto
- debian-l10n-finnish   - debian-l10n-french
- debian-l10n-german    - debian-l10n-greek
- debian-l10n-hungarian - debian-l10n-italian 
- debian-l10n-polish    - debian-l10n-portuguese
- debian-l10n-romanian  - debian-l10n-spanish
- debian-laespiral      - debian-user-catalan
- debian-user-danish    - debian-user-de
- debian-user-french    - debian-user-polish
- debian-user-portuguese - debian-user-spanish
- debian-user-swedish   - debian-user-german

This rule, reversed, could also be applied to other lists (Japanese, Chinese)
in order to remove e-mails that are *not* encoded in their language encoding.
That would need to be done on a case-by-case basis, though, since those lists
might contain legitimate mails in different encodings. I have not
investigated this, but it might be useful to remove Korean-encoded mail
from the Russian mailing lists and vice versa.
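
The reversed rule described above can be sketched in Python as an allow-list
check. This is only an illustration under stated assumptions: the ALLOWED set
for a Japanese list is hypothetical and would need review per list, and the
function names are made up for this example. Parts that declare no charset at
all are not flagged.

```python
from email import message_from_string

# Hypothetical allow-list for a Japanese list; the real set needs review.
ALLOWED = {"iso-2022-jp", "euc-jp", "shift_jis", "utf-8", "us-ascii"}


def declared_charsets(raw_message):
    """Collect the (lowercased) charsets declared by the message and its parts."""
    msg = message_from_string(raw_message)
    found = set()
    # walk() yields the message itself and then every nested MIME part.
    for part in msg.walk():
        charset = part.get_content_charset()
        if charset:
            found.add(charset)
    return found


def outside_allow_list(raw_message, allowed=ALLOWED):
    """Return True if any part declares a charset that is not in `allowed`."""
    return bool(declared_charsets(raw_message) - allowed)
```

A message flagged this way would still need the case-by-case review mentioned
above, since a legitimate mail may use an encoding missing from the set.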

Attached is the procmail rule that I use to filter out messages sent in
encodings I can't read (and thus, are junk to me) from the mailing lists I'm
subscribed to. Please apply this to the lists above (and consider defining
new procmail rules for the non-European mailing lists).

Thanks

Javier
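# Unreadable charsets (procmail matches case-insensitively by default)
UNREADABLE='[^?"]*(big5|iso-2022-jp|ISO-2022-KR|euc-kr|gb2312|ks_c_5601-1987|iso-2022-jp|KS_C_5601-1987|BIG5|koi8-r|GB2312|windows-1251)'

# Junk the message if the Subject contains an encoded-word in one of these
# charsets OR the top-level Content-Type declares one. Each "1^0" condition
# adds 1 to the score at most once, so a single match is enough.
:0
* 1^0 $ ^Subject:.*=\?($UNREADABLE)
* 1^0 $ ^Content-Type:.*charset="?$UNREADABLE
$JUNKFOLDER

# For multipart messages, also check the Content-Type headers of the body parts.
:0
* ^Content-Type:.*multipart
* B ?? $ ^Content-Type:.*^?.*charset="?$UNREADABLE
$JUNKFOLDER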
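
For readers who do not use procmail, the same check can be sketched in Python
with the standard email module. This is a rough equivalent under stated
assumptions, not a port of the rule: the function name is_unreadable and the
deduplicated UNREADABLE tuple are made up for this example, and the substring
test only approximates the '[^?"]*' prefix in the recipe.

```python
import re
from email import message_from_string

# Same charset names as the procmail variable, deduplicated and lowercased.
UNREADABLE = (
    "big5", "iso-2022-jp", "iso-2022-kr", "euc-kr", "gb2312",
    "ks_c_5601-1987", "koi8-r", "windows-1251",
)

# RFC 2047 encoded-words in a Subject look like =?charset?B?...?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[bq]\?", re.IGNORECASE)


def is_unreadable(raw_message):
    """Return True if the Subject or any MIME part uses a listed charset."""
    msg = message_from_string(raw_message)
    subject = str(msg.get("Subject", ""))
    charsets = {name.lower() for name in ENCODED_WORD.findall(subject)}
    # walk() yields the message itself and then every nested MIME part,
    # which covers both the single-part and the multipart recipe.
    for part in msg.walk():
        charset = part.get_content_charset()
        if charset:
            charsets.add(charset.lower())
    return any(name in cs for cs in charsets for name in UNREADABLE)
```

A caller would file the message in the junk folder when is_unreadable()
returns True and deliver it normally otherwise.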

Attachment: signature.asc
Description: Digital signature


--- End Message ---
--- Begin Message ---
Hello,

This is a Wontfix, but we have now established a method to nominate, review
and remove 'certified' spam from the archive.

For details look here:
http://wiki.debian.org/Teams/ListMaster/ListArchiveSpam

Yours,
        Cord, Debian Listmaster of the day
-- 
http://lists.debian.org


--- End Message ---
