Re: French rules

2011-12-09 Thread John GALLET

On Fri, 9 Dec 2011, LEVEAU Stanislas wrote:

Hi

I am looking for French rules with sa-update?


I wrote a few some years/months ago and since the feedback was "this is 
too aggressive" (well, "it works for me"(tm)...) I did not go any further.


I do not know where the "official" archives of this list are supposed to 
be, here is one:


http://comments.gmane.org/gmane.mail.spam.spamassassin.general/110439

and the entry point to the rules (guess what ? It's in French...) 
http://www.saphirtech.com/spamassassin.html


I have of course modified my rules since then; if anyone is interested I can 
post an updated "v2" version, just ask.


Sincerely,
John GALLET



Re: recommended max children settings

2010-05-22 Thread John GALLET

Hi,

FWIW: the way I solved it was by limiting the number of concurrent incoming 
connections. Because my box serves only a few different domains, I limited 
the number of connections from the same SMTP client to 5 using iptables 
(the connlimit module). This might or might not be possible for others.
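
A minimal sketch of what such a rule could look like, written as a small 
Python helper that only assembles the iptables command line for review. 
The post gives just the limit (5) and the module; the INPUT chain, port 25, 
the SYN match and the TCP-reset rejection are assumptions, and 
build_connlimit_rule is a hypothetical name.

```python
import shlex


def build_connlimit_rule(port=25, max_per_client=5):
    """Return an iptables argv rejecting connections above a per-client limit.

    Assumed details (not stated in the post): INPUT chain, new TCP
    connections only (--syn), and a TCP reset as the rejection.
    """
    return [
        "iptables", "-A", "INPUT",
        "-p", "tcp", "--syn", "--dport", str(port),
        "-m", "connlimit", "--connlimit-above", str(max_per_client),
        "-j", "REJECT", "--reject-with", "tcp-reset",
    ]


if __name__ == "__main__":
    # Print the command instead of running it: applying it needs root,
    # and the rule should be reviewed against the existing firewall first.
    print(shlex.join(build_connlimit_rule()))
```

Printing rather than executing keeps the sketch safe to run anywhere; the 
limit of 5 only makes sense if, as in the post, few legitimate clients 
open many parallel connections.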


This stopped "spam outbursts" dead: your 8000 mails per day are NOT 
received at a steady rate. Every time a spammer sends you a "batch", you 
just cannot keep up: who would drink from a fire hose?



HTH

JGA






Re: sa-learn process overwhelming the server

2009-05-06 Thread John GALLET

Hi,

processes. It has even, on occasion, necessitated a reboot when i could not 
get the system to kill the process. I've taken to trying to scan it daily and 
manually delete the spam, but that's not always possible.


This hint might be totally wrong, but last time I saw such a behavior it 
was linked to the process /usr/libexec/gam_server (a file alteration 
monitor, used by fail2ban for example) that was (uselessly) triggered by 
sa-learn. I just configured gamin so it would ignore the user data 
partition and the heavy loads disappeared.


HTH
JG


Re: RFC's suck

2009-04-01 Thread John GALLET

Hi,

[repost from yesterday, I was not using the correct From address for this 
list...]


Yes, it means that every Received: header in an email is valid with a 
valid IP, valid configuration (whatever that is deemed to be), and valid 
DNS. Only servers that were correctly classified as mailservers would 
even be able to be verified. Mailservers that were spam sources would be 
easily identified and blacklisted across the board. Or something.


I am a bit lost here. Are you saying that right now the *main* problem 
with spam is "source spoofing" and that just by having a strict format for 
emails in the protocol we would turn the whole spam fighting industry 
into a single huge database of "known spammers"?


Even if that were true, I think it still raises a few issues. First, not 
all providers currently agree to implement SPF. One of the reasons is that 
I am [EDIT: usually !!] using a totally legitimate @wanadoo.fr address 
while not being connected through their network. So basically all 
providers would have to issue SSL certificates for all their clients, which 
could of course be stolen by malware etc. and become "legit". What I mean 
is that the very first step of email sending, between my favorite mail 
reader and the SMTP/IMAP server, would still be a weak point (as we say 
in French, "l'ennemi, c'est l'utilisateur", i.e. "the end user is the 
enemy/the source of all evil"). And I am not even talking about all the 
email domains that are not ISPs, such as hotmail or yahoo and the like: you 
can secure all the way from their server to yours, it will never prevent 
"garbage in - garbage out".


Second, I am not aware of any lawsuits against RBLs yet, but I am quite 
surprised that no "official spammer" has already filed one, or tried direct 
attacks targeted at the RBL servers: they have enough zombies to spam, so 
they could.


Furthermore, let me be the devil's advocate for a second: would you not 
agree that many a rule in SA can actually catch spam because spammers are 
RFC compliant but stupid enough to add fancy headers or fancy header 
formatting?


> Putting this on a distinct port seems more a marketing thing. Why not 
> add it as a capability in a normal SMTP server?


Because the idea is to be able to simply retire the current SMTP and 
that will be a lot simpler if the new service is on a new port.


I would agree with that. HTTP already does this (e.g. port 8080 vs port 
80), and it would facilitate migration.


A secure verifiable delivery chain from server to server would almost 
completely eliminate the need for SA.


I cannot agree with that. The point of entry has to be secured and I am 
afraid it will be a pain to do so.


And I'm not saying it would be easy, or happen over-night.  I figure if 
people started working on an RFC right now we might see the end of the 
current SMTP in 15-20 years unless there was a huge push in which case 
it could maybe happen in 10-15 years.


Might be. We have been talking over and over about IPv6 for about 15 years 
now, and currently it only causes problems between compliant and 
non-compliant equipment, with zero gain.


I'm not saying it would ELIMINATE Spam, but it would certainly reduce it 
to a manageable level.


Having an authenticated chain can only help if it is not broken or if we 
can detect it was broken, otherwise it will have the reverse effect of 
spammers injecting massive spam into "trusted" network chains that can not 
be banned for fear of hitting legit users.


Nothing we're doing now is reducing it at all, the amount of spam has 
been increasing steadily every year since the very first Green Card 
posting to USENET.


Amen to that.

To come back (a little) to the original post, IMHO we cannot and should 
not do without specs, i.e. RFCs. The existing ones are not that bad: I sent 
my first emails back in 1992 with my mom's address, so these RFCs have 
made the world communicate (for better and worse) for 20 years. Back 
then, bandwidth was so low and email access so controlled (add to 
that a tiny bit of optimism about the kindness of human nature) that spam 
was not an issue. A new RFC may be needed, but I really cannot believe its 
main improvement would be protocol formatting...


HTH
JG



Re: seekrules over French spam (was Re: [Rule Set proposal] French Rules

2008-06-24 Thread John GALLET

Re,


Anyway, these are the patterns I tried to code in FR_SPAMISLEGAL and
FR_HOWTOUNSUBSCRIBE, plus one I considered too generic (if you can't
read this mail in html, click here).


It might be worth collecting more ham that includes any such common
text -- or even _generating_ mails along those lines (just edit the
message body to include the text you want the ruleset to avoid. ;)


Well, that's the whole point: can we conclude that an email with an 
unsubscribe link tends to be spam more often than ham? I think so, 
but with a low score. Can we conclude that an email citing the French law 
"informatique et libertés" is spam? I would say "100% except government 
sponsored mailing lists that may feel obliged to do so", so I gave it a 
higher score. Now this might very well be faulty logic, I do not have any 
experience in spam fighting.



I also adapted this one (paths of course, but also forced "mbox" format,
"detect" spit out zero results)

ah.  forgot to mention: detect only treats files that end in ".mbox" as
mboxes. ;)


:-) ok, well anyway it was quite easy to find out since it worked well 
when forcing and not at all in automatic.



Thanks for trying it out!


Well, thanks for writing it. I think its main weak point for French and 
other accented languages is handling the different encodings of the same 
accented character, some kind of "synonyms" list. The same letter, say "a 
with an accent", can be misspelled as a plain "a", encoded in various 
charsets (latin, utf-8) as a "normal" à, or html encoded as agrave (I left 
& and ; out). I do not know if it is possible at all, it might complicate 
things *a lot*.
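
One possible way to get such a "synonyms" behaviour is to fold every 
variant to a single plain form before comparing patterns. The Python 
sketch below is only an illustration and not part of the seekrules 
tooling: fold_accents is a hypothetical helper, and it assumes the 
message bytes have already been decoded to text with the right charset.

```python
import html
import unicodedata


def fold_accents(text):
    """Map entity-encoded, accented and plain spellings to one ASCII form."""
    # "&agrave;" -> "à" (HTML entities become real characters)
    decoded = html.unescape(text)
    # "à" -> "a" followed by a combining grave accent
    decomposed = unicodedata.normalize("NFKD", decoded)
    # Drop the combining marks, keep the base letters
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return stripped.lower()


# Three spellings of the same phrase collapse to the same key.
variants = [
    "Conformément à la loi",
    "Conform&eacute;ment &agrave; la loi",
    "Conformement a la loi",
]
folded = {fold_accents(v) for v in variants}
print(folded)
```

Folding loses information (a pattern can no longer distinguish "a" from 
"à"), so it suits grouping candidate phrases better than writing the 
final rules.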


a++;
JG


Re: French advance fee fraud ruleset

2008-06-24 Thread John GALLET
In a similar vein to the "Nigerian" advance fee fraud, here's a ruleset for 
French-language scams, often originating from Côte d'Ivoire.

http://www.tradoc.fr/spamassassin/fraude_fr.cf
All comments welcome.


Thanks, some of these are still getting through (rarely, but still).

cat fraude_fr.cf >>$HOME/.spamassassin/user_prefs
and I will keep you posted.

They look good (from memory, I have no sample).

Just a quick question about:

ifplugin Mail::SpamAssassin::Plugin::ReplaceTags
replace_tag AGRAVE  [a\xC0\xE0]

What happens with the agrave htmlentity ? I mean if the received spam is 
htmlentity encoded, or mixes utf-8 accents and ascii-htmlentity ?
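
To make the question concrete, here is a small Python sketch working on 
raw bytes. It only illustrates the character class itself: it says 
nothing about what SpamAssassin does to the message before a body rule 
runs. The WIDER pattern is a hypothetical alternative, not something 
taken from fraude_fr.cf.

```python
import re

# The class from the quoted rule: plain "a", or À / à as single Latin-1 bytes.
LATIN1_ONLY = re.compile(rb"[a\xC0\xE0]")

# Hypothetical wider variant: also accept the UTF-8 byte pairs for À / à
# and the literal HTML entity.
WIDER = re.compile(rb"(?:[a\xC0\xE0]|\xC3[\x80\xA0]|&[Aa]grave;)")

samples = {
    "plain": b"a",
    "latin-1": "à".encode("latin-1"),
    "utf-8": "à".encode("utf-8"),
    "entity": b"&agrave;",
}


def covered(pattern, data):
    """True when the pattern matches the whole byte string."""
    return pattern.fullmatch(data) is not None


for name, data in samples.items():
    print(name, covered(LATIN1_ONLY, data), covered(WIDER, data))
```

On raw bytes the original class covers the plain and Latin-1 forms only; 
the UTF-8 and entity forms need the extra alternatives.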


JG

Philosophy for opt-in (was Re: [Rule Set proposal] French Rules

2008-06-24 Thread John GALLET

Hi,

If these are hit rates with a very minimal daily corpus, don't know if the 
present ruleset is ready for production unless you have 0 tolerance for any 
bulk, period


I'm afraid I must agree. I don't have a confirmed and sorted corpus per se, 
but after a single night's live testing with very low scores I can confirm 
that, as I suspected, many of these rules hit genuine opt-in newsletters and 
even things like ebay notifications in French.


Thanks for the feedback. I do not have any ebay subscribers among my 
users, except one power-seller who has the ebay senders whitelisted.


I will however keep the ruleset live for a while, to see whether the 
online meds and onling gambling rules actually hit anything.


They should, they do on my machines. But actually, they are only useful 
for a "new" spam that has not been caught yet by RBL. When I wrote them, 
it was because spam *was* getting through; now they just push towards 
"almost-probably-spam". Another note is that much of this particular spam 
is automatically and badly translated (much "pidgin-French", if I may say 
so).


My personal tolerance for bulk mail is pretty low, and in a way I'd love to 
use rules like these, with just a bit of fine tuning - the rules do also hit 
a fair bit of French spam. But unfortunately my users actually want to 
receive their newsletters and even complain if it ends up in their spam 
folder.


I think I have a simple newbie problem of philosophy/strategy: my 
approach, for what it's worth, was to flag anything that contains 
unsubscribe links and French law reminders, because all the ones I 
receive are spam anyway, and to add the opt-in mailings/newsletters I 
receive to whitelist_from in user_prefs, i.e. I kill everything except 
those explicitly allowed.


If that is not the correct approach, I can guarantee you that the way the 
rules are currently written is bad (too harsh), and I need strategy advice 
on how to manage opt-in lists.


John



Re: hit frequencies (was Re: [Rule Set proposal] French Rules

2008-06-23 Thread John GALLET

Re,

I excluded the last two rules from my masscheck to avoid FPs as these 
ESPs/X-Mailers are definitely grey, "import rcpt list and blast" sort of ESPs 
not black for global use.


If you can point me to some more information on how to do that, on-list or 
off-list, I am interested. I am new to this whole business.


In fact I was forced to look at X-Mailer and other strange headers for 
French spam that was still getting through with no really easy keywords, 
and these guys often had the good idea of developing their own "software" 
and being proud of it.


#counts   FR_SPAMISLEGAL         8s/2h   of 3859 corpus (1166s/2693h AXB-MC1) 06/23/08
#counts   FR_SPAMISLEGAL_2       5s/2h   of 3859 corpus (1166s/2693h AXB-MC1) 06/23/08
#counts   FR_NOTSPAM             0s/0h   of 3859 corpus (1166s/2693h AXB-MC1) 06/23/08
#counts   FR_PAYLESSTAXES        0s/0h   of 3859 corpus (1166s/2693h AXB-MC1) 06/23/08
#counts   FR_REALESTATE_INVEST   0s/0h   of 3859 corpus (1166s/2693h AXB-MC1) 06/23/08
#counts   FR_ONLINEGAMBLING      0s/0h   of 3859 corpus (1166s/2693h AXB-MC1) 06/23/08
#counts   FR_ONLINEMEDS          0s/0h   of 3859 corpus (1166s/2693h AXB-MC1) 06/23/08
#counts   FR_REASON_SUBSCRIBE    1s/1h   of 3859 corpus (1166s/2693h AXB-MC1) 06/23/08
#counts   FR_HOWTOUNSUBSCRIBE    7s/16h  of 3859 corpus (1166s/2693h AXB-MC1) 06/23/08
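
To read these counts as rates, a line can be parsed and each hit count 
divided by the size of its own corpus. The Python sketch below assumes 
the one-line "#counts" layout quoted above; hit_rates is a hypothetical 
helper, and the ratio spam% / (spam% + ham%) is an assumed measure of how 
spam-specific a rule is, not a figure from the original report.

```python
import re

# Rule name, spam hits, ham hits, then the spam/ham corpus sizes.
COUNTS_RE = re.compile(
    r"#counts\s+(?P<rule>\S+)\s+(?P<sh>\d+)s/(?P<hh>\d+)h of \d+ corpus "
    r"\((?P<st>\d+)s/(?P<ht>\d+)h"
)


def hit_rates(line):
    """Return (rule, spam %, ham %, ratio), or None if the line does not parse."""
    m = COUNTS_RE.search(line)
    if m is None:
        return None
    spam_total, ham_total = int(m["st"]), int(m["ht"])
    if spam_total == 0 or ham_total == 0:
        return None
    spam_pct = 100.0 * int(m["sh"]) / spam_total
    ham_pct = 100.0 * int(m["hh"]) / ham_total
    total = spam_pct + ham_pct
    # No hits at all: the ratio is undefined rather than 0.
    ratio = spam_pct / total if total else None
    return m["rule"], spam_pct, ham_pct, ratio


line = ("#counts   FR_HOWTOUNSUBSCRIBE  7s/16h of 3859 corpus "
        "(1166s/2693h AXB-MC1) 06/23/08")
print(hit_rates(line))
```

For FR_HOWTOUNSUBSCRIBE this gives about 0.6% of spam and 0.6% of ham, a 
ratio close to 0.5, which matches the concern that the rule hits opt-in 
mail about as often as spam in this corpus.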


If these are hit rates with a very minimal daily corpus, don't know if the 
present ruleset is ready for production unless you have 0 tolerance for any 
bulk, period


I do subscribe to various mailing lists, and none of them seems compelled 
to remind me how to unsubscribe, even less to quote the law about spam to 
me.


Even the official government "conseil des ministres" (a summary of the 
daily/weekly/whatever government meeting) does not cite the "loi 
informatique et libertés" anymore (but they do use a company I am getting 
a lot of spam from).


So basically the question is: what makes a spam in French recognizable?

On the other hand I am also worried about the very low hits of most rules.

If all your 1166 spams are in French, we can throw the whole ruleset to 
/dev/null (well I'll keep it for me anyway).


A++;
JG



seekrules over French spam (was Re: [Rule Set proposal] French Rules

2008-06-23 Thread John GALLET

Hi,


You run "seek-phrases-in-corpus" over the 2 corpora, and it'll spit out
the patterns; you can then write rules based on these.


I did so, and the results are interesting, though I do not really know 
where to go from there. If I take the first 50 "best" patterns and strip 
out the obvious stand-alone words and sure-to-be-false-positive 
expressions, here is what I get (sorry to non-French speakers, explanation 
below):


 RATIO   SPAM%   HAM%   DATA
 1.000   9.375   0.000  /Pour ne plus recevoir /
 1.000   6.875   0.000  /6 janvier 1978 relative /
 1.000   6.875   0.000  /affiche pas correctement, vous pouvez le visualiser en/
 1.000   5.625   0.000  /s données nominatives /
 1.000   5.625   0.000  / ce message, cliquez-ici/
 1.000   5.625   0.000  / vous désinscrire de /
 1.000   5.000   0.000  /Conformément à l/
 1.000   5.000   0.000  / plus recevoir d\'informations de notre part/
 1.000   5.000   0.000  /un droit d\'accès/
 1.000   4.375   0.000  /ment Ã|  l\'article 34 de la loi/
 1.000   4.375   0.000  /ment à l\'article 34 de la loi /
 1.000   3.750   0.000  /ous désinscrire de notre /
 1.000   3.750   0.000  /es nominatives vous concernant\. /
 1.000   3.750   0.000  / Libertés du 6 /
 1.000   3.750   0.000  /es vous concernant\. Pour l\'exercer, /

As you can see, charset encoding makes a mess, and many must be regrouped.
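
The kind of damage seen in the "ment Ã" line is typical of UTF-8 bytes 
being read back as Latin-1. Assuming that is what happened here (the post 
does not say), the Python sketch below reproduces the effect and shows one 
way duplicate patterns could be regrouped; garble, repair and regroup are 
hypothetical helper names.

```python
def garble(text):
    """Reproduce the damage: UTF-8 bytes read back as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text):
    """Undo the damage when possible; return the input unchanged otherwise."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Already clean (or damaged in some other way): leave it alone.
        return text


def regroup(patterns):
    """Group patterns that become identical once repaired."""
    groups = {}
    for pattern in patterns:
        groups.setdefault(repair(pattern), []).append(pattern)
    return groups


clean = "ment à l'article 34 de la loi"
groups = regroup([clean, garble(clean)])
print(len(groups))
```

This is a heuristic: a clean Latin-1 string whose bytes happen to form 
valid UTF-8 would be "repaired" wrongly, so the grouping is best reviewed 
by hand.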

Anyway, these are the patterns I tried to code in FR_SPAMISLEGAL and 
FR_HOWTOUNSUBSCRIBE, plus one I considered too generic (if you can't 
read this mail in html, click here).


The whole result is available at 
http://www.saphirtech.fr/spam/seekrules_fr_1.txt



 http://taint.org/x/2008/seekrules_run


I also adapted this one (paths of course, but also forced "mbox" format, 
"detect" spit out zero results), but the result is even less "readable" 
for me. I miss the script seekrules/kill_bad_patterns which I presume 
removes stand alone words and such things.


Whole result at http://www.saphirtech.fr/spam/seekrules_fr_2.txt

John

Re: hit frequencies (was Re: [Rule Set proposal] French Rules

2008-06-23 Thread John GALLET
Thanks for taking this burden upon yourself. One other thing you should be 
prepared to do, if you're willing to devote long-term responsibility to these 
rules, is to provide sa-update-compatible feeds of your dynamic rules. This 
is another thing that Justin can probably help you with.


I am happy to try to do so, but I am honestly not worried about the 
feed part: all it boils down to is putting the right file in the right 
place (be it push or pull, ftp or rsync, whatever).


What I am more worried about is regularly testing the rules and, even 
before that, checking that they are valid. They are "good" on my system 
with my users, but then they were custom-tailored to be so.
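
One lightweight way to check rules regularly is to keep a few "must hit" 
and "must not hit" sentences per rule and re-run them after every change. 
The Python sketch below is only an illustration: the sample sentences are 
made up, the pattern is the FR_ONLINEMEDS one from the proposed ruleset, 
and Python's re module stands in for the Perl regex engine that 
SpamAssassin actually uses, which may differ on more complex patterns.

```python
import re

RULES = {
    "FR_ONLINEMEDS": re.compile(
        r"\bpharmacie(s)? (en ligne|internet)\b", re.IGNORECASE
    ),
}

# (rule name, sample text, should the rule hit?) -- hypothetical samples.
SAMPLES = [
    ("FR_ONLINEMEDS", "Commandez dans notre pharmacie en ligne", True),
    ("FR_ONLINEMEDS", "Les Pharmacies Internet les moins chères", True),
    ("FR_ONLINEMEDS", "La pharmacie du village ferme à 19h", False),
]


def failures(rules, samples):
    """Return every sample whose outcome differs from the expectation."""
    bad = []
    for name, text, expected in samples:
        hit = rules[name].search(text) is not None
        if hit != expected:
            bad.append((name, text, expected, hit))
    return bad


if __name__ == "__main__":
    for name, text, expected, hit in failures(RULES, SAMPLES):
        print(f"{name}: expected hit={expected}, got {hit}: {text!r}")
```

An empty result only says the rules behave as intended on the samples; 
it does not replace a mass-check against real ham and spam.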


JG



Re: hit frequencies (was Re: [Rule Set proposal] French Rules

2008-06-23 Thread John GALLET

Re,

Looking at the rules, I'm worried about false positives on genuine opt-in 
advertising. I have a number of users who choose to receive all kinds of 
advertising blurb,


This is one of the reasons why I did not hunt for "click here" and "if you 
can't see this email in html". Now correct me if I am wrong (ouch, no, not 
on the head), but isn't this what whitelist_from is for ? I never was able 
to let the Intel newsletter through (it is in English), it would always be 
caught by SA. Same went for Microsoft Support genuine answers (ok, don't 
laugh).


so I'll run your rules with very low scores for a while to see what gets 
hit.


You can have a little more information, and exactly this suggestion, by 
reading http://www.saphirtech.fr/spamassassin.html


JG



hit frequencies (was Re: [Rule Set proposal] French Rules

2008-06-23 Thread John GALLET

Hi,

First of all, thanks to Justin for patiently helping me to install 
mass-check and pointing me in the right direction. I will try to run the 
algorithms tonight to see what they come up with.


In the meantime, you can find a hit-frequencies report at:
http://www.saphirtech.fr/spam/freqs_2008_06_23.txt

All rules are prefixed with FR_ and are available in the same directory.

I must say I did not double check for stray spam in my mailbox before 
using it as a ham corpus but it *should* be clean. I'll double check for 
next run. The spam corpus was 100% French spam, hand-picked over the last 
week through the "probably-spam" class (default score values 5-15).


Any feedback on the results (not enough in corpus, bad rules, good rules, 
etc.) appreciated.


Sincerely,
JG



Re: [Rule Set proposal] French Rules

2008-06-19 Thread John GALLET


I am still missing samples for two rules: although I did have hits 
according to /var/spool/maillog, I did not save them.


I added a sample for the FR_NOTSPAM rule, and I removed the 
FR_YOURELUCKY rule as I see other forms of the text getting through, so 
it is not effective. On the other hand, nearly all these messages are 
caught by RBL rules, so I might even drop it completely if I can't find 
an effective one.


John
PS: reminder, rules and samples available at
http://www.saphirtech.fr/spam/



Re: [Rule Set proposal] French Rules

2008-06-17 Thread John GALLET

Hi,


I was able to access the URL you mentioned, but not all of the files
below it.  I received:
"Forbidden
You don't have permission to access /spam/FR_PAYLESSTAXES.txt on this server."


Sorry guys, only the ruleset file (the one I tried, of course) was 
readable; all the non-empty spam samples had the wrong permissions. This 
is fixed.


I am still missing samples for two rules: although I did have hits 
according to /var/spool/maillog, I did not save them.


John




[Rule Set proposal] French Rules

2008-06-17 Thread John GALLET

Hi,

This is my first post on this list and first ruleset, so please point me 
to the right place/documents if I am doing anything wrong.


According to a search of this list on markmail.org, there have been few 
subjects about spam in French and (no disrespect meant) I would agree with 
the comments I read about the current French Ruleset being inadequate 
(tried it, did not keep any of it).


So I would like to propose a set for French Rules and get your feedback.

You can find both the rules and some sample spam email messages (two of 
them missing, I have hits in my log files, but deleted them) at the 
following URL: http://www.saphirtech.fr/spam/


I have been running these for about a month site-wide on three domains, 
and I have not seen any false positives (yet).


Sincerely,
JG


#
# FRENCH SPECIFIC SPAMASSASSIN RULES.
# USE AND REDISTRIBUTE WITH THIS NOTE AT YOUR OWN RISK AND PLEASURE.
# AUTHOR: John GALLET
# Version: 2008-JUNE-17
# Latest: http://www.saphirtech.fr/
# Status: It Works For Me (tm)
#
# Spam is legal in France !
body FR_SPAMISLEGAL /\b(Conform.+ment|En vertu).{0,5}(article.{0,4}34.{0,4})?la loi\b/i
describe FR_SPAMISLEGAL French: pretends spam is (l)awful.
lang fr describe FR_SPAMISLEGAL Invoque la loi informatique et libertes.
score FR_SPAMISLEGAL 2.5

body FR_SPAMISLEGAL_2   /\bdroit d.acc.+s.{1,3}(de modification)?.{0,5}de rectification\b/i
describe FR_SPAMISLEGAL_2   French: pretends spam is (l)awful.
lang fr describe FR_SPAMISLEGAL_2   Invoque le droit de rectification cnil.
score FR_SPAMISLEGAL_2  2.5

#
# yeah, sure.
body FR_NOTSPAM /\b(ceci|ce).{1,9} n.est pas.{1,5}spam\b/i
describe FR_NOTSPAM French: claims not to be spam.
lang fr describe FR_NOTSPAM Affirme ne pas etre du spam.
score FR_NOTSPAM 4.0

#
## I can pay my taxes
body FR_PAYLESSTAXES /\b(paye|calcul|simul|r.+dui|investi).{1,7}(moins|vo|ses).{0,5}imp.+t(s)?\b/i
describe FR_PAYLESSTAXES French: Pay less taxes
lang fr describe FR_PAYLESSTAXES Simulateurs et reductions d'impots.
score FR_PAYLESSTAXES   2.0

body FR_REALESTATE_INVEST   /\b(loi)? (de.robien|girardin).{1,15}(neuf|recentr.+|ancien|IR|IS|imp.+t(s)?|industriel(le)?)\b/i
describe FR_REALESTATE_INVEST   French: Invest in real-estate with tax-reductions
lang fr describe FR_REALESTATE_INVEST   Reduction impots immobilier.
score FR_REALESTATE_INVEST  2.5

#
# I won at the casino
body FR_ONLINEGAMBLING  /\b(casino(s)?|jeu(x)?|joueur(s)?) (en ligne|de grattage)\b/i
describe FR_ONLINEGAMBLING  French: Online gambling
lang fr describe FR_ONLINEGAMBLING  Jeux en ligne.
score FR_ONLINEGAMBLING 2.0

#
# I am so lucky to receive spam
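# Each optional word carries its own trailing space, so that
# "votre chance" and "tentez votre chance" match with single spaces.
body FR_YOURELUCKY  /\b(?:tentez )?votre (?:jour de )?chance\b/i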
describe FR_YOURELUCKY  French: it's your lucky day (sure).
lang fr describe FR_YOURELUCKY  Jeux de hasard et de chance.
score FR_YOURELUCKY 1.0

#
# Baby, did you forget to take your meds ?
body FR_ONLINEMEDS  /\bpharmacie(s)? (en ligne|internet)\b/i
describe FR_ONLINEMEDS  French: Online meds ordering
lang fr describe FR_ONLINEMEDS  Achat de medicaments en ligne.
score FR_ONLINEMEDS 3.0

##
# Tell me why
body FR_REASON_SUBSCRIBE /\bVous recevez ce(t|tte)? (message|mail|m.+l|lettre|news.+) (car|parce que)\b/i
describe FR_REASON_SUBSCRIBE French: you subscribed to my spam.
lang fr describe FR_REASON_SUBSCRIBE Indique pourquoi vous recevez le courrier.
score FR_REASON_SUBSCRIBE   1.5

#
# How to unsubscribe
body FR_HOWTOUNSUBSCRIBE /\b(souhaitez|d.+sirez|pour).{1,10}(plus.{1,}recevoir|d.+sincrire|d.+sinscription).{0,10}(information|email|mail|mailing|newsletter|message|offre|promotion)(s)?\b/i
describe FR_HOWTOUNSUBSCRIBE French: how to unsubscribe
lang fr describe FR_HOWTOUNSUBSCRIBE Indique comment se desabonner.
score FR_HOWTOUNSUBSCRIBE   2.0


# Various "CRM" (Could Remove Me)
#
header FR_MAILER_1  X-Mailer =~ /(delosmail|cabestan|ems|mp6|wamailer|phpmailer|eMailink|Accucast|Benchmail)/i
describe FR_MAILER_1 French spammy X-Mailer
lang fr describe FR_MAILER_1 X-Mailer couramment employe pour des spams en francais.
score FR_MAILER_1   4.0

header FR_MAILER_2  X-EMV- =~ /.+/
describe FR_MAILER_2