date:20060413


Mark Martinec wrote:


The most interesting part in my view is not the IP distance, but the
type of OS, illustrated by the following table (derived from the same
data as fig2):

p0f OS guess      ham  :   spam
--------------------------------
Windows-XP       0.7 % : 99.3 %
Windows-2000     5.8 % : 94.2 %
UNKNOWN         16.5 % : 83.5 %
Linux           58.8 % : 41.2 %
Unix            80.3 % : 19.7 %
(Unix+Linux     66.5 % : 33.5 %)

Only 0.7% of all mail coming from Windows-XP hosts is ham!!!
This is ideal information for contributing two or three score points.


I'm not sure the ham hit rate from the Windows-XP category scales (to 
other installations) very well.  The last time I looked into using p0f 
to fingerprint connecting hosts, last spring, I seem to recall that 
Windows XP and Windows 2003 share the same TCP/IP stack and fingerprint 
identically.


While it'd be nice to score Windows-XP hosts harshly, there's a lot
of mail coming from Windows Server 2003 hosts that would get hit.


I know for some of my systems 1:99 would be really low if Windows Server 
2003 and XP are identified the same.  40:60 (and in some cases 80:20) 
would be closer to what I often see if I were to assume that all spam 
came from Windows XP hosts.


Maybe you don't receive much, if any, mail from Windows Server 2003 hosts?


Daryl

Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice

2006-04-13 Thread Alan Premselaar

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Kai Schaetzl wrote:
 I just saw that a normal Ebay outbid notice hit two high-score rules. One 
 is from sare-spoof and I already contacted the maintainer. But one is in 
 the default 3.1.1 ruleset and I think this rule should get completely 
 removed or get a score of 0. It's
 
 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice
 
 From grepping the rules it does what it says: it checks if there are two 
 B/Q encoding identifiers in the subject. Why is this scoring with 1.72 or 
 at all? This is absolutely valid Q/B encoding and actually *required* by 
 RFC if your subject line is longer than 80 (or was it 72?) characters 
 (minus the encoding, so it's actually more like a 60 raw character limit).
 This rule will hit on *lots* of non-ASCII mail and on almost all mail 
 coming from Ebay Germany.
 
 There are also the rules SUBJECT_EXCESS_QP and SUBJECT_EXCESS_BASE64 which 
 are similar. QP scores 0 and BASE64 scores 0.449. This is much more 
 reasonable.
 
 Kai
 

This utterly wreaks havoc on just about all Japanese email, so I dropped
the score to nearly nothing.

alan
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.1 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEPfgmE2gsBSKjZHQRAt82AKDAY4xTmST0kaY5cje1xH1ScDajOACg6fMH
msifLKqJuv1IpudxbKGDcfQ=
=ZDQE
-END PGP SIGNATURE-
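
To illustrate the point about valid encoding made in this thread: RFC 2047 limits each encoded-word to 75 characters, so a long non-ASCII Subject has to be split into several encoded-words. The following is a minimal Python sketch, assuming a made-up German subject line (it is not taken from any real eBay notice), that shows a standard-library encoder emitting more than one encoded-word:

```python
from email.header import Header

# Hypothetical long non-ASCII subject, invented for illustration only.
subject = (
    "Sie wurden überboten: Ihr Gebot für den Artikel "
    "Küchengerät mit Zubehör reicht leider nicht mehr aus"
)

# Header.encode() folds the value and emits RFC 2047 encoded-words.
encoded = Header(subject, charset="utf-8").encode()

# Count the encoded-word openers, ignoring case.
encoded_words = encoded.lower().count("=?utf-8?")

print(encoded)
print("encoded-words in Subject:", encoded_words)
```

Any rule that fires on two or more encoded-words in a Subject would therefore also fire on a header like this one, which is ordinary and standards-compliant.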

Re: xxxl spam

2006-04-13 Thread Loren Wilton

 To read this in other words: while certain analysts (and definitely
 Microsoft marketing) claim that about 50% of all servers are running
 Windows, these figures tend to say that real mail servers (those that
 deliver the ham part of mail) rarely ever run XP, but that this OS is
 the best candidate for creating a spam zombie.

Not completely unreasonable.  XP is targeted within MS as a personal or very
small company OS.  The equivalent of a linux/unix system used by more than a
single person would typically be some version of Server 2003.  Which was
probably identified in the stats as Windows 2000.

I'd like to venture the suggestion that the percentage of spam from XP isn't
necessarily an indication of inherent bugginess.  It is more an indication
that it is an OS for Clueless Noobs who haven't a clue about maintaining a
system or avoiding a virus, or who can't even tell if they have a virus.  These
are the machines that turn into zombies.

If there were as many linux machines in the hands of Clueless Noobs, I'd bet
that the number of infected linux systems would be in a similar percentage
range.  (Remember, these XP systems are virtually all run with Administrator
(aka root) privs all the time, by people who haven't a clue what that
means.  What would happen if all linux-like systems ran that way?)

Loren

Re: sa missed to scan some of email

2006-04-13 Thread martin

David B Funk dbfunk at engineering.uiowa.edu writes:
 Exactly so.
 Usually you can find the related message by matching the time-stamp
 from your maillog to your spamd log. You can also do some detective work,
 eliminate maillog entries that have an incoming msgid (IE one from the
 sending MTA) and just concentrate on those that have a locally added
 msgid.
 
 Dave
 

  Thanks for the help. It seems you're correct: based on the timestamp search,
most of the unknown msgids in spam.log had a msgid like '[EMAIL PROTECTED]'
in maillog.

Re: xxxl spam

2006-04-13 Thread Mark Martinec

Wolfgang, Loren,
  real mail servers (those that deliver the ham part of mail) rarely ever
  run XP but that this OS is the best candidate for creating a spam zombie

 Not completely unreasonable.  XP is targeted within MS as a personal or
 very small company OS.  The equivalent of a linux/unix system used by more
 than a single person would typically be some version of Server 2003.  Which
 was probably identified in the stats as Windows 2000.

 I'd like to venture the suggestion that the percentage of spam from XP
 isn't necessarily an indication of inherent bugginess.  It is more an
 indication that it is an OS for Clueless Noobs who haven't a clue about
 maintaining a system or avoiding a virus, or who can't even tell if they
 have a virus.  These are the machines that turn into zombies.

I fully agree.

In this view the following two lines should be seen as well:

p0f OS guess      ham  :   spam
Linux           58.8 % : 41.2 %
Unix            80.3 % : 19.7 %

Linux is used by the masses (compared to other Unix OS types) because it is
considered easier to set up. This probably also means that less care is
invested in preventing it from being used to propagate spam.

Still, a score  L_P0F_Unix  -1.0 seems to be doing a good job here.


Daryl,
 I'm not sure the ham hit rate from the Windows-XP category scales (to
 other installations) very well.  The last time I looked into using p0f
 to fingerprint connecting hosts, last spring, I seem to recall that
 Windows XP and Windows 2003 share the same TCP/IP stack and fingerprint
 identically.

 While it'd be nice to score Windows-XP hosts harshly, there's a lot
 of mail coming from Windows Server 2003 hosts that would get hit.

There is indeed a handful of valid small sites classified by p0f as Windows XP
from which we do receive regular mail (well, newsletters and such, but still,
they should be treated mostly as ham). I don't see adding a few score points to
them as much different from other (some quite arbitrary) rules: each rule tries
to have a low FP rate, but it is often not zero. Only the collection of all
rules has merit.

 I know for some of my systems 1:99 would be really low if Windows Server
 2003 and XP are identified the same.  40:60 (and in some cases 80:20)
 would be closer to what I often see if I were to assume that all spam
 came from Windows XP hosts.
 Maybe you don't receive much, if any, mail from Windows Server 2003 hosts?

I guess Windows Server 2003 is reported as Windows 2000, but I don't know.
Certainly a couple of very large sites are seen as Windows 2000.

In the UNKNOWN category there must be a mix of Windows and Unix hosts,
not sure what is unusual about them.

  Mark

Re: xxxl spam


Mark Martinec wrote:


I guess Windows Server 2003 is reported as Windows 2000, but I don't know.
Certainly a couple of very large sites are seen as Windows 2000.

In the UNKNOWN category there must be a mix of Windows and Unix hosts,
not sure what is unusual about them.

  Mark


Hmm... FWIW:

[EMAIL PROTECTED] dos]$ sudo p0f -i eth1
p0f - passive os fingerprinting utility, version 2.0.4
(C) M. Zalewski [EMAIL PROTECTED], W. Stearns [EMAIL PROTECTED]
p0f: listening (SYN) on 'eth1', 223 sigs (12 generic), rule: 'all'.
24.141.168.241:4218 - Windows XP Pro SP1, 2000 SP3
  - 66.98.221.156:25 (distance 1, link: ethernet/modem)
66.98.221.156:2602 - Windows 2000 SP4, XP SP1
  - 24.141.168.241:783 (distance 19, link: ethernet/modem)


24.141.168.241 is Windows XP Pro SP1
66.98.221.156 is Windows Server 2003 SP1 (Standard Edition)


Daryl
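
For anyone wanting to tabulate guesses like the ones in this p0f output, here is a minimal Python sketch. It assumes the report layout shown above (a "IP:port - OS guess" line followed by an indented continuation line); the function name `parse_p0f` and the ambiguity check are hypothetical helpers, not part of p0f itself.

```python
import re

# Matches the first line of a report: "IP:port - OS guess".
P0F_LINE = re.compile(
    r"^(?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+) - (?P<guess>.+)$"
)

def parse_p0f(lines):
    """Return {source_ip: os_guess}; indented continuation lines are skipped."""
    guesses = {}
    for line in lines:
        m = P0F_LINE.match(line.strip())
        if m:
            guesses[m.group("ip")] = m.group("guess")
    return guesses

sample = [
    "24.141.168.241:4218 - Windows XP Pro SP1, 2000 SP3",
    "  - 66.98.221.156:25 (distance 1, link: ethernet/modem)",
    "66.98.221.156:2602 - Windows 2000 SP4, XP SP1",
    "  - 24.141.168.241:783 (distance 19, link: ethernet/modem)",
]

for ip, guess in parse_p0f(sample).items():
    # A guess naming both XP and 2000 cannot tell a desktop from a server.
    ambiguous = "XP" in guess and "2000" in guess
    print(ip, "->", guess, "(ambiguous)" if ambiguous else "")
```

Both hosts come back with a label that mentions XP and 2000 together, which is the ambiguity being discussed: the Server 2003 machine is not separated from desktop Windows by this signature set.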

Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice

2006-04-13 Thread Michael Monnerie

On Thursday, 13 April 2006 13:35 Mark Martinec wrote:
 Agreed, this rule is completely inappropriate, it penalizes valid
 encoding according to RFC 2047 and fires on any lengthier Subject
 line in non-English language. It should disappear or have a
 much reduced default score.

The problem seems to be that
1) most spam is English
2) most people contributing mass-checks are English speaking
3) therefore most ham+spam tested in mass-checks is English

In order to improve the situation, more mass-check testers with
non-English ham+spam should contribute, see
http://wiki.apache.org/spamassassin/MassCheck?highlight=%28mass%29

I'm not a SA dev, but I think they once wrote more supporters would be 
nice. I do mass-checks, and if somebody wants to help, I have a working 
script you can have in order to contribute to testing. It's a simple 
setup, and then your server has some work to do overnight. On mine, 
it's about 1 hour per night, so no problem.

Best regards, zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   lynx -source http://zmi.at/zmi3.asc | gpg --import
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE



TEXTAREA style=visibility: hidden

2006-04-13 Thread Magnus Holmgren

I see a fair amount of spam using TEXTAREA style=visibility: hidden to 
hide bayes poison. Shouldn't a rule against that, or CSS-hidden text in 
general, be worthwhile? I couldn't find any in the default 3.1.1 ruleset, nor
at SARE.

-- 
Magnus Holmgren



Re: TEXTAREA style=visibility: hidden

Magnus Holmgren wrote:
 I see a fair amount of spam using TEXTAREA style=visibility: hidden to 
 hide bayes poison. Shouldn't a rule against that, or CSS-hidden text in 
 general, be worthwhile? I couldn't find any in the default 3.1.1 ruleset, nor
 at SARE.

It certainly seems worth testing.

Here's a rule I wrote (caution: word-wraps.. this should be 3 lines long):

rawbody L_STYLE_HIDDEN /<TEXTAREA [^>]{0,50}style\s?=\s?\s?visibility:\s?hidden\s?[^>]{0,50}/i
describe L_STYLE_HIDDEN  has text with hidden visibility style
score L_STYLE_HIDDEN 0.1

I added some allowance for other declarations in the textarea tag, and the
insertion of whitespace at various spots...

It may need further tweaking/tuning, but it's a first-stab.
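
A quick way to sanity-check a pattern like this before mass-checking it is to run an approximation against a few hand-written tags. The sketch below is Python rather than a SpamAssassin rule, and the angle brackets, the optional quote character and the sample strings are all assumptions made for illustration (the archived rule text lost some punctuation).

```python
import re

# Python approximation of the L_STYLE_HIDDEN idea. The "<", ">" and the
# optional quote after "=" are assumed, not taken from the archived rule.
STYLE_HIDDEN = re.compile(
    r"<TEXTAREA\s[^>]{0,50}style\s?=\s?[\"']?\s?visibility:\s?hidden\s?[^>]{0,50}>",
    re.IGNORECASE,
)

# Hypothetical sample tags: two that should hit, two that should not.
samples = {
    '<textarea style="visibility: hidden">random words</textarea>': True,
    "<TEXTAREA rows=\"3\" style='visibility:hidden'>more words</TEXTAREA>": True,
    '<textarea rows="3">visible text</textarea>': False,
    '<div style="visibility: hidden">not a textarea</div>': False,
}

for html, expected in samples.items():
    hit = STYLE_HIDDEN.search(html) is not None
    print(hit, expected, html)
```

The last sample shows the limitation raised later in the thread: a rule tied to TEXTAREA misses the same trick applied to a DIV.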

RE: TEXTAREA style=visibility: hidden

2006-04-13 Thread JD Smith

So, what exactly is bayes poison?

Best regards,

JD Smith
-Original Message-
From: Magnus Holmgren [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 13, 2006 8:58 AM
To: users@spamassassin.apache.org
Subject: TEXTAREA style=visibility: hidden

I see a fair amount of spam using TEXTAREA style=visibility: hidden
to 
hide bayes poison. Shouldn't a rule against that, or CSS-hidden text in 
general, be worthwhile? I couldn't find any in the default 3.1.1 ruleset,
nor 
at SARE.

-- 
Magnus Holmgren

RE: TEXTAREA style=visibility: hidden

2006-04-13 Thread Bowie Bailey

JD Smith wrote:
 
 So, what exactly is bayes poison?

Bayes poison is a collection of random words or text selections that
have nothing to do with the email subject and are only there in an
attempt to confuse the Bayes database.  This doesn't really work the
way the spammers would like to think it does, but they keep doing it
anyway.

-- 
Bowie

Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice

On Thu, Apr 13, 2006 at 01:35:19PM +0200, Mark Martinec wrote:
 Agreed, this rule is completely inappropriate, it penalizes valid
 encoding according to RFC 2047 and fires on any lengthier Subject
 line in non-English language. It should disappear or have a
 much reduced default score.

Says you. ;)

  1.047   1.4619   0.0792   0.949   0.58   0.89  SUBJECT_ENCODED_TWICE

So in the results used to generate scores, that rule is ~94.9% accurate,
and hits ~1.46% of all spam.  In a recent nightly mass-check run:

  1.153   1.4173   0.1151   0.925   0.73   0.89  SUBJECT_ENCODED_TWICE

So more ham seems to use encoding twice in the subject, and a little
less spam uses it.  Based on this, my guess is the generated score would
go down.

The thing to remember about rules is that they neither necessarily
look for RFC non-compliance, nor do they avoid RFC compliant mails.
They look for features that hit spam and try to avoid hitting ham.
The key there is that rule development occurs with the results people
make available.  If the people generating results don't receive ham
mails that, for instance, use multiple encodings in a Subject header,
the results won't indicate that it occurs in ham very much.
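
The accuracy figure quoted for each result line can be reproduced from the spam and ham hit rates. A minimal Python sketch, assuming the columns are OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE and that S/O is simply the spam share of the two hit rates (the function name `so_ratio` is made up here):

```python
def so_ratio(spam_pct, ham_pct):
    """Share of a rule's hit rate that falls on spam (assumed S/O definition)."""
    return spam_pct / (spam_pct + ham_pct)

# Hit rates copied from the two SUBJECT_ENCODED_TWICE result lines.
score_run = so_ratio(1.4619, 0.0792)
nightly_run = so_ratio(1.4173, 0.1151)

print(round(score_run, 3))    # 0.949
print(round(nightly_run, 3))  # 0.925
```

A corpus with more non-English ham raises the ham hit rate, which lowers this ratio and, in turn, the score the rule would be given.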

-- 
Randomly Generated Tagline:
I protect home plate like a mormon girl on prom night.
 - Mimi on the Drew Carey show



SpamAssassin BZ downtime

2006-04-13 Thread Justin Mason

http://ajax.apache.org/%7ejefft/ :

  Bugzilla is moving to a new host, and is temporarily down while the
  database synchs. Apologies for the inconvenience.

--j.

Re: TEXTAREA style=visibility: hidden

Bowie Bailey wrote:
 JD Smith wrote:
 So, what exactly is bayes poison?
 
 Bayes poison is a collection of random words or text selections that
 have nothing to do with the email subject and are only there in an
 attempt to confuse the Bayes database.  This doesn't really work the
 way the spammers would like to think it does, but they keep doing it
 anyway.


How well bayes poison works depends a lot on your "bayes" implementation. Some
"bayes" implementations are fairly susceptible to this.  (I put "bayes" in
quotes because not all "bayes" implementations are really Bayesian at all.
Actually, most are not, including SA.)

In particular, the choice of combining algorithm seems to matter a lot. The use
of chi-squared combining, instead of true Bayesian combining, seems to make SA's
bayes rather resistant to this.

(note: the use of chi-squared is not exclusive to SA.. many bayes
implementations do this, but not all.)

Another area of influence is the choice of tokens. Words vs chars, hapaxes, etc
all change how a bayes implementation reacts to poisoning attempts.

So spammers keep using bayes poison because it works in some cases. It also
doesn't really hurt them much, and sometimes even helps them, against more
resistant implementations.
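
To make the difference between combining methods concrete, here is a rough Python sketch. It is not SA's code: the chi-squared combiner follows the general Fisher-style formulation associated with Gary Robinson's work, and the per-token spam probabilities (five strongly spammy tokens, twenty mildly hammy poison words) are invented for illustration.

```python
import math

def chi2q(x2, dof):
    """Survival function of the chi-squared distribution (even dof only)."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def naive_bayes(probs):
    """Plain product-of-probabilities combining."""
    spam = math.prod(probs)
    ham = math.prod(1.0 - p for p in probs)
    return spam / (spam + ham)

def chi_squared(probs):
    """Fisher-style combining: separate spam and ham evidence, then average."""
    n = len(probs)
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0

# Hypothetical token probabilities, for illustration only.
spammy = [0.99] * 5
poisoned = spammy + [0.2] * 20

for name, probs in (("clean spam", spammy), ("poisoned spam", poisoned)):
    print(name, round(naive_bayes(probs), 4), round(chi_squared(probs), 4))
```

With these made-up numbers the product-style combiner swings the poisoned message almost all the way to ham, while the chi-squared combiner lands in an undecided middle range, so other evidence can still decide the outcome.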

Re: TEXTAREA style=visibility: hidden

2006-04-13 Thread Matthias Keller


Matt Kettler wrote:

Magnus Holmgren wrote:
  
I see a fair amount of spam using TEXTAREA style=visibility: hidden to 
hide bayes poison. Shouldn't a rule against that, or CSS-hidden text in 
general, be worthwhile? I couldn't find any in the default 3.1.1 ruleset, nor
at SARE.



It certainly seems worth testing.

Here's a rule I wrote (caution: word-wraps.. this should be 3 lines long):

rawbody L_STYLE_HIDDEN /<TEXTAREA [^>]{0,50}style\s?=\s?\s?visibility:\s?hidden\s?[^>]{0,50}/i
describe L_STYLE_HIDDEN  has text with hidden visibility style
score L_STYLE_HIDDEN 0.1

I added some allowance for other declarations in the textarea tag, and the
insertion of whitespace at various spots...

It may need further tweaking/tuning, but it's a first-stab.
  

Hi Matt

I've been using this rule for quite some time now:

rawbody  MKE_HIDDEN1 /<[^>]*\bstyle=[^>]*(?:visibility:\s*hidden|display:\s*none)/i
describe MKE_HIDDEN1 Contains CSS-hidden text
score    MKE_HIDDEN1 3.5

In my opinion you shouldn't limit it to textareas as I've seen them on 
DIVs and others too...
So to me, any visibility:hidden or display:none is suspect, as I don't see
any legitimate use in emails.


In my spams, this rule matches around 4% of all spams, I haven't seen 
any ham matches yet
Feel free to mass check it and/or include it into your coding rules. But 
if you do please inform me that I can remove my local copy then.


Matt

Russian Spam

2006-04-13 Thread Kristopher Austin

I have received several copies of a spam message that is in Russian (I think 
it's Russian).  I get maybe 1 or 2 a week.  I wish I could block all Russian 
messages, but we are a University and could easily have Russian students.  I am 
unable to read this message and therefore have no ideas on how to block this.  
Can anyone help me out with suggestions?

I apologize if this has been discussed in the last week.  I haven't had time to 
catch up on list messages over the last couple of days and didn't see anything 
skimming the subjects of recent threads.

Thanks,
Kris

Message with full headers below:

Microsoft Mail Internet Headers Version 2.0
Received: from gateway3.oc.edu ([205.143.222.12]) by fsmail.oc.edu with 
Microsoft SMTPSVC(6.0.3790.211);
 Thu, 13 Apr 2006 08:50:17 -0500
Received: from ip-189.net-82-216-33.toulouse.rev.numericable.fr 
([82.216.33.189])(helo=ip-189.net-82-216-33.toulouse.rev.numericable.fr)
by gateway3.oc.edu with smtp (Exim 4.54)
id 1FU2CH-0008JS-AY
for [EMAIL PROTECTED]; Thu, 13 Apr 2006 08:49:43 -0500
From: Litvinova Elena [EMAIL PROTECTED]
To: Samusenko Tat'jana [EMAIL PROTECTED]
Date: Thu, 13 Apr 2006 13:50:06 +
Message-ID: [EMAIL PROTECTED]
MIME-Version: 1.0
Content-Type: text/plain;
format=flowed;
charset=koi8-r;
reply-type=original
Content-Transfer-Encoding: 8bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2800.1441
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1441
X-SA-Exim-Connect-IP: 82.216.33.189
X-SA-Exim-Rcpt-To: [EMAIL PROTECTED]
X-SA-Exim-Mail-From: [EMAIL PROTECTED]
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on gateway3.oc.edu
X-Spam-Level: 
X-Spam-Status: No, score=0.3 required=5.0 tests=DNS_FROM_AHBL_RHSBL,RELAY_FR 
autolearn=disabled version=3.1.0
Subject: Re[6]: =?koi8-r?B?9Nkgzc7Px88gxMzRIM3FztEg2s7B3snb2A==?= davavsheju
X-SA-Exim-Version: 4.2 (built Thu, 03 Mar 2005 10:44:12 +0100)
X-SA-Exim-Scanned: Yes (on gateway3.oc.edu)
Return-Path: [EMAIL PROTECTED]
X-OriginalArrivalTime: 13 Apr 2006 13:50:17.0572 (UTC) 
FILETIME=[32A1FA40:01C65F01]

Рад Вас снова видеть!

Вы собираетесь в США? Хотите свободно работать
с технической документацией? Расширить свой кругозор?

Центр Американского Английского
приглашает выучить английский язык!!!
Все стадии обучения - от нуля до высшего. Ассоциативно-
образная методика. Преподаватели из США.

Без больших скидок не уйдёте! :)

Наши телефоны в Москве:
105 пять-один-восемь-шесть
два-три-восемь-три-три-восемь-шесть


Не хотите получать информацию от Центра? Отправьте свой адрес нам:
[EMAIL PROTECTED]



сил. Но он не мог понять того, -- вдруг как бы вырвавшимся тонким голосом
закричал князь Андрей, -- но он не мог понять, что мы в первый раз дрались
там за русскую землю, что в войсках был такой дух, какого никогда я не
видал, что мы два дня сряду отбивали французов и что этот успех удесятерял
наши силы. Он велел отступать, и все усилия и потери пропали даром. Он не
думал об измене, он старался все сделать как можно лучше, он все обдум
от этого-то он и не годится. Он не годится теперь именно потому, что он все
обдумывает очень основательно и аккуратно, как и следует всякому немцу. Как
бы тебе сказать... Ну, у отца твоего немец-лакей, и он прекрасный лакей и
удовлетворит всем его нуждам лучше тебя, и пускай он служит; но ежели отец
при смерти болен, ты прогонишь лакея и своими непривычными, неловкими 
станешь ходить за отцом и лучше успокоишь его, чем искусный, но чужой
человек. Так и сделали с Барклаем. Пока Россия была здорова, ей мог служить

Re: TEXTAREA style=visibility: hidden

On Thu, Apr 13, 2006 at 03:58:01PM +0200, Magnus Holmgren wrote:
 I see a fair amount of spam using TEXTAREA style=visibility: hidden to 
 hide bayes poison. Shouldn't a rule against that, or CSS-hidden text in 
 general, be worthwhile? I couldn't find any in the default 3.1.1 ruleset, nor
 at SARE.

Not specific to textarea, just looking for an html tag with that style setting:

  0.878   0.9903   0.3319   0.749   0.00   1.00  TVD_VIS_HIDDEN

Specifically just looking for textarea:

  0.821   0.9903   0.0000   1.000   1.00   1.00  TVD_VIS_HIDDEN

I added the second one to my sandbox.  We'll see how the nightly
mass-checks deal with it. :)

Thanks! :)

-- 
Randomly Generated Tagline:
Do not meddle in the affairs of wizards,
 for they are subtle and quick to anger.- Lord of the Rings



Proper use of user_prefs whitelist

2006-04-13 Thread Forrest Aldrich

I've been having some difficulty with the user_prefs and the whitelist_*
functions.  I read the examples etc., and I believe these are correct,
but clearly certain email is still being tagged (see below).  I wonder
if someone can help clarify what I'm doing wrong here.


First, here are the directives in my ~/.spamassassin/user_prefs file, as 
it applies to this instance:


   whitelist_from_rcvd spamassassin.apache.org hermes.apache.org

   whitelist_from  *.apache.org


Here is the Sendmail log, showing the rejection:

   Apr 13 11:52:24 mail sm-mta[34951]: k3DFqNBR034951:
   from=[EMAIL PROTECTED],
   size=17514, class=-60, nrcpts=1, msgid=[EMAIL PROTECTED],
   proto=SMTP, daemon=MTA, relay=hermes.apache.org [209.237.227.199]

   Apr 13 11:52:26 mail sm-mta[34951]: k3DFqNBR034951: Milter add:
   header: X-Spam-Flag: YES

   Apr 13 11:52:26 mail sm-mta[34951]: k3DFqNBR034951: Milter add:
   header: X-Spam-Status: Yes, score=9.0 required=5.0
   
tests=HTML_00_10,HTML_MESSAGE,\n\tJ_CHICKENPOX_12,J_CHICKENPOX_33,RCVD_IN_SORBS,SARE_BIZOP,\n\tSARE_COLLEGE_SCAM,TVD_FUZZY_DEGREE
   autolearn=no version=3.1.1

   Apr 13 11:52:26 mail sm-mta[34951]: k3DFqNBR034951: Milter: data,
   reject=550 5.7.1 Blocked by SpamAssassin

   Apr 13 11:52:26 mail sm-mta[34951]: k3DFqNBR034951:
   to=[EMAIL PROTECTED], delay=00:00:02, pri=155514, stat=Blocked by
   SpamAssassin



Thanks in advance

Re: TEXTAREA style=visibility: hidden

Matthias Keller wrote:
 Matt Kettler wrote:
 Magnus Holmgren wrote:
  
 I see a fair amount of spam using TEXTAREA style=visibility:
 hidden to hide bayes poison. Shouldn't a rule against that, or
 CSS-hidden text in general, be worthwhile? I couldn't find any in the
 default 3.1.1 ruleset, nor at SARE.
 

 It certainly seems worth testing.

 Here's a rule I wrote (caution: word-wraps.. this should be 3 lines
 long):

 rawbody L_STYLE_HIDDEN /<TEXTAREA [^>]{0,50}style\s?=\s?\s?visibility:\s?hidden\s?[^>]{0,50}/i
 describe L_STYLE_HIDDEN  has text with hidden visibility style
 score L_STYLE_HIDDEN 0.1

 I added some allowance for other declarations in the textarea tag, and
 the
 insertion of whitespace at various spots...

 It may need further tweaking/tuning, but it's a first-stab.
   
 Hi Matt
 
 I'm using this rule for quite some time now:
 
 rawbody  MKE_HIDDEN1 /<[^>]*\bstyle=[^>]*(?:visibility:\s*hidden|display:\s*none)/i
 describe MKE_HIDDEN1 Contains CSS-hidden text
 score    MKE_HIDDEN1 3.5
 

That seems to be a nicer rule. My only concern would be that [^>]* could be
rather slow. I'd change the * to a range-limit, to prevent SA from digging
through the entire body of a message that happens to be text/plain and starts
off with a < and has no > anywhere in it.
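
The range-limit suggestion can be sketched in Python as follows. This is only an approximation of the MKE_HIDDEN1 idea: the leading "<", the 200-character bound and the sample bodies are assumptions chosen for illustration, and the helper name `has_hidden_css` is hypothetical.

```python
import re

# Range-limited variant: each [^>] run is capped instead of using "*",
# so a stray "<" in plain text cannot drag the match across the whole body.
HIDDEN_CSS = re.compile(
    r"<[^>]{0,200}\bstyle=[^>]{0,200}(?:visibility:\s*hidden|display:\s*none)",
    re.IGNORECASE,
)

def has_hidden_css(raw_body):
    """True if a tag with a hiding style declaration is found."""
    return HIDDEN_CSS.search(raw_body) is not None

# Hypothetical message bodies.
html_div = '<div style="display: none">filler words</div>'
html_span = '<span class="a" style="color:red; visibility:hidden">x</span>'
plain_text = "a < b, and the tag never closes " + "x" * 5000 + " style=display:none"

print(has_hidden_css(html_div))
print(has_hidden_css(html_span))
print(has_hidden_css(plain_text))
```

The plain-text body contains a lone "<" followed thousands of characters later by a style string. The bounded pattern gives up on it after 200 characters, whereas an unbounded [^>]* would have to walk the whole body and would then report a hit.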

Re: TEXTAREA style=visibility: hidden

2006-04-13 Thread Matthias Keller


Matt Kettler wrote:

Matthias Keller wrote:
  

Matt Kettler wrote:


Magnus Holmgren wrote:
 
  

I see a fair amount of spam using TEXTAREA style=visibility:
hidden to hide bayes poison. Shouldn't a rule against that, or
CSS-hidden text in general, be worthwhile? I couldn't find any in the
default 3.1.1 ruleset, nor at SARE.



It certainly seems worth testing.

Here's a rule I wrote (caution: word-wraps.. this should be 3 lines
long):

rawbody L_STYLE_HIDDEN /<TEXTAREA [^>]{0,50}style\s?=\s?\s?visibility:\s?hidden\s?[^>]{0,50}/i
describe L_STYLE_HIDDEN  has text with hidden visibility style
score L_STYLE_HIDDEN 0.1

I added some allowance for other declarations in the textarea tag, and
the
insertion of whitespace at various spots...

It may need further tweaking/tuning, but it's a first-stab.
  
  

Hi Matt

I'm using this rule for quite some time now:

rawbody  MKE_HIDDEN1 /<[^>]*\bstyle=[^>]*(?:visibility:\s*hidden|display:\s*none)/i
describe MKE_HIDDEN1 Contains CSS-hidden text
score    MKE_HIDDEN1 3.5




That seems to be a nicer rule. My only concern would be that [^>]* could be
rather slow. I'd change the * to a range-limit, to prevent SA from digging
through the entire body of a message that happens to be text/plain and starts
off with a < and has no > anywhere in it.
  

Good idea
Thanks for pointing that out
Maybe a meta rule with IS_HTML (or whatever it's called) might be a
good idea too.


Let me know your mass check results then

Matt

Re: relaydb and tarpit


Michael Monnerie wrote:
Sorry for x-posting, but that's a program useful to postfix and/or SA 
users.


http://www.benzedrine.cx/relaydb.html

Does anybody use or know about this program with tarpitting? It sounds 
very interesting, and for the author it seems to work, but I'd like to 
know if others made good or bad experience with it. After all, we're 
all fighting spammers, and if there are solutions really working, I'm 
ready to implement it into our servers.




pfff. just reading the two first paragraphs is enough to look elsewhere.

some people seem to redefine what a false positive is. they think that 
just because they reject mail or because the client/sender/... 
misbehaves, then it's not a false positive. This is just silly. a false 
positive is when a classifier considers a legitimate mail as spam, be 
that by rejection, by discarding, by delivering to a junk folder, ... etc.


just say no...

Re: xxxl spam

2006-04-13 Thread John Rudd



On Apr 13, 2006, at 12:12 AM, Loren Wilton wrote:

I'd like to venture the suggestion that the percentage of spam from XP
isn't necessarily an indication of inherent bugginess.  It is more an
indication that it is an OS for Clueless Noobs who haven't a clue about
maintaining a system or avoiding a virus, or who can't even tell if they
have a virus.  These are the machines that turn into zombies.



While I don't disagree with your assessment of XP systems, I have a 
different hunch about why such a large percentage of the mail coming 
from XP systems is spam, and a smaller percentage of mail coming from 
the other systems is spam:


a) In general, XP systems are not servers, and therefore, are not mail 
servers.


b) Due to (a), if you do your mail/spam/virus scanning on machines that 
do not receive direct connections from your own clients 
(mail/spam/virus scanning at the border), OR if you do not have a high 
percentage of XP clients in your domain, then your scanning systems 
will not receive many (if any) legitimate direct connections from XP 
clients ... because a legitimate mail sending process on an XP system 
will be directly connecting to their own domain's mail server, and not 
to YOUR mail scanning systems.


c) Thus, if you meet the conditions in (b), and if we accept (a) as
true, then the vast majority of connections you receive from XP 
systems, on your mail scanning systems, will be from spam/virus bots 
trying to directly submit spam or virus laden messages to your mail 
gateways instead of submitting it to their own mail servers (as bots 
are known to do).



We would expect to see a lower percentage of spam from server type OSes 
(or OSes that can be clients or servers) because a higher percentage of 
those platforms are used as legitimate mail servers.


The other factor here is: while I _hate_ linux, how much of the spam 
being submitted by linux boxes is merely a mail server relaying on 
behalf of one of their infected clients? (same with the unix systems, 
and the 2000/2003 systems)  And thus not at all indicative of the 
quality of linux systems administration out on the internet.



I think this is one of those cases where the statistics work as blind
observations of behavior, but attempting to describe _why_ the
statistics work is not something you can sum up with a simple and
straightforward explanation.  Kinda like QM.

Re: relaydb and tarpit

2006-04-13 Thread Michael Monnerie

On Thursday, 13 April 2006 18:15 mouss wrote:
 pfff. just reading the two first paragraphs is enough to look
 elsewhere. some people seem to redefine what a false positive is.

I didn't mean that, I meant the tarpitting approach. Of course you have 
to set some (much) harder policy on which systems to put on your 
tarpit-blackhole list.

But *if* you have such a tarpit decider without FP (not sure how to do 
that...), couldn't this be a very good countermeasure to spam?

Best regards, zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   lynx -source http://zmi.at/zmi3.asc | gpg --import
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE



How was this missed?

2006-04-13 Thread qqqq

Guys,

Any idea how this one got through?

body     BRIAN_PHONE_NUMBERS /2.0.6.9.8.4.2.3.2.7|2.0.6.3.3.3.0.0.5.1|2.0.6.9.8.4.0.1.0.6|3.3.8.3.5.7.9|2.0.6.3.3.8.6.0.6.1|2.0.6.2.0.2.2.0.3.3/
describe BRIAN_PHONE_NUMBERS Phone number or address pulled from spam
score    BRIAN_PHONE_NUMBERS 5.5

- Message -

Good day,


A Gen_uine Coll`ege  Deg.ree in 2 weeks Cal_l us now_!- 2*0*6*984-2327

Within 2 weeks! No Study Required!   1_0_0_% Veri.fiable!

Right now the following deg.rees are being offered:

B/A,   .B/S/C,.M/A,.M/S/C,.M/B/A,   .P/H/D,


C.al_l us now_ for more information,  2*0*6*984-2327



TTYL,
Vilma Milton

Re: TEXTAREA style=visibility: hidden

2006-04-13 Thread Kelson


Matthias Keller wrote:
In my opinion you shouldn't limit it to textareas as I've seen them on 
DIVs and others too...
So to me, any visibility:hidden or display:none is suspect, as I don't see
any legitimate use in emails.


Hmm... The main uses I can think of for display:none and 
visibility:hidden are:


(1) Serving the same content to different media (for instance, set a
page so that the navigation area doesn't appear when you print it)
(2) Replacing content (as in CSS techniques to replace text with
graphical headlines)
(3) Scripting that will show and hide sections in response to time or
user interaction.
(4) Creating machine-readable content that the user will not see.
(keyword stuffing, bayes poison, black-hat SEO, honeypot seeding,
etc.)

#1 isn't a good fit with email, since the main things you'd want to 
leave out of a print version are more likely to be in the mail client UI 
than part of the message body.  Though it might be useful for providing 
a handheld-friendly view.  Even so, it wouldn't work with inline styles, 
only with an attached or embedded stylesheet.


#2 is pretty much useless in email.  If you want a text alternative, 
you're better off providing a text/plain version of the message.


#3 shouldn't even be a consideration, since HTML-capable email clients 
should have scripting disabled for safety reasons.


#4 is mostly deceptive.  If you need to provide metadata in an HTML doc, 
well, that's what META tags are for.  If you need to provide metadata in 
an email message, you've got headers, you can add an XML attachment, etc.


Nope.  No legit uses in email that I can think of.

--
Kelson Vibber
SpeedGate Communications www.speed.net

RE: TEXTAREA style=visibility: hidden

2006-04-13 Thread Matthew.van.Eerde

Kelson wrote:
 (3) Scripting that will show and hide sections in response to time or
  user interaction.
... 
 #3 shouldn't even be a consideration, since HTML-capable email clients
 should have scripting disabled for safety reasons.

s/Scripting/CSS :hover/ is perfectly reasonable, though:
http://www.meyerweb.com/eric/css/edge/menus/demo.html

(doesn't work in IE 6, but works fine in Firefox, Safari, IE 7b2pr...)

-- 
Matthew.van.Eerde (at) hbinc.com   805.964.4554 x902
Hispanic Business Inc./HireDiversity.com   Software Engineer

Re: How was this missed?

On Thu, Apr 13, 2006 at 10:39:29AM -0600,  wrote:
 Any idea how this one got through?
 
 body BRIAN_PHONE_NUMBERS
 /2.0.6.9.8.4.2.3.2.7|2.0.6.3.3.3.0.0.5.1|2.0.6.9.8.4.0.1.0.6|3.3.8.3.5.7.9|2.0.6.3.3.8.6.0.6.1|2.0.6
 .2.0.2.2.0.3.3/
 
 A Gen_uine Coll`ege  Deg.ree in 2 weeks Cal_l us now_!- 2*0*6*984-2327

Sure, the pattern doesn't match.  . means there has to be some (any)
character between the numbers.  984 has no characters between the
numbers.
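
A quick way to see this is to test one alternative of the rule against the number from the spam. The sketch below uses Python's `re` module rather than SpamAssassin itself, so it only illustrates the regex behaviour; the first sample string is made up, the second is copied from the message.

```python
import re

# First alternative of the BRIAN_PHONE_NUMBERS rule, as posted.
# Every "." must consume exactly one character between two digits.
rule = re.compile(r"2.0.6.9.8.4.2.3.2.7")

fully_separated = "2*0*6*9*8*4*2*3*2*7"   # made-up: one separator between every digit
from_the_spam = "2*0*6*984-2327"          # from the spam: "984" has no separators

print(bool(rule.search(fully_separated)))  # True
print(bool(rule.search(from_the_spam)))    # False
```

The second string is simply too short for a pattern that demands nine separator characters.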

-- 
Randomly Generated Tagline:
1-900-Tech Support...hold...all operators are busy.



Re: How was this missed?

2006-04-13 Thread Magnus Holmgren

Please start a new thread instead of replying to an unrelated message.

Thursday 13 April 2006 18:39  wrote:
 Any idea how this one got through?

 body BRIAN_PHONE_NUMBERS
 /2.0.6.9.8.4.2.3.2.7|2.0.6.3.3.3.0.0.5.1|2.0.6.9.8.4.0.1.0.6|3.3.8.3.5.7.9|
2.0.6.3.3.8.6.0.6.1|2.0.6 .2.0.2.2.0.3.3/
 describe BRIAN_PHONE_NUMBERS  Phone number or address pulled from spam
 score    BRIAN_PHONE_NUMBERS  5.5


A period (.) matches exactly one arbitrary character (except newline). Try 
putting a question mark (?) after each period.

 - Message -

 Good day,


 A Gen_uine Coll`ege  Deg.ree in 2 weeks Cal_l us now_!- 2*0*6*984-2327

 Within 2 weeks! No Study Required!   1_0_0_% Veri.fiable!

 Right now the following deg.rees are being offered:

 B/A,   .B/S/C,.M/A,.M/S/C,.M/B/A,   .P/H/D,


 C.al_l us now_ for more information,  2*0*6*984-2327
-- 
Magnus Holmgren




RE: How was this missed?

2006-04-13 Thread Matthew.van.Eerde

Theo Van Dinter wrote:
 On Thu, Apr 13, 2006 at 10:39:29AM -0600,  wrote:
 Any idea how this one got through?
 
 body BRIAN_PHONE_NUMBERS

/2.0.6.9.8.4.2.3.2.7|2.0.6.3.3.3.0.0.5.1|2.0.6.9.8.4.0.1.0.6|3.3.8.3.5.7
.9|2.0.6.3.3.8.6.0.6.1|2.0.6
 .2.0.2.2.0.3.3/ 
 
 A Gen_uine Coll`ege  Deg.ree in 2 weeks Cal_l us now_!-
 2*0*6*984-2327 
 
 Sure, the pattern doesn't match.  . means there has to be some (any)
 character between the numbers.  984 has no characters between the
 numbers.

Fixed version:

/2.?0.?6.?9.?8.?4.?2.?3.?2.?7|2.?0.?6.?3.?3.?3.?0.?0.?5.?1|2.?0.?6.?9.?8.?4.?0.?1.?0.?6|3.?3.?8.?3.?5.?7.?9|2.?0.?6.?3.?3.?8.?6.?0.?6.?1|2.?0.?6.?2.?0.?2.?2.?0.?3.?3/

Or, perhaps, better:

/2\D?0\D?6\D?9\D?8\D?4\D?2\D?3\D?2\D?7|2\D?0\D?6\D?3\D?3\D?3\D?0\D?0\D?5\D?1|2\D?0\D?6\D?9\D?8\D?4\D?0\D?1\D?0\D?6|3\D?3\D?8\D?3\D?5\D?7\D?9|2\D?0\D?6\D?3\D?3\D?8\D?6\D?0\D?6\D?1|2\D?0\D?6\D?2\D?0\D?2\D?2\D?0\D?3\D?3/
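
Typing these alternations by hand is error-prone, so one option is to generate them from the plain numbers. This is only a sketch in Python, not a SpamAssassin feature: `spaced_digits` is a hypothetical helper, and the digit strings are read off the rule above.

```python
import re

def spaced_digits(number: str, sep: str = r"\D?") -> str:
    """Join the digits of `number` with an optional-separator pattern."""
    digits = [ch for ch in number if ch.isdigit()]
    return sep.join(digits)

# The six numbers from the BRIAN_PHONE_NUMBERS rule, without separators.
numbers = ["2069842327", "2063330051", "2069840106",
           "3383579", "2063386061", "2062022033"]

pattern = re.compile("|".join(spaced_digits(n) for n in numbers))

print(spaced_digits("2069842327"))
print(bool(pattern.search("Cal_l us now_!- 2*0*6*984-2327")))
```

The printed pattern can be pasted between the slashes of a body rule; adding a number then means adding one string to the list.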

-- 
Matthew.van.Eerde (at) hbinc.com   805.964.4554 x902
Hispanic Business Inc./HireDiversity.com   Software Engineer

Re: xxxl spam


John Rudd wrote:
While I don't disagree with your assessment of XP systems, I have a 
different hunch about why such a large percentage of the mail coming 
from XP systems is spam, and a smaller percentage of mail coming from 
the other systems is spam:


a) In general, XP systems are not servers, and therefore, are not mail 
servers.


b) Due to (a), if you do your mail/spam/virus scanning on machines that 
do not receive direct connections from your own clients (mail/spam/virus 
scanning at the border), OR if you do not have a high percentage of XP 
clients in your domain, then your scanning systems will not receive many 
(if any) legitimate direct connections from XP clients ... because a 
legitimate mail sending process on an XP system will be directly 
connecting to their own domain's mail server, and not to YOUR mail 
scanning systems.


c) Thus, if you meet the conditions in (b), and if we accept (a) as 
true, then the vast majority of connections you receive from XP systems, 
on your mail scanning systems, will be from spam/virus bots trying to 
directly submit spam or virus laden messages to your mail gateways 
instead of submitting it to their own mail servers (as bots are known to 
do).



We would expect to see a lower percentage of spam from server type OSes 
(or OSes that can be clients or servers) because a higher percentage of 
those platforms are used as legitimate mail servers.


The other factor here is: while I _hate_ linux, how much of the spam 
being submitted by linux boxes is merely a mail server relaying on 
behalf of one of their infected clients? (same with the unix systems, 
and the 2000/2003 systems)  And thus not at all indicative of the 
quality of linux systems administration out on the internet.



I think this is one of those cases where the statistics work as blind 
observations of behavior, but attempting to describe _why_ the 
statistics work is not something you can sum up with a simple and 
straightforward explanation.  Kinda like QM.




<ot>
I agree that statistics aren't the whole story. you can study the 
percentage of thieves/criminals based on skin color and origin (some 
people already do it, and many jump to conclusions without studies). but 
you can do the same study based on social situation and past history of 
people. the first researcher will probably conclude that 
black/arabic/latin/... people are more criminal. the second 
researcher will instead conclude that criminality is more seen in poor 
communities, but that these aren't the worst criminals (killing vs 
stealing for instance).

</ot>

back to xp and co. my feeling (no, I didn't run a study and won't) is 
that even if any study would show that we get more spam from XP than 
from linux, I will not use this to classify my mail.


I am certain that if you do stats on mail date, you'll find that some 
dates correspond to more spam than others. we've already seen people 
jumping to block specific mailers (the bat for instance) based on their 
stats. I am also seeing many legit mails triggering some SA rules (*_excess, 
no_real_name, x_library, ...). when I see this, I check the rule, and if 
I can't find a justification, I disable it.

Re: New bayes poison

2006-04-13 Thread William Stearns


Good afternoon, Michael,

On Thu, 13 Apr 2006, Michael Monnerie wrote:


Hi, I just received a new bayes poison attempt. I never had one so
large; maybe that could start to be a bit of a problem?


	To the best of my knowledge, it isn't.  Temporarily you get more 
hapaxes (tokens seen just once) in your bayes data, but those will get 
expired sooner or later.
	There's no effect on accuracy if the tokens truly are seen once. 
If they show up again in spam, it actually helps because the phrases help 
identify the second spam.
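
To make the hapax point concrete, here is a toy token counter in Python. It is only an illustration under simple assumptions (whitespace tokens, one count per message, invented poison words) and is not how SpamAssassin's Bayes store or its expiry actually work.

```python
from collections import Counter

def train(counts: Counter, text: str) -> None:
    """Count each distinct token once per message."""
    counts.update(set(text.lower().split()))

def hapaxes(counts: Counter) -> set:
    """Tokens that have been seen in exactly one message."""
    return {tok for tok, n in counts.items() if n == 1}

spam_counts = Counter()
train(spam_counts, "cheap degree zyxomat quibblefrond")  # poison words are made up
train(spam_counts, "cheap degree call now")
print(sorted(hapaxes(spam_counts)))

# A poison word that shows up in a second spam stops being a hapax
# and starts to carry evidence that a message is spam.
train(spam_counts, "diploma fast zyxomat")
print(spam_counts["zyxomat"], "zyxomat" in hapaxes(spam_counts))
```

Random poison that never repeats just sits there with a count of one until it is expired; poison that repeats works against the spammer.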

Cheers,
- Bill

---
Computers let you make more mistakes faster than any other
invention in human history, with the possible exception of handguns and
tequila.
-- Mitch Radcliffe
(Courtesy of Hugo van der Kooij [EMAIL PROTECTED])
--
William Stearns ([EMAIL PROTECTED]).  Mason, Buildkernel, freedups, p0f,
rsync-backup, ssh-keyinstall, dns-check, more at:   http://www.stearns.org
--

Re: How was this missed?

2006-04-13 Thread Tyler Nally

On Thursday 13 April 2006 11:55, [EMAIL PROTECTED] wrote:
 Theo Van Dinter wrote:
  On Thu, Apr 13, 2006 at 10:39:29AM -0600,  wrote:
  Any idea how this one got through?
  
  body BRIAN_PHONE_NUMBERS
 
 /2.0.6.9.8.4.2.3.2.7|2.0.6.3.3.3.0.0.5.1|2.0.6.9.8.4.0.1.0.6|3.3.8.3.5.7
 .9|2.0.6.3.3.8.6.0.6.1|2.0.6
  .2.0.2.2.0.3.3/ 

There's a ruleset I use from:

   http://www.emtinc.net/includes/chickenpox.cf

.. that checks for the d.i.f.f.e.r.e.n.t kinds of 
spacing like this... a lot of the spam that has
those kinds of characteristics will have several
of the CHICKENPOX_ rules that have fired positive.

It checks for some 60+ different patterns..

describe J_CHICKENPOX_12  1alpha-pock-2alpha
describe J_CHICKENPOX_13  1alpha-pock-3alpha
describe J_CHICKENPOX_14  1alpha-pock-4alpha
describe J_CHICKENPOX_15  1alpha-pock-5alpha
describe J_CHICKENPOX_16  1alpha-pock-6alpha
describe J_CHICKENPOX_17  1alpha-pock-7alpha
describe J_CHICKENPOX_18  1alpha-pock-8alpha
describe J_CHICKENPOX_19  1alpha-pock-9alpha
describe J_CHICKENPOX_110 1alpha-pock-10alpha
describe J_CHICKENPOX_111 1alpha-pock-11alpha
describe J_CHICKENPOX_21  2alpha-pock-1alpha
describe J_CHICKENPOX_22  2alpha-pock-2alpha
describe J_CHICKENPOX_23  2alpha-pock-3alpha
describe J_CHICKENPOX_24  2alpha-pock-4alpha
describe J_CHICKENPOX_25  2alpha-pock-5alpha
describe J_CHICKENPOX_26  2alpha-pock-6alpha
describe J_CHICKENPOX_27  2alpha-pock-7alpha
describe J_CHICKENPOX_28  2alpha-pock-8alpha
describe J_CHICKENPOX_29  2alpha-pock-9alpha
describe J_CHICKENPOX_210 2alpha-pock-10alpha
describe J_CHICKENPOX_31  3alpha-pock-1alpha
describe J_CHICKENPOX_32  3alpha-pock-2alpha
describe J_CHICKENPOX_33  3alpha-pock-3alpha
describe J_CHICKENPOX_34  3alpha-pock-4alpha
describe J_CHICKENPOX_35  3alpha-pock-5alpha
describe J_CHICKENPOX_36  3alpha-pock-6alpha
describe J_CHICKENPOX_37  3alpha-pock-7alpha
describe J_CHICKENPOX_38  3alpha-pock-8alpha
describe J_CHICKENPOX_39  3alpha-pock-9alpha
describe J_CHICKENPOX_41  4alpha-pock-1alpha
describe J_CHICKENPOX_42  4alpha-pock-2alpha
describe J_CHICKENPOX_43  4alpha-pock-3alpha
describe J_CHICKENPOX_44  4alpha-pock-4alpha
describe J_CHICKENPOX_45  4alpha-pock-5alpha
describe J_CHICKENPOX_46  4alpha-pock-6alpha
describe J_CHICKENPOX_47  4alpha-pock-7alpha
describe J_CHICKENPOX_48  4alpha-pock-8alpha
describe J_CHICKENPOX_51  5alpha-pock-1alpha
describe J_CHICKENPOX_52  5alpha-pock-2alpha
describe J_CHICKENPOX_53  5alpha-pock-3alpha
describe J_CHICKENPOX_54  5alpha-pock-4alpha
describe J_CHICKENPOX_55  5alpha-pock-5alpha
describe J_CHICKENPOX_56  5alpha-pock-6alpha
describe J_CHICKENPOX_57  5alpha-pock-7alpha
describe J_CHICKENPOX_61  6alpha-pock-1alpha
describe J_CHICKENPOX_62  6alpha-pock-2alpha
describe J_CHICKENPOX_63  6alpha-pock-3alpha
describe J_CHICKENPOX_64  6alpha-pock-4alpha
describe J_CHICKENPOX_65  6alpha-pock-5alpha
describe J_CHICKENPOX_66  6alpha-pock-6alpha
describe J_CHICKENPOX_71  7alpha-pock-1alpha
describe J_CHICKENPOX_72  7alpha-pock-2alpha
describe J_CHICKENPOX_73  7alpha-pock-3alpha
describe J_CHICKENPOX_74  7alpha-pock-4alpha
describe J_CHICKENPOX_75  7alpha-pock-5alpha
describe J_CHICKENPOX_81  8alpha-pock-1alpha
describe J_CHICKENPOX_82  8alpha-pock-2alpha
describe J_CHICKENPOX_83  8alpha-pock-3alpha
describe J_CHICKENPOX_84  8alpha-pock-4alpha
describe J_CHICKENPOX_91  9alpha-pock-1alpha
describe J_CHICKENPOX_92  9alpha-pock-2alpha
describe J_CHICKENPOX_93  9alpha-pock-3alpha
describe J_CHICKENPOX_101 10alpha-pock-1alpha
describe J_CHICKENPOX_102 10alpha-pock-2alpha
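
For anyone wondering what an "N alpha, pock, M alpha" test might look like, here is a rough Python sketch. It is a guess at the idea only: the set of pock characters and the regex shape are my assumptions, not the expressions actually used in chickenpox.cf.

```python
import re

# Assumed set of "pock" characters; the real ruleset may use a different one.
POCKS = ".,;:!*_`'"

def pox_pattern(n: int, m: int):
    """Regex for exactly n letters, one pock character, exactly m letters."""
    pock_class = "[" + re.escape(POCKS) + "]"
    return re.compile(
        rf"(?<![A-Za-z])[A-Za-z]{{{n}}}{pock_class}[A-Za-z]{{{m}}}(?![A-Za-z])"
    )

line = "A Gen_uine Coll`ege  Deg.ree in 2 weeks Cal_l us now"
print(bool(pox_pattern(3, 3).search(line)))   # Deg.ree
print(bool(pox_pattern(4, 3).search(line)))   # Coll`ege
print(bool(pox_pattern(3, 1).search(line)))   # Cal_l
```

Several different N/M combinations firing on one message is what makes the obfuscation stand out.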

-- 
Tyler Nally
[EMAIL PROTECTED]
317-989-2028

RE: New bayes poison

2006-04-13 Thread Matthew.van.Eerde

[EMAIL PROTECTED] wrote:
 The spammer used the Yahoo! webmail infrastructure (probably via an
 automated HTTP client) to send his spam.

I've been reporting spam with good DK signatures to the mail provider:
http://add.yahoo.com/fast/help/us/mail/cgi_spam
https://services.google.com/inquiry/gmail_security2

DK and SPF are very useful in proving accountability for email sent.

-- 
Matthew.van.Eerde (at) hbinc.com   805.964.4554 x902
Hispanic Business Inc./HireDiversity.com   Software Engineer

Re: TEXTAREA style=visibility: hidden

On Thu, Apr 13, 2006 at 09:45:13AM -0700, Kelson wrote:
 Nope.  No legit uses in email that I can think of.

Just because you can't think of a use doesn't mean people don't use them.
I see a lot of:

<div ... style="...; visibility: hidden; ...">
<input ... style="display: none" ...>
<div ... style="display: none" ...>

and a bunch of CSS which includes those two style attributes as well.

Seen in personal mails from places such as Yahoo! and American Express,
and newsletters from such places as the Boston Globe, CNN, the Denver
Post, Der Spiegel, Microsoft, the Washington Post, etc.
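
If you want to check your own ham for this, a small scanner is enough. The following is a sketch using Python's standard `html.parser`; it only looks at inline `style` attributes (not embedded stylesheets), and the sample markup is invented.

```python
import re
from html.parser import HTMLParser

HIDDEN = re.compile(r"(visibility\s*:\s*hidden|display\s*:\s*none)", re.I)

class HiddenStyleFinder(HTMLParser):
    """Collect (tag, declaration) pairs for inline styles that hide content."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "style" and value:
                m = HIDDEN.search(value)
                if m:
                    self.hits.append((tag, m.group(1).lower()))

def find_hidden(html: str):
    parser = HiddenStyleFinder()
    parser.feed(html)
    parser.close()
    return parser.hits

sample = ('<div style="color:red; visibility: hidden">x</div>'
          '<input style="display:none"><p>ok</p>')
print(find_hidden(sample))
```

Running it over a ham corpus and a spam corpus separately would show whether a rule on these styles is worth any score at all.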

-- 
Randomly Generated Tagline:
When you are at Rome live in the Roman style; when you are elsewhere live
 as they live elsewhere.
-- St. Ambrose



Re: How was this missed?

On Thu, Apr 13, 2006 at 09:55:59AM -0700, [EMAIL PROTECTED] wrote:
  2*0*6*984-2327 
  
 /2.?0.?6.?9.?8.?4.?2.?3.?2.?7|2.?0.?6.?3.?3.?3.?0.?0.?5.?1|2.?0.?6.?9.?8
 .?4.?0.?1.?0.?6|3.?3.?8.?3.?5.?7.?9|2.?0.?6.?3.?3.?8.?6.?0.?6.?1|2.?0.?6
 .?2.?0.?2.?2.?0.?3.?3/
 
 Or, perhaps, better:
 
 /2\D?0\D?6\D?9\D?8\D?4\D?2\D?3\D?2\D?7|2\D?0\D?6\D?3\D?3\D?3\D?0\D?0\D?5
 \D?1|2\D?0\D?6\D?9\D?8\D?4\D?0\D?1\D?0\D?6|3\D?3\D?8\D?3\D?5\D?7\D?9|2\D
 ?0\D?6\D?3\D?3\D?8\D?6\D?0\D?6\D?1|2\D?0\D?6\D?2\D?0\D?2\D?2\D?0\D?3\D?3
 /

Now you won't catch

(206) 984-2327
[206] 984-2327
206 - 984 - 2327

etc.  FYI.
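
A small Python check of those three formats, for illustration. The `\D{0,3}` variant is my own assumption about one possible widening, not something proposed in the thread, and a looser separator also raises the odds of matching unrelated digit runs.

```python
import re

digits = "2069842327"

# The posted fix: at most one non-digit between consecutive digits.
strict = re.compile(r"\D?".join(digits))

# Assumed alternative: allow up to three non-digits between digits.
loose = re.compile(r"\D{0,3}".join(digits))

samples = ["(206) 984-2327", "[206] 984-2327", "206 - 984 - 2327"]
for s in samples:
    print(s, bool(strict.search(s)), bool(loose.search(s)))
```

Each sample has two or three characters between some pair of digits, which is why `\D?` misses all of them.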

-- 
Randomly Generated Tagline:
Thinking of using NT for your critical apps?
 Isn't there enough suffering in the world?   - Sun Microsystems Ad



Re: xxxl spam

2006-04-13 Thread John Rudd



On Apr 13, 2006, at 9:56 AM, mouss wrote:



I am also seeing many legit mails triggering some SA rules (*_excess, 
no_real_name, x_library, ...). when I see this, I check the rule, and 
if I can't find a justification, I disable it.




I wouldn't do that.

Just because legitimate mail triggers some rule doesn't mean that the 
rule is flawed.  Using your example, triggering no_real_name does not 
mean that the message is spam, it means that the message has _some_ 
similarity to at least some spam messages (the higher the score, the 
stronger the similarity).  And, that's absolutely true: statistically, 
when looking at the corpus which was used to create the rules database, 
a higher percentage of no_real_name messages were spam.


Now, if legit messages were not just triggering those rules, but also 
triggering enough rules to be flagged as spam ... then I would lower 
the value of those rules, but not disable those rules.  But I would 
only do that if I could see that there was a large percentage of 
should-be-ham messages being flagged as spam by that rule AND that rule 
wasn't being useful in flagging spam messages.  The reason is: if the 
message is being flagged, but it shouldn't have been, then perhaps my 
corpus of messages differs significantly enough from the SA internal 
corpus that my score values need to be different.  But that doesn't 
mean that the rules are so disjoint from tracking spam that they should 
be entirely disabled.  They just don't have the same weighting that my 
corpus needs.


If, instead, most messages passing through my mail servers that 
triggered that rule really did seem to be spam, then I wouldn't alter 
the score at all.  I would just pass the should-have-been-ham message 
into my bayesian learner and hope that a low bayes score for messages 
like that would offset the rules that had flagged it as spam.

Re: How was this missed?

2006-04-13 Thread qqqq

!Sure, the pattern doesn't match.  . means there has to be some (any)
!character between the numbers.  984 has no characters between the
!numbers.

DOH!!!

Thanks. You're right...

Re: relaydb and tarpit


Michael Monnerie wrote:

On Donnerstag, 13. April 2006 18:15 mouss wrote:

pfff. just reading the two first paragraphs is enough to look
elsewhere. some people seem to redefine what a false positive is.


I didn't mean that, I meant the tarpitting approach. Of course you have 
to set some (much) harder policy on which systems to put on your 
tarpit-blackhole list.


But *if* you have such a tarpit decider without FP (not sure how to do 
that...), couldn't this be a very good countermeasure to spam?




The issue is that:
- to tarpit, you need to devote some process or thread to that. and this 
is not unix specific. however you do it, you'll need something to handle 
it. even with a packet filter, this still means many unnecessary states.


- the best you can do (at user level) is have an asynchronous process 
(which can handle many connections) to do so. now, either it is the 
listener, but then it needs to pass good connections to good 
listeners (which ones support this?) or the opposite (which ones support 
this?). of course, you can tune this to the point that you'd write a 
spam-OS. just to discover that spammers found other ways to get to you.


- the most severe problem is to find a criterion to decide who is bad. 
This is what we're all trying to do! If I knew which clients are used by 
spammers, I would need no tarpit nor DNSBL nor SA nor bayes. I would just 
block these.


- sometimes, some ideas seem fine. but they don't resist serious 
analysis. you want to protect yourself, but that's just part of your 
goal. you want to do so at a limited cost and under some (non explicit 
but real) conditions (killing all the non-white people will 
statistically reduce terrorism, but would you do that?).
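
To illustrate the second point (one asynchronous process holding many slow connections), here is a minimal Python `asyncio` sketch. The greeting text, port and delay are placeholders I made up, and it leaves out the hard parts discussed above: deciding who gets tarpitted and handing good connections to a real listener.

```python
import asyncio

async def drip(writer, banner: bytes, delay: float) -> int:
    """Send `banner` one byte at a time, pausing `delay` seconds after each byte."""
    sent = 0
    for i in range(len(banner)):
        writer.write(banner[i:i + 1])
        await writer.drain()        # yields to other connections while waiting
        sent += 1
        await asyncio.sleep(delay)
    return sent

async def handle(reader, writer):
    # Hypothetical greeting and delay; both are placeholders.
    try:
        await drip(writer, b"220 mail.example.invalid ESMTP\r\n", delay=5.0)
    except (ConnectionError, OSError):
        pass                        # the client gave up
    finally:
        writer.close()

async def main(host: str = "127.0.0.1", port: int = 2525) -> None:
    server = await asyncio.start_server(handle, host, port)
    async with server:
        await server.serve_forever()
```

Starting it would be `asyncio.run(main())`. Each tarpitted client costs a coroutine, a socket and some kernel state rather than a whole process, but that cost is still not zero.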


I have already seen systems that go idle when I connect to them. These 
systems just make me use my resources in vain, which is not a good 
practice. And I tend to believe these systems are run by nuts, so are 
easily attacked (I never do that, for both personal and professional 
reasons). The best way to deal with them is to ignore them. route add, 
transport_maps, ... are enough to build one's own internet :)

Re: TEXTAREA style=visibility: hidden

2006-04-13 Thread Kelson


[EMAIL PROTECTED] wrote:

s/Scripting/CSS :hover/ is perfectly reasonable, though:
http://www.meyerweb.com/eric/css/edge/menus/demo.html

(doesn't work in IE 6, but works fine in Firefox, Safari, IE 7b2pr...)


D'oh!

I blame the coffee.  There wasn't enough of it when I wrote my last post.

On the other hand, to apply :hover rules, you need an actual stylesheet 
and a way to select the element(s) you're showing.  You could still 
apply the visibility/display rules inline, but you might as well just 
put them in the stylesheet.


That said, I'm probably guilty of using inline styles for this sort of 
thing myself -- just not in email.


--
Kelson Vibber
SpeedGate Communications www.speed.net

dbg: bayes: tok_get_all: SQL error: Illegal mix of collations for operation ' IN '

2006-04-13 Thread Jeremy Fowler

Mysql:

SHOW VARIABLES LIKE 'character%'

Variable_name             Value
character_set_client      utf8
character_set_connection  utf8
character_set_database    latin1
character_set_results     utf8
character_set_server      utf8
character_set_system      utf8
character_sets_dir        /usr/share/mysql/charsets/

SHOW VARIABLES LIKE 'collation%'

Variable_name         Value
collation_connection  utf8_general_ci
collation_database    latin1_swedish_ci
collation_server      utf8_general_ci

SHOW CREATE TABLE bayes_token

Table: bayes_token
Create Table:
CREATE TABLE `bayes_token` (
  `id` int(11) NOT NULL default '0',
  `token` char(5) NOT NULL default '',
  `spam_count` int(11) NOT NULL default '0',
  `ham_count` int(11) NOT NULL default '0',
  `atime` int(11) NOT NULL default '0',
  PRIMARY KEY  (`id`,`token`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Can't get Bayes to work. Here is my lint output:

[23913] dbg: logger: adding facilities: all
[23913] dbg: logger: logging level is DBG
[23913] dbg: generic: SpamAssassin version 3.1.1
[23913] dbg: config: score set 0 chosen.
[23913] dbg: util: running in taint mode? no
[23913] dbg: dns: is Net::DNS::Resolver available? yes
[23913] dbg: dns: Net::DNS version: 0.53
[23913] dbg: diag: perl platform: 5.008007 linux
[23913] dbg: diag: module installed: MIME::Base64, version 3.05
[23913] dbg: diag: module installed: HTML::Parser, version 3.48
[23913] dbg: diag: module installed: Digest::SHA1, version 2.11
[23913] dbg: diag: module installed: DB_File, version 1.814
[23913] dbg: diag: module installed: Net::DNS, version 0.53
[23913] dbg: diag: module installed: Net::SMTP, version 2.29
[23913] dbg: diag: module installed: Mail::SPF::Query, version 1.998
[23913] dbg: diag: module installed: IP::Country::Fast, version 309.002
[23913] dbg: diag: module installed: Razor2::Client::Agent, version 2.80
[23913] dbg: diag: module installed: Net::Ident, version 1.20
[23913] dbg: diag: module installed: IO::Socket::INET6, version 2.51
[23913] dbg: diag: module installed: IO::Socket::SSL, version 0.97
[23913] dbg: diag: module installed: Time::HiRes, version 1.82
[23913] dbg: diag: module installed: DBI, version 1.50
[23913] dbg: diag: module installed: Getopt::Long, version 2.34
[23913] dbg: diag: module installed: LWP::UserAgent, version 2.033
[23913] dbg: diag: module installed: HTTP::Date, version 1.46
[23913] dbg: diag: module installed: Archive::Tar, version 1.28
[23913] dbg: diag: module installed: IO::Zlib, version 1.04
[23913] dbg: ignore: using a test message to lint rules
[23913] dbg: config: using /etc/mail/spamassassin for site rules pre files
[23913] dbg: config: read file /etc/mail/spamassassin/init.pre
[23913] dbg: config: read file /etc/mail/spamassassin/v310.pre
[23913] dbg: config: using /var/lib/spamassassin/3.001001 for sys rules pre files
[23913] dbg: config: using /var/lib/spamassassin/3.001001 for default rules dir
[23913] dbg: config: read file /var/lib/spamassassin/3.001001/updates_spamassassin_org.cf
[23913] dbg: config: using /etc/mail/spamassassin for site rules dir
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_adult.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_bayes_poison_nxm.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_evilnum0.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_evilnum1.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj0.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj1.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj2.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj3.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj_eng.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header0.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header1.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header2.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header3.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header_eng.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_highrisk.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html0.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html1.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html2.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html3.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html4.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html_eng.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_obfu.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_obfu0.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_obfu1.cf
[23913] dbg:

Re: xxxl spam

mouss wrote:
 I also understand that US guys may get less encoded subjects, but at least in 
 .fr, we have that all the time (because of our accented letters, and because 
 many companies still use software that predates mime). and if I find a 
 legitimate IP in a dnsbl used by SA, then I just remove that dnsbl. 

Sounds like we need more non-us based corpus contributors. After all, the SA
devs can only work with what they get.

Also, bear in mind that SpamAssassin's creator, Justin Mason, isn't based in the
US. Last I checked he was in Ireland. Unfortunately this doesn't help with the
encoding issue, as they still use ordinary English characters over there for
most things. (I don't think Gaelic is very common in email.)

So bear in mind that SA isn't just developed in the US by US citizens for US
markets.

However, it is true that the vast majority of the corpus currently comes from
folks who speak English (King's or Yankee) as a primary language, and that's a
bit of a problem as it creates considerable bias in the rules.

And even us US folks do have encoding issues. After all, English is not our
official language here in the US, and I've got plenty of users that speak
multiple languages, not all of which use plain-ascii.

[no subject]

2006-04-13 Thread Daniel Madaoui

I want to use SA for a lot of users who don't have home directories.  
Their mail is in /var/mail. Spam is delivered to the recipient's file  
/var/mail/user with the SA markup added.


The bayes and auto-whitelist databases will be shared by all users.

I use spamassassin  3.0.3 under freebsd 4.8

I use postfix and  SA through procmail.

postfix  main.cf:

mailbox_command = /usr/local/bin/procmail -t

I've got the config file for procmail in /usr/local/etc/procmailrc:

PATH=$HOME/bin:/usr/bin:/usr/ucb:/bin:/usr/local/bin:.
LOGFILE=/var/log/procmail.log

:0fw: $LOGNAME.lock
* < 256000
| /usr/local/bin/spamc

I launch spamd in this way:

/usr/local/bin/spamd -d -m10

and when I send a mail I get this log:

Apr 13 19:39:37 host spamd[48968]: spamd: setuid to root succeeded
Apr 13 19:39:37 host spamd[48968]: spamd: still running as root: user not specified with -u, not found, or set to root, falling back to nobody at /usr/local/bin/spamd line 1152, GEN5 line 4.
Apr 13 19:39:37 host spamd[48968]: spamd: processing message [EMAIL PROTECTED] for root:65534
Apr 13 19:39:37 host spamd[48968]: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/auto-whitelist.lock.example.com.48968 for /root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:39:37 host spamd[48968]: auto-whitelist: open of auto-whitelist file failed: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/auto-whitelist.lock.example.com.48968 for /root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:39:37 host spamd[48968]: bayes: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/bayes.lock.example.com.48968 for /root/.spamassassin/bayes.lock: Permission denied
Apr 13 19:39:37 host spamd[48968]: spamd: clean message (-1.4/5.0) for root:65534 in 0.3 seconds, 744 bytes.
Apr 13 19:39:37 host spamd[48968]: spamd: result: . -1 - ALL_TRUSTED scantime=0.3,size=744,user=root,uid=65534,required_score=5.0,rhost=localhost.example.com,raddr=127.0.0.1,rport=1645,mid=3822750E-3444-4F34-938F[EMAIL PROTECTED],autolearn=failed



The mail was in the mailbox but the bayes was not used.

So I restarted the spamd daemon with these options:

/usr/local/bin/spamd -d -m10  -u spamassassin (spamassassin is a  
user with its directory /home/spamassassin/.spamassassin)


It still tries to use the .spamassassin directory that belongs to root  
(/root/.spamassassin/):


Apr 13 19:50:53 host spamd[49552]: spamd: connection from localhost.example.com [127.0.0.1] at port 1982
Apr 13 19:50:53 host spamd[49552]: spamd: processing message [EMAIL PROTECTED] for root:3005
Apr 13 19:50:53 host spamd[49552]: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/auto-whitelist.lock.example.com.49552 for /root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:50:53 host spamd[49552]: auto-whitelist: open of auto-whitelist file failed: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/auto-whitelist.lock.example.com.49552 for /root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:50:53 host spamd[49552]: bayes: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/bayes.lock.example.com.49552 for /root/.spamassassin/bayes.lock: Permission denied
Apr 13 19:50:53 host spamd[49552]: spamd: clean message (-1.4/5.0) for root:3005 in 0.1 seconds, 736 bytes.
Apr 13 19:50:53 host spamd[49552]: spamd: result: . -1 - ALL_TRUSTED scantime=0.1,size=736,user=root,uid=3005,required_score=5.0,rhost=localhost.example.com,raddr=127.0.0.1,rport=1982,mid=C779CA6F-5CC6-4FD5-8547-[EMAIL PROTECTED],autolearn=failed


How can I configure spamd to use another directory for the bayes  
and auto-whitelist databases (in /home/spamassassin/.spamassassin)?  
It works if I change the permissions of /root/.spamassassin, but that's  
not optimal.


Thanks for your help.

spamd using a bayes and auto-whitelist commun to anybody

2006-04-13 Thread Daniel Madaoui


It's better with a subject :(

I want to use SA for a lot of users which don't have home directory.  
There mails are in /var/mail. The spammed mails are send to the  
recipient  in his file /var/mail/user with the addition  of SA.


The bayes and auto-whitelist database will be commun to anybody.

I use spamassassin  3.0.3 under freebsd 4.8

I use postfix and  SA through procmail.

postfix  main.cf:

mailbox_command = /usr/local/bin/procmail -t

I 've got the config file for procmail in /usr/local/etc/procmailrc

PATH=$HOME/bin:/usr/bin:/usr/ucb:/bin:/usr/local/bin:.
LOGFILE=/var/log/procmail.log

:0fw: $LOGNAME.lock
*   256000
| /usr/local/bin/spamc

I launch spamd in this way:

/usr/local/bin/spamd -d -m10

and when I send a mail  I 've got this log:

Apr 13 19:39:37 host spamd[48968]: spamd: setuid to root succeeded
Apr 13 19:39:37 host spamd[48968]: spamd: still running as root: user  
not specified with -u, not found, or set to root, falling back to  
nobody at /usr/local/bin/spamd line 1152, GEN5 line 4.
Apr 13 19:39:37 host spamd[48968]: spamd: processing message  
[EMAIL PROTECTED] for root:65534
Apr 13 19:39:37 host spamd[48968]: locker: safe_lock: cannot create  
tmp lockfile /root/.spamassassin/auto-whitelist.lock.example.com. 
48968 for /root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:39:37 host spamd[48968]: auto-whitelist: open of auto- 
whitelist file failed: locker: safe_lock: cannot create tmp lockfile / 
root/.spamassassin/auto-whitelist.lock.example.com.48968 for / 
root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:39:37 host spamd[48968]: bayes: locker: safe_lock: cannot  
create tmp lockfile /root/.spamassassin/bayes.lock.example.com.48968  
for /root/.spamassassin/bayes.lock: Permission denied
Apr 13 19:39:37 host spamd[48968]: spamd: clean message (-1.4/5.0)  
for root:65534 in 0.3 seconds, 744 bytes.
Apr 13 19:39:37 host spamd[48968]: spamd: result: . -1 - ALL_TRUSTED  
scantime=0.3,size=744,user=root,uid=65534,required_score=5.0,rhost=local 
host.example.com,raddr=127.0.0.1,rport=1645,mid=3822750E-3444-4F34-938F 
[EMAIL PROTECTED],autolearn=failed



The mail was in the mailbox but the bayes was not used.

So I restart the spamd daemon whith this options

/usr/local/bin/spamd -d -m10  -u spamassassin ( spamassassin in an  
user with its directory /home/spamassassin/.spamassassin )


He try to use the .spamassassin directory who belong to root (/ 
root/.spamssassin/ )


Apr 13 19:50:53 host spamd[49552]: spamd: connection from  
localhost.example.com [127.0.0.1] at port 1982
Apr 13 19:50:53 host spamd[49552]: spamd: processing message  
[EMAIL PROTECTED] for root:3005
Apr 13 19:50:53 host spamd[49552]: locker: safe_lock: cannot create  
tmp lockfile /root/.spamassassin/auto-whitelist.lock.example.com. 
49552 for /root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:50:53 host spamd[49552]: auto-whitelist: open of auto- 
whitelist file failed: locker: safe_lock: cannot create tmp lockfile / 
root/.spamassassin/auto-whitelist.lock.example.com.49552 for / 
root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:50:53 host spamd[49552]: bayes: locker: safe_lock: cannot  
create tmp lockfile /root/.spamassassin/bayes.lock.example.com.49552  
for /root/.spamassassin/bayes.lock: Permission denied
Apr 13 19:50:53 host spamd[49552]: spamd: clean message (-1.4/5.0)  
for root:3005 in 0.1 seconds, 736 bytes.
Apr 13 19:50:53 host spamd[49552]: spamd: result: . -1 - ALL_TRUSTED  
scantime=0.1,size=736,user=root,uid=3005,required_score=5.0,rhost=localh 
ost.example.com,raddr=127.0.0.1,rport=1982,mid=C779CA6F-5CC6-4FD5-8547- 
[EMAIL PROTECTED],autolearn=failed


How can I configure spamd to use another directory
(/home/spamassassin/.spamassassin) for the Bayes and auto-whitelist
databases? It works if I change the permissions of /root/.spamassassin,
but that's not ideal.


Thanks for your help.

Re: xxxl spam

Matt Kettler wrote:

mouss wrote:
I also understand that US guys may get less encoded subjects, but at least in .fr, we have that all the time (because of our accented letters, and because many companies still use software that predates mime). and if I find a legitimate IP in a dnsbl used by SA, then I just remove that dnsbl.

Sounds like we need more non-us based corpus contributors. After all, the SA
devs can only work with what they get.

Also, bear in mind that SpamAssassin's creator, Justin Mason, isn't based in the
US. Last I checked he was in Ireland. Unfortunately this doesn't help with the
encoding issue, as they still use ordinary English characters over there for
most things. (I don't think Gaelic is very common in email.)

So bear in mind that SA isn't just developed in the US by US citizens for US
markets.

oh, I never meant that.

However, it is true that the vast majority of the corpus currently comes from
folks who speak English (King's or Yankee) as a primary language, and that's a
bit of a problem as it creates considerable bias in the rules.

And even us US folks do have encoding issues. After all, English is not our
official language here in the US,

what do you mean here? what would be your official language?

and I've got plenty of users that speak multiple languages, not all of
which use plain-ascii.

I guess so. now I'm not sure our situation isn't worst because people
tried to find non standard solutions that are still used. I still
remember the days when some customers were asking us to fix our
software because it broke their accents... hopefully these times are
gone, but I still see broken mail (much more than I should). actually,
I also see mail that doesn't get rendered correctly on thunderbird. so
I'll admit that the issue isn't really about accented chars...

Re:

Daniel Madaoui wrote:
snip
 So I restart the spamd daemon whith this options
 
 /usr/local/bin/spamd -d -m10  -u spamassassin ( spamassassin in an user
 with its directory /home/spamassassin/.spamassassin )
 
 He try to use the .spamassassin directory who belong to root
 (/root/.spamssassin/ )

Known bug, fixed in SA 3.1.0 and higher.

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3900

Also be aware that unless your source has backported fixes, SA 3.0.3 is
vulnerable to two different DoS attacks, each triggered by sending it a
specially crafted message.

3.0.4, possibly older versions: many to: headers DoS vulnerability
http://secunia.com/advisories/17386/
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2005-3351

3.0.1-3.0.3: malformed message with long headers DoS
http://secunia.com/advisories/15704/
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2005-1266

Re: Question regarding meta's

On Thu, Apr 13, 2006 at 08:40:30PM +0200, Ruben Cardenal wrote:
   header __ID1 /regexp1/
   header __ID2 /regexp2/
   header __ID3 /regexp3/
   meta MYID ((__ID1 + __ID2 + __ID3) > 1)
 
   When a message triggers MYID, is there any way in the X-Spam-Report of
 showing which individual parts of the meta the message matched?

As far as I know, you can't do that without a plugin.  You could write
a small plugin such that _SUBTESTS_ or something would be rewritten to
the list of subtests (starts with __) that hit, and then include that
in the report.

-- 
Randomly Generated Tagline:
It's a question of consistency.  With a Republican president, I think
 you should just expect a certain amount of corruption -- And with a
 Democratic president, you should expect a [ bleep ] in the oval office.
 - Dave Foley on Politically Incorrect, 2001.12.07



Re: xxxl spam

mouss wrote:

 However, it is true that the vast majority of the corpus currently comes from
 folks who speak English (King's or Yankee) as a primary language, and that's a
 bit of a problem as it creates considerable bias in the rules.

 And even us US folks do have encoding issues. After all, English is not our
 official language here in the US,
 
 what do you mean here? what would be your official language?

The United States of America does not have any official language.

Americanized English is our common language, but it's not official. This means
that our government has to supply forms and materials in many languages for its
citizens, because it cannot require that citizens speak English.

For example, we have tax forms in French:

http://www.irs.gov/pub/irs-access/f2290fr_accessible.pdf

Admittedly, non-English forms and services are somewhat secondary here, but they
are present.

 
  and I've got plenty of users that speak
 multiple languages, not all of which use plain-ascii.

 
 I guess so. now I'm not sure our situation isn't worst because people
 tried to find non standard solutions that are still used. I still
 remember the days when some customers were asking us to fix our
 software because it broke their accents... hopefully these times are
 gone, but I still see broken mail (much more than I should). actually,
 I also see mail that doesn't get rendered correctly on thunderbird. so
 I'll admit that the issue isn't really about accented chars...
 

Well, yours is certainly worse, or at least more prevalent, than the problem
here in the US, but I would not say it's the worst.

Generally speaking the worst case seems to be present in smaller Asian nations,
which have really extensive use of non-US characters. At least the French can
restrict their text to the same character set as English and still be readable,
although awkward due to the screwed up accents.

Also, smaller Asian nations still to this day have a high prevalence of
locally-grown mail clients, many of which are not even remotely RFC compliant,
but work well with others in the same locale.

They're also much more likely to make use of mixed-language text containing many
character sets. Speaking 2 or 3 different languages is fairly common in the
smaller countries of the Asian region, just due to necessity for trade with
neighboring countries.

Another area with this same basic issue would be the Middle East, but the number
of completely different character sets is smaller.

Re: xxxl spam

2006-04-13 Thread John Rudd



On Apr 13, 2006, at 11:40 AM, mouss wrote:


Matt Kettler wrote:


And even us US folks do have encoding issues. After all, English is not our
official language here in the US,


what do you mean here? what would be your official language?



The US doesn't have an official language.

By default, it is assumed to be English for most things, but it's not 
official.  And, in some regions within the US, official govt signs 
and documents come in various languages.  The reason has to do with 
liability and legality: since there's no official language, you can't 
just pick _one_ language to publish your forms in and be done with it.  
If you do, you're neglecting significant minority populations (and in 
some regions those can be quite significant, such as Spanish speakers 
in southern Florida or southern California), which then makes you 
vulnerable to lawsuits saying that you're discriminating and/or being 
negligent toward those minorities, who aren't required to speak 
English because English isn't an official language.


In order to simplify this, some states have tried to enact official 
language legislation.  Florida tried it.  Someone put "Make English the 
official state language" on a ballot.  The Cuban-American population in 
southern Florida got mad, and put "Make Spanish the official state 
language" on the ballot.  Neither one passed, but the Spanish one got 
more votes.  This pretty much silenced the "English as state language" 
movement in Florida, as their plan almost backfired on them.


I don't remember any other state trying it since.  The states where 
there wouldn't be any opposition don't need to make it a law ... and in 
states like California where it could matter (reducing costs in govt 
overhead by eliminating multiple languages and the requirement for 
multilingual workers), the English as state language supporters are 
afraid of what almost happened in Florida.


So ... sorry for the long-winded explanation, but that's what he was 
saying.

Re: Question regarding meta's

Ruben Cardenal wrote:
 Hi,
 
   Let's say I have:
 
   header __ID1 /regexp1/
   header __ID2 /regexp2/
   header __ID3 /regexp3/
   meta MYID ((__ID1 + __ID2 + __ID3) > 1)
   score MYID 1
 
   When a message triggers MYID, is there any way in the X-Spam-Report of
 showing which individual parts of the meta the message matched?

No, but you can do something like this:


 header ID1 /regexp1/
 score ID1 0.0001
 header ID2 /regexp2/
 score ID2 0.0001
 header ID3 /regexp3/
 score ID3 0.0001

 meta MYID ((ID1 + ID2 + ID3) > 1)
 score MYID 1

This will force ID1-3 to be evaluated as normal rules and show up in the hit
list, but will give them an insignificant score. (You can't make the score 0;
that would disable them.)
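
As a rough illustration of the workaround above (plain Python, not SpamAssassin code), the sketch below models three subrules that each carry a tiny non-zero score, plus a meta that fires when more than one of them hits. The patterns, the evaluate() helper and the "> 1" threshold are assumptions made for the example.

```python
import re

# Hypothetical subrules: the names and patterns are placeholders, not real rules.
SUBRULES = {
    "ID1": re.compile(r"regexp1"),
    "ID2": re.compile(r"regexp2"),
    "ID3": re.compile(r"regexp3"),
}
TINY_SCORE = 0.0001  # non-zero, so each subrule stays enabled and is listed
META_SCORE = 1.0


def evaluate(header_text):
    """Return (total_score, names_of_rules_that_hit) for one header string."""
    hits = [name for name, rx in SUBRULES.items() if rx.search(header_text)]
    score = TINY_SCORE * len(hits)
    # Assumed meta condition: more than one subrule matched.
    if len(hits) > 1:
        hits.append("MYID")
        score += META_SCORE
    return round(score, 4), hits


if __name__ == "__main__":
    print(evaluate("regexp1 and regexp3 both present"))
```

Because the subrules appear in the hit list by name, the report shows which parts of the meta matched, while their combined score stays too small to change the verdict.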

Re: New bayes poison

2006-04-13 Thread Michael Monnerie

On Thursday, 13 April 2006 19:05 Justin Mason wrote:
  0.0 DK_POLICY_SIGNSOME     Domain Keys: policy says domain signs some mails
  0.0 DK_POLICY_TESTING      Domain Keys: policy says domain is testing DK
  0.0 DK_SIGNED              Domain Keys: message has a signature
 -0.0 DK_VERIFIED            Domain Keys: signature passes verification

Where to get these rules?

Regards, zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   lynx -source http://zmi.at/zmi3.asc | gpg --import
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE



Re: SpamAssassin BZ downtime


Justin Mason wrote:

http://ajax.apache.org/%7ejefft/ :

  Bugzilla is moving to a new host, and is temporarily down while the
  database synchs. Apologies for the inconvenience.

--j.


Yay, it doesn't seem excruciatingly slow anymore.

Re: New bayes poison

On Thu, Apr 13, 2006 at 11:45:07PM +0200, Michael Monnerie wrote:
   0.0 DK_POLICY_SIGNSOME     Domain Keys: policy says domain signs some mails
   0.0 DK_POLICY_TESTING      Domain Keys: policy says domain is testing DK
   0.0 DK_SIGNED              Domain Keys: message has a signature
  -0.0 DK_VERIFIED            Domain Keys: signature passes verification
 
 Where to get these rules?

They're standard in 3.1 if you have enabled the
Mail::SpamAssassin::Plugin::DomainKeys plugin.

-- 
Randomly Generated Tagline:
Note that I am a proponent of Zen in the Art of Systems Administration,
 and thus believe that it's appropriate to present yourself as a beginner
 in all things. This helps you keep a fresh perspective and spank the
 unsuspecting at snooker. - Benjy Feen



Re: Proper use of user_prefs whitelist


Forrest Aldrich wrote:
I've been having some difficulty with the user_prefs and the whitelist_* 
functions.   I read the examples etc, and I believe these are correct, 
but clearly certain email is still being tagged (see below).   I wonder 
if someone can help clarify what I'm doing wrong here.


First, here are the directives in my ~/.spamassassin/user_prefs file, as 
it applies to this instance:


   whitelist_from_rcvd spamassassin.apache.org hermes.apache.org

   whitelist_from  *.apache.org




Here is the Sendmail log, showing the rejection:

   Apr 13 11:52:24 mail sm-mta[34951]: k3DFqNBR034951:
   from=[EMAIL PROTECTED],


Your whitelist entries don't match 
[EMAIL PROTECTED].



This should work (note the *@):
whitelist_from_rcvd  [EMAIL PROTECTED]  hermes.apache.org


This would work, but would be trivially forged:
whitelist_from  [EMAIL PROTECTED]
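
To see why the "*@" matters, here is a small Python sketch of glob-style matching against a whole address. It assumes that only "*" and "?" act as wildcards and that the pattern has to cover the entire address. The addresses are made up, since the real ones are redacted in the archive, and glob_to_regex() and matches() are illustrative helpers rather than SpamAssassin functions.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob in which only '*' and '?' are special into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern, address):
    """True if the glob covers the whole address (assumed matching model)."""
    return glob_to_regex(pattern).fullmatch(address) is not None


if __name__ == "__main__":
    # Hypothetical sender address directly under apache.org.
    sender = "someone@apache.org"
    for pattern in ("spamassassin.apache.org", "*.apache.org", "*@apache.org"):
        print(pattern, matches(pattern, sender))
```

Under these assumptions a bare host name never matches an address, and "*.apache.org" only matches senders in a subdomain, because it needs a literal dot in front of "apache.org".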


Daryl

Re: Proper use of user_prefs whitelist

Daryl C. W. O'Shea wrote:
 
 Your whitelist entries don't match
 [EMAIL PROTECTED].
 
 
 This should work (note the *@):
 whitelist_from_rcvd  [EMAIL PROTECTED]  hermes.apache.org
 
 
 This would work, but would be trivially forged:
 whitelist_from  [EMAIL PROTECTED]
 

If you use the SPF plugin, another, very simple, way would be:

whitelist_from_spf [EMAIL PROTECTED]

Works great here.

I'd also suggest:

bayes_ignore_to users@spamassassin.apache.org
bayes_ignore_to spamassassin-users@incubator.apache.org
bayes_ignore_from [EMAIL PROTECTED]

To inhibit any bayes autolearning of list posts.

RE: bayes: tok_get_all: SQL error: Illegal mix of collations for operation ' IN '

2006-04-13 Thread Jeremy Fowler


Fixed the problem. Backed up the bayes tables with sa-learn --backup, and saved 
the userpref and awl tables with mysqldump. Then deleted the entire 
database, set everything to utf8 in my.cnf, and recreated the database and tables 
using utf8 as the default character set. Then restored the bayes data with 
sa-learn --restore and recreated the awl and userpref tables from the mysqldump 
files (after editing them to use utf8 as the default character set).

Just in case anyone else has this problem in the future...

Haven't seen this one before... Premature padding of base64 data

2006-04-13 Thread Philip Prindeville

This appeared in my logs.  Running 3.1.1 on Linux FC3 (x86_64) with
Sendmail 8.13.1 and Mimedefang 2.56:

Apr 13 16:57:05 mail sendmail[23371]: NOQUEUE: connect from
lists-outbound.sourceforge.net [66.35.250.225]
Apr 13 16:57:05 mail sendmail[23371]: k3DMv5s4023371: Milter
(mimdefang): init success to negotiate
Apr 13 16:57:05 mail sendmail[23371]: k3DMv5s4023371: Milter: connect to
filters
Apr 13 16:57:05 mail mimedefang.pl[22325]: helo:
lists-outbound.sourceforge.net
(66.35.250.225) said helo lists-outbound.sourceforge.net
Apr 13 16:57:05 mail sendmail[23371]: k3DMv5s4023371:
from=[EMAIL PROTECTED], size=15309, class=-60,
nrcpts=1, msgid=[EMAIL PROTECTED], proto=ESMTP, daemon=MTA-v4,
relay=lists-outbound.sourceforge.net [66.35.250.225]
Apr 13 16:57:06 mail mimedefang-multiplexor[11341]: Slave 8 stderr:
Premature padding of base64 data at
/usr/lib/perl5/vendor_perl/5.8.5/MIME/Decoder/Base64.pm
line 109.
Apr 13 16:57:07 mail mimedefang.pl[22325]: k3DMv5s4023371: hits=18.463,
req=5,
names=DATE_IN_PAST_96_XX,FORGED_MSGID_MSN,HTML_IMAGE_ONLY_12,HTML_MESSAGE,HTML_SHORT_LINK_IMG_1,L_ALSA_DEVEL,MIME_HTML_ONLY,MSGID_SHORT,SPF_PASS,URIBL_SBL,URIBL_WS_SURBL
Apr 13 16:57:07 mail mimedefang.pl[22325]:
MDLOG,k3DMv5s4023371,spam,18.463,66.35.250.225,[EMAIL PROTECTED],[EMAIL 
PROTECTED],[Alsa-devel]
Your mortagee approval
Apr 13 16:57:07 mail mimedefang.pl[22325]: filter: k3DMv5s4023371: 
bounce=1 discard=1
Apr 13 16:57:07 mail mimedefang[11357]: k3DMv5s4023371: Bouncing because
filter
instructed us to
Apr 13 16:57:07 mail sendmail[23371]: k3DMv5s4023371: Milter: data,
reject=554 5.7.1 Message rejected; scored too high on the Spam test.


Any ideas?  Didn't see any mention of it in previous postings...

Interesting msg-id.  Hmmm.  Already a rule for that.  Good...

-Philip
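
On the "Premature padding of base64 data" warning in the log above: one way a decoder can arrive at a complaint like this is a "=" pad character turning up where data characters are still required. The Python sketch below checks for that, assuming the warning refers to a "=" in the first or second position of a four-character group. That reading, and the premature_padding() helper, are assumptions for illustration and are not taken from the MIME::Decoder source.

```python
import string

B64_ALPHABET = set(string.ascii_letters + string.digits + "+/")


def premature_padding(text):
    """Return the index of the first '=' that falls in position 0 or 1 of a
    four-character base64 group, or None if there is no such character."""
    pos = 0  # count of base64 characters (data or padding) seen so far
    for i, ch in enumerate(text):
        if ch == "=":
            if pos % 4 < 2:
                return i
            pos += 1
        elif ch in B64_ALPHABET:
            pos += 1
        # Anything else (newlines, spaces) is skipped.
    return None


if __name__ == "__main__":
    print(premature_padding("QUJD\n=\n"))
```

A stray "=" on a line of its own after otherwise complete data, as spam generators sometimes produce, is the kind of input this check would flag.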

Re: Haven't seen this one before... Premature padding of base64 data