Re: Non-English languages

2006-04-13 Thread Philip Prindeville
Kenneth Porter wrote:

>Same here. I took a couple years of high school Spanish in California and
>the classes dragged so incredibly slowly that I learned just a little 
>vocabulary and the most basic of grammar, and still led the class. I 
>usually finished my physics homework in that class while waiting for 
>everyone to catch up.
>
>As a programmer I envy my professional peers who can speak Japanese and 
>other non-European languages. My interest in programming languages extends 
>to natural languages, and I find their differences fascinating.
>
>To those of you who've successfully learned 2nd and 3rd languages as an 
>adult, what do you recommend for accomplishing that?


Comic books. Or "bandes dessinées", as they're called in French.

The story lines are often simple, and the pictures give a lot of context
to what is being talked about.

-Philip



Re: Non-English languages (was: xxxl spam)

2006-04-13 Thread John Rudd


On Apr 13, 2006, at 9:46 PM, Kenneth Porter wrote:

> On Thursday, April 13, 2006 10:32 PM -0600 "Paul R. Ganci"
> <[EMAIL PROTECTED]> wrote:
>
>> Unfortunately I am still a linguistic idiot and only speak English ... a
>> Buffalo, NY version at that! My grandparents came over from Italy in
>> 1920 and promptly stopped speaking Italian around my parents. It forced
>> my parents to learn English at the cost of never learning Italian. There
>> is plenty of room to accommodate two languages but neither the US
>> education system nor home life is set up to do it.
>
> Same here. I took a couple years of high school Spanish in California
> and the classes dragged so incredibly slowly that I learned just a
> little vocabulary and the most basic of grammar, and still led the
> class. I usually finished my physics homework in that class while
> waiting for everyone to catch up.
>
> As a programmer I envy my professional peers who can speak Japanese
> and other non-European languages. My interest in programming languages
> extends to natural languages, and I find their differences
> fascinating.
>
> To those of you who've successfully learned 2nd and 3rd languages as
> an adult, what do you recommend for accomplishing that?




I wish I had stuck with German in HS.  And I wish I had taken the time 
to learn Latin and/or Greek back when I had all of that free time on my 
hands in HS.  These days, it seems like everyone* ought to know (in 
addition to English) Spanish, and then a choice of French, Chinese, or 
Japanese.


(* in the US, I don't mean globally; globally, I'd probably say that we 
should all know 3 out of those 5, but that's just me making 
wild-a*s-suggestions for a world that doesn't care about my opinion ;-) 
)


And, reiterating Kenneth's question: Anyone have advice for an almost 
middle-aged person who wants to go about expanding his natural language 
capabilities?


(Hmm.. that's probably a dumb question for me.. I think all of those 
are taught at the university where I work... and I can take free classes; 
could add Italian, Latin, and Greek too...; still, for everyone who 
doesn't work for a university but has a similar thought, it's a 
good question to ponder)




Re: Getting spamassassin not to bother checking outgoing mail

2006-04-13 Thread Daryl C. W. O'Shea

Rob Tanner wrote:

> Hi,
>
> I installed spamassassin on my server a week ago and along with a number 
> of Postfix settings, I'm nearly 100% spam free (I might get one spam a 
> day now).  But one thing I haven't figured out.  I would like not to 
> check mail originating in my address space.  Is that a spamassassin 
> setting or something I need to do in postfix?


Postfix.  Being a filter, SpamAssassin will scan anything passed to it.

Daryl


Getting spamassassin not to bother checking outgoing mail

2006-04-13 Thread Rob Tanner

Hi,

I installed spamassassin on my server a week ago and along with a number 
of Postfix settings, I'm nearly 100% spam free (I might get one spam a 
day now).  But one thing I haven't figured out.  I would like not to 
check mail originating in my address space.  Is that a spamassassin 
setting or something I need to do in postfix?


Thanks,
Rob

--

Rob Tanner
[EMAIL PROTECTED]
DRACO DORMIENS NUNQUAM TITILLANDUS



Non-English languages (was: xxxl spam)

2006-04-13 Thread Kenneth Porter
On Thursday, April 13, 2006 10:32 PM -0600 "Paul R. Ganci" 
<[EMAIL PROTECTED]> wrote:

> Unfortunately I am still a linguistic idiot and only speak English ... a
> Buffalo, NY version at that! My grandparents came over from Italy in
> 1920 and promptly stopped speaking Italian around my parents. It forced
> my parents to learn English at the cost of never learning Italian. There
> is plenty of room to accommodate two languages but neither the US
> education system nor home life is set up to do it.


Same here. I took a couple years of high school Spanish in California and 
the classes dragged so incredibly slowly that I learned just a little 
vocabulary and the most basic of grammar, and still led the class. I 
usually finished my physics homework in that class while waiting for 
everyone to catch up.


As a programmer I envy my professional peers who can speak Japanese and 
other non-European languages. My interest in programming languages extends 
to natural languages, and I find their differences fascinating.


To those of you who've successfully learned 2nd and 3rd languages as an 
adult, what do you recommend for accomplishing that?


Re: xxxl spam

2006-04-13 Thread Paul R. Ganci

Loren Wilton wrote:


> I predict that the US will be the first country in the 21st century to
> abandon English as the national language, while almost all other countries
> seem to be mandating that their citizens learn English.
>
>    Loren
 

The problem with the US is that we are linguistic idiots (a quote from a 
Columbia University German professor). If you go to Europe, people in 
general speak at least two languages fluently: English and the country's 
native language. I have had the opportunity to work in both Geneva, 
Switzerland and Milan, Italy. All business is conducted in English 
and everything else in Italian or, in the case of Switzerland, either 
German, Swiss German or French. Essentially all the engineers with whom 
I worked could speak two languages or in some cases four. I don't know 
what the big deal is. It shouldn't be "one" language but at least two 
here in the US. Start young when it is easy for kids to pick up the sounds.


Unfortunately I am still a linguistic idiot and only speak English ... a 
Buffalo, NY version at that! My grandparents came over from Italy in 
1920 and promptly stopped speaking Italian around my parents. It forced 
my parents to learn English at the cost of never learning Italian. There 
is plenty of room to accommodate two languages but neither the US 
education system nor home life is set up to do it.


--
Paul ([EMAIL PROTECTED])



Re: xxxl spam

2006-04-13 Thread Loren Wilton
> states like California where it could matter (reducing costs in govt
> overhead by eliminating multiple languages and the requirement for
> multilingual workers), the "English as state language" supporters are
> afraid of what almost happened in Florida.

Considering that at last census a "minority" of 54% of California residents
spoke Spanish as their primary or only language...


I predict that the US will be the first country in the 21st century to
abandon English as the national language, while almost all other countries
seem to be mandating that their citizens learn English.

Loren



Re: Haven't seen this one before... "Premature padding of base64 data"

2006-04-13 Thread Matt Kettler
Philip Prindeville wrote:
> Apr 13 16:57:06 mail mimedefang-multiplexor[11341]: Slave 8 stderr:
> Premature padding of base64 data at

> 
> 
> Any ideas?  Didn't see any mention of it in previous postings...
> 

Looks like someone screwed up their base64 encoding. Base64 encodes into
"quartets": every 3 8-bit bytes of input get encoded as 4 ASCII characters
carrying 6 bits of data each, so the output stays within the plain-text range.


At the end of the input, Base64 is normally padded out with = characters to
complete the last quartet if the input length is not a multiple of 3 bytes
(and so does not fill a complete quartet on its own).

Because it's a 3->4 encoding, even one byte of input generates two characters
of output, the first holding 6 of the 8 input bits, and the next holding the
remaining 2. In this case, the last two characters of the quartet get filled
with = as a pad.

Think of the base64 input as a series of three 8-bit bytes, like so:

12345678 12345678 12345678

That input gets re-split into 4 pieces of 6-bits each, like this:

123456 781234 567812 345678


But with a short input:

12345678

encodes as something like:

123456 780000 '='  '='

(the 0s are fill bits that complete the second 6-bit group)


The error message you see means that an = showed up in the first or second
position of the last quartet of encoded data. That can never happen unless the
data is invalid or corrupted.

Either some bytes were dropped, leaving base64 data whose length is not a
multiple of 4 characters and shifting a pad forward, or more than 2 pads exist
at the end.
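
To make the padding rules above concrete, here is a minimal Python sketch. It
uses the standard library's base64 module to show the 3->4 encoding, plus a
small helper, padding_is_valid, that is purely hypothetical (it is not what
MIME::Decoder::Base64 does internally) and only tests the rule that = may
appear in the last one or two positions of the final quartet.

```python
import base64


def encode(data: bytes) -> str:
    """Base64-encode bytes and return the result as ASCII text."""
    return base64.b64encode(data).decode("ascii")


def padding_is_valid(encoded: str) -> bool:
    """Hypothetical check: '=' may only fill the last one or two
    positions of the final quartet."""
    stripped = encoded.rstrip("=")
    pad_count = len(encoded) - len(stripped)
    return (
        len(encoded) % 4 == 0      # whole quartets only
        and pad_count <= 2         # at most two pads at the end
        and "=" not in stripped    # no pad anywhere earlier
    )


if __name__ == "__main__":
    # 3, 2 and 1 input bytes give 0, 1 and 2 pad characters.
    for raw in (b"abc", b"ab", b"a"):
        print(len(raw), "byte(s) ->", encode(raw))
    # Well-formed and malformed samples.
    for sample in ("YQ==", "YWI=", "Y===", "YW=j", "YQ="):
        print(sample, padding_is_valid(sample))
```

A pad in the second position ("Y==="), a pad in the middle ("YW=j"), or a
truncated quartet ("YQ=") all fail the check, which matches the kinds of
corruption described above.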









Haven't seen this one before... "Premature padding of base64 data"

2006-04-13 Thread Philip Prindeville
This appeared in my logs.  Running 3.1.1 on Linux FC3 (x86_64) with
Sendmail 8.13.1 and Mimedefang 2.56:

Apr 13 16:57:05 mail sendmail[23371]: NOQUEUE: connect from
lists-outbound.sourceforge.net [66.35.250.225]
Apr 13 16:57:05 mail sendmail[23371]: k3DMv5s4023371: Milter
(mimdefang): init success to negotiate
Apr 13 16:57:05 mail sendmail[23371]: k3DMv5s4023371: Milter: connect to
filters
Apr 13 16:57:05 mail mimedefang.pl[22325]: helo:
lists-outbound.sourceforge.net
(66.35.250.225) said "helo lists-outbound.sourceforge.net"
Apr 13 16:57:05 mail sendmail[23371]: k3DMv5s4023371:
from=<[EMAIL PROTECTED]>, size=15309, class=-60,
nrcpts=1, msgid=<[EMAIL PROTECTED]>, proto=ESMTP, daemon=MTA-v4,
relay=lists-outbound.sourceforge.net [66.35.250.225]
Apr 13 16:57:06 mail mimedefang-multiplexor[11341]: Slave 8 stderr:
Premature padding of base64 data at
/usr/lib/perl5/vendor_perl/5.8.5/MIME/Decoder/Base64.pm
line 109.
Apr 13 16:57:07 mail mimedefang.pl[22325]: k3DMv5s4023371: hits=18.463,
req=5,
names=DATE_IN_PAST_96_XX,FORGED_MSGID_MSN,HTML_IMAGE_ONLY_12,HTML_MESSAGE,HTML_SHORT_LINK_IMG_1,L_ALSA_DEVEL,MIME_HTML_ONLY,MSGID_SHORT,SPF_PASS,URIBL_SBL,URIBL_WS_SURBL
Apr 13 16:57:07 mail mimedefang.pl[22325]:
MDLOG,k3DMv5s4023371,spam,18.463,66.35.250.225,<[EMAIL PROTECTED]>,<[EMAIL 
PROTECTED]>,[Alsa-devel]
Your mortagee approval
Apr 13 16:57:07 mail mimedefang.pl[22325]: filter: k3DMv5s4023371: 
bounce=1 discard=1
Apr 13 16:57:07 mail mimedefang[11357]: k3DMv5s4023371: Bouncing because
filter
instructed us to
Apr 13 16:57:07 mail sendmail[23371]: k3DMv5s4023371: Milter: data,
reject=554 5.7.1 Message rejected; scored too high on the Spam test.


Any ideas?  Didn't see any mention of it in previous postings...

Interesting msg-id.  Hmmm.  Already a rule for that.  Good...

-Philip





RE: bayes: tok_get_all: SQL error: Illegal mix of collations for operation ' IN '

2006-04-13 Thread Jeremy Fowler

Fixed the problem. Backed up the bayes tables with sa-learn --backup, and saved 
the userpref and awl tables with mysqldump. Then dropped the entire 
database, set everything to utf8 in my.cnf, and recreated the database and tables 
using utf8 as the default character set. Then restored the bayes data with 
sa-learn --restore and recreated the awl and userpref tables from the mysqldump 
files (after editing them to use utf8 as the default character set).

Just in case anyone else has this problem in the future...



Re: Proper use of user_prefs "whitelist"

2006-04-13 Thread Matt Kettler
Daryl C. W. O'Shea wrote:
> 
> Your whitelist entries don't match
> "[EMAIL PROTECTED]".
> 
> 
> This should work (note the *@):
> whitelist_from_rcvd  [EMAIL PROTECTED]  hermes.apache.org
> 
> 
> This would work, but would be trivially forged:
> whitelist_from  [EMAIL PROTECTED]
> 

If you use the SPF plugin, another, very simple, way would be:

whitelist_from_spf [EMAIL PROTECTED]

Works great here.

I'd also suggest:

bayes_ignore_to users@spamassassin.apache.org
bayes_ignore_to spamassassin-users@incubator.apache.org
bayes_ignore_from [EMAIL PROTECTED]

To inhibit any bayes autolearning of list posts.




Re: Proper use of user_prefs "whitelist"

2006-04-13 Thread Daryl C. W. O'Shea

Forrest Aldrich wrote:
> I've been having some difficulty with the user_prefs and the whitelist_* 
> functions.   I read the examples etc, and I believe these are correct, 
> but clearly certain email is still being tagged (see below).   I wonder 
> if someone can help clarify what I'm doing wrong here.
>
> First, here are the directives in my ~/.spamassassin/user_prefs file, as 
> it applies to this instance:
>
>    whitelist_from_rcvd spamassassin.apache.org hermes.apache.org
>
>    whitelist_from  *.apache.org
>
> Here is the Sendmail log, showing the rejection:
>
>    Apr 13 11:52:24 mail sm-mta[34951]: k3DFqNBR034951:
>    from=<[EMAIL PROTECTED]>,


Your whitelist entries don't match 
"[EMAIL PROTECTED]".



This should work (note the *@):
whitelist_from_rcvd  [EMAIL PROTECTED]  hermes.apache.org


This would work, but would be trivially forged:
whitelist_from  [EMAIL PROTECTED]


Daryl



Re: New bayes poison

2006-04-13 Thread Theo Van Dinter
On Thu, Apr 13, 2006 at 11:45:07PM +0200, Michael Monnerie wrote:
> >  0.0 DK_POLICY_SIGNSOME     Domain Keys: policy says domain signs some mails
> >  0.0 DK_POLICY_TESTING      Domain Keys: policy says domain is testing DK
> >  0.0 DK_SIGNED              Domain Keys: message has a signature
> > -0.0 DK_VERIFIED            Domain Keys: signature passes
> 
> Where to get these rules?

They're standard in 3.1 if you have enabled the
Mail::SpamAssassin::Plugin::DomainKeys plugin.

-- 
Randomly Generated Tagline:
"Note that I am a proponent of Zen in the Art of Systems Administration,
 and thus believe that it's appropriate to present yourself as a beginner
 in all things. This helps you keep a fresh perspective and spank the
 unsuspecting at snooker." - Benjy Feen




Re: SpamAssassin BZ downtime

2006-04-13 Thread Daryl C. W. O'Shea

Justin Mason wrote:

http://ajax.apache.org/%7ejefft/ :

  Bugzilla is moving to a new host, and is temporarily down while the
  database synchs. Apologies for the inconvenience.

--j.


Yay, it doesn't seem excruciatingly slow anymore.



Re: New bayes poison

2006-04-13 Thread Michael Monnerie
On Thursday, 13 April 2006 19:05 Justin Mason wrote:
>  0.0 DK_POLICY_SIGNSOME     Domain Keys: policy says domain signs some mails
>  0.0 DK_POLICY_TESTING      Domain Keys: policy says domain is testing DK
>  0.0 DK_SIGNED              Domain Keys: message has a signature
> -0.0 DK_VERIFIED            Domain Keys: signature passes verification

Where to get these rules?

Regards, zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   "lynx -source http://zmi.at/zmi3.asc | gpg --import"
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE




Re: Question regarding meta's

2006-04-13 Thread Matt Kettler
Ruben Cardenal wrote:
> Hi,
> 
>   Let's say I have:
> 
>   header __ID1 /regexp1/
>   header __ID2 /regexp2/
>   header __ID3 /regexp3/
>   meta MYID ((__ID1 + __ID2 + __ID3) > 1)
>   score MYID 1
> 
>   When a message triggers MYID, is there any way in the X-Spam-Report of
> showing which individual parts of the meta the message matched?

No, but you can do something like this:


 header ID1 /regexp1/
 score ID1 0.0001
 header ID2 /regexp2/
 score ID2 0.0001
 header ID3 /regexp3/
 score ID3 0.0001

 meta MYID ((ID1 + ID2 + ID3) > 1)
 score MYID 1

This will force ID1-3 to be evaluated as normal rules and show up in the hit
list, but will give them an insignificant score. (You can't make the score 0;
that would disable them.)
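
As a rough illustration of what the meta expression computes, here is a small
Python sketch. It is not SpamAssassin code; SUBRULES and evaluate are
hypothetical names, and the patterns are the same placeholder regexps as in the
rules above. It counts how many sub-rules hit and fires only when more than one
does, while also reporting which ones matched.

```python
import re

# Hypothetical stand-ins for the ID1..ID3 header rules above.
SUBRULES = {
    "ID1": re.compile(r"regexp1"),
    "ID2": re.compile(r"regexp2"),
    "ID3": re.compile(r"regexp3"),
}


def evaluate(header: str):
    """Return (names of sub-rules that hit, whether MYID fires)."""
    hits = [name for name, pattern in SUBRULES.items() if pattern.search(header)]
    # Mirrors the meta expression (ID1 + ID2 + ID3) > 1: at least two hits.
    return hits, len(hits) > 1


if __name__ == "__main__":
    print(evaluate("regexp1 and regexp3"))
    print(evaluate("only regexp2"))
```

The point of the sketch is only that "> 1" means at least two of the three
sub-rules must match, and that the list of individual hits is the information
the original question was asking to see in the report.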


Re: xxxl spam

2006-04-13 Thread John Rudd


On Apr 13, 2006, at 11:40 AM, mouss wrote:

> Matt Kettler wrote:
>
>> And even us US folks do have encoding issues. After all, English is
>> not our official language here in the US,
>
> what do you mean here? what would be your official language?



The US doesn't have an official language.

By default, it is assumed to be English for most things, but it's not 
"Official".  And, in some regions within the US, official govt signs 
and documents come in various languages.  The reasons for this have to 
do with liability and legality: since there's no official language, you 
can't just pick _one_ language to publish your forms in and be done 
with it.  If you do, you're neglecting significant minority populations 
(and in some regions those can be quite significant, such as Spanish 
speakers in southern Florida or southern California).  That makes you 
vulnerable to lawsuits saying that you're discriminating against and/or 
being negligent toward those minorities, who aren't required to speak 
English because English isn't an official language.


In order to simplify this, some states have tried to enact official 
language legislation.  Florida tried it.  Someone put "Make English the 
official state language" on a ballot.  The Cuban-American population in 
southern Florida got mad, and put "Make Spanish the official state 
language" on the ballot.  Neither one passed, but the Spanish one got 
more votes.  This pretty much silenced the "English as state language" 
movement in Florida, as their plan almost backfired on them.


I don't remember any other state trying it since.  The states where 
there wouldn't be any opposition don't need to make it a law ... and in 
states like California where it could matter (reducing costs in govt 
overhead by eliminating multiple languages and the requirement for 
multilingual workers), the "English as state language" supporters are 
afraid of what almost happened in Florida.


So ... sorry for the long-winded explanation, but that's what he was 
saying.




Re: xxxl spam

2006-04-13 Thread Matt Kettler
mouss wrote:

>> However, it is true that the vast majority of the corpus currently comes
>> from folks who speak English (King's or Yankee) as a primary language, and
>> that's a bit of a problem as it creates considerable bias in the rules.
>>
>> And even us US folks do have encoding issues. After all, English is not our
>> official language here in the US,
> 
> what do you mean here? what would be your official language?

The United States of America does not have any official language.

Americanized English is our common language, but it's not official. This means
that our government has to supply forms and materials in many languages for its
citizens, because it cannot require that citizens speak English.

For example, we have tax forms in French:

http://www.irs.gov/pub/irs-access/f2290fr_accessible.pdf

Admittedly, non-English forms and services are somewhat secondary here, but they
are present.

> 
>  and I've got plenty of users that speak
>> multiple languages, not all of which use plain-ascii.
>>
> 
> I guess so. now I'm not sure our situation isn't worst because people
> tried to find non standard solutions that are still used. I still
> remember the days when some customers were asking us to "fix" our
> software because "it broke their accents"... hopefully these times are
> gone, but I still see "broken" mail (much more than I should). actually,
> I also see mail that doesn't get rendered correctly on thunderbird. so
> I'll admit that the issue isn't really about accented chars...
> 

Well, yours is certainly worse, or at least more prevalent, than the problem
here in the US, but I would not say it's the worst.

Generally speaking, the worst case seems to be in smaller Asian nations,
which make really extensive use of non-US characters. At least the French can
restrict their text to the same character set as English and still be readable,
although awkward due to the screwed up accents.

Also, smaller Asian nations still to this day have a high prevalence of
locally-grown mail clients, many of which are not even remotely RFC compliant,
but work well with others in the same locale.

They're also much more likely to make use of mixed-language text containing many
character sets. Speaking 2 or 3 different languages is fairly common in the
smaller countries of the Asian region, just due to necessity for trade with
neighboring countries.

Another area with this same basic issue would be the Middle East, but the number
of completely different character sets is smaller.






Re: Question regarding meta's

2006-04-13 Thread Theo Van Dinter
On Thu, Apr 13, 2006 at 08:40:30PM +0200, Ruben Cardenal wrote:
>   header __ID1 /regexp1/
>   header __ID2 /regexp2/
>   header __ID3 /regexp3/
>   meta MYID ((__ID1 + __ID2 + __ID3) > 1)
> 
>   When a message triggers MYID, is there any way in the X-Spam-Report of
> showing which individual parts of the meta the message matched?

As far as I know, you can't do that without a plugin.  You could write
a small plugin such that _SUBTESTS_ or something would be rewritten to
the list of subtests (starts with "__") that hit, and then include that
in the report.

-- 
Randomly Generated Tagline:
"It's a question of consistency.  With a Republican president, I think
 you should just expect a certain amount of corruption -- And with a
 Democratic president, you should expect a [ bleep ] in the oval office."
 - Dave Foley on Politically Incorrect, 2001.12.07




Re:

2006-04-13 Thread Matt Kettler
Daniel Madaoui wrote:

> So I restart the spamd daemon with these options
> 
> /usr/local/bin/spamd -d -m10  -u spamassassin ( spamassassin is a user
> with its directory /home/spamassassin/.spamassassin )
> 
> It tries to use the .spamassassin directory that belongs to root
> (/root/.spamassassin/ )

Known bug, fixed in SA 3.1.0 and higher.

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3900

Also be aware that unless your source has backported fixes, SA 3.0.3 is
vulnerable to two different DoS attacks, each triggered by sending it a
specially crafted message.

3.0.4, possibly older versions: "many to: headers" DoS vulnerability
http://secunia.com/advisories/17386/
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2005-3351

3.0.1-3.0.3: malformed message with long headers DoS
http://secunia.com/advisories/15704/
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2005-1266



Question regarding meta's

2006-04-13 Thread Ruben Cardenal
Hi,

  Let's say I have:

  header __ID1 /regexp1/
  header __ID2 /regexp2/
  header __ID3 /regexp3/
  meta MYID ((__ID1 + __ID2 + __ID3) > 1)
  score MYID 1

  When a message triggers MYID, is there any way in the X-Spam-Report of
showing which individual parts of the meta the message matched?

Ruben




Re: xxxl spam

2006-04-13 Thread mouss

Matt Kettler wrote:

> mouss wrote:
>> I also understand that US guys may get less encoded subjects, but at least
>> in .fr, we have that all the time (because of our accented letters, and
>> because many companies still use software that predates mime). and if I
>> find a legitimate IP in a dnsbl used by SA, then I just remove that dnsbl.
>
> Sounds like we need more non-us based corpus contributors. After all, the SA
> devs can only work with what they get.
>
> Also, bear in mind that SpamAssassin's creator, Justin Mason, isn't based in the
> US. Last I checked he was in Ireland. Unfortunately this doesn't help with the
> encoding issue, as they still use ordinary English characters over there for
> most things. (I don't think Gaelic is very common in email.)
>
> So bear in mind that SA isn't just "developed in the US by US citizens for US
> markets".

oh, I never meant that.

> However, it is true that the vast majority of the corpus currently comes from
> folks who speak English (King's or Yankee) as a primary language, and that's a
> bit of a problem as it creates considerable bias in the rules.
>
> And even us US folks do have encoding issues. After all, English is not our
> official language here in the US,

what do you mean here? what would be your official language?

> and I've got plenty of users that speak
> multiple languages, not all of which use plain-ascii.

I guess so. now I'm not sure our situation isn't worst because people 
tried to find non standard solutions that are still used. I still 
remember the days when some customers were asking us to "fix" our 
software because "it broke their accents"... hopefully these times are 
gone, but I still see "broken" mail (much more than I should). actually, 
I also see mail that doesn't get rendered correctly on thunderbird. so 
I'll admit that the issue isn't really about accented chars...




spamd using a bayes and auto-whitelist common to everybody

2006-04-13 Thread Daniel Madaoui

It's better with a subject :(

I want to use SA for a lot of users who don't have home directories.  
Their mail is in /var/mail. Spam is still delivered to the  
recipient's /var/mail/user file, with the SA markup added.


The bayes and auto-whitelist databases will be shared by all users.

I use SpamAssassin 3.0.3 under FreeBSD 4.8.

I use Postfix, and SA through procmail.

postfix  main.cf:

mailbox_command = /usr/local/bin/procmail -t

I have the config file for procmail in /usr/local/etc/procmailrc:

PATH=$HOME/bin:/usr/bin:/usr/ucb:/bin:/usr/local/bin:.
LOGFILE=/var/log/procmail.log

:0fw: $LOGNAME.lock
*  < 256000
| /usr/local/bin/spamc

I launch spamd in this way:

/usr/local/bin/spamd -d -m10

and when I send a mail I get this log:

Apr 13 19:39:37 host spamd[48968]: spamd: setuid to root succeeded
Apr 13 19:39:37 host spamd[48968]: spamd: still running as root: user not specified with -u, not found, or set to root, falling back to nobody at /usr/local/bin/spamd line 1152,  line 4.
Apr 13 19:39:37 host spamd[48968]: spamd: processing message <[EMAIL PROTECTED]> for root:65534
Apr 13 19:39:37 host spamd[48968]: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/auto-whitelist.lock.example.com.48968 for /root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:39:37 host spamd[48968]: auto-whitelist: open of auto-whitelist file failed: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/auto-whitelist.lock.example.com.48968 for /root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:39:37 host spamd[48968]: bayes: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/bayes.lock.example.com.48968 for /root/.spamassassin/bayes.lock: Permission denied
Apr 13 19:39:37 host spamd[48968]: spamd: clean message (-1.4/5.0) for root:65534 in 0.3 seconds, 744 bytes.
Apr 13 19:39:37 host spamd[48968]: spamd: result: . -1 - ALL_TRUSTED scantime=0.3,size=744,user=root,uid=65534,required_score=5.0,rhost=localhost.example.com,raddr=127.0.0.1,rport=1645,mid=<3822750E-3444-4F34-938F[EMAIL PROTECTED]>,autolearn=failed



The mail was delivered to the mailbox, but bayes was not used.

So I restarted the spamd daemon with these options:

/usr/local/bin/spamd -d -m10  -u spamassassin

(spamassassin is a user whose directory is /home/spamassassin/.spamassassin)


It still tries to use the .spamassassin directory that belongs to root
(/root/.spamassassin/):


Apr 13 19:50:53 host spamd[49552]: spamd: connection from localhost.example.com [127.0.0.1] at port 1982
Apr 13 19:50:53 host spamd[49552]: spamd: processing message <[EMAIL PROTECTED]> for root:3005
Apr 13 19:50:53 host spamd[49552]: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/auto-whitelist.lock.example.com.49552 for /root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:50:53 host spamd[49552]: auto-whitelist: open of auto-whitelist file failed: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/auto-whitelist.lock.example.com.49552 for /root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:50:53 host spamd[49552]: bayes: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/bayes.lock.example.com.49552 for /root/.spamassassin/bayes.lock: Permission denied
Apr 13 19:50:53 host spamd[49552]: spamd: clean message (-1.4/5.0) for root:3005 in 0.1 seconds, 736 bytes.
Apr 13 19:50:53 host spamd[49552]: spamd: result: . -1 - ALL_TRUSTED scantime=0.1,size=736,user=root,uid=3005,required_score=5.0,rhost=localhost.example.com,raddr=127.0.0.1,rport=1982,mid=[EMAIL PROTECTED]>,autolearn=failed


How can I configure spamd to use another directory for the bayes  
and auto-whitelist databases (in /home/spamassassin/.spamassassin)?  
It works if I change the permissions of /root/.spamassassin, but that's  
not optimal.


Thanks for your help.


[no subject]

2006-04-13 Thread Daniel Madaoui
I want to use SA for a lot of users which don't have home directory.  
There mails are in /var/mail. The spammed mails are send to the  
recipient  in his file /var/mail/user with the addition  of SA.


The bayes and auto-whitelist database will be comun to anybody.

I use spamassassin  3.0.3 under freebsd 4.8

I use postfix and  SA through procmail.

postfix  main.cf:

mailbox_command = /usr/local/bin/procmail -t

I 've got the config file for procmail in /usr/local/etc/procmailrc

PATH=$HOME/bin:/usr/bin:/usr/ucb:/bin:/usr/local/bin:.
LOGFILE=/var/log/procmail.log

:0fw: $LOGNAME.lock
*  < 256000
| /usr/local/bin/spamc

I launch spamd in this way:

/usr/local/bin/spamd -d -m10

and when I send a mail  I 've got this log:

Apr 13 19:39:37 host spamd[48968]: spamd: setuid to root succeeded
Apr 13 19:39:37 host spamd[48968]: spamd: still running as root: user  
not specified with -u, not found, or set to root, falling back to  
nobody at /usr/local/bin/spamd line 1152,  line 4.
Apr 13 19:39:37 host spamd[48968]: spamd: processing message  
<[EMAIL PROTECTED]> for root:65534
Apr 13 19:39:37 host spamd[48968]: locker: safe_lock: cannot create  
tmp lockfile /root/.spamassassin/auto-whitelist.lock.example.com. 
48968 for /root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:39:37 host spamd[48968]: auto-whitelist: open of auto- 
whitelist file failed: locker: safe_lock: cannot create tmp lockfile / 
root/.spamassassin/auto-whitelist.lock.example.com.48968 for / 
root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:39:37 host spamd[48968]: bayes: locker: safe_lock: cannot  
create tmp lockfile /root/.spamassassin/bayes.lock.example.com.48968  
for /root/.spamassassin/bayes.lock: Permission denied
Apr 13 19:39:37 host spamd[48968]: spamd: clean message (-1.4/5.0)  
for root:65534 in 0.3 seconds, 744 bytes.
Apr 13 19:39:37 host spamd[48968]: spamd: result: . -1 - ALL_TRUSTED  
scantime=0.3,size=744,user=root,uid=65534,required_score=5.0,rhost=local 
host.example.com,raddr=127.0.0.1,rport=1645,mid=<3822750E-3444-4F34-938F 
[EMAIL PROTECTED]>,autolearn=failed



The mail was in the mailbox but the bayes was not used.

So I restart the spamd daemon whith this options

/usr/local/bin/spamd -d -m10  -u spamassassin ( spamassassin in an  
user with its directory /home/spamassassin/.spamassassin )


He try to use the .spamassassin directory who belong to root (/ 
root/.spamssassin/ )


Apr 13 19:50:53 host spamd[49552]: spamd: connection from localhost.example.com [127.0.0.1] at port 1982
Apr 13 19:50:53 host spamd[49552]: spamd: processing message <[EMAIL PROTECTED]> for root:3005
Apr 13 19:50:53 host spamd[49552]: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/auto-whitelist.lock.example.com.49552 for /root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:50:53 host spamd[49552]: auto-whitelist: open of auto-whitelist file failed: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/auto-whitelist.lock.example.com.49552 for /root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:50:53 host spamd[49552]: bayes: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/bayes.lock.example.com.49552 for /root/.spamassassin/bayes.lock: Permission denied
Apr 13 19:50:53 host spamd[49552]: spamd: clean message (-1.4/5.0) for root:3005 in 0.1 seconds, 736 bytes.
Apr 13 19:50:53 host spamd[49552]: spamd: result: . -1 - ALL_TRUSTED scantime=0.1,size=736,user=root,uid=3005,required_score=5.0,rhost=localhost.example.com,raddr=127.0.0.1,rport=1982,mid=[EMAIL PROTECTED]>,autolearn=failed


How can I configure spamd to use another directory for the Bayes and 
auto-whitelist databases (in /home/spamassassin/.spamassassin)? It 
works if I change the permissions of /root/.spamassassin, but that's 
not optimal.


Thanks for your help.


Re: xxxl spam

2006-04-13 Thread Matt Kettler
mouss wrote:
> I also understand that US guys may get less encoded subjects, but at least in 
> .fr, we have that all the time (because of our accented letters, and because 
> many companies still use software that predates mime). and if I find a 
> legitimate IP in a dnsbl used by SA, then I just remove that dnsbl. 

Sounds like we need more non-US-based corpus contributors. After all, the SA
devs can only work with what they get.

Also, bear in mind that SpamAssassin's creator, Justin Mason, isn't based in the
US. Last I checked he was in Ireland. Unfortunately this doesn't help with the
encoding issue, as they still use ordinary English characters over there for
most things. (I don't think Gaelic is very common in email.)

So bear in mind that SA isn't just "developed in the US by US citizens for US
markets".

However, it is true that the vast majority of the corpus currently comes from
folks who speak English (King's or Yankee) as a primary language, and that's a
bit of a problem as it creates considerable bias in the rules.

And even us US folks do have encoding issues. After all, English is not our
official language here in the US, and I've got plenty of users that speak
multiple languages, not all of which use plain-ascii.




Re: xxxl spam

2006-04-13 Thread mouss

John Rudd wrote:


> I wouldn't do that.


Please note that I said it "the short way". I of course don't jump to 
disable rules. I do check whether the message should have been flagged 
as spam (a "reasonable" FP). If so, that's life. If possible, I see if I 
can create a rule to make it get hammed without breaking the whole 
filter. If, however, the tests that made it classify as spam are not 
clear to me, then I check if I can lower some. But some tests just get 
disabled.






> Just because legitimate mail triggers some rule doesn't mean that the 
> rule is flawed.  Using your example, triggering "no_real_name" does not 
> mean that the message is spam, it means that the message has _some_ 
> similarity to at least some spam messages (the higher the score, the 
> stronger the similarity).  And, that's absolutely true: statistically, 
> when looking at the corpus which was used to create the rules database, 
> a higher percentage of "no_real_name" messages were spam.


As I already said in another thread, the statistical results depend on 
the attributes you are checking. The perceptron will not wake up and say 
"hey, come on, this attribute is not good". So, if you run a mass check 
with rules like:

- IP parity
- first letter of sender
- mailer: "the bat" for instance
- relay = comcast, free.fr, ...
...

then the perceptron will give you what you asked for: scores.

I also understand that US guys may get fewer encoded subjects, but at 
least in .fr, we have that all the time (because of our accented 
letters, and because many companies still use software that predates 
MIME). And if I find a legitimate IP in a DNSBL used by SA, then I just 
remove that DNSBL.




> Now, if legit messages were not just triggering those rules, but also 
> triggering enough rules to be flagged as spam ... then I would lower the 
> value of those rules, but not disable those rules. 


I disable the rules, and if I get false negatives, I see what I can do. 
So far, (the very few) missed spam would have been missed anyway.


> But I would only do
> that if I could see that there was a large percentage of should-be-ham 
> messages being flagged as spam by that rule AND that rule wasn't being 
> useful in flagging spam messages.  The reason is: if the message is 
> being flagged, but it shouldn't have been, then perhaps my "corpus" of 
> messages differs significantly enough from the SA internal corpus that 
> my score values need to be different.  But that doesn't mean that the 
> rules are so disjoint from tracking spam that they should be entirely 
> disabled.  They just don't have the same weighting that my corpus needs.
>
> If, instead, most messages passing through my mail servers, that 
> triggered that rule, really did seem to be spam, then I wouldn't alter 
> the score at all.  I would just pass the should-have-been-ham message 
> into my bayesian learner and hope that a low bayes score for messages 
> like that would offset the rules that had flagged it as spam.




Everybody has their own situation. I am very FP sensitive. I prefer to get 
spam than to lose an important mail. After all, I do review my spam. So 
the fewer FPs there are, the faster I can review my junk folder.


dbg: bayes: tok_get_all: SQL error: Illegal mix of collations for operation ' IN '

2006-04-13 Thread Jeremy Fowler
Mysql:

SHOW VARIABLES LIKE "character%"

Variable_name             Value
character_set_client      utf8
character_set_connection  utf8
character_set_database    latin1
character_set_results     utf8
character_set_server      utf8
character_set_system      utf8
character_sets_dir        /usr/share/mysql/charsets/

SHOW VARIABLES LIKE "collation%"

Variable_name         Value
collation_connection  utf8_general_ci
collation_database    latin1_swedish_ci
collation_server      utf8_general_ci

SHOW CREATE TABLE bayes_token

Table         Create Table
bayes_token   (statement follows)

CREATE TABLE `bayes_token` (
  `id` int(11) NOT NULL default '0',
  `token` char(5) NOT NULL default '',
  `spam_count` int(11) NOT NULL default '0',
  `ham_count` int(11) NOT NULL default '0',
  `atime` int(11) NOT NULL default '0',
  PRIMARY KEY  (`id`,`token`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Can't get Bayes to work. Here is my lint output:

[23913] dbg: logger: adding facilities: all
[23913] dbg: logger: logging level is DBG
[23913] dbg: generic: SpamAssassin version 3.1.1
[23913] dbg: config: score set 0 chosen.
[23913] dbg: util: running in taint mode? no
[23913] dbg: dns: is Net::DNS::Resolver available? yes
[23913] dbg: dns: Net::DNS version: 0.53
[23913] dbg: diag: perl platform: 5.008007 linux
[23913] dbg: diag: module installed: MIME::Base64, version 3.05
[23913] dbg: diag: module installed: HTML::Parser, version 3.48
[23913] dbg: diag: module installed: Digest::SHA1, version 2.11
[23913] dbg: diag: module installed: DB_File, version 1.814
[23913] dbg: diag: module installed: Net::DNS, version 0.53
[23913] dbg: diag: module installed: Net::SMTP, version 2.29
[23913] dbg: diag: module installed: Mail::SPF::Query, version 1.998
[23913] dbg: diag: module installed: IP::Country::Fast, version 309.002
[23913] dbg: diag: module installed: Razor2::Client::Agent, version 2.80
[23913] dbg: diag: module installed: Net::Ident, version 1.20
[23913] dbg: diag: module installed: IO::Socket::INET6, version 2.51
[23913] dbg: diag: module installed: IO::Socket::SSL, version 0.97
[23913] dbg: diag: module installed: Time::HiRes, version 1.82
[23913] dbg: diag: module installed: DBI, version 1.50
[23913] dbg: diag: module installed: Getopt::Long, version 2.34
[23913] dbg: diag: module installed: LWP::UserAgent, version 2.033
[23913] dbg: diag: module installed: HTTP::Date, version 1.46
[23913] dbg: diag: module installed: Archive::Tar, version 1.28
[23913] dbg: diag: module installed: IO::Zlib, version 1.04
[23913] dbg: ignore: using a test message to lint rules
[23913] dbg: config: using "/etc/mail/spamassassin" for site rules pre files
[23913] dbg: config: read file /etc/mail/spamassassin/init.pre
[23913] dbg: config: read file /etc/mail/spamassassin/v310.pre
[23913] dbg: config: using "/var/lib/spamassassin/3.001001" for sys rules pre files
[23913] dbg: config: using "/var/lib/spamassassin/3.001001" for default rules dir
[23913] dbg: config: read file /var/lib/spamassassin/3.001001/updates_spamassassin_org.cf
[23913] dbg: config: using "/etc/mail/spamassassin" for site rules dir
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_adult.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_bayes_poison_nxm.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_evilnum0.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_evilnum1.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj0.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj1.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj2.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj3.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj_eng.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header0.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header1.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header2.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header3.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header_eng.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_highrisk.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html0.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html1.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html2.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html3.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html4.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html_eng.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_obfu.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_obfu0.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_obfu1.cf
[23

Re: TEXTAREA style="visibility: hidden"

2006-04-13 Thread Kelson

[EMAIL PROTECTED] wrote:

> s/Scripting/CSS :hover/ is perfectly reasonable, though:
> http://www.meyerweb.com/eric/css/edge/menus/demo.html
>
> (doesn't work in IE 6, but works fine in Firefox, Safari, IE 7b2pr...)


D'oh!

I blame the coffee.  There wasn't enough of it when I wrote my last post.

On the other hand, to apply :hover rules, you need an actual stylesheet 
and a way to select the element(s) you're showing.  You could still 
apply the visibility/display rules inline, but you might as well just 
put them in the stylesheet.


That said, I'm probably guilty of using inline styles for this sort of 
thing myself -- just not in email.


--
Kelson Vibber
SpeedGate Communications 


Re: relaydb and tarpit

2006-04-13 Thread mouss

Michael Monnerie wrote:

> On Thursday, 13 April 2006 18:15 mouss wrote:
>> pfff. just reading the two first paragraphs is enough to look
>> elsewhere. some people seem to redefine what a false positive is.
>
> I didn't mean that, I meant the tarpitting approach. Of course you have 
> to set some (much) harder policy on which systems to put on your 
> tarpit-blackhole list.
>
> But *if* you have such a "tarpit decider without FP" (not sure how to do 
> that...), couldn't this be a very good countermeasure to spam?




The issue is that:
- to tarpit, you need to devote some process or thread to that, and this 
is not Unix-specific. However you do it, you'll need something to handle 
it. Even with a packet filter, this still means many unnecessary states.


- the best you can do (at user level) is to have an asynchronous process 
(which can handle many connections) do so. Now, either it is the 
listener, but then it needs to pass "good" connections to "good" 
listeners (which ones support this?), or the opposite (which ones support 
this?). Of course, you can tune this to the point that you'd write a 
spam-OS, just to discover that spammers found other ways to get to you.


- the most severe problem is to find a criterion to decide who is bad. 
This is what we're all trying to do! If I knew which clients are used by 
spammers, I would need no tarpit nor DNSBL nor SA nor Bayes. I would just 
block these.


- sometimes, some ideas seem fine. but they don't resist serious 
analysis. you want to protect yourself, but that's just part of your 
goal. you want to do so at a limited cost and under some (non explicit 
but real) conditions (killing all the non-white people will 
statistically reduce terrorism, but would you do that?).


I have already seen systems that go idle when I connect to them. These 
systems just make me use my resources in vain, which is not a good 
practice. And I tend to believe these systems are run by nuts, so they 
are easily attacked (I never do that, for both personal and professional 
reasons). The best way to deal with them is to ignore them. route add, 
transport_maps, ... are enough to build one's own internet :)




Re: How was this missed?

2006-04-13 Thread qqqq
!Sure, the pattern doesn't match.  "." means there has to be some (any)
!character between the numbers.  "984" has no characters between the
!numbers.

DOH!!!

Thanks. You're right...




Re: xxxl spam

2006-04-13 Thread John Rudd


On Apr 13, 2006, at 9:56 AM, mouss wrote:



> I am also seeing many legit mails triggering some SA rules (*_excess, 
> no_real_name, x_library, ...). When I see this, I check the rule, and 
> if I can't find a justification, I disable it.




I wouldn't do that.

Just because legitimate mail triggers some rule doesn't mean that the 
rule is flawed.  Using your example, triggering "no_real_name" does not 
mean that the message is spam, it means that the message has _some_ 
similarity to at least some spam messages (the higher the score, the 
stronger the similarity).  And, that's absolutely true: statistically, 
when looking at the corpus which was used to create the rules database, 
a higher percentage of "no_real_name" messages were spam.


Now, if legit messages were not just triggering those rules, but also 
triggering enough rules to be flagged as spam ... then I would lower 
the value of those rules, but not disable those rules.  But I would 
only do that if I could see that there was a large percentage of 
should-be-ham messages being flagged as spam by that rule AND that rule 
wasn't being useful in flagging spam messages.  The reason is: if the 
message is being flagged, but it shouldn't have been, then perhaps my 
"corpus" of messages differs significantly enough from the SA internal 
corpus that my score values need to be different.  But that doesn't 
mean that the rules are so disjoint from tracking spam that they should 
be entirely disabled.  They just don't have the same weighting that my 
corpus needs.


If, instead, most messages passing through my mail servers, that 
triggered that rule, really did seem to be spam, then I wouldn't alter 
the score at all.  I would just pass the should-have-been-ham message 
into my bayesian learner and hope that a low bayes score for messages 
like that would offset the rules that had flagged it as spam.




Re: How was this missed?

2006-04-13 Thread Theo Van Dinter
On Thu, Apr 13, 2006 at 09:55:59AM -0700, [EMAIL PROTECTED] wrote:
> >> 2*0*6*984-2327 
> > 
> /2.?0.?6.?9.?8.?4.?2.?3.?2.?7|2.?0.?6.?3.?3.?3.?0.?0.?5.?1|2.?0.?6.?9.?8.?4.?0.?1.?0.?6|3.?3.?8.?3.?5.?7.?9|2.?0.?6.?3.?3.?8.?6.?0.?6.?1|2.?0.?6.?2.?0.?2.?2.?0.?3.?3/
> 
> Or, perhaps, better:
> 
> /2\D?0\D?6\D?9\D?8\D?4\D?2\D?3\D?2\D?7|2\D?0\D?6\D?3\D?3\D?3\D?0\D?0\D?5\D?1|2\D?0\D?6\D?9\D?8\D?4\D?0\D?1\D?0\D?6|3\D?3\D?8\D?3\D?5\D?7\D?9|2\D?0\D?6\D?3\D?3\D?8\D?6\D?0\D?6\D?1|2\D?0\D?6\D?2\D?0\D?2\D?2\D?0\D?3\D?3/

Now you won't catch

(206) 984-2327
[206] 984-2327
206 - 984 - 2327

etc.  FYI.
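
A rough way to check the point above, sketched in Python rather than in a SpamAssassin rule: with at most one optional non-digit between digits, formats that put two or three separator characters between digit groups slip through. The `\D{0,3}` variant is only an illustrative assumption, not something proposed in this thread, and a looser pattern may well match more legitimate text.

```python
import re

digits = "2069842327"

# Style from the quoted rule: at most ONE non-digit between consecutive digits.
one_sep = re.compile(r"\D?".join(digits))

# Hypothetical loosening (an assumption for illustration): up to three non-digits.
few_seps = re.compile(r"\D{0,3}".join(digits))

samples = [
    "2*0*6*984-2327",
    "(206) 984-2327",
    "[206] 984-2327",
    "206 - 984 - 2327",
]

one_sep_hits = [bool(one_sep.search(s)) for s in samples]
few_seps_hits = [bool(few_seps.search(s)) for s in samples]

for s, a, b in zip(samples, one_sep_hits, few_seps_hits):
    print(f"{s!r:22} one_sep={a} few_seps={b}")
```

Only the first sample is caught by the single-separator form; the other three need room for more than one character between the digit groups.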

-- 
Randomly Generated Tagline:
"Thinking of using NT for your critical apps?
 Isn't there enough suffering in the world?"   - Sun Microsystems Ad




Re: TEXTAREA style="visibility: hidden"

2006-04-13 Thread Theo Van Dinter
On Thu, Apr 13, 2006 at 09:45:13AM -0700, Kelson wrote:
> Nope.  No legit uses in email that I can think of.

Just because you can't think of a use doesn't mean people don't use them.
I see a lot of:





RE: New bayes poison

2006-04-13 Thread Matthew.van.Eerde
[EMAIL PROTECTED] wrote:
> The spammer used the Yahoo! webmail infrastructure (probably via an
> automated HTTP client) to send his spam.

I've been reporting spam with good DK signatures to the mail provider:
http://add.yahoo.com/fast/help/us/mail/cgi_spam
https://services.google.com/inquiry/gmail_security2

DK and SPF are very useful in proving accountability for email sent.

-- 
Matthew.van.Eerde (at) hbinc.com   805.964.4554 x902
Hispanic Business Inc./HireDiversity.com   Software Engineer


Re: How was this missed?

2006-04-13 Thread Tyler Nally
On Thursday 13 April 2006 11:55, [EMAIL PROTECTED] wrote:
> Theo Van Dinter wrote:
> > On Thu, Apr 13, 2006 at 10:39:29AM -0600,  wrote:
> >> Any idea how this one got through?
> >> 
> >> body BRIAN_PHONE_NUMBERS /2.0.6.9.8.4.2.3.2.7|2.0.6.3.3.3.0.0.5.1|2.0.6.9.8.4.0.1.0.6|3.3.8.3.5.7.9|2.0.6.3.3.8.6.0.6.1|2.0.6.2.0.2.2.0.3.3/

There's a ruleset I use from:

   http://www.emtinc.net/includes/chickenpox.cf

.. that checks for the d.i.f.f.e.r.e.n.t kinds of 
spacing like this... a lot of the spam that has
those kinds of characteristics will have several
of the CHICKENPOX_ rules that have fired positive.

It checks for some 60+ different patterns..

describe J_CHICKENPOX_12  1alpha-pock-2alpha
describe J_CHICKENPOX_13  1alpha-pock-3alpha
describe J_CHICKENPOX_14  1alpha-pock-4alpha
describe J_CHICKENPOX_15  1alpha-pock-5alpha
describe J_CHICKENPOX_16  1alpha-pock-6alpha
describe J_CHICKENPOX_17  1alpha-pock-7alpha
describe J_CHICKENPOX_18  1alpha-pock-8alpha
describe J_CHICKENPOX_19  1alpha-pock-9alpha
describe J_CHICKENPOX_110 1alpha-pock-10alpha
describe J_CHICKENPOX_111 1alpha-pock-11alpha
describe J_CHICKENPOX_21  2alpha-pock-1alpha
describe J_CHICKENPOX_22  2alpha-pock-2alpha
describe J_CHICKENPOX_23  2alpha-pock-3alpha
describe J_CHICKENPOX_24  2alpha-pock-4alpha
describe J_CHICKENPOX_25  2alpha-pock-5alpha
describe J_CHICKENPOX_26  2alpha-pock-6alpha
describe J_CHICKENPOX_27  2alpha-pock-7alpha
describe J_CHICKENPOX_28  2alpha-pock-8alpha
describe J_CHICKENPOX_29  2alpha-pock-9alpha
describe J_CHICKENPOX_210 2alpha-pock-10alpha
describe J_CHICKENPOX_31  3alpha-pock-1alpha
describe J_CHICKENPOX_32  3alpha-pock-2alpha
describe J_CHICKENPOX_33  3alpha-pock-3alpha
describe J_CHICKENPOX_34  3alpha-pock-4alpha
describe J_CHICKENPOX_35  3alpha-pock-5alpha
describe J_CHICKENPOX_36  3alpha-pock-6alpha
describe J_CHICKENPOX_37  3alpha-pock-7alpha
describe J_CHICKENPOX_38  3alpha-pock-8alpha
describe J_CHICKENPOX_39  3alpha-pock-9alpha
describe J_CHICKENPOX_41  4alpha-pock-1alpha
describe J_CHICKENPOX_42  4alpha-pock-2alpha
describe J_CHICKENPOX_43  4alpha-pock-3alpha
describe J_CHICKENPOX_44  4alpha-pock-4alpha
describe J_CHICKENPOX_45  4alpha-pock-5alpha
describe J_CHICKENPOX_46  4alpha-pock-6alpha
describe J_CHICKENPOX_47  4alpha-pock-7alpha
describe J_CHICKENPOX_48  4alpha-pock-8alpha
describe J_CHICKENPOX_51  5alpha-pock-1alpha
describe J_CHICKENPOX_52  5alpha-pock-2alpha
describe J_CHICKENPOX_53  5alpha-pock-3alpha
describe J_CHICKENPOX_54  5alpha-pock-4alpha
describe J_CHICKENPOX_55  5alpha-pock-5alpha
describe J_CHICKENPOX_56  5alpha-pock-6alpha
describe J_CHICKENPOX_57  5alpha-pock-7alpha
describe J_CHICKENPOX_61  6alpha-pock-1alpha
describe J_CHICKENPOX_62  6alpha-pock-2alpha
describe J_CHICKENPOX_63  6alpha-pock-3alpha
describe J_CHICKENPOX_64  6alpha-pock-4alpha
describe J_CHICKENPOX_65  6alpha-pock-5alpha
describe J_CHICKENPOX_66  6alpha-pock-6alpha
describe J_CHICKENPOX_71  7alpha-pock-1alpha
describe J_CHICKENPOX_72  7alpha-pock-2alpha
describe J_CHICKENPOX_73  7alpha-pock-3alpha
describe J_CHICKENPOX_74  7alpha-pock-4alpha
describe J_CHICKENPOX_75  7alpha-pock-5alpha
describe J_CHICKENPOX_81  8alpha-pock-1alpha
describe J_CHICKENPOX_82  8alpha-pock-2alpha
describe J_CHICKENPOX_83  8alpha-pock-3alpha
describe J_CHICKENPOX_84  8alpha-pock-4alpha
describe J_CHICKENPOX_91  9alpha-pock-1alpha
describe J_CHICKENPOX_92  9alpha-pock-2alpha
describe J_CHICKENPOX_93  9alpha-pock-3alpha
describe J_CHICKENPOX_101 10alpha-pock-1alpha
describe J_CHICKENPOX_102 10alpha-pock-2alpha
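
To give a feel for what a name like "3alpha-pock-3alpha" might mean, here is a small Python sketch. It assumes the description reads as "N letters, one non-alphanumeric 'pock' character, M letters"; the actual regexes in chickenpox.cf are not shown in this thread and may differ, and `pock_pattern` is a made-up helper name.

```python
import re

def pock_pattern(left: int, right: int) -> re.Pattern:
    """Hypothetical pattern: `left` letters, one 'pock' character, `right` letters.

    The lookarounds keep the letter runs from being part of a longer run,
    so the counts on each side are exact.
    """
    return re.compile(
        rf"(?<![a-z])[a-z]{{{left}}}[^a-z0-9\s][a-z]{{{right}}}(?![a-z])",
        re.IGNORECASE,
    )

hit_33 = bool(pock_pattern(3, 3).search("Deg.ree"))       # Deg . ree
hit_43 = bool(pock_pattern(4, 3).search("Coll`ege"))      # Coll ` ege
hit_34 = bool(pock_pattern(3, 4).search("Gen_uine"))      # Gen _ uine
hit_46 = bool(pock_pattern(4, 6).search("Veri.fiable"))   # Veri . fiable
miss_plain = bool(pock_pattern(3, 3).search("Degree"))    # no pock at all
miss_count = bool(pock_pattern(4, 3).search("Veri.fiable"))  # right side is 6, not 3

print(hit_33, hit_43, hit_34, hit_46, miss_plain, miss_count)
```

A pattern this simple would also fire on ordinary text such as a missing space after a full stop, which is presumably why such rules are useful mainly when several of them fire on the same message.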

-- 
Tyler Nally
[EMAIL PROTECTED]
317-989-2028


Re: New bayes poison

2006-04-13 Thread William Stearns

Good afternoon, Michael,

On Thu, 13 Apr 2006, Michael Monnerie wrote:


> Hi, I just received some new bayes poison attempt. I never had one so
> large, maybe that could start to be a bit of problem?


	To the best of my knowledge, it isn't.  Temporarily you get more 
hapaxes (tokens seen just once) in your bayes data, but those will get 
expired sooner or later.
	There's no effect on accuracy if the tokens truly are seen once. 
If they show up again in spam, it actually helps because the phrases help 
identify the second spam.
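
As a toy illustration of what a hapax is, the following Python sketch counts tokens that occur in exactly one message. It is not SpamAssassin's tokenizer or its expiry logic; the whitespace splitting, the `hapaxes` helper and the sample messages are all assumptions made for the example.

```python
from collections import Counter

def hapaxes(messages):
    """Return the tokens that appear in exactly one of the given messages."""
    counts = Counter(
        tok
        for msg in messages
        for tok in set(msg.lower().split())  # count each token once per message
    )
    return {tok for tok, n in counts.items() if n == 1}

# Two made-up spam bodies padded with random "poison" words.
spam = [
    "cheap degree zygote quixotic",
    "cheap degree lantern",
]

once = hapaxes(spam)
print(sorted(once))
```

The random padding words end up as hapaxes, while the words the spammer actually needs ("cheap", "degree") repeat across messages and keep accumulating counts.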

Cheers,
- Bill

---
"Computers let you make more mistakes faster than any other
invention in human history, with the possible exception of handguns and
tequila."
-- Mitch Radcliffe
(Courtesy of Hugo van der Kooij <[EMAIL PROTECTED]>)
--
William Stearns ([EMAIL PROTECTED]).  Mason, Buildkernel, freedups, p0f,
rsync-backup, ssh-keyinstall, dns-check, more at:   http://www.stearns.org
--


Re: xxxl spam

2006-04-13 Thread mouss

John Rudd wrote:
> While I don't disagree with your assessment of XP systems, I have a 
> different hunch about why such a large percentage of the mail coming 
> from XP systems is spam, and a smaller percentage of mail coming from 
> the other systems is spam:
>
> a) In general, XP systems are not servers, and therefore, are not mail 
> servers.
>
> b) Due to (a), if you do your mail/spam/virus scanning on machines that 
> do not receive direct connections from your own clients (mail/spam/virus 
> scanning at the border), OR if you do not have a high percentage of XP 
> clients in your domain, then your scanning systems will not receive many 
> (if any) legitimate direct connections from XP clients ... because a 
> legitimate mail sending process on an XP system will be directly 
> connecting to their own domain's mail server, and not to YOUR mail 
> scanning systems.
>
> c) Thus, if you meet the conditions in (b), and if we accept (a) as 
> true, then the vast majority of connections you receive from XP systems, 
> on your mail scanning systems, will be from spam/virus bots trying to 
> directly submit spam or virus laden messages to your mail gateways 
> instead of submitting it to their own mail servers (as bots are known to 
> do).
>
> We would expect to see a lower percentage of spam from server type OSes 
> (or OSes that can be clients or servers) because a higher percentage of 
> those platforms are used as legitimate mail servers.
>
> The other factor here is: while I _hate_ linux, how much of the spam 
> being submitted by linux boxes is merely a mail server relaying on 
> behalf of one of their infected clients? (same with the unix systems, 
> and the 2000/2003 systems)  And thus not at all indicative of the 
> quality of linux systems administration out on the internet.
>
> I think this is one of those cases where "the statistics work as blind 
> observations of behavior, but attempting to describe _why_ the 
> statistics works is not something you can sum up with a simple and 
> straightforward explanation".  Kinda like QM.





I agree that statistics aren't the whole story. You can study the 
percentage of thieves/criminals based on skin color and origin (some 
people already do it, and many jump to conclusions without studies). But 
you can do the same study based on social situation and past history of 
people. The first "researcher" will probably conclude that 
black/arabic/latin/... people are "more" criminal. The second 
"researcher" will instead conclude that criminality is seen more in poor 
communities, but that these aren't the worst criminals (killing vs 
stealing for instance).



Back to XP and co. My feeling (no, I didn't run a study and won't) is 
that even if a study showed that we get more spam from XP than 
from Linux, I would not use this to classify my mail.


I am certain that if you do stats on mail date, you'll find that some 
dates correspond to more spam than others. We've already seen people 
jumping to block specific mailers (The Bat for instance) based on their 
stats. I am also seeing many legit mails triggering some SA rules (*_excess, 
no_real_name, x_library, ...). When I see this, I check the rule, and if 
I can't find a justification, I disable it.




RE: How was this missed?

2006-04-13 Thread Matthew.van.Eerde
Theo Van Dinter wrote:
> On Thu, Apr 13, 2006 at 10:39:29AM -0600,  wrote:
>> Any idea how this one got through?
>> 
>> body BRIAN_PHONE_NUMBERS /2.0.6.9.8.4.2.3.2.7|2.0.6.3.3.3.0.0.5.1|2.0.6.9.8.4.0.1.0.6|3.3.8.3.5.7.9|2.0.6.3.3.8.6.0.6.1|2.0.6.2.0.2.2.0.3.3/
>> 
>> A Gen_uine Coll`ege  Deg.ree in 2 weeks Cal_l us now_!->
>> 2*0*6*984-2327 
> 
> Sure, the pattern doesn't match.  "." means there has to be some (any)
> character between the numbers.  "984" has no characters between the
> numbers.

Fixed version:

/2.?0.?6.?9.?8.?4.?2.?3.?2.?7|2.?0.?6.?3.?3.?3.?0.?0.?5.?1|2.?0.?6.?9.?8.?4.?0.?1.?0.?6|3.?3.?8.?3.?5.?7.?9|2.?0.?6.?3.?3.?8.?6.?0.?6.?1|2.?0.?6.?2.?0.?2.?2.?0.?3.?3/

Or, perhaps, better:

/2\D?0\D?6\D?9\D?8\D?4\D?2\D?3\D?2\D?7|2\D?0\D?6\D?3\D?3\D?3\D?0\D?0\D?5\D?1|2\D?0\D?6\D?9\D?8\D?4\D?0\D?1\D?0\D?6|3\D?3\D?8\D?3\D?5\D?7\D?9|2\D?0\D?6\D?3\D?3\D?8\D?6\D?0\D?6\D?1|2\D?0\D?6\D?2\D?0\D?2\D?2\D?0\D?3\D?3/

-- 
Matthew.van.Eerde (at) hbinc.com   805.964.4554 x902
Hispanic Business Inc./HireDiversity.com   Software Engineer


Re: How was this missed?

2006-04-13 Thread Magnus Holmgren
Please start a new thread instead of replying to an unrelated message.

Thursday 13 April 2006 18:39  wrote:
> Any idea how this one got through?
>
> body BRIAN_PHONE_NUMBERS /2.0.6.9.8.4.2.3.2.7|2.0.6.3.3.3.0.0.5.1|2.0.6.9.8.4.0.1.0.6|3.3.8.3.5.7.9|2.0.6.3.3.8.6.0.6.1|2.0.6.2.0.2.2.0.3.3/
> describe BRIAN_PHONE_NUMBERS  Phone number or address pulled from spam
> score    BRIAN_PHONE_NUMBERS  5.5
>

A period (.) matches exactly one arbitrary character (except newline). Try 
putting a question mark (?) after each period.

> - Message -
>
> Good day,
>
>
> A Gen_uine Coll`ege  Deg.ree in 2 weeks Cal_l us now_!-> 2*0*6*984-2327
>
> Within 2 weeks! No Study Required!   1_0_0_% Veri.fiable!
>
> Right now the following deg.rees are being offered:
>
> B/A,   .B/S/C,.M/A,.M/S/C,.M/B/A,   .P/H/D,
>
>
> C.al_l us now_ for more information,  2*0*6*984-2327
-- 
Magnus Holmgren





Re: How was this missed?

2006-04-13 Thread Theo Van Dinter
On Thu, Apr 13, 2006 at 10:39:29AM -0600,  wrote:
> Any idea how this one got through?
> 
> body BRIAN_PHONE_NUMBERS /2.0.6.9.8.4.2.3.2.7|2.0.6.3.3.3.0.0.5.1|2.0.6.9.8.4.0.1.0.6|3.3.8.3.5.7.9|2.0.6.3.3.8.6.0.6.1|2.0.6.2.0.2.2.0.3.3/
> 
> A Gen_uine Coll`ege  Deg.ree in 2 weeks Cal_l us now_!-> 2*0*6*984-2327

Sure, the pattern doesn't match.  "." means there has to be some (any)
character between the numbers.  "984" has no characters between the
numbers.
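
The mismatch can be reproduced outside SpamAssassin. The sketch below uses Python's `re` module as a stand-in for the Perl regex engine (the two agree on `.` and `?` for this case) and only the first alternative of the rule; the variable names are made up for the example.

```python
import re

# First alternative of the original rule: each "." needs exactly one character.
strict = re.compile(r"2.0.6.9.8.4.2.3.2.7")

# Same digits with ".?" so the separator between digits is optional.
relaxed = re.compile(r"2.?0.?6.?9.?8.?4.?2.?3.?2.?7")

spam_line = "Cal_l us now_!-> 2*0*6*984-2327"

strict_hit = strict.search(spam_line) is not None
relaxed_hit = relaxed.search(spam_line) is not None

print("strict :", strict_hit)
print("relaxed:", relaxed_hit)
```

The strict form fails at "984": after the 9 the "." consumes the 8, and the pattern then expects a literal 8 but finds 4.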

-- 
Randomly Generated Tagline:
1-900-Tech Support...hold...all operators are busy.




RE: TEXTAREA style="visibility: hidden"

2006-04-13 Thread Matthew.van.Eerde
Kelson wrote:
> (3) Scripting that will show and hide sections in response to time or
>  user interaction.
... 
> #3 shouldn't even be a consideration, since HTML-capable email clients
> should have scripting disabled for safety reasons.

s/Scripting/CSS :hover/ is perfectly reasonable, though:
http://www.meyerweb.com/eric/css/edge/menus/demo.html

(doesn't work in IE 6, but works fine in Firefox, Safari, IE 7b2pr...)

-- 
Matthew.van.Eerde (at) hbinc.com   805.964.4554 x902
Hispanic Business Inc./HireDiversity.com   Software Engineer


Re: TEXTAREA style="visibility: hidden"

2006-04-13 Thread Kelson

Matthias Keller wrote:
> In my opinion you shouldn't limit it to textareas as I've seen them on 
> DIVs and others too...
> So to me, any visibility:hidden or display:none is suspect as I don't see 
> any legitimate use in emails


Hmm... The main uses I can think of for display:none and 
visibility:hidden are:


(1) Serving the same content to different media (for instance, set a
page so that the navigation area doesn't appear when you print it)
(2) Replacing content (as in CSS techniques to replace text with
graphical headlines)
(3) Scripting that will show and hide sections in response to time or
user interaction.
(4) Creating machine-readable content that the user will not see.
(keyword stuffing, bayes poison, black-hat SEO, honeypot seeding,
etc.)

#1 isn't a good fit with email, since the main things you'd want to 
leave out of a print version are more likely to be in the mail client UI 
than part of the message body.  Though it might be useful for providing 
a handheld-friendly view.  Even so, it wouldn't work with inline styles, 
only with an attached or embedded stylesheet.


#2 is pretty much useless in email.  If you want a text alternative, 
you're better off providing a text/plain version of the message.


#3 shouldn't even be a consideration, since HTML-capable email clients 
should have scripting disabled for safety reasons.


#4 is mostly deceptive.  If you need to provide metadata in an HTML doc, 
well, that's what META tags are for.  If you need to provide metadata in 
an email message, you've got headers, you can add an XML attachment, etc.


Nope.  No legit uses in email that I can think of.

--
Kelson Vibber
SpeedGate Communications 


How was this missed?

2006-04-13 Thread qqqq
Guys,

Any idea how this one got through?

body BRIAN_PHONE_NUMBERS /2.0.6.9.8.4.2.3.2.7|2.0.6.3.3.3.0.0.5.1|2.0.6.9.8.4.0.1.0.6|3.3.8.3.5.7.9|2.0.6.3.3.8.6.0.6.1|2.0.6.2.0.2.2.0.3.3/
describe BRIAN_PHONE_NUMBERS  Phone number or address pulled from spam
score    BRIAN_PHONE_NUMBERS  5.5

- Message -

Good day,


A Gen_uine Coll`ege  Deg.ree in 2 weeks Cal_l us now_!-> 2*0*6*984-2327

Within 2 weeks! No Study Required!   1_0_0_% Veri.fiable!

Right now the following deg.rees are being offered:

B/A,   .B/S/C,.M/A,.M/S/C,.M/B/A,   .P/H/D,


C.al_l us now_ for more information,  2*0*6*984-2327



TTYL,
Vilma Milton



Re: relaydb and tarpit

2006-04-13 Thread Michael Monnerie
On Thursday, 13 April 2006 18:15 mouss wrote:
> pfff. just reading the two first paragraphs is enough to look
> elsewhere. some people seem to redefine what a false positive is.

I didn't mean that, I meant the tarpitting approach. Of course you have 
to set some (much) harder policy on which systems to put on your 
tarpit-blackhole list.

But *if* you have such a "tarpit decider without FP" (not sure how to do 
that...), couldn't this be a very good countermeasure to spam?

Regards, zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   "lynx -source http://zmi.at/zmi3.asc | gpg --import"
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE




Re: xxxl spam

2006-04-13 Thread John Rudd


On Apr 13, 2006, at 12:12 AM, Loren Wilton wrote:

I'd like to venture the suggestion that the percentage of spam from XP
isn't necessarily an indication of inherent bugginess.  It is more an
indication that it is an OS for Clueless Noobs who haven't a clue about
maintaining a system, avoiding a virus, or even able to tell if they have
a virus.  These are the machines that turn into zombies.



While I don't disagree with your assessment of XP systems, I have a 
different hunch about why such a large percentage of the mail coming 
from XP systems is spam, and a smaller percentage of mail coming from 
the other systems is spam:


a) In general, XP systems are not servers, and therefore, are not mail 
servers.


b) Due to (a), if you do your mail/spam/virus scanning on machines that 
do not receive direct connections from your own clients 
(mail/spam/virus scanning at the border), OR if you do not have a high 
percentage of XP clients in your domain, then your scanning systems 
will not receive many (if any) legitimate direct connections from XP 
clients ... because a legitimate mail sending process on an XP system 
will be directly connecting to their own domain's mail server, and not 
to YOUR mail scanning systems.


c) Thus, if you meet the conditions in (b), and if we accept (a) as 
true, then the vast majority of connections you receive from XP 
systems, on your mail scanning systems, will be from spam/virus bots 
trying to directly submit spam or virus laden messages to your mail 
gateways instead of submitting it to their own mail servers (as bots 
are known to do).



We would expect to see a lower percentage of spam from server type OSes 
(or OSes that can be clients or servers) because a higher percentage of 
those platforms are used as legitimate mail servers.
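
The selection effect in (a) through (c) can be put into numbers with a small sketch. The counts are hypothetical and chosen only to illustrate the argument; they are not measurements from anyone's logs.

```python
def xp_spam_share(legit_direct: int, bot_direct: int) -> float:
    """Fraction of direct connections from XP hosts, seen at the border, that are bots."""
    total = legit_direct + bot_direct
    return bot_direct / total if total else 0.0

# Hypothetical counts: legitimate XP clients submit through their own domain's
# mail server, so very few of them ever connect to the border scanner directly.
border_share = xp_spam_share(legit_direct=5, bot_direct=495)
print(f"{border_share:.0%} of direct XP connections are bots")
```

The point is that the ratio says little about how many XP machines are infected overall; it mostly reflects which XP machines have any reason to connect to the scanner at all.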


The other factor here is: while I _hate_ linux, how much of the spam 
being submitted by linux boxes is merely a mail server relaying on 
behalf of one of their infected clients? (same with the unix systems, 
and the 2000/2003 systems)  And thus not at all indicative of the 
quality of linux systems administration out on the internet.



I think this is one of those cases where "the statistics work as blind 
observations of behavior, but attempting to describe _why_ the 
statistics work is not something you can sum up with a simple and 
straightforward explanation".  Kinda like QM.





Re: relaydb and tarpit

2006-04-13 Thread mouss

Michael Monnerie wrote:
Sorry for x-posting, but that's a program useful to postfix and/or SA 
users.


http://www.benzedrine.cx/relaydb.html

Does anybody use or know about this program with tarpitting? It sounds 
very interesting, and for the author it seems to work, but I'd like to 
know if others made good or bad experience with it. After all, we're 
all fighting spammers, and if there are solutions really working, I'm 
ready to implement it into our servers.




pfff. just reading the two first paragraphs is enough to look elsewhere.

some people seem to redefine what a false positive is. they think that 
just because they reject mail or because the client/sender/... 
misbehaves, then it's not a false positive. This is just silly. a false 
positive is when a classifier considers a legitimate mail as spam, be 
that by rejection, by discarding, by delivering to a junk folder, ... etc.


just say no...


Re: TEXTAREA style="visibility: hidden"

2006-04-13 Thread Matthias Keller

Matt Kettler wrote:

Matthias Keller wrote:
  

Matt Kettler wrote:


Magnus Holmgren wrote:
 
  

I see a fair amount of spam using <TEXTAREA style="visibility: hidden"> to
hide bayes poison. Shouldn't a rule against that, or
CSS-hidden text in general, be worthwhile? I couldn't find any in the
default 3.1.1 ruleset, nor at SARE.



It certainly seems worth testing.

Here's a rule I wrote (caution: word-wraps.. this should be 3 lines
long):

rawbody L_STYLE_HIDDEN /]{0,50}style\s?=\s?"\s?visibility:\s?hidden\s?"[^>]{0,50}>/i
describe L_STYLE_HIDDEN  has text with hidden visibility style
score L_STYLE_HIDDEN 0.1

I added some allowance for other declarations in the textarea tag, and
the
insertion of whitespace at various spots...

It may need further tweaking/tuning, but it's a first-stab.
  
  

Hi Matt

I'm using this rule for quite some time now:

rawbody  MKE_HIDDEN1 /<[^>]*\bstyle=[^>]*(?:visibility:\s*hidden|display:\s*none)/i
describe MKE_HIDDEN1 Contains CSS-hidden text
score    MKE_HIDDEN1 3.5




That seems to be a nicer rule. My only concern would be that <[^>]* could be
rather slow. I'd change the * to a range-limit, to prevent SA from digging
through the entire body of a message that happens to be text/plain and starts
off with a < and has no > anywhere in it.
  

Good idea
Thanks for pointing that out
Maybe a meta rule with IS_HTML (or whatever it's called) might be a 
good idea too


Let me know your mass check results then

Matt


Re: TEXTAREA style="visibility: hidden"

2006-04-13 Thread Matt Kettler
Matthias Keller wrote:
> Matt Kettler wrote:
>> Magnus Holmgren wrote:
>>  
>>> I see a fair amount of spam using <TEXTAREA style="visibility: hidden"> to
>>> hide bayes poison. Shouldn't a rule against that, or
>>> CSS-hidden text in general, be worthwhile? I couldn't find any in the
>>> default 3.1.1 ruleset, nor at SARE.
>>> 
>>
>> It certainly seems worth testing.
>>
>> Here's a rule I wrote (caution: word-wraps.. this should be 3 lines
>> long):
>>
>> rawbody L_STYLE_HIDDEN /> [^>]{0,50}style\s?=\s?"\s?visibility:\s?hidden\s?"[^>]{0,50}>/i
>> describe L_STYLE_HIDDEN  has text with hidden visibility style
>> score L_STYLE_HIDDEN 0.1
>>
>> I added some allowance for other declarations in the textarea tag, and
>> the
>> insertion of whitespace at various spots...
>>
>> It may need further tweaking/tuning, but it's a first-stab.
>>   
> Hi Matt
> 
> I'm using this rule for quite some time now:
> 
> rawbody  MKE_HIDDEN1 /<[^>]*\bstyle=[^>]*(?:visibility:\s*hidden|display:\s*none)/i
> describe MKE_HIDDEN1 Contains CSS-hidden text
> score    MKE_HIDDEN1 3.5
> 

That seems to be a nicer rule. My only concern would be that <[^>]* could be
rather slow. I'd change the * to a range-limit, to prevent SA from digging
through the entire body of a message that happens to be text/plain and starts
off with a < and has no > anywhere in it.
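
A minimal sketch of the range-limit idea, using Python's re module as a stand-in for the Perl regex engine SA uses. The {0,200} bound is an assumption picked for illustration, not a tuned value.

```python
import re

# The rule as posted (unbounded) and a hypothetical range-limited variant.
UNBOUNDED = re.compile(
    r"<[^>]*\bstyle=[^>]*(?:visibility:\s*hidden|display:\s*none)", re.I)
BOUNDED = re.compile(
    r"<[^>]{0,200}\bstyle=[^>]{0,200}(?:visibility:\s*hidden|display:\s*none)", re.I)

normal_tag = '<div class="x" style="display: none">poison</div>'
# A '<' followed by a long run with no '>' before the style attribute.
runaway = "<" + "x" * 500 + ' style="display:none"'

results = {
    "normal": (bool(UNBOUNDED.search(normal_tag)), bool(BOUNDED.search(normal_tag))),
    "runaway": (bool(UNBOUNDED.search(runaway)), bool(BOUNDED.search(runaway))),
}
print(results)
```

Both variants hit the ordinary tag; only the unbounded one keeps scanning through the 500-character run, which is the behaviour the range limit is meant to cut off.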


Proper use of user_prefs "whitelist"

2006-04-13 Thread Forrest Aldrich
I've been having some difficulty with the user_prefs and the whitelist_* 
functions.   I read the examples etc, and I believe these are correct, 
but clearly certain email is still being tagged (see below).   I wonder 
if someone can help clarify what I'm doing wrong here.


First, here are the directives in my ~/.spamassassin/user_prefs file, as 
it applies to this instance:


   whitelist_from_rcvd spamassassin.apache.org hermes.apache.org

   whitelist_from  *.apache.org


Here is the Sendmail log, showing the rejection:

   Apr 13 11:52:24 mail sm-mta[34951]: k3DFqNBR034951:
   from=<[EMAIL PROTECTED]>,
   size=17514, class=-60, nrcpts=1, msgid=<[EMAIL PROTECTED]>,
   proto=SMTP, daemon=MTA, relay=hermes.apache.org [209.237.227.199]

   Apr 13 11:52:26 mail sm-mta[34951]: k3DFqNBR034951: Milter add:
   header: X-Spam-Flag: YES

   Apr 13 11:52:26 mail sm-mta[34951]: k3DFqNBR034951: Milter add:
   header: X-Spam-Status: Yes, score=9.0 required=5.0
   
tests=HTML_00_10,HTML_MESSAGE,\n\tJ_CHICKENPOX_12,J_CHICKENPOX_33,RCVD_IN_SORBS,SARE_BIZOP,\n\tSARE_COLLEGE_SCAM,TVD_FUZZY_DEGREE
   autolearn=no version=3.1.1

   Apr 13 11:52:26 mail sm-mta[34951]: k3DFqNBR034951: Milter: data,
   reject=550 5.7.1 Blocked by SpamAssassin

   Apr 13 11:52:26 mail sm-mta[34951]: k3DFqNBR034951:
   to=<[EMAIL PROTECTED]>, delay=00:00:02, pri=155514, stat=Blocked by
   SpamAssassin



Thanks in advance
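
One thing worth checking, offered as a sketch and not a diagnosis: the SpamAssassin configuration documentation describes whitelist addresses as file-glob-style patterns ("*" and "?") matched case-insensitively against the whole address, and whitelist_from_rcvd as taking an address pattern followed by the relay's rDNS domain. Python's fnmatch is used below as a rough stand-in for that matching, and the sender address is hypothetical because the real one is redacted in the log.

```python
from fnmatch import fnmatchcase

def glob_matches(pattern: str, address: str) -> bool:
    """Case-insensitive whole-string glob match (stand-in for SA's matching)."""
    return fnmatchcase(address.lower(), pattern.lower())

# Hypothetical sender address; the real one is redacted in the log above.
sender = "users-return-123@spamassassin.apache.org"

checks = {
    "*.apache.org": glob_matches("*.apache.org", sender),
    "spamassassin.apache.org": glob_matches("spamassassin.apache.org", sender),
    "*@spamassassin.apache.org": glob_matches("*@spamassassin.apache.org", sender),
}
print(checks)
```

Under that reading, a first argument of "spamassassin.apache.org" can only match an address that is literally that string, so "*@spamassassin.apache.org" may be what was intended. Whether the milter setup reads per-user user_prefs at all is a separate question that this sketch does not address.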





Re: TEXTAREA style="visibility: hidden"

2006-04-13 Thread Theo Van Dinter
On Thu, Apr 13, 2006 at 03:58:01PM +0200, Magnus Holmgren wrote:
> I see a fair amount of spam using <TEXTAREA style="visibility: hidden"> to 
> hide bayes poison. Shouldn't a rule against that, or CSS-hidden text in 
> general, be worthwhile? I couldn't find any in the default 3.1.1 ruleset, nor 
> at SARE.

Not specific to textarea, just looking for an html tag with that style setting:

  0.878   0.9903   0.3319   0.749   0.00   1.00  TVD_VIS_HIDDEN

Specifically just looking for textarea:

  0.821   0.9903   0.0000   1.000   1.00   1.00  TVD_VIS_HIDDEN

I added the second one to my sandbox.  We'll see how the nightly
mass-checks deal with it. :)

Thanks! :)

-- 
Randomly Generated Tagline:
"Do not meddle in the affairs of wizards,
 for they are subtle and quick to anger."- Lord of the Rings




Russian Spam

2006-04-13 Thread Kristopher Austin
I have received several copies of a spam message that is in Russian (I think 
it's Russian).  I get maybe 1 or 2 a week.  I wish I could block all Russian 
messages, but we are a University and could easily have Russian students.  I am 
unable to read this message and therefore have no ideas on how to block this.  
Can anyone help me out with suggestions?

I apologize if this has been discussed in the last week.  I haven't had time to 
catch up on list messages over the last couple of days and didn't see anything 
skimming the subjects of recent threads.

Thanks,
Kris

Message with full headers below:

Microsoft Mail Internet Headers Version 2.0
Received: from gateway3.oc.edu ([205.143.222.12]) by fsmail.oc.edu with 
Microsoft SMTPSVC(6.0.3790.211);
 Thu, 13 Apr 2006 08:50:17 -0500
Received: from ip-189.net-82-216-33.toulouse.rev.numericable.fr 
([82.216.33.189])(helo=ip-189.net-82-216-33.toulouse.rev.numericable.fr)
by gateway3.oc.edu with smtp (Exim 4.54)
id 1FU2CH-0008JS-AY
for [EMAIL PROTECTED]; Thu, 13 Apr 2006 08:49:43 -0500
From: "Litvinova Elena" <[EMAIL PROTECTED]>
To: "Samusenko Tat'jana" <[EMAIL PROTECTED]>
Date: Thu, 13 Apr 2006 13:50:06 +
Message-ID: <[EMAIL PROTECTED]>
MIME-Version: 1.0
Content-Type: text/plain;
format=flowed;
charset="koi8-r";
reply-type=original
Content-Transfer-Encoding: 8bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2800.1441
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1441
X-SA-Exim-Connect-IP: 82.216.33.189
X-SA-Exim-Rcpt-To: [EMAIL PROTECTED]
X-SA-Exim-Mail-From: [EMAIL PROTECTED]
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on gateway3.oc.edu
X-Spam-Level: 
X-Spam-Status: No, score=0.3 required=5.0 tests=DNS_FROM_AHBL_RHSBL,RELAY_FR 
autolearn=disabled version=3.1.0
Subject: Re[6]: =?koi8-r?B?9Nkgzc7Px88gxMzRIM3FztEg2s7B3snb2A==?= davavsheju
X-SA-Exim-Version: 4.2 (built Thu, 03 Mar 2005 10:44:12 +0100)
X-SA-Exim-Scanned: Yes (on gateway3.oc.edu)
Return-Path: [EMAIL PROTECTED]
X-OriginalArrivalTime: 13 Apr 2006 13:50:17.0572 (UTC) 
FILETIME=[32A1FA40:01C65F01]

Рад Вас снова видеть!

Вы собираетесь в США? Хотите свободно работать
с технической документацией? Расширить свой кругозор?

Центр Американского Английского
приглашает выучить английский язык!!!
Все стадии обучения - от нуля до высшего. Ассоциативно-
образная методика. Преподаватели из США.

Без больших скидок не уйдёте! :)

Наши телефоны в Москве:
105 пять-один-восемь-шесть
два-три-восемь-три-три-восемь-шесть


Не хотите получать информацию от Центра? Отправьте свой адрес нам:
[EMAIL PROTECTED]



сил. Но он не мог понять того, -- вдруг как бы вырвавшимся тонким голосом
закричал князь Андрей, -- но он не мог понять, что мы в первый раз дрались
там за русскую землю, что в войсках был такой дух, какого никогда я не
видал, что мы два дня сряду отбивали французов и что этот успех удесятерял
наши силы. Он велел отступать, и все усилия и потери пропали даром. Он не
думал об измене, он старался все сделать как можно лучше, он все обдум
от этого-то он и не годится. Он не годится теперь именно потому, что он все
обдумывает очень основательно и аккуратно, как и следует всякому немцу. Как
бы тебе сказать... Ну, у отца твоего немец-лакей, и он прекрасный лакей и
удовлетворит всем его нуждам лучше тебя, и пускай он служит; но ежели отец
при смерти болен, ты прогонишь лакея и своими непривычными, неловкими 
станешь ходить за отцом и лучше успокоишь его, чем искусный, но чужой
человек. Так и сделали с Барклаем. Пока Россия была здорова, ей мог служить



Re: TEXTAREA style="visibility: hidden"

2006-04-13 Thread Matthias Keller

Matt Kettler wrote:

Magnus Holmgren wrote:
  
I see a fair amount of spam using <TEXTAREA style="visibility: hidden"> to 
hide bayes poison. Shouldn't a rule against that, or CSS-hidden text in 
general, be worthwhile? I couldn't find any in the default 3.1.1 ruleset, nor 
at SARE.



It certainly seems worth testing.

Here's a rule I wrote (caution: word-wraps.. this should be 3 lines long):

rawbody L_STYLE_HIDDEN /]{0,50}style\s?=\s?"\s?visibility:\s?hidden\s?"[^>]{0,50}>/i
describe L_STYLE_HIDDEN  has text with hidden visibility style
score L_STYLE_HIDDEN 0.1

I added some allowance for other declarations in the textarea tag, and the
insertion of whitespace at various spots...

It may need further tweaking/tuning, but it's a first-stab.
  

Hi Matt

I'm using this rule for quite some time now:

rawbody  MKE_HIDDEN1 /<[^>]*\bstyle=[^>]*(?:visibility:\s*hidden|display:\s*none)/i
describe MKE_HIDDEN1 Contains CSS-hidden text
score    MKE_HIDDEN1 3.5

In my opinion you shouldn't limit it to textareas, as I've seen them on 
DIVs and other elements too.
To me, any visibility:hidden or display:none is suspect, as I don't see 
any legitimate use for them in email.


In my corpus, this rule matches around 4% of all spam, and I haven't seen 
any ham matches yet.
Feel free to mass-check it and/or include it in your rules. But 
if you do, please let me know so that I can remove my local copy.


Matt


Re: TEXTAREA style="visibility: hidden"

2006-04-13 Thread Matt Kettler
Bowie Bailey wrote:
> JD Smith wrote:
>> So, what exactly is bayes poison?
> 
> "Bayes poison" is a collection of random words or text selections that
> have nothing to do with the email subject and are only there in an
> attempt to confuse the Bayes database.  This doesn't really work the
> way the spammers would like to think it does, but they keep doing it
> anyway.


How well bayes poison works depends a lot on your "bayes" implementation. Some
"bayes" implementations are fairly susceptible to this.  (I put "bayes" in
quotes because not all bayes implementations are really Bayesian at all.
Actually, most are not, including SA.)

In particular, the choice of combining algorithm seems to matter a lot. The use
of chi-squared combining, instead of true Bayesian combining, seems to make SA's
bayes rather resistant to this.

(note: the use of chi-squared is not exclusive to SA.. many "bayes"
implementations do this, but not all.)

Another area of influence is the choice of tokens. Words vs chars, hapaxes, etc
all change how a bayes implementation reacts to poisoning attempts.

So spammers keep using bayes poison because it works in some cases. It also
doesn't really hurt them much, and sometimes even helps them, against more
resistant implementations.
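
To make the point about combining algorithms concrete, here is a small sketch comparing plain product-of-probabilities combining with Robinson/Fisher-style chi-squared combining, following the formulation popularized by the SpamBayes project. It is not SA's actual code, and the token probabilities are hypothetical.

```python
import math

def chi2q(x2: float, dof: int) -> float:
    """Upper-tail chi-squared probability for an even number of degrees of freedom."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def chi_combine(probs):
    """Chi-squared combining; a result near 0.5 means 'unsure'."""
    n = len(probs)
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0

def naive_combine(probs):
    """Plain product-of-probabilities combining."""
    spam = math.prod(probs)
    ham = math.prod(1.0 - p for p in probs)
    return spam / (spam + ham)

# Hypothetical message: 10 strongly spammy tokens plus 12 hammy "poison" tokens.
poisoned = [0.99] * 10 + [0.01] * 12
naive_score = naive_combine(poisoned)
chi_score = chi_combine(poisoned)
print(round(naive_score, 4), round(chi_score, 4))
```

With these made-up inputs the product rule is pulled all the way to "ham" by two extra poison tokens, while the chi-squared result stays near the unsure midpoint because strong evidence on both sides is treated as a conflict. Real poison is often closer to neutral than 0.01, so the effect in practice will differ.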








SpamAssassin BZ downtime

2006-04-13 Thread Justin Mason
http://ajax.apache.org/%7ejefft/ :

  Bugzilla is moving to a new host, and is temporarily down while the
  database synchs. Apologies for the inconvenience.

--j.


Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice

2006-04-13 Thread Theo Van Dinter
On Thu, Apr 13, 2006 at 01:35:19PM +0200, Mark Martinec wrote:
> Agreed, this rule is completely inappropriate, it penalizes valid
> encoding according to RFC 2047 and fires on any lengthier Subject
> line in non-English language. It should disappear or have a
> much reduced default score.

Says you. ;)

  1.047   1.4619   0.0792   0.949   0.58   0.89  SUBJECT_ENCODED_TWICE

So in the results used to generate scores, that rule is ~94.9% accurate,
and hits ~1.46% of all spam.  In a recent nightly mass-check run:

  1.153   1.4173   0.1151   0.925   0.73   0.89  SUBJECT_ENCODED_TWICE

So more ham seems to use encoding twice in the subject, and a little
less spam uses it.  Based on this, my guess is the generated score would
go down.

The thing to remember about rules is that they neither necessarily
look for RFC non-compliance, nor do they avoid RFC compliant mails.
They look for features that hit spam and try to avoid hitting ham.
The key there is that rule development occurs with the results people
make available.  If the people generating results don't receive ham
mails that, for instance, use multiple encodings in a Subject header,
the results won't indicate that it occurs in ham very much.
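
For readers checking the figures: assuming the columns are the usual hit-frequencies layout (OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE; the header row is not shown in this message), the S/O "accuracy" value can be recomputed from the SPAM% and HAM% columns. A minimal sketch:

```python
def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """S/O: share of a rule's hits that are spam, from the SPAM% and HAM% columns."""
    return spam_pct / (spam_pct + ham_pct)

# SPAM% and HAM% values taken from the two SUBJECT_ENCODED_TWICE lines above.
score_set = round(s_over_o(1.4619, 0.0792), 3)
nightly = round(s_over_o(1.4173, 0.1151), 3)
print(score_set, nightly)
```

Because the inputs are percentages of each corpus, the ratio depends entirely on what ham the mass-check contributors submit, which is the point made in the paragraph above.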

-- 
Randomly Generated Tagline:
"I protect home plate like a mormon girl on prom night."
 - Mimi on the Drew Carey show




RE: TEXTAREA style="visibility: hidden"

2006-04-13 Thread Bowie Bailey
JD Smith wrote:
> 
> So, what exactly is bayes poison?

"Bayes poison" is a collection of random words or text selections that
have nothing to do with the email subject and are only there in an
attempt to confuse the Bayes database.  This doesn't really work the
way the spammers would like to think it does, but they keep doing it
anyway.

-- 
Bowie


RE: TEXTAREA style="visibility: hidden"

2006-04-13 Thread JD Smith

So, what exactly is bayes poison?

Best regards,

JD Smith
-Original Message-
From: Magnus Holmgren [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 13, 2006 8:58 AM
To: users@spamassassin.apache.org
Subject: TEXTAREA style="visibility: hidden"

I see a fair amount of spam using <TEXTAREA style="visibility: hidden"> to 
hide bayes poison. Shouldn't a rule against that, or CSS-hidden text in 
general, be worthwhile? I couldn't find any in the default 3.1.1 ruleset, 
nor at SARE.

-- 
Magnus Holmgren



Re: TEXTAREA style="visibility: hidden"

2006-04-13 Thread Matt Kettler
Magnus Holmgren wrote:
> I see a fair amount of spam using <TEXTAREA style="visibility: hidden"> to 
> hide bayes poison. Shouldn't a rule against that, or CSS-hidden text in 
> general, be worthwhile? I couldn't find any in the default 3.1.1 ruleset, nor 
> at SARE.

It certainly seems worth testing.

Here's a rule I wrote (caution: word-wraps.. this should be 3 lines long):

rawbody L_STYLE_HIDDEN /]{0,50}style\s?=\s?"\s?visibility:\s?hidden\s?"[^>]{0,50}>/i
describe L_STYLE_HIDDEN  has text with hidden visibility style
score L_STYLE_HIDDEN 0.1

I added some allowance for other declarations in the textarea tag, and the
insertion of whitespace at various spots...

It may need further tweaking/tuning, but it's a first-stab.






TEXTAREA style="visibility: hidden"

2006-04-13 Thread Magnus Holmgren
I see a fair amount of spam using <TEXTAREA style="visibility: hidden"> to 
hide bayes poison. Shouldn't a rule against that, or CSS-hidden text in 
general, be worthwhile? I couldn't find any in the default 3.1.1 ruleset, nor 
at SARE.

-- 
Magnus Holmgren




Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice

2006-04-13 Thread Michael Monnerie
On Thursday, 13 April 2006 13:35 Mark Martinec wrote:
> Agreed, this rule is completely inappropriate, it penalizes valid
> encoding according to RFC 2047 and fires on any lengthier Subject
> line in non-English language. It should disappear or have a
> much reduced default score.

The problem seems to be that
1) most spam is English
2) most people contributing mass-checks are English speaking
3) therefore most ham+spam tested in mass-checks is English

In order to improve the situation, more mass-check testers with 
non-English ham+spam should contribute, see 
http://wiki.apache.org/spamassassin/MassCheck?highlight=%28mass%29

I'm not a SA dev, but I think they once wrote that more supporters would be 
nice. I do mass-checks, and if somebody wants to help, I have a working 
script you can have in order to contribute to testing. It's a simple 
setup, and then your server has some work to do overnight. On mine, 
it's about 1 hour per night, so no problem.

mfg zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   "lynx -source http://zmi.at/zmi3.asc | gpg --import"
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE




Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice

2006-04-13 Thread Mark Martinec
Kai Schaetzl wrote:
> > I just saw that a normal Ebay outbid notice hit two high-score rules. One
> > is from sare-spoof and I already contacted the maintainer. But one is in
> > the default 3.1.1 ruleset and I think this rule should get completely
> > removed or get a score of 0. It's
> > 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice

Alan Premselaar:
> This utterly wreaks havoc on just about all Japanese email, so I dropped
> the score to nearly nothing.

Agreed, this rule is completely inappropriate, it penalizes valid
encoding according to RFC 2047 and fires on any lengthier Subject
line in non-English language. It should disappear or have a
much reduced default score.
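
To illustrate why a long non-ASCII Subject legitimately contains more than one encoded-word, here is a small sketch using Python's standard email.header module. The subject text is a hypothetical example in the style of an outbid notice, not taken from a real message.

```python
from email.header import Header

# Hypothetical long non-ASCII subject.
subject = "Sie wurden überboten: " * 5

# Header.encode() folds the header and emits RFC 2047 encoded-words.
encoded = Header(subject, charset="utf-8").encode()
encoded_words = [w for w in encoded.split() if w.startswith("=?")]
print(len(encoded_words))
for word in encoded_words:
    print(word)
```

A standards-following encoder splits the header because RFC 2047 caps the length of each encoded-word, so "encoded twice" in this sense is ordinary behaviour for longer subjects.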

  Mark


Re: xxxl spam

2006-04-13 Thread Daryl C. W. O'Shea

Mark Martinec wrote:


I guess Windows Server 2003 is reported as Windows 2000, but I don't know.
Certainly a couple of very large sites are seen as Windows 2000.

In the UNKNOWN category there must be a mix of Windows and Unix hosts,
not sure what is unusual about them.

  Mark


Hmm... FWIW:

[EMAIL PROTECTED] dos]$ sudo p0f -i eth1
p0f - passive os fingerprinting utility, version 2.0.4
(C) M. Zalewski <[EMAIL PROTECTED]>, W. Stearns <[EMAIL PROTECTED]>
p0f: listening (SYN) on 'eth1', 223 sigs (12 generic), rule: 'all'.
24.141.168.241:4218 - Windows XP Pro SP1, 2000 SP3
  -> 66.98.221.156:25 (distance 1, link: ethernet/modem)
66.98.221.156:2602 - Windows 2000 SP4, XP SP1
  -> 24.141.168.241:783 (distance 19, link: ethernet/modem)


24.141.168.241 is Windows XP Pro SP1
66.98.221.156 is Windows Server 2003 SP1 (Standard Edition)


Daryl



Re: xxxl spam

2006-04-13 Thread Mark Martinec
Wolfgang, Loren,
> > real mail servers (those that deliver the ham part of mail) rarely ever
> > run XP but that this OS is the best candidate for creating a spam zombie

> Not completely unreasonable.  XP is targeted within MS as a personal or
> very small company OS.  The equivalent of a linux/unix system used by more
> than a single person would typically be some version of Server 2003.  Which
> was probably identified in the stats as Windows 2000.
>
> I'd like to venture the suggestion that the percentage of spam from XP
> isn't necessarily an indication of inherent bugginess.  It is more an
> indication that it is an OS for Clueless Noobs who haven't a clue about
> maintaining a system, avoiding a virus, or even able to tell if they have a
> virus.  These are the machines that turn into zombies.

I fully agree.

In this view the following two lines should be seen as well:

p0f OS guess    ham  :  spam
Linux           58.8 % : 41.2 %
Unix            80.3 % : 19.7 %

Linux is used by the masses (compared to other Unix OS types) because it is
considered to be easier to set up. Possibly this also means that less care
is invested in preventing it from being used to propagate spam.

Still, a "score  L_P0F_Unix  -1.0" seems to be doing a good job here.


Daryl,
> I'm not sure the ham hit rate from the Windows-XP category scales (to
> other installations) very well.  The last time I looked into using p0f
> to fingerprint connecting hosts, last spring, I seem to recall that
> Windows XP and Windows 2003 share the same TCP/IP stack and fingerprint
> identically.
>
> While it'd be nice to score "Windows-XP" hosts harshly, there's a lot
> of mail coming from Windows Server 2003 hosts that would get hit.

There is indeed a handful of valid small sites classified by p0f as Windows XP 
from which we do receive regular mail (well, newsletters and such, but still,
should be treated mostly as ham). I don't see adding few score points to them
much different than other (some quite arbitrary) rules - each rule tries to
have low FP rate, but it often is not zero. Only a collection of all rules has
merit.

> I know for some of my systems 1:99 would be really low if Windows Server
> 2003 and XP are identified the same.  40:60 (and in some cases 80:20)
> would be closer to what I often see if I were to assume that all spam
> came from Windows XP hosts.
> Maybe you don't receive much, if any, mail from Windows Server 2003 hosts?

I guess Windows Server 2003 is reported as Windows 2000, but I don't know.
Certainly a couple of very large sites are seen as Windows 2000.

In the UNKNOWN category there must be a mix of Windows and Unix hosts,
not sure what is unusual about them.

  Mark


relaydb and tarpit

2006-04-13 Thread Michael Monnerie
Sorry for x-posting, but that's a program useful to postfix and/or SA 
users.

http://www.benzedrine.cx/relaydb.html

Does anybody use or know about this program with tarpitting? It sounds 
very interesting, and for the author it seems to work, but I'd like to 
know if others made good or bad experience with it. After all, we're 
all fighting spammers, and if there are solutions really working, I'm 
ready to implement it into our servers.

mfg zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   "lynx -source http://zmi.at/zmi3.asc | gpg --import"
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE




Re: sa missed to scan some of email

2006-04-13 Thread martin
David B Funk  engineering.uiowa.edu> writes:
> Exactly so.
> Usually you can find the related message by matching the time-stamp
> from your maillog to your spamd log. You can also do some detective work,
> eliminate maillog entries that have an incoming msgid (IE one from the
> sending MTA) and just concentrate on those that have a locally added
> msgid.
> 
> Dave
> 

  Thanks for the help; it seems you're correct. Based on the timestamp search,
most of the unknown msgids in spam.log had a msgid like '[EMAIL PROTECTED]'
in the maillog. 




Re: xxxl spam

2006-04-13 Thread Loren Wilton
> to read this in other words: while certain analysts (and definitely
> microsoft marketing) claim that about 50 % of all servers is running
> windows, these figures tend to say that real mail servers (those that
> deliver the ham part of mail) rarely ever run XP but that this OS is the
> best candidate for creating a spam zombie

Not completely unreasonable.  XP is targeted within MS as a personal or very
small company OS.  The equivalent of a linux/unix system used by more than a
single person would typically be some version of Server 2003.  Which was
probably identified in the stats as Windows 2000.

I'd like to venture the suggestion that the percentage of spam from XP isn't
necessarily an indication of inherent bugginess.  It is more an indication
that it is an OS for Clueless Noobs who haven't a clue about maintaining a
system, avoiding a virus, or even able to tell if they have a virus.  These
are the machines that turn into zombies.

If there were as many linux machines in the hands of Clueless Noobs, I'd bet
that the number of infected linux systems would be in a similar percentage
range.  (Remember, these XP systems are virtually all run with Administrator
(aka root) privs all the time, by people that haven't a clue what that
means.  What would happen if all linux-like systems ran that way?)

Loren



Re: Rawbody rules information

2006-04-13 Thread Matt Kettler
Nigel Marshall wrote:
> Hi List,
>
> I am looking to understand more about the raw body rules, and examples
> of them that I could follow to hopefully write a few for myself. Can
> someone point in a good place to start or a good tutorial on this sort
> of thing?
A rawbody rule is pretty much the same as a body rule. The difference
being that HTML tags are still present, and newlines are present.

http://wiki.apache.org/spamassassin/WritingRules

That said, do you really need to write a rawbody rule? Are you sure a
body or uri rule won't do instead?

I generally try to avoid writing rawbody rules unless I need to write
something that falls into one of these two categories:

1) examines HTML tags directly (and not just the target of a URI)
2) examines newline insertion patterns.
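
A rough way to picture the difference, sketched in Python: the rough_body_view helper below is a hypothetical and very crude stand-in for the rendered text a body rule sees (tags dropped, whitespace collapsed). SA's real rendering is more involved, so treat this only as a mental model.

```python
import re

def rough_body_view(html: str) -> str:
    """Very rough stand-in for rendered text: drop tags, collapse whitespace."""
    return " ".join(re.sub(r"<[^>]*>", " ", html).split())

raw = '<p>Cheap\nmeds</p><div style="display:none">random filler words</div>'

tag_rule = re.compile(r"<div[^>]{0,50}display:\s*none", re.I)  # needs the tags
text_rule = re.compile(r"cheap meds", re.I)                     # needs rendered text

rendered = rough_body_view(raw)
hits = {
    "tag_rule/raw": bool(tag_rule.search(raw)),
    "tag_rule/rendered": bool(tag_rule.search(rendered)),
    "text_rule/raw": bool(text_rule.search(raw)),
    "text_rule/rendered": bool(text_rule.search(rendered)),
}
print(hits)
```

The tag pattern only hits the raw text, and the phrase pattern only hits the rendered text because the newline in the raw message breaks it, which matches the two categories listed above.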




Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice

2006-04-13 Thread Alan Premselaar
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Kai Schaetzl wrote:
> I just saw that a normal Ebay outbid notice hit two high-score rules. One 
> is from sare-spoof and I already contacted the maintainer. But one is in 
> the default 3.1.1 ruleset and I think this rule should get completely 
> removed or get a score of 0. It's
> 
> 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice
> 
> From grepping the rules it does what it says: it checks if there are two 
> B/Q encoding identifiers in the subject. Why is this scoring with 1.72 or 
> at all? This is absolutely valid Q/B encoding and actually *required* by 
> RFC if your subject line is longer than 80 (or was it 72?) characters 
> (minus the encoding, so it's actually more like a 60 raw character limit).
> This rule will hit on *lots* of non-ASCII mail and on almost all mail 
> coming from Ebay Germany.
> 
> There are also the rules SUBJECT_EXCESS_QP and SUBJECT_EXCESS_BASE64 which 
> are "similar". QP scores 0 and BASE64 scores 0.449. This is much more 
> reasonable.
> 
> Kai
> 

This utterly wreaks havoc on just about all Japanese email, so I dropped
the score to nearly nothing.

alan
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.1 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEPfgmE2gsBSKjZHQRAt82AKDAY4xTmST0kaY5cje1xH1ScDajOACg6fMH
msifLKqJuv1IpudxbKGDcfQ=
=ZDQE
-END PGP SIGNATURE-