Re: translation help please

2006-11-24 Thread Charlie Clark


On 24.11.2006 at 04:22, Chris wrote:


This was tossed into my spam folder tonight but it was during my NANAS
report run. I'm not sure if it's a reply from abuse@ or just a spam:


Neither. It's instructions on how to use the website galeon.com:
configuring the browser to work with cookies, etc.


Charlie
--
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360
GSM: +49-178-782-6226





Interesting text content in the new spams

2006-11-23 Thread Charlie Clark
Looks like there are some pretty impressive self-learning systems out  
there. I'm enclosing the content of the text part of a new spam. I  
think it's quite an interesting vocabulary that they are using,  
presumably from their own trained ham database. This spam got through  
four different checks (postfix + blacklisting, spamassassin,  
spambayes and Opera's own spam system)! Give them a couple of years
and we can finally close slashdot et al. and actually start reading  
this stuff! ;-)


Charlie

Raquo Areas Bugs. Open total a bug Tracking Support or Requests in  
Tech Patches.

Release archive is raquo of Areas?
Framework gd Engine Details Developers Beta Intended Audience. In  
Create Newscreate Farm Mapcreate or Projectnew am Wantedmy?  
Statistics currently Browse Most!
Of feeds available for this About by or the from. Activity Percentile  
last week View list of feeds available is.
Language a License gnu of. Patches Patch Feature a Request. Details  
Developers Beta Intended Audience Education Technology.
Education Technology or Other Topic English Unix name Registered.  
Language License gnu?

Va Software Ostg Source Group all Rights Reserved or Find.
Projectnew Wantedmy Statussite is.
Areas in Bugs open total bug Tracking Support. Va Software Ostg  
Source Group all Rights Reserved or Find.

Bug or Tracking Support Requests or Tech Patches am Patch in.
Audience or Education Technology Other Topic English Unix.
Support in Requests Tech Patches Patch Feature Request. Kolmafia sw  
Test Automation Framework gd. System of os Written an language of  
License gnu General Public.
License gnu General Public gpl. Create Newscreate is Farm of  
Mapcreate Projectnew am Wantedmy Statussite Status web!

Sprites a Release archive raquo of Areas Bugs?
Open total a bug Tracking Support or Requests in Tech Patches. Book  
Search is Advanced log in Create is. Va Software Ostg Source Group in  
all Rights.
Latest a News new or Graphics and Sprites Release archive. Va  
Software Ostg Source Group in all Rights.

Intended Audience Education.






Re: Greylisting

2006-11-21 Thread Charlie Clark


On 21.11.2006 at 01:12, John Andersen wrote:


On Monday 20 November 2006 15:08, Rick Macdougall wrote:
It's possible that they could send it all twice but I've never seen it.
Remember that some unbelievable number of infected Windows clients are
the main source of spam and it would just be too much trouble for the
spammer to try every address twice after a 15 minute interval.


Oh come on!  It costs the spammer NOTHING to make that adjustment
to his bot net.  It's someone else's bandwidth, and someone else's
cpu cycles.

They are reading this list and planning the changes already.


Of course! Spam and SpamAssassin are the ultimate game of cops and
robbers! I'm sure the best spammers continually update the rules and
run their own tests against them to develop new mails which get
through. Despite everyone's best efforts we are fighting a losing
battle with a solution that does not tackle the botnet problem at
source, but for that to happen things might have to get a whole lot
worse! :-/
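
For readers who have not met the mechanism being argued about here, the following is a minimal sketch of greylisting, assuming the usual (client IP, sender, recipient) triplet and the 15-minute interval mentioned above. The class and method names are hypothetical and not taken from any particular greylisting package.

```python
GREYLIST_DELAY = 15 * 60  # seconds; the 15-minute interval from the thread


class Greylist:
    """Temp-fail the first delivery attempt of each triplet (illustrative only)."""

    def __init__(self, delay=GREYLIST_DELAY):
        self.delay = delay
        self.first_seen = {}  # triplet -> time of the first attempt

    def check(self, client_ip, sender, recipient, now):
        """Return "accept" or "tempfail" for an attempt made at time `now`."""
        triplet = (client_ip, sender, recipient)
        # Record the first attempt; later attempts reuse the stored time.
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= self.delay:
            return "accept"
        return "tempfail"
```

A sender that never retries is never accepted, while any client that simply retries after the delay gets through, which is the adjustment John Andersen expects botnets to make.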


Charlie





Re: blarsbl

2006-11-21 Thread Charlie Clark


On 21.11.2006 at 17:53, Thomas Lindell wrote:


Att mail servers use his service.

Which means I can't send to mediacom which is an att partner

I couldn't believe att used his service.

What's odd is that my company uses att backhaul bandwidth in the form
of 4 t1's

Grr the whole thing is frustrating


The guy's a moron but I think his disclaimer lets him off:
The BlarsBL is maintained by Blars at his wim. Use for any purpouse  
should be done at your own risk, and Blars is not responsible for use  
by anyone but himself.


While he is under no compulsion to remove an address, I think his
demand for money is ludicrous.


If this is held under the right nose at ATT or Mediacom it should  
produce the right reaction.


But this and other issues do pose the question: how easy is it going
to be for spammers to start using blocking lists against normal users?


Charlie






Re: would SA benefit from port to Java

2006-11-19 Thread Charlie Clark


On 17.11.2006 at 20:36, Eric A. Hall wrote:



Thinking about the GPL Java announcement some, and trying to imagine the
kinds of opportunities this allows for, it occurs to me that SpamAssassin
might be a natural fit for Java.


Why on earth do you come to that conclusion, and what does Java going
GPL have to do with it?


I'm just thinking out loud here, not advocating anything...



At best you are speculating rather than thinking.

Would it run better? Would it be faster, have smaller memory footprint,
better reclamation, better hooks for plugins etc? OTOH, would it be harder
to build, given the dependence of SA on perl modules?


Please do some research on programming languages and their domains,
because one size almost never fits all. While I personally very much
dislike perl, it is extremely well-suited to this task: text-centric,
rapidly changing. SA was the first out there, has a large body of
active developers and is extensible by rules.


Charlie





Endusers and spam

2006-11-16 Thread Charlie Clark

Dear list,

this is an obvious question but not part of the FAQs, or at least I
couldn't find it! What is the best way of getting end users to
identify spam that gets through so that it can be learned? So far I
have set up an extra account to which users forward the e-mail, and I
then tell SpamAssassin to learn from it, but I'm worried about the
extra headers and formatting that are added when forwarding. The mail
accounts are all in mbox format, so it also isn't possible to pass the
individual messages in to be learnt.
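
One possible approach to the forwarding worry, sketched here under the assumption that users forward the spam as an attachment (a message/rfc822 part) rather than inline: the original message can then be recovered without the forwarder's headers. The function names are hypothetical; only Python's standard mailbox and email modules are used.

```python
import mailbox


def attached_originals(msg):
    """Yield the raw bytes of every message/rfc822 attachment inside msg."""
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            payload = part.get_payload()
            # For message/rfc822 the payload is a list holding one Message.
            if isinstance(payload, list) and payload:
                yield payload[0].as_bytes()


def originals_from_mbox(path):
    """Yield the attached originals from every message in an mbox file."""
    box = mailbox.mbox(path)
    try:
        for msg in box:
            yield from attached_originals(msg)
    finally:
        box.close()
```

Each recovered message could be written to its own file and passed to sa-learn --spam. The sa-learn documentation also describes a --mbox switch for reading mbox files directly, which may make splitting unnecessary when the mailbox holds the spam itself rather than forwards.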


Thanks

Charlie





Re: Distributed Bayes DB?

2006-11-11 Thread Charlie Clark


On 11.11.2006 at 10:48, Matthias Leisi wrote:



I already took a look at using SQL, but this quote:

| NB:  This should be considered BETA, and the interface, schema, or
| overall operation of SQL support may change at any time with future
| releases of SA.

stops me from using it. Unfortunately, I can not run software officially
considered Beta on this system.


I suppose you could use something like NFS so that all systems share  
the same DB, config files, etc.




Use a SQL server backend. If you must have a no-failure option for the
bayes DB, use a cluster of SQL servers.

Example with mysql:

http://www.howtoforge.com/loadbalanced_mysql_cluster_debian


I suppose that every message passed through SpamAssassin will issue at
least one query and one update statement to the DB. How does a MySQL
cluster perform with 500'000 messages per day, considering that
replication must also take place?


How long is a piece of string? 500,000 queries per day shouldn't  
cause any problems for an RDBMS but the architecture of such a system  
should be given a bit of consideration - connection pooling et al.
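
As a rough back-of-the-envelope check of that claim, here is the average load implied by the figures in the question. The two-statements-per-message figure is the questioner's stated minimum, not a measurement, and a real Bayes lookup may issue considerably more statements per message.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(messages_per_day, statements_per_message=2):
    """Average SQL statements per second, assuming evenly spread traffic."""
    return messages_per_day * statements_per_message / SECONDS_PER_DAY


rate = average_rate(500_000)
print(f"{rate:.1f} statements/s on average")  # prints "11.6 statements/s on average"
```

Mail traffic is bursty, so peak load will sit well above this average; the sketch only shows the order of magnitude.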


There is in fact a mail system that uses PostgreSQL to store all the  
mails. If you want more information on requirements, speed, etc. I'm  
pretty sure you could run Spamassassin on the top of it.





What is the best practice in that
regard with Spamassassin?


Using SQL is by far the best practice here.


I do not see many mentions of the SQL approach - either because it is
not used much or because it works so well?


Probably the former. And you're right not to use something like the  
SQL backend for a large volume production system. Not because it's  
unreliable but because it's still in development and keeping the  
schema up to date could become a real headache.


I suspect that at some point it might make sense to use something  
like SQLite for persistence (because it's relatively easy to  
distribute) which would make using alternative backends relatively easy.


Charlie





Re: Distributed Bayes DB?

2006-11-11 Thread Charlie Clark


On 11.11.2006 at 11:47, Matt Kettler wrote:


I suppose you could use something like NFS so that all systems share
the same DB, config files, etc.

NFS would be HIGHLY not-recommended.

http://article.gmane.org/gmane.mail.spam.spamassassin.general/72362/match=sql


In fact, I personally would suggest never using NFS for anything at all,
and I'm shocked that you'd even consider using it for any production
purpose.


NFS or equivalent has its place and can be made safe enough if  
required but I think other issues like concurrent access suggest that  
the SQL approach is the way to go.


Besides, the point here is to eliminate any single-point-of-failure. NFS
would offer no redundancy at all. If the server hosting the NFS share
went down, the bayes DB would be unavailable.


Agreed.

I do not see many mentions of the SQL approach - either because it is
not used much or because it works so well?


Probably the former. And you're right not to use something like the
SQL backend for a large volume production system. Not because it's
unreliable but because it's still in development and keeping the
schema up to date could become a real headache.

But it's not still in development.. It's the recommended configuration
as of 3.1.0.

SA's SQL support is solid. I personally don't use it, but many here do.


Yes, sorry I should have read all e-mails relating to the thread first.

Charlie





Re: sa-update rules for SA 3.1.7 have been updated but they fail lint

2006-11-10 Thread Charlie Clark


On 11.11.2006 at 01:18, Daryl C. W. O'Shea wrote:


Justin Mason wrote:

Randal, Phil writes:

I've just run sa-update -D and it's failed with return code 4.

update 473327:

config: warning: score set for non-existent rule PART_CID_STOCK
config: warning: score set for non-existent rule PART_CID_STOCK_LESS

As a result, the rules get rolled back.

oops.  now fixed.
OK, try it soonish (it may take a few minutes for the mirrors to
update and the cached DNS txt record to expire).


Ha!  Remember what I said about feeling unlucky?  ;)


Rule #1 - Let someone else ask the really stupid question for you first!

Thanks, this was biting me, too and I saw it had been fixed in the  
past! tsk, tsk ;-)


re. my own problems: it looks like things have settled down.

Charlie





Re: Problem with spamd

2006-11-09 Thread Charlie Clark


On 09.11.2006 at 02:10, Daryl C. W. O'Shea wrote:


Charlie Clark wrote:

Looks like I'm on top of the resources problem but I am getting  
421 delivery errors even though the e-mails are coming through.  
This looks very similar to bug 3828 (which is Spamassassin +  
Exim). Except this bug should have been closed a long time ago.


Without looking at the bug, it sounds like you're saying that Exim  
temp fails messages when a filter (SA) isn't available to filter  
the message in time.  If that's the case it's sensible for that to  
happen.


Indeed it is. I just don't understand why it is happening on this  
machine which has a very low load.




The strange thing is these errors never occurred before last week  
and having just upgraded to 3.1.7 I would hope to have a system  
including all relevant bug fixes.
Of course, as Theo said it might simply be easier to stop using  
spamd and just call spamassassin but it might also be helpful to  
track down the problem. Should I jump on the back of the old bug  
or make a new submission?


Have you actually looked into making sure that you're not
experiencing an expiry issue (like the expiry being timed out and
never completed) like Theo suggested you do off the bat?



No, and I'll admit to not really understanding exactly what you mean.  
Where can I check and if necessary change this?


Charlie





Re: Problem with spamd

2006-11-09 Thread Charlie Clark


On 09.11.2006 at 19:27, Daryl C. W. O'Shea wrote:



If your one and only child is busy doing an expire it can't scan  
messages too.


ah, so I could increase the number of children running to do this?



The strange thing is these errors never occurred before last  
week and having just upgraded to 3.1.7 I would hope to have a  
system including all relevant bug fixes.
Of course, as Theo said it might simply be easier to stop using  
spamd and just call spamassassin but it might also be helpful to  
track down the problem. Should I jump on the back of the old bug  
or make a new submission?


Have you actually looked into making sure that you're not
experiencing an expiry issue (like the expiry being timed out and
never completed) like Theo suggested you do off the bat?
No, and I'll admit to not really understanding exactly what you  
mean. Where can I check and if necessary change this?


Disable bayes_auto_expire in your local.cf and run an expire
manually (and then set it up as a cron job) by running
sa-learn --force-expire as the user that SA normally runs as (if SA
runs as more than one user, run it for all the users it runs as).  It's
probably going to take a considerable amount of time for it to
run... let it finish, it will eventually.


bayes_auto_expire isn't actually in my local.cf so I've added it as
bayes_auto_expire   0
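
With automatic expiry switched off via bayes_auto_expire 0, the manual expiry Daryl describes needs to be scheduled. A minimal sketch of a wrapper that a cron job could call, run as the user SA normally runs as, might look like this. The function names are hypothetical; sa-learn --force-expire is the command quoted above.

```python
import subprocess


def expire_command(sa_learn="sa-learn"):
    """Build the argument list for a forced Bayes expiry run."""
    return [sa_learn, "--force-expire"]


def run_expire(runner=subprocess.run):
    """Run the expiry and return its exit status (0 means success)."""
    completed = runner(expire_command(), capture_output=True, text=True)
    return completed.returncode
```

A cron script would end with raise SystemExit(run_expire()) so that a failed expiry shows up in cron's mail. The runner parameter exists only so the wrapper can be exercised without sa-learn installed.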

It also strikes me that I can probably enable trusting the localhost  
on this machine - does this mean that spamassassin will not bother  
checking e-mail sent via the local SMTP?


Thank you very much for your help!

Charlie





Re: Problem with spamd

2006-11-09 Thread Charlie Clark


On 09.11.2006 at 20:35, Daryl C. W. O'Shea wrote:


Charlie Clark wrote:

On 09.11.2006 at 19:27, Daryl C. W. O'Shea wrote:


If your one and only child is busy doing an expire it can't scan  
messages too.

ah, so I could increase the number of children running to do this?


You could (running at least 2 children, if you've got the resources
to do it, isn't a bad idea), but it sounds like you've either got a
lot of individual bayes databases to expire or the expiry of one or
more of the databases is never being allowed to complete.


I think I have the resources for more children. It's not a lot of  
mail going through the system but I think the network connection  
often seems to have problems.



You're best off disabling auto expire and doing it manually.


I ran it manually and it only took a couple of seconds so I think  
that now that the performance issue has gone, this will hopefully  
eventually go away.




Disable bayes_auto_expire in your local.cf and run an expire
manually (and then set it up as a cron job) by running
sa-learn --force-expire as the user that SA normally runs as (if SA
runs as more than one user, run it for all the users it runs as).  It's
probably going to take a considerable amount of time for it to
run... let it finish, it will eventually.

bayes_auto_expire isn't actually in my local.cf so I've added it as
bayes_auto_expire 0


Yeah, it's on by default, that's how you disable it.


It also strikes me that I can probably enable trusting the  
localhost on this machine - does this mean that spamassassin will  
not bother checking e-mail sent via the local SMTP?


That's not at all what it means.  If you need help configuring
that, search the archives or start another thread.


Got it sussed; now all I need to do is tell Exim to unfreeze its
queue...


Charlie





Problem with spamd

2006-11-08 Thread Charlie Clark

Hi,

about a week ago my server started experiencing load problems and
eventually closed all connections. It is running at an ISP and has
lots of software preconfigured, including SpamAssassin configured by
the ISP. There are currently two problems: spamd is nearly
monopolising the CPU, and the tcprcvbuf eventually gets used up;
I suspect the two are related. As I did not configure the system
I am having to work my way through it, but it looks like a default
install. I could not find anything in the FAQ relating to this
specifically apart from the reference to max-children (set to 1 in
this case).


It doesn't look like there are a lot of e-mails to process. The setup
is Debian with spamd being called as an Exim transport.


These are the active rules
vs171127:/usr/share/spamassassin# ls -l
total 552
-rw-r--r--  1 root root   6013 Jun 30  2005 10_misc.cf
-rw-r--r--  1 root root   1600 Jun 30  2005 20_anti_ratware.cf
-rw-r--r--  1 root root   8193 Jun 30  2005 20_body_tests.cf
-rw-r--r--  1 root root   1608 Jun 30  2005 20_compensate.cf
-rw-r--r--  1 root root  12078 Jun 30  2005 20_dnsbl_tests.cf
-rw-r--r--  1 root root  15695 Jun 30  2005 20_drugs.cf
-rw-r--r--  1 root root  11263 Jun 30  2005 20_fake_helo_tests.cf
-rw-r--r--  1 root root  27706 Jun 30  2005 20_head_tests.cf
-rw-r--r--  1 root root  15482 Jun 30  2005 20_html_tests.cf
-rw-r--r--  1 root root  10934 Jun 30  2005 20_meta_tests.cf
-rw-r--r--  1 root root  22094 Jun 30  2005 20_phrases.cf
-rw-r--r--  1 root root   4961 Jun 30  2005 20_porn.cf
-rw-r--r--  1 root root  14134 Jun 30  2005 20_ratware.cf
-rw-r--r--  1 root root   5027 Jun 30  2005 20_uri_tests.cf
-rw-r--r--  1 root root   2329 Jun 30  2005 23_bayes.cf
-rw-r--r--  1 root root   9112 Jun 30  2005 25_body_tests_es.cf
-rw-r--r--  1 root root   2733 Jun 30  2005 25_hashcash.cf
-rw-r--r--  1 root root   2299 Jun 30  2005 25_spf.cf
-rw-r--r--  1 root root   4698 Jun 30  2005 25_uribl.cf
-rw-r--r--  1 root root  52288 Jun 30  2005 30_text_de.cf
-rw-r--r--  1 root root  40677 Jun 30  2005 30_text_fr.cf
-rw-r--r--  1 root root  57934 Jun 30  2005 30_text_nl.cf
-rw-r--r--  1 root root  34798 Jun 30  2005 30_text_pl.cf
-rw-r--r--  1 root root  29369 Jun 30  2005 50_scores.cf
-rw-r--r--  1 root root   6882 Jun 30  2005 60_whitelist.cf
-rw-r--r--  1 root root    939 Jun 30  2005 65_debian.cf
-rw-r--r--  1 root root 101479 Jun 30  2005 languages
-rw-r--r--  1 root root  18944 Jun 30  2005 triplets.txt
-rw-r--r--  1 root root   1531 Jun 30  2005 user_prefs.template

This is from top:
3796 web1p239  15 46764  42m 4252 R 65.2  0.7 144:06.98 spamd
and this is a check of the tcprc use
tcprcvbuf481548189607218840243681759 148967

(machine was rebooted this morning)

Is it possible to get more information from spamd about why it's  
taking so long? Thanks for any help.


Charlie





Re: Problem with spamd

2006-11-08 Thread Charlie Clark


On 08.11.2006 at 18:43, Theo Van Dinter wrote:


On Wed, Nov 08, 2006 at 06:38:19PM +0100, Charlie Clark wrote:

2006-11-08 17:31:00 [9733] i: debug: refresh: 9733 refresh /home/confixx/web1p2/.spamassassin/bayes.lock

Is this standard behaviour? It seemed okay when the lock is acquired
but seems to spend most of its time actually refreshing the lock.


It's ok if it's doing something to the DB, you want the lock refreshed.
I'm guessing you're seeing a bayes expiry.


Okay, seems to have calmed down now. I wonder if that's related to
the fact that I seem to be having problems sending e-mail:


The address to which the message has not yet been delivered is:

  [EMAIL PROTECTED]
Delay reason: Connection timed out

Presumably because my buffers have been filled. I've restarted Exim in
the hope that will help but I wonder what's causing this in the first
place - what is screwing my SMTP server? It really doesn't look like
it should be that busy but I don't really know where I should be
looking!


Charlie






Re: Problem with spamd

2006-11-08 Thread Charlie Clark


On 08.11.2006 at 20:51, François Rousseau wrote:


max-children (set to 1 in this case).

Why 1???


That's the default for servers run by this ISP. Do you have a  
suggestion?



How many emails do you receive per day? (or per minute???)


Excluding spam it's probably less than 50 per day for all accounts on  
this server! So there shouldn't ever be a problem. I *think* that the  
changes I've made today including restarting Exim seem to be working.  
The problem may have been related to one account getting full and not  
accepting any new mail but I don't find this particularly convincing  
for the mail server running out of resources.


Charlie





Re: Problem with spamd

2006-11-08 Thread Charlie Clark


On 08.11.2006 at 22:45, Theo Van Dinter wrote:


On Wed, Nov 08, 2006 at 10:18:53PM +0100, Charlie Clark wrote:

How many emails do you receive per day? (or per minute???)


Excluding spam it's probably less than 50 per day for all accounts on
this server! So there shouldn't ever be a problem. I *think* that the
changes I've made today including restarting Exim seem to be working.


If you only receive 2-3 messages per hour, just run spamassassin and don't
bother with spamc/spamd.  Why have another daemon?


I didn't set this up originally and I generally try and follow the  
rule of messing with the system as little as possible as it is. That  
said I've extended the local.cf file which had virtually no  
directives and am in the process of upgrading from 3.0.3 to 3.1.7.  
I'm not pleased with my ISP for taking over a week to investigate the  
initial complaint and me actually using the trouble ticket to  
annotate the changes I make!


Charlie





Re: Problem with spamd

2006-11-08 Thread Charlie Clark


On 08.11.2006 at 23:00, Charlie Clark wrote:



On 08.11.2006 at 22:45, Theo Van Dinter wrote:


On Wed, Nov 08, 2006 at 10:18:53PM +0100, Charlie Clark wrote:

How many emails do you receive per day? (or per minute???)


Excluding spam it's probably less than 50 per day for all accounts on
this server! So there shouldn't ever be a problem. I *think* that the
changes I've made today including restarting Exim seem to be working.


If you only receive 2-3 messages per hour, just run spamassassin and don't
bother with spamc/spamd.  Why have another daemon?


I didn't set this up originally and I generally try and follow the  
rule of messing with the system as little as possible as it is.  
That said I've extended the local.cf file which had virtually no  
directives and am in the process of upgrading from 3.0.3 to 3.1.7.  
I'm not pleased with my ISP for taking over a week to investigate  
the initial complaint and me actually using the trouble ticket to  
annotate the changes I make!



Looks like I'm on top of the resources problem but I am getting 421  
delivery errors even though the e-mails are coming through. This  
looks very similar to bug 3828 (which is Spamassassin + Exim). Except  
this bug should have been closed a long time ago.


The strange thing is these errors never occurred before last week and  
having just upgraded to 3.1.7 I would hope to have a system including  
all relevant bug fixes.


Of course, as Theo said it might simply be easier to stop using spamd  
and just call spamassassin but it might also be helpful to track down  
the problem. Should I jump on the back of the old bug or make a new  
submission?


Charlie





Re: Rule for raw HTML

2006-11-08 Thread Charlie Clark


On 09.11.2006 at 01:18, Ron wrote:


A few spams have slipped by that contain HTML that is appearing as
normal text (due to them not getting something right).

For example:

and you may have<BR>contempt seemed abundantly increasing with the
length of his second speech, and at the end of it he<BR>and the
mortification of kitty

Is there a rule that will catch HTML like tags that are not in the
right MIME type section?   I also see this a lot with <A HREF=...>
links.



I can't see the need for an extra rule for this as it should be  
caught by the Bayesian rules after the very briefest of training.  
That the HTML doesn't display correctly is par for the course for  
spam which almost by definition does not play by the rules.
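
To make concrete what Ron is asking for, here is a minimal sketch of the check itself, written as standalone Python rather than as a SpamAssassin rule. The tag list, the regular expression and the function name are illustrative assumptions, not part of any shipped ruleset.

```python
import re

# A handful of common tags plus "<A HREF=", matched case-insensitively.
TAG_RE = re.compile(
    r"</?(?:br|p|div|span|font|img|table)\b[^>]*>|<a\s+href\s*=",
    re.IGNORECASE,
)


def count_stray_tags(msg):
    """Count HTML-like tags found in the text/plain parts of msg."""
    total = 0
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            # Latin-1 never fails to decode, and only ASCII tag names matter here.
            text = payload.decode("latin-1")
            total += len(TAG_RE.findall(text))
    return total
```

A non-zero count on a message that has no text/html part is the symptom described in the question. Whether that deserves a score of its own, or is better left to Bayes as suggested above, is a separate judgment.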


Charlie