Re: Spam from Gmail & Blogspot

2008-05-28 Thread Bob Proulx
Joseph Brennan wrote:
> Just a few months ago we did not get much spam at all from gmail.
> Something changed.

One change seems to be that Google's captcha has been broken.

  http://www.google.com/search?q=google+captcha+broken

Bob


Re: Spam from Gmail & Blogspot

2008-05-28 Thread Sahil Tandon
Joseph Brennan <[EMAIL PROTECTED]> wrote:
  
>> I have checked the Received: headers several times and the messages are
>> coming from  or .  Maybe they should be listed in
>> .
>
> We have 200 complaints a day about spam that really comes from gmail.
> It's the biggest source of spam that gets through.  Obviously it's also
> a very big source of legit mail.  Just a few months ago we did not get
> much spam at all from gmail.  Something changed.

Agreed, and people need to realize that spam delivered via gmail servers *is* 
Google's responsibility.  No exceptions.

-- 
Sahil Tandon <[EMAIL PROTECTED]>


Re: Spam from Gmail & Blogspot

2008-05-28 Thread Joseph Brennan



> I have checked the Received: headers several times and the messages are
> coming from  or .  Maybe they should be listed in
> .



We have 200 complaints a day about spam that really comes from gmail.
It's the biggest source of spam that gets through.  Obviously it's also
a very big source of legit mail.  Just a few months ago we did not get
much spam at all from gmail.  Something changed.

Joseph Brennan
Columbia University Information Technology





Re: rDNS none in stats with IPv6

2008-05-28 Thread Steve Bertrand

Greg Troxel wrote:

>> In my SA stats, the majority (+90%) of email inbound is classified as
>> rdns_none.
>
> (I presume you are trying to make this server IPv6 only instead of dual
> stack.  When my machine had a globally routable v6 address I got some
> mail over v6 and some over v4, but didn't use mapped addresses.)


When I get a few more minutes, I will go over the reply again, and reply 
properly.


I couldn't believe the response (on and off list) regarding help with 
IPv6 issues and issues in general.


I think that I'll be happy here ;)

Steve





Re: MySQL and Size Of bayes_expiry_max_db_size

2008-05-28 Thread Kris Deugau

Larry Nedry wrote:

> Of course.  But how would I figure out what works best?  How can I tell if
> it is working poorly or very well?

Results.  Customer/user complaints are always useful (if perhaps
not really desirable);  customer/user *feedback* is critical on
anything bigger than a trivial personal or very-small-business system.
You have to feed in a variety of legitimate email - finding spam to feed
in shouldn't be a problem.

> I'm looking for a way to calculate or experimentally find the sweet spot
> for bayes_expiry_max_db_size.  Is there an ideal range?  Or a maximum size?
> What happens if the size is too high?

I've found 600,000 works pretty well on a smallish filter server (about
the same hardware class as your system, AKA "overkill");  for the
larger cluster serving between high single-digit and low double-digit
thousands of accounts, plus filtering outbound mail, I've been playing
with various settings on and off for several months now.  I still
haven't found a happy balance.


(Side note - This question in various forms has been asked 3 or 4 times 
in the past month or so - could someone who really knows the Bayes 
innards please speak up?  As noted near the beginning of this thread, 
the default number of tokens is too small for anything much bigger than 
purely personal/per-user Bayes.)


Benny Pedersen's reply a few messages back includes a few points that 
made my own experiments become a lot more coherent;  I'll be doing 
further tuning based on that.  At the moment, for my usage, I'm looking 
at ~2M tokens as a floor.


-kgd


RE: MailChannels Traffic Control

2008-05-28 Thread Dan Barker
Robin: 
 
Of course we are not interested. This is a one-man shop with an annual
budget of about $50K. Why you put out the teaser on the Spamassassin User
list is totally beyond me.
 
Your "entry" level is very poorly defined. You'd get a lot more
small shops if you found a pricing structure that made some sense. For the
1,000 good email messages I receive each day, there are probably 25,000
connections attempted. Most small shops are going to be in this range.
 
You should either have a reasonable cut-off for the freebie, structure your
pricing reasonably, or not entice sysops (like myself) to go through all the
trouble of installing and testing your product, just to find it can not be
licensed at any reasonable level. When I read the 10K limit, I thought "Hey,
that's ten times what I receive - I'll try it!"
 
Dan Barker, President
Software Projects, Inc.

  _  

From: Robin Pollak [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 28, 2008 3:30 PM
To: 'Dan Barker'
Subject: RE: MailChannels Traffic Control



Hi Dan,

 

Traffic Control is enterprise software and as a result we charge a minimum
of $2500.00.   This is so that we can provide the level of support that
enterprise customers expect.  Please let me know if you are still
interested.  Thanks.

 

Sincerely,

 

Robin Pollak

Executive Assistant

 

MailChannels - Email Traffic Control

www.mailchannels.com

604 685 7488

 

From: Dan Barker [mailto:[EMAIL PROTECTED] 
Sent: May-28-08 11:23 AM
To: [EMAIL PROTECTED]
Subject: RE: MailChannels Traffic Control

 

I have about 17K connections (exclusive of Dictionary attacks culled by my
MTA) resulting in about  1 K valid messages per day. You told me that was
over the "limit" for the freebie and you'd get back to me with pricing info.
Should I contact Ken?

 

Dan

 

  _  

From: Robin Pollak [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 28, 2008 2:20 PM
To: 'Dan Barker'
Subject: RE: MailChannels Traffic Control

Hi Dan,

 

This will provide Traffic Control for up to 10,000 connections a day.  Any
pricing is set to individual needs.  If you have a client with more than
10,000 connections a day, I would be happy to have our CEO contact you to
discuss specific needs.  Thanks!

 

Sincerely,

 

Robin Pollak

Executive Assistant

 

MailChannels - Email Traffic Control

www.mailchannels.com

604 685 7488



Re: can we make AWL ignore mail from self to self?

2008-05-28 Thread Jo Rhett

On May 23, 2008, at 3:45 AM, Jonas Eckerman wrote:
> 1: Just read it as if, when I said "your own users", I meant the users
> of the host in question (the ones you mention above). More
> specifically, the users using your host as a MSA (authenticated or
> locally).

I don't trust "my users" in this context.

> 2: I never suggested disabling the AWL entirely. I suggested
> disabling it for the above mentioned users.
>
> I also suggested (and this is preferable to disabling it in my
> opinion) to separate the AWL so that you use one AWL for mail from
> the above mentioned users and another for unauthenticated mail from
> external relays.
>
> Is there any specific reason you do not want to use two different
> AWLs for those two different types of traffic?

Non-standard configuration/setup I would have to maintain
  *AND*
A lot of work to hack around a simple problem.  The AWL works just
fine for mail from "my users" to other "my users".  In fact, it works
exceedingly well for that.  What value is there in separating them?

> A more involved change would be to have the AWL store the
> authentication state as well as mail address and relay IP/16. When
> scanning mail from your own users using the same AWL database as
> for mail to your users, this seems necessary to me.
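
To make the quoted suggestion concrete, here is a minimal Python sketch of an
AWL-like table whose key includes the authentication state alongside the
sender address and the relay's /16. It is not SpamAssassin's code: the names
`awl_key` and `ToyAWL`, the sample addresses, and the choice to keep a running
count and total per key are all assumptions made for illustration.

```python
import ipaddress
from collections import defaultdict


def awl_key(address, relay_ip, authenticated):
    """Build a hypothetical AWL key: (sender, relay /16, auth state)."""
    # strict=False lets us pass a host address and get its enclosing /16.
    net = ipaddress.ip_network(f"{relay_ip}/16", strict=False)
    return (address.lower(), str(net.network_address), authenticated)


class ToyAWL:
    """Toy stand-in for an auto-whitelist: tracks a mean score per key."""

    def __init__(self):
        # key -> [message count, total score]
        self.totals = defaultdict(lambda: [0, 0.0])

    def add(self, key, score):
        entry = self.totals[key]
        entry[0] += 1
        entry[1] += score

    def mean(self, key):
        count, total = self.totals.get(key, (0, 0.0))
        return total / count if count else None
```

With a key like this, mail from an authenticated user and forged mail claiming
the same address from an outside relay would build up separate histories
inside one database, which is the effect the quoted paragraph is after.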


>> Again, this seems to be a lot of work for no real gain.  What I
>> have proposed makes sense for widespread use.  Why hack/slash/burn
>> when a good fix would improve it for everyone?
>
> In case you haven't noticed it, your suggestion is not seen as a
> "good fix" for the problem by everyone. I was merely suggesting
> other ways to go about this.

Actually, that's not true.  Nobody has suggested that this fix would
be bad.  Matt was querying me thinking I had screwed up my trusted
hosts, but not a single person has suggested that this change would be
bad.

> If you wish other people to implement/accept something that fixes
> your problem and you can't convince them that your own ideas are
> good, it may be that alternative means of fixing the problem are
> seen as better and therefore stand a bigger chance of being
> implemented/accepted.

What alternatives?  So far I've only heard (a) disable the AWL (b)
don't use AWL it sucks and (c) hack the system to use different AWLs.
None of which really make any logical sense to solve the problem.

> If you do implement your fix and submit it, please make it an
> option. I for one would turn it off since it would not improve
> things here.


You are the first person to say so.  Can you explain why?

--
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source  
and other randomness





Re: rDNS none in stats with IPv6

2008-05-28 Thread Greg Troxel
  In my SA stats, the majority (+90%) of email inbound is classified as
  rdns_none.

  I have a suspicion that this is due to the IPv6-IPv4 mapped address
  being written into the headers when I am speaking to a non-native IPv6
  MTA:

  Received: from unknown (HELO mail.apache.org) (:::140.211.11.2)
  by pearl.ibctech.ca with SMTP; 28 May 2008 09:13:00 -

(I presume you are trying to make this server IPv6 only instead of dual
stack.  When my machine had a globally routable v6 address I got some
mail over v6 and some over v4, but didn't use mapped addresses.)

It seems that your SMTP listener is not correctly doing reverse dns
lookups of mapped addresses, and I'm not sure what the right fix is.
Either the SMTP code should notice the mapped address, pull out the v4
address, and look it up, or the resolver should do this automatically.

On my NetBSD 4 system (generally pretty hard core about this sort of
thing), "dig -x :::140.211.11.2" returns NXDOMAIN on a query of

;2.0.b.0.3.d.c.8.f.f.f.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa. IN 
PTR

so I'd guess that it's not a normal expectation for a resolver to
extract the mapped address.

After the lookup issue is fixed, the received header would have the hostname.

From looking at Received.pm, I don't see that SA is trying to do DNS
lookups; rdns_none seems to be about the MTA not having succeeded at
rdns lookup, not SA checking it later.  But if SA does look it up,
teaching it about mapped addresses might be needed too.
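
The "notice the mapped address, pull out the v4 address, and look it up"
option above can be sketched in a few lines of Python using the standard
`ipaddress` module. This is only an illustration of the idea, not what any
particular MTA does; the function name `rdns_name` is made up here, and the
example address is written in the standard `::ffff:a.b.c.d` mapped form, which
is an assumption consistent with the ip6.arpa name in the dig query above.

```python
import ipaddress


def rdns_name(addr):
    """Return the PTR name to query for addr.

    IPv4-mapped IPv6 addresses are unwrapped first, so the lookup goes to
    in-addr.arpa instead of ip6.arpa.
    """
    ip = ipaddress.ip_address(addr)
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return ip.reverse_pointer


# The mapped address is looked up as plain IPv4:
print(rdns_name("::ffff:140.211.11.2"))  # 2.11.211.140.in-addr.arpa
```

Without the unwrapping step, `reverse_pointer` on the mapped address yields the
long ip6.arpa name, which is the query that came back NXDOMAIN.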




Re: rDNS none in stats with IPv6

2008-05-28 Thread SM

Hi Steve,
At 06:28 28-05-2008, Steve Bertrand wrote:

> This may not be the appropriate list, but I'm hoping someone can help me.

It is the appropriate list.

> I have an email server based on Matt Simerson's mail toaster
> (http://www.tnpi.biz/internet/mail/toaster/) that I've managed to
> get IPv6 compliant.
>
> However, I'm having a very hard time determining exactly where the
> DNS checks are performed, and how to correct an issue.
>
> In my SA stats, the majority (+90%) of email inbound is classified
> as rdns_none.
>
> I have a suspicion that this is due to the IPv6-IPv4 mapped address
> being written into the headers when I am speaking to a non-native IPv6 MTA:
>
> Received: from unknown (HELO mail.apache.org) (:::140.211.11.2)
>   by pearl.ibctech.ca with SMTP; 28 May 2008 09:13:00 -
>
> Can someone inform me if this is an SA thing, and if so, where to
> begin looking/testing with the source to correct this issue?

According to your header, there is no reverse DNS for that mail server.

> If it is within a part of SpamAssassin, I will gladly submit any
> patches that identify/rectify my problem.

The Received headers are parsed in Received.pm.

Regards,
-sm 



Re: SQL DB schema issue

2008-05-28 Thread Michael Parker


On May 28, 2008, at 10:38 AM, Rocco Scappatura wrote:



> Hello,
>
> I'm using SA with SQL support under amavisd-new. My DBMS is MySQL.
>
> I'm preparing another antispam server and I've installed the latest
> stable software available.
>
> I've dumped the bayes DB (schema + data) from an already working machine and
> I've restored it on the new machine.



How did you do this dump?  Which tables did you get?




> But when I try to start amavisd in debug mode I get the following
> errors:
>
> May 28 17:37:29.010 av8.stt.vir /usr/local/sbin/amavisd[17102]:
> SpamAssassin debug facilities: info
> bayes: database version 0 is different than we understand (3), aborting!
> at /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/BayesStore/SQL.pm
> line 136.
> bayes: database version 0 is different than we understand (3), aborting!
> at /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/BayesStore/SQL.pm
> line 136.
> May 28 17:37:30.155 av8.stt.vir /usr/local/sbin/amavisd[17102]:
> (!!)TROUBLE in pre_loop_hook: check: no loaded plugin implements
> 'check_main': cannot scan! at
> /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/PerMsgStatus.pm line
> 164.
> Suicide () TROUBLE in pre_loop_hook: check: no loaded plugin implements
> 'check_main': cannot scan! at
> /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/PerMsgStatus.pm line
> 164.



The Check plugin is a worse problem and suggests something is really
wonky with your install.






> While the version specified in the database is really '3'.
>
> What could be the source of this error?




It looks for the version in the bayes_global_vars table; check to see
what value is in there.
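
As a rough sketch of that check, the Python below reads the version row out of
a bayes_global_vars table. Several things here are assumptions: an in-memory
SQLite database stands in for the real MySQL server so the example runs on its
own, the column names `variable` and `value` and the `VERSION` row are as I
recall them from the stock schema (verify against your own), and treating a
missing row as version 0 is a guess that would be consistent with the
"database version 0" message in the log.

```python
import sqlite3


def bayes_db_version(conn):
    """Return the stored Bayes DB version, or 0 if no VERSION row exists."""
    row = conn.execute(
        "SELECT value FROM bayes_global_vars WHERE variable = 'VERSION'"
    ).fetchone()
    return int(row[0]) if row else 0


# Stand-in database; a real check would connect to the MySQL bayes DB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bayes_global_vars (variable TEXT PRIMARY KEY, value TEXT)"
)

# Table exists but the data row was not restored.
empty_version = bayes_db_version(conn)

# After the VERSION row is present.
conn.execute("INSERT INTO bayes_global_vars VALUES ('VERSION', '3')")
restored_version = bayes_db_version(conn)
```

If the dump carried over the schema but not the rows of this table, a query
like this would come back empty even though the source machine says '3'.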


Michael


> Thanks,
>
> rocsca





RE: What are some of the most frequently used strings?...

2008-05-28 Thread Bowie Bailey
On Wed, 28 May 2008, Don Saklad wrote:
> a.
> What are a dozen or so of the most frequently used
> strings of characters in spam messages?... like rolex, maxgain, ...?

Common spam strings change constantly and are frequently obfuscated to
avoid simple string matches.  There is a rule set called "Sought" that
is automatically generated every 4 hours based on strings found in the
current spam traffic.  It has worked pretty well for me.

http://wiki.apache.org/spamassassin/SoughtRules

-- 
Bowie


Re: Spam from Gmail & Blogspot

2008-05-28 Thread Michelle Konzack
Am 2008-05-25 23:44:33, schrieb Sahil Tandon:
> I have tried contacting postmaster@ but their auto-response is uselessly 
> disingenuous.  In it, they insist that spam is never sent from Google 
> servers, and only from "miscreants" who forge @gmail.com addresses.
 END OF REPLIED MESSAGE 

I have checked the Received: headers several times and the messages are
coming from  or .  Maybe they should be listed in
.

Thanks, Greetings and nice Day
Michelle Konzack
Systemadministrator
24V Electronic Engineer
Tamay Dogan Network
Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
Michelle Konzack   Apt. 917  ICQ #328449886
+49/177/935194750, rue de Soultz MSN LinuxMichi
+33/6/61925193 67100 Strasbourg/France   IRC #Debian (irc.icq.com)




Re: What are some of the most frequently used strings?...

2008-05-28 Thread Chris St. Pierre

On Wed, 28 May 2008, Don Saklad wrote:


a.
What are a dozen or so of the most frequently used
strings of characters in spam messages?... like rolex, maxgain, ...?


Define "string."  If you mean "word," then here are the 12 most common
words in the TREC 2005 corpus, with the number of times they appear:

enron 94799
message 38187
subject 34751
please 31261
company 31257
original 29529
energy 28476
would 28449
power 23643
about 20734
which 19533
there 16392

The data's a little old, but it's sufficient to make the point of why
SpamAssassin doesn't just do naive word matching (and why you
shouldn't, either).
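
For anyone who wants to produce a list like the one above from their own
corpus, here is a minimal Python sketch of a plain word-frequency count. It is
not how the numbers above were generated: the function name `top_words`, the
lowercase letters-only tokenizer, and the `min_len` cutoff are all assumptions
chosen for illustration.

```python
import re
from collections import Counter


def top_words(messages, n=12, min_len=5):
    """Count words across message bodies and return the n most common."""
    counts = Counter()
    for body in messages:
        words = re.findall(r"[a-z]+", body.lower())
        # Skip short words ("the", "and", ...) via a hypothetical cutoff.
        counts.update(w for w in words if len(w) >= min_len)
    return counts.most_common(n)
```

Run against a mixed ham/spam corpus, a count like this mostly surfaces words
that are common to all mail, which is the point being made above.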

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University



SQL DB schema issue

2008-05-28 Thread Rocco Scappatura

Hello,

I'm using SA with SQL support under amavisd-new. My DBMS is MySQL.

I'm preparing another antispam server and I've installed the latest
stable software available.

I've dumped the bayes DB (schema + data) from an already working machine and
I've restored it on the new machine.

But when I try to start amavisd in debug mode I get the following
errors:

May 28 17:37:29.010 av8.stt.vir /usr/local/sbin/amavisd[17102]:
SpamAssassin debug facilities: info
bayes: database version 0 is different than we understand (3), aborting!
at /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/BayesStore/SQL.pm
line 136.
bayes: database version 0 is different than we understand (3), aborting!
at /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/BayesStore/SQL.pm
line 136.
May 28 17:37:30.155 av8.stt.vir /usr/local/sbin/amavisd[17102]:
(!!)TROUBLE in pre_loop_hook: check: no loaded plugin implements
'check_main': cannot scan! at
/usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/PerMsgStatus.pm line
164.
Suicide () TROUBLE in pre_loop_hook: check: no loaded plugin implements
'check_main': cannot scan! at
/usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/PerMsgStatus.pm line
164.

While the version specified in the database is really '3'.

What could be the source of this error?

Thanks,

rocsca


Re: uri rules

2008-05-28 Thread mouss

Randy Ramsdell wrote:

Matt Kettler wrote:

Joseph Brennan wrote:


>>> I was surprised that this rule...
>>>
>>>  uri CU_CN_LINK  /http:..\w+\.cn\b/
>>>
>>> matches not only this...
>>>
>>>  <a href="http://foobar.cn">
>>>
>>> but also this...
>>>
>>>  <a href="http://www.columbia.edu/foo.html">KooXoo Buys Kuxun.cn Domain</a>
>>>
>>> First, I did not realize that SpamAssassin's idea of "uri" includes not
>>> only the uri, but the start tag, end tag, and all in between.  That's
>>> useful but not real clear in Mail::SpamAssassin::Conf.
>>
>> Actually, it doesn't.. your second example has two URIs as far as
>> SpamAssassin is concerned. "http://www.columbia.edu/foo.html" and
>> "http://Kuxun.cn". Two separate URIs.
>>
>> Since many email clients "auto-link" domains in text portions, like
>> www.google.com, SpamAssassin tries to find text strings that clients
>> will treat as URIs and use them in the URI tests as well.
>
> How so? How does spamassassin URI check determine Kuxun.cn  in a URI
> as opposed to someone who forgot to add a "space" after a sentence
> end? Is it because it is located within the "a" tag?


try putting this
   "I often forget spaces.it happens to me all the time..."
in a message and run with -D. you'll see:

...
[74536] dbg: uridnsbl: domains to query: spaces.it
...
[74536] dbg: rules: ran uri rule __LOCAL_PP_NONPPURL ==> got hit: "http://spaces.it"

...

As you see, SA can't guess that a space is missing, so it checks the 
"resulting" URI anyway.



Things get "tricky" when you want to hit things like
   Did you visit http://www.example.com/foo/bar?if so...
and you are looking for specific patterns in the "bar" part...
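
To illustrate why "spaces.it" in the example above ends up being tested as a
URI, here is a small Python sketch of schemeless-domain guessing. It is a
deliberately naive stand-in, not SpamAssassin's actual parser: the function
name `guess_uris`, the regular expression, and the tiny `KNOWN_TLDS` set are
all made up for this example.

```python
import re

# Hypothetical, deliberately tiny TLD list; a real one is far longer.
KNOWN_TLDS = {"com", "net", "org", "cn", "it"}

# One or more dot-terminated labels followed by an alphabetic final label.
CANDIDATE = re.compile(r"\b((?:[a-z0-9-]+\.)+([a-z]{2,}))\b", re.IGNORECASE)


def guess_uris(text):
    """Return http:// URIs for bare host names that end in a known TLD."""
    uris = []
    for match in CANDIDATE.finditer(text):
        host, tld = match.group(1), match.group(2)
        if tld.lower() in KNOWN_TLDS:
            uris.append("http://" + host)
    return uris


print(guess_uris("I often forget spaces.it happens to me all the time..."))
# ['http://spaces.it']
```

Nothing in the text distinguishes a missing space from a real domain, so a
guesser of this kind has to treat both the same way.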





>>> Second, I can't figure out how \w+ matches the punctuation and spaces!
>>
>> It doesn't. :)








Re: uri rules

2008-05-28 Thread mouss

Joseph Brennan wrote:


> Thanks, Mouss and Matt.
>
> So a uri regexp will match a "http://" that is not there.  OK, well...



SA tries to check based on what MUAs do. if you write
   "please visit www.example.com"
then so-called "modern" MUAs will highlight www.example.com and if you 
bring your mouse over it, you'll see that it points to 
http://www.example.com.


even in the browser address bar, you can omit the "http://" part (it is
the default "scheme" for URIs).


While this is sometimes annoying (and/or surprising), it works as 
intended most of the time. and this is what really matters.




Re: uri rules

2008-05-28 Thread Randy Ramsdell

Matt Kettler wrote:

Joseph Brennan wrote:


>> I was surprised that this rule...
>>
>>  uri CU_CN_LINK  /http:..\w+\.cn\b/
>>
>> matches not only this...
>>
>>  <a href="http://foobar.cn">
>>
>> but also this...
>>
>>  <a href="http://www.columbia.edu/foo.html">KooXoo Buys Kuxun.cn Domain</a>
>>
>> First, I did not realize that SpamAssassin's idea of "uri" includes not
>> only the uri, but the start tag, end tag, and all in between.  That's
>> useful but not real clear in Mail::SpamAssassin::Conf.
>
> Actually, it doesn't.. your second example has two URIs as far as
> SpamAssassin is concerned. "http://www.columbia.edu/foo.html" and
> "http://Kuxun.cn". Two separate URIs.
>
> Since many email clients "auto-link" domains in text portions, like
> www.google.com, SpamAssassin tries to find text strings that clients
> will treat as URIs and use them in the URI tests as well.

How so? How does spamassassin URI check determine Kuxun.cn  in a URI as
opposed to someone who forgot to add a "space" after a sentence end? Is
it because it is located within the "a" tag?

>> Second, I can't figure out how \w+ matches the punctuation and spaces!
>
> It doesn't. :)






Re: uri rules

2008-05-28 Thread Joseph Brennan


Thanks, Mouss and Matt.

So a uri regexp will match a "http://" that is not there.  OK, well...

Joe Brennan





SARE RULES bugs

2008-05-28 Thread jdow
[12734] dbg: rules: meta test DIGEST_MULTIPLE has undefined dependency 'DCC_CHECK'
[12734] dbg: rules: meta test SARE_HEAD_SUBJ_RAND has undefined dependency 'SARE_XMAIL_SUSP2'
[12734] dbg: rules: meta test SARE_HEAD_SUBJ_RAND has undefined dependency 'X_AUTH_WARN_FAKED'
[12734] dbg: rules: meta test SARE_HEAD_XORIP_NOTIP has undefined dependency 'X_ORIG_IPNOT_IPV4'
[12734] dbg: rules: meta test SARE_RD_SAFE has undefined dependency 'SARE_RD_SAFE_MKSHRT'
[12734] dbg: rules: meta test SARE_RD_SAFE has undefined dependency 'SARE_RD_SAFE_GT'
[12734] dbg: rules: meta test SARE_RD_SAFE has undefined dependency 'SARE_RD_SAFE_TINY'





Oops - have you guys given up maintaining the rules?
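
For anyone triaging the same warnings, here is a small Python sketch that
groups the "undefined dependency" lines above by meta rule. It only parses
debug text that looks like the lines in this message; the function name
`undefined_dependencies` and the regular expression are made up for this
example and are not part of SpamAssassin.

```python
import re
from collections import defaultdict

# \s+ before the quoted name tolerates mail-wrapped debug lines.
UNDEFINED_DEP = re.compile(
    r"meta test (\S+) has undefined dependency\s+'([^']+)'"
)


def undefined_dependencies(debug_output):
    """Map each meta rule to the list of sub-rules it references but lacks."""
    missing = defaultdict(list)
    for meta, dep in UNDEFINED_DEP.findall(debug_output):
        missing[meta].append(dep)
    return dict(missing)
```

Fed the seven lines above, this should report four meta rules, with
SARE_RD_SAFE missing three of its sub-rules.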

{^_^} 



Possible denial of service bug ?

2008-05-28 Thread Rick Macdougall

Hi,

I've got a message that seems to tie up spamd forever.

I'm not sure if it's my setup or spamd itself.  I run a very generic 
stable release setup with bayes in mysql, although the hang up does not 
appear to be bayes.


Would one of the developers like to contact me off list to get a copy of 
my debugging and possibly a copy of the email in question ?


Regards,

Rick


Re: uri rules

2008-05-28 Thread Matt Kettler

Joseph Brennan wrote:


> I was surprised that this rule...
>
>  uri CU_CN_LINK  /http:..\w+\.cn\b/
>
> matches not only this...
>
>  <a href="http://foobar.cn">
>
> but also this...
>
>  <a href="http://www.columbia.edu/foo.html">KooXoo Buys Kuxun.cn Domain</a>
>
> First, I did not realize that SpamAssassin's idea of "uri" includes not
> only the uri, but the start tag, end tag, and all in between.  That's
> useful but not real clear in Mail::SpamAssassin::Conf.

Actually, it doesn't.. your second example has two URIs as far as
SpamAssassin is concerned. "http://www.columbia.edu/foo.html" and
"http://Kuxun.cn". Two separate URIs.

Since many email clients "auto-link" domains in text portions, like
www.google.com, SpamAssassin tries to find text strings that clients
will treat as URIs and use them in the URI tests as well.

> Second, I can't figure out how \w+ matches the punctuation and spaces!

It doesn't. :)
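
The "two separate URIs" point can be checked with a short Python sketch that
applies the rule's pattern to each URI on its own. The two-element list is
taken from the explanation above; the function name `uri_rule_hits` is made up
for this example, and Python's `re` stands in for Perl's regex engine, which
behaves the same way for this particular pattern.

```python
import re

# Same pattern as the CU_CN_LINK rule in the question.
CU_CN_LINK = re.compile(r"http:..\w+\.cn\b")


def uri_rule_hits(uris, pattern=CU_CN_LINK):
    """Return the URIs that the pattern matches, testing each one separately."""
    return [uri for uri in uris if pattern.search(uri)]


uris = ["http://www.columbia.edu/foo.html", "http://Kuxun.cn"]
hits = uri_rule_hits(uris)
print(hits)  # ['http://Kuxun.cn']
```

The pattern never has to span the tag or the link text: it fails on the first
URI and matches the second, so \w+ only ever matches "Kuxun".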




rDNS none in stats with IPv6

2008-05-28 Thread Steve Bertrand

Hi everyone,

This may not be the appropriate list, but I'm hoping someone can help me.

I have an email server based on Matt Simerson's mail toaster 
(http://www.tnpi.biz/internet/mail/toaster/) that I've managed to get 
IPv6 compliant.


However, I'm having a very hard time determining exactly where the DNS 
checks are performed, and how to correct an issue.


In my SA stats, the majority (+90%) of email inbound is classified as 
rdns_none.


I have a suspicion that this is due to the IPv6-IPv4 mapped address 
being written into the headers when I am speaking to a non-native IPv6 MTA:


Received: from unknown (HELO mail.apache.org) (:::140.211.11.2)  by 
pearl.ibctech.ca with SMTP; 28 May 2008 09:13:00 -


Can someone inform me if this is an SA thing, and if so, where to begin 
looking/testing with the source to correct this issue?


If it is within a part of SpamAssassin, I will gladly submit any patches 
that identify/rectify my problem.


Thanks, and regards,

Steve






Re: uri rules

2008-05-28 Thread mouss

Joseph Brennan wrote:


> I was surprised that this rule...
>
>  uri CU_CN_LINK  /http:..\w+\.cn\b/
>
> matches not only this...
>
>  <a href="http://foobar.cn">
>
> but also this...
>
>  <a href="http://www.columbia.edu/foo.html">KooXoo Buys Kuxun.cn Domain</a>
>
> First, I did not realize that SpamAssassin's idea of "uri" includes not
> only the uri, but the start tag, end tag, and all in between.

it actually hits "Kuxun.cn" (not the href part). The reason is that some
spammers put uris without the http part (and without href).

the drawback is that uri checks may hit things that are not really
domains. this includes ldap strings, program names (program.com), ... etc.

> That's
> useful but not real clear in Mail::SpamAssassin::Conf.
>
> Second, I can't figure out how \w+ matches the punctuation and spaces!

see above. just run with -D and you'll see
...
[73674] dbg: rules: ran uri rule CU_CN_LINK ==> got hit: "http://Kuxun.cn"
...

> Joseph Brennan
> Columbia University I T






uri rules

2008-05-28 Thread Joseph Brennan


I was surprised that this rule...

 uri CU_CN_LINK  /http:..\w+\.cn\b/

matches not only this...

 <a href="http://foobar.cn">

but also this...

 <a href="http://www.columbia.edu/foo.html">KooXoo Buys Kuxun.cn Domain</a>


First, I did not realize that SpamAssassin's idea of "uri" includes not
only the uri, but the start tag, end tag, and all in between.  That's
useful but not real clear in Mail::SpamAssassin::Conf.

Second, I can't figure out how \w+ matches the punctuation and spaces!

Joseph Brennan
Columbia University I T




What are some of the most frequently used strings?...

2008-05-28 Thread Don Saklad
a.
What are a dozen or so of the most frequently used
strings of characters in spam messages?... like rolex, maxgain, ...?

b.
Where on the web are there any such lists?...


Re: Spam from Gmail & Blogspot

2008-05-28 Thread Matus UHLAR - fantomas
> Sahil Tandon wrote:
> > Jonathan Nichols <[EMAIL PROTECTED]> wrote:
> >> I've been getting quite a lot of spam that's coming *directly* from 
> >> Google, 
> >> using Google servers and referencing blogspot.com (also a Google property) 
> >> URLs. I've been submitting them to URIBL but naturally, they're constantly 
> >> changing.
> > 
> > Same problem here.  For us, the amount of spam originating directly from 
> > Google has increased noticeably over the last two months.

On 27.05.08 18:16, AxisInternet wrote:
> Yes, unfortunately, Google has joined Hotmail and Yahoo lately as being a
> part of the problem and not the solution. It's unfortunate.
> 
> I do find that I block the bulk of it though with some good manual bayes
> training...

I think it's more about spammers, law and security of (mostly) home
computers. At my employer we also notice spam increasing from our own
network etc., and Google, as a free mail provider, is also just a victim.

Yes, providers can do a lot about the spam problem. But not everything.

-- 
Matus UHLAR - fantomas, [EMAIL PROTECTED] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
It's now safe to throw off your computer.


Re: MySQL and Size Of bayes_expiry_max_db_size

2008-05-28 Thread Benny Pedersen

On Wed, May 28, 2008 00:04, Larry Nedry wrote:

> I'm looking for a way to calculate or experimentally find the sweet spot
> for bayes_expiry_max_db_size.  Is there an ideal range?  Or a maximum size?
> What happens if the size is too high?

What happens is that the bigger the size, the more ham/spam training needs
to be performed to have an effect on bayes.

The lower the bayes size, the faster the learning, but it is also a bit
unstable.

In short:

1: if you want manual training, keep sizes low
2: otherwise, raise the bayes size to compensate for the lack of manual training

Always monitor bayes anyway; that will show whether it works or not. For
bayes autolearn, one can also make the range bigger to get more static
learning, so if bayes updates take a lot of time per message, this is how
to make it quieter.

Most important is that bayes is doing it right, e.g. it only gives bayes_99
for spam and bayes_00 for ham.

Last but not least, make sure there are equal numbers of learned ham and
spam signatures.



Benny Pedersen
Need more webspace ? http://www.servage.net/?coupon=cust37098