RE: Rule for Russian character sets (=?koi8-r? not quite a charset)

2008-02-15 Thread Karsten Bräckelmann
On Fri, 2008-02-15 at 17:10 +1300, Michael Hutchinson wrote:
  From: Karsten Bräckelmann [mailto:[EMAIL PROTECTED]

  Why are you guys now trying to re-invent the wheel in the special case
  of a gray asphalt street? What about a dirt track, grass, and anything
  else a wheel works on?
  
  I've pointed it out before. Just use ok_locales, which is all about
  these char sets. No REs, almost no thinking required, no headache. A
  single line, and you're done.
 
 We don't want to only allow the English locale, because we (here at
 my work) do not want our international clients (non Russian) to be
 denied email service. 

ok_locales  en ja ko th zh

This will allow anything but Cyrillic char sets. Please note that en
does *not* mean English locale despite its name. It applies to all
Western charsets, including German Umlauts, Swedish, French, Turkish,
etc. Basically everything that uses the characters in this post, plus
language specific chars.


 That aside, I really don't think getting detailed with Regular
 Expressions is re-inventing the wheel. Rather, it is expanding
 knowledge that will help write better rules in the future. (More
 flexible wheels, in your context).
 
 Although I appreciated your earlier post of 'ok_locales', and
 understood it, I did not appreciate your Troll.

Sorry, I did not mean to troll nor any kind of offense.

However, you missed my point. Getting detailed with REs is a good thing,
sure. That was not my objection -- the RE in question does not properly
handle charset encoding. See the Subject for an example which is not an
encoded word, but will be matched by your rule.

My point was that the rule discussed aims at being something that it
unfortunately is not, because charset encoding is slightly more complex
and definitely requires a closing part. A Regular Expression that does
this can be found in check_for_faraway_charset_in_headers() in
HeaderEval.pm:
  $hdr =~ /=\?(.+?)\?.\?.*?\?=/g

Hence my re-inventing the wheel analogy. And these wheels are quite
flexible, too. ;-)
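
To make the difference concrete, here is a minimal Python sketch (not SpamAssassin code). It applies the pattern quoted above, unchanged apart from Python's raw-string syntax, to this thread's Subject and to a well-formed encoded word. The NAIVE pattern is only an assumption about what an opening-part-only rule might look like, since the rule under discussion is not quoted in this message; the second sample Subject is made up for illustration.

```python
import re

# The pattern quoted from HeaderEval.pm, written as a Python raw string.
ENCODED_WORD = re.compile(r"=\?(.+?)\?.\?.*?\?=")

# Hypothetical naive rule: only looks for the opening "=?koi8-r?" part.
NAIVE = re.compile(r"=\?koi8-r\?", re.IGNORECASE)


def charsets_in_header(value):
    """Return the charset names of all complete encoded words in a header value."""
    return [match.group(1).lower() for match in ENCODED_WORD.finditer(value)]


# The Subject of this thread: contains "=?koi8-r?" but no closing "?=".
subject_plain = "Rule for Russian character sets (=?koi8-r? not quite a charset)"
# A made-up, well-formed encoded word: =?charset?encoding?text?=
subject_encoded = "=?koi8-r?B?8NLJ18XU?="

print(bool(NAIVE.search(subject_plain)))    # the naive pattern fires here
print(charsets_in_header(subject_plain))    # no complete encoded word found
print(charsets_in_header(subject_encoded))  # charset extracted from the real one
```

The stricter pattern requires the encoding letter and the closing "?=", so plain text that merely mentions "=?koi8-r?" is not mistaken for an encoded header.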

Also, your rule applies to the Subject only, whereas ok_locales does
check all MIME parts and will trigger on Russian spam with a western
Subject.


Hope this clarifies my previous posts and is appreciated again...

  guenther


-- 
char *t=[EMAIL PROTECTED];
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Rule for Russian character sets

2008-02-15 Thread Paul Douglas Franklin


I believe that what you are asking for is
meta RUSSIAN_AND_BADTEXT (CHARSET_FARAWAY && __OTHER_RULE)
That requires first that you have set up ok_locales.
--Paul

Rosenbaum, Larry M. wrote:

From: Karsten Bräckelmann [mailto:[EMAIL PROTECTED]

I've pointed it out before. Just use ok_locales, which is all about
these char sets. No REs, almost no thinking required, no headache. A
single line, and you're done.



What's the best way to test the character set for use in a meta rule?  We don't 
want to reject all messages with the Russian (Cyrillic) character set, but we 
may want to use something like

if (character set is Russian) && (body contains 'xyzzy')

for instance.  How would we test the character set?
  


--
Paul Douglas Franklin
Computer Manager, Union Gospel Mission of Yakima, Washington
Husband of Danette
Father of Laurene, Miriam, Tycko, Timothy, Sarabeth, Marie, Dawnita, Anna Leah, 
Alexander, and Caleb



Re: Rule for Russian character sets

2008-02-15 Thread jidanni
KB> If you want to trigger on Russian only, list all but ru.
What if, to catch Ms. Ba'loney & Margar'ine, airport security had to keep a
current list of all the other people in the world? So this is the
wrong approach, as we've been thru before. OK, bye.


Re: Rule for Russian character sets

2008-02-15 Thread McDonald, Dan

On Fri, 2008-02-15 at 11:04 -0800, Paul Douglas Franklin wrote:
 I believe that what you are asking for is
 meta RUSSIAN_AND_BADTEXT (CHARSET_FARAWAY && __OTHER_RULE)
 That requires first that you have set up ok_locales.

If you have TextCat enabled, then the X-Languages: meta header will be
added and can be used with rules, although it doesn't show up in the
output.

I don't think that there is an equivalent X-Locales: 


 --Paul
 
 Rosenbaum, Larry M. wrote:
  From: Karsten Bräckelmann [mailto:[EMAIL PROTECTED]
 
  I've pointed it out before. Just use ok_locales, which is all about
  these char sets. No REs, almost no thinking required, no headache. A
  single line, and you're done.
  
 
  What's the best way to test the character set for use in a meta rule?  We 
  don't want to reject all messages with the Russian (Cyrillic) character 
  set, but we may want to use something like
 
  if (character set is Russian) && (body contains 'xyzzy')
 
  for instance.  How would we test the character set?

 
-- 
Daniel J McDonald, CCIE #2495, CISSP #78281, CNX
Austin Energy
http://www.austinenergy.com





v3.2.4 scan times slow

2008-02-15 Thread Sean Kennedy
I recently upgraded from v3.1.9 to v3.2.4 and I've noticed a substantial 
increase in scan times.  The general average scantime with v3.1 was 
about 1.2s and now with v3.2 it's about 2.2s.  It's enough of a 
slowdown that my mail queue backs up quite easily now.


So I'm trying to debug SA and figure out what's going on by doing -D 
--lint and I've got a couple questions about some of the output.


1) Why am I getting lines like the following and how do I correct it?

[14896] dbg: rules: SARE_HTML_ALT_WAIT1 merged duplicates: 
SARE_HTML_ALT_WAIT2 SARE_HTML_A_NULL SARE_HTML_BADOPEN 
SARE_HTML_BAD_FG_CLR SARE_HTML_COLOR_NWHT3 SARE_HTML_FONT_INVIS2 
SARE_HTML_FSIZE_1ALL SARE_HTML_GIF_DIM SARE_HTML_H2_CLK 
SARE_HTML_HTML_AFTER SARE_HTML_INV_TAGA SARE_HTML_JSCRIPT_ENC 
SARE_HTML_JVS_HREF SARE_HTML_MANY_BR10 SARE_HTML_NO_BODY 
SARE_HTML_NO_HTML1 SARE_HTML_P_JUSTIFY SARE_HTML_URI_2SLASH 
SARE_HTML_URI_AXEL SARE_HTML_URI_BADQRY SARE_HTML_URI_BUG 
SARE_HTML_URI_FORMPHP SARE_HTML_URI_HREF SARE_HTML_URI_MANYP2 
SARE_HTML_URI_MANYP3 SARE_HTML_URI_NUMPHP3 SARE_HTML_URI_OBFU4 
SARE_HTML_URI_OBFU4a SARE_HTML_URI_OPTPHP SARE_HTML_URI_REFID 
SARE_HTML_URI_RID SARE_HTML_URI_RM SARE_HTML_USL_MULT



2) It hangs for like 30 seconds on the following line, what exactly is 
it doing and is it necessary?


[14924] dbg: rules: running uri tests; score so far=1.5


It takes about 5s to run -D --lint on my boxes running v3.1, but about 
50s to 1m10s using v3.2 (same hardware on all boxes).


Any info is greatly appreciated!

Sean


RE: Rule for Russian character sets

2008-02-15 Thread Rosenbaum, Larry M.
 From: Karsten Bräckelmann [mailto:[EMAIL PROTECTED]

 I've pointed it out before. Just use ok_locales, which is all about
 these char sets. No REs, almost no thinking required, no headache. A
 single line, and you're done.

What's the best way to test the character set for use in a meta rule?  We don't 
want to reject all messages with the Russian (Cyrillic) character set, but we 
may want to use something like

if (character set is Russian) && (body contains 'xyzzy')

for instance.  How would we test the character set?


RE: Rule for Russian character sets

2008-02-15 Thread Karsten Bräckelmann
On Fri, 2008-02-15 at 11:49 -0500, Rosenbaum, Larry M. wrote:
  From: Karsten Bräckelmann [mailto:[EMAIL PROTECTED]
 
  I've pointed it out before. Just use ok_locales, which is all about
  these char sets. No REs, almost no thinking required, no headache. A
  single line, and you're done.
 
 What's the best way to test the character set for use in a meta rule? 
 We don't want to reject

SA doesn't reject anyway. It merely classifies and tags mail.

 all messages with the Russian (Cyrillic)
 character set, but we may want to use something like
 
 if (character set is Russian) && (body contains 'xyzzy')

Well, it depends...

If it is ok for you to treat all char sets that you did not list in
ok_locales the same way, then it is just a regular meta rule -- and,
based on my understanding of your description, re-scoring of the few
CHARSET_FARAWAY rules.

 for instance.  How would we test the character set?

This, I believe, cannot be done with the current HeaderEval plugin, since
it does not report the char set, but treats all unwanted char sets the
same. However, if you need fine-grained rules per char set, it should be
fairly easy to alter the existing plugin, or to write custom rules or a
plugin based on this.
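
As a rough illustration of what such a per-char-set test would have to do, here is a minimal Python sketch. It is not a SpamAssassin plugin and does not reflect how HeaderEval works internally. It collects the charsets named in header encoded words and in MIME part Content-Type parameters, then combines "charset is Cyrillic" with "body contains a string". The CYRILLIC_CHARSETS list, the function names and the two sample messages are all assumptions made up for this example.

```python
import re
from email import message_from_string

ENCODED_WORD = re.compile(r"=\?(.+?)\?.\?.*?\?=")

# Illustrative list only, NOT the table SpamAssassin uses internally.
CYRILLIC_CHARSETS = {"koi8-r", "koi8-u", "windows-1251", "iso-8859-5"}


def message_charsets(msg):
    """Collect charsets from header encoded words and from MIME parts."""
    found = set()
    for _name, value in msg.items():
        for match in ENCODED_WORD.finditer(str(value)):
            found.add(match.group(1).lower())
    for part in msg.walk():
        charset = part.get_content_charset()  # already lower-cased, or None
        if charset:
            found.add(charset)
    return found


def cyrillic_and_body_contains(raw, needle):
    """True if the message names a Cyrillic charset AND a text part has needle."""
    msg = message_from_string(raw)
    if not message_charsets(msg) & CYRILLIC_CHARSETS:
        return False
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True) or b""
        try:
            text = payload.decode(part.get_content_charset() or "ascii", "replace")
        except LookupError:  # charset name unknown to Python
            text = payload.decode("latin-1")
        if needle in text:
            return True
    return False


RUSSIAN = ("Subject: hello\nContent-Type: text/plain; charset=koi8-r\n"
           "Content-Transfer-Encoding: 7bit\n\nthe magic word is xyzzy\n")
WESTERN = RUSSIAN.replace("koi8-r", "iso-8859-1")

print(cyrillic_and_body_contains(RUSSIAN, "xyzzy"))
print(cyrillic_and_body_contains(WESTERN, "xyzzy"))
```

The point of the sketch is only that the charset has to be reported rather than collapsed into a single "unwanted" result before a per-charset condition can be expressed.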

  guenther


-- 
char *t=[EMAIL PROTECTED];
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: v3.2.4 scan times slow

2008-02-15 Thread Sean Kennedy
Sorry for replying to my own topic, but I've figured out what's causing 
it to go so slow.


It's the rules in sa-blacklist.current.uri.cf from 
http://www.sa-blacklist.stearns.org/sa-blacklist/sa-blacklist.current.uri.cf.


This ruleset works fine in 3.1, I'm not sure why it doesn't in 3.2, any 
insight?


Thanks,
Sean




Re: Rule for Russian character sets

2008-02-15 Thread Karsten Bräckelmann
On Sat, 2008-02-16 at 04:26 +0800, [EMAIL PROTECTED] wrote:
 KB> If you want to trigger on Russian only, list all but ru.
 What if, to catch Ms. Ba'loney & Margar'ine, airport security had to keep a
 current list of all the other people in the world? So this is the
 wrong approach, as we've been thru before. OK, bye.

Thank you for your most valuable contribution.

Yes, we've been through this before. However, it seems you still don't
understand. There IS NO negated counterpart to ok_locales. Also, this is
not about languages, but character sets -- and there are exactly 6. So,
listing all but one in this context doesn't seem to be asking too much.

Instead of ranting, just try to understand ok_locales as an option to
list all character sets you can read. For most people, this boils down
to one or two anyway. Thus, the general use case is to list just these.

Also, the OP specifically asked to catch Russian only. Listing 5 locales
is the only way to do this currently. If you know about a better way,
please let me know.
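
Since there is no negated option, "all but ru" has to be spelled out. A minimal Python sketch of that complement, assuming the six locale codes are en, ja, ko, ru, th and zh (the five shown earlier in the thread plus ru); the helper name is made up for this example:

```python
# The six locale codes assumed here: the five listed earlier in this thread plus "ru".
ALL_LOCALES = ("en", "ja", "ko", "ru", "th", "zh")


def ok_locales_excluding(*unwanted):
    """Build an ok_locales line that lists every locale except the unwanted ones."""
    unknown = set(unwanted) - set(ALL_LOCALES)
    if unknown:
        raise ValueError("unknown locale(s): " + ", ".join(sorted(unknown)))
    keep = [code for code in ALL_LOCALES if code not in unwanted]
    return "ok_locales  " + " ".join(keep)


print(ok_locales_excluding("ru"))  # prints: ok_locales  en ja ko th zh
```

With only six entries the list is short enough to write by hand, which is the point being made above.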

Otherwise, you just wasted everyone's time. Had a bad day, eh?

  guenther


-- 
char *t=[EMAIL PROTECTED];
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: v3.2.4 scan times slow

2008-02-15 Thread John Hardin

On Fri, 15 Feb 2008, Sean Kennedy wrote:

It's the rules in sa-blacklist.current.uri.cf from 
http://www.sa-blacklist.stearns.org/sa-blacklist/sa-blacklist.current.uri.cf.


This ruleset works fine in 3.1, I'm not sure why it doesn't in 3.2, any 
insight?


Don't use it. It's a huge list of URIs that are better caught by URIBL 
rules.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174     pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  End users want eye candy and the ooo's and hhh's experience
  when reading mail. To them email isn't a tool, but an entertainment
  form. -- Steve Lake
---
 7 days until George Washington's 276th Birthday


Re: Getting ? in spam scores.

2008-02-15 Thread fchan

Hello,
Here is a complete sample without a link (because 
apache.org bounced the message due to its spam 
content), together with the logs relevant to the message. I 
have tar.gz'd the message so that it will hopefully pass the 
spam filter.


Here is the message:
Return-Path: [EMAIL PROTECTED]
Delivered-To: [EMAIL PROTECTED]
X-Spam-Status: No, hits=? required=?
Message-ID: [EMAIL PROTECTED]
From: Rita Gore [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED],
[EMAIL PROTECTED],
[EMAIL PROTECTED],
[EMAIL PROTECTED]
Subject: Size Genetics Warning
Date: Fri, 15 Feb 2008 17:39:26 -0100
Content-Type: text/plain;
format=flowed;
reply-type=original
Content-Transfer-Encoding: 7bit

Gain 3.5+ Inches In Length 100% Safe To Take, With NO Side Effects.



Here is the qmail-queue.log:
Fri, 15 Feb 2008 08:39:54 PST:21158: SA: finished 
scan in 50.013946 secs - hits=?/?

Fri, 15 Feb 2008 08:39:54 PST:21158: p_s: finished scan in 0.007968 secs
Fri, 15 Feb 2008 08:39:54 PST:21158: ini_sc: 
finished scan of 
/var/spool/qmailscan/tmp/s1.molsci.org120309354376421158...
Fri, 15 Feb 2008 08:39:54 PST:21158: -- 
Process 21158 finished. Total of 50.174236 secs
Fri, 15 Feb 2008 08:39:55 PST:21298: +++ starting 
debugging for process 21298 (ppid=21271) by 
uid=509
Fri, 15 Feb 2008 08:39:55 PST:21298: c_a_g: found 
URL in message - maybe phishy - better scan it
Fri, 15 Feb 2008 08:39:55 PST:21298: w_c: Total 
time between DATA command and . was 0.000196 
secs

Fri, 15 Feb 2008 08:39:55 PST:21298: w_c: elapsed time from start 0.000177 secs
Fri, 15 Feb 2008 08:39:55 PST:21298: g_e_h: 
return-path='[EMAIL PROTECTED]', 
recips='[EMAIL PROTECTED],[EMAIL PROTECTED],[EMAIL PROTECTED],[EMAIL PROTECTED],[EMAIL PROTECTED]'
Fri, 15 Feb 2008 08:39:55 PST:21298: from='Rita 
Gore [EMAIL PROTECTED]', 
subj='Size Genetics Warning', via SMTP from 
79.26.135.208

Fri, 15 Feb 2008 08:39:55 PST:21298: clamdscan: finished scan in 0.014551 secs
Fri, 15 Feb 2008 08:40:45 PST:21298: SA: finished 
scan in 50.020665 secs - hits=?/?
Fri, 15 Feb 2008 08:40:46 PST:21298: p_s: 
finished scan in 0.008445004 secs
Fri, 15 Feb 2008 08:40:46 PST:21298: ini_sc: 
finished scan of 
/var/spool/qmailscan/tmp/s1.molsci.org120309359576421298...
Fri, 15 Feb 2008 08:40:46 PST:21298: -- 
Process 21298 finished. Total of 50.133095 secs


But I also noticed these right after this message:
Fri, 15 Feb 2008 08:40:45 PST:21298: SA: finished 
scan in 50.020665 secs - hits=?/?
Fri, 15 Feb 2008 08:40:46 PST:21298: p_s: 
finished scan in 0.008445004 secs
Fri, 15 Feb 2008 08:40:46 PST:21298: ini_sc: 
finished scan of 
/var/spool/qmailscan/tmp/s1.molsci.org120309359576421298...
Fri, 15 Feb 2008 08:40:46 PST:21298: -- 
Process 21298 finished. Total of 50.133095 secs
Fri, 15 Feb 2008 08:40:46 PST:21299: SA: finished 
scan in 50.01334 secs - hits=?/?

Fri, 15 Feb 2008 08:40:46 PST:21299: p_s: finished scan in 0.009365 secs
Fri, 15 Feb 2008 08:40:46 PST:21299: ini_sc: 
finished scan of 
/var/spool/qmailscan/tmp/s1.molsci.org120309359676421299...
Fri, 15 Feb 2008 08:40:46 PST:21299: -- 
Process 21299 finished. Total of 50.215451 secs
Fri, 15 Feb 2008 08:41:01 PST:21376: SA: finished 
scan in 50.061759 secs - hits=?/?

Fri, 15 Feb 2008 08:41:01 PST:21376: p_s: finished scan in 0.102243 secs
Fri, 15 Feb 2008 08:41:01 PST:21376: ini_sc: 
finished scan of 
/var/spool/qmailscan/tmp/s1.molsci.org120309361076421376...
Fri, 15 Feb 2008 08:41:02 PST:21376: -- 
Process 21376 finished. Total of 50.796067 secs
Fri, 15 Feb 2008 08:41:02 PST:21395: SA: finished 
scan in 50.014535 secs - hits=?/?

Fri, 15 Feb 2008 08:41:02 PST:21395: p_s: finished scan in 0.008081 secs
Fri, 15 Feb 2008 08:41:02 PST:21395: ini_sc: 
finished scan of 
/var/spool/qmailscan/tmp/s1.molsci.org120309361276421395...
Fri, 15 Feb 2008 08:41:02 PST:21391: SA: finished 
scan in 50.102585 secs - hits=?/?

Fri, 15 Feb 2008 08:41:02 PST:21391: p_s: finished scan in 0.012847 secs
Fri, 15 Feb 2008 08:41:03 PST:21391: ini_sc: 
finished scan of 
/var/spool/qmailscan/tmp/s1.molsci.org120309361276421391...
Fri, 15 Feb 2008 08:41:03 PST:21395: -- 
Process 21395 finished. Total of 50.430792 secs
Fri, 15 Feb 2008 08:41:03 PST:21391: -- 
Process 21391 finished. Total of 50.258332 secs
Fri, 15 Feb 2008 08:41:03 PST:21538: +++ starting 
debugging for process 21538 (ppid=21529) by 
uid=509
Fri, 15 Feb 2008 08:41:06 PST:21406: SA: finished 
scan in 50.016036 secs - hits=?/?

Fri, 15 Feb 2008 08:41:06 PST:21406: p_s: finished scan in 0.008182 secs
Fri, 15 Feb 2008 08:41:06 PST:21406: ini_sc: 
finished scan of 
/var/spool/qmailscan/tmp/s1.molsci.org120309361376421406...
Fri, 15 Feb 2008 08:41:07 PST:21406: -- 
Process 21406 finished. Total of 50.81682 secs



Here is the maillog for that period of time:
Feb 15 08:38:39 s1 spamd[19278]: spamd: checking 
message [EMAIL PROTECTED] 
for qscand:510
Feb 15 08:40:47 s1 spamd[19278]: spamd: 
identified spam (44.9/8.5) for qscand:510 in 

Whois info?

2008-02-15 Thread Marc Perkel
Is there any place to easily query whois information to determine on a 
mass scale how old a domain is?




Re: Whois info?

2008-02-15 Thread Duane Hill
On Fri, 15 Feb 2008 17:34:09 -0800
Marc Perkel [EMAIL PROTECTED] wrote:

 Is there any place to easily query whois information to determine on
 a mass scale how old a domain is?

Don't know myself. All I can say is don't query whois on Network
Solutions for possible availability of a domain to register. They will
lock the domain for a short period of time where you MUST register with
them during that period.

---
  _|_
 (_| |


Re: v3.2.4 scan times slow

2008-02-15 Thread Matt Kettler

Sean Kennedy wrote:
Sorry for replying to my own topic, but I've figured out what's 
causing it to go so slow.


It's the rules in sa-blacklist.current.uri.cf from 
http://www.sa-blacklist.stearns.org/sa-blacklist/sa-blacklist.current.uri.cf. 



This ruleset works fine in 3.1, I'm not sure why it doesn't in 3.2, 
any insight?

Quite frankly, I'm surprised it worked in 3.1.

My guess is that something about the URI processing code changed in 3.2 
to be less efficient when massively overloaded with rules. That's not 
too surprising, as a lot of ways of optimizing performance for a 
moderate sized set of operations perform horribly when presented with 
absurdly large ones (and generally vice versa.. algorithms best at large 
sets tend to have lots of setup, and perform poorly with small sets..).


e.g. the shell sort is one of the fastest sorting algorithms for small 
sets, but is really slow for large sets.


Of course, I'm purely pontificating here, however it would not surprise 
me to discover an optimization of SA causes worse performance when 
strained this way.


Besides, sa-blacklist is 100% redundant with the WS list of surbl.org, 
which is supported over DNS in SA by default.
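
To illustrate the general scaling argument only (this is a guess at the shape of the problem, not a description of how SA 3.1 or 3.2 processes uri rules), here is a minimal Python sketch. It contrasts one regex per blacklisted domain, each tried against every URI, with a single set lookup per domain suffix. The domain names, the list size and the function names are all made up for this example.

```python
import re
from urllib.parse import urlsplit

# Hypothetical blacklist; the real ruleset is far larger.
DOMAINS = ["spam%d.example" % i for i in range(1000)]

# Approach 1: one compiled regex per entry, every one tried against every URI.
RULES = [
    re.compile(r"^https?://(?:[^/]*\.)?" + re.escape(d) + r"(?:[/:]|$)", re.I)
    for d in DOMAINS
]


def listed_by_rules(uri):
    """Linear scan: cost grows with the number of rules."""
    return any(rule.search(uri) for rule in RULES)


# Approach 2: one hash lookup per domain suffix of the host name.
DOMAIN_SET = set(DOMAINS)


def listed_by_lookup(uri):
    """Set lookup: cost depends on the host name, not on the list size."""
    host = (urlsplit(uri).hostname or "").lower()
    labels = host.split(".")
    return any(".".join(labels[i:]) in DOMAIN_SET for i in range(len(labels)))


for uri in ("http://www.spam7.example/buy", "http://ham.example/"):
    print(uri, listed_by_rules(uri), listed_by_lookup(uri))
```

Both functions give the same answers; the difference is that the first does work proportional to the size of the list for every URI, which is the kind of load a DNS-based list avoids on the scanning host.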








Re: v3.2.4 scan times slow

2008-02-15 Thread Jeff Chan

Quoting Sean Kennedy [EMAIL PROTECTED]:


Sorry for replying to my own topic, but I've figured out what's causing
it to go so slow.

It's the rules in sa-blacklist.current.uri.cf from
http://www.sa-blacklist.stearns.org/sa-blacklist/sa-blacklist.current.uri.cf.

This ruleset works fine in 3.1, I'm not sure why it doesn't in 3.2, any
insight?


DO NOT USE sa-blacklist.current.uri.cf.  Use multi.surbl.org instead:

  http://www.surbl.org/lists.html#ws

Specifically, enable network tests and SURBLs are used by default:

  http://www.surbl.org/faq.html#nettest

Jeff C.