RE: Rule for Russian character sets (=?koi8-r? not quite a charset)
On Fri, 2008-02-15 at 17:10 +1300, Michael Hutchinson wrote:
> > From: Karsten Bräckelmann [mailto:[EMAIL PROTECTED]
> > Why are you guys now trying to re-invent the wheel in the special case of a gray asphalt street? What about a dirt track, grass, and anything else a wheel works on?
> >
> > I've pointed it out before. Just use ok_locales, which is all about these char sets. No REs, almost no thinking required, no headache. A single line, and you're done.
>
> We don't want to only allow the English locale, because we (here at my work) do not want our international clients (non Russian) to be denied email service.

  ok_locales en ja ko th zh

This will allow anything but Cyrillic char sets.

Please note that en does *not* mean English locale despite its name. It applies to all Western charsets, including German umlauts, Swedish, French, Turkish, etc. Basically everything that uses the characters in this post, plus language-specific chars.

> That aside, I really don't think getting detailed with Regular Expressions is re-inventing the wheel. Rather, it is expanding knowledge that will help write better rules in the future. (More flexible wheels, in your context.)
>
> Although I appreciated your earlier post of 'ok_locales', and understood it, I did not appreciate your Troll.

Sorry, I did not mean to troll or to cause any kind of offense. However, you missed my point. Getting detailed with REs is a good thing, sure. That was not what I was objecting to; the RE in question simply does not properly handle charset encoding. See the Subject for an example which is not an encoded word, but will be matched by your rule.

My point was that the rule discussed aims at being something that it unfortunately is not, because charset encoding is slightly more complex and definitely requires a closing part. A Regular Expression that does this can be found in check_for_faraway_charset_in_headers() in HeaderEval.pm:

  $hdr =~ /=\?(.+?)\?.\?.*?\?=/g

Hence my re-inventing-the-wheel analogy. And these wheels are quite flexible, too. ;-)

Also, your rule applies to the Subject only, whereas ok_locales checks all MIME parts and will trigger on Russian spam with a Western Subject.

Hope this clarifies my previous posts and is appreciated again...

  guenther

-- 
char *t=[EMAIL PROTECTED];
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
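To see the difference Karsten describes, one can run the HeaderEval.pm pattern and an "opening part only" pattern against this thread's own Subject. The sketch below uses Python's re module, which accepts the quoted Perl pattern unchanged. `NAIVE` is a hypothetical stand-in for the rule under discussion (that rule is not quoted in this excerpt), and the encoded Subject is an arbitrary example.

```python
import re

# Pattern quoted from check_for_faraway_charset_in_headers() in HeaderEval.pm.
# It needs the full encoded-word shape: =?charset?X?text?=
ENCODED_WORD = re.compile(r"=\?(.+?)\?.\?.*?\?=")

# Hypothetical "opening part only" rule, for comparison (not the actual
# rule from the thread, which is not quoted here).
NAIVE = re.compile(r"=\?koi8-r\?", re.IGNORECASE)


def encoded_word_charsets(header):
    """Return the charset label of every complete encoded word in header."""
    return [m.group(1) for m in ENCODED_WORD.finditer(header)]


# This thread's Subject: looks like the start of an encoded word, but is not one.
plain = "Rule for Russian character sets (=?koi8-r? not quite a charset)"
# An arbitrary, well-formed encoded word (B = base64 encoding).
encoded = "=?koi8-r?B?8NLJ18XU?="

print(bool(NAIVE.search(plain)), encoded_word_charsets(plain))      # True []
print(bool(NAIVE.search(encoded)), encoded_word_charsets(encoded))  # True ['koi8-r']
```

The naive pattern fires on both strings; only the pattern with the closing `?=` part tells them apart.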
Re: Rule for Russian character sets
I believe that what you are asking for is

  meta RUSSIAN_AND_BADTEXT (CHARSET_FARAWAY && __OTHER_RULE)

That requires first that you have set up ok_locales.

--Paul

Rosenbaum, Larry M. wrote:
> > From: Karsten Bräckelmann [mailto:[EMAIL PROTECTED]
> > I've pointed it out before. Just use ok_locales, which is all about these char sets. No REs, almost no thinking required, no headache. A single line, and you're done.
>
> What's the best way to test the character set for use in a meta rule? We don't want to reject all messages with the Russian (Cyrillic) character set, but we may want to use something like
>
>   if (character set is Russian) && (body contains 'xyzzy')
>
> for instance. How would we test the character set?

-- 
Paul Douglas Franklin
Computer Manager, Union Gospel Mission of Yakima, Washington
Husband of Danette
Father of Laurene, Miriam, Tycko, Timothy, Sarabeth, Marie, Dawnita, Anna Leah, Alexander, and Caleb
Re: Rule for Russian character sets
KB> If you want to trigger on Russian only, list all but ru.

What if, to catch Ms. Ba'loney Margar'ine, airport security had to keep a current list of all the other people in the world? So this is the wrong approach, as we've been thru before. OK, bye.
Re: Rule for Russian character sets
On Fri, 2008-02-15 at 11:04 -0800, Paul Douglas Franklin wrote:
> I believe that what you are asking for is
>
>   meta RUSSIAN_AND_BADTEXT (CHARSET_FARAWAY && __OTHER_RULE)
>
> That requires first that you have set up ok_locales.

If you have TextCat enabled, then the X-Language: meta header will be added and can be used with rules, although it doesn't show up in the output. I don't think that there is an equivalent X-Locales: header.

-- 
Daniel J McDonald, CCIE #2495, CISSP #78281, CNX
Austin Energy
http://www.austinenergy.com
v3.2.4 scan times slow
I recently upgraded from v3.1.9 to v3.2.4 and I've noticed a substantial increase in scan times. The general average scan time with v3.1 was about 1.2s and now with v3.2 it's about 2.2s. It's enough of a slowdown that my mail queue backs up quite easily now.

So I'm trying to debug SA and figure out what's going on by running -D --lint, and I've got a couple of questions about some of the output.

1) Why am I getting lines like the following, and how do I correct it?

  [14896] dbg: rules: SARE_HTML_ALT_WAIT1 merged duplicates: SARE_HTML_ALT_WAIT2 SARE_HTML_A_NULL SARE_HTML_BADOPEN SARE_HTML_BAD_FG_CLR SARE_HTML_COLOR_NWHT3 SARE_HTML_FONT_INVIS2 SARE_HTML_FSIZE_1ALL SARE_HTML_GIF_DIM SARE_HTML_H2_CLK SARE_HTML_HTML_AFTER SARE_HTML_INV_TAGA SARE_HTML_JSCRIPT_ENC SARE_HTML_JVS_HREF SARE_HTML_MANY_BR10 SARE_HTML_NO_BODY SARE_HTML_NO_HTML1 SARE_HTML_P_JUSTIFY SARE_HTML_URI_2SLASH SARE_HTML_URI_AXEL SARE_HTML_URI_BADQRY SARE_HTML_URI_BUG SARE_HTML_URI_FORMPHP SARE_HTML_URI_HREF SARE_HTML_URI_MANYP2 SARE_HTML_URI_MANYP3 SARE_HTML_URI_NUMPHP3 SARE_HTML_URI_OBFU4 SARE_HTML_URI_OBFU4a SARE_HTML_URI_OPTPHP SARE_HTML_URI_REFID SARE_HTML_URI_RID SARE_HTML_URI_RM SARE_HTML_USL_MULT

2) It hangs for about 30 seconds on the following line. What exactly is it doing, and is it necessary?

  [14924] dbg: rules: running uri tests; score so far=1.5

It takes about 5s to run -D --lint on my boxes running v3.1, but about 50s to 1m10s using v3.2 (same hardware on all boxes).

Any info is greatly appreciated!

Sean
RE: Rule for Russian character sets
> From: Karsten Bräckelmann [mailto:[EMAIL PROTECTED]
> I've pointed it out before. Just use ok_locales, which is all about these char sets. No REs, almost no thinking required, no headache. A single line, and you're done.

What's the best way to test the character set for use in a meta rule? We don't want to reject all messages with the Russian (Cyrillic) character set, but we may want to use something like

  if (character set is Russian) && (body contains 'xyzzy')

for instance. How would we test the character set?
RE: Rule for Russian character sets
On Fri, 2008-02-15 at 11:49 -0500, Rosenbaum, Larry M. wrote:
> > From: Karsten Bräckelmann [mailto:[EMAIL PROTECTED]
> > I've pointed it out before. Just use ok_locales, which is all about these char sets. No REs, almost no thinking required, no headache. A single line, and you're done.
>
> What's the best way to test the character set for use in a meta rule? We don't want to reject

SA doesn't reject anyway. It merely classifies and tags mail.

> all messages with the Russian (Cyrillic) character set, but we may want to use something like
>
>   if (character set is Russian) && (body contains 'xyzzy')

Well, it depends... If it is ok for you to treat all char sets that you did not set in ok_locales the same way, then it is just a regular meta rule -- and, based on my understanding of your description, re-scoring of the few CHARSET_FARAWAY rules.

> for instance. How would we test the character set?

This, I believe, cannot be done with the current HeaderEval plugin, since it does not report the char set, but treats all unwanted char sets the same. However, if you need fine-grained rules per char set, it should be fairly easy to alter the existing plugin or to write custom rules or a plugin based on this.

  guenther
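Outside SpamAssassin, the per-charset test Larry asks about can be sketched with Python's standard email package: collect the charset labels a message declares (MIME part parameters plus RFC 2047 encoded words in the Subject), then AND that with a body check, which is what the meta rule expresses. This is not how the HeaderEval plugin works internally. The `CYRILLIC` set and all helper names are hypothetical choices for illustration, and a real plugin would need to look at more headers and cope with malformed input.

```python
import email
from email.header import decode_header

# Hypothetical, illustrative set of Cyrillic charset labels; this is not
# the list SpamAssassin uses for ok_locales.
CYRILLIC = {"koi8-r", "koi8-u", "windows-1251", "iso-8859-5", "cp866"}


def declared_charsets(msg):
    """Lowercased charsets from each MIME part's Content-Type parameter and
    from RFC 2047 encoded words in the Subject header."""
    found = set()
    for part in msg.walk():
        charset = part.get_content_charset()  # already lowercased, or None
        if charset:
            found.add(charset)
    for _chunk, charset in decode_header(msg.get("Subject", "")):
        if charset:
            found.add(charset.lower())
    return found


def body_text(msg):
    """Decoded text of all text/* parts; unknown charsets fall back to latin-1."""
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        data = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "us-ascii"
        try:
            chunks.append(data.decode(charset, errors="replace"))
        except LookupError:  # charset label Python does not know
            chunks.append(data.decode("latin-1"))
    return "\n".join(chunks)


def cyrillic_and_keyword(raw, keyword="xyzzy"):
    """Roughly: (character set is Russian) && (body contains keyword)."""
    msg = email.message_from_string(raw)
    if not declared_charsets(msg) & CYRILLIC:
        return False
    return keyword in body_text(msg).lower()
```

Note that this thread's own Subject, `(=?koi8-r? not quite a charset)`, is not reported as koi8-r here, because `decode_header` only recognizes complete encoded words.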
Re: v3.2.4 scan times slow
Sorry for replying to my own topic, but I've figured out what's causing it to go so slow. It's the rules in sa-blacklist.current.uri.cf from http://www.sa-blacklist.stearns.org/sa-blacklist/sa-blacklist.current.uri.cf.

This ruleset works fine in 3.1; I'm not sure why it doesn't in 3.2. Any insight?

Thanks,
Sean
Re: Rule for Russian character sets
On Sat, 2008-02-16 at 04:26 +0800, [EMAIL PROTECTED] wrote:
> KB> If you want to trigger on Russian only, list all but ru.
>
> What if, to catch Ms. Ba'loney Margar'ine, airport security had to keep a current list of all the other people in the world? So this is the wrong approach, as we've been thru before. OK, bye.

Thank you for your most valuable contribution.

Yes, we've been through this before. However, it seems you still don't understand. There IS NO negated counterpart to ok_locales. Also, this is not about languages, but character sets -- and there are exactly 6. So, listing all but one in this context doesn't seem to be asking too much.

Instead of ranting, just try to understand ok_locales as an option to list all character sets you can read. For most people, this boils down to one or two anyway. Thus, the general use case is to list just these.

Also, the OP specifically asked to catch Russian only. Listing 5 locales is the only way to do this currently. If you know about a better way, please let me know. Otherwise, you just wasted everyone's time.

Had a bad day, eh?

  guenther
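Since ok_locales has no negated form, "everything but Russian" means writing out the complement of the six locale codes. A throwaway helper makes that explicit. The function name is hypothetical, and the code list reflects the six locales described in the SpamAssassin 3.x Mail::SpamAssassin::Conf documentation, so it is worth rechecking against the version you run.

```python
# The six codes ok_locales knew about in SpamAssassin 3.x (per the
# Mail::SpamAssassin::Conf documentation): en = Western character sets,
# ja = Japanese, ko = Korean, ru = Cyrillic, th = Thai, zh = Chinese.
ALL_LOCALES = ("en", "ja", "ko", "ru", "th", "zh")


def ok_locales_except(*unwanted):
    """Build an ok_locales line allowing every locale except `unwanted`."""
    unknown = set(unwanted) - set(ALL_LOCALES)
    if unknown:
        raise ValueError(f"unknown locale codes: {sorted(unknown)}")
    allowed = [code for code in ALL_LOCALES if code not in unwanted]
    return "ok_locales " + " ".join(allowed)


print(ok_locales_except("ru"))  # ok_locales en ja ko th zh
```

The output matches the single line suggested earlier in the thread for allowing anything but Cyrillic.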
Re: v3.2.4 scan times slow
On Fri, 15 Feb 2008, Sean Kennedy wrote:
> It's the rules in sa-blacklist.current.uri.cf from http://www.sa-blacklist.stearns.org/sa-blacklist/sa-blacklist.current.uri.cf. This ruleset works fine in 3.1, I'm not sure why it doesn't in 3.2, any insight?

Don't use it. It's a huge list of URIs that are better caught by URIBL rules.

-- 
John Hardin KA7OHZ  http://www.impsec.org/~jhardin/
[EMAIL PROTECTED]  FALaholic #11174  pgpk -a [EMAIL PROTECTED]
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
End users want eye candy and the ooo's and hhh's experience when reading mail. To them email isn't a tool, but an entertainment form. -- Steve Lake
---
7 days until George Washington's 276th Birthday
Re: Getting ? in spam scores.
Hello,

Here is a complete sample without a link (because apache.org bounced the message due to the spam content), with logs relevant to the message. I have tar.gz'd the message to hopefully get it past the spam filter.

Here is the message:

Return-Path: [EMAIL PROTECTED]
Delivered-To: [EMAIL PROTECTED]
X-Spam-Status: No, hits=? required=?
Message-ID: [EMAIL PROTECTED]
From: Rita Gore [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Size Genetics Warning
Date: Fri, 15 Feb 2008 17:39:26 -0100
Content-Type: text/plain; format=flowed; reply-type=original
Content-Transfer-Encoding: 7bit

Gain 3.5+ Inches In Length 100% Safe To Take, With NO Side Effects.

Here is the qmail-queue.log:

Fri, 15 Feb 2008 08:39:54 PST:21158: SA: finished scan in 50.013946 secs - hits=?/?
Fri, 15 Feb 2008 08:39:54 PST:21158: p_s: finished scan in 0.007968 secs
Fri, 15 Feb 2008 08:39:54 PST:21158: ini_sc: finished scan of /var/spool/qmailscan/tmp/s1.molsci.org120309354376421158...
Fri, 15 Feb 2008 08:39:54 PST:21158: -- Process 21158 finished. Total of 50.174236 secs
Fri, 15 Feb 2008 08:39:55 PST:21298: +++ starting debugging for process 21298 (ppid=21271) by uid=509
Fri, 15 Feb 2008 08:39:55 PST:21298: c_a_g: found URL in message - maybe phishy - better scan it
Fri, 15 Feb 2008 08:39:55 PST:21298: w_c: Total time between DATA command and . was 0.000196 secs
Fri, 15 Feb 2008 08:39:55 PST:21298: w_c: elapsed time from start 0.000177 secs
Fri, 15 Feb 2008 08:39:55 PST:21298: g_e_h: return-path='[EMAIL PROTECTED]', recips='[EMAIL PROTECTED],[EMAIL PROTECTED],[EMAIL PROTECTED],[EMAIL PROTECTED],[EMAIL PROTECTED]'
Fri, 15 Feb 2008 08:39:55 PST:21298: from='Rita Gore [EMAIL PROTECTED]', subj='Size Genetics Warning', via SMTP from 79.26.135.208
Fri, 15 Feb 2008 08:39:55 PST:21298: clamdscan: finished scan in 0.014551 secs
Fri, 15 Feb 2008 08:40:45 PST:21298: SA: finished scan in 50.020665 secs - hits=?/?
Fri, 15 Feb 2008 08:40:46 PST:21298: p_s: finished scan in 0.008445004 secs
Fri, 15 Feb 2008 08:40:46 PST:21298: ini_sc: finished scan of /var/spool/qmailscan/tmp/s1.molsci.org120309359576421298...
Fri, 15 Feb 2008 08:40:46 PST:21298: -- Process 21298 finished. Total of 50.133095 secs

But notice these as well, right after this message:

Fri, 15 Feb 2008 08:40:45 PST:21298: SA: finished scan in 50.020665 secs - hits=?/?
Fri, 15 Feb 2008 08:40:46 PST:21298: p_s: finished scan in 0.008445004 secs
Fri, 15 Feb 2008 08:40:46 PST:21298: ini_sc: finished scan of /var/spool/qmailscan/tmp/s1.molsci.org120309359576421298...
Fri, 15 Feb 2008 08:40:46 PST:21298: -- Process 21298 finished. Total of 50.133095 secs
Fri, 15 Feb 2008 08:40:46 PST:21299: SA: finished scan in 50.01334 secs - hits=?/?
Fri, 15 Feb 2008 08:40:46 PST:21299: p_s: finished scan in 0.009365 secs
Fri, 15 Feb 2008 08:40:46 PST:21299: ini_sc: finished scan of /var/spool/qmailscan/tmp/s1.molsci.org120309359676421299...
Fri, 15 Feb 2008 08:40:46 PST:21299: -- Process 21299 finished. Total of 50.215451 secs
Fri, 15 Feb 2008 08:41:01 PST:21376: SA: finished scan in 50.061759 secs - hits=?/?
Fri, 15 Feb 2008 08:41:01 PST:21376: p_s: finished scan in 0.102243 secs
Fri, 15 Feb 2008 08:41:01 PST:21376: ini_sc: finished scan of /var/spool/qmailscan/tmp/s1.molsci.org120309361076421376...
Fri, 15 Feb 2008 08:41:02 PST:21376: -- Process 21376 finished. Total of 50.796067 secs
Fri, 15 Feb 2008 08:41:02 PST:21395: SA: finished scan in 50.014535 secs - hits=?/?
Fri, 15 Feb 2008 08:41:02 PST:21395: p_s: finished scan in 0.008081 secs
Fri, 15 Feb 2008 08:41:02 PST:21395: ini_sc: finished scan of /var/spool/qmailscan/tmp/s1.molsci.org120309361276421395...
Fri, 15 Feb 2008 08:41:02 PST:21391: SA: finished scan in 50.102585 secs - hits=?/?
Fri, 15 Feb 2008 08:41:02 PST:21391: p_s: finished scan in 0.012847 secs
Fri, 15 Feb 2008 08:41:03 PST:21391: ini_sc: finished scan of /var/spool/qmailscan/tmp/s1.molsci.org120309361276421391...
Fri, 15 Feb 2008 08:41:03 PST:21395: -- Process 21395 finished. Total of 50.430792 secs
Fri, 15 Feb 2008 08:41:03 PST:21391: -- Process 21391 finished. Total of 50.258332 secs
Fri, 15 Feb 2008 08:41:03 PST:21538: +++ starting debugging for process 21538 (ppid=21529) by uid=509
Fri, 15 Feb 2008 08:41:06 PST:21406: SA: finished scan in 50.016036 secs - hits=?/?
Fri, 15 Feb 2008 08:41:06 PST:21406: p_s: finished scan in 0.008182 secs
Fri, 15 Feb 2008 08:41:06 PST:21406: ini_sc: finished scan of /var/spool/qmailscan/tmp/s1.molsci.org120309361376421406...
Fri, 15 Feb 2008 08:41:07 PST:21406: -- Process 21406 finished. Total of 50.81682 secs

Here is the maillog for that period of time:

Feb 15 08:38:39 s1 spamd[19278]: spamd: checking message [EMAIL PROTECTED] for qscand:510
Feb 15 08:40:47 s1 spamd[19278]: spamd: identified spam (44.9/8.5) for qscand:510 in
Whois info?
Is there any place to easily query whois information to determine on a mass scale how old a domain is?
Re: Whois info?
On Fri, 15 Feb 2008 17:34:09 -0800 Marc Perkel [EMAIL PROTECTED] wrote:
> Is there any place to easily query whois information to determine on a mass scale how old a domain is?

Don't know myself. All I can say is: don't query whois on Network Solutions for possible availability of a domain to register. They will lock the domain for a short period of time, during which you MUST register it with them.

--- _|_ (_| |
Re: v3.2.4 scan times slow
Sean Kennedy wrote:
> Sorry for replying to my own topic, but I've figured out what's causing it to go so slow. It's the rules in sa-blacklist.current.uri.cf from http://www.sa-blacklist.stearns.org/sa-blacklist/sa-blacklist.current.uri.cf. This ruleset works fine in 3.1, I'm not sure why it doesn't in 3.2, any insight?

Quite frankly, I'm surprised it worked in 3.1.

My guess is that something about the URI processing code changed in 3.2 to be less efficient when massively overloaded with rules. That's not too surprising, as a lot of ways of optimizing performance for a moderately sized set of operations perform horribly when presented with absurdly large ones (and generally vice versa: algorithms best at large sets tend to have lots of setup, and perform poorly with small sets). For example, insertion sort is one of the fastest sorting algorithms for small sets, but is really slow for large sets.

Of course, I'm purely pontificating here; however, it would not surprise me to discover that an optimization in SA causes worse performance when strained this way.

Besides, sa-blacklist is 100% redundant with the WS list of surbl.org, which is supported over DNS in SA by default.
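The scaling point can be illustrated with a toy model. Checking each URI against one regex per listed domain costs work proportional to the list length, while probing a hash set with the hostname and its parent domains (or, as with SURBL, asking DNS) costs roughly the same however long the list is. This is a hypothetical sketch with made-up domains, not a description of how SpamAssassin 3.1 or 3.2 actually compiles uri rules.

```python
import re
from urllib.parse import urlsplit

# Made-up blacklist entries; a real list would hold many thousands.
BLACKLIST = ["spam-example.test", "pills-example.test", "casino-example.test"]

# Approach 1: one compiled regex per entry, each matching the domain or any
# subdomain of it. Every URI is tried against every pattern: O(N) per URI.
URI_RULES = [re.compile(r"(?:^|\.)" + re.escape(d) + r"$") for d in BLACKLIST]

# Approach 2: a hash set, probed once per label suffix of the hostname.
BLACKLIST_SET = set(BLACKLIST)


def hostname(uri):
    return (urlsplit(uri).hostname or "").lower()


def listed_by_regex_scan(uri):
    host = hostname(uri)
    return any(rule.search(host) for rule in URI_RULES)


def listed_by_set_lookup(uri):
    # www.spam-example.test -> try "www.spam-example.test",
    # "spam-example.test", "test": cost depends on label count, not list size.
    labels = hostname(uri).split(".")
    return any(".".join(labels[i:]) in BLACKLIST_SET for i in range(len(labels)))
```

Both functions give the same answers; only the way their cost grows with the list differs.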
Re: v3.2.4 scan times slow
Quoting Sean Kennedy [EMAIL PROTECTED]:
> Sorry for replying to my own topic, but I've figured out what's causing it to go so slow. It's the rules in sa-blacklist.current.uri.cf from http://www.sa-blacklist.stearns.org/sa-blacklist/sa-blacklist.current.uri.cf. This ruleset works fine in 3.1, I'm not sure why it doesn't in 3.2, any insight?

DO NOT USE sa-blacklist.current.uri.cf. Use multi.surbl.org instead:

http://www.surbl.org/lists.html#ws

Specifically, enable network tests; SURBLs are then used by default:

http://www.surbl.org/faq.html#nettest

Jeff C.
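For reference, here is a sketch of what a SURBL lookup involves, without doing any actual DNS. The domain is prepended to the list zone as-is (domain lists are not reversed the way IP-based DNSBLs are), and an A-record answer of 127.0.0.X is a bitmask saying which sublists matched. The bit values below are the ones SpamAssassin 3.2's stock URIBL_*_SURBL rules used for multi.surbl.org, with WS (the sa-blacklist data) as 4. Treat them as an assumption and check them against the SURBL list page before relying on them. Real clients also trim hostnames to the registered domain first, which is omitted here, and the helper names are hypothetical.

```python
# Sublist bits for multi.surbl.org as used by SpamAssassin 3.2's
# URIBL_*_SURBL rules (assumed; verify against current SURBL docs).
SURBL_BITS = {2: "SC", 4: "WS", 8: "PH", 16: "OB", 32: "AB", 64: "JP"}


def surbl_query_name(domain, zone="multi.surbl.org"):
    """DNS name to query for a (registered) domain; no label reversal."""
    return f"{domain.lower().rstrip('.')}.{zone}"


def sublists_from_answer(address):
    """Decode a 127.0.0.X answer into the sublist codes whose bits are set."""
    octets = address.split(".")
    if len(octets) != 4 or octets[:3] != ["127", "0", "0"]:
        raise ValueError(f"unexpected SURBL answer: {address}")
    last = int(octets[3])
    return [name for bit, name in sorted(SURBL_BITS.items()) if last & bit]


print(surbl_query_name("Example.COM"))    # example.com.multi.surbl.org
print(sublists_from_answer("127.0.0.6"))  # ['SC', 'WS']
```

One DNS query per domain covers every sublist at once, which is why it scales better than a local file with one rule per listed domain.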