RE: Good ruleset

2006-04-21 Thread Jeremy Fowler
Here is my /etc/rulesdujour/config; it's a modified version of the file from 
Gentoo Portage.

As you can see, I use them all. I've had very few false positives, if any, 
at my location. It doesn't really matter how high the spam scores get; just 
keep an eye out for false positives.

I am also using amavisd-new.

# $Header$
# Gentoo configuration for the RulesDuJour script.

# - the following rulesets _will_ cause some false-positives
# SARE_HTML1 SARE_HTML2 SARE_HTML3 SARE_HTML4
# SARE_HEADER1 SARE_HEADER2 SARE_HEADER3
# SARE_GENLSUBJ1 SARE_GENLSUBJ2 SARE_GENLSUBJ3

# - the following rulesets are no longer supported upstream, and should not be used:
# SARE_CODING_HTML SARE_HEADER_ABUSE MRWIGGLY BACKHAIR WEEDS1 WEEDS2 CHICKENPOX

# - the following rulesets _should_ be safe, but you should be careful still.
# Read about them here: http://www.rulesemporium.com/rules.htm
# SARE_REDIRECT SARE_HTML0 SARE_HEADER0 SARE_GENLSUBJ0 SARE_ADULT SARE_FRAUD
# SARE_BML SARE_RATWARE SARE_SPOOF SARE_BAYES_POISON_NXM SARE_OEM SARE_RANDOM
# SARE_SPECIFIC BIGEVIL EVILNUMBERS

# - NOT recommended due to massive memory usage:
# BLACKLIST BLACKLIST_URI

# - these were merged with spamassassin upstream 3.x, but older versions of
# spamassassin may want to use them:
# SARE_GENLSUBJ_X30 SARE_HEADER_X30 SARE_HTML_X30 SARE_REDIRECT

# - for SpamAssassin versions _AFTER_ 3.00 
# SARE_REDIRECT_POST300

# - for SpamAssassin versions _BEFORE_ 2.5x
# SARE_FRAUD_PRE25X SARE_BML_PRE25X

TRUSTED_RULESETS_SAFE="ANTIDRUG \
BOGUSVIRUS \
RANDOMVAL \
SARE_ADULT \
SARE_BAYES_POISON_NXM \
SARE_BML \
SARE_EVILNUMBERS0 \
SARE_FRAUD \
SARE_GENLSUBJ \
SARE_GENLSUBJ0 \
SARE_GENLSUBJ_ENG \
SARE_HEADER \
SARE_HEADER0 \
SARE_HEADER_ENG \
SARE_HIGHRISK \
SARE_HTML \
SARE_HTML0 \
SARE_HTML_ENG \
SARE_OBFU \
SARE_OBFU0 \
SARE_OEM \
SARE_RANDOM \
SARE_RATWARE \
SARE_REDIRECT_POST300 \
SARE_SPAMCOP_TOP200 \
SARE_SPECIFIC \
SARE_SPOOF \
SARE_STOCKS \
SARE_UNSUB \
SARE_URI \
SARE_URI0 \
SARE_URI_ENG \
SARE_WHITELIST \
SARE_WHITELIST_RCVD \
SARE_WHITELIST_SPF \
TRIPWIRE"

TRUSTED_RULESETS_FAIRLY_SAFE="SARE_EVILNUMBERS1 \
SARE_HTML1 \
SARE_HEADER1 \
SARE_GENLSUBJ1 \
SARE_OBFU1 \
SARE_URI1"


TRUSTED_RULESETS_DANGEROUS="SARE_EVILNUMBERS2 \
SARE_HTML2 \
SARE_HTML3 \
SARE_HTML4 \
SARE_HEADER2 \
SARE_HEADER3 \
SARE_GENLSUBJ2 \
SARE_GENLSUBJ3 \
SARE_OBFU2 \
SARE_OBFU3 \
SARE_URI2 \
SARE_URI3"

#TRUSTED_RULESETS="${TRUSTED_RULESETS_SAFE}"
#TRUSTED_RULESETS="${TRUSTED_RULESETS_SAFE} ${TRUSTED_RULESETS_FAIRLY_SAFE}"
TRUSTED_RULESETS="${TRUSTED_RULESETS_SAFE} ${TRUSTED_RULESETS_FAIRLY_SAFE} ${TRUSTED_RULESETS_DANGEROUS}"

# do NOT change anything below this point
TAIL="tail -n1"
HEAD="head -n1"
SA_RESTART="/etc/init.d/amavisd restart"
# read in extra rulesets
[ -s /etc/rulesdujour/rulesets ] && source /etc/rulesdujour/rulesets
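
For anyone adapting this config, here is a rough sketch of how the three lists combine into TRUSTED_RULESETS, and of what happens when a variable name is misspelled. The list contents are shortened stand-ins and count_rulesets is a hypothetical helper; neither is part of RulesDuJour itself.

```shell
#!/bin/sh
# Shortened, hypothetical stand-ins for the three lists in the config above.
TRUSTED_RULESETS_SAFE="SARE_ADULT SARE_FRAUD"
TRUSTED_RULESETS_FAIRLY_SAFE="SARE_HTML1"
TRUSTED_RULESETS_DANGEROUS="SARE_HTML2 SARE_HTML3"

# count_rulesets LIST: print how many whitespace-separated names LIST holds.
# (Hypothetical helper, for illustration only.)
count_rulesets() {
    # Unquoted on purpose: word splitting turns each name into one argument.
    set -- $1
    echo "$#"
}

# All three lists joined, as in the active TRUSTED_RULESETS line.
ALL="${TRUSTED_RULESETS_SAFE} ${TRUSTED_RULESETS_FAIRLY_SAFE} ${TRUSTED_RULESETS_DANGEROUS}"

# Same idea with one name misspelled (RULESET instead of RULESETS).
# The shell expands the unknown variable to an empty string without complaint.
TYPO="${TRUSTED_RULESETS_SAFE} ${TRUSTED_RULESET_FAIRLY_SAFE}"

echo "all three lists: $(count_rulesets "$ALL") rulesets"
echo "with the typo:   $(count_rulesets "$TYPO") rulesets"
```

With these stand-in lists the first count is 5 and the second is 2, so a misspelled variable quietly drops a whole group of rulesets. Counting the names after editing the config is a cheap way to catch that.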


dbg: bayes: tok_get_all: SQL error: Illegal mix of collations for operation ' IN '

2006-04-13 Thread Jeremy Fowler
MySQL:

SHOW VARIABLES LIKE 'character%';

Variable_name             Value
character_set_client      utf8
character_set_connection  utf8
character_set_database    latin1
character_set_results     utf8
character_set_server      utf8
character_set_system      utf8
character_sets_dir        /usr/share/mysql/charsets/

SHOW VARIABLES LIKE 'collation%';

Variable_name         Value
collation_connection  utf8_general_ci
collation_database    latin1_swedish_ci
collation_server      utf8_general_ci
SHOW CREATE TABLE bayes_token;

Table: bayes_token
Create Table:
CREATE TABLE `bayes_token` (
  `id` int(11) NOT NULL default '0',
  `token` char(5) NOT NULL default '',
  `spam_count` int(11) NOT NULL default '0',
  `ham_count` int(11) NOT NULL default '0',
  `atime` int(11) NOT NULL default '0',
  PRIMARY KEY  (`id`,`token`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
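
In the output above the connection is utf8 while the database and the bayes_token table are latin1, which looks like the mismatch behind the "Illegal mix of collations" error. A small sketch for spotting that kind of disagreement in SHOW VARIABLES output follows. find_charset_mismatch is a hypothetical helper name, and it assumes two whitespace-separated columns per row.

```shell
#!/bin/sh
# find_charset_mismatch: read "name value" rows on stdin and print every
# character_set_* variable whose value differs from character_set_connection.
# character_set_filesystem is skipped because it is normally "binary".
find_charset_mismatch() {
    awk '
        NF == 2 && $1 ~ /^character_set_/ && $1 != "character_set_filesystem" {
            val[$1] = $2
        }
        END {
            want = val["character_set_connection"]
            for (name in val)
                if (val[name] != want)
                    print name, val[name]
        }
    ' | sort
}

# Possible usage against a live server (-N suppresses the column header):
#   mysql -N -e "SHOW VARIABLES LIKE 'character%'" | find_charset_mismatch
```

Fed the rows shown above, this reports only character_set_database, with the value latin1. It does not look at per-table character sets, so SHOW CREATE TABLE is still worth checking by hand.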

Can't get Bayes to work. Here is my lint output:

[23913] dbg: logger: adding facilities: all
[23913] dbg: logger: logging level is DBG
[23913] dbg: generic: SpamAssassin version 3.1.1
[23913] dbg: config: score set 0 chosen.
[23913] dbg: util: running in taint mode? no
[23913] dbg: dns: is Net::DNS::Resolver available? yes
[23913] dbg: dns: Net::DNS version: 0.53
[23913] dbg: diag: perl platform: 5.008007 linux
[23913] dbg: diag: module installed: MIME::Base64, version 3.05
[23913] dbg: diag: module installed: HTML::Parser, version 3.48
[23913] dbg: diag: module installed: Digest::SHA1, version 2.11
[23913] dbg: diag: module installed: DB_File, version 1.814
[23913] dbg: diag: module installed: Net::DNS, version 0.53
[23913] dbg: diag: module installed: Net::SMTP, version 2.29
[23913] dbg: diag: module installed: Mail::SPF::Query, version 1.998
[23913] dbg: diag: module installed: IP::Country::Fast, version 309.002
[23913] dbg: diag: module installed: Razor2::Client::Agent, version 2.80
[23913] dbg: diag: module installed: Net::Ident, version 1.20
[23913] dbg: diag: module installed: IO::Socket::INET6, version 2.51
[23913] dbg: diag: module installed: IO::Socket::SSL, version 0.97
[23913] dbg: diag: module installed: Time::HiRes, version 1.82
[23913] dbg: diag: module installed: DBI, version 1.50
[23913] dbg: diag: module installed: Getopt::Long, version 2.34
[23913] dbg: diag: module installed: LWP::UserAgent, version 2.033
[23913] dbg: diag: module installed: HTTP::Date, version 1.46
[23913] dbg: diag: module installed: Archive::Tar, version 1.28
[23913] dbg: diag: module installed: IO::Zlib, version 1.04
[23913] dbg: ignore: using a test message to lint rules
[23913] dbg: config: using /etc/mail/spamassassin for site rules pre files
[23913] dbg: config: read file /etc/mail/spamassassin/init.pre
[23913] dbg: config: read file /etc/mail/spamassassin/v310.pre
[23913] dbg: config: using /var/lib/spamassassin/3.001001 for sys rules pre files
[23913] dbg: config: using /var/lib/spamassassin/3.001001 for default rules dir
[23913] dbg: config: read file /var/lib/spamassassin/3.001001/updates_spamassassin_org.cf
[23913] dbg: config: using /etc/mail/spamassassin for site rules dir
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_adult.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_bayes_poison_nxm.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_evilnum0.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_evilnum1.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj0.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj1.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj2.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj3.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj_eng.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header0.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header1.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header2.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header3.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header_eng.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_highrisk.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html0.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html1.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html2.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html3.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html4.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html_eng.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_obfu.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_obfu0.cf
[23913] dbg: config: read file /etc/mail/spamassassin/70_sare_obfu1.cf
[23913] dbg: 

RE: bayes: tok_get_all: SQL error: Illegal mix of collations for operation ' IN '

2006-04-13 Thread Jeremy Fowler

Fixed the problem. I backed up the Bayes tables with sa-learn --backup and 
saved the userpref and awl tables with mysqldump. Then I dropped the entire 
database, set everything to utf8 in my.cnf, and recreated the database and 
tables using utf8 as the default character set. Finally, I restored the Bayes 
data with sa-learn --restore and recreated the awl and userpref tables from 
the mysqldump files (after editing them to use utf8 as the default character 
set).

Just in case anyone else has this problem in the future...
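
The steps above can be sketched roughly as follows. This is a sketch under assumptions, not the exact commands that were run: the database name, backup directory and schema file are made up for illustration, and print_plan only prints the commands for review instead of executing them.

```shell
#!/bin/sh
# Assumed names, not taken from the post. Adjust before using any of this.
DB=${DB:-spamassassin}                         # hypothetical database name
BACKUP_DIR=${BACKUP_DIR:-/var/tmp/sa-migrate}  # hypothetical backup location
SCHEMA=${SCHEMA:-bayes_mysql.sql}              # assumed Bayes table definitions

# dump_to_utf8: filter a mysqldump file on stdin, switching each table's
# default character set from latin1 to utf8.
dump_to_utf8() {
    sed 's/CHARSET=latin1/CHARSET=utf8/g'
}

# print_plan: list the commands in the order the post describes.
# Nothing is executed here; the output is meant to be read and run by hand.
print_plan() {
    cat <<EOF
sa-learn --backup > $BACKUP_DIR/bayes.backup
mysqldump $DB userpref awl > $BACKUP_DIR/prefs_awl.sql
mysql -e "DROP DATABASE $DB"
# edit my.cnf so the character sets are utf8, then restart MySQL
mysql -e "CREATE DATABASE $DB DEFAULT CHARACTER SET utf8"
mysql $DB < $SCHEMA
sa-learn --restore $BACKUP_DIR/bayes.backup
dump_to_utf8 < $BACKUP_DIR/prefs_awl.sql | mysql $DB
EOF
}

print_plan
```

Printing the plan first seemed safer than a script that drops a database on its own. Check that both backup files exist and are non-empty before running the DROP DATABASE step.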